[00:05] bsmith093: I've got no upstream to upload a grab of this
[00:06] it's > 650GB so far
[02:37] it's at least 3TB
[03:07] ok then whoever owns an ISP please grab this massive thing http://bofh.nikhef.nl/events/
[03:09] eh, it's not *that* big ;)
[04:15] also it's already kind of on its way
[04:15] OHM/ is almost completely ingested to IA and I got HAR/ laying about, "just" need to upload 'em
[08:07] is last.fm still dying?
[08:07] that is, really going down
[08:08] I'm in denial
[08:28] WAH? Last.fm dying?
[08:59] bsmith093: I'm also checking your website
[09:02] arkiver: I have completely forgotten ... what website?
[09:02] bsmith093: http://bofh.nikhef.nl/events/
[09:03] going really fast
[09:03] around 50-100 links per second
[09:03] biggest file yet is 44357368943 bytes
[09:03] arkiver: oh right, that... so what are you checking it with? I'd love to know how it's going that fast
[09:04] Xenu
[09:04] I'm using that one now for everything I do
[09:04] I'm first checking every website I'm going to download
[09:04] and then downloading the individual links
[09:05] that way archiving a website goes A LOT faster
[09:05] (did full warhammeronline.com in less than 45 minutes...)
[09:11] bsmith093: already discovered almost 100000 links
[09:13] arkiver: so how do you know how big it's going to get beforehand? with Xenu?
[09:13] I see it has already discovered 100000 urls
[09:13] and I can see the size of files and folders it has already crawled
[09:14] biggest file yet is around 40 GB
[09:14] http://bofh.nikhef.nl/events/overig/28c3-bonustracks/queergeekspanelhq.mov
[09:17] 110000
[09:19] 140000
[09:19] wow
[09:19] still rising...
[09:23] bsmith093: 160000 urls...
[09:39] bsmith093: almost done now
[09:52] arkiver: if it was an ftp site, I'd just dump it into FileZilla and check the queue size.
[09:52] bsmith093: ah, yeah
[09:52] bsmith093: still far from finished...
[09:52] discovered a lot more urls
[09:52] so much easier
[09:52] over 300000 urls now
[09:53] btw I can do the biggest part of the website
[09:53] what are the specs of the thing you're running this on?
[09:53] the computer I'm using now?
[09:54] Intel Core i5-4570
[09:54] NVIDIA GeForce GTX 760
[09:54] 16 GB RAM
[09:54] 128 GB SDRAM
[09:55] those are the most important numbers I think
[09:55] ummm, what's sdram?
[09:55] oh oh oops
[09:55] SSD I mean
[09:55] 128 GB SSD
[09:56] 16 GB DDR3 SDRAM (=RAM)
[09:56] ah, well that kicks the crap out of my Dell Vostro 1710 (2GB RAM, 320 GB HD) setup
[09:56] checking an average of 81 links per second
[09:56] ah yeah
[09:56] for this you do need a lot of ram
[09:56] I also got 17 TB of external space here
[09:58] in *what*, a rack mounted server cluster?!?!
[09:59] lol
[09:59] just 6 harddrives sitting next to each other
[09:59] :P
[10:44] storage is cheap nowadays
[12:05] still going...
[12:05] 800000 links now
[14:01] etsi.org/deliver/ done.
[14:01] Took 25 hours total
[14:01] https://web.archive.org/web/20131228131746/http://www.etsi.org/deliver/
[14:01] 94853 files
[14:02] 31-33 GB
[14:14] www.ftp-sites.org done
[14:14] took 4:30 minutes
[19:32] arkiver: I did https://archive.org/details/etsi_standards earlier this year
[19:32] * xmc nods
[19:35] xmc: great! now we have both... :)
[19:35] I think it is also important to have the website for those documents saved
[19:35] :)
[19:35] yeah
[21:58] bsmith093: took a little longer than expected but I think it is finished now...
[22:18] bsmith093: making sure it's finished...
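Xenu's Link Sleuth is a closed-source Windows GUI, but the workflow arkiver describes above (crawl the whole site and size every file first, download second) is easy to sketch. A minimal Python version, assuming the site serves plain Apache-style directory listings; the helper names, worker count, and error handling are illustrative, not anything Xenu actually exposes:

    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from concurrent.futures import ThreadPoolExecutor

    START = "http://bofh.nikhef.nl/events/"  # the site being sized up in the log

    class LinkParser(HTMLParser):
        """Collect href targets from a directory-listing page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    def head_size(url):
        """HEAD request: learn a file's size without downloading its body."""
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=30) as resp:
                return int(resp.headers.get("Content-Length") or 0)
        except OSError:
            return 0  # dead link: count as zero; a real tool would log it

    def discover(start):
        """Phase 1: walk directory pages, return the set of file URLs."""
        seen, files, queue = set(), set(), [start]
        while queue:
            url = queue.pop()
            if url in seen:
                continue
            seen.add(url)
            with urllib.request.urlopen(url, timeout=30) as resp:
                parser = LinkParser()
                parser.feed(resp.read().decode(errors="replace"))
            for href in parser.links:
                child = urljoin(url, href)
                if not child.startswith(start) or "?" in child:
                    continue  # stay on-site, skip the listing's sort links
                if child.endswith("/"):
                    queue.append(child)
                else:
                    files.add(child)
        return files

    def total_size(files, workers=20):
        """Size files concurrently (the 50-100 links/sec seen in the log)."""
        with ThreadPoolExecutor(workers) as pool:
            return sum(pool.map(head_size, files))

    if __name__ == "__main__":
        files = discover(START)
        print(f"{len(files)} files, {total_size(files) / 1024**3:.1f} GiB")
        # Phase 2 (the actual download) would then fetch each URL in `files`.

The point of the two phases is that the cheap HEAD pass gives a full URL list and total size up front (the 44357368943-byte file above is about 41.3 GiB), so the expensive download pass can be planned and parallelized instead of discovered as it goes.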
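For the FTP case bsmith093 mentions (dump the tree into FileZilla and read the queue size), the same total can be computed directly with the standard library's ftplib. A sketch assuming the server supports the MLSD command (older servers would need LIST parsing instead); the host name is a placeholder:

    from ftplib import FTP

    def ftp_tree_size(ftp, path="/"):
        """Recursively sum file sizes under path, like a FileZilla queue total."""
        total = 0
        for name, facts in ftp.mlsd(path):
            if name in (".", ".."):
                continue
            if facts.get("type") == "dir":
                total += ftp_tree_size(ftp, f"{path.rstrip('/')}/{name}")
            elif facts.get("type") == "file":
                total += int(facts.get("size", 0))
        return total

    ftp = FTP("ftp.example.org")  # placeholder host
    ftp.login()                   # anonymous login
    print(f"{ftp_tree_size(ftp) / 1024**3:.2f} GiB queued")
    ftp.quit()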
[22:27] this item should be moved to the cdbbsarchive collection: https://archive.org/details/cdrom-maximum-cd-2007-12
[22:27] for a min I thought it was one of my items
[22:28] since it's copying my way of uploading it
[22:29] bsmith093: starting to calculate how big the website is... and that is EXTREMELY BIG
[22:34] so I dd'd the 2007-12 Maximum PC CD
[22:34] and the md5sum is the same
[22:35] who owns the mithrandiragain@myopera.com email address?
[22:37] he is in archiveteam from what I can tell by his uploads
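The verification described above (dd the disc, compare md5sums against the uploaded item) can be done the same way in Python, streamed so a multi-GB image never has to fit in RAM; the file paths here are placeholders:

    import hashlib

    def md5_of(path, chunk=1 << 20):
        """Hash the file in 1 MiB chunks to keep memory use flat."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # placeholder paths: the local dd dump vs. the copy from archive.org
    local = md5_of("maximum-cd-2007-12.iso")
    fetched = md5_of("cdrom-maximum-cd-2007-12.iso")
    print("match" if local == fetched else "MISMATCH", local)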