[00:46] http://arts.nationalpost.com/2014/01/10/the-book-closes-on-a-golden-age/
[03:42] FINISHED --2014-01-13 20:42:32--
[03:42] Total wall clock time: 6d 4h 42m 3s
[03:42] Downloaded: 70061 files, 269G in 5d 11h 20m 3s (596 KB/s)
[03:42] ha ha, these never stop being hilarious.
[04:05] DFJustin, "I’m happy to see a lot of [these] books go to the landfill" that guy says… that attitude should rot in hell
[04:05] … umm to be blunt
[04:05] I hope he gives the books to IA :)
[04:06] would be a better home for them
[04:06] than someone who thinks like that
[04:06]
[05:21] A rant? In #archiveteam? Well I never
[07:47] kyan__: jesus that guy is bitter
[08:59] Look what they made me do, Steiner... https://archive.org/search.php?query=creator%3A%22Rudolf+Steiner%22
[09:04] "Instead, he’s been driving into the city three days a week to manage the store": three days a week opening? I doubt that's an effective way to do commerce
[12:07] I have a personal interest in erotic literature and the sites that collect them. I am looking to expand my collection and this neatly dovetails with archiveteam's interest in these sites. Being a programmer puts me in the position of being able to do something: specifically I want to make an up to date archive of asstr.org.
[12:08] I've attempted this earlier in the year using wget on the http version of the site, but the results I got were only partial and it was a bit of a clusterfuck.
[12:09] I'm wondering if any y'all got advice on how I should go about this.
For reference my primary os is ubuntu raring 13.04 and I'm a ruby programmer, but I'm open to anything that will work better than wget
[12:11] of course if I can get this to happen successfully, I'd be happy to put up a torrent - though alas my regular upstream is only 100KB/s
[12:13] also: I'm highly interested in doing the same for literotica.com and storiesonline.net
[12:22] so, if you want stuff to be in the Internet Archive, you'll need to make WARCs of it- wget-warc is probably the easiest one: http://www.archiveteam.org/index.php?title=Wget , but you should also take a look at http://www.archiveteam.org/index.php?title=ArchiveBot which greatly simplifies things once it is running
[12:23] dashcloud, thanks, having a look at those links now
[12:25] ok, so looks like wget may be the way to go after all... well however I decide to do it I'll make it output warcs to make it easier for y'all
[12:26] looking at archivebot, it seems like it's for smaller sites? my (limited) dump that I've already done is 400,000 files+
[12:30] eugh that's right, they hide all the www folders if you access it via ftp
[12:32] hmm, as long as I take a dump of the ftp site _and_ the web site then it should be complete
[12:45] I might sleep on this and try to attack the problem tomorrow night. But this is something I'm really keen on, so expect to see me back here ;)
[15:15] for ftp sites I've had best results with lftp
[16:58] Downloaded: 394892 files, 37G in 2d 22h 18m 51s (154 KB/s)
[16:58] FINISHED --2014-01-14 12:03:47--
[16:58] Total wall clock time: 7d 4h 56m 16s
[16:58] Thank youuuuuuuuuuu slow connection.
[17:07] you're welcome!
[17:18] It's nvg.org.
[17:18] It's a really nice collection of other FTP sites, as per my blog entry.
[17:36] http://www.operationwardiary.org/ - transcribe WWI British regimental diaries
[17:43] so, I'm probably going to split off the CD-ROM cover disks into a separate sub-collection of cdbbsarchive, or maybe up into a under-software one
[17:46] there are many indeed, and a lot of effort worth some visibility
[17:54] personally I would have it as a sub-collection of cdbbsarchive, and move pc_cdrom to under software
[17:55] SketchCow: great news! I have a working DVD drive hooked up now, so I can continue making images of a bunch of shareware-ish discs I have laying around, probably
[17:55] SketchCow: relatedly, I think the few discs that I already uploaded aren't in that collection yet
[17:55] Oh good.
[17:56] shall I dig up the URLs?
[17:56] actually, that was easier than I expected
[17:57] https://archive.org/details/InteraktiefSpellenDeel1, https://archive.org/details/ComputerEasyMagazineDiscFebruary2003, https://archive.org/details/CompuKidsCDRom2001
[17:57] interaktiefspellen is a little broken, but I managed to mend most of the damage
[17:59] (as noted in the description, also)
[17:59] turns out that whiteboard marker and white stickers are actually a really good way to make several damaged discs mostly readable again when there's foil damage
[17:59] severely *
[18:28] hello
[18:46] https://archive.org/details/coverdiscs
[20:57] SketchCow: just know i have more coverdisks on cdbbsarchive
[20:58] ok looks like game.exe is all in coverdisk collection
[20:59] also if your putting tons of my stuff in to coverdisk can i have access
[20:59] cause if its not going to be in cdbbsarchive then i will not have access to it
[21:07] You have it
[21:14] thanks
[21:16] SketchCow: right now i'm grabbing cnn all access podcasts
[21:17] that podcast has ended in 2008 so its going up first
[21:19] also based on the xml data for cnn videos i think are mostly under 20mb
[21:20] also i was able to push it up to 640x360 video res
[21:22] i'm also grabbing stuff like student news
podcast
[22:44] relevant to SketchCow's latest blog post about FTP sites: https://www.piratepad.ca/p/old-ftp-list a list of noteworthy FTP sites from 1996 (from the book Internet Games Directory, also available on IA)
[22:49] I'm happy to use this as a hitlist.
[23:16] Is Alex Handy around
[23:16] That'd be.. handy.
[23:16] * ersi lets himself out
[23:28] probably all the URLs in that book (which was way ahead of its time and actually included an ebook made up of HTML pages) would probably be good archive candidates
[23:49] SketchCow: could you please move http://archive.org/details/2013.09.bbs.bajer.cz into ftpsites ? thx
[23:51] You got it.
[23:51] Want access?
[23:51] that'd be great
[23:52] Done
[23:52] rad
[23:52] now uploading ftp.redcom.ru, 20G
[23:53] Let's blow people away
[23:54] wooooo
[23:55] also I have ftp.oracle.com ftp.funet.fi ftp.3gpp.org
[23:55] I decided, if it isn't obvious, we're just going to download all FTP sites.
[23:55] sounds good
[23:55] Fuck it, let's dupe it
[23:55] We'll move to per-ip scanning soon
[23:55] We're like that.
[23:55] why not
[23:56] 5.8G vim
[23:56] root@teamarchive0:/1/FTPSITE/ftp.icm.edu.pl/pub# du -sh vim
[23:56] vim.
[23:56] VIM.
[23:56] 6gb.
[23:57] dag
[23:57] That shit cray
[23:57] I'm sure the rest are equivalently insane.
[23:59] hm, where the fuck did my grab of ftp.qwest.net go
[23:59] qwest recently-bought-by-centurylink
[23:59] or did I not get that in time
[23:59] :|