#archiveteam 2014-01-14,Tue


Time Nickname Message
00:46 DFJustin http://arts.nationalpost.com/2014/01/10/the-book-closes-on-a-golden-age/
03:42 SketchCow FINISHED --2014-01-13 20:42:32--
03:42 SketchCow Total wall clock time: 6d 4h 42m 3s
03:42 SketchCow Downloaded: 70061 files, 269G in 5d 11h 20m 3s (596 KB/s)
03:42 SketchCow ha ha, these never stop being hilarious.
04:05 kyan_lapt DFJustin, "I’m happy to see a lot of [these] books go to the landfill" that guy says… that attitude should rot in hell
04:05 kyan_lapt … umm to be blunt
04:05 kyan_lapt I hope he gives the books to IA :)
04:06 kyan_lapt would be a better home for them
04:06 kyan_lapt than someone who thinks like that
04:06 kyan_lapt </rant>
05:21 SketchCow A rant? In #archiveteam? Well I never
07:47 joepie91 kyan__: jesus that guy is bitter
08:59 Nemo_bis Look what they made me do, Steiner... https://archive.org/search.php?query=creator%3A%22Rudolf+Steiner%22
09:04 Nemo_bis "Instead, he’s been driving into the city three days a week to manage the store": three days a week opening? I doubt that's an effective way to do commerce
12:07 blit I have a personal interest in erotic literature and the sites that collect them. I am looking to expand my collection and this neatly dovetails with archiveteam's interest in these sites. Being a programmer puts me in the position of being able to do something: specifically I want to make an up to date archive of asstr.org.
12:08 blit I've attempted this earlier in the year using wget on the http version of the site, but the results I got were only partial and it was a bit of a clusterfuck.
12:09 blit I'm wondering if any y'all got advice on how I should go about this. For reference my primary os is ubuntu raring 13.04 and I'm a ruby programmer, but I'm open to anything that will work better than wget
12:11 blit of course if I can get this to happen successfully, I'd be happy to put up a torrent - though alas my regular upstream is only 100KB/s
12:13 blit also: I'm highly interested in doing the same for literotica.com and storiesonline.net
12:22 dashcloud so, if you want stuff to be in the Internet Archive, you'll need to make WARCs of it- wget-warc is probably the easiest one: http://www.archiveteam.org/index.php?title=Wget , but you should also take a look at http://www.archiveteam.org/index.php?title=ArchiveBot which greatly simplifies things once it is running
12:23 blit dashcloud, thanks, having a look at those links now
12:25 blit ok, so looks like wget may be the way to go after all... well however I decide to do it I'll make it output warcs to make it easier for y'all
12:26 blit looking at archivebot, it seems like it's for smaller sites? my (limited) dump that I've already done is 400,000 files+
12:30 blit eugh that's right, they hide all the www folders if you access it via ftp
12:32 blit hmm, as long as I take a dump of the ftp site _and_ the web site then it should be complete
12:45 blit I might sleep on this and try to attack the problem tomorrow night. But this is something I'm really keen on, so expect to see me back here ;)
15:15 balrog for ftp sites I've had best results with lftp
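[Editor's sketch, not part of the log] The wget-to-WARC approach discussed above could look roughly like the following. This is a minimal sketch, assuming a wget build with WARC support is installed; the helper name build_wget_warc_command, the start URL and the WARC base name are all hypothetical placeholders, and the flag selection is one plausible choice rather than what anyone in the channel actually ran.

```python
import shlex


def build_wget_warc_command(start_url, warc_name, wait_seconds=1):
    """Build an argv list for a recursive wget crawl that also writes a WARC.

    Hypothetical helper for illustration only. It just assembles the
    command; it does not run wget.
    """
    return [
        "wget",
        "--mirror",                    # recursive crawl with timestamping
        "--page-requisites",           # also fetch images, CSS, and so on
        f"--wait={wait_seconds}",      # pause between requests
        f"--warc-file={warc_name}",    # base name of the WARC output file
        "--warc-cdx",                  # also write a CDX index of the records
        start_url,
    ]


# Placeholder URL and file name, not taken from the log.
cmd = build_wget_warc_command("http://example.org/", "example-site")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd) once the URL and options have been reviewed; printing it first makes it easy to check the command before starting a crawl that may run for days.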
16:58 SketchCow Downloaded: 394892 files, 37G in 2d 22h 18m 51s (154 KB/s)
16:58 SketchCow FINISHED --2014-01-14 12:03:47--
16:58 SketchCow Total wall clock time: 7d 4h 56m 16s
16:58 SketchCow Thank youuuuuuuuuuu slow connection.
17:07 xmc you're welcome!
17:18 SketchCow It's nvg.org.
17:18 SketchCow It's a really nice collection of other FTP sites, as per my blog entry.
17:36 sep332 http://www.operationwardiary.org/ - transcribe WWI British regimental diaries
17:43 SketchCow so, I'm probably going to split off the CD-ROM cover disks into a separate sub-collection of cdbbsarchive, or maybe up into a under-software one
17:46 Nemo_bis there are many indeed, and a lot of effort worth some visibility
17:54 DFJustin personally I would have it as a sub-collection of cdbbsarchive, and move pc_cdrom to under software
17:55 joepie91 SketchCow: great news! I have a working DVD drive hooked up now, so I can continue making images of a bunch of shareware-ish discs I have laying around, probably
17:55 joepie91 SketchCow: relatedly, I think the few discs that I already uploaded aren't in that collection yet
17:55 SketchCow Oh good.
17:56 joepie91 shall I dig up the URLs?
17:56 joepie91 actually, that was easier than I expected
17:57 joepie91 https://archive.org/details/InteraktiefSpellenDeel1, https://archive.org/details/ComputerEasyMagazineDiscFebruary2003, https://archive.org/details/CompuKidsCDRom2001
17:57 joepie91 interaktiefspellen is a little broken, but I managed to mend most of the damage
17:59 joepie91 (as noted in the description, also)
17:59 joepie91 turns out that whiteboard marker and white stickers are actually a really good way to make several damaged discs mostly readable again when there's foil damage
17:59 joepie91 severely *
18:28 asie hello
18:46 SketchCow https://archive.org/details/coverdiscs
20:57 godane SketchCow: just know i have more coverdisks on cdbbsarchive
20:58 godane ok looks like game.exe is all in coverdisk collection
20:59 godane also if you're putting tons of my stuff into coverdisk can i have access
20:59 godane cause if it's not going to be in cdbbsarchive then i will not have access to it
21:07 SketchCow You have it
21:14 godane thanks
21:16 godane SketchCow: right now i'm grabbing cnn all access podcasts
21:17 godane that podcast has ended in 2008 so its going up first
21:19 godane also based on the xml data for cnn videos i think are mostly under 20mb
21:20 godane also i was able to push it up to 640x360 video res
21:22 godane i'm also grabbing stuff like student news podcast
22:44 dashcloud relevant to SketchCow 's latest blog post about FTP sites: https://www.piratepad.ca/p/old-ftp-list a list of noteworthy FTP sites from 1996 (from the book Internet Games Directory, also available on IA)
22:49 SketchCow I'm happy to use this as a hitlist.
23:16 SketchCow Is Alex Handy around
23:16 ersi That'd be.. handy.
23:16 * ersi lets himself out
23:28 dashcloud probably all the URLs in that book (which was way ahead of its time and actually included an ebook made up of HTML pages) would probably be good archive candidates
23:49 xmc SketchCow: could you please move http://archive.org/details/2013.09.bbs.bajer.cz into ftpsites ? thx
23:51 SketchCow You got it.
23:51 SketchCow Want access?
23:51 xmc that'd be great
23:52 SketchCow Done
23:52 xmc rad
23:52 xmc now uploading ftp.redcom.ru, 20G
23:53 SketchCow Let's blow people away
23:54 xmc wooooo
23:55 xmc also I have ftp.oracle.com ftp.funet.fi ftp.3gpp.org
23:55 SketchCow I decided, if it isn't obvious, we're just going to download all FTP sites.
23:55 xmc sounds good
23:55 SketchCow Fuck it, let's dupe it
23:55 SketchCow We'll move to per-ip scanning soon
23:55 SketchCow We're like that.
23:55 xmc why not
23:56 SketchCow 5.8G vim
23:56 SketchCow root@teamarchive0:/1/FTPSITE/ftp.icm.edu.pl/pub# du -sh vim
23:56 SketchCow vim.
23:56 SketchCow VIM.
23:56 SketchCow 6gb.
23:57 xmc dag
23:57 SketchCow That shit cray
23:57 SketchCow I'm sure the rest are equivalently insane.
23:59 xmc hm, where the fuck did my grab of ftp.qwest.net go
23:59 xmc qwest recently-bought-by-centurylink
23:59 xmc or did I not get that in time
23:59 xmc :|
