[00:53] i'm finally finding old digital avenue files
[01:02] :o
[01:05] there like long ads
[01:05] 90 seconds to 3 mins
[01:11] i'm attempting to re-acquire the textfiles dump
[01:11] i have the extracted files on a spare drive but not the 7z
[01:12] also, the extant copy is compromised as my antivirus got carried away with the source code/etc. to MS-DOS era viruses stored in the dump
[01:12] (not to mention moving ~8gb of plaintext from one drive to another would NOT be fun)
[01:14] http://archive.org/details/textfiles-dot-com-2011 This guy
[01:19] on a side note, i love the wiki's page on the Library of Alexandria
[01:19] thankfully, one of the larger-scale implications of the internet's existance is that information destruction on that scale is now substantially more difficulty
[01:19] *difficult
[01:36] oh my god
[01:36] http://keygenjukebox.com/
[01:39] kennethre: http://keygenmusic.net/ ;)
[01:39] four times as much
[01:39] sorry, 3 times *
[01:40] no player though
[01:40] still quite cool
[01:40] well, that's what xmplay is for :)
[01:40] xmplay is magic
[01:40] I'm actually a bit sad that there's no xmplay for Linux
[01:40] I love it to bits
[01:40] (no pun intended)
[01:41] too bad there's no keygen radio
[01:41] like rainwave.cc but for keygens
[01:43] * joepie91 bookmarked rainwave
[01:44] hm
[01:44] considering setting up a keygen radio
[01:44] anyone here that would like to volunteer in encoding all of keygenmusic.net into mp3? :P)
[01:44] :) *
[01:47] if I could I would
[01:47] but I don't have the upstream
[01:47] to actually upload them afterwards
[01:55] www.2600.com is 13gb+
[02:33] remember this old fossil http://www.htdig.org/
[02:46] so i think 2600.com is to big for me
[02:46] its still growing at 14gb+
[02:47] is there an automated way to mirror a pipermail archive w/metadata ?
[02:56] Lord_Nigh, possibly, gimme a link so I can check
[03:02] http://bluegrasspals.com/pipermail/dectalk/
[03:04] I got a script that can grab all that but no warc data
[03:04] I used it for opensolaris
[03:21] so i think i'm going to stop this archive of 2600.com
[03:21] its too much data for me
[03:22] Yeah 2600 is huge, many, many years of data. I wouldn't be surprised if the site was over 100gb
[03:23] its mostly audio files that make it so big
[03:23] i stopped it and deleting it
[03:27] D:
[03:28] its just too big for me to do a site dump of
[03:29] here is the code for my script:
[03:29] website="www.2600.com"
[03:30] wget $website --mirror --warc-file=$website-$(date +%Y%m%d) --warc-cdx -e robots=off --warc-header="operator: Archive Team" --warc-max-size=1G -E -o wget.log
[03:47] I want an xmplay for android :(
[04:41] http://www.huffingtonpost.com/2013/04/15/condom-challenge-snorting-condoms-videos_n_3085258.html
[04:41] must archive all these videos
[04:44] ...
[12:20] http://seclists.org/fulldisclosure/2013/Apr/28
[12:20] http://www.mymodernmet.com/profiles/blogs/dragon-ball-z-makankosappo-kamehameha/
[13:19] (if ersi says something something illegal re. #archiveteam I will murder him)
[13:45] that would be illegal, so I wont. What if I get a DMCA
[13:46] you're murdered
[15:17] http://www.geocities.com/bswadener/humor/umac606.htm
[15:17] how the hell is that still up
[15:18] heh, site:geocities.com shows 193K results in google
[15:18] what the shit
[15:18] I guess Yahoo can't even DELETE stuff right
[15:19] should probably grab what's still up there huh
[15:20] probably... would need to compile a list though
[15:20] Google's a good start at least
[15:22] also calling the project name: geobitties
[15:25] anyway I need sleep before insanity sets in
[15:29] there are some sites still up, it seems to be accounts where they also had a domain name hosted through geocities, i.e. paying customers
[19:05] yes
[19:50] balrog, Got any other cool tools like plowshare
[23:17] balrog: There's a bunch of Geocities sites up still. They had paid hosting as well, if I'm not mistaken.
[23:17] I don't think those were killed. But most were of course not paid for
[23:34] here's a post about that: http://contemporary-home-computing.org/1tb/archives/3022
[23:34] i'm looking for a way to grab urls from google
[23:35] but without autoboting
[23:35] *blocking
[23:44] i'm mirroring fucking google
[23:45] i'm already at the 500+ links
[23:56] Does anyone actually use NutchWAX? It doesn't have warc support