[03:22] just saw the archiveteam defcon talk, and it was awesome- is that the best one to show to people who wonder what it is and what they do?
[03:31] It's a good one.
[03:34] so is the soy sauce available for order yet?
[03:40] It will take years.
[03:53] http://www.yagisawa-s.co.jp/ is the company.
[04:08] SketchCow: this seems like a silly thing to ask, but are you interested in the bits of stage6.com I managed to download a few years back?
[04:11] some stats at http://wegetsignal.org/stage6.php
[04:14] Yes
[04:19] i only managed to get metadata for 4.24% of all of the videos that were up there, and the actual videos for only 1.17% :(
[04:24] Understood.
[04:24] How big?
[04:24] I mean the final
[04:25] 304G for the videos and less than 1G for the metadata (which currently lives in a mysql db)
[04:25] Oh, that's nothin'
[04:33] I don't remember if the site had comments on the video pages, but if so, I don't have that. the metadata I have is id, original url, title, upload date, user, description, and tags. (which at the time was all I was concerned with)
[04:36] did you guys backup starwars forums?
[04:46] looks like the database dump is only 7.2M with extended inserts and 12M without extended inserts
[04:51] it looks like stage6 was hacked a few months before the shutdown
[04:52] At approximately 16:00 GMT on February 9, 2008, Stage6 was hacked. People that visited the front page of the website were redirected to multiple shock sites.
[04:52] Several thousand user accounts, that were used to upload videos between December 7, 2007 and February 10, 2008, are thought to have been compromised by the attack.
[04:53] i thought user accounts were hacked on December 7, 2007
[04:53] :P
[04:54] there was also only 3 days after it was announced that it would shut down that it was
[04:59] vreel was another site that died too
[04:59] vreel was set to replace stage6 it looks like
[05:01] i was downloading videos for at least 5 days since the announcement
[05:02] and I had to alter my script a few times because they changed some things on the back end to make it more difficult for you to download the videos
[05:02] ok
[05:03] so the videos could still be downloaded even though the site may have been shut down
[05:04] well, at the end, you had to get an access token from the video page in order to download the video. (before that, if you knew the video ID number you wanted to download, you could just get the video file directly)
[05:05] i just hope youtube never goes down
[05:06] it will, be it in 5 years or 50, it will.
[05:07] do we start now?
[05:07] maybe a good idea
[05:07] since there is so much data
[05:07] we could since the content that is already there isn't really going to change
[05:08] start with the older videos and slowly go up
[05:08] well youtube isn't incremental
[05:08] so you'll need to brute-force the possible values
[05:08] that's what i was thinking
[05:08] and scrape them, if it 404's go back to it in a few years and try again
[05:09] also if we find a way to not download 10 versions of the same video from 10 users that would also be good
[05:09] i wonder if they reuse ids of videos that were removed, after some cooling-off period
[05:09] and we don't really need to download every quality version
[05:10] i know
[05:10] but the highest version on youtube would be nice
[05:11] so it can down encode in years to come
[05:11] and when higher quality options become available again?
[05:11] i had problems after going for the giant robot project in one day :/ not sure if it was my end or if they were blocking attempts from youtube-dl sequentially hitting the playlist
[05:12] at least we are downloading youtube and not youporn
[05:12] i'm sure someone out there has that covered
[05:13] yes
[05:57] glad to see that phantomjs was already in the tools section, if anyone can school me in that that would be neat
[06:54] ok, last night tar ran out of memory again. I just added even more swap (16G of swap and 4G of ram) and we'll see how it goes
[06:54] tar is currently up to 9490M virtual
[06:56] and back down to 5444M
[07:05] 3 more days then my bandwidth refreshes
[07:13] up to 6690M again, and finally outputting data
[07:13] heh
[07:13] Coderjoe: what are you tarring?
[07:14] i was curious about how well all of the friendster data I have compresses when in one big tarball. so i untarred them all and am doing that now, with a filelist to pull in files in a certain order, which should compress the best
[07:15] heh
[07:15] that filelist is 4735134182 bytes
[07:15] sheesh
[07:16] the uncompressed tarball is expected to be 1452086937600 bytes
[07:18] mmm
[07:18] I have no idea how big mine is uncompressed
[07:20] hrm
[07:20] my center channel is very very quiet
[07:20] is it hunting rabbit?
[07:20] not sure
[07:20] I can't hear what it's saying
[07:21] loose cable
[07:44] SketchCow: found a video of you talking about textfiles.com back in 2000
[07:44] at defcon 8
[07:44] backing it up
[08:19] wget is now 2.5GB memory
[08:59] hrm...
[08:59] I think friendster.000023000-000240000.tar.gz is misnamed...
[08:59] 000 023 000 - 000 240 000
[09:03] btw, I'm running a program I wrote to run through the deflate stream and break in the debugger when it hits one of a few invalid conditions. Then I'll have the offset to the start of the block, and be able to see what the file looks like there
[11:36] do you guys take old fan fics?
[11:37] i only ask cause i got a old dragon ball z fan fic from 2002
[11:51] godane: paper or electronic?
[11:54] electronic
[11:54] you should put it on archive.org
[12:00] godane: archiveteam is not a donation bin. we're volunteer firefighters. archive.org is the donation bin
[12:01] ok
[12:01] i think all of it is on archive.org
[13:54] Hehe.. my instructables.com wget thread is up to 3GB memory use and it's downloaded 13.5GB of data
[13:56] ersi, I started getting instructables, ended up with ~40gb I think, before I had a power outage and never resumed it
[13:56] Jebus
[13:57] I think my machine will kill itself before getting that much data.. seeing how it's slurping up 3GB memory right now
[14:57] you know you didn't archive something right when during extraction, 8157 errors are returned
[15:08] errors?
[15:41] ersi: "cannot create directory"
[15:45] >_>
[15:45] bet it's inode related or something?
[16:30] ersi: windows related.
[16:30] "HERP DERP IT R HEER NOT THAR"
[16:31] You know how windows is like at times, right?
[17:32] lol
[17:33] well, enjoy your fail mr windows guy
[18:27] I was scanning old photos, and found this cigar box, mostly old photos of family, at the bottom ? A picture of my uncles and grandpa in kkk uniforms, 8000 dollars, and 2 oz's of weed.
[18:28] you should make that a comic and become famous on reddit
[18:29] This is very good idea
[18:29] isn't there a site to make rage comics ?
[18:30] CETUSA.org
[18:30] Could someone please heritrix/wget that thing.
[18:31] Diversify Your Staff, Hire An Asian Women To Pose On Your Website
[18:54] SketchCow: on it.
[19:09] SketchCow: looks like it's down. all the paths for the css and stuff are given as absolute paths, do you mind that?
[19:09] i dont remember who was archiving twit.tv, but here it is the wiki dump http://code.google.com/p/wikiteam/downloads/list
[19:09] /foo.css
[19:09] so it won't work if you host it under some subdir.
[19:11] never mind...
[19:11] ace http://buttcoin.org/bitcointalk-forums-hacked-bill-cosby-pimping-new-cosbycoins�-to-all-the-members-breaking
[19:13] oh man, that's pretty funny
[19:13] good old-school hack
[19:16] lul
[19:19] jesus its filenames are annoying.
[19:24] Could somebody try and throw heritrix after cetusa.org? wget does a pretty bad job at making sense of their pretty bad rewriting rules
[19:36] <- giving up on cetusa.org until it's been determined if heritrix could mirror it more painlessly
[22:15] SketchCow: it looks like defcon didn't get the videos and audio right until defcon 13
[22:16] watch defcon 8 and 12 videos of you
[22:17] that shit is haaaaard
[22:17] *whine
[22:25] certainly
[22:44] didn't really get it right this year either
[22:44] the audio should have just been straight from the mic he was wearing, rather than mixed from both it and the mic at the podium
[22:49] yeah, it was like that in the auditorium too
[23:03] That got on my nerves, and I was running that room half the time. Couldn't get the audio guys to choose one.
[23:22] :(
[23:22] goekesmi: you were in charge of audio in that room this year?
[23:24] Nope. I'm a stage manager for defcon. I can vaguely influence the A/V side of the operation.
[23:26] ah neat