[00:02] I shall resume seeding the geocities torrent
[00:04] SketchCow, if RIAUG was only doing PD stuff, you won't need nibble copier
[00:06] I know, right
[00:06] But still, happy to help
[00:06] Wait, they link the broken torrent
[00:11] :\
[00:17] im downloading 641gb for nothing? :\
[00:17] whats the correct one
[00:18] The correct one overlays it
[00:18] it has "fixed" in the name
[00:18] otherwise same title
[00:21] http://thepiratebay.org/torrent/6350414/Geocities_-_The_PATCHED_Torrent should've been it
[00:21] wait, the patched one 404s
[00:29] http://thepiratebay.se/torrent/6353395/Geocities_-_The_PATCHED_Torrent
[00:31] Feedback needed.
[00:31] Adding links on main TOSEC page to ready-for-love pages.
[00:31] http://archive.org/details/tosec
[00:34] I think "latest version" is likely redundant in that table. What other information might be helpful there?
[00:46] Decision made. It'll be a link to the item page, and then the collection.
[01:37] boom, just hit 400000 repos today
[01:39] does google reader have a channel yet?
[01:41] i just found this: download full text of articles from GR, 1000 at a time
[01:41] https://productforums.google.com/forum/#!msg/reader/fMLNWm-sHDg/iaEVjOWdcaUJ
[01:42] http://www.flickr.com/photos/37796451@N00/sets/72157624564641788/with/4821129260/
[03:04] does anyone have software for backing up https://bitcointalk.org/?
[03:32] those are some really lovely photographs SketchCow, thanks
[03:41] looks like underground gamer has all issues of secret service magazine
[03:53] i posted on g4 forums
[03:54] telling them its being archived
[04:06] Whoever has access to delete pages on the wiki: I just cleared a bunch of spam pages, should be safe to delete them
[04:40] Great
[07:04] So, the Punchfork guy ignored my tweets back and left it at that.
[07:04] So current signs point to "fuck that guy"
[07:04] Upload begins tonight. 2tb
[07:04] * BlueMax salutes
[07:38] Say I were to get a news program to do a story on us..
[07:38] How many of you would run a knife through my heart?
[07:43] No one, unless you're bluemax
[07:46] "So we're archiving all these websites and it's illegal"
[07:55] well you don't come out and say that outright, you fool
[07:56] if they ask you might want to say something like "is it illegal for a librarian to make a photocopy?"
[07:56] but there's nothing to gain by bringing it up yourself
[07:57] (I was imitating BlueMax)
[07:57] I'm sure you know this
[07:57] right
[08:04] * ersi chuckles
[08:12] omf_: blog.thephoenix.com is done: https://www.dropbox.com/sh/npw4q5v57jylvzm/NKPOxQnxz1
[08:17] I can give you an ftp for that instead.
[08:18] oh, that'd be good
[08:24] He's going to FTP it and I'll throw it into the system.
[08:25] ia is voracious as a woodchipper
[08:26] sometimes you gotta change the truck or add some more fuel, but for the most part om nom nom wngngggggffff wngggffffff
[08:27] lol
[08:27] Guys, my suggestion about the THQ forums, I think it got missed
[08:27] I don't know how to grab it due to the fact it has the weird login page with the age request stuff.
[08:28] I'd kind of hope we can get the user generated photos from the games too - saints row 3 at least does photo uploads, I don't know about other games.
[08:33] SketchCow: upload complete.
[08:34] That small, huh?
[08:34] Poor thephoenix
[08:34] Its 2GB unpacked
[08:35] 32,000 files, I believe
[08:35] lots of icky asps files
[08:35] *aspx
[08:49] now we just need to find a way to grab the rest of thephoenix without it ending prematurely
[08:50] if you can provide a script, I'll rip it
[08:52] getting a lot of 504s on posterous, btw
[09:25] http://archive.org/details/archiveteam_blog_thephoenix
[09:27] is there a page or a room dedicated to the phoenix project?
[09:27] no in regards to dedicated room
[09:27] no(t yet)
[09:28] okay I will start it up join #theashes
[09:28] sorry, I have not helped with an archive team project in a while
[09:42] sep332: good find
[09:46] Just noticed a comment on the Geocities torrent: "Good download, done in 2 mins..."
[10:06] lol
[11:11] Korea Blogs uploading. 476gb or something.
[11:11] :O
[11:14] http://archive.org/details/yahoo_korea_blogs
[11:42] doing another download of the off topic forum
[11:42] cause i'm crazy
[11:42] just the first 3 pages anyways
[11:42] trying to make sure i have all the posts of that at least
[11:52] I hacked together a wget lua script based off the vbulletin.lua test script and am running it across every board at the moment, I'll upload what I get too (script is untested so who knows what I'll get ¯\_(ツ)_/¯)
[11:53] although I might skip off topic due to its size and the fact you are doing it
[12:05] Cameron_D: i did most of the forums
[12:06] yeah, but hey, an extra copy can't hurt :P
[12:06] images maybe a problem but i saved about 12gb of external images
[12:06] agree
[12:11] http://archive.org/details/yahoo_korea_blogs_20130318044714
[12:13] "Since we’re talking about records we should also mention the largest and smallest torrents on The Pirate Bay. The largest active torrent is an archive of the late Geocities.com, that was shut down by Yahoo in 2010. The 641.32 GB torrent is currently battling for its survival with just one seeder."
[12:13] That the IA seeder then?
[12:13] -- https://torrentfreak.com/the-pirate-bays-oldest-torrent-is-revolution-os-130317/
[12:25] yeah I mentioned that earlier
[12:29] BlueMax: ¬_
[12:30] * BlueMax throws SmileyG out the window.
[12:48] so i'm like doing 2 downloads at once right now
[12:48] trying to get all attack of the show and x-play clips in hd
[14:55] omf_: IA already has all of the spritzer tweets from Oct 2011.
[15:07] swebb, is that the twitter stream grab?
[15:07] yup
[15:08] I got so many projects to run over with that data
[15:09] So the Twitter TOS says that you can collect the stream yourself and use it yourself, but you can't redistribute the tweets to anyone.
[15:10] It is a stupid TOS if you ask me, but that's what they're going with.
[15:10] I know
[15:10] They are dumb about it. Every restriction they put on twitter lowers its value
[15:10] The IA is receiving the stream as well but they can't distribute the data at this time either. I believe that they're waiting for twitter to change their TOS or go out of business before releasing the data to the public.
[15:11] IA gets the full thing like the Library of Congress?
[15:11] no, there was no official agreement between Twitter and IA, so IA only has the free stream, not the full firehose like the LOC has.
[15:12] The LOC has everything back to the first tweet.
[15:12] I sorta doubt the LOC has private accounts, heh
[15:12] Only big-time research-y people have access to the LOC archive, I hear.
[15:12] It's hard for them to index all of the data due to its size.
[15:13] so searching the archive is actually quite a tough task.
[15:13] well, it's not like you can easily search wayback either
[15:13] The twitter firehose currently generates 2TB of data/day.
[15:13] All the press says no one has access because the LOC cannot set up infrastructure to do it. A few professors I went to school with had their requests for access denied
[15:13] It wasn't so large earlier, but it's a large amount of data now.
[15:14] They've given access to big-time researchers.
[15:14] how
[15:14] Government, and newspapers, I hear.
[15:14] copies on hard drives
[15:14] If they have a date range, they can get the data, but searching for a term or a user's tweets is not as easily done. The archive is split up based on date.
[15:15] please someone back up g4tv twitter: https://twitter.com/g4tv
[15:15] I should know, I used to work for Gnip and I was the one in charge of transferring the legacy twitter corpus to the LOC from Gnip.
[15:15] godane, I am on it
[15:15] thanks
[15:16] swebb, what is it like working for a company that mainly deals in data
[15:16] worked*
[15:16] They worked out of the Amazon cloud so they dumped everything into S3 at the time.
[15:16] Smart move
[15:17] is it just sequential data?
[15:17] The hard part of working there was the people in the financial industry on the east-coast who used twitter as a signal to buy/sell stocks. They needed the data there instantly and Gnip used a tcp/http based delivery mechanism which introduced latency when sending the stream across the US from West coast to the East coast. The financial guys said that a few seconds of latency was too slow.
[15:18] and then a user's profile is the stamp for that tweet?
[15:18] All tweets are archived similarly to how I'm archiving them. They're stored in a new file every minute and that minute's worth of tweets is bzip2'ed and archived after being processed.
[15:19] Every tweet has a tweet ID.
[15:19] It's a sequential ID and the spritzer stream has 1-2% of the full firehose (a mod-100-like filter on the tweet ID)
[15:20] who can access the spritzer?
[15:20] who has full firehose access?
[15:20] Twitter has another annoying thing that made archiving the twitter stream a dog. Anyone archiving the twitter stream had to honor 'deletes' from the users.
[15:20] so even the LoC does that?
[15:21] :/
[15:21] anyone can access the spritzer stream.
[15:21] curl -s http://stream.twitter.com/1.1/statuses/sample.json -uusername:password
[15:22] Companies with access to the full stream are usually financial companies (stock market), or social media monitoring companies.
[15:22] Advertising companies
[15:22] the LOC
[15:22] and a few others.
[15:22] Adobe
[15:22] i found more episodes of the screen savers
[15:22] wut, that's weird (Re Adobe)
[15:22] what was left of it since end of 2004 into 2005
[15:23] Wall St Journal
[15:23] CNN
[15:23] these rips will sadly not be the original asf files
[15:23] godane: hmm why?
[15:24] they're converted by g4rewind
[15:24] since they're no longer on g4?
[15:24] one guy converts stuff and always makes it very big
[15:24] these are full episodes
[15:25] there are no good full episodes even if they're on the website
[15:25] i save the full episodes dialup versions of the screen savers from like 2000
[15:51] are you sure there isn't steganographic data hidden in the files? O_o
[15:52] you could be archiving someone's pr0n stash in there lol
[16:39] nothing wrong with that
[16:41] ?????
[16:48] !!!!!
[16:56] is anyone here?
[16:59] Yes, why would you ask that on IRC? Especially with 100+ nicks in here?
[17:09] Intensive: no, no one is wrong
[17:09] ersi: I'm not sure what you are talking about. No one is talking. :)
[17:17] i m new on this chat room
[18:08] Yahoo! Messages Tracker URL please
[18:13] There's no tracker. There's #BurnTheMessenger as project channel
[18:54] https://twitter.com/kwiens/status/313721129346953217
[18:55] fuckin government getting all up in my shit
[18:56] o_I
[19:11] I am downloading the internet census 2012.
[19:11] 563gb
[19:11] I love that at 7mb/sec, it'll still take a day
[19:11] YEeaaaahahahahhhhhhhhhh
[19:26] god lord
[19:26] good rather
[20:04] Doing an interview with HuffPo about "fuck, a lot of sites are going down"
[20:11] sounds like fun
[20:50] the second dot com boom?
[20:50] myspace next is mah bet.
[20:54] i wish myconfinedspace.com would go out of "business" so i could poach the domain name :P
[20:54] and put up photographs of "CONFINED SPACE - ENTRY PERMIT AND PROCEDURES REQUIRED" signs from various factories and such
[21:06] Huffpo almost done
[21:48] 498,725,624,046 KB
[21:48] That's how big we currently are.
[21:54] jeepers
[22:02] http://archive.org/details/archiveteam_punchfork
[22:02] My Little Punchfork.
[22:02] To Jeff Miller, CEO of Punchfork: Come at me, bro
[22:06] so, he got back to you again?
[22:06] So, nobody came forward with 2TB of space for punchfork
[22:06] So it's going into the archive.
[22:06] It's acceptable, we'll scramble and spirit away the material if needed
[22:06] But for now, I'll just get it in there.
[22:07] No, he never got back
[22:07] But HuffPo is going to contact him
[22:07] ha ha ha ha
[22:08] So while I wouldn't call Punchfork "disposable", I'd not be sad if it got deleted in some sort of threat matrix
[22:08] But I think it's time to see if I can call one of these fuckers out
[22:20] What is this? http://www.youtube.com/watch?v=_hjPte3BLVc
[22:23] i think the one and only comment says it all really.
[22:24] lol
[22:24] woop woop woop off-topic siren
[22:25] Gets klined from freenode, starts spamming other networks
[22:25] I'm trying to run the Posterous warrior, but it looks like it's getting blocked (as mentioned in the note)
[22:25] What is this? http://www.youtube.com/watch?v=_hjPte3BLVc
[22:33] this is certainly fascinating: http://seclists.org/fulldisclosure/2013/Mar/166 (they scanned the entire IPv4 block using embedded devices with default telnet passwords)
[22:34] and released the entire archive of results to the public
[22:43] Hello. There's a yahoo-messages project on the warrior (still in beta). If a few people could run that to test it... #BurnTheMessenger
[22:55] just joined yahoo messages
[22:58] Yahoo Messagers: I forgot one small but important thing. That's added now, so please update.
[22:58] Sorry. :)
[22:58] how do I update? just stop/restart the project?
[22:59] Yes.
[23:01] OK, hope that worked :)
[23:01] And with that, I'm gone for now. Bye.
[23:08] I'm downloading the IPv4 block scan as we speak.
[23:08] 89888.9 / 583096.0 MB Rate: 18038.8 / 2656.5 KB Uploaded: 145369.0 MB [15%] 2d 4:48 [ R: 1.62]
[23:08] InternetCensus2012
[23:08] Coming along.
[23:10] can I run multiple warriors with the same nickname?
[23:12] i'm getting a lot of this: Rate limited. Waiting for 300 seconds...
[23:29] amerrykan, you definitely can
[23:29] which project are you working on and getting those rate limited errors?
[23:29] yahoo messages
[23:30] SketchCow, is that a torrent? in any case, what's the URL?
[23:30] ohh, ok; i don't know anything about the yahoo messages project
[23:30] * GLaDOS SQL injects amerrykan into #BurnTheMessenger
[23:31] i have six concurrent items running, and they all report the rate limited message :\
[23:31] i see others are hammering away on the leaderboard
[23:32] same, 3 concurrent, and I just got the RL message twice
[23:32] I asked in the BTM channel if it was deliberate or not but I got no answer as of yet
[23:33] clicking the Website link gives me a Yahoo Error 999
[23:33] so i think we're being throttled?
[23:33] same message, probably this:
[23:34] "This problem may be due to unusual network activity coming from your Internet Service Provider."
[23:35] then I guess I want to work on Posterous in the meantime, how do I contribute without getting banned?
[23:36] try not to have too many concurrent items going at once
[23:36] I had all 6 running and I got hammered quickly
[23:36] ~2-3 should be fine
[23:36] amerrykan: you should be fine now
[23:37] GLaDOS: how so?
[23:37] Someone within Posterous set something up for users with AT user agents so they don't get banned
[23:37] The warriors get said user agent
[23:39] OK, seems to be working
[23:50] hooray for taking advantage of the handicapped!
[23:53] what is an "AT" user agent?
[23:53] assistive technology. like screenreaders for the blind
[23:54] ohhhhh, yeah
[23:54] GLaDOS is talking about AT = Archiveteam
[23:54] Yeah
[23:54] oh
[23:54] ohh
[23:54] heh
[23:54] "Now with 100% less guilt!"
[23:54] i suppose in #archiveteam, "AT" might mean that, yes