#archiveteam 2013-03-18,Mon

Time Nickname Message
00:02 Cameron_D I shall resume seeding the geocities torrent
00:04 Cowering SketchCow, if RIAUG was only doing PD stuff, you won't need nibble copier
00:06 SketchCow I know, right
00:06 SketchCow But still, happy to help
00:06 Cameron_D Wait, they link the broken torrent
00:11 chronomex :\
00:17 S[h]O[r]T im downloading 641gb for nothing? :\
00:17 S[h]O[r]T whats the correct one
00:18 chronomex The correct one overlays it
00:18 chronomex it has "fixed" in the name
00:18 chronomex otherwise same title
00:21 Cameron_D http://thepiratebay.org/torrent/6350414/Geocities_-_The_PATCHED_Torrent should've been it
00:21 Cameron_D wait, the patched one 404s
00:29 S[h]O[r]T http://thepiratebay.se/torrent/6353395/Geocities_-_The_PATCHED_Torrent
00:31 SketchCow Feedback needed.
00:31 SketchCow Adding links on main TOSEC page to ready-for-love pages.
00:31 SketchCow http://archive.org/details/tosec
00:34 SketchCow I think "latest version" is likely redundant in that table. What other information might be helpful there?
00:46 SketchCow Decision made. It'll be a link to the item page, and then the collection.
01:37 WiK boom, just hit 400000 repos today
01:39 sep332 does google reader have a channel yet?
01:41 sep332 i just found this: download full text of articles from GR, 1000 at a time
01:41 sep332 https://productforums.google.com/forum/#!msg/reader/fMLNWm-sHDg/iaEVjOWdcaUJ
01:42 SketchCow http://www.flickr.com/photos/37796451@N00/sets/72157624564641788/with/4821129260/
03:04 ivan` does anyone have software for backing up https://bitcointalk.org/?
03:32 ex-parrot those are some really lovely photographs SketchCow, thanks
03:41 godane looks like underground gamer has all issues of secret service magazine
03:53 godane i posted on g4 forums
03:54 godane telling them its being archived
04:06 adamcaudi Whoever has access to delete pages on the wiki: I just cleared a bunch of spam pages, should be safe to delete them
04:40 SketchCow Great
07:04 SketchCow So, the Punchfork guy ignored my tweets back and left it at that.
07:04 SketchCow So current signs point to "fuck that guy"
07:04 SketchCow Upload begins tonight. 2tb
07:04 * BlueMax salutes
07:38 GLaDOS Say I were to get a news program to do a story on us..
07:38 GLaDOS How many of you would run a knife through my heart?
07:43 ersi No one, unless you're bluemax
07:46 GLaDOS "So we're archiving all these websites and it's illegal"
07:55 chronomex well you don't come out and say that outright, you fool
07:56 chronomex if they ask you might want to say something like "is it illegal for a librarian to make a photocopy?"
07:56 chronomex but there's nothing to gain by bringing it up yourself
07:57 GLaDOS (I was imitating BlueMax)
07:57 chronomex I'm sure you know this
07:57 chronomex right
08:04 * ersi chuckles
08:12 Samuel_Mi omf_: blog.thephoenix.com is done: https://www.dropbox.com/sh/npw4q5v57jylvzm/NKPOxQnxz1
08:17 SketchCow I can give you an ftp for that instead.
08:18 Samuel_Mi oh, that'd be good
08:24 SketchCow He's going to FTP it and I'll throw it into the system.
08:25 chronomex ia is voracious as a woodchipper
08:26 chronomex sometimes you gotta change the truck or add some more fuel, but for the most part om nom nom wngngggggffff wngggffffff
08:27 SmileyG lol
08:27 SmileyG Guys, my suggestion about the THQ forums, I think it got missed
08:27 SmileyG I don't know how to grab it due to the fact it has the weird login page with the age request stuff.
08:28 SmileyG I'd kind of hope we can get the user generated photos from the games too - saints row 3 at least does photo uploads, I don't know about other games.
08:33 Samuel_Mi SketchCow: upload complete.
08:34 SketchCow That small, huh?
08:34 SketchCow Poor thephoenix
08:34 Samuel_Mi It's 2GB unpacked
08:35 Samuel_Mi 32,000 files, I believe
08:35 Samuel_Mi lots of icky asps files
08:35 Samuel_Mi *aspx
08:49 omf_ now we just need to find a way to grab the rest of thephoenix without it ending prematurely
08:50 Samuel_Mi if you can provide a script, I'll rip it
08:52 Samuel_Mi getting a lot of 504s on posterous, btw
09:25 SketchCow http://archive.org/details/archiveteam_blog_thephoenix
09:27 no2pencil is there a page or a room dedicated to the phoenix project?
09:27 ersi no in regards to dedicated room
09:27 GLaDOS no(t yet)
09:28 omf_ okay I will start it up join #theashes
09:28 no2pencil sorry, I have not helped with an archive team project in a while
09:42 arrith sep332: good find
09:46 GLaDOS Just noticed a comment on the Geocities torrent: "Good download, done in 2 mins..."
10:06 SmileyG lol
11:11 SketchCow Korea Blogs uploading. 476gb or something.
11:11 SmileyG :O
11:14 SketchCow http://archive.org/details/yahoo_korea_blogs
11:42 godane doing another download of the off topic forum
11:42 godane cause i'm crazy
11:42 godane just the first 3 pages anyways
11:42 godane trying to make sure i have all the posts of that at least
11:52 Cameron_D I hacked together a wget lua script based off the vbulletin.lua test script and am running it across every board at the moment, I'll upload what I get too (script is untested so who knows what I'll get ¯\_(ツ)_/¯)
11:53 Cameron_D although I might skip off topic due to its size and the fact you are doing it
12:05 godane Cameron_D: i did most of the forums
12:06 Cameron_D yeah, but hey, an extra copy can't hurt :P
12:06 godane images may be a problem but i saved about 12gb of external images
12:06 godane agree
12:11 SketchCow http://archive.org/details/yahoo_korea_blogs_20130318044714
12:13 SmileyG "Since we're talking about records we should also mention the largest and smallest torrents on The Pirate Bay. The largest active torrent is an archive of the late Geocities.com, that was shut down by Yahoo in 2010. The 641.32 GB torrent is currently battling for its survival with just one seeder."
12:13 SmileyG That the IA seeder then?
12:13 SmileyG -- https://torrentfreak.com/the-pirate-bays-oldest-torrent-is-revolution-os-130317/
12:25 BlueMax yeah I mentioned that earlier
12:29 SmileyG BlueMax: ¬_
12:30 * BlueMax throws SmileyG out the window.
12:48 godane so i'm like doing 2 downloads at once right now
12:48 godane trying to get all attack of the show and x-play clips in hd
14:55 swebb omf_: IA already has all of the spritzer tweets from Oct 2011.
15:07 omf_ swebb, is that the twitter stream grab?
15:07 swebb yup
15:08 omf_ I got so many projects to run over with that data
15:09 swebb So the Twitter TOS says that you can collect the stream yourself and use it yourself, but you can't redistribute the tweets to anyone.
15:10 swebb It is a stupid TOS if you ask me, but that's what they're going with.
15:10 omf_ I know
15:10 omf_ They are dumb about it. Every restriction they put on twitter lowers its value
15:10 swebb The IA is receiving the stream as well but they can't distribute the data at this time either. I believe that they're waiting for twitter to change their TOS or go out of business before releasing the data to the public.
15:11 omf_ IA gets the full thing like the Library of Congress?
15:11 swebb no, there was no official agreement between Twitter and IA, so IA only has the free stream, not the full firehose like the LOC has.
15:12 swebb The LOC has everything back to the first tweet.
15:12 balrog_ I sorta doubt the LOC has private accounts, heh
15:12 swebb Only big-time research-y people have access to the LOC archive, I hear.
15:12 swebb It's hard for them to index all of the data due to its size.
15:13 swebb so searching the archive is actually quite a tough task.
15:13 balrog_ well, it's not like you can easily search wayback either
15:13 swebb The twitter firehose currently generates 2TB of data/day.
15:13 omf_ All the press says no one has access because the LOC cannot set up infrastructure to do it. A few professors I went to school with had their requests for access denied
15:13 swebb It wasn't so large earlier, but it's a large amount of data now.
15:14 swebb They've given access to big-time researchers.
15:14 omf_ how
15:14 swebb Government, and newspapers, I hear.
15:14 omf_ copies on hard drives
15:14 swebb If they have a date range, they can get the data, but searching for a term or a user's tweets is not as easily done. The archive is split up based on date.
15:15 godane please someone back up g4tv twitter: https://twitter.com/g4tv
15:15 swebb I should know, I used to work for Gnip and I was the one in charge of transferring the legacy twitter corpus to the LOC from Gnip.
15:15 omf_ godane, I am on it
15:15 godane thanks
15:16 omf_ swebb, what is it like working for a company that mainly deals in data
15:16 ersi worked*
15:16 swebb They worked out of the Amazon cloud so they dumped everything into S3 at the time.
15:16 omf_ Smart move
15:17 SmileyG is it just sequential data?
15:17 swebb The hard part of working there was the people in the financial industry on the east-coast who used twitter as a signal to buy/sell stocks. They needed the data there instantly and Gnip used a tcp/http based delivery mechanism which introduced latency when sending the stream across the US from West coast to the East coast. The financial guys said that a few seconds of latency was too slow.
15:18 SmileyG and then a users profile is the stamp for that tweet?
15:18 swebb All tweets are archived similarly to how I'm archiving them. They're stored in a new file every minute and that minute's worth of tweets is bzip2'ed and archived after being processed.
15:19 swebb Every tweet has a tweet ID.
15:19 swebb It's a sequential ID and the spritzer stream has 1-2% of the full firehose (a mod-100-like filter on the tweet ID)
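[Editor's sketch] The scheme swebb describes (one file per minute, each bzip2'ed, with a sample picked by a modulus on the tweet ID) might look roughly like the Python below. This is only an illustration under assumptions: the `id` and `ts` field names, the bucket naming, and the exact residue rule are hypothetical, and nothing in the log says how Twitter or Gnip actually implement the filter.

```python
import bz2
import json
from collections import defaultdict
from datetime import datetime, timezone


def in_sample(tweet_id: int, modulus: int = 100, keep: int = 1) -> bool:
    """Keep a tweet when its ID falls in the first `keep` residues mod `modulus`.

    keep=1 gives roughly 1% of sequential IDs, keep=2 roughly 2% (assumed rule).
    """
    return tweet_id % modulus < keep


def minute_key(epoch_seconds: float) -> str:
    """Name of the per-minute bucket a UTC timestamp belongs to (hypothetical format)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d-%H%M")


def pack_by_minute(tweets):
    """Group tweets by minute and bzip2 each group as newline-delimited JSON."""
    buckets = defaultdict(list)
    for tweet in tweets:
        buckets[minute_key(tweet["ts"])].append(json.dumps(tweet, sort_keys=True))
    return {
        key: bz2.compress("\n".join(lines).encode("utf-8"))
        for key, lines in buckets.items()
    }
```

Splitting by minute is what makes a date-range request cheap (pick the matching files) while a search for a term or a user still means decompressing everything, which matches the trade-off described in the conversation.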
15:20 SmileyG who can access the spritzer?
15:20 balrog_ who has full firehose access?
15:20 swebb Twitter has another annoying thing that made archiving the twitter stream a dog. Anyone archiving the twitter stream had to honor 'deletes' from the users.
15:20 balrog_ so even the LoC does that?
15:21 SmileyG :/
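[Editor's sketch] Honoring deletes while archiving a stream could be done by replaying the messages in order and dropping any stored tweet that a later delete notice names. The message shape used here (`{"delete": {"status": {"id": ...}}}`) is an assumption for illustration, as is the in-memory `archive` dict; the log does not say how any archiver actually handles this.

```python
import json


def apply_stream(lines, archive=None):
    """Replay newline-delimited JSON messages, storing tweets and honoring deletes."""
    archive = {} if archive is None else archive
    for line in lines:
        message = json.loads(line)
        if "delete" in message:
            # Assumed shape of a delete notice; unknown IDs are ignored.
            archive.pop(message["delete"]["status"]["id"], None)
        elif "id" in message:
            archive[message["id"]] = message
    return archive
```

With per-minute compressed files the same idea gets expensive, since a delete may point at a tweet in a file that was packed long ago, which is presumably part of why swebb calls it "a dog".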
15:21 swebb anyone can access the spritzer stream.
15:21 swebb curl -s http://stream.twitter.com/1.1/statuses/sample.json -uusername:password
15:22 swebb Companies with access to the full stream are usually financial companies (stock market), or social media monitoring companies.
15:22 swebb Advertising companies
15:22 swebb the LOC
15:22 swebb and a few others.
15:22 swebb Adobe
15:22 godane i found more episodes of the screen savers
15:22 ersi wut, that's weird (Re Adobe)
15:22 godane what was left of it since end of 2004 into 2005
15:23 swebb Wall St Journal
15:23 swebb CNN
15:23 godane these rips will sadly not be the original asf files
15:23 balrog_ godane: hmm why?
15:24 godane they're converted by g4rewind
15:24 balrog_ since they're no longer on g4?
15:24 godane one guy converts stuff and always makes it very big
15:24 godane these are full episodes
15:25 godane there are no good full episodes even if they're on the website
15:25 godane i save the full episodes dialup versions of the screen savers from like 2000
15:51 sep332 are you sure there isn't steganographic data hidden in the files? O_o
15:52 sep332 you could be archiving someone's pr0n stash in there lol
16:39 chronomex nothing wrong with that
16:41 Intensive ?????
16:48 ersi !!!!!
16:56 Intensive is anyone here?
16:59 ersi Yes, why would you ask that on IRC? Especially with 100+ nicks in here?
17:09 eadler Intensive: no, no one is wrong
17:09 eadler ersi: I'm not sure what you are talking about. No one is talking. :)
17:17 Intensive i m new on this chat room
18:08 arkhive Yahoo! Messages Tracker URL please
18:13 ersi There's no tracker. There's #BurnTheMessenger as project channel
18:54 SketchCow https://twitter.com/kwiens/status/313721129346953217
18:55 chronomex fuckin government getting all up in my shit
18:56 SmileyG o_I
19:11 SketchCow I am downloading the internet census 2012.
19:11 SketchCow 563gb
19:11 SketchCow I love that at 7mb/sec, it'll still take a day
19:11 SketchCow YEeaaaahahahahhhhhhhhhh
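[Editor's sketch] The "still take a day" estimate can be checked with a line of arithmetic. The Python below assumes "563gb" means gigabytes, "7mb/sec" means megabytes per second, and 1000 MB per GB; those unit readings are assumptions, and `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float, mb_per_gb: int = 1000) -> float:
    """Hours needed to move `size_gb` at a steady `rate_mb_per_s`."""
    seconds = size_gb * mb_per_gb / rate_mb_per_s
    return seconds / 3600


print(f"{transfer_hours(563, 7):.1f} hours")  # prints "22.3 hours"
```

Under those assumptions the download takes a little over 22 hours, so "a day" is a fair rounding.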
19:26 WiK god lord
19:26 WiK good rather
20:04 SketchCow Doing an interview with HuffPo about "fuck, a lot of sites are going down"
20:11 WiK sounds like fun
20:50 SmileyG the second dot com boom?
20:50 SmileyG myspace next is mah bet.
20:54 DrDeke i wish myconfinedspace.com would go out of "business" so i could poach the domain name :P
20:54 DrDeke and put up photographs of "CONFINED SPACE - ENTRY PERMIT AND PROCEDURES REQUIRED" signs from various factories and such
21:06 SketchCow Huffpo almost done
21:48 SketchCow 498,725,624,046 KB
21:48 SketchCow That's how big we currently are.
21:54 chronomex jeepers
22:02 SketchCow http://archive.org/details/archiveteam_punchfork
22:02 SketchCow My Little Punchfork.
22:02 SketchCow To Jeff Miller, CEO of Punchfork: Come at me, bro
22:06 dashcloud so, he got back to you again?
22:06 SketchCow So, nobody came forward with 2TB of space for punchfork
22:06 SketchCow So it's going into the archive.
22:06 SketchCow It's acceptable, we'll scramble and spirit away the material if needed
22:06 SketchCow But for now, I'll just get it in there.
22:07 SketchCow No, he never got back
22:07 SketchCow But HuffPo is going to contact him
22:07 SketchCow ha ha ha ha
22:08 SketchCow So while I wouldn't call Punchfork "disposable", I'd not be sad if it got deleted in some sort of threat matrix
22:08 SketchCow But I think it's time to see if I can call one of these fuckers out
22:20 TimmyTwoT What is this? http://www.youtube.com/watch?v=_hjPte3BLVc
22:23 SmileyG i think the one and only comment says it all really.
22:24 chronomex lol
22:24 chronomex woop woop woop off-topic siren
22:25 grawity Gets klined from freenode, starts spamming other networks
22:25 MadSci I'm trying to run the Posterous warrior, but it looks like it's getting blocked (as mentioned in the note)
22:25 TimmyTwoT What is this? http://www.youtube.com/watch?v=_hjPte3BLVc
22:33 dashcloud this is certainly fascinating: http://seclists.org/fulldisclosure/2013/Mar/166 (they scanned the entire IPv4 block using embedded devices with default telnet passwords)
22:34 dashcloud and released the entire archive of results to the public
22:43 alard Hello. There's a yahoo-messages project on the warrior (still in beta). If a few people could run that to test it... #BurnTheMessenger
22:55 amerrykan just joined yahoo messages
22:58 alard Yahoo Messagers: I forgot one small but important thing. That's added now, so please update.
22:58 alard Sorry. :)
22:58 amerrykan how do I update? just stop/restart the project?
22:59 alard Yes.
23:01 amerrykan OK, hope that worked :)
23:01 alard And with that, I'm gone for now. Bye.
23:08 SketchCow I'm downloading the IPv4 block scan as we speak.
23:08 SketchCow 89888.9 / 583096.0 MB Rate: 18038.8 / 2656.5 KB Uploaded: 145369.0 MB [15%] 2d 4:48 [ R: 1.62]
23:08 SketchCow InternetCensus2012
23:08 SketchCow Coming along.
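[Editor's sketch] The pasted status line is internally consistent, which a short Python check can show. The reading of the fields is an assumption inferred from the numbers alone: the first pair is downloaded/total MB, the second rate is taken as the download rate in KB/s, 1 MB is taken as 1024 KB, and `R` is uploaded divided by downloaded. `check_status` is a hypothetical helper, not part of any torrent client.

```python
import re

STATUS = ("89888.9 / 583096.0 MB Rate: 18038.8 / 2656.5 KB "
          "Uploaded: 145369.0 MB [15%] 2d 4:48 [ R: 1.62]")


def check_status(line: str) -> dict:
    """Recompute percent done, share ratio, and ETA from the decimal fields."""
    numbers = [float(x) for x in re.findall(r"\d+\.\d+", line)]
    done_mb, total_mb, _rate_up, rate_down, uploaded_mb, _ratio = numbers
    # Remaining data in KB divided by the assumed download rate in KB/s.
    remaining_s = int((total_mb - done_mb) * 1024 / rate_down)
    days, rest = divmod(remaining_s, 86400)
    hours, rest = divmod(rest, 3600)
    return {
        "percent": int(100 * done_mb / total_mb),
        "ratio": round(uploaded_mb / done_mb, 2),
        "eta": f"{days}d {hours}:{rest // 60:02d}",
    }


print(check_status(STATUS))
# {'percent': 15, 'ratio': 1.62, 'eta': '2d 4:48'}
```

All three recomputed values match what the line itself reports.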
23:10 amerrykan can I run multiple warriors with the same nickname?
23:12 amerrykan i'm getting a lot of this: Rate limited. Waiting for 300 seconds...
23:29 DrDeke amerrykan, you definitely can
23:29 DrDeke which project are you working on and getting those rate limited errors?
23:29 amerrykan yahoo messages
23:30 DrDeke SketchCow, is that a torrent? in any case, what's the URL?
23:30 DrDeke ohh, ok; i don't know anything about the yahoo messages project
23:30 * GLaDOS SQL injects amerrykan into #BurnTheMessenger
23:31 amerrykan i have six concurrent items running, and they all report the rate limited message :\
23:31 amerrykan i see others are hammering away on the leaderboard
23:32 wp494 same, 3 concurrent, and I just got the RL message twice
23:32 wp494 I asked in the BTM channel if it was deliberate or not but I got no answer as of yet
23:33 amerrykan clicking the Website link gives me a Yahoo Error 999
23:33 amerrykan so i think we're being throttled?
23:33 wp494 same message, probably this:
23:34 wp494 "This problem may be due to unusual network activity coming from your Internet Service Provider."
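[Editor's sketch] The "Rate limited. Waiting for 300 seconds..." behaviour can be pictured as a fixed-wait retry loop. The Python below is not the warrior's actual code: `fetch_with_wait`, its parameters, and the use of 999 as the throttling signal (borrowed from the "Yahoo Error 999" mentioned above) are illustrative assumptions.

```python
import time
from typing import Callable

RATE_LIMIT_WAIT = 300  # seconds, matching the wait reported in the log message


def fetch_with_wait(fetch: Callable[[], int],
                    sleep: Callable[[float], None] = time.sleep,
                    wait: float = RATE_LIMIT_WAIT,
                    max_tries: int = 5,
                    limited_status: int = 999) -> int:
    """Call `fetch` until it returns something other than `limited_status`.

    Sleeps `wait` seconds between attempts and gives up after `max_tries`,
    returning the last status seen.
    """
    status = limited_status
    for attempt in range(max_tries):
        status = fetch()
        if status != limited_status:
            return status
        if attempt < max_tries - 1:
            sleep(wait)
    return status
```

Passing `sleep` in as a parameter keeps the sketch testable without actually waiting five minutes. When every concurrent item hits the same limit, each one stalls the same way, which is why running fewer items at once tends to help.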
23:35 amerrykan then I guess I want to work on Posterous in the meantime, how do I contribute without getting banned?
23:36 wp494 try not to have too many concurrent items going at once
23:36 wp494 I had all 6 running and I got hammered quickly
23:36 wp494 ~2-3 should be fine
23:36 GLaDOS amerrykan: you should be fine now
23:37 amerrykan GLaDOS: how so?
23:37 GLaDOS Someone within Posterous set something up for users with AT user agents so they don't get banned
23:37 GLaDOS The warriors get said user agent
23:39 amerrykan OK, seems to be working
23:50 pilgrim hooray for taking advantage of the handicapped!
23:53 DrDeke what is an "AT" user agent?
23:53 pilgrim assistive technology. like screenreaders for the blind
23:54 DrDeke ohhhhh, yeah
23:54 chronomex GLaDOS is talking about AT = Archiveteam
23:54 GLaDOS Yeah
23:54 pilgrim oh
23:54 DrDeke ohh
23:54 pilgrim heh
23:54 DrDeke "Now with 100% less guilt!"
23:54 pilgrim i suppose in #archiveteam, "AT" might mean that, yes
