#archiveteam 2013-01-26,Sat

Time Nickname Message
00:04 🔗 abartov The Warrior is a really inspired feat of packaging! I love it.
00:15 🔗 godane1 someone uploaded a 2.1gb file of the last x-play episode
00:15 🔗 godane1 getting that one now
00:28 🔗 JudgeDead Ever heard about formspring.me?
00:33 🔗 JudgeDead "This Account Has Expired. As of January 2013, accounts that have not been active in over 18 months may be automatically deleted. If this is your account, you may login within the next 24 hours to stop this account from being permanently deleted."
00:33 🔗 JudgeDead How nice of them. A whole 24 hours to react.
00:51 🔗 xk_id_ how bad is it to disrespect the robots.txt rate guidelines for bots?
00:51 🔗 xk_id_ ethically/legally
00:54 🔗 balrog_ rate guidelines? are they ridiculously strict here?
00:55 🔗 balrog_ also, you may violate the "spirit of the law" but who says you can't use multiple bots from multiple hosts
00:55 🔗 xk_id_ Crawl-delay: 5
00:55 🔗 xk_id_ for the Speedy spider
00:56 🔗 xk_id_ and I suppose our advice wouldn't really work for an EC2 cluster then, would it...
00:58 🔗 xk_id_ in fact... here's what they put in the robots.txt: 5 specific user-agents with crawl-delay restriction. If my crawler is diy, then I can assume it's not restricted?
00:58 🔗 * xk_id_ chuckles
00:58 🔗 balrog_ you can set your own UA
00:58 🔗 balrog_ are they blocking all other UAs?
00:58 🔗 xk_id_ nope
00:58 🔗 balrog_ lol
00:59 🔗 xk_id_ ikr
00:59 🔗 balrog_ I'd still add a delay just to be polite
00:59 🔗 balrog_ but I'd probably make it lower
00:59 🔗 xk_id_ under "User-agent: *" they only list a bunch of "Disallow:"
00:59 🔗 xk_id_ Yes. I shall. say, what would be a polite, yet satisfying delay?
01:00 🔗 xk_id_ in fact, let's keep it to polite. (satisfying depends on my needs)
01:00 🔗 balrog_ 1 or 2 seconds probably is what I'd do
01:00 🔗 xk_id_ ic
01:00 🔗 xk_id_ would that be cumulative across the cluster? or for just one machine?
02:08 🔗 chronomex xk_id_: robots is somewhere between a request and a suggestion
02:08 🔗 chronomex imo
02:08 🔗 xk_id_ ic
02:08 🔗 chronomex polite delay is 1 second, sneaky is 10 seconds
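[Editor's sketch] The crawl-delay exchange above can be illustrated with Python's standard urllib.robotparser. This is only a sketch under stated assumptions: the robots.txt text below is a hypothetical reconstruction of what xk_id_ describes (a Crawl-delay for a named bot such as Speedy, and only Disallow lines under "User-agent: *"), the "diy-crawler" name and the /private/ path are made up, and the 1-second fallback is the polite delay suggested in the channel. The question of whether the delay should be cumulative across a cluster is never answered in the log; multiplying by the number of machines is one conservative reading, not a rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modelled on the description in the log:
# a named bot gets a Crawl-delay, everyone else only gets Disallow lines.
ROBOTS_TXT = """\
User-agent: Speedy
Crawl-delay: 5

User-agent: *
Disallow: /private/
"""

# Fallback delay in seconds when robots.txt lists none for our user agent
# (the "polite" 1 second suggested in the channel).
POLITE_FALLBACK = 1.0


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def per_machine_delay(parser, user_agent, machines=1, fallback=POLITE_FALLBACK):
    """Seconds each machine should wait between requests.

    Assumption: the site-wide delay is shared by the whole cluster, so each
    of `machines` workers waits `machines` times the site delay.
    """
    listed = parser.crawl_delay(user_agent)  # None if no delay applies
    site_delay = float(listed) if listed is not None else fallback
    return site_delay * machines


rules = load_rules(ROBOTS_TXT)
speedy_delay = per_machine_delay(rules, "Speedy")
diy_delay = per_machine_delay(rules, "diy-crawler")
diy_cluster_delay = per_machine_delay(rules, "diy-crawler", machines=4)
```

With this input the named bot keeps its listed delay, while an unlisted crawler falls back to the polite default and still has to honour the Disallow rules, which can be checked with rules.can_fetch(user_agent, url).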
04:33 🔗 godane1 uploaded finally: http://archive.org/details/TechTV_Music_Wars
09:57 🔗 omf__ http://www.ussc.gov/ got hacked by Anonymous
09:57 🔗 omf__ I made a copy for the future to see
10:07 🔗 illunatic omf__: archiving that?
10:12 🔗 omf__ I am downloading it now
10:12 🔗 omf__ many of the content mirrors are down so
10:13 🔗 omf__ I am really interested in what is in these files, no doubt it is going to piss off the gov
10:13 🔗 omf__ they are 150mb each
10:15 🔗 illunatic :)
10:15 🔗 illunatic the gov't is going all out tho
10:16 🔗 illunatic trying to recruit netizens to fight in the cyber war
10:16 🔗 illunatic http://www.whitehouse.gov/blog/2013/01/22/roll-your-sleeves-get-involved-and-get-civic-hacking
10:17 🔗 illunatic http://activepolitic.com:82/external/1785.html?utm_source=dlvr.it&utm_medium=twitter
10:17 🔗 illunatic John Kerry: Foreign Hackers Are '21st Century Nuclear Weapons'
10:18 🔗 ersi #archiveteam-bs man
10:18 🔗 ersi If it ain't about archiving, put it in -bs
10:18 🔗 omf__ the download is going super slow, I assume because everyone is slamming the shit out of those servers
11:23 🔗 omf__ ussc.gov dns has been taken out
11:23 🔗 omf__ am I the only one who backed it up?
14:41 🔗 db48x aww, ussc.gov is down
14:41 🔗 omf__ just the dns is down
14:41 🔗 db48x I always sleep at the worst times
14:41 🔗 db48x ah, have you an ip address?
14:41 🔗 omf__ the direct ip works
14:42 🔗 omf__ it is in the hackernews story
14:42 🔗 omf__ I am still getting 2 of the files from mirrors
14:42 🔗 Schbirid ncdu has new features since september: Added option to dump scanned directory information to a file (-o) & Added option to load scanned directory information from a file (-f)
14:43 🔗 Schbirid this makes it the perfect tool if you want to get a nice overview of some directories, their sizes, etc
14:45 🔗 db48x I guess the story left the first few pages
15:08 🔗 omf__ https://news.ycombinator.com/item?id=5119600
15:08 🔗 omf__ it is still the 2nd story on there
15:08 🔗 omf__ it looks like for now I am only going to get 2 of the files
15:08 🔗 omf__ I am keeping an eye out for more mirrors
15:24 🔗 db48x omf__: that one doesn't seem to have the ip address in it
15:35 🔗 omf__ Yes it does but it does not matter anymore since they partially restored the site to normal
16:34 🔗 ats_ mmm, flashy lights: http://offog.org/stuff/arc-breakout.jpg
16:34 🔗 ats_ (as it turned out, I didn't need the breakout box to do anything in the end, but at least it gives me something to watch while the hard disk image is copying...)
18:27 🔗 db48x omf__: which of those files have you downloaded?
18:27 🔗 balrog_ http://www.youtube.com/watch?v=myYzfsEOaDw
18:28 🔗 balrog_ http://www.youtube.com/watch?v=x3Fz1V3LZtw
18:28 🔗 balrog_ alternate footage of the NYC memorial; official footage of the IA memorial.
18:33 🔗 omf__ almost done with kennedy and scalia
18:34 🔗 omf__ I also got the site as screenshots
18:35 🔗 omf__ I hope others are getting the other files
18:36 🔗 balrog_ link me
18:36 🔗 balrog_ oh, the hn one? ok
18:37 🔗 omf__ http://pastebin.com/d2nvt263
18:37 🔗 omf__ the new anonymous thing
18:37 🔗 balrog_ yeah ok
18:38 🔗 omf__ 7 parts left
18:38 🔗 balrog_ speeds cusk
18:38 🔗 balrog_ suck*
18:38 🔗 omf__ from what I have gathered you need all the files to decrypt everything
18:39 🔗 omf__ I am glad they at least have 4 mirrors since one was already taken down
18:39 🔗 balrog_ there is a torrent
18:40 🔗 balrog_ some are giving 503
18:40 🔗 balrog_ and most are 404
18:42 🔗 balrog_ eta 14h 44m
18:42 🔗 balrog_ for one
18:43 🔗 omf__ I want to add this to interesting things I collect over the years
18:43 🔗 omf__ like the doom 3 alpha and the half life 2 source code
18:43 🔗 omf__ amongst other things
18:44 🔗 balrog_ did you get the half life source code?
18:44 🔗 balrog_ err, half life alpha
18:44 🔗 omf__ I might have a copy of the game not the code, just HL2
18:44 🔗 omf__ I am not sure they ever caught who broke in
18:45 🔗 balrog_ wow
18:45 🔗 balrog_ the torrent picked up
18:45 🔗 balrog_ should be done in a minute
18:45 🔗 Schbirid omf__: half life (1) alpha leaked some week ago
18:45 🔗 Schbirid alpha as in press release disc
18:45 🔗 Schbirid pretty nice early stuff
18:46 🔗 omf__ hmm maybe I should look around for it
18:46 🔗 omf__ my big thing is gaming history, way too much of that is long gone.
18:49 🔗 balrog_ the torrent has a "press release" flv in it
18:49 🔗 balrog_ and .txt
18:50 🔗 balrog_ omf__: I have all the files
18:50 🔗 omf__ you got a fast internet connection
18:50 🔗 balrog_ yeah, 5MB/s
18:56 🔗 balrog_ omf__: someone posted the whole thing on mega : https://mega.co.nz/#!V9sH3TIC!P9U_C2udtPdJyt8772o_aEiceHsV7BDxdOmwO9224Qg
18:56 🔗 omf__ hahah go MEGA
18:56 🔗 balrog_ ha, they force accepting tos
18:56 🔗 balrog_ most download sites don't do that for downloaders
18:58 🔗 omf__ I need fucking flash to use mega
18:58 🔗 balrog_ :[
18:58 🔗 balrog_ meh
18:58 🔗 omf__ what a huge stack of shit
18:58 🔗 balrog_ hold on
18:59 🔗 omf__ why cannot flash just die
19:00 🔗 balrog_ someone grab google cache of ussc.gov
19:02 🔗 omf__ I already got the main page saved from before. First thing I did
19:02 🔗 balrog_ ok
19:02 🔗 omf__ that and the video in case youtube pulled it
19:03 🔗 balrog_ see pm
19:17 🔗 omf__ http://no.reddit.com/r/technology/comments/17awqe/ussc_has_been_taken_down_with_an_important_message/
19:17 🔗 omf__ lists what was in clarity1-3
19:19 🔗 balrog_ hah
19:19 🔗 balrog_ weird
19:20 🔗 omf__ Does anyone else backup twitter feeds? I am only doing a few hundred so far
19:20 🔗 omf__ I was thinking of setting up an archive warrior so people could help
19:20 🔗 SketchCow We have people backing up both twitter feeds of most followed accounts, and a sample (called the drizzle) of the main feed.
19:21 🔗 SketchCow No, twitter would not be worth your effort or the effort of the warrior.
19:21 🔗 omf__ not for everything
19:21 🔗 omf__ just a few hundred
19:21 🔗 SketchCow Yes, but this is literally being done by many others.
19:22 🔗 SketchCow Sexy high-profile site, gets all the downloading and the backing up.
19:22 🔗 omf__ I would like to coordinate with them as to not duplicate effort
19:22 🔗 SketchCow Much more at risk are small communities running vbulletin or sites of people recently dead.
19:22 🔗 SketchCow How can I say this?
19:22 🔗 omf__ I am currently doing the small half life site
19:22 🔗 SketchCow Oh never mind, I did 3 times. Have fun.
19:23 🔗 ersi Self-hosted content is dying a silent death, it's in my opinion a lot more important. We know US LoC gets data from Twitter
19:24 🔗 omf__ SketchCow, you missed my point. I want to make sure they are backing up the things I would, so I do not have to do it. Also a few of these twitter archivers that I know of do not share data because of the TOS
19:24 🔗 SketchCow I totally got the point.
19:24 🔗 SketchCow I have the point.
19:24 🔗 SketchCow I see no reason to fight you. You want to do it, goooooooo nuts.
19:24 🔗 SketchCow Some people like vanilla.
19:25 🔗 omf__ Let me repeat: I want to make sure they are backing up the things I would, so I do not have to do it.
19:25 🔗 omf__ I do not want to do it
19:26 🔗 ersi Then why are you talking about it? If you want to ensure something, start an effort - maybe people tag along
19:28 🔗 Smiley lol
19:28 🔗 omf__ To head back to my original point. I was asking if anyone else is doing it so I can stop doing it
19:29 🔗 Smiley don't trust others to do what you believe should be done.
19:29 🔗 omf__ I would like to contact the others first and see if they would upload it to IA
19:30 🔗 omf__ I have got people to put things up before with a simple email, the normal response is it never dawned on them to back it up
19:30 🔗 Smiley if it hadn't, why would they be here, of all places?
19:32 🔗 omf__ People here know people who are not here who do big data
19:32 🔗 omf__ finding data is like job hunting, you get more through word of mouth than anything else
19:34 🔗 omf__ I got a local non-profit to convert all their tapes to dvd and this year they are going to upload them to IA
19:34 🔗 omf__ they just wanted a place to back them up online and I proposed that solution
19:36 🔗 omf__ plus they have the dvd backups for their library
19:36 🔗 omf__ all of it is news shows from 70s-80s
19:37 🔗 omf__ sorry I went OT
19:44 🔗 SketchCow Someone is contributing roughly 500 CD-ROM images and scans to me. That's happening in another window. More than enough good for the world today.
19:44 🔗 Nemo_bis aww
19:47 🔗 omf__ more shareware? I love that stuff
19:47 🔗 SketchCow Primarily cover discs.
19:48 🔗 omf__ aah from mags
19:48 🔗 SketchCow And the mags.
19:50 🔗 omf__ Do the mags go dark until copyright expires?
19:53 🔗 SketchCow Ask the question again
19:57 🔗 omf__ Do the magazines themselves have to go dark until the copyright on them expires?
19:58 🔗 db48x given the number of other magazines that aren't dark, I'd guess that they don't
19:58 🔗 ersi Unless there's a complaint, etc.
20:01 🔗 omf__ I am just glad we get this stuff
20:48 🔗 godane1 so i have about 30gb of videos from g4tv.com
21:27 🔗 * chronomex omw to portland to pick up some of the zillions of cds that turnkit wanted
21:33 🔗 DFJustin cd-roms fuck yeah
21:33 🔗 chronomex some of, dunno how much I can fit in this vehicle
21:33 🔗 chronomex :P
21:33 🔗 DFJustin O_O
21:34 🔗 chronomex well, penny each
21:34 🔗 SketchCow Truck rental
21:35 🔗 chronomex heh
21:36 🔗 DFJustin I wonder how favourably a carload of CD-ROMs compares with fibre internet
21:36 🔗 SketchCow Seriously. Truck rental.
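[Editor's sketch] DFJustin's question about a carload of CD-ROMs versus a network link comes down to simple arithmetic. The numbers here are assumptions, not figures from the log: a nominal 700 MB per disc, and a hypothetical load of 5000 discs on a 3-hour drive. The only value taken from the channel is balrog_'s 5 MB/s connection mentioned earlier, used as a point of comparison. Time spent reading the discs afterwards is ignored.

```python
# Nominal capacity of one CD-ROM in bytes (assumed 700 MB per disc).
CD_CAPACITY_BYTES = 700 * 10**6


def carload_throughput_mbit(discs, trip_hours, disc_bytes=CD_CAPACITY_BYTES):
    """Effective throughput in Mbit/s of driving `discs` discs for `trip_hours`."""
    total_bits = discs * disc_bytes * 8
    seconds = trip_hours * 3600
    return total_bits / seconds / 10**6


def link_throughput_mbit(megabytes_per_second):
    """Convert a link speed given in MB/s to Mbit/s."""
    return megabytes_per_second * 8


# Hypothetical trip: 5000 discs, 3 hours on the road.
car = carload_throughput_mbit(discs=5000, trip_hours=3)
# The 5 MB/s connection mentioned earlier in the log.
link = link_throughput_mbit(5)
ratio = car / link
```

Under these assumed numbers the car comes out well ahead of the 5 MB/s link; changing the disc count or trip time changes the ratio proportionally.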
21:53 🔗 chronomex lady at goodwill doesn't understand indiscriminate purchasing
21:55 🔗 chronomex "give me a shelfload of cdroms, I don't care which"
21:55 🔗 db48x :)
21:55 🔗 chronomex "no, you have to go pick them out on amazon"
22:00 🔗 DrainLbry I've decided to start a no-kill shelter for elderly PCs. My first rescue is a rusty, cosmetically deficient, missing accessories Tandy 1000. He sits nicely on command, is housetrained, needs a little TLC. This guy deserves to live out the rest of his days in a warm and loving home. Does great with cats. Won't you be his forever home?
22:09 🔗 Schbirid :)
22:12 🔗 SketchCow Already running a no-kill shelter
22:26 🔗 Famicoman DrainLbry you basically described my basement
22:38 🔗 omf__ DrainLbry, I do that too
22:38 🔗 omf__ I have a whole 10x10 storage unit full of old computers. It is the only thing I collect
22:38 🔗 omf__ I should get some pics up online
22:38 🔗 omf__ Does it work?
23:46 🔗 bsmith094 hello?
23:46 🔗 no2pencil hi
23:57 🔗 SketchCow whazzzzzuupppp
23:58 🔗 no2pencil happy weekend, SketchCow
