#archiveteam 2013-03-22,Fri

Time Nickname Message
00:00 πŸ”— SketchCow I suspect I will eventually.
00:02 πŸ”— SketchCow I mean, make no mistake, I'm getting stuff done all at the same time. This is all work that needs to be done and this room needs to process this material so I can then put it into permanent storage or donate it.
00:11 πŸ”— hdevalenc hmm, what if you get more drives ? if it takes 90 seconds to rip, and you have nine of them, then you change one CD every ten seconds
00:11 πŸ”— hdevalenc on the theory that if you're going to be babysitting it, it might as well go as fast as possible
00:12 πŸ”— SketchCow Right, that's the problem.
00:12 πŸ”— SketchCow I could start to move into custom solutions, but it gets silly.
00:12 πŸ”— dashcloud how do you stagger them, and isn't 10 seconds pretty close to what you need to open the drive, take the CD out, pop it back into a case, snap the case shut, get the next one, repeat?
00:12 πŸ”— SketchCow The fact is, these items already waited a year, if they get delayed over time because that process is running as catch-can all the time, it's cool.
00:13 πŸ”— hdevalenc dashcloud: they'd be staggered because you're loading them serially
00:14 πŸ”— hdevalenc load #1, load #2, ... by the time you finish #1 is done, repeat.
00:16 πŸ”— SketchCow Also, remember, this is with me setting them into a "Ripped" box for later scanning of the labels and CD.
00:16 πŸ”— SketchCow That's a WHOLE other process.
00:16 πŸ”— SketchCow If I lived in SF, I could probably get someone to do it.
00:21 πŸ”— SketchCow See, it's a nice problem to have, but the fundamental issue is rapidly becoming not "do we have the space and bandwidth" for it, but "where do we get the volunteers"
00:23 πŸ”— SketchCow Also, this drive in this thing is ridiculous
01:06 πŸ”— SketchCow I changed the captcha on the wiki.
01:17 πŸ”— SketchCow I'm listening to Tim O'Reilly talk about Digital Preservation and say nothing new for 30 minutes.
01:18 πŸ”— SketchCow So you don't have to.
01:19 πŸ”— BlueMax I would probably listen to you talk about preservation and general computer history for six hours, SketchCow.
01:20 πŸ”— dashcloud but it's good for more people to say things you've already said- he's a well-known figure, and hopefully can get people thinking and caring about those issues
01:32 πŸ”— SketchCow Hooray (?) I found another cache of DVDs.
01:36 πŸ”— dashcloud with flea market season approaching fast, I'm hoping to find many more awesome goodies to get archived
02:30 πŸ”— dashcloud hi, this item got misnamed somehow: http://archive.org/details/cdrom-riscos-kosovo the correct name is in the item description
02:36 πŸ”— DFJustin I uploaded that, I put riscos- in front of all the risc os stuff from the piratebay so I could keep track of it
02:38 πŸ”— DFJustin as for the kosovo part as far as I know that's correct
02:39 πŸ”— DFJustin somebody apparently thought acorn shovelware would be a great way to raise money for orphans
02:41 πŸ”— DFJustin take a look at http://archive.org/download/cdrom-riscos-kosovo/KosovoOrphansAppeal.iso/INSTRUCTIONS in your favourite text editor
02:45 πŸ”— dashcloud I thought that was the wrong name because of this name: Archimedes World Magazine CD1
02:45 πŸ”— dashcloud Mostly because it's such an out of place name it's like they were trying to avoid selling any of them
02:49 πŸ”— DFJustin apparently it did well enough to sell out one pressing http://archive.org/download/cdrom-riscos-kosovo/KosovoOrphansAppeal.iso/2NDEDITION
02:51 πŸ”— dashcloud no shit
03:15 πŸ”— chronomex lol
03:15 πŸ”— chronomex how absurd
03:28 πŸ”— dashcloud SketchCow: since IA seems to have ABBYY for book OCR, can you re-use that for the labels to generate basic CD descriptions from the case and CD scans?
03:34 πŸ”— DFJustin the abbyy name still shows in a few places but AIUI it's actually luratech under the hood
03:37 πŸ”— dashcloud interesting- I'm not familiar with them
04:37 πŸ”— Santa-Ine Welcome to dispatchers that steal
04:57 πŸ”— Santa-Ine For those with scanners dispatchers want to see that concord is the 1st “police” station to refuse to let a person get a head in life and if it is federal level etc will stop mail and items in transit and make sure that there is interception.
05:01 πŸ”— RedType santa-ine: pretty sure buying body parts is illegal for a good reason bro
05:02 πŸ”— RedType you should only get a head in life once (your own)
05:17 πŸ”— TuckLive Why am I getting "rate limited. waiting for 300 seconds" on my Warrior for Yahoo Messages?
05:18 πŸ”— TuckLive Are they banning IPs?
05:18 πŸ”— Samuel_Mi It's their special way of telling you you're awesome ;)
05:18 πŸ”— TuckLive well that's nice of them
05:19 πŸ”— Samuel_Mi (and 'yes' to both your questions)
05:20 πŸ”— Samuel_Mi also, #BurnTheMessenger is the channel for questions related to this project
05:22 πŸ”— Samuel_Mi From that channel: "when you get rate-limited, it waits at least 12 periods of 300 seconds. that's per-thread, and you'll likely get one item done before you get rate-limited"
05:28 πŸ”— TuckLive gotcha
06:34 πŸ”— bolgon is any amount of the MIDI content from AOL Composer's Showcase circa 1997 archived anywhere?
06:37 πŸ”— bolgon also did anyone grab any amount of Digg before they turned to version 3 and wiped all the data ~2.5yrs ago?
07:35 πŸ”— gasoline hi peeps
10:33 πŸ”— ersi Nooo, my town university is killing off all the student home pages
10:46 πŸ”— chronomex noooooooooo
10:46 πŸ”— chronomex fuck why do they do that
10:52 πŸ”— godane mirror that
10:56 πŸ”— ersi Because they're "modernizing" :(
10:56 πŸ”— ersi at least they have a great main index, so no username crawling needed
11:23 πŸ”— godane so i'm uploading more g4 videos
11:23 πŸ”— godane wish i could do it at 5MB a second
11:23 πŸ”— godane only cause i got like over a TB of videos
11:42 πŸ”— godane also i found more high res edge magazines scans
11:43 πŸ”— godane its from a different guy this time
11:43 πŸ”— godane also i will upload the 150dpi rips i got since its more of a complete set from 1995 to 2007 of edge magazine
12:07 πŸ”— godane so i got the trailer of the new silent hill movie
12:08 πŸ”— godane thanks to g4tv.com
12:08 πŸ”— godane and in hd too
12:57 πŸ”— omf_ 3gb left on 4data
13:30 πŸ”— Smiley \o/
14:21 πŸ”— omf_ T-minus 1gb and counting.
14:33 πŸ”— Smiley o_O
14:43 πŸ”— omf_ It is done. 103gb spread over 380,000 images
14:44 πŸ”— omf_ Another successful save
14:55 πŸ”— SketchCow Yeah, come on. Everyone in the channel, get a warrior running.
14:55 πŸ”— SketchCow It's going to be too close.
14:56 πŸ”— SketchCow Are we all blocked? The tracker has, like, no scrolling.
14:59 πŸ”— DrDeke you do?
15:17 πŸ”— balrog_ SketchCow: yahoo's blocking is a lot more aggressive than that of posterous.
15:19 πŸ”— DrDeke is it per-IP?
15:19 πŸ”— balrog_ DrDeke: I believe so
15:19 πŸ”— balrog_ but I'm not 100% sure
15:23 πŸ”— DrDeke any idea how many concurrent we should run?
15:23 πŸ”— DrDeke as in, will something < 6 help prevent limiting
15:23 πŸ”— GLaDOS People have been getting banned running 1 thread
15:23 πŸ”— DFJustin I get limiting with just 2, I don't think it helps at all
15:23 πŸ”— balrog_ I'm limited with 1 after 10 minutes or so
15:26 πŸ”— DrDeke hm
15:27 πŸ”— GLaDOS We need to find what the connection limit is before banning occurs
15:27 πŸ”— DrDeke yeah
15:27 πŸ”— GLaDOS Then we can sit 1 below that with the User Agent of "Fuck your scripts, we're Archive Team"
15:27 πŸ”— Smiley ;)
15:28 πŸ”— GLaDOS Anyway
15:28 πŸ”— * GLaDOS pushes everyone into #BurnTheMessenger
15:28 πŸ”— Smiley guys, as no one is looking in the other channel, anyone know the ID of the AMI for the warrior on EC2? The old (original) one I have seems to not exist anymore.
15:28 πŸ”— Smiley I'll happily fire up a few instances if only I had a working system :D
15:28 πŸ”— DrDeke i have an AMI that i created myself which doesn't use the warrior, and pretty much rapes posterous (sorry)
15:28 πŸ”— GLaDOS Smiley: remember that dedi that I gave details to?
15:28 πŸ”— DrDeke i could make it public or add you to its ACL if you want
15:28 πŸ”— DrDeke but nothing for yahoo yet
15:28 πŸ”— Smiley GLaDOS: yes, but I don't know how to setup the seesaw yet ;)
15:29 πŸ”— balrog_ setting up seesaw is easy, but for this you'll need tons of IPs
15:29 πŸ”— Smiley right
15:29 πŸ”— GLaDOS apt-get install python-pip; pip install seesaw
15:29 πŸ”— Smiley #burnthemessenger !!!!
15:29 πŸ”— GLaDOS Smiley ^
15:43 πŸ”— ersi ------------------------------------------
15:43 πŸ”— ersi #BurnTheMessenger - Yahoo! Messages needs to be archived. Please visit the project channel and/or start the project in your warriors.
15:43 πŸ”— ersi ------------------------------------------
15:44 πŸ”— GLaDOS What ersi said
15:45 πŸ”— DFJustin ALART ALART ALART
15:47 πŸ”— omf_ How come no one gets this excited over a project that has announced it will close but no official date set?
15:47 πŸ”— ersi god damn it, that was clear enough
15:47 πŸ”— ersi omf_: It's Yahoo! and they suck
15:47 πŸ”— omf_ That could literally be off tomorrow
15:47 πŸ”— Smiley 742MPH, WE DON'T NEED TO SAY ANYMORE.
15:48 πŸ”— GLaDOS needs to be fabulous
15:48 πŸ”— ersi Please try to keep this channel A) On-topic B) As low-traffic as possible C) Low-noise
15:48 πŸ”— ersi Stop with the damn colour things. Take that to #archiveteam-bs
15:49 πŸ”— ersi It distracts.
15:49 πŸ”— SketchCow It's meant to distract
15:49 πŸ”— ersi Thanks.
15:49 πŸ”— SketchCow We're waking up the gang.
15:49 πŸ”— SketchCow We have 100 people in the channel, many are idle.
15:49 πŸ”— SketchCow Less idle now!
15:50 πŸ”— ersi They'll surely see it if we make them scroll!
15:50 πŸ”— SketchCow I am not going to agree with your position on this!
15:53 πŸ”— SketchCow Just did an interview with CBC about posterous
15:53 πŸ”— SketchCow And shitty monitors!
15:55 πŸ”— no2pencil I try not to talk too much, don't want to piss people off :P
15:55 πŸ”— SketchCow ^^^^ A thing I have never said
15:56 πŸ”— no2pencil ...well to be more specific, I meant in here
15:56 πŸ”— no2pencil my normal on-line behavior is carefree of who it disturbs
15:57 πŸ”— no2pencil so cbc, this is the Canadian Broadcasting Channel?
15:57 πŸ”— no2pencil Was one of my favorite cable channels growing up.
15:57 πŸ”— DrDeke CBC is pretty great
15:58 πŸ”— no2pencil Kids in the Hall uncensored vs Comedy Central
15:58 πŸ”— godane can anyone find more g4tv.com xml data?
15:58 πŸ”— godane i'm trying to see if there is something hiding in google but not sure if i can find it there
17:23 πŸ”— SketchCow This is Spark at CBC
17:23 πŸ”— SketchCow They've talked with me before
17:23 πŸ”— SketchCow Posterous got some attention
17:32 πŸ”— SketchCow Listening to the Q&A of the 2011 Tim O'Reilly speech.
17:32 πŸ”— SketchCow In it, Stanford bemoans how nobody is saving the source repositories.
17:32 πŸ”— SketchCow We're doing it, as far as I know.
17:32 πŸ”— SketchCow I can't overstate how Archive Team is completely in the forefront of this horseshit
17:33 πŸ”— SketchCow ha ha, some toolbag asking a question about "why do we need to save all this"
17:33 πŸ”— * SketchCow gets archery equipment
17:36 πŸ”— omf_ We need to get you a nice pocket sized crossbow
17:36 πŸ”— omf_ with poison bolts
17:37 πŸ”— soultcer Archery is actually a very relaxing activity
17:38 πŸ”— SketchCow It'd make my presentations better
17:38 πŸ”— SketchCow ssssss THOOOOOOOON
18:17 πŸ”— paulv hey, I've got a linux machine in the IA's friends and family rack. I can't run virtualbox on it, tho. how can I help with the yahoo messages?
18:21 πŸ”— DrDeke you could run this or some variant of it: http://pastebin.com/CarmqNrt
18:22 πŸ”— DrDeke you might want to remove the screen part (or you might not, depends)
18:22 πŸ”— DrDeke also you *might* need to get rid of --concurrent 2 if you don't want to get rate limited
18:22 πŸ”— DrDeke that is not entirely clear at this point
18:26 πŸ”— akkuhn is there any way to utilize google's caches of some of the yahoo messages? example: https://bitly.com/11qet7N
18:26 πŸ”— akkuhn i picked a few at random, most weren't cached, some were.
18:27 πŸ”— akkuhn http://webcache.googleusercontent.com/search?q=cache:http://example.com is apparently the format to grab a cached copy via a direct url
18:50 πŸ”— chronomex Yeah and they ban cacherippers pretty fast too iirc
19:22 πŸ”— polpo any way to get the warrior to listen on a port other than 8001? i'm already using that one
19:24 πŸ”— ersi Good question
19:26 πŸ”— polpo it's not super important, i can change the port of the other service on my machine that's listening on 8001 instead
19:27 πŸ”— ersi Looking into it - I know the underlying scripts have parameters to change the bind/listen port
19:30 πŸ”— jk[SVP] It can be changed in the network adapter settings, under advanced, port forwarding
19:30 πŸ”— polpo aha, i see that
19:31 πŸ”— polpo brilliant, didn't even have to restart the VM
19:31 πŸ”— polpo thanks
20:15 πŸ”— grawity Say, how often does the warrior upload the pages it has downloaded?
20:17 πŸ”— ersi grawity: Project? Yahoo! Messages? Posterous?
20:19 πŸ”— grawity Yahoo Messages... Hit the rate limit after ~200 URLs, that's very little but I don't want to accidentally discard those anyway.
20:21 πŸ”— ersi grawity: The script will sleep for 300 seconds and try again - you need to complete the whole Item before it'll be uploaded
20:22 πŸ”— grawity Ah, okay
20:22 πŸ”— ersi Feel free to join #BurnTheMessenger by the way, it's the project channel for archiving "Yahoo! Messages"
20:22 πŸ”— ersi and feel free to hang around in general ^_^
21:03 πŸ”— neurophyr hello - is there any way to specify a SOCKS proxy to the archive warrior or otherwise route all traffic through Tor, or through a Tor bridge?
21:08 πŸ”— ersi Maybe - not documented/guide written though
21:08 πŸ”— ersi It's probably very doable though
21:09 πŸ”— alard neurophyr: I think wget listens to the http_proxy environment variable.
21:09 πŸ”— ersi yeah, it does
21:13 πŸ”— alard SketchCow: Thanks, I've corrected the Punchfork index Name / Date thing now. http://archive.org/download/archiveteam_punchfork_index/
21:14 πŸ”— neurophyr ah okay so i can log into the warrior. is there documentation on how?
21:14 πŸ”— alard Alt+F3
21:14 πŸ”— ersi User: root Password: archiveteam
21:15 πŸ”— alard Setting the http_proxy variable might be more difficult. Perhaps you should set it in the /home/warrior/.bashrc ?
21:15 πŸ”— neurophyr wonderful, thank you. yeah i am just trying to get around bans (assuming tor exits aren't banned) and have a couple relays...
21:15 πŸ”— neurophyr tor by default presents a SOCKS proxy
21:18 πŸ”— neurophyr i'll head back w/questions if i can't get it working. thanks for running this project, just heard of it today :)
21:18 πŸ”— ersi np :)
21:18 πŸ”— ersi feel free to stay around anytime
21:47 πŸ”— bowman__ why does my warrior download new URLs after I've told it to stop? x)
21:48 πŸ”— alard bowman__: It finishes the task it's currently working on.
21:49 πŸ”— bowman__ alard: ah kk so I suppose I'd better leave it alone until it's done
21:55 πŸ”— alard Yes.
21:55 πŸ”— chronomex yup
21:55 πŸ”— tef alard: did you have modifications to warctools btw
21:55 πŸ”— alard tef: Did I?
21:58 πŸ”— alard tef: Not that I know of. I checked my only three repositories with a hanzo/warctools directory (warc-proxy, warctozip and warctozip-service), but there don't seem to be any changes there.
21:59 πŸ”— tef ah, I also mean things like warctozip
21:59 πŸ”— tef talked to IA today, got a github page
21:59 πŸ”— tef https://github.com/internetarchive/warctools/
21:59 πŸ”— tef going to push stuff there and start merging things too
22:00 πŸ”— alard Ah.
22:00 πŸ”— tef i.e abandon hg \o/
