Time |
Nickname |
Message |
00:00
π
|
SketchCow |
I suspect I will eventually. |
00:02
π
|
SketchCow |
I mean, make no mistake, I'm getting stuff done all at the same time. This is all work that needs to be done and this room needs to process this material so I can then put it into permanent storage or donate it. |
00:11
π
|
hdevalenc |
hmm, what if you get more drives? if it takes 90 seconds to rip and you have nine of them, then you change one CD every ten seconds |
00:11
π
|
hdevalenc |
on the theory that if you're going to be babysitting it, it might as well go as fast as possible |
00:12
π
|
SketchCow |
Right, that's the problem. |
00:12
π
|
SketchCow |
I could start to move into custom solutions, but it gets silly. |
00:12
π
|
dashcloud |
how do you stagger them, and isn't 10 seconds pretty close to what you need to open the drive, take the CD out, pop it back into a case, snap the case shut, get the next one, repeat? |
00:12
π
|
SketchCow |
The fact is, these items have already waited a year. If they get delayed because that process runs catch-as-catch-can all the time, it's cool. |
00:13
π
|
hdevalenc |
dashcloud: they'd be staggered because you're loading them serially |
00:14
π
|
hdevalenc |
load #1, load #2, ... by the time you finish loading, #1 is done; repeat. |
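A minimal sketch of hdevalenc's arithmetic, assuming discs go into the drives round-robin and that swapping a disc takes no time unless a handling time is given. The 90-second rip time and nine drives come from the conversation; the function names are illustrative only. dashcloud's objection is modelled by `handling_seconds`: if swapping a disc takes longer than the rip time divided by the drive count, the person doing the swapping, not the drives, sets the pace.

```python
RIP_SECONDS = 90  # rip time per CD, as quoted in the channel
DRIVES = 9        # number of drives in the thought experiment


def swap_interval(rip_seconds: float, drives: int, handling_seconds: float = 0) -> float:
    """Seconds between disc swaps once every drive is busy.

    The drives free up every rip_seconds / drives seconds, but a human
    cannot swap faster than handling_seconds, so the slower of the two wins.
    """
    return max(rip_seconds / drives, handling_seconds)


def finish_times(rip_seconds: float, drives: int, discs: int,
                 handling_seconds: float = 0) -> list[float]:
    """Finish time of each disc when disc i is loaded i intervals after the first.

    Disc i goes into drive i % drives; the disc loaded `drives` swaps earlier
    has always finished by then, because drives * interval >= rip_seconds.
    """
    interval = swap_interval(rip_seconds, drives, handling_seconds)
    return [i * interval + rip_seconds for i in range(discs)]
```

With the channel's numbers, `swap_interval(RIP_SECONDS, DRIVES)` gives 10 seconds; with a 15-second handling time it becomes 15 seconds, so extra drives stop helping.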
00:16
π
|
SketchCow |
Also, remember, this is with me setting them into a "Ripped" box for later scanning of the labels and CD. |
00:16
π
|
SketchCow |
That's a WHOLE other process. |
00:16
π
|
SketchCow |
If I lived in SF, I could probably get someone to do it. |
00:21
π
|
SketchCow |
See, it's a nice problem to have, but the fundamental issue is rapidly becoming not "do we have the space and bandwidth" for it, but "where do we get the volunteers" |
00:23
π
|
SketchCow |
Also, this drive in this thing is ridiculous |
01:06
π
|
SketchCow |
I changed the captcha on the wiki. |
01:17
π
|
SketchCow |
I'm listening to Tim O'Reilly talk about Digital Preservation and say nothing new for 30 minutes. |
01:18
π
|
SketchCow |
So you don't have to. |
01:19
π
|
BlueMax |
I would probably listen to you talk about preservation and general computer history for six hours, SketchCow. |
01:20
π
|
dashcloud |
but it's good for more people to say things you've already said; he's a well-known figure, and hopefully he can get people thinking and caring about those issues |
01:32
π
|
SketchCow |
Hooray (?) I found another cache of DVDs. |
01:36
π
|
dashcloud |
with flea market season approaching fast, I'm hoping to find many more awesome goodies to get archived |
02:30
π
|
dashcloud |
hi, this item got misnamed somehow: http://archive.org/details/cdrom-riscos-kosovo the correct name is in the item description |
02:36
π
|
DFJustin |
I uploaded that. I put riscos- in front of all the RISC OS stuff from The Pirate Bay so I could keep track of it |
02:38
π
|
DFJustin |
as for the kosovo part as far as I know that's correct |
02:39
π
|
DFJustin |
somebody apparently thought acorn shovelware would be a great way to raise money for orphans |
02:41
π
|
DFJustin |
take a look at http://archive.org/download/cdrom-riscos-kosovo/KosovoOrphansAppeal.iso/INSTRUCTIONS in your favourite text editor |
02:45
π
|
dashcloud |
I thought that was the wrong name because of this name: Archimedes World Magazine CD1 |
02:45
π
|
dashcloud |
Mostly because it's such an out-of-place name, it's like they were trying to avoid selling any of them |
02:49
π
|
DFJustin |
apparently it did well enough to sell out one pressing http://archive.org/download/cdrom-riscos-kosovo/KosovoOrphansAppeal.iso/2NDEDITION |
02:51
π
|
dashcloud |
no shit |
03:15
π
|
chronomex |
lol |
03:15
π
|
chronomex |
how absurd |
03:28
π
|
dashcloud |
SketchCow: since IA seems to have ABBYY for book OCR, can you re-use that for the labels to generate basic CD descriptions from the case and CD scans? |
03:34
π
|
DFJustin |
the abbyy name still shows in a few places but AIUI it's actually luratech under the hood |
03:37
π
|
dashcloud |
interesting- I'm not familiar with them |
04:37
π
|
Santa-Ine |
Welcome to dispatchers that steal |
04:57
π
|
Santa-Ine |
For those with scanners dispatchers want to see that concord is the 1st "police" station to refuse to let a person get a head in life and if it is federal level etc will stop mail and items in transit and make sure that there is interception. |
05:01
π
|
RedType |
santa-ine: pretty sure buying body parts is illegal for a good reason bro |
05:02
π
|
RedType |
you should only get a head in life once (your own) |
05:17
π
|
TuckLive |
Why am I getting "rate limited. waiting for 300 seconds" on my Warrior for Yahoo Messages? |
05:18
π
|
TuckLive |
Are they banning IPs? |
05:18
π
|
Samuel_Mi |
It's their special way of telling you you're awesome ;) |
05:18
π
|
TuckLive |
well that's nice of them |
05:19
π
|
Samuel_Mi |
(and 'yes' to both your questions) |
05:20
π
|
Samuel_Mi |
also, #BurnTheMessenger is the channel for questions related to this project |
05:22
π
|
Samuel_Mi |
From that channel: "when you get rate-limited, it waits at least 12 periods of 300 seconds. that's per-thread, and you'll likely get one item done before you get rate-limited" |
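One reading of the quoted behaviour, as a sketch: after a rate limit, back off for at least 12 periods of 300 seconds (12 × 300 = 3600 seconds, one hour per thread) before retrying. `fetch` and `is_rate_limited` are hypothetical stand-ins; this is not the actual Warrior pipeline code, and ersi's later description (sleep 300 seconds, then try again) suggests the real logic may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

PERIOD_SECONDS = 300  # one wait period, as quoted from #BurnTheMessenger
MIN_PERIODS = 12      # "at least 12 periods" per rate-limit hit


def minimum_wait_seconds(periods: int = MIN_PERIODS, period: int = PERIOD_SECONDS) -> int:
    """Shortest back-off a thread sits through after one rate-limit hit."""
    return periods * period


def fetch_with_backoff(
    fetch: Callable[[], T],
    is_rate_limited: Callable[[T], bool],
    sleep: Callable[[float], None] = time.sleep,
    period: int = PERIOD_SECONDS,
    min_periods: int = MIN_PERIODS,
    max_attempts: int = 10,
) -> T:
    """Retry fetch() until its response is not rate-limited (hypothetical helpers)."""
    for _ in range(max_attempts):
        response = fetch()
        if not is_rate_limited(response):
            return response
        # Rate-limited: wait min_periods periods of `period` seconds, then retry.
        for _ in range(min_periods):
            sleep(period)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the waiting logic testable without actually sleeping for an hour.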
05:28
π
|
TuckLive |
gotcha |
06:34
π
|
bolgon |
is any amount of the MIDI content from AOL Composer's Showcase circa 1997 archived anywhere? |
06:37
π
|
bolgon |
also did anyone grab any amount of Digg before they turned to version 3 and wiped all the data ~2.5yrs ago? |
07:35
π
|
gasoline |
hi peeps |
10:33
π
|
ersi |
Nooo, my town's university is killing off all the student home pages |
10:46
π
|
chronomex |
noooooooooo |
10:46
π
|
chronomex |
fuck why do they do that |
10:52
π
|
godane |
mirror that |
10:56
π
|
ersi |
Because they're "modernizing" :( |
10:56
π
|
ersi |
at least they have a great main index, so no username crawling needed |
11:23
π
|
godane |
so i'm uploading more g4 videos |
11:23
π
|
godane |
wish i could do it at 5MB a second |
11:23
π
|
godane |
only cause i got like over a TB of videos |
11:42
π
|
godane |
also i found more high-res Edge magazine scans |
11:43
π
|
godane |
its from a different guy this time |
11:43
π
|
godane |
also i will upload the 150dpi rips i got, since they're a more complete set of Edge magazine from 1995 to 2007 |
12:07
π
|
godane |
so i got the trailer of the new Silent Hill movie |
12:08
π
|
godane |
thanks to g4tv.com |
12:08
π
|
godane |
and in hd too |
12:57
π
|
omf_ |
3gb left on 4data |
13:30
π
|
Smiley |
\o/ |
14:21
π
|
omf_ |
T-minus 1gb and counting. |
14:33
π
|
Smiley |
o_O |
14:43
π
|
omf_ |
It is done. 103gb spread over 380,000 images |
14:44
π
|
omf_ |
Another successful save |
14:55
π
|
SketchCow |
Yeah, come on. Everyone in the channel, get a warrior running. |
14:55
π
|
SketchCow |
It's going to be too close. |
14:56
π
|
SketchCow |
Are we all blocked? The tracker has, like, no scrolling. |
14:59
π
|
DrDeke |
you do? |
15:17
π
|
balrog_ |
SketchCow: yahoo's blocking is a lot more aggressive than that of posterous. |
15:19
π
|
DrDeke |
is it per-IP? |
15:19
π
|
balrog_ |
DrDeke: I believe so |
15:19
π
|
balrog_ |
but I'm not 100% sure |
15:23
π
|
DrDeke |
any idea how many concurrent we should run? |
15:23
π
|
DrDeke |
as in, will something < 6 help prevent limiting? |
15:23
π
|
GLaDOS |
People have been getting banned running 1 thread |
15:23
π
|
DFJustin |
I get limiting with just 2, I don't think it helps at all |
15:23
π
|
balrog_ |
I'm limited with 1 after 10 minutes or so |
15:26
π
|
DrDeke |
hm |
15:27
π
|
GLaDOS |
We need to find what the connection limit is before banning occurs |
15:27
π
|
DrDeke |
yeah |
15:27
π
|
GLaDOS |
Then we can sit 1 below that with the User Agent of "Fuck your scripts, we're Archive Team" |
15:27
π
|
Smiley |
;) |
15:28
π
|
GLaDOS |
Anyway |
15:28
π
|
* |
GLaDOS pushes everyone into #BurnTheMessenger |
15:28
π
|
Smiley |
guys, as no one is looking in the other channel, anyone know the ID of the AMI for the warrior on EC2? The old (original) one I have seems to not exist anymore. |
15:28
π
|
Smiley |
I'll happily fire up a few instances if only I had a working system :D |
15:28
π
|
DrDeke |
i have an AMI that i created myself which doesn't use the warrior, and pretty much rapes posterous (sorry) |
15:28
π
|
GLaDOS |
Smiley: remember that dedi that I gave details to? |
15:28
π
|
DrDeke |
i could make it public or add you to its ACL if you want |
15:28
π
|
DrDeke |
but nothing for yahoo yet |
15:28
π
|
Smiley |
GLaDOS: yes, but I don't know how to setup the seesaw yet ;) |
15:29
π
|
balrog_ |
setting up seesaw is easy, but for this you'll need tons of IPs |
15:29
π
|
Smiley |
right |
15:29
π
|
GLaDOS |
apt-get install python-pip; pip install seesaw |
15:29
π
|
Smiley |
#burnthemessenger !!!! |
15:29
π
|
GLaDOS |
Smiley ^ |
15:43
π
|
ersi |
------------------------------------------ |
15:43
π
|
ersi |
#BurnTheMessenger - Yahoo! Messages needs to be archived. Please visit the project channel and/or start the project in your warriors. |
15:43
π
|
ersi |
------------------------------------------ |
15:44
π
|
GLaDOS |
What ersi said |
15:45
π
|
DFJustin |
••• ALART ALART ALART ••• |
15:47
π
|
omf_ |
How come no one gets this excited over a project that has announced it will close but no official date set? |
15:47
π
|
ersi |
god damn it, that was clear enough |
15:47
π
|
ersi |
omf_: It's Yahoo! and they suck |
15:47
π
|
omf_ |
That could literally be off tomorrow |
15:47
π
|
Smiley |
742MPH, WE DON'T NEED TO SAY ANYMORE. |
15:48
π
|
GLaDOS |
needs to be fabulous |
15:48
π
|
ersi |
Please try to keep this channel A) On-topic B) As low-traffic as possible C) Low-noise |
15:48
π
|
ersi |
Stop with the damn colour things. Take that to #archiveteam-bs |
15:49
π
|
ersi |
It distracts. |
15:49
π
|
SketchCow |
It's meant to distract |
15:49
π
|
ersi |
Thanks. |
15:49
π
|
SketchCow |
We're waking up the gang. |
15:49
π
|
SketchCow |
We have 100 people in the channel, many are idle. |
15:49
π
|
SketchCow |
Less idle now! |
15:50
π
|
ersi |
They'll surely see it if we make them scroll! |
15:50
π
|
SketchCow |
I am not going to agree with your position on this! |
15:53
π
|
SketchCow |
Just did an interview with CBC about posterous |
15:53
π
|
SketchCow |
And shitty monitors! |
15:55
π
|
no2pencil |
I try not to talk too much, don't want to piss people off :P |
15:55
π
|
SketchCow |
^^^^ A thing I have never said |
15:56
π
|
no2pencil |
...well to be more specific, I meant in here |
15:56
π
|
no2pencil |
my normal on-line behavior is carefree of who it disturbs |
15:57
π
|
no2pencil |
so cbc, this is the Canadian Broadcasting Channel? |
15:57
π
|
no2pencil |
Was one of my favorite cable channels growing up. |
15:57
π
|
DrDeke |
CBC is pretty great |
15:58
π
|
no2pencil |
Kids in the Hall uncensored vs. Comedy Central |
15:58
π
|
godane |
can anyone find more g4tv.com xml data? |
15:58
π
|
godane |
i'm trying to see if there is something hiding in google but not sure if i can find it there |
17:23
π
|
SketchCow |
This is Spark at CBC |
17:23
π
|
SketchCow |
They've talked with me before |
17:23
π
|
SketchCow |
Posterous got some attention |
17:32
π
|
SketchCow |
Listening to the Q&A of the 2011 Tim O'Reilly speech. |
17:32
π
|
SketchCow |
In it, Stanford bemoans how nobody is saving the source repositories. |
17:32
π
|
SketchCow |
We're doing it, as far as I know. |
17:32
π
|
SketchCow |
I can't overstate how Archive Team is completely in the forefront of this horseshit |
17:33
π
|
SketchCow |
ha ha, some toolbag asking a question about "why do we need to save all this" |
17:33
π
|
* |
SketchCow gets archery equipment |
17:36
π
|
omf_ |
We need to get you a nice pocket sized crossbow |
17:36
π
|
omf_ |
with poison bolts |
17:37
π
|
soultcer |
Archery is actually a very relaxing activity |
17:38
π
|
SketchCow |
It'd make my presentations better |
17:38
π
|
SketchCow |
ssssss THOOOOOOOON |
18:17
π
|
paulv |
hey, I've got a linux machine in the IA's friends and family rack. I can't run virtualbox on it, tho. how can I help with the yahoo messages? |
18:21
π
|
DrDeke |
you could run this or some variant of it: http://pastebin.com/CarmqNrt |
18:22
π
|
DrDeke |
you might want to remove the screen part (or you might not, depends) |
18:22
π
|
DrDeke |
also you *might* need to get rid of --concurrent 2 if you don't want to get rate limited |
18:22
π
|
DrDeke |
that is not entirely clear at this point |
18:26
π
|
akkuhn |
is there any way to utilize google's caches of some of the yahoo messages? example: https://bitly.com/11qet7N |
18:26
π
|
akkuhn |
i picked a few at random, most weren't cached, some were. |
18:27
π
|
akkuhn |
http://webcache.googleusercontent.com/search?q=cache:http://example.com is apparently the format to grab a cached copy via a direct url |
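A small helper for the URL format quoted above, as a sketch: the page URL is appended after `cache:`. Percent-encoding characters such as `?` and `&` so they stay inside Google's own `q=` parameter is an assumption added here, not something stated in the channel. Google has since retired this cache service, so this is of historical interest only.

```python
from urllib.parse import quote

CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def cache_url(page_url: str) -> str:
    """Build a Google-cache lookup URL for page_url (format from the channel)."""
    # Keep ':' and '/' readable; encode '?', '&', '=', '#' and the like so
    # they stay inside the q= parameter instead of splitting the query string.
    return CACHE_PREFIX + quote(page_url, safe=":/")
```

For a plain URL such as `http://example.com` the result is exactly the format quoted in the channel.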
18:50
π
|
chronomex |
Yeah, and they ban cache rippers pretty fast too iirc |
19:22
π
|
polpo |
any way to get the warrior to listen on a port other than 8001? i'm already using that one |
19:24
π
|
ersi |
Good question |
19:26
π
|
polpo |
it's not super important, i can change the port of the other service on my machine that's listening on 8001 instead |
19:27
π
|
ersi |
Looking into it. I know the underlying scripts have parameters to change the bind/listen port |
19:30
π
|
jk[SVP] |
It can be changed in the network adapter settings, under advanced, port forwarding |
19:30
π
|
polpo |
aha, i see that |
19:31
π
|
polpo |
brilliant, didn't even have to restart the VM |
19:31
π
|
polpo |
thanks |
20:15
π
|
grawity |
Say, how often does the warrior upload the pages it has downloaded? |
20:17
π
|
ersi |
grawity: Project? Yahoo! Messages? Posterous? |
20:19
π
|
grawity |
Yahoo Messages... Hit the rate limit after ~200 URLs, that's very little but I don't want to accidentally discard those anyway. |
20:21
π
|
ersi |
grawity: The script will sleep for 300 seconds and try again - you need to complete the whole Item before it'll be uploaded |
20:22
π
|
grawity |
Ah, okay |
20:22
π
|
ersi |
Feel free to join #BurnTheMessenger by the way, it's the project channel for archiving "Yahoo! Messages" |
20:22
π
|
ersi |
and feel free to hang around in general ^_^ |
21:03
π
|
neurophyr |
hello - is there any way to specify a SOCKS proxy to the archive warrior or otherwise route all traffic through Tor, or through a Tor bridge? |
21:08
π
|
ersi |
Maybe, but there's no documentation or guide written for it yet |
21:08
π
|
ersi |
It's probably very doable though |
21:09
π
|
alard |
neurophyr: I think wget listens to the http_proxy environment variable. |
21:09
π
|
ersi |
yeah, it does |
21:13
π
|
alard |
SketchCow: Thanks, I've corrected the Punchfork index Name / Date thing now. http://archive.org/download/archiveteam_punchfork_index/ |
21:14
π
|
neurophyr |
ah okay so i can log into the warrior. is there documentation on how? |
21:14
π
|
alard |
Alt+F3 |
21:14
π
|
ersi |
User: root Password: archiveteam |
21:15
π
|
alard |
Setting the http_proxy variable might be more difficult. Perhaps you should set it in the /home/warrior/.bashrc ? |
21:15
π
|
neurophyr |
wonderful, thank you. yeah i am just trying to get around bans (assuming tor exits aren't banned) and have a couple relays... |
21:15
π
|
neurophyr |
tor by default presents a SOCKS proxy |
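wget reads `http_proxy` (and `https_proxy`) from its environment but is generally not able to use a SOCKS proxy directly, so pointing it straight at Tor's SOCKS port is unlikely to work. One common approach is an HTTP proxy such as Privoxy forwarding to Tor. The sketch below assumes such a proxy on Privoxy's default address, `127.0.0.1:8118`, and launches wget with the proxy variables set explicitly in the child's environment rather than relying on `.bashrc` being read. The address, port, and helper names are assumptions for illustration.

```python
import os
import subprocess
from typing import Mapping, Optional

# Assumed HTTP proxy in front of Tor (Privoxy's default listen address).
PROXY_URL = "http://127.0.0.1:8118/"


def proxied_env(proxy_url: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with wget's proxy variables set."""
    env = dict(os.environ if base is None else base)
    env["http_proxy"] = proxy_url
    env["https_proxy"] = proxy_url
    return env


def fetch_via_proxy(url: str, proxy_url: str = PROXY_URL) -> bytes:
    """Run wget with the proxy variables set and return the page body."""
    result = subprocess.run(
        ["wget", "-q", "-O", "-", url],  # -O - writes the page to stdout
        env=proxied_env(proxy_url),
        check=True,
        capture_output=True,
    )
    return result.stdout
```

Building a fresh environment per call leaves the parent process's own environment untouched.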
21:18
π
|
neurophyr |
i'll head back w/questions if i can't get it working. thanks for running this project, just heard of it today :) |
21:18
π
|
ersi |
np :) |
21:18
π
|
ersi |
feel free to stay around anytime |
21:47
π
|
bowman__ |
why does my warrior download new URLs after I've told it to stop? x) |
21:48
π
|
alard |
bowman__: It finishes the task it's currently working on. |
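A sketch of the stop behaviour described here, assuming the stop request is only checked between items: an item already in progress (which may involve many URLs) runs to completion, and no new item is started afterwards. `run_worker` and `process` are hypothetical names, not the Warrior's actual code.

```python
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_worker(items: Iterable[T], process: Callable[[T], None],
               stop: threading.Event) -> List[T]:
    """Process items one at a time until a stop is requested.

    The stop flag is checked only between items, so an item that is
    already being processed is never abandoned halfway.
    """
    done: List[T] = []
    for item in items:
        if stop.is_set():
            break
        process(item)  # may keep fetching URLs after stop was requested
        done.append(item)
    return done
```

This is why new URLs keep appearing after pressing stop: they belong to the item that was already running.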
21:49
π
|
bowman__ |
alard: ah kk so I suppose I'd better leave it alone until it's done |
21:55
π
|
alard |
Yes. |
21:55
π
|
chronomex |
yup |
21:55
π
|
tef |
alard: did you have modifications to warctools btw |
21:55
π
|
alard |
tef: Did I? |
21:58
π
|
alard |
tef: Not that I know of. I checked the only three repositories of mine with a hanzo/warctools directory (warc-proxy, warctozip and warctozip-service), but there don't seem to be any changes there. |
21:59
π
|
tef |
ah, I also mean things like warctozip |
21:59
π
|
tef |
talked to IA today, got a github page |
21:59
π
|
tef |
https://github.com/internetarchive/warctools/ |
21:59
π
|
tef |
going to push stuff there and start merging things too |
22:00
π
|
alard |
Ah. |
22:00
π
|
tef |
i.e. abandon hg \o/ |