Time |
Nickname |
Message |
00:02
π
|
godane |
hey Famicoman |
08:44
π
|
godane |
http://www.obviouswinner.com/obvwin/2013/4/26/batman-dude-builds-himself-150000-secret-basement-batcave.html |
08:44
π
|
godane |
Batman Dude Builds Himself $150,000 Secret Basement Batcave |
08:45
π
|
godane |
sorry didn't know this was not #-bs |
14:25
π
|
SketchCow |
Oh goddamnit |
14:25
π
|
SketchCow |
People are telling me/us about sites WAY too late. |
14:26
π
|
SketchCow |
I realize it's redundant, but I'm going to try grabbing a site, who else wants it? |
14:26
π
|
SketchCow |
streetfiles.org |
14:28
π
|
andy0 |
I'm currently hopping IPs to be nearly the only posterous downloader currently |
14:29
π
|
andy0 |
what can I do for streetfiles? I can spin up debian VM's |
14:29
π
|
andy0 |
on VPS with limited HD space to retransmit or larger to save the files |
14:32
π
|
SketchCow |
I don't know how big it is. |
14:32
π
|
SketchCow |
I have 2tb set aside |
14:33
π
|
andy000 |
(back after another IP flip) |
14:41
π
|
SilSte |
Hi |
14:41
π
|
SilSte |
how may I download several webpages simultaneously? |
14:41
π
|
SilSte |
at the moment I'm running 3 VMs :/ |
14:42
π
|
SilSte |
or is there another way to squeeze more out of the box? ;-) |
14:47
π
|
RedType |
fork() |
14:48
π
|
RedType |
you could try running multiple instances of your downloader |
14:48
π
|
SilSte |
atm I'm running 3 separate VMs .... i can only control the first one because of the ports :/ |
14:49
π
|
SilSte |
is it possible to simply change a configfile? |
15:09
π
|
flaushy |
SketchCow: is there a script available for streetfiles? don't have many spare GBs here, but could give bandwidth |
15:11
π
|
SketchCow |
No, no worries. |
15:11
π
|
SketchCow |
I'll do it and another team member will do it. |
15:12
π
|
flaushy |
awesome :) |
15:12
π
|
alard |
SketchCow: Want to make Streetfiles a warrior thing? |
15:13
π
|
omf_ |
All the photo pages are just increasing numbers, just need to find the start point |
15:15
π
|
omf_ |
well that was simple, it starts at 1 |
15:15
π
|
omf_ |
http://streetfiles.org/photos/detail/1 |
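[Editor's sketch] omf_'s observation that the photo pages are numbered sequentially from 1 means a candidate URL list can be generated instead of crawled. The following is a minimal sketch, assuming the URL pattern quoted above holds for every ID. The function name `photo_urls` and the upper bound used in the demo are hypothetical; the log does not say what the highest ID is.

```python
from typing import Iterator

# URL pattern taken from the message above; assumed to hold for all IDs.
PHOTO_URL = "http://streetfiles.org/photos/detail/{photo_id}"


def photo_urls(first: int, last: int) -> Iterator[str]:
    """Yield photo detail URLs for IDs first..last inclusive."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    for photo_id in range(first, last + 1):
        yield PHOTO_URL.format(photo_id=photo_id)


if __name__ == "__main__":
    # The upper bound 3 is a placeholder, not a real figure from the site.
    for url in photo_urls(1, 3):
        print(url)
```

Writing the generated URLs one per line gives a list of the kind alard later asks for when he mentions feeding URLs to wget.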
15:17
π
|
SketchCow |
Yes |
15:17
π
|
SketchCow |
alard: Yes, sure, why not. |
15:17
π
|
SketchCow |
alard: I'm more concerned about nwnet and the related site - can that go warrior? All of them die on the 30th |
15:18
π
|
omf_ |
Yes it can, someone was brute forcing it to find more urls |
15:18
π
|
alard |
Streetfiles: I've almost finished a lua script that makes a per-user package. Testing it now |
15:19
π
|
alard |
nwnet: Yes, what we need is a list of urls to feed to wget, and a wget command. |
15:20
π
|
SilSte |
I have 2 servers which have nothing to do... if you provide me a vm I can spend as much bandwidth as u want... |
15:20
π
|
SilSte |
and the servers are capable of ;-) |
15:20
π
|
SketchCow |
Someone was supposed to take a dictionary attack and someone was supposed to do a google dictionary attack |
15:21
π
|
SmileyG |
for x in {1..10}; do wget .... $x; done |
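[Editor's sketch] SmileyG's loop runs its downloads one after another, while RedType's earlier suggestion was to run several downloader instances at once. A minimal sketch of the concurrent variant follows, assuming a small thread pool is acceptable on the box. The names `fetch_all` and `http_get` are hypothetical, and unlike wget this does not write WARC files.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    workers: int = 3,
) -> Dict[str, object]:
    """Run fetch(url) for every URL on a small pool of worker threads.

    Returns a dict mapping each URL to its result, or to the exception
    that fetch raised for it, so one failure does not stop the rest.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


def http_get(url: str) -> bytes:
    """Plain HTTP GET using only the standard library."""
    with urlopen(url, timeout=60) as response:
        return response.read()
```

Calling `fetch_all(urls, http_get, workers=3)` would roughly correspond to three downloader instances sharing one machine, without needing three VMs.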
15:21
π
|
omf_ |
dashcloud, was working on it |
15:22
π
|
flaushy |
btw is archiveteam present at OHM2013? |
15:28
π
|
SketchCow |
No. |
15:32
π
|
omf_ |
Uploaded the first 100gb of my linux iso archive. Here is one release version http://archive.org/details/opensuse-10.2_release |
15:35
π
|
SketchCow |
Looks fun, though. |
15:36
π
|
SketchCow |
OK, omf_ |
15:36
π
|
SketchCow |
First, set it as software, not texts. |
15:36
π
|
omf_ |
I knew I was forgetting something |
15:36
π
|
SketchCow |
Next, let me make a collection. |
15:37
π
|
SketchCow |
These will all be ISOs? |
15:37
π
|
omf_ |
some of the older ones are floppies |
15:37
π
|
omf_ |
and tars of code |
15:38
π
|
SketchCow |
OK. |
15:38
π
|
SketchCow |
How big is this again, just for trivia's sake? |
15:38
π
|
omf_ |
3tb at last estimate |
15:38
π
|
omf_ |
I got boxes and boxes of drives |
15:39
π
|
SketchCow |
OK, easy enough. |
15:39
π
|
SketchCow |
Keep uploading. |
15:39
π
|
SketchCow |
I have gotten back all the data from the dead FOS machine. |
15:42
π
|
SketchCow |
I'm in the process of getting all the scripts and stuff I had there working, and when I do, stuff will come back, including collection making, and I can put your stuff into the collection. |
15:42
π
|
SmileyG |
whoop. |
15:42
π
|
SketchCow |
Yeah, we didn't have much data loss if any. |
15:42
π
|
SmileyG |
SketchCow: do you need notifying about all the warc's we are uploading for the ign/gamespy grab? They are all tagged archiveteam anyway. |
15:42
π
|
SketchCow |
No. |
15:43
π
|
SketchCow |
No, but I will be doing massive bombing runs of cleaning up what we have. |
15:44
π
|
omf_ |
I found I have some freebsd and openbsd isos as well. I guess the collection title should be "Open Source Operating Systems"? or something better? I am terrible at naming things |
15:44
π
|
SketchCow |
Yes |
15:44
π
|
SketchCow |
I'll be approaching that. |
15:44
π
|
SketchCow |
IT'll e nice, it's an amazing collection. |
15:45
π
|
SketchCow |
Obviously, we have the ones that Walnut Creek did, but I consider this one to be a separate set, and one which might have some redundancies but I'd rather that they e redundant. |
15:45
π
|
SketchCow |
(Sorry, b key sticks on this laptop) |
16:24
π
|
alard |
Now on a warrior near you: https://github.com/ArchiveTeam/streetfiles-grab |
16:25
π
|
* |
Baljem hops - beats watching sweet FA happen on posterous :/ |
16:27
π
|
alard |
Still looking for lists of usernames, though. |
16:31
π
|
frame_at |
I'm really impressed how much the warrior has evolved. New project just showed up, seamless switch. Great software. |
16:36
π
|
Baljem |
heh. mine appears to be dead. time to open the other laptop and poke the VM host... |
16:38
π
|
Baljem |
oh, no, that was my fault. I think I clicked the 'stop' button by mistake (!), as it's powered itself off. doh |
16:38
π
|
SketchCow |
11G simtelnet.bu.mirror.2013.04.zip |
16:38
π
|
SketchCow |
root@teamarchive-0:/1/SIMTELNET/ftp.bu.edu/mirrors# du -sh sim*.zip |
16:46
π
|
omf_ |
SketchCow, I cannot change the mediatype of the items I already uploaded but all the new ones are software |
16:52
π
|
SilSte |
Featurerequest for the warrior: What about an option to participate at multiple projects at the same time? |
17:03
π
|
omf_ |
2 days left, no idea how big it is, lets save some ART - http://tracker.archiveteam.org/streetfiles/ |
17:11
π
|
Baljem |
hmm - something strange just happened on one of my streetfiles tasks - 'westberlinoldschool' I think was the name |
17:12
π
|
Baljem |
it had downloaded something like 3000 URLs, then wget quit with exit code 4, I saw it 'waiting for 10 seconds' and now it's gone completely :-/ |
17:43
π
|
Baljem |
damnit, it's just done it again on a job that had downloaded > 4500 URLs |
17:43
π
|
Baljem |
that's a bit of a pain, it had been working on it for the best part of an hour :-/ |
17:45
π
|
Baljem |
I think the username was 'mahatma-ganja' or some approximation thereof if that's of any interest to anyone. |
17:47
π
|
alard |
Baljem: I'm having a look at westberlinoldschool now. |
17:49
π
|
Baljem |
cool; fingers crossed I don't have another one vanish (currently have one on 4740 URLs downloaded, another on 3100, 2780 and 1970 - lot of work for someone else to redo) |
17:49
π
|
alard |
Wget's timeout setting is probably too low (10 seconds) |
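[Editor's sketch] The exit code 4 that Baljem reported corresponds to "network failure" in the GNU Wget manual, which fits alard's guess about a timeout that is too short for a slow site. The following is a minimal sketch of how a wrapper could interpret the exit status and decide whether to retry. The retry policy, the set `RETRYABLE`, and both function names are assumptions for illustration, not how the streetfiles-grab pipeline behaves.

```python
# Exit statuses as listed in the GNU Wget manual.
WGET_EXIT_STATUS = {
    0: "no problems occurred",
    1: "generic error code",
    2: "parse error",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol errors",
    8: "server issued an error response",
}

# Assumption: only network failures and server errors are worth retrying.
RETRYABLE = {4, 8}


def should_retry(exit_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Return True if a failed run should be tried again.

    attempt counts from 0 for the first run.
    """
    return exit_code in RETRYABLE and attempt + 1 < max_attempts


def backoff_seconds(attempt: int, base: float = 10.0, cap: float = 300.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... limited to cap."""
    return min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    code = 4
    print(WGET_EXIT_STATUS[code], should_retry(code, 0), backoff_seconds(0))
```

Retrying after a growing delay, instead of dropping the item, would avoid losing an hour of work on a large user when the site stalls briefly.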
17:49
π
|
alard |
Is the site slow for anyone else? It is for me. |
17:50
π
|
Baljem |
aaaaaand the one on 4740 ('rays') just did the same thing. that had been running for even longer than the last one, gawd knows how big it was |
17:50
π
|
Baljem |
the download rate graph is being very variable for me - sometimes it drops to < 1kB/s, sometimes it's around 2MB/s |
17:51
π
|
alard |
The static pictures are very fast, the HTML pages are slow. |
17:51
π
|
Baljem |
I've dropped my concurrent items to 3 to try and back off a bit |
17:51
π
|
alard |
If you browse it, does it feel slow? |
17:51
π
|
Baljem |
although I have five running at the moment :-/ |
17:51
π
|
Baljem |
one sec, I'll try |
17:52
π
|
Baljem |
yes, slower now than it was when the project started |
17:52
π
|
Baljem |
takes about ten seconds to load a page at the moment :-/ |
17:52
π
|
alard |
I've reduced the number of requests given out by the tracker. |
17:53
π
|
Baljem |
I can't test from the same connection as my Warrior is using, though, but that sort of response does seem like server load at their end rather than bandwidth |
17:53
π
|
alard |
Yes, the server is serving pictures very quickly. |
17:54
π
|
alard |
Is anyone else downloading from them? omf_? SketchCow? |
17:55
π
|
Baljem |
think I've just lost another job, although I don't remember which name has gone missing from the dashboard. |
17:55
π
|
omf_ |
yeah I got a streetfiles warrior running |
17:55
π
|
flaushy |
alard: me and a mate are running on it |
17:55
π
|
alard |
Yes, a warrior, but nothing else? |
17:55
π
|
flaushy |
2 fast machines |
17:55
π
|
flaushy |
with the script |
17:56
π
|
alard |
Which script? |
17:56
π
|
flaushy |
on github, in archiveteam? |
17:56
π
|
alard |
Ah, okay, that's going through the tracker then. |
17:56
π
|
omf_ |
this one? https://github.com/ArchiveTeam/streetfiles-grab |
17:56
π
|
flaushy |
right |
17:56
π
|
alard |
There was talk earlier in this channel about people downloading it, before it was a warrior project. |
17:57
π
|
alard |
E.g.: SketchCow: "I'll do it and another team member will do it." |
17:58
π
|
Baljem |
oh, damn, I've gotta run |
17:58
π
|
omf_ |
yeah I believe that got superseded by using the tracker since you mentioned you had it working |
17:58
π
|
Baljem |
my Warrior will trundle along as usual - want me to do anything to the settings before I disappear? (back it off further perhaps?) |
17:59
π
|
alard |
Baljem: No, keep it running. The tracker can handle the backing off/scaling up bit. |
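[Editor's sketch] alard's point is that throttling happens centrally: the tracker decides how often it hands out work, so individual warriors do not need to be tuned by hand. One common way to implement that kind of limit is a token bucket. The sketch below is an assumption about the general technique only; the log does not describe how the ArchiveTeam tracker actually rate-limits, and the class name `TokenBucket` is hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if a request may go out now."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=0.5, capacity=2)
    print([bucket.allow() for _ in range(3)])
```

Lowering `rate` on the server side slows every client at once, which matches the "reduced the number of requests given out by the tracker" step earlier in the log.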
18:00
π
|
Baljem |
cool. currently set to three concurrent jobs (down from six earlier). just noticed another one gave up after 790 URLs ('okse1' I think) and getting very worried about the two that remain, but oh well |
18:01
π
|
SilSte |
shall i stop a warrior? getting rate limiting post... |
18:01
π
|
Baljem |
I have a vague suspicion I may have been banned or something, only getting 0.2KB/s |
18:01
π
|
alard |
Baljem: I'm working on an update. |
18:01
π
|
Baljem |
great, I'll check back after dinner then :) |
18:03
π
|
SilSte |
so its better to change project on the warrior? |
18:03
π
|
alard |
SilSte: You can, if you want. We'll have to figure out what works for this site. |
18:04
π
|
SilSte |
k |
18:04
π
|
alard |
But there's also no harm in keeping it running. |
18:05
π
|
SilSte |
kk |
18:18
π
|
SilSte |
alard: project code is out of date... will it update automatically or shall i reboot the warrior? |
18:21
π
|
alard |
SilSte: It will update automatically, within an hour. To update immediately: stop the project (not the warrior), then start again. |
18:21
π
|
SilSte |
kk |
18:21
π
|
SilSte |
worked |
18:22
π
|
omf_ |
alard, can I pause pipeline? |
18:22
π
|
alard |
Pause wget? |
18:22
π
|
alard |
Ctrl+Z should work. |
18:22
π
|
omf_ |
no this command: run-pipeline --concurrent 5 pipeline.py |
18:23
π
|
alard |
You can Ctrl+Z that, if you want. Why? |
18:23
π
|
omf_ |
I am getting the project code out of date flowing by |
18:23
π
|
omf_ |
and it makes it hard to keep an eye on what is going on |
18:23
π
|
alard |
No, you can't pause that, unfortunately. |
18:24
π
|
omf_ |
stop and upgrade it or leave it running? |
18:25
π
|
alard |
Click the stop button, let it finish and start a new one (on a different port)? |
18:25
π
|
omf_ |
this is the cli script on a cloud instance |
18:27
π
|
alard |
Then do what you prefer: kill it, or wait until it finishes. |
18:37
π
|
flaushy |
mowk died on a recent version |
18:37
π
|
flaushy |
oh maybe not so recent, sorry |
18:50
π
|
SilSte |
i get a lot of 500 @formspring... |
19:43
π
|
antomatic |
Hey! For omf_ and anyone else who was interested, I've put a transcript of the Defcon 'soy sauce' archive team talk up at http://www.archiveteam.org/index.php?title=DEFCON_19_Talk_Transcript as well as a timed caption file that could be uploaded to YouTube if wanted. |
19:44
π
|
antomatic |
Er, that's all. |
19:44
π
|
antomatic |
[Not sure if the wiki was the right place; obviously do wipe it out if it's not appropriate.] |
20:00
π
|
alard |
Can someone download the http://streetfiles.org/blog/ ? |
20:13
π
|
godane |
looks like archiveteam is a user name on there now |
20:15
π
|
alard |
Yes, I signed up. |
20:17
π
|
SilSte |
streetfiles down? |
20:19
π
|
godane |
its going to die in 3 days |
20:19
π
|
alard |
SilSte: No, I don't think it is. |
20:20
π
|
SilSte |
okay... only very slow ^^ |
20:21
π
|
godane |
i may not be much help with this one anyways |
20:21
π
|
godane |
alard: i hope you can get it |
20:22
π
|
alard |
godane: Why can't you help? |
20:22
π
|
godane |
i don't have much hard drive space |
20:24
π
|
alard |
Ah. I thought you were always uploading. :) |
20:24
π
|
godane |
i am |
20:24
π
|
godane |
just i have too much stuff to upload right now |
20:24
π
|
godane |
i'm also starting to do a full mirror of newamerica.net |
20:24
π
|
alard |
http://developer.streetfiles.org/ |
20:25
π
|
antomatic |
Eek! streetfiles.org - 785,152 photos, 92,319 members. |
20:27
π
|
antomatic |
Looks like such a good site, too. |
20:29
π
|
chronomex |
yeah |
20:29
π
|
alard |
'bZ-Q(@K7ljlJRft'<GdqOi[5xHf3x><)crcA |
20:29
π
|
alard |
Sorry, Keepass. :) |
20:31
π
|
chronomex |
mmhmmmm |
20:32
π
|
alard |
It's a pity that streetfiles.org is so slow. |
20:39
π
|
Baljem |
mm. we've done about, what, 10% in 4 hours? going to be tight I fear |
20:40
π
|
alard |
We haven't done 10%. |
20:41
π
|
alard |
There are 92,319 users, they're just not all in the tracker. |
20:41
π
|
Baljem |
ah, bugger |
20:42
π
|
Baljem |
I was going to qualify it with '10% of what the tracker knows about' but thought that might be overly-pedantic ;) |
20:43
π
|
Baljem |
looking at the graph on the tracker page mine seems to be struggling recently, for some reason. perhaps it keeps finding things that have a lot of pages but not much data |
20:43
π
|
alard |
It's an important difference in this case. |
20:44
π
|
Baljem |
yes. I didn't realise there was quite that number of users not in the tracker :( |
20:44
π
|
balrog |
would be nice to record http://69.13.218.21 |
20:46
π
|
SilSte |
will they be added to the tracker? |
20:46
π
|
alard |
We have to find them first. |
20:48
π
|
SilSte |
kk |
20:53
π
|
* |
SmileyG looks in |
20:55
π
|
godane |
i just got glenn beck freepac (FreedomWorks) live speech |
20:56
π
|
godane |
the torrent was almost dead |
21:05
π
|
S[h]O[r]T |
balrog what is that |
21:05
π
|
balrog |
a stream from the CoCoFest 2013 |
21:05
π
|
balrog |
http://www.glensideccc.com/cocofest/ |
21:22
π
|
antomatic |
is the tracker rate limiting being especially cautious with streetfiles at the moment? |
21:23
π
|
alard |
I'd like to not kill it. |
21:23
π
|
antomatic |
(nods) |
21:23
π
|
alard |
The current items are groups, perhaps those are easier for them. |
21:28
π
|
alard |
Question: metadata is more important than photos? |
21:29
π
|
alard |
We don't have to download all those /photos/detail/ pages to download the large photos. |
21:29
π
|
alard |
But I think that's not a good idea. |
21:30
π
|
antomatic |
Does the metadata tell you anything about the photo that could help prioritise how 'important' it is? - e..g number of views, popularity, size, etc? |
21:31
π
|
alard |
I think that downloading the photos, once you have the metadata, is not a problem. |
21:31
π
|
alard |
The bottleneck is in those web pages, I think. |
21:31
π
|
chronomex |
hm |
21:33
π
|
SilSte |
why not download more photos to stress the server less? |
21:34
π
|
SilSte |
most of those groups sites are very small |
21:34
π
|
alard |
Because that would give you a large bunch of anonymous photos. |
21:34
π
|
alard |
Knowing where, when, what is probably interesting. |
21:34
π
|
SilSte |
49 to do? ^^ :D |
21:35
π
|
alard |
Groups. :) |
21:35
π
|
SilSte |
^^ |
21:35
π
|
antomatic |
agreed, and without the metadata you also don't know the author |
21:35
π
|
antomatic |
no, that's rubbish, ignore me |
21:35
π
|
antomatic |
you do have a user ID |
21:35
π
|
alard |
Yes, you could derive that from the "photos by user X" page. |
21:35
π
|
Baljem |
heh. think it's going to run out of items before mine asks for another after 30 seconds |
21:36
π
|
SilSte |
gru_soldier seems to be very fat... downloading for hours... |
21:36
π
|
SilSte |
^^ |
21:36
π
|
Baljem |
looks like flaushy got about 1.6GB in a chunk a short while ago! |
21:37
π
|
SilSte |
i get a lot of 500 @formspring... |
21:37
π
|
flaushy |
Baljem: that was a long download ^^ |
21:38
π
|
alard |
slf-city |
21:38
π
|
ivan` |
I'm getting exit code 4 on my posterous downloaders, is this expected? |
21:38
π
|
SilSte |
you should add time @the warrior ;-) |
21:38
π
|
SilSte |
or a timer :D |
21:38
π
|
SilSte |
posterous blocked me completely |
21:38
π
|
flaushy |
another big one is on this machine as well |
21:38
π
|
omf_ |
okay finally we got a wiki page - http://www.archiveteam.org/index.php?title=Streetfiles and an irc channel #streetsoffire |
21:44
π
|
SmileyG |
well with 5000 to do it shouldn't take us too long hopefully |
21:45
π
|
alard |
Not all, not all. |
21:46
π
|
alard |
(Someone should run a bot that repeats this sad message every time someone says we're almost done. :) |
21:47
π
|
SilSte |
then gimme more work :P |
21:48
π
|
SilSte |
is posterous working for anyone? |
21:54
π
|
Baljem |
hey, don't bogart all the work ;) plenty to go round, but the site's creaky enough under this much load, by the looks of it |
22:01
π
|
omf_ |
Are all the active projects deadline projects on this list? http://www.archiveteam.org/index.php?title=Current_Projects |
22:04
π
|
balrog |
Upcoming is done I think |
22:07
π
|
omf_ |
Yeah I'll remove that |
22:15
π
|
SmileyG |
SilSte: posterous regularly bans now (like every 10 minutes it seems). |
22:16
π
|
antomatic |
Are posterous smart enough not to ban, say, ISP web proxy servers, etc? |
22:16
π
|
SmileyG |
unlikely |
22:17
π
|
SilSte |
my rootserver is blocked ... so ... no :D |
22:17
π
|
SilSte |
stopped hours ago... still blocked. |
22:18
π
|
antomatic |
Any idea how they know what to ban? Is it user-agent, or amount of downloads.. |
22:18
π
|
balrog |
amount of requests |
22:18
π
|
antomatic |
Could they be looking at the tracker and banning the most recent IP to access that username, etc? |
22:18
π
|
antomatic |
amount, right. |
22:20
π
|
antomatic |
hmm. disappointing that they seem so opposed to the legitimate preservation of their users' content. |
22:22
π
|
SmileyG |
indeed. |
22:22
π
|
SmileyG |
:( |
22:22
π
|
antomatic |
Other avenues? Google cache? (Not suitable?_ |
22:22
π
|
SmileyG |
antomatic: the tracker is ours.... |
22:22
π
|
antomatic |
) |
22:22
π
|
SmileyG |
google bans ;) |
22:22
π
|
antomatic |
buh! |
22:23
π
|
antomatic |
Some days an archivist can't get a clean break. |
22:23
π
|
omf_ |
Good thing the bad press won't stop for them |
22:23
π
|
antomatic |
Ah, I meant the dashboard rather than the tracker. That's how I'd interfere, if I were an evil site owner. |
22:27
π
|
antomatic |
Loved the comment in the Defcon speech, "But Google is a library or an archive in the same way that a supermarket is a food museum." |
22:30
π
|
balrog |
ok so I'm mirroring a site that contains a lot of realmedia .ram files |
22:30
π
|
balrog |
after doing wget-warc, I need to cat all the ram files together (each contains a url to a .rm file that actually contains the media), and then what do I do? |
22:31
π
|
balrog |
feed the list into wget and generate a second warc file? |
22:33
π
|
antomatic |
are the .rm files coming off a normal HTTP server or are they using RTSP-type streaming? |
22:33
π
|
antomatic |
might be more complicated if so. If HTTP then no problem, just as you say, I reckon. |
22:36
π
|
balrog |
antomatic: http |
22:36
π
|
balrog |
[there are separate tools for RTSP] |
22:37
π
|
antomatic |
I remember how much trouble RTSP used to cause me, back in the day. :) |
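[Editor's sketch] balrog's plan above is to gather the URLs held in the .ram files and feed them to wget for a second WARC. A minimal sketch of the gathering step follows, assuming each .ram file is plain text with one URL per line and that only the HTTP entries are wanted (balrog notes RTSP needs separate tools). The function name `rm_urls_from_ram` is hypothetical.

```python
from typing import Iterable, List


def rm_urls_from_ram(ram_texts: Iterable[str]) -> List[str]:
    """Collect unique http(s) URLs from the text of several .ram files.

    Order of first appearance is preserved so the list is reproducible.
    """
    seen = set()
    urls: List[str] = []
    for text in ram_texts:
        for line in text.splitlines():
            candidate = line.strip()
            # Skip blank lines and non-HTTP entries such as rtsp:// or pnm://.
            if not candidate.lower().startswith(("http://", "https://")):
                continue
            if candidate not in seen:
                seen.add(candidate)
                urls.append(candidate)
    return urls


if __name__ == "__main__":
    # Made-up sample contents; example.com stands in for the real site.
    samples = [
        "http://example.com/media/a.rm\n",
        "rtsp://example.com/media/b.rm\nhttp://example.com/media/a.rm\n",
    ]
    print("\n".join(rm_urls_from_ram(samples)))
```

The resulting list, saved one URL per line, can be passed to wget with `--input-file` together with `--warc-file` to produce the second WARC.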
22:42
π
|
noahc |
Posterous isn't banning me for some reason? |
22:43
π
|
SmileyG |
yet ;) |
22:43
π
|
noahc |
It's been running all day though. |
22:43
π
|
Baljem |
blimey. you're our last hope then ;) |
22:44
π
|
noahc |
It appears so. |
22:44
π
|
noahc |
Which is a scary thought! |
22:45
π
|
antomatic |
Excellent luck, noah! |
22:46
π
|
noahc |
I'm downloading between 100 - 200kb. |
22:47
π
|
noahc |
I wonder why I'm not banned though… |
22:59
π
|
DFJustin |
balrog: if you have to generate a second warc you can always concatenate them later |
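[Editor's sketch] DFJustin's suggestion works because a WARC file is a plain sequence of records, and a .warc.gz is normally written as a series of gzip members, so joining files byte for byte yields another valid file. A minimal sketch follows, assuming both inputs use the same compression. The function name `concatenate_files` is hypothetical, and the demo uses tiny gzip files as stand-ins for real WARCs.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def concatenate_files(parts: Iterable[Path], destination: Path) -> None:
    """Append the raw bytes of each part, in order, to destination."""
    with destination.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Stand-in data: two tiny gzip files play the role of two .warc.gz files.
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first.gz"
        second = Path(tmp) / "second.gz"
        merged = Path(tmp) / "merged.gz"
        first.write_bytes(gzip.compress(b"record one\n"))
        second.write_bytes(gzip.compress(b"record two\n"))
        concatenate_files([first, second], merged)
        # A multi-member gzip stream decompresses to the joined payloads.
        print(gzip.decompress(merged.read_bytes()))
```

This is the same operation as `cat first.warc.gz second.warc.gz > merged.warc.gz` on the command line.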
23:52
π
|
balrog |
https://twitter.com/waxpancake/status/328158604765036546 |
23:52
π
|
balrog |
FYI |
23:52
π
|
balrog |
LJ is deleting old blogs with fewer than 3 posts |