#archiveteam 2013-04-28,Sun

Time Nickname Message
00:02 godane hey Famicoman
08:44 godane http://www.obviouswinner.com/obvwin/2013/4/26/batman-dude-builds-himself-150000-secret-basement-batcave.html
08:44 godane Batman Dude Builds Himself $150,000 Secret Basement Batcave
08:45 godane sorry didn't know this was not #-bs
14:25 SketchCow Oh goddamnit
14:25 SketchCow People are telling me/us about sites WAY too late.
14:26 SketchCow I realize it's redundant, but I'm going to try grabbing a site, who else wants it?
14:26 SketchCow streetfiles.org
14:28 andy0 I'm currently hopping IPs to be nearly the only posterous downloader currently
14:29 andy0 what can I do for streetfiles? I can spin up debian VM's
14:29 andy0 on VPS with limited HD space to retransmit or larger to save the files
14:32 SketchCow I don't know how big it is.
14:32 SketchCow I have 2tb set aside
14:33 andy000 (back after another IP flip)
14:41 SilSte Hi
14:41 SilSte how may I download several webpages simultaneously?
14:41 SilSte at the moment I'm running 3 VMs :/
14:42 SilSte or is there another way to squeeze more out of the box? ;-)
14:47 RedType fork()
14:48 RedType you could try running multiple instances of your downloader
14:48 SilSte atm I'm running 3 separate VMs .... i can only control the first one because of the ports :/
14:49 SilSte is it possible to simply change a configfile?
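RedType's suggestion of running several downloads side by side inside one machine, instead of one VM per download, could be sketched roughly as follows. This is only an illustration and not how the warrior works internally: `fetch`, `fetch_all`, the `fetcher` parameter and the worker count are names and values assumed for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, workers=4):
    """Fetch several URLs at once (hypothetical helper).

    Returns a dict mapping each URL to its bytes, or to the exception
    that was raised while downloading it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit everything first so the downloads overlap in time.
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:  # one failed download should not stop the rest
                results[url] = error
    return results
```

Threads are a reasonable fit here because the work is mostly waiting on the network; the `workers` value is where a polite limit on concurrent requests would be set.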
15:09 flaushy SketchCow: is a script available for streetfiles? don't have much spare gbs here, but could give bandwidth
15:11 SketchCow No, no worries.
15:11 SketchCow I'll do it and another team member will do it.
15:12 flaushy awesome :)
15:12 alard SketchCow: Want to make Streetfiles a warrior thing?
15:13 omf_ All the photo pages are just increasing numbers, just need to find the start point
15:15 omf_ well that was simple, it starts at 1
15:15 omf_ http://streetfiles.org/photos/detail/1
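Since omf_ notes that the photo pages are just increasing numbers starting at 1, a URL list for a downloader could be generated along these lines. A minimal sketch: the upper bound is not given in the log, so `last_id` is an assumption the caller has to supply, and the function names are made up for this example.

```python
def photo_detail_urls(first_id, last_id,
                      base="http://streetfiles.org/photos/detail/"):
    """Yield photo detail URLs for every ID from first_id to last_id inclusive."""
    for photo_id in range(first_id, last_id + 1):
        yield f"{base}{photo_id}"


def write_url_list(urls, path):
    """Write one URL per line, a format that wget can read with --input-file."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
```

For example, `write_url_list(photo_detail_urls(1, 1000), "photos.txt")` would produce a list covering the first thousand IDs; some IDs may no longer exist, so a downloader should expect error responses.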
15:17 SketchCow Yes
15:17 SketchCow alard: Yes, sure, why not.
15:17 SketchCow alard: I'm more concerned about nwnet and the related site - can that go warrior? All of them die on the 30th
15:18 omf_ Yes it can, someone was brute forcing it to find more urls
15:18 alard Streetfiles: I've almost finished a lua script that makes a per-user package. Testing it now
15:19 alard nwnet: Yes, what we need is a list of urls to feed to wget, and a wget command.
15:20 SilSte I have 2 servers which have nothing to do... if you provide me a VM I can spend as much bandwidth as you want...
15:20 SilSte and the servers are capable of ;-)
15:20 SketchCow Someone was supposed to take a dictionary attack and someone was supposed to do a google dictionary attack
15:21 SmileyG for x in {1..10}; do wget .... $x; done
15:21 omf_ dashcloud, was working on it
15:22 flaushy btw is archiveteam present at OHM2013?
15:28 SketchCow No.
15:32 omf_ Uploaded the first 100gb of my linux iso archive. Here is one release version http://archive.org/details/opensuse-10.2_release
15:35 SketchCow Looks fun, though.
15:36 SketchCow OK, omf_
15:36 SketchCow First, set it as software, not texts.
15:36 omf_ I knew I was forgetting something
15:36 SketchCow Next, let me make a collection.
15:37 SketchCow These will all be ISOs?
15:37 omf_ some of the older ones are floppies
15:37 omf_ and tars of code
15:38 SketchCow OK.
15:38 SketchCow How big is this again, just for trivia's sake?
15:38 omf_ 3tb at last estimate
15:38 omf_ I got boxes and boxes of drives
15:39 SketchCow OK, easy enough.
15:39 SketchCow Keep uploading.
15:39 SketchCow I have gotten back all the data from the dead FOS machine.
15:42 SketchCow I'm in the process of getting all the scripts and stuff I had there working, and when I do, stuff will come back, including collection making, and I can put your stuff into the collection.
15:42 SmileyG whoop.
15:42 SketchCow Yeah, we didn't have much data loss if any.
15:42 SmileyG SketchCow: do you need notifying about all the warc's we are uploading for the ign/gamespy grab? They are all tagged archiveteam anyway.
15:42 SketchCow No.
15:43 SketchCow No, but I will be doing massive bombing runs of cleaning up what we have.
15:44 omf_ I found I have some freebsd and openbsd isos as well. I guess the collection title should be "Open Source Operating Systems"? or something better? I am terrible at naming things
15:44 SketchCow Yes
15:44 SketchCow I'll be approaching that.
15:44 SketchCow IT'll e nice, it's an amazing collection.
15:45 SketchCow Obviously, we have the ones that Walnut Creek did, but I consider this one to be a separate set, and one which might have some redundancies but I'd rather that they e redundant.
15:45 SketchCow (Sorry, b key sticks on this laptop)
16:24 alard Now on a warrior near you: https://github.com/ArchiveTeam/streetfiles-grab
16:25 * Baljem hops - beats watching sweet FA happen on posterous :/
16:27 alard Still looking for lists of usernames, though.
16:31 frame_at I'm really impressed how much the warrior has evolved. New project just showed up, seamless switch. Great software.
16:36 Baljem heh. mine appears to be dead. time to open the other laptop and poke the VM host...
16:38 Baljem oh, no, that was my fault. I think I clicked the 'stop' button by mistake (!), as it's powered itself off. doh
16:38 SketchCow 11G simtelnet.bu.mirror.2013.04.zip
16:38 SketchCow root@teamarchive-0:/1/SIMTELNET/ftp.bu.edu/mirrors# du -sh sim*.zip
16:46 omf_ SketchCow, I cannot change the mediatype of the items I already uploaded but all the new ones are software
16:52 SilSte Featurerequest for the warrior: What about an option to participate at multiple projects at the same time?
17:03 omf_ 2 days left, no idea how big it is, let's save some ART - http://tracker.archiveteam.org/streetfiles/
17:11 Baljem hmm - something strange just happened on one of my streetfiles tasks - 'westberlinoldschool' I think was the name
17:12 Baljem it had downloaded something like 3000 URLs, then wget quit with exit code 4, I saw it 'waiting for 10 seconds' and now it's gone completely :-/
17:43 Baljem damnit, it's just done it again on a job that had downloaded > 4500 URLs
17:43 Baljem that's a bit of a pain, it had been working on it for the best part of an hour :-/
17:45 Baljem I think the username was 'mahatma-ganja' or some approximation thereof if that's of any interest to anyone.
17:47 alard Baljem: I'm having a look at westberlinoldshool now.
17:49 Baljem cool; fingers crossed I don't have another one vanish (currently have one on 4740 URLs downloaded, another on 3100, 2780 and 1970 - lot of work for someone else to redo)
17:49 alard Wget's timeout setting is probably too low (10 seconds)
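Exit code 4 is wget's documented status for a network failure, which fits alard's guess about the timeout. One way to make a job more tolerant of a slow server is a longer timeout plus a retry wrapper, roughly as sketched below. This is not the streetfiles-grab pipeline's actual code: the function names, the 60-second timeout, and the retry counts are assumptions for illustration. `--timeout` and `--tries` are standard wget options.

```python
import subprocess
import time

WGET_NETWORK_FAILURE = 4  # wget's exit status for network failures


def build_wget_command(url, timeout_seconds=60, tries=5):
    """Build a wget argument list with a more generous timeout (values assumed)."""
    return ["wget", f"--timeout={timeout_seconds}", f"--tries={tries}", url]


def run_with_retries(command, attempts=3, delay_seconds=10,
                     run=subprocess.call, sleep=time.sleep):
    """Re-run command while it exits with a network failure.

    Waits a little longer before each new attempt and returns the last
    exit code. `run` and `sleep` are injectable so the logic can be
    exercised without touching the network.
    """
    exit_code = None
    for attempt in range(1, attempts + 1):
        exit_code = run(command)
        if exit_code != WGET_NETWORK_FAILURE:
            break
        if attempt < attempts:
            sleep(delay_seconds * attempt)
    return exit_code
```

Retrying only on the network-failure status keeps other errors, such as server error responses, visible instead of masking them.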
17:49 alard Is the site slow for anyone else? It is for me.
17:50 Baljem aaaaaand the one on 4740 ('rays') just did the same thing. that had been running for even longer than the last one, gawd knows how big it was
17:50 Baljem the download rate graph is being very variable for me - sometimes it drops to < 1kB/s, sometimes it's around 2MB/s
17:51 alard The static pictures are very fast, the HTML pages are slow.
17:51 Baljem I've dropped my concurrent items to 3 to try and back off a bit
17:51 alard If you browse it, does it feel slow?
17:51 Baljem although I have five running at the moment :-/
17:51 Baljem one sec, I'll try
17:52 Baljem yes, slower now than it was when the project started
17:52 Baljem takes about ten seconds to load a page at the moment :-/
17:52 alard I've reduced the number of requests given out by the tracker.
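The log does not show how the tracker limits the requests it gives out. As a rough illustration of the general idea only, a sliding-window limiter that hands out at most a fixed number of items per time window might look like this; the class name, parameters, and numbers are all hypothetical and are not taken from the tracker's code.

```python
import time
from collections import deque


class RequestLimiter:
    """Allow at most max_requests hand-outs per window_seconds (sliding window)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.granted = deque()  # timestamps of recent hand-outs

    def try_acquire(self):
        """Return True and record the hand-out if the limit allows it."""
        now = self.clock()
        # Forget hand-outs that have fallen out of the window.
        while self.granted and now - self.granted[0] >= self.window_seconds:
            self.granted.popleft()
        if len(self.granted) < self.max_requests:
            self.granted.append(now)
            return True
        return False
```

Lowering `max_requests` is the kind of knob that reduces load on a struggling site without asking every client to change its own settings.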
17:53 Baljem I can't test from the same connection as my Warrior is using, though, but that sort of response does seem like server load at their end rather than bandwidth
17:53 alard Yes, the server is serving pictures very quickly.
17:54 alard Is anyone else downloading from them? omf_? SketchCow?
17:55 Baljem think I've just lost another job, although I don't remember which name has gone missing from the dashboard.
17:55 omf_ yeah I got a streetfiles warrior running
17:55 flaushy alard: me and a mate are running on it
17:55 alard Yes, a warrior, but nothing else?
17:55 flaushy 2 fast machines
17:55 flaushy with the script
17:56 alard Which script?
17:56 flaushy on github, in archiveteam?
17:56 alard Ah, okay, that's going through the tracker then.
17:56 omf_ this one? https://github.com/ArchiveTeam/streetfiles-grab
17:56 flaushy right
17:56 alard There was talk earlier in this channel about people downloading it, before it was a warrior project.
17:57 alard E.g.: SketchCow: "I'll do it and another team member will do it."
17:58 Baljem oh, damn, I've gotta run
17:58 omf_ yeah I believe that got superseded by using the tracker since you mentioned you had it working
17:58 Baljem my Warrior will trundle along as usual - want me to do anything to the settings before I disappear? (back it off further perhaps?)
17:59 alard Baljem: No, keep it running. The tracker can handle the backing off/scaling up bit.
18:00 Baljem cool. currently set to three concurrent jobs (down from six earlier). just noticed another one gave up after 790 URLs ('okse1' I think) and getting very worried about the two that remain, but oh well
18:01 SilSte shall i stop a warrior? getting rate limiting post...
18:01 Baljem I have a vague suspicion I may have been banned or something, only getting 0.2KB/s
18:01 alard Baljem: I'm working on an update.
18:01 Baljem great, I'll check back after dinner then :)
18:03 SilSte so it's better to change project on the warrior?
18:03 alard SilSte: You can, if you want. We'll have to figure out what works for this site.
18:04 SilSte k
18:04 alard But there's also no harm in keeping it running.
18:05 SilSte kk
18:18 SilSte alard: project code is out of date... will it update automatically or shall i reboot the warrior?
18:21 alard SilSte: It will update automatically, within an hour. To update immediately: stop the project (not the warrior), then start again.
18:21 SilSte kk
18:21 SilSte worked
18:22 omf_ alard, can I pause pipeline?
18:22 alard Pause wget?
18:22 alard Ctrl+Z should work.
18:22 omf_ no this command: run-pipeline --concurrent 5 pipeline.py
18:23 alard You can Ctrl+Z that, if you want. Why?
18:23 omf_ I am getting the project code out of date flowing by
18:23 omf_ and it makes it hard to keep an eye on what is going on
18:23 alard No, you can't pause that, unfortunately.
18:24 omf_ stop and upgrade it or leave it running?
18:25 alard Click the stop button, let it finish and start a new one (on a different port)?
18:25 omf_ this is the cli script on a cloud instance
18:27 alard Then do what you prefer: kill it, or wait until it finishes.
18:37 flaushy mowk died on a recent version
18:37 flaushy oh maybe not so recent, sorry
18:50 SilSte i get a lot of 500 @formspring...
19:43 antomatic Hey! For omf_ and anyone else who was interested, I've put a transcript of the Defcon 'soy sauce' archive team talk up at http://www.archiveteam.org/index.php?title=DEFCON_19_Talk_Transcript as well as a timed caption file that could be uploaded to YouTube if wanted.
19:44 antomatic Er, that's all.
19:44 antomatic [Not sure if the wiki was the right place; obviously do wipe it out if it's not appropriate.]
20:00 alard Can someone download the http://streetfiles.org/blog/ ?
20:13 godane looks like archiveteam is a user name on there now
20:15 alard Yes, I signed up.
20:17 SilSte streetfiles down?
20:19 godane it's going to die in 3 days
20:19 alard SilSte: No, I don't think it is.
20:20 SilSte okay... only very slow ^^
20:21 godane i may not be much help with this one anyways
20:21 godane alard: i hope you can get it
20:22 alard godane: Why can't you help?
20:22 godane i don't have much hard drive space
20:24 alard Ah. I thought you were always uploading. :)
20:24 godane i am
20:24 godane just i have too much stuff to upload right now
20:24 godane i'm also starting to do a full mirror of newamerica.net
20:24 alard http://developer.streetfiles.org/
20:25 antomatic Eek! streetfiles.org - 785,152 photos, 92,319 members.
20:27 antomatic Looks like such a good site, too.
20:29 chronomex yeah
20:29 alard 'bZ-Q(@K7ljlJRft'<GdqOi[5xHf3x><)crcA
20:29 alard Sorry, Keepass. :)
20:31 chronomex mmhmmmm
20:32 alard It's a pity that streetfiles.org is so slow.
20:39 Baljem mm. we've done about, what, 10% in 4 hours? going to be tight I fear
20:40 alard We haven't done 10%.
20:41 alard There are 92,319 users, they're just not all in the tracker.
20:41 Baljem ah, bugger
20:42 Baljem I was going to qualify it with '10% of what the tracker knows about' but thought that might be overly-pedantic ;)
20:43 Baljem looking at the graph on the tracker page mine seems to be struggling recently, for some reason. perhaps it keeps finding things that have a lot of pages but not much data
20:43 alard It's an important difference in this case.
20:44 Baljem yes. I didn't realise there was quite that number of users not in the tracker :(
20:44 balrog would be nice to record http://69.13.218.21
20:46 SilSte will they be added to the tracker?
20:46 alard We have to find them first.
20:48 SilSte kk
20:53 * SmileyG looks in
20:55 godane i just got glenn beck freepac (FreedomWorks) live speech
20:56 godane the torrent was almost dead
21:05 S[h]O[r]T balrog what is that
21:05 balrog a stream from the CoCoFest 2013
21:05 balrog http://www.glensideccc.com/cocofest/
21:22 antomatic is the tracker rate limiting being especially cautious with streetfiles at the moment?
21:23 alard I'd like to not kill it.
21:23 antomatic (nods)
21:23 alard The current items are groups, perhaps those are easier for them.
21:28 alard Question: metadata is more important than photos?
21:29 alard We don't have to download all those /photos/detail/ pages to download the large photos.
21:29 alard But I think that's not a good idea.
21:30 antomatic Does the metadata tell you anything about the photo that could help prioritise how 'important' it is? - e.g. number of views, popularity, size, etc?
21:31 alard I think that downloading the photos, once you have the metadata, is not a problem.
21:31 alard The bottleneck is in those web pages, I think.
21:31 chronomex hm
21:33 SilSte why not download more photos to stress the server less?
21:34 SilSte most of those group sites are very small
21:34 alard Because that would give you a large bunch of anonymous photos.
21:34 alard Knowing where, when, what is probably interesting.
21:34 SilSte 49 to do? ^^ :D
21:35 alard Groups. :)
21:35 SilSte ^^
21:35 antomatic agreed, and without the metadata you also don't know the author
21:35 antomatic no, that's rubbish, ignore me
21:35 antomatic you do have a user ID
21:35 alard Yes, you could derive that from the "photos by user X" page.
21:35 Baljem heh. think it's going to run out of items before mine asks for another after 30 seconds
21:36 SilSte gru_soldier seems to be very fat... downloading for hours...
21:36 SilSte ^^
21:36 Baljem looks like flaushy got about 1.6GB in a chunk a short while ago!
21:37 SilSte i get a lot of 500 @formspring...
21:37 flaushy Baljem: that was a long download ^^
21:38 alard slf-city
21:38 ivan` I'm getting exit code 4 on my posterous downloaders, is this expected?
21:38 SilSte you should add time @the warrior ;-)
21:38 SilSte or a timer :D
21:38 SilSte posterous blocked me completely
21:38 flaushy another big one is on this machine as well
21:38 omf_ okay finally we got a wiki page - http://www.archiveteam.org/index.php?title=Streetfiles and an irc channel #streetsoffire
21:44 SmileyG well with 5000 to do it shouldn't take us too long hopefully
21:45 alard Not all, not all.
21:46 alard (Someone should run a bot that repeats this sad message every time someone says we're almost done. :)
21:47 SilSte then gimme more work :P
21:48 SilSte is posterous working for anyone?
21:54 Baljem hey, don't bogart all the work ;) plenty to go round, but the site's creaky enough under this much load, by the looks of it
22:01 omf_ Are all the active projects deadline projects on this list? http://www.archiveteam.org/index.php?title=Current_Projects
22:04 balrog Upcoming is done I think
22:07 omf_ Yeah I'll remove that
22:15 SmileyG SilSte: posterous regularly bans now (like every 10 minutes it seems).
22:16 antomatic Are posterous smart enough not to ban, say, ISP web proxy servers, etc?
22:16 SmileyG unlikely
22:17 SilSte my rootserver is blocked ... so ... no :D
22:17 SilSte stopped hours ago... still blocked.
22:18 antomatic Any idea how they know what to ban? Is it user-agent, or amount of downloads..
22:18 balrog amount of requests
22:18 antomatic Could they be looking at the tracker and banning the most recent IP to access that username, etc?
22:18 antomatic amount, right.
22:20 antomatic hmm. disappointing that they seem so opposed to the legitimate preservation of their users' content.
22:22 SmileyG indeed.
22:22 SmileyG :(
22:22 antomatic Other avenues? Google cache? (Not suitable?_
22:22 SmileyG antomatic: the tracker is ours....
22:22 antomatic )
22:22 SmileyG google bans ;)
22:22 antomatic buh!
22:23 antomatic Some days an archivist can't get a clean break.
22:23 omf_ Good thing the bad press won't stop for them
22:23 antomatic Ah, I meant the dashboard rather than the tracker. That's how I'd interfere, if I were an evil site owner.
22:27 antomatic Loved the comment in the Defcon speech, "But Google is a library or an archive in the same way that a supermarket is a food museum."
22:30 balrog ok so I'm mirroring a site that contains a lot of realmedia .ram files
22:30 balrog after doing wget-warc, I need to cat all the ram files together (each contains a url to a .rm file that actually contains the media), and then what do I do?
22:31 balrog feed the list into wget and generate a second warc file?
22:33 antomatic are the .rm files coming off a normal HTTP server or are they using RTSP-type streaming?
22:33 antomatic might be more complicated if so. If HTTP then no problem, just as you say, I reckon.
22:36 balrog antomatic: http
22:36 balrog [there are separate tools for RTSP]
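The step balrog describes, gathering the media URLs out of the downloaded .ram files so they can be fed to a second wget run, could be sketched like this. It assumes each .ram file is plain text with one URL per line, as balrog describes; the function names and file names are made up for the example, and RTSP links are deliberately skipped since they need separate tools.

```python
from pathlib import Path


def extract_media_urls(ram_paths):
    """Collect the unique http(s) URLs listed in .ram files, preserving order."""
    seen = set()
    urls = []
    for path in ram_paths:
        # latin-1 never fails to decode, which suits old files of unknown encoding.
        text = Path(path).read_text(encoding="latin-1")
        for line in text.splitlines():
            url = line.strip()
            if url.lower().startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def write_url_list(urls, list_path):
    """Write one URL per line so the list can be passed to wget."""
    Path(list_path).write_text("".join(f"{url}\n" for url in urls),
                               encoding="utf-8")

# A second WARC could then be produced with something like:
#   wget --input-file=rm-urls.txt --warc-file=site-media
# (rm-urls.txt and site-media are placeholder names.)
```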
22:37 antomatic I remember how much trouble RTSP used to cause me, back in the day. :)
22:42 noahc Posterous isn't banning me for some reason?
22:43 SmileyG yet ;)
22:43 noahc It's been running all day though.
22:43 Baljem blimey. you're our last hope then ;)
22:44 noahc It appears so.
22:44 noahc Which is a scary thought!
22:45 antomatic Excellent luck, noah!
22:46 noahc I'm downloading between 100 - 200kb.
22:47 noahc I wonder why I'm not banned though….
22:59 DFJustin balrog: if you have to generate a second warc you can always concatenate them later
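DFJustin's point rests on a WARC file being a plain sequence of records, so two files can be joined by appending their bytes. For compressed files this works when every record is its own gzip member, which is assumed here for the .warc.gz files in question. A minimal sketch, with a made-up function name:

```python
import shutil


def concatenate_files(source_paths, destination_path):
    """Append each source file's bytes to destination_path, in order."""
    with open(destination_path, "wb") as destination:
        for source_path in source_paths:
            with open(source_path, "rb") as source:
                shutil.copyfileobj(source, destination)
```

Gzip readers treat back-to-back members as one stream, so the joined file decompresses to the records of the first file followed by those of the second.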
23:52 balrog https://twitter.com/waxpancake/status/328158604765036546
23:52 balrog FYI
23:52 balrog LJ is deleting old blogs with fewer than 3 posts