#archiveteam 2013-04-24,Wed


Time Nickname Message
03:12 πŸ”— SketchCow RELEASE THE KRAKEN
03:20 πŸ”— dragondon I've seen this twice now, thought I'd let someone know http://imgur.com/kzd5KWh
03:25 πŸ”— dragondon and to be safe, it's a screen shot of the warrior showing code and not the grey screen I'm used to seeing
03:54 πŸ”— offby1 Has anyone made a generalized retargetable warrior EC2 AMI?
03:55 πŸ”— offby1 I'd love to have an image I can spin up on demand that'd pull the project du jour down and churn
03:55 πŸ”— offby1 donate some AWS credits.
03:55 πŸ”— InitHello that actually sounds like something I would be interested in doing
03:56 πŸ”— InitHello I just recently built an AMI and code to spin up/down spot instances to compile and execute third-party addons to the codebase I maintain
03:58 πŸ”— InitHello no way in hell would I let that shit execute on the campus network
04:05 πŸ”— omf_ SketchCow, I got a 24gb backup of ftp.ea.com from last year. Is 7z a supported format or should I just change it to a tar.gz or zip
04:18 πŸ”— SketchCow It's easiest if it's a .zip for the purposes of making it browsable
04:19 πŸ”— SketchCow https://twitter.com/drnormal/status/326904028590116864
04:21 πŸ”— omf_ can do
04:23 πŸ”— TeeCee Mornin' guys.
04:25 πŸ”— SketchCow Morninggggg
04:32 πŸ”— SketchCow http://archive.org/details/bitsavers_disktrend_removed_files
04:32 πŸ”— SketchCow I'm going to leave it there
04:32 πŸ”— SketchCow Like that
04:53 πŸ”— dragondon anyone got any ideas about the messed up Archive Warrior display? Don't even know if it's working or not.
04:55 πŸ”— TeeCee dragondon: I've seen errors like that before...Usually when the host has high load...
04:55 πŸ”— TeeCee What does the webinterface say?
05:02 πŸ”— dragondon TeeCee, seems to show things just fine. And given that I got an AMD 6core, 8GB of ram, and the process monitor shows all lines nearly dead (save for the networking part, which is due to the warrior itself), that answer seems off.
05:04 πŸ”— omf_ which project are you running dragondon
05:05 πŸ”— dragondon omf_, formspring. I have the 'choice' setting turned on.
05:15 πŸ”— chronomex dragondon: looks like the VM is running out of memory
05:16 πŸ”— TeeCee +1
05:16 πŸ”— chronomex I periodically get that on my desktop running the seesaw scripts outside of the vm, so we should probably investigate ways to trim and split crawls in the middle
05:18 πŸ”— chronomex http://archive.org/details/2008.ftp.bundle.collection oh fuck yes ftp.uu.net I missed you
05:31 πŸ”— SketchCow Yeah, that thing is a true goldmine.
05:37 πŸ”— dragondon chronomex, should I stop it, give it more memory for now?
05:38 πŸ”— SketchCow All I'm trying to do is jam as many goldmines up as possible, but we definitely need a round of metadata on these poor things
05:40 πŸ”— dragondon SketchCow, not really knowing what I am talking about, is the metadata an automated or manual process?
05:41 πŸ”— SketchCow It'll be a combination.
05:42 πŸ”— SketchCow here's generally good metadata: http://collections.si.edu/search/results.htm?q=record_ID%3Anmah_834010&repo=DPLA
05:43 πŸ”— dragondon hmm, I don't mind running another Vm for automated stuff (thinking of running a second Warrior VM) but don't want to take much time away from my newest endeavour to learn Python via course
05:43 πŸ”— SketchCow this isn't something that would be running in a warrior.
05:43 πŸ”— SketchCow This is a process.
05:44 πŸ”— dragondon hmm, are there any scripts for scraping details from places like wikipedia to help make this a little quicker?
05:54 πŸ”— SketchCow No.
05:54 πŸ”— SketchCow No, this is a thing.
05:54 πŸ”— SketchCow This is a thing you can stop thinking about.
05:54 πŸ”— SketchCow It's up there with "How do I find the right person for me."
05:55 πŸ”— SketchCow The answer to that is not "I bet if we do a search on the itunes store for "find me the right person" we can punch right through this."
05:55 πŸ”— SketchCow Also if you use Wikipedia for metadata you are actually the antichrist
05:55 πŸ”— SketchCow Like, people should just wheel the babies right to you for eating
05:56 πŸ”— chronomex the "I need more metadata" crowd is mostly librarians and their ilk
05:56 πŸ”— SketchCow Me too
05:56 πŸ”— chronomex SketchCow: well, not all of us can be members of the clean plate club with respect to babies
05:56 πŸ”— SketchCow I just don't think you need to hold stuff offline until the metadata's perfect
05:56 πŸ”— chronomex hell to the no
05:56 πŸ”— SketchCow that's what makes me r a d i c al
05:56 πŸ”— chronomex ha
05:56 πŸ”— * dragondon <--- not much of a librarian
05:59 πŸ”— dragondon I'll just keep on doing what I am till I learn more :)
06:02 πŸ”— dragondon umm, I was going to download another copy of the archive warrior, but archiveteam.org doesn't seem to tell me how to get it. I recall seeing a 'how to help' page at one point in the past.
06:04 πŸ”— dragondon ah, found it. Shouldn't have to dig for this. Should be on the main page. http://archiveteam.org/index.php?title=ArchiveTeam_Warrior
06:04 πŸ”— SketchCow yeah, the information on the warrior is tucked far and away on the warrior page.
06:04 πŸ”— SketchCow http://archiveteam.org/index.php?title=Warrior
06:04 πŸ”— SketchCow Only you are digging
06:04 πŸ”— SketchCow I mean, if you call clicking a link 'digging'
06:05 πŸ”— * SketchCow chronomex Going to go back to working on the movie. Don't need this guy. :)
06:05 πŸ”— SketchCow Oh, look at that.
06:05 πŸ”— SketchCow No spam on the wiki.
06:06 πŸ”— SketchCow Thanks again, BlueMax Smiley soultcer
06:11 πŸ”— BlueMax none? All gone?
06:11 πŸ”— SketchCow Well, maybe we have a couple half-eaten broken spam accounts lurking here and there.
06:11 πŸ”— SketchCow But before they were bringing up some live zombies every night, we got it from 30-40 new edits down to 4-5, now we're at zero.
06:12 πŸ”— SketchCow Two nights in a row.
06:12 πŸ”— SketchCow That's a big deal, someone's sad
06:12 πŸ”— SketchCow Just as we get press and attention
06:13 πŸ”— BlueMax yay
06:37 πŸ”— godane so microsoft e3 2012 press conf is getting uploaded
06:37 πŸ”— SketchCow https://twitter.com/textfiles/status/326947951484211200
07:11 πŸ”— SketchCow Here is my life
07:11 πŸ”— SketchCow My life is that I just had to double check I could actually speak at the Library of Congress and still have enough time to make it over to the five day Apple II conference.
07:11 πŸ”— * SketchCow haunted look
07:28 πŸ”— norbert79 SketchCow: We all envy you, no need to emphasize how cool you are ;-)
07:28 πŸ”— norbert79 And I am serious, I really envy you
07:50 πŸ”— tef doesn't make it any less stressful though :-)
07:52 πŸ”— norbert79 and doesn't attract the ladies either... It's like: "Hey, I just saved 100.000 animated gifs, want to date with me?" ... So digital archeology needs its time to develop its charm :)
07:52 πŸ”— tef digital preservation
07:53 πŸ”— tef anyway :3
07:54 πŸ”— norbert79 well, both actually, but right
07:54 πŸ”— norbert79 :)
10:02 πŸ”— chronomex norbert79: I believe SketchCow already has a lady in his life
10:02 πŸ”— chronomex in case you were worried
10:27 πŸ”— norbert79 chronomex: Still it's nice being able to show charm
10:27 πŸ”— norbert79 and being appreciated for it :)
10:32 πŸ”— chronomex woop woop woop off-topic siren
16:35 πŸ”— ivan` the corporate youtube channels are worth backing up continuously, they remove old ads and such
16:35 πŸ”— ivan` e.g. a terrible old windows phone ad from microsoft is too embarrassing for them https://www.youtube.com/watch?v=ewk8zWx9lqE
16:35 πŸ”— ivan` apple removed their lame genius ads
18:20 πŸ”— SketchCow Jamming in a pile of CD-ROMs
18:24 πŸ”— flaushy <
18:24 πŸ”— flaushy ups
18:28 πŸ”— DFJustin "Total Anhilation"?
18:32 πŸ”— SketchCow No, sadly
18:32 πŸ”— SketchCow One day! One day
18:33 πŸ”— lukeman the guy who runs http://www.marksfriggin.com (a site with logs of each daily howard stern show that's been up since 1995) has been threatening to shut the site for a while and i'm interested in keeping a pre-emptive archive just in case. anyone able to point me in the right direction on doing a simple site archive (whether heritrix or other tools - i'm primarily a python developer if that influences the tooling at all)?
18:35 πŸ”— chronomex lukeman: I'd start with wget -r -l 0 -m -p --warc-file www_marksfriggin_com http://www.marksfriggin.com/
18:35 πŸ”— ersi lukeman: I'd say using wget (version 1.14) would probably be the best if it's a simple site (ie. content not hidden by javascript)
18:35 πŸ”— chronomex yes
18:35 πŸ”— lukeman thanks guys
18:36 πŸ”— DFJustin http://www.archiveteam.org/index.php?title=Wget_with_WARC_output
18:36 πŸ”— lukeman that works for me. wasn't sure if i needed something more complex.
18:37 πŸ”— ersi Feel free to hack on one of any of our current repositories by the way, a lot of it is in python - available at https://github.com/ArchiveTeam/ ;) (seesaw-kit is an important one for example)
18:37 πŸ”— ersi usually wget is fine, sometimes you need more complex tools though
18:38 πŸ”— DFJustin http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem has a bunch of tools as well
18:38 πŸ”— lukeman yeah, i was reading that before
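For a Python developer, the wget invocation chronomex suggests above could be wrapped roughly as follows. This is a minimal sketch: the function name `build_wget_warc_command` is hypothetical, and it only assembles the argument list from the flags quoted in the channel (it does not run wget itself).

```python
import shlex


def build_wget_warc_command(url, warc_name):
    """Return the wget argument list for a recursive crawl that also writes a WARC.

    The flags mirror the command quoted in the channel; --warc-file needs
    wget 1.14 or newer, as ersi notes.
    """
    return [
        "wget",
        "-r",                      # recurse into linked pages
        "-l", "0",                 # no recursion depth limit
        "-m",                      # mirror mode (recursion plus timestamping)
        "-p",                      # also fetch page requisites such as images and CSS
        "--warc-file", warc_name,  # record the crawl under this WARC base name
        url,
    ]


command = build_wget_warc_command("http://www.marksfriggin.com/", "www_marksfriggin_com")
print(shlex.join(command))
```

To actually start the crawl, the list could be handed to `subprocess.run(command, check=True)` on a machine with a WARC-capable wget installed.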
18:38 πŸ”— GLaDOS [yahoo-upcoming-grab] alard pushed 1 new commit to master: https://github.com/ArchiveTeam/yahoo-upcoming-grab/commit/21070051b5534a340379a07acae6f0475ffeacc6
18:38 πŸ”— GLaDOS yahoo-upcoming-grab/master 2107005 Alard: Ignore Wget error 4 (dns resolution).
20:25 πŸ”— SketchCow Posterous attention achieved
20:27 πŸ”— omf_ yeah I just saw the douchebag tweet from a guy at twitter
20:33 πŸ”— flaushy o.O some ppl still use posterous for stuff in May?
20:35 πŸ”— WiK welp, think i may have gotten my 2nd ip ban from github using their api :)
20:37 πŸ”— antomatic Is there any legal means by which Posterous could 'donate' their entire database to the Internet Archive, or similar? (Noticed the great news about Cuil earlier.)
20:38 πŸ”— SketchCow no.
20:38 πŸ”— SketchCow No, no, this is going to explode very quickly.
20:38 πŸ”— SketchCow It's already exploding.
20:39 πŸ”— WiK what has exploded
20:42 πŸ”— closure WiK: full ban, or API rate throttle?
20:44 πŸ”— SketchCow I'm interested to see what happens.
20:55 πŸ”— flaushy hmm we had dedicated boxes and with them we would not have been able to make the deadline? what a joke...
21:00 πŸ”— edoc is some mass "alert the media" scramble in order?
21:01 πŸ”— flaushy edoc: even german news had the posterous shutdown covered
21:02 πŸ”— flaushy http://www.spiegel.de/netzwelt/web/blogdienst-posterous-wird-abgeschaltet-a-883986.html
21:03 πŸ”— flaushy i'll write to them to do an article about archiveteam, let's hope they'll do it :)
21:03 πŸ”— edoc BBC failed to mention.
21:03 πŸ”— edoc this does not surprise me.
21:03 πŸ”— paulv if they're going to let their users download their data after the 30th, why do they care if we are hammering them now?
21:03 πŸ”— Smiley I've been on the BBC sites a few times...
21:04 πŸ”— Smiley paulv: lies and damned lies.
21:12 πŸ”— flaushy SketchCow: can i direct the spiegel guys to you? (twitter)
21:12 πŸ”— flaushy i just hope they do a follow up
21:20 πŸ”— godane SketchCow: any plans on moving my tekzilla videos to my tekzilla collection soon?
21:20 πŸ”— Smiley godane: I think now is a bad time to ask bud.
21:20 πŸ”— godane i'm going to upload more directly to tekzilla collection
21:20 πŸ”— godane ok
21:21 πŸ”— godane can underscor do that then?
21:22 πŸ”— Smiley Not sure, but Jason has.... a number of things going on atm, including the pending destruction that is Posterous.
21:22 πŸ”— godane got that
21:23 πŸ”— Smiley I'm off to bed as I'm rather ill and not getting any better :
21:23 πŸ”— godane i only ask cause the collection is going to look funny
21:23 πŸ”— antomatic Get well soon, Smiley.
21:23 πŸ”— Smiley thanks
21:23 πŸ”— godane jumping from episode 42 to 105
21:45 πŸ”— abards Ne1 around happy to answer a n00bs question?
21:45 πŸ”— SketchCow What's up
21:46 πŸ”— abards Just started running a linux server and have left it on all day working on upcoming.
21:46 πŸ”— abards Thing is despite a 76mb connection it's only got a gig done so far
21:47 πŸ”— abards Watching the little graph looks like it gets very small amount of data then hangs for a second
21:47 πŸ”— n00b406 hola
21:47 πŸ”— abards Is this normal or have I set something up wrong
21:48 πŸ”— n00b406 I do not speak much English
21:49 πŸ”— lukeman isn't upcoming complete now?
21:49 πŸ”— antomatic I think that's normal, abards - the client is pretty considerate and only downloads a little bit of data at a time, it doesn't hammer the server or download constantly.
21:49 πŸ”— antomatic If you only have a few threads running (default is 2) then what you describe sounds about right. I'm new myself but I don't think it's a problem.
21:49 πŸ”— omf_ plus upcoming is small
21:50 πŸ”— abards Ok cool, how do I up the threads? Can I ssh into the virtual machine or is there a config page?
21:50 πŸ”— lukeman 143 items out, 0 to do: http://tracker.archiveteam.org/upcoming/
21:51 πŸ”— antomatic And also, quite a lot of Upcoming was finished by last night, so there have been only a fairly small number of work items today anyway.
21:51 πŸ”— abards Ok I'll switch project, it was just the one that alerted me
21:52 πŸ”— antomatic Config page is on the left hand side of your web browser - "Your Settings"
21:52 πŸ”— abards Mine just has name :)
21:52 πŸ”— antomatic Tick the 'Show Advanced Settings' at the top
21:52 πŸ”— abards ahhh
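21:53 πŸ”— n00b406 hola a todos me interesa ayudar en su proyecto me esta ayudando el traductor de ingles-español de google :) [English: hello everyone, I'm interested in helping with your project; Google's English-Spanish translator is helping me :)]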
21:53 πŸ”— abards Sorry spent so much time playing around with getting it running headless behind a firewall didn't really look very closely
21:53 πŸ”— antomatic All good fun. :)
21:54 πŸ”— abards Yeah, been without internet aside from work for a few years, nice to get back on and start using it for something
21:55 πŸ”— antomatic Hola n00b406 - Yo no hablo español, pero espero que alguien aquí puede ayudar. [English: Hello n00b406 - I don't speak Spanish, but I hope someone here can help.]
21:56 πŸ”— antomatic :)
21:56 πŸ”— abards no entiendo espanol [English: I don't understand Spanish]
21:57 πŸ”— n00b406 well I'm interested to discuss your project it is, purpose, philosophy and ...
22:00 πŸ”— noahc No hablo espano, pero Soy Awesome!
22:01 πŸ”— noahc espanol*
22:01 πŸ”— abards lol
22:01 πŸ”— antomatic Hay algunos muy buenos discursos y conferencias sobre YouTube dadas por Jason Scott, que puede encontrar interesante, pero están en Inglés. [English: There are some very good speeches and talks on YouTube given by Jason Scott that you may find interesting, but they are in English.]
22:02 πŸ”— antomatic [Sidenote: It'd be great to get those speeches transcribed, so they could be easily translated... no?]
22:03 πŸ”— omf_ antomatic, I have some info about that
22:03 πŸ”— omf_ give me a minute
22:04 πŸ”— noahc antomatic: I haven't spoken spanish in 4 years, but I can understand what you're saying. Amazing.
22:05 πŸ”— antomatic Wow. I don't even speak Spanish, this is purely Google Translate.
22:06 πŸ”— godane good news on revision3 now
22:06 πŸ”— godane it autoloaded older episodes links on the episodes page now
22:06 πŸ”— noahc I'm not sure it's grammatically correct, but I can understand it. I could never keep por/para straight for example.
22:07 πŸ”— jfranusic wait, what was going on with revision3?
22:13 πŸ”— noahc El Historia de Soy Sauce es muy interesante: http://www.youtube.com/watch?v=-2ZTmuX3cog
22:14 πŸ”— antomatic Fantastic automatic subtitles on that video.
22:14 πŸ”— antomatic "My name is Jason Scott, biamby mascot of our country."
22:18 πŸ”— antomatic Actually that IS something I can help with, if it's useful. I can rip that and clean it up into clean, translatable (and YT-viewable) subtitles. If it helps, obviously. Not if not. :)
22:19 πŸ”— omf_ antomatic, I myself would love it and I know others would as well. If you want to do it, do it
22:19 πŸ”— antomatic Happy to. OK, leave it with me. :)
22:20 πŸ”— WiK closure: i can manually get there, but when i run my code it gets a 403 Forbidden
22:20 πŸ”— WiK ive changed ip address, same, changed username/password same
22:21 πŸ”— omf_ WiK, how many repos till that happens
22:21 πŸ”— WiK anon access in my script still works
22:22 πŸ”— WiK 622336
22:22 πŸ”— omf_ not bad
22:23 πŸ”— omf_ have an estimate on how many repos there are total?
22:23 πŸ”— WiK 1660285
22:24 πŸ”— WiK that was the last id ive seen, and im thinking...
22:24 πŸ”— WiK https://api.github.com/repositories?since=1660285
22:24 πŸ”— WiK you could mess with that to figure out how many there are in total, i just never bothered to look
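The `since` parameter in the URL WiK pasted is how GitHub's public repository listing is paged: each request returns repositories with an id greater than `since`. A minimal sketch of walking that listing follows. The helper names (`page_url`, `next_since`, `walk`) and the stand-in `fake_fetch` are hypothetical, the response is assumed to be a list of objects carrying an `id` field, and no network request is made.

```python
API_ROOT = "https://api.github.com/repositories"


def page_url(since_id):
    """Build the listing URL for repositories with an id greater than since_id."""
    return f"{API_ROOT}?since={since_id}"


def next_since(page, current_since):
    """Return the highest id on the page, or the current value if the page is empty."""
    ids = [repo["id"] for repo in page]
    return max(ids) if ids else current_since


def walk(fetch_page, start=0, max_pages=3):
    """Follow the listing page by page; fetch_page maps a URL to a list of repo dicts."""
    since = start
    seen = []
    for _ in range(max_pages):
        page = fetch_page(page_url(since))
        if not page:
            break  # an empty page means the listing has been exhausted
        seen.extend(repo["id"] for repo in page)
        since = next_since(page, since)
    return seen, since


def fake_fetch(url):
    """Stand-in for an HTTP call, with made-up ids, so the sketch runs offline."""
    pages = {
        page_url(1660285): [{"id": 1660286}, {"id": 1660290}],
        page_url(1660290): [{"id": 1660291}],
    }
    return pages.get(url, [])


seen, last = walk(fake_fetch, start=1660285)
print(seen, last)
```

A real `fetch_page` would also have to respect the API rate limits discussed above, which this sketch does not attempt.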
22:24 πŸ”— omf_ Do you want help downloading blocks of it? I can throw some butts at it
22:26 πŸ”— WiK i'm gonna stop for a bit and give my modem a break, i wanna keep this 'my' project at least until after defcon
22:26 πŸ”— WiK my modem can use a break anyway
22:26 πŸ”— WiK AND my current 4tb harddrive is full...so i had to stop the downloading for a bit anyway
22:27 πŸ”— WiK i think they just blocked the user/pass i was using to authenticate
22:27 πŸ”— WiK as its not being blocked per ip/user-agent
22:27 πŸ”— omf_ I understand that WiK. You want to finish what you started.
22:28 πŸ”— WiK nope, i just submitted a cfp to defcon on it, and if they reject ill submit it to firetalks
22:29 πŸ”— WiK after that, i'll most likely talk to the peeps here about handing off the project, recoding my stuff so it will work inside of a 'worker'
22:29 πŸ”— omf_ I can help you with that
22:30 πŸ”— WiK http://github.com/wick2o/githunt
22:30 πŸ”— omf_ I have so many ideas of things I want to try on that data
22:30 πŸ”— WiK if you look at api_uname_harvest2.py you can see what im doing
22:31 πŸ”— SketchCow http://i.imgur.com/gZ8Fc83.gif
22:31 πŸ”— WiK im pretty much just running that in different modes
22:31 πŸ”— SketchCow Today in a nutshell
22:31 πŸ”— WiK download = downloads the rips 10 at a time
22:32 πŸ”— WiK processor loops through each folder and updates the database
22:32 πŸ”— WiK so that i can get folder/dir counts
22:32 πŸ”— WiK then i have a custom bash script that runs in cygwin that does grepping on all the results before i remove the HD for another
22:33 πŸ”— WiK and then i have to manually process those :(
22:35 πŸ”— omf_ fucking mother fucking. I just got this back after uploading 33gb
22:36 πŸ”— omf_ I am surprised this is not checked before I upload a file http://paste.archivingyoursh.it/wesepegeba.xml
22:37 πŸ”— Baljem hmm - my Warrior's jumped onto Upcoming at some point today - thought that was done with... is there any use in me telling it to work on Posterous or is that still subject to getting blocked? (or am I just losing my marbles?)
22:38 πŸ”— omf_ Baljem, you can point it back at posterous
22:38 πŸ”— Baljem cool. will do!
22:39 πŸ”— noahc Is form spring worth doing? I notice they say they could be rescued, but there is a ton of stuff left.
22:40 πŸ”— SketchCow https://twitter.com/mrox64/status/327189622591475712
22:40 πŸ”— SketchCow Formspring is low priority to me
22:40 πŸ”— SketchCow I'm worried about posterous, but they're fucking us
22:41 πŸ”— noahc I'll switch over to Posterous then.
22:42 πŸ”— jfranusic What's the deal with Revision3? Are they shutting down?
22:43 πŸ”— antomatic (panics) Whaat?!
22:43 πŸ”— jfranusic that was my thought
22:43 πŸ”— omf_ now I got to upload 33gb again
22:43 πŸ”— antomatic Discovery only just bought them!
22:43 πŸ”— jfranusic "good news on revision3 now
22:43 πŸ”— jfranusic it autoloaded older episodes links on the episodes page now"
22:46 πŸ”— antomatic They wouldn't shut it, surely.. haven't heard anything.
22:50 πŸ”— SketchCow Welcome to the fun, jfranusic
22:52 πŸ”— godane i'm doing rev3 just in case
22:52 πŸ”— godane a lot of the episodes are from like before 2010 right now
22:53 πŸ”— SketchCow Hi.
22:53 πŸ”— SketchCow Would you happen to know if anybody has made an effort to archive
22:53 πŸ”— SketchCow nwnet.co.uk ?
22:53 πŸ”— SketchCow This was a really early ISP based in Manchester, England. A series of
22:53 πŸ”— SketchCow mergers and takeovers means that the domain basically got forgotten
22:53 πŸ”— SketchCow about, and it's full of late 1990s gems such as
22:53 πŸ”— SketchCow http://www.nwnet.co.uk/worsley/ - spot the MS Front Page template, for
22:53 πŸ”— SketchCow an NHS website! Lots of personal pages and very amateur feeling
22:53 πŸ”— SketchCow company homepages. googling site:nwnet.co.uk seems to bring up
22:53 πŸ”— SketchCow loads..
22:53 πŸ”— SketchCow I was a subscriber back in the 1990s. My own website is still there,
22:53 πŸ”— SketchCow and my email still works. And it's through that I've had notification
22:53 πŸ”— SketchCow that they are switching it all off on 1st May ... I'd run a wget
22:53 πŸ”— SketchCow across it, but haven't a tool to grab a starting list of urls from
22:53 πŸ”— SketchCow google, and don't have enough time to write one - is there anything
22:53 πŸ”— SketchCow about already?
22:53 πŸ”— SketchCow Cheers
22:53 πŸ”— SketchCow Rob
22:53 πŸ”— SketchCow --------------
22:53 πŸ”— SketchCow OK, let's do it.
22:54 πŸ”— SketchCow It's a real thing, a website set being shut down in.... sex days
22:54 πŸ”— SketchCow six
22:54 πŸ”— SketchCow well sex days too, if you play it right
22:54 πŸ”— omf_ nwnet.co.uk doesn't work for me
22:55 πŸ”— antomatic Here it redirects to www.telinco.net, which has a general 'shutting down, deleting may 1st, get lost, your fault if you didn't back it up' message on it
22:55 πŸ”— antomatic disgraceful.
22:55 πŸ”— omf_ yeah the www redirects
22:56 πŸ”— jfranusic godane: so, you're just doing a pro-active backup?
22:57 πŸ”— antomatic 1490 google results for http://www.google.co.uk/search?num=100&newwindow=1&site=&source=hp&q=site%3Anwnet.co.uk&oq=site%3Anwnet.co.uk&gs_l=hp.3...1279.3709.0.3900.18.16.1.0.0.0.128.1097.14j2.16.0...0.0...1c.1.11.hp.8MVadSwsvjI
22:57 πŸ”— antomatic eh, I mean, for "site:nwnet.co.uk"
22:57 πŸ”— antomatic oh, 306 by the time it gets to page 4.
22:57 πŸ”— godane yes
22:58 πŸ”— antomatic doh. (no good at this.) :)
22:58 πŸ”— omf_ there is some cool shit on there
22:59 πŸ”— jfranusic godane: gotcha, okay, good show.
23:00 πŸ”— dashcloud nwnet.co.uk/worsley doesn't redirect here (east coast, US)
23:01 πŸ”— omf_ Here we go http://urlsearch.commoncrawl.org/?q=nwnet.co.uk
23:01 πŸ”— omf_ 320 urls there
23:01 πŸ”— godane holy crap
23:03 πŸ”— Baljem must be something about British ISPs - a few years ago UK Online was closed down by Murdoch's lot, taking a whole bunch of subscribers' sites with it (such as Cliff Lawson's resources for Amstrad PCs)
23:04 πŸ”— Baljem alas I didn't know about Archive Team when I got that e-mail or I'd have suggested doing something about it :-/
23:08 πŸ”— dashcloud so, I'm going to do a crawl of the nwnet.co.uk/worsley one first using this wget command: wget -r -l 0 -m -p --warc-file nwnet-worsley http://www.nwnet.co.uk/worsley (and then move on, one by one through the other sites listed in the commoncrawl list)
23:15 πŸ”— chronomex sounds good
23:18 πŸ”— omf_ check the wayback machine. With all that cuil data there are probably more urls in there
23:19 πŸ”— antomatic while I think of it...
23:19 πŸ”— SketchCow http://i.imgur.com/8udKTNj.png
23:20 πŸ”— antomatic WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
23:20 πŸ”— SketchCow antomatic: THY MAGIC WORD IS yahoosucks
23:20 πŸ”— SketchCow dashcloud: Before you do that, I think this is a job for the warrior
23:20 πŸ”— antomatic It seems so obvious now. :) - Thanks!
23:21 πŸ”— balrog looks like twitter sucks more.
23:21 πŸ”— balrog ;[
23:21 πŸ”— dashcloud that's fine- if you go here: http://urlsearch.commoncrawl.org/?q=nwnet.co.uk there's a .json download of all the urls
23:21 πŸ”— noahc I could write a ruby script that brute forces urls, would that be useful ?
23:22 πŸ”— noahc Have it check a new one every 4 seconds or something.
23:23 πŸ”— SketchCow Dictionary attack isn't unworthwhile
23:24 πŸ”— dashcloud there's not that many listed there in that list
23:24 πŸ”— dashcloud maybe 20 actual sites- haven't checked google's listings yet
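Reducing a URL list to the "actual sites" dashcloud mentions could be sketched like this, assuming each subscriber site lives under its own top-level directory (as with /worsley/). The function name `site_roots` is hypothetical, the rule that a lone path segment containing a dot is a file rather than a site is only a heuristic, and `index.htm` and `index.html` in the sample are made-up page names.

```python
from urllib.parse import urlsplit


def site_roots(urls):
    """Collapse page URLs to their top-level directory, e.g. http://host/worsley/."""
    roots = set()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        if not segments:
            continue  # bare domain, no subscriber directory
        first = segments[0]
        # Heuristic: a single segment with a dot is probably a file in the web root.
        if len(segments) == 1 and "." in first and not parts.path.endswith("/"):
            continue
        # Case is preserved on purpose; the paths appear to be case-sensitive.
        roots.add(f"{parts.scheme}://{parts.netloc}/{first}/")
    return sorted(roots)


sample = [
    "http://www.nwnet.co.uk/worsley/",
    "http://www.nwnet.co.uk/worsley/index.htm",
    "http://www.nwnet.co.uk/BFG/",
    "http://www.nwnet.co.uk/index.html",
]
for root in site_roots(sample):
    print(root)
```

The resulting roots could then be fed one at a time to a crawl, or appended to the shared text file of targets.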
23:26 πŸ”— SketchCow We'll want to hit google
23:28 πŸ”— noahc any sense on if they allow numbers in usernames?
23:31 πŸ”— antomatic haven't seen any yet - but they do seem to be case-sensitive
23:31 πŸ”— noahc Yuk!
23:31 πŸ”— antomatic e.g. nwnet.co.uk/BFG/
23:32 πŸ”— antomatic Yes - numbers.
23:32 πŸ”— antomatic www.nwnet.co.uk/i2i/
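The brute-force idea discussed above (try one candidate every 4 seconds, and allow for case-sensitive names such as /BFG/) might look roughly like this in Python. noahc proposed a Ruby script, so this is a substitute sketch and not his code: `candidate_urls` and `probe` are hypothetical names, and `exists` is a caller-supplied stand-in for the real HTTP check.

```python
import time

BASE = "http://www.nwnet.co.uk/"


def candidate_urls(words):
    """Yield one directory URL per case variant of each word, skipping duplicates."""
    seen = set()
    for word in words:
        for variant in (word.lower(), word.upper(), word.capitalize()):
            if variant not in seen:
                seen.add(variant)
                yield f"{BASE}{variant}/"


def probe(urls, exists, delay=4.0, sleep=time.sleep):
    """Check each URL with exists(url), pausing delay seconds between checks."""
    found = []
    for index, url in enumerate(urls):
        if index:
            sleep(delay)  # stay polite: one request every few seconds
        if exists(url):
            found.append(url)
    return found


# Offline demonstration: pretend only /BFG/ exists and record the pauses
# in a list instead of actually sleeping.
waits = []
hits = probe(candidate_urls(["bfg"]), exists=lambda u: u.endswith("/BFG/"), sleep=waits.append)
print(hits)
```

Passing `sleep` and `exists` in as arguments keeps the loop testable without touching the network; a real run would leave `sleep` at its default and supply a check that issues the request.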
23:38 πŸ”— SketchCow Please just keep building a massive textfile we can use for the downloader
23:38 πŸ”— SketchCow It's OK if we spend a night with a couple of you tracking possible filenames
23:38 πŸ”— SketchCow Dictionary attacks against google work good too
23:38 πŸ”— BlueMax What project is this for
23:38 πŸ”— dashcloud so, which paste site? archivingyoursh.it or elsewhere?
23:44 πŸ”— pronoiac Hi, everybody.
23:44 πŸ”— antomatic (waves)
23:45 πŸ”— pronoiac I have a Posterous task which has stalled - 2hrs since the last entry on its wget.log.
23:45 πŸ”— pronoiac I'm not running the Warrior.
23:46 πŸ”— pronoiac It's downloaded over 4k URLs, over the past 3 days.
23:47 πŸ”— pronoiac Is this normal, or is this broken?
23:48 πŸ”— pronoiac Er, to be clear: should it wait for hours between requests?
23:50 πŸ”— godane i'm getting the pat and stu show for april 24 2013
23:50 πŸ”— godane *the video
23:50 πŸ”— godane of it
23:51 πŸ”— godane cause the mp3 is not on their site yet
23:58 πŸ”— noahc I'm doing a dictionary attack against the website. We'll see how it goes.
23:59 πŸ”— pronoiac It looks like it was a spammer, and it's now full of 404s.
