#archiveteam 2012-11-27,Tue


Time Nickname Message
00:22 🔗 SketchCow I have even more than you
00:33 🔗 Nemo_bis 6 hours? Peanatus. My book took one month. :p
00:42 🔗 godane i'm uploading amigahistory.co.uk
00:43 🔗 godane not many crawls in wayback machine
00:53 🔗 godane i'm grabbing arstechnica.com index
00:55 🔗 godane uploaded: http://archive.org/details/amigahistory.co.uk-20121126-mirror
02:15 🔗 dashcloud so is there an easy way to ask an FTP site how big it is?
02:15 🔗 SketchCow no
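SketchCow's answer holds for FTP itself: base FTP has no command that reports the total size of a site, and the extensions in RFC 3659 (`SIZE`, `MLSD`) only report per-file sizes. What you can do is walk the tree and add the sizes up yourself. A minimal sketch, assuming the server supports `MLSD`, which Python's `ftplib.FTP.mlsd()` wraps; many 1990s-era servers only offer `LIST`, whose output format varies and would need its own parser. The function names are hypothetical, and the total is only as accurate as what the server reports.

```python
from ftplib import FTP
from typing import Callable, Dict, Iterable, Tuple

# One listing entry: (name, facts), e.g. ("a.txt", {"type": "file", "size": "1234"}),
# the same shape ftplib.FTP.mlsd() yields.
Listing = Callable[[str], Iterable[Tuple[str, Dict[str, str]]]]


def tree_size(list_dir: Listing, path: str = "/") -> int:
    """Recursively add up the reported sizes of all files below `path`."""
    total = 0
    # Materialize the listing first: on a real FTP connection the data
    # transfer must finish before we recurse and open another one.
    for name, facts in list(list_dir(path)):
        kind = facts.get("type", "").lower()
        if kind in ("cdir", "pdir"):
            continue  # the "." and ".." entries
        child = path.rstrip("/") + "/" + name
        if kind == "dir":
            total += tree_size(list_dir, child)
        elif kind == "file":
            total += int(facts.get("size", "0"))
    return total


def ftp_tree_size(host: str, path: str = "/") -> int:
    """Log in anonymously, then walk the tree; requires server-side MLSD."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        return tree_size(lambda p: ftp.mlsd(p, facts=["type", "size"]), path)
```

Usage would look like `ftp_tree_size("ftp.example.org", "/pub")` (a placeholder host). `tree_size` takes the listing function as a parameter so the walking logic can be tried without a live server.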
02:22 🔗 godane does anyone know how to cat a file and just echo the lines that end with /?
02:23 🔗 godane my arstechnica.com index.txt file has a lot of bad urls
02:24 🔗 godane these urls are going to redirect to other urls in the list anyway
02:42 🔗 chronomex godane: try grep '/$' whatever.txt
02:43 🔗 chronomex $ means "here must be end-of-line"
02:43 🔗 chronomex ^ is the same but for beginning-of-line
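chronomex's description of the two anchors can be checked with a small Python equivalent. This is an illustration, not how grep is implemented: the text is split on newline characters only, which is also how grep separates lines, and `re.search` is applied to each line. The helper name and sample URLs are made up.

```python
import re

# "/$" requires a "/" immediately before the end of the line;
# "^http" likewise requires "http" at the very start of the line.
ENDS_WITH_SLASH = re.compile(r"/$")
STARTS_WITH_HTTP = re.compile(r"^http")


def grep(pattern: re.Pattern, text: str) -> list:
    """Keep the lines matching `pattern`, splitting on newline only, as grep does."""
    return [line for line in text.split("\n") if pattern.search(line)]
```

For example, on a three-line list of URLs, `grep(ENDS_WITH_SLASH, text)` keeps only the lines whose last character is `/`.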
02:43 🔗 dashcloud hi, wget isn't able to connect to this ftp site: ftp://ftp.gamers.org/ - any ideas why? it tries logging in as anonymous, says Error in server greeting, and then repeats the process
02:57 🔗 SketchCow Might need to hide who you are
03:03 🔗 godane chronomex: that only grabs the last line
03:09 🔗 dashcloud ah- I got it- apparently I wasn't timed out from my last login using a non-wget client
03:14 🔗 dashcloud making some progress on the list here: http://pastebin.com/NA610GXe (lot of dead sites though)
03:15 🔗 chronomex godane: ??? ummmmm not sure what kind of unix you're using
03:18 🔗 godane i'm doing grep '/$' index.txt
03:18 🔗 godane the only line that comes up is the last one in list
08:15 🔗 chronomex hm, are you sure that it's not correct?
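The symptom godane describes, where only the last line matches, is a classic sign of a file with Windows-style CRLF line endings. That cause is an assumption; the log never confirms it. grep splits lines on `\n` only, so on a CRLF file each line still ends in `\r`, and `/$` sees `\r` rather than `/` before the end of the line. If the final line has no line terminator, it is the only one without a trailing `\r`, which matches what godane saw. A minimal sketch that reproduces the symptom and the fix (the helper name is hypothetical); on the shell side, `tr -d '\r' < index.txt | grep '/$'` strips the carriage returns before matching.

```python
import re

ENDS_WITH_SLASH = re.compile(r"/$")


def slash_lines(text: str, strip_cr: bool = False) -> list:
    """Return the lines ending in '/', splitting on newline only, like grep.

    In a CRLF file every terminated line keeps a trailing carriage return,
    so '/$' fails on it; strip_cr=True removes that character first.
    """
    lines = text.split("\n")
    if strip_cr:
        lines = [line.rstrip("\r") for line in lines]
    return [line for line in lines if ENDS_WITH_SLASH.search(line)]
```

Without `strip_cr`, a CRLF list whose last line is unterminated yields only that last line, exactly the behaviour reported above.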
14:51 🔗 SmileyG http://www.savewalterwhite.com/
17:03 🔗 soultcer chronomex: Is the tracker still OOM-ing?
19:39 🔗 chronomex augh, it is
19:44 🔗 chronomex hm, seems to have fallen over hard this time
19:49 🔗 chronomex ok it's back
19:49 🔗 ersi awesome, thanks man
19:49 🔗 chronomex alard and I will have to discuss how to make this not happen
20:00 🔗 ersi chronomex: Well, we're back at HTTP 599
20:00 🔗 ersi :<
20:00 🔗 chronomex fuqqq
20:01 🔗 ersi Cocks. Huge cocks. In a bowl. A Bowl of Cocks.
20:01 🔗 ersi In other words, cockbowl.
20:02 🔗 chronomex you sure? the website works
20:02 🔗 ersi Maybe it's just my seesaw pipeline that has fucked up, let me restart that
20:02 🔗 ersi but I'm basically getting a lot of connection refusals
20:02 🔗 chronomex hm
20:03 🔗 ersi res = http_client.fetch("http://tracker.archiveteam.org:8123/request-discover", method="POST", body="n=25&version=2")
20:03 🔗 ersi tornado.httpclient.HTTPError: HTTP 599: [Errno 111] Connection refused
20:03 🔗 chronomex I just kicked redis and nginx, maybe they started in the wrong order or something
20:04 🔗 chronomex ah, I guess I need to start another daemon?
20:04 🔗 ersi seems to be fucked up for me still unfortunately, oh well
20:04 🔗 ersi mayhapples
20:04 🔗 ersi seems to be the user discovery stuff
20:04 🔗 ersi which very well might be separate
20:06 🔗 chronomex ok, try now
20:08 🔗 ersi lots better
20:08 🔗 ersi hugs and kisses etc
20:09 🔗 chronomex \o/
20:09 🔗 chronomex it seems that the normal failure mode is for redis to die and then something in either the website or the tracker to go tits-up and occupy 100% cpu
20:10 🔗 chronomex what happened the most recent time is not exactly known; something died even more horribly than usual so all 4 cpus were at 100% and the box was entirely unresponsive
20:13 🔗 SketchCow Weird.
20:13 🔗 SketchCow I got a slight reprieve on the DEFCON documentary
20:14 🔗 SketchCow So I can spend a little more time on archiveteam projects and things and stuff.
20:15 🔗 ersi chronomex: not super strange since my pipeline was having a fun time using as much CPU as possible to throw as many connection attempts as possible to your box, I assume everyone else's would do the same. That's a lot of connections.
20:15 🔗 chronomex no, I think some daemon on my side goes into spinloop
20:16 🔗 ersi coolers, maybe both
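ersi's point in this exchange is that every pipeline retries as fast as it can when the tracker refuses connections, so the load piles onto a box that is already struggling. A common client-side mitigation, not something the log says the seesaw pipeline does, is exponential backoff with jitter. A minimal sketch with the hypothetical helper `with_backoff`: by default it retries on Python's built-in `ConnectionError`, of which `ConnectionRefusedError` (errno 111) is a subclass. With tornado, as in the traceback above, the failure surfaces as `tornado.httpclient.HTTPError` with code 599 instead, so a real pipeline would pass that type via `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    fetch: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying the exceptions in retry_on with exponential backoff.

    After the n-th failure it waits about base_delay * 2**(n-1) seconds
    (capped at max_delay and scaled by a random factor in [0.5, 1.0]);
    the last exception is re-raised once `retries` attempts have failed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    attempt = 0
    while True:
        try:
            return fetch()
        except retry_on:
            attempt += 1
            if attempt >= retries:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

For a plain urllib request, `with_backoff(lambda: urllib.request.urlopen(url), retry_on=(urllib.error.URLError,))` would work, since urllib reports a refused connection as `URLError`. The `sleep` parameter exists so the delays can be observed without actually waiting.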
20:17 🔗 chronomex oh, most recent time it appears that redis didn't get OOMed, so the box was completely stuffed
20:17 🔗 chronomex I should probably enlarge the swapspace
20:21 🔗 ersi swap sucks, but it's better than none I guess
20:21 🔗 ersi Or maybe not, maybe it's better for it to go get OOM'd
20:24 🔗 chronomex I don't know
20:24 🔗 chronomex next time the box falls over completely I'll take the occasion to rejigger the disk allocation
20:46 🔗 SketchCow --------------------------------------------------
20:47 🔗 SketchCow BETA OF THE NEW WAYBACK MACHINE AVAILABLE
20:47 🔗 SketchCow http://web-beta.archive.org/
20:47 🔗 SketchCow Please pound on it, per Brewster's invite.
20:47 🔗 SketchCow Let me know if you run into anything.
20:47 🔗 SketchCow --------------------------------------------------
20:51 🔗 chronomex whatall's different?
20:51 🔗 SketchCow 50% more data
20:51 🔗 SketchCow Right up to the moment.
20:51 🔗 Deewiant http://faq.web.archive.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version/
20:51 🔗 DFJustin sweet, some of the mess wiki content is there
20:51 🔗 chronomex spiffy
20:56 🔗 SketchCow http://web-beta.archive.org/web/20121103192508/http://torrentfreak.com/ hooray
20:56 🔗 swebb SketchCow: some links don't map properly on the web-beta.archive.org to other pages. Relative links don't include the base URL from the referred.
20:56 🔗 swebb http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html - Click on any of the blue links.
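swebb's report is that relative links on an archived page are not resolved against that page's original URL. A minimal sketch of what correct rewriting involves, not the Wayback Machine's actual code: resolve the `href` against the page's original URL with `urljoin` (which follows the RFC 3986 reference-resolution rules), then prefix the result with the `/web/<timestamp>/` layout visible in the URLs in this log. It ignores a page-level `<base href>`, which would change the base URL. The function name and the example link paths are hypothetical.

```python
from urllib.parse import urljoin

# URL layout as seen in this log: <prefix><14-digit timestamp>/<original URL>
WAYBACK_PREFIX = "http://web-beta.archive.org/web/"


def wayback_link(timestamp: str, page_url: str, href: str) -> str:
    """Rewrite a link found on an archived page so it stays inside the archive.

    The href is first resolved against the page's original URL, then
    prefixed with the capture's timestamp.
    """
    absolute = urljoin(page_url, href)
    return f"{WAYBACK_PREFIX}{timestamp}/{absolute}"
```

For the badcheese.com capture above, a relative link `pics/index.html` on `all.html` should become `http://web-beta.archive.org/web/20120518135633/http://badcheese.com/pics/index.html`.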
20:59 🔗 ersi SketchCow: Is this a new Wayback Machine or a new Liveweb?
21:00 🔗 SketchCow http://web-beta.archive.org/web/20121023010539/http://tvtropes.org/pmwiki/pmwiki.php/Main/HomePage ha HA yes
21:00 🔗 ersi What? Cool! I didn't know all of Wayback Machines data was available to download via archive.org/details/blahblah.arc
21:00 🔗 ersi available under the crawldata keyword
21:01 🔗 DFJustin http://wayback-beta.archive.org/web/*/http://goatse.cx/* throws up an error
21:02 🔗 DFJustin also the display of urls is a little screwy
21:02 🔗 SketchCow http://wayback-beta.archive.org/web/*/http://www.fortunecity.com and here is a bit of cuteness
21:03 🔗 SketchCow You can see the insanity of us on May 1-5
21:03 🔗 SketchCow Followed by sad little crawls of a dead site
21:03 🔗 chronomex whoa insanity indeed
21:04 🔗 chronomex and march
21:04 🔗 SketchCow ha ha, yes
21:06 🔗 SketchCow Sounds like the MESS wiki info can be transferred back
21:07 🔗 SketchCow http://wayback-beta.archive.org/web/*/http://www.nytimes.com/
21:10 🔗 DFJustin parts of it anyway
21:11 🔗 DFJustin there was a lot of deeply nested stuff unfortunately
21:16 🔗 DFJustin this is the one I was most wanting to get back :D http://web-beta.archive.org/web/20111027173407/http://mess.redump.net/freely_available_systems
21:18 🔗 DFJustin took me a lot of work to hunt those down to have something more concrete than "oh a guy said once it's cool"
21:27 🔗 chronomex nice!
21:46 🔗 ersi SketchCow: Got any changelist? New features? Specific bug fixes? Or is it ""just"" new data available?
21:51 🔗 ersi http://wayback-beta.archive.org/web/*/http://www.fortunecity.com/* hung my Firefox Instance >_>
21:51 🔗 ersi and then I got an error; "DataTables warning: Unexpected number of TD elements. Expected 99156 and got 99152. DataTables does not support rowspan / colspan in the table body, and there must be one cell for each row/column combination."
22:09 🔗 * SketchCow is on the phone with an archive about donating his stuff to an archive
22:09 🔗 SketchCow (some of it)
22:17 🔗 balrog_ it would be nice if the new wayback frontend allowed at least URL grep
22:18 🔗 balrog_ since I know fulltext grep would be really, really difficult
22:19 🔗 balrog_ wait, that's there :P
22:19 🔗 balrog_ didn't think I saw it before
22:22 🔗 chronomex url grep?!?
22:22 🔗 chronomex neato
22:51 🔗 SketchCow Uploading downloaded FTP sites
23:46 🔗 dashcloud so I've updated the list from Internet Games Directory (1996's most popular FTP sites) with dead sites, inaccessible, and things that I've done/working on: http://pastebin.com/M9VzgiYc
