[00:22] I have even more than you
[00:33] 6 hours? Peanatus. My book took one month. :p
[00:42] i'm uploading amigahistory.co.uk
[00:43] not many crawls in wayback machine
[00:53] i'm grabbing arstechnica.com index
[00:55] uploaded: http://archive.org/details/amigahistory.co.uk-20121126-mirror
[02:15] so is there an easy way to ask an FTP site how big it is?
[02:15] no
[02:22] does anyone know how to cat a file and just echo what ends with / at the end of the line?
[02:23] my arstechnica.com index.txt file has a lot of bad urls
[02:24] these urls are going to redirect to other urls in the list anyway
[02:42] godane: try grep '/$' whatever.txt
[02:43] $ means "here must be end-of-line"
[02:43] ^ is the same but for beginning-of-line
[02:43] hi, wget isn't able to connect to this ftp site: ftp://ftp.gamers.org/ - any ideas why? it tries logging in as anonymous, says Error in server greeting, and then repeats the process
[02:57] Might need to hide who you are
[03:03] chronomex: that only grabs the last line
[03:09] ah- I got it- apparently I wasn't timed out from my last login using a non-wget client
[03:14] making some progress on the list here: http://pastebin.com/NA610GXe (lot of dead sites though)
[03:15] godane: ??? ummmmm not sure what kind of unix you're using
[03:18] i'm doing grep '/$' index.txt
[03:18] the only line that comes up is the last one in list
[08:15] hm, are you sure that it's not correct?
[14:51] http://www.savewalterwhite.com/
[17:03] chronomex: Is the tracker still OOM-ing?
[19:39] augh, it is
[19:44] hm, seems to have fallen over hard this time
[19:49] ok it's back
[19:49] awesome, thanks man
[19:49] alard and I will have to discuss how to make this not happen
[20:00] chronomex: Well, we're back at HTTP 599
[20:00] :<
[20:00] fuqqq
[20:01] Cocks. Huge cocks. In a bowl. A Bowl of Cocks.
[20:01] In other words, cockbowl.
[20:02] you sure? the website works
[20:02] Maybe it's just my seesaw pipeline that has fucked up, let me restart that
[20:02] but I'm basically getting a lot of connection refuses
[20:02] hm
[20:03] res = http_client.fetch("http://tracker.archiveteam.org:8123/request-discover", method="POST", body="n=25&version=2")
[20:03] tornado.httpclient.HTTPError: HTTP 599: [Errno 111] Connection refused
[20:03] I just kicked redis and nginx, maybe they started in the wrong order or something
[20:04] ah, I guess I need to start another daemon?
[20:04] seems to be fucked up for me still unfortunately, oh well
[20:04] mayhapples
[20:04] seems to be the user discovery stuff
[20:04] which very well might be separate
[20:06] ok, try now
[20:08] lots better
[20:08] hugs and kisses etc
[20:09] \o/
[20:09] it seems that the normal failure mode is for redis to die and then something in either the website or the tracker to go tits-up and occupy 100% cpu
[20:10] what happened the most recent time is not exactly known; something died even more horribly than usual so all 4 cpus were at 100% and the box was entirely unresponsive
[20:13] Weird.
[20:13] I got a slight reprieve on the DEFCON documentary
[20:14] So I can spend a little more time on archiveteam projects and things and stuff.
[20:15] chronomex: not super strange since my pipeline was having a fun time using as much CPU as possible to throw as many connection attempts as possible to your box, I assume everyone else's would do the same. That's a lot of connections.
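The point about pipelines throwing as many connection attempts as possible at a refusing tracker can be illustrated with a retry loop that backs off instead. This is only a sketch under stated assumptions, not how the seesaw pipeline actually behaves: `backoff_delays` and `fetch_with_backoff` are hypothetical names, and the caller-supplied `fetch` is assumed to raise `ConnectionError` on a refused connection (the tornado client in the log raises its own `HTTPError` with code 599, which would need to be translated first).

```python
import random


def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Yield one capped exponential delay (with full jitter) per attempt."""
    for attempt in range(attempts):
        # The ceiling doubles each attempt: base, 2*base, 4*base, ... up to cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random point below the ceiling so that many
        # clients do not all retry at the same instant.
        yield rng() * ceiling


def fetch_with_backoff(fetch, sleep, **kwargs):
    """Call fetch() until it succeeds, sleeping between failed attempts.

    fetch and sleep are supplied by the caller (hypothetical hooks);
    kwargs are passed through to backoff_delays.
    """
    last_error = None
    for delay in backoff_delays(**kwargs):
        try:
            return fetch()
        except ConnectionError as exc:  # includes ConnectionRefusedError
            last_error = exc
            sleep(delay)
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error
```

In real use `sleep` would be `time.sleep`; passing it in keeps the loop easy to exercise without actually waiting.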
[20:15] no, I think some daemon on my side goes into spinloop
[20:16] coolers, maybe both
[20:17] oh, most recent time it appears that redis didn't get OOMed, so the box was completely stuffed
[20:17] I should probably enlarge the swapspace
[20:21] swap sucks, but it's better than none I guess
[20:21] Or maybe not, maybe it's better for it to go get OOM'd
[20:24] I don't know
[20:24] next time the box falls over completely I'll take the occasion to rejigger the disk allocation
[20:46] --------------------------------------------------
[20:47] BETA OF THE NEW WAYBACK MACHINE AVAILABLE
[20:47] http://web-beta.archive.org/
[20:47] Please pound on it, per Brewster's invite.
[20:47] Let me know if you run into anything.
[20:47] --------------------------------------------------
[20:51] whatall's different?
[20:51] 50% more data
[20:51] Right up to the moment.
[20:51] http://faq.web.archive.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version/
[20:51] sweet, some of the mess wiki content is there
[20:51] spiffy
[20:56] http://web-beta.archive.org/web/20121103192508/http://torrentfreak.com/ hooray
[20:56] SketchCow: some links don't map properly on the web-beta.archive.org to other pages. Relative links don't include the base URL from the referrer.
[20:56] http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html - Click on any of the blue links.
[20:59] SketchCow: Is this a new Wayback Machine or a new Liveweb?
[21:00] http://web-beta.archive.org/web/20121023010539/http://tvtropes.org/pmwiki/pmwiki.php/Main/HomePage ha HA yes
[21:00] What? Cool! I didn't know all of Wayback Machine's data was available to download via archive.org/details/blahblah.arc
[21:00] available under the crawldata keyword
[21:01] http://wayback-beta.archive.org/web/*/http://goatse.cx/* throws up an error
[21:02] also the display of urls is a little screwy
[21:02] http://wayback-beta.archive.org/web/*/http://www.fortunecity.com and here is a bit of cuteness
[21:03] You can see the insanity of us on May 1-5
[21:03] Followed by sad little crawls of a dead site
[21:03] whoa insanity indeed
[21:04] and march
[21:04] ha ha, yes
[21:06] Sounds like the MESS wiki info can be transferred back
[21:07] http://wayback-beta.archive.org/web/*/http://www.nytimes.com/
[21:10] parts of it anyway
[21:11] there was a lot of deeply nested stuff unfortunately
[21:16] this is the one I was most wanting to get back :D http://web-beta.archive.org/web/20111027173407/http://mess.redump.net/freely_available_systems
[21:18] took me a lot of work to hunt those down to have something more concrete than "oh a guy said once it's cool"
[21:27] nice!
[21:46] SketchCow: Got any changelist? New features? Specific bug fixes? Or is it "just" new data available?
[21:51] http://wayback-beta.archive.org/web/*/http://www.fortunecity.com/* hung my Firefox Instance >_>
[21:51] and then I got an error; "DataTables warning: Unexpected number of TD elements. Expected 99156 and got 99152. DataTables does not support rowspan / colspan in the table body, and there must be one cell for each row/column combination."
[22:09] * SketchCow is on the phone with an archive about donating his stuff to an archive
[22:09] (some of it)
[22:17] it would be nice if the new wayback frontend allowed at least URL grep
[22:18] since I know fulltext grep would be really, really difficult
[22:19] wait, that's there :P
[22:19] didn't think I saw it before
[22:22] url grep?!?
[22:22] neato
[22:51] Uploading downloaded FTP sites
[23:46] so I've updated the list from Internet Games Directory (1996's most popular FTP sites) with dead sites, inaccessible, and things that I've done/working on: http://pastebin.com/M9VzgiYc
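
On the earlier `grep '/$'` question, where only the last line of index.txt matched: the log never says why, but one common cause (purely an assumption here) is CRLF line endings, where every line but an unterminated last one really ends in `/\r`. A minimal sketch of the difference, using hypothetical helper names and made-up example URLs:

```python
def naive_match(text):
    """Mimic a strict end-of-line match: split on LF only, keep lines ending in '/'."""
    return [line for line in text.split("\n") if line.endswith("/")]


def lines_ending_with_slash(text):
    """Keep lines whose last visible character is '/', ignoring CR/LF endings."""
    # splitlines() treats both "\n" and "\r\n" as line terminators;
    # rstrip() additionally drops any trailing spaces or tabs.
    return [line.rstrip() for line in text.splitlines() if line.rstrip().endswith("/")]


# Hypothetical index with CRLF endings and no terminator on the last line.
sample = (
    "http://a.example/\r\n"
    "http://a.example/page.html\r\n"
    "http://b.example/"
)

only_last = naive_match(sample)               # the stray "\r" hides the first match
both_dirs = lines_ending_with_slash(sample)   # both directory-style URLs survive
```

If this is what happened, stripping the carriage returns before running grep would give the same result as the tolerant version.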