Time |
Nickname |
Message |
00:22
🔗
|
SketchCow |
I have even more than you |
00:33
🔗
|
Nemo_bis |
6 hours? Peanuts. My book took one month. :p |
00:42
🔗
|
godane |
i'm uploading amigahistory.co.uk |
00:43
🔗
|
godane |
not many crawls in wayback machine |
00:53
🔗
|
godane |
i'm grabbing arstechnica.com index |
00:55
🔗
|
godane |
uploaded: http://archive.org/details/amigahistory.co.uk-20121126-mirror |
02:15
🔗
|
dashcloud |
so is there an easy way to ask an FTP site how big it is? |
02:15
🔗
|
SketchCow |
no |
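FTP has no single command that reports the total size of a site, but a client can walk the directory tree and add up the file sizes itself. A minimal sketch, assuming the server supports the MLSD command (many older servers do not); `total_size` and `measure` are hypothetical helper names, not part of any tool mentioned in this log:

```python
from ftplib import FTP


def total_size(ftp, path="/"):
    """Recursively sum file sizes (in bytes) under `path` using MLSD listings."""
    total = 0
    for name, facts in ftp.mlsd(path):
        if name in (".", ".."):
            continue
        kind = facts.get("type")
        child = path.rstrip("/") + "/" + name
        if kind == "dir":
            # Descend into subdirectories; "cdir"/"pdir" entries are skipped.
            total += total_size(ftp, child)
        elif kind == "file":
            total += int(facts.get("size", 0))
    return total


def measure(host):
    """Log in anonymously to `host` and return the total size of the tree."""
    with FTP(host) as ftp:
        ftp.login()
        return total_size(ftp)
```

This issues one listing per directory, so on a large site it is slow; it only avoids downloading the files themselves.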
02:22
🔗
|
godane |
does anyone know how to cat a file and just echo what ends with / at the end of the line? |
02:23
🔗
|
godane |
my arstechnica.com index.txt file has a lot of bad urls |
02:24
🔗
|
godane |
these urls are going to redirect to other urls in the list anyway |
02:42
🔗
|
chronomex |
godane: try grep '/$' whatever.txt |
02:43
🔗
|
chronomex |
$ means "here must be end-of-line" |
02:43
🔗
|
chronomex |
^ is the same but for beginning-of-line |
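The same two anchors behave the same way in Python's `re` module. A small sketch of what `$` and `^` select; the URLs are made-up examples, not lines from godane's index.txt:

```python
import re

# Hypothetical sample lines standing in for an index file.
urls = [
    "http://arstechnica.com/gadgets/",
    "http://arstechnica.com/gadgets/index.html",
    "http://arstechnica.com/science/",
]

# "/$" matches only when "/" is the last character of the line.
ends_with_slash = [u for u in urls if re.search(r"/$", u)]

# "^http://" matches only when the line begins with "http://".
starts_with_http = [u for u in urls if re.search(r"^http://", u)]
```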
02:43
🔗
|
dashcloud |
hi, wget isn't able to connect to this ftp site: ftp://ftp.gamers.org/ - any ideas why? it tries logging in as anonymous, says Error in server greeting, and then repeats the process |
02:57
🔗
|
SketchCow |
Might need to hide who you are |
03:03
🔗
|
godane |
chronomex: that only grabs the last line |
03:09
🔗
|
dashcloud |
ah- I got it- apparently I wasn't timed out from my last login using a non-wget client |
03:14
🔗
|
dashcloud |
making some progress on the list here: http://pastebin.com/NA610GXe (lot of dead sites though) |
03:15
🔗
|
chronomex |
godane: ??? ummmmm not sure what kind of unix you're using |
03:18
🔗
|
godane |
i'm doing grep '/$' index.txt |
03:18
🔗
|
godane |
the only line that comes up is the last one in list |
08:15
🔗
|
chronomex |
hm, are you sure that it's not correct? |
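One possible explanation, which the log does not confirm, is that index.txt has Windows-style CRLF line endings. Every line then ends in a carriage return rather than `/`, and only a final line with no terminator matches `/$`. A sketch of a filter that tolerates this, assuming that is the cause; `lines_ending_with_slash` is a hypothetical helper:

```python
def lines_ending_with_slash(text):
    """Return lines ending in '/', ignoring a trailing carriage return."""
    result = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # drop the CR left over from a CRLF ending
        if line.endswith("/"):
            result.append(line)
    return result


# Made-up sample: CRLF between lines, no terminator after the last one.
sample = "http://example.com/a/\r\nhttp://example.com/b.html\r\nhttp://example.com/c/"

# Without stripping the CR, only the last line matches.
naive = [line for line in sample.split("\n") if line.endswith("/")]
matches = lines_ending_with_slash(sample)
```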
14:51
🔗
|
SmileyG |
http://www.savewalterwhite.com/ |
17:03
🔗
|
soultcer |
chronomex: Is the tracker still OOM-ing? |
19:39
🔗
|
chronomex |
augh, it is |
19:44
🔗
|
chronomex |
hm, seems to have fallen over hard this time |
19:49
🔗
|
chronomex |
ok it's back |
19:49
🔗
|
ersi |
awesome, thanks man |
19:49
🔗
|
chronomex |
alard and I will have to discuss how to make this not happen |
20:00
🔗
|
ersi |
chronomex: Well, we're back at HTTP 599 |
20:00
🔗
|
ersi |
:< |
20:00
🔗
|
chronomex |
fuqqq |
20:01
🔗
|
ersi |
Cocks. Huge cocks. In a bowl. A Bowl of Cocks. |
20:01
🔗
|
ersi |
In other words, cockbowl. |
20:02
🔗
|
chronomex |
you sure? the website works |
20:02
🔗
|
ersi |
Maybe it's just my seesaw pipeline that has fucked up, let me restart that |
20:02
🔗
|
ersi |
but I'm basically getting a lot of connection refuses |
20:02
🔗
|
chronomex |
hm |
20:03
🔗
|
ersi |
res = http_client.fetch("http://tracker.archiveteam.org:8123/request-discover", method="POST", body="n=25&version=2") |
20:03
🔗
|
ersi |
tornado.httpclient.HTTPError: HTTP 599: [Errno 111] Connection refused |
20:03
🔗
|
chronomex |
I just kicked redis and nginx, maybe they started in the wrong order or something |
20:04
🔗
|
chronomex |
ah, I guess I need to start another daemon? |
20:04
🔗
|
ersi |
seems to be fucked up for me still unfortunately, oh well |
20:04
🔗
|
ersi |
mayhapples |
20:04
🔗
|
ersi |
seems to be the user discovery stuff |
20:04
🔗
|
ersi |
which very well might be separate |
20:06
🔗
|
chronomex |
ok, try now |
20:08
🔗
|
ersi |
lots better |
20:08
🔗
|
ersi |
hugs and kisses etc |
20:09
🔗
|
chronomex |
\o/ |
20:09
🔗
|
chronomex |
it seems that the normal failure mode is for redis to die and then something in either the website or the tracker to go tits-up and occupy 100% cpu |
20:10
🔗
|
chronomex |
what happened the most recent time is not exactly known; something died even more horribly than usual so all 4 cpus were at 100% and the box was entirely unresponsive |
20:13
🔗
|
SketchCow |
Weird. |
20:13
🔗
|
SketchCow |
I got a slight reprieve on the DEFCON documentary |
20:14
🔗
|
SketchCow |
So I can spend a little more time on archiveteam projects and things and stuff. |
20:15
🔗
|
ersi |
chronomex: not super strange since my pipeline was having a fun time using as much CPU as possible to throw as many connection attempts as possible to your box, I assume everyone else's would do the same. That's a lot of connections. |
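A common way to keep clients from hammering a struggling server is to retry with exponential backoff and jitter. The sketch below is a generic illustration, not the seesaw pipeline's actual behavior: `retry_with_backoff` is a hypothetical helper, and it catches the built-in `ConnectionError`, whereas a real pipeline would catch whatever its HTTP client raises (the log shows `tornado.httpclient.HTTPError` with code 599).

```python
import random
import time


def retry_with_backoff(func, attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call func(); on ConnectionError wait exponentially longer, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads clients out so they do not all retry at once.
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.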
20:15
🔗
|
chronomex |
no, I think some daemon on my side goes into spinloop |
20:16
🔗
|
ersi |
coolers, maybe both |
20:17
🔗
|
chronomex |
oh, most recent time it appears that redis didn't get OOMed, so the box was completely stuffed |
20:17
🔗
|
chronomex |
I should probably enlarge the swapspace |
20:21
🔗
|
ersi |
swap sucks, but it's better than none I guess |
20:21
🔗
|
ersi |
Or maybe not, maybe it's better for it to go get OOM'd |
20:24
🔗
|
chronomex |
I don't know |
20:24
🔗
|
chronomex |
next time the box falls over completely I'll take the occasion to rejigger the disk allocation |
20:46
🔗
|
SketchCow |
-------------------------------------------------- |
20:47
🔗
|
SketchCow |
BETA OF THE NEW WAYBACK MACHINE AVAILABLE |
20:47
🔗
|
SketchCow |
http://web-beta.archive.org/ |
20:47
🔗
|
SketchCow |
Please pound on it, per Brewster's invite. |
20:47
🔗
|
SketchCow |
Let me know if you run into anything. |
20:47
🔗
|
SketchCow |
-------------------------------------------------- |
20:51
🔗
|
chronomex |
whatall's different? |
20:51
🔗
|
SketchCow |
50% more data |
20:51
🔗
|
SketchCow |
Right up to the moment. |
20:51
🔗
|
Deewiant |
http://faq.web.archive.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version/ |
20:51
🔗
|
DFJustin |
sweet, some of the mess wiki content is there |
20:51
🔗
|
chronomex |
spiffy |
20:56
🔗
|
SketchCow |
http://web-beta.archive.org/web/20121103192508/http://torrentfreak.com/ hooray |
20:56
🔗
|
swebb |
SketchCow: some links don't map properly on the web-beta.archive.org to other pages. Relative links don't include the base URL from the referring page. |
20:56
🔗
|
swebb |
http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html - Click on any of the blue links. |
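For reference, this is how a relative link on an archived page would be expected to resolve: against the original page URL, with the result wrapped in the same snapshot prefix. This is only a sketch of the expected outcome, not how the Wayback Machine rewrites links internally; `archived_link` is a hypothetical helper and `photos.html` is a made-up relative link.

```python
from urllib.parse import urljoin

ARCHIVE_PREFIX = "http://web-beta.archive.org/web/"


def archived_link(snapshot_url, href):
    """Resolve href against the original page URL, then re-wrap it in the snapshot prefix."""
    if not snapshot_url.startswith(ARCHIVE_PREFIX):
        raise ValueError("not a snapshot URL: " + snapshot_url)
    rest = snapshot_url[len(ARCHIVE_PREFIX):]
    # The first path segment is the capture timestamp; the remainder is the original URL.
    timestamp, original = rest.split("/", 1)
    target = urljoin(original, href)
    return ARCHIVE_PREFIX + timestamp + "/" + target


page = "http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html"
link = archived_link(page, "photos.html")  # hypothetical relative link
```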
20:59
🔗
|
ersi |
SketchCow: Is this a new Wayback Machine or a new Liveweb? |
21:00
🔗
|
SketchCow |
http://web-beta.archive.org/web/20121023010539/http://tvtropes.org/pmwiki/pmwiki.php/Main/HomePage ha HA yes |
21:00
🔗
|
ersi |
What? Cool! I didn't know all of Wayback Machine's data was available to download via archive.org/details/blahblah.arc |
21:00
🔗
|
ersi |
available under the crawldata keyword |
21:01
🔗
|
DFJustin |
http://wayback-beta.archive.org/web/*/http://goatse.cx/* throws up an error |
21:02
🔗
|
DFJustin |
also the display of urls is a little screwy |
21:02
🔗
|
SketchCow |
http://wayback-beta.archive.org/web/*/http://www.fortunecity.com and here is a bit of cuteness |
21:03
🔗
|
SketchCow |
You can see the insanity of us on May 1-5 |
21:03
🔗
|
SketchCow |
Followed by sad little crawls of a dead site |
21:03
🔗
|
chronomex |
whoa insanity indeed |
21:04
🔗
|
chronomex |
and march |
21:04
🔗
|
SketchCow |
ha ha, yes |
21:06
🔗
|
SketchCow |
Sounds like the MESS wiki info can be transferred back |
21:07
🔗
|
SketchCow |
http://wayback-beta.archive.org/web/*/http://www.nytimes.com/ |
21:10
🔗
|
DFJustin |
parts of it anyway |
21:11
🔗
|
DFJustin |
there was a lot of deeply nested stuff unfortunately |
21:16
🔗
|
DFJustin |
this is the one I was most wanting to get back :D http://web-beta.archive.org/web/20111027173407/http://mess.redump.net/freely_available_systems |
21:18
🔗
|
DFJustin |
took me a lot of work to hunt those down to have something more concrete than "oh a guy said once it's cool" |
21:27
🔗
|
chronomex |
nice! |
21:46
🔗
|
ersi |
SketchCow: Got any changelist? New features? Specific bug fixes? Or is it "just" new data available? |
21:51
🔗
|
ersi |
http://wayback-beta.archive.org/web/*/http://www.fortunecity.com/* hung my Firefox Instance >_> |
21:51
🔗
|
ersi |
and then I got an error; "DataTables warning: Unexpected number of TD elements. Expected 99156 and got 99152. DataTables does not support rowspan / colspan in the table body, and there must be one cell for each row/column combination." |
22:09
🔗
|
* |
SketchCow is on the phone with an archive about donating his stuff to an archive |
22:09
🔗
|
SketchCow |
(some of it) |
22:17
🔗
|
balrog_ |
it would be nice if the new wayback frontend allowed at least URL grep |
22:18
🔗
|
balrog_ |
since I know fulltext grep would be really, really difficult |
22:19
🔗
|
balrog_ |
wait, that's there :P |
22:19
🔗
|
balrog_ |
didn't think I saw it before |
22:22
🔗
|
chronomex |
url grep?!? |
22:22
🔗
|
chronomex |
neato |
22:51
🔗
|
SketchCow |
Uploading downloaded FTP sites |
23:46
🔗
|
dashcloud |
so I've updated the list from Internet Games Directory (1996's most popular FTP sites) with dead sites, inaccessible, and things that I've done/working on: http://pastebin.com/M9VzgiYc |