[01:47] Hey, I'm not sure if you guys heard
[01:47] "As of January 31, 2012, all waves will be read-only"
[01:47] "and the Wave service will be turned off on April 30, 2012"
[01:47] via google wave email
[02:04] We have.
[02:04] Coderjoe: whoops.
[02:18] archiving google wave now? can I get in on that?
[02:18] same
[03:29] Looks like splinder is going to be cutting it close. How long do we have?
[03:49] I'd like to know as well if we're going to make it
[03:50] for Splinder?
[03:50] What is
[03:50] Splinder?
[03:50] http://www.archiveteam.org/index.php?title=Splinder
[03:51] How big are the downloads?
[03:52] highly variable
[03:52] we've gotten most of the 1-10 MB ones already
[03:52] ok, I'll give it a shot
[03:52] the accounts coming in now are much larger
[03:52] we've got until January 31, 2012, though, so don't go crazy
[03:52] Poor waves
[03:53] the last time we had a huge flood of people hitting Splinder, their infrastructure couldn't handle it
[03:53] SketchCow, agreed
[03:53] and so we ended up with a lot of incompletes
[03:53] I think Splinder is well at hand.
[03:53] yeah, it's good
[03:53] yipdw^, ok I'll wait a little
[03:53] We have plenty of people on it, enough to get it
[03:53] ok
[03:55] was it actually possible to make a public wave?
[03:55] the waves I've got are all restricted to a small group
[03:56] Oh, wait, really?
[03:56] That was my concern :/
[03:57] I thought Splinder was supposed to go down like...today or something.
[03:57] the official announcement is January 31
[03:59] I didn't notice Splinder on the projects wiki page
[03:59] oh, speaking of which, I wonder how bad my AWS bill now is
[04:00] ooh.
[04:00] well it's not as bad as the Friendster days
[04:02] actually, I say that most of the small Splinder accounts have been grabbed, but according to http://splinder.heroku.com/ kennethre is still pulling a bunch of them
[04:02] weird
[04:02] maybe they're all requeues
[04:04] Yeah, a couple more people showed up and owned all over my measly 29GB ;) He's really kicking ass.
[04:04] he's controlling something like 360 dynos at Heroky
[04:04] er, Heroku
[04:04] fuck LOIC, that man can nuke sites from orbit single-handedly
[04:05] Wow.
[04:05] I think I'm up to...150 threads or so?
[04:06] (Had to google for what a dyno was, heh)
[04:06] oh heh
[04:06] he might be doing more, it was all in the #splinder logs at one point
[04:07] er, is in the logs
[04:13] Splinder shifted the date.
[04:13] That's what's missing.
[04:18] PatC: I took Splinder out of the "Projects with BASH scripts that need more people running them" when it hit 0 left. Even now, I think it's pretty well covered. Not that it shouldn't be elsewhere on the projects page.
[04:21] Ok
[05:08] Is there anything out there that is likely to have old DNS info cached or otherwise accessibly saved?
[05:21] what info and how old?
[05:25] Oh, I'm trying to verify the IP for a site as of...about a month ago? Some reseller on our super legacy Sphera platform went and canceled or something and left one of his clients high and dry.
[05:27] I'm not aware of anything that would have data that old
[05:27] I'm trying to figure out if we've even got a backup, but none of the usernames she's mentioned exist. And compared to how things are on some servers, we have amazingly good backups for this one. I've checked back to June.
[05:28] whois.sc claims to have some historic IP data, but that's $30/month for their stuff.
[05:36] one month is old for dns
[05:36] wow
[05:36] So, you know what would be cool
[05:36] Craigslist archive
[05:37] Wyatt|Wor: have you tried netcraft
[05:37] Isn't Craigslist actively hostile to pretty much anything that uses Craigslist?
[05:37] Not that that should stop us, but it would make it difficult, I think.
[05:38] DFJustin: Not yet; taking a look now
[05:38] Paradoks: WE FUCKIN' ARCHIVED POETRY.COM, GET THAT BLASPHEMY OUTTA HERE
[05:38] Just kidding
[05:38] But still, I bet we could do it
[05:39] I mean, it's not like we're posting
[05:39] Paradoks: yes
[05:39] That's an interesting moving target. Don't they delete older things?
[05:39] they are hostile to anything that uses craigslist
[05:39] they go after people who make abstraction layers that make craigslist more useful and I just don't get that
[05:39] I tried to put my desk on craigslist. It was surprisingly difficult.
[05:39] can I get a copy of poetry.com? is that available yet?
[05:39] they don't insert ads or anything
[05:39] We archived a small portion of poetry.com, which was totally worth it. And it was totally cool how irritated we made the bastards by attempting to save the information they wanted to destroy.
[05:40] they wanted to /sell/
[05:40] the only worth that company had was their user content
[05:40] is that archive public?
[05:40] http://www.archive.org/details/archiveteam-poetrydotcom
[05:41] this isn't everything we got though
[05:41] which was already publicly available anyway, so how is it worth anything to sell it?
[05:41] without the content, what are they?
[05:42] DFJustin: so where's the rest.. just curious
[05:42] a poetry website with no users and nothing to read
[05:42] i.e. worthless
[05:42] Paradoks: Go ahead and rejoin #Magicallydelicious
[05:42] bsmith: Theoretically they controlled the publicly available content.
[05:42] pending sketchcow organization
[05:42] (if you want)
[05:42] I'm nearly done organizing that
[05:42] there's some value in an English word domain name I'm sure
[05:42] Heheh. I mostly just check in to see if you're still there. I do wonder if delicious will ever get suitably archived, though.
[05:43] Coderjoe: Spread some ops around? we're getting low
[05:43] ok but theoretically aol owns happy birthday, doesn't mean they can meaningfully enforce that
[05:43] they CAN
[05:43] nobody in tv or movies can sing it without mpaying, for example
[05:43] -m
[05:45] I like to sing, "Good morning to you" and let people infer what's needed.
[05:46] bsmith: re: delicious -- It seemed quite likely, once upon a time, that Yahoo was going to shut down delicious and the various user-created lists would be lost. Where user-created content is threatened, there Archive Team will go.
[05:49] who here is responsible for the heroku deal
[05:50] I don't really understand what heroku is but it looks expensive!
[05:51] kennethre is
[05:51] But it's free because he's using the "one free dyno" thing
[05:51] awesome
[05:51] dnova: And Alard set up the splinder/anyhub/mobileme.heroku.com tracking things.
[05:52] alard is beyond amazing
[05:53] Agreed.
[05:53] I'm running dld-streamer 40, am I making any kind of a dent at all?
[05:53] underscor: spread ops where?
[05:54] bsmith095: you can run 200-500 threads on even pretty modest hardware if you are ok with using lots of cpu
[05:54] Oh yeah, anyhub. :( looks like we only got about half of it?
[05:55] bsmith095: You're still getting noticeable quantities of data, though. It's still useful, even if you don't make the top 10 downloaders.
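A minimal sketch of what running more downloader threads looks like, based on the ./dld-streamer.sh <nick> <threads> invocation quoted later in this log; the nickname, the screen session name, and the 200-thread figure below are placeholders rather than values from the conversation, and the right count depends on your CPU, memory, and bandwidth.

  # start the Splinder streamer detached under screen ("yournick" and 200 are example values)
  cd splinder-grab
  screen -dmS splinder ./dld-streamer.sh yournick 200
  # reattach later to watch progress; detach again with Ctrl-A D
  screen -r splinder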
[05:56] I wish the heroku thing had stats for all participants
[06:00] but yeah, bsmith, another thing is the deadline was extended bigtime so it's not a serious crunch for us now so don't worry too much about going nuts with the # of threads
[06:01] just restarted streamer up to 200 and wow this laptop is running great!
[06:01] ahhh excellent :D
[06:01] I wonder how many kenneth is running!!
[06:01] Huh, was there a heroku for mobileme?
[06:02] mobileme.h.c isn't working.
[06:02] sounds like the person doing it can only have 1 at a time
[06:03] Wyatt|Wor: memac.heroku.com
[06:03] Ah
[06:04] How's their bandwidth? I'll have to get in on that once I consolidate all my Splinder.
[06:05] Wyatt|Wor: how is whose bandwidth?
[06:06] alard is going to have to dump more claimed but unreturned users into the project again, right?
[06:06] probably a few times?
[06:06] I guess mobileme?
[06:06] I don't think we have seen any indication that mobileme was failing under the load
[06:06] but then we hadn't really put out the word yet when splinder came down the line
[06:07] yeah this is the first I'm hearing anything about mobileme
[06:07] I've got 130 megs of compressed poetry left to sort, out of 1.3 gigs
[06:08] Wiki says 200TB by June something...that's a hell of a pull.
[06:08] fuck.
[06:08] I'm gonna need more space on my VPS.
[06:09] heh, yea
[06:09] I'm running very low myself
[06:09] only 225 gigs left
[06:10] even with today's obscenely huge storage capacity, that's still maybe 5 or 6 cubic feet of *really* huge drives
[06:10] that's only 50 hard drives
[06:10] I don't know if 50 hard drives is 5 cubic feet
[06:10] wtf?!?! 50 where do you shop?
[06:11] 4tb hard drives are around
[06:11] hmm
[06:11] I've seen 4tb externals, but those were two drives in an enclosure
[06:11] there are real actual 4tb hard drives
[06:12] Filesystem overhead. Probably closer to 60 if you use exclusively 4TB disks.
[06:12] pff ok :P
[06:12] *really*, I've heard of 2tb but never actually seen one, but I bought a 1500gb one for $65 on ebay, then it fell off a table, boom, click of death, about a dollar per mb to get it back?!
[06:12] That and things usually don't fit neatly.
[06:12] pardon me, dollar per gig, but still, damn clean rooms
[06:13] 3tb also exists and has for a while
[06:13] just got my 2tb luckily before the floods hit
[06:13] I have a 2GB on my desk at home. It's at least half for archiveteam stuff
[06:13] how do we keep doing this? I mean the tech specs of how they cram that many little magnetic flux thingies on a spinning platter?
[06:13] now I need to fill it with mobileme
[06:14] Wyatt|Wor: u mean tb right?
[06:14] Err, I have a 2TB. I haven't had a 2GB in forever.
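A back-of-envelope version of the drive-count estimate above, assuming the ~200 TB MobileMe figure from the wiki, 4 TB drives, and roughly 10% lost to formatting and filesystem overhead (the overhead percentage is a guess, not something stated in the log).

  # 200 TB of data onto 4 TB drives at ~90% usable capacity each
  echo $(( 200 * 10 / 36 ))   # => 55 drives, versus a bare 200/4 = 50 with no overhead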
[06:15] hrm, there's a rar file in here
[06:15] where and what collection
[06:15] in the poetry collection
[06:15] I'm unifying all of the archives produced by individual downloaders into a single collection
[06:15] really? I just downloaded that
[06:15] the next one is a rar file
[06:16] db48x: what do u mean unifying, they're only 4 chunks
[06:16] 3 chunks
[06:16] that's probably my piece
[06:17] bsmith095: there are lots of others that haven't yet been uploaded to archive.org
[06:17] u mean the poetry.xom archive
[06:17] -x +c
[06:18] yes
[06:20] http://washingtondc.craigslist.org/nva/sad/2714010065.html
[06:20] I really want to apply for that :V
[06:22] apply
[06:23] I have school haha
[06:24] Also, I don't have 7 years of "professional" linux experience
[06:24] I mean, I've used it for that long, but
[06:24] fuck that
[06:24] haha
[06:24] you'd be the best person they interviewed if you were going to do it
[06:34] Wow, they're in VA?
[06:34] Interesting.
[06:54] doh
[06:56] my computer locked up :(
[06:56] I can't even ssh in
[06:56] on the other hand, it is still routing traffic
[07:03] An interesting issue.
[07:16] I guess I'm going to have to reboot it
[07:17] highly annoying
[07:27] although it does still have disk activity
[07:28] Oh my, now THIS is quite cool http://www.unseen64.net/
[07:33] Wyatt|Wor: cool
[07:51] underscor: the "experience" part is always bullshit
[08:33] a genius switched off power in my house, I have over 30 GiB of incomplete users which I won't be able to complete
[08:33] what should I do?
[08:34] hibernate your computer
[08:36] uh?
[08:36] It was switched off and now rebooted
[08:36] oh
[08:36] hmmm.
[08:37] Yikes
[08:37] I thought you were on UPS or something
[08:38] * Nemo_bis hates power company that switches off your power when you go over 3 kW without any warning :-X
[08:38] that's fucked, man
[08:38] no, that would be supposed to be unneeded
[08:38] fucked.
[08:38] lol, what?!
[08:38] 3000W it kicks a breaker?
[08:38] So they.. kill your power.. because you're using it? Or have unpaid bills?
[08:38] that's two heaters!
[08:41] no, just because you go over your power quota
[08:41] 4.5 kW costs more
[08:41] where is this? I assume you live on an island or something
[08:42] or in australia where utilities seem to suck universally
[08:42] Italy
[08:43] it's a long story, starting in the 1960s with the nationalization of energy companies
[08:43] aha.
[08:44] where I live they charge by the kwh on an increasing scale and go after you eventually if you become unreasonable
[08:44] the same here
[08:44] but this is another issue
[08:44] three fucking kilowatts
[08:45] it's a good thing, in general, but if only they gave you one or two beeps to warn you
[08:45] and it was worse before, the old mechanical counters were stricter
[08:45] Only the internet sucks here in Australia
[08:47] Nemo_bis: give alard the output of check-dld.sh
[08:47] and just finish up as much as you can
[08:47] yipdw^, ok
[08:47] I'd tar the incomplete users and upload them to batcave, makes sense?
[08:47] safest thing to do is to just requeue the incompletes
[08:47] nah
[08:48] I don't want to just delete them
[08:48] well, I don't think it makes sense -- we have until January 31 to get them
[08:48] it's just that everything else on batcave is stuff that, until we get a closer look at it, is presumably complete
[08:49] uploading known incomplete data throws a wrench into that
[08:49] I'd put them in a different directory and in a different format
[08:50] I guess it's fine if it's labeled
[08:50] just make sure SketchCow knows
[08:50] I think I get the same price per kWh.. unlimited..
[08:51] I mean, we have a contract which specifies a price per kilowatt hour..
[08:51] that's reasonable
[08:52] here there's a really low tier for like 100kwh, then it jumps by like 2x for the next few thousand kwh, then it kind of plateaus slightly lower
[08:52] the jump in the middle is to pay for the cost of residential service
[08:52] I bet if I consume lots.. I'd get a discounted rate :)
[08:53] yea but you'd have to pay for all that power
[08:53] I remember talking to some dude in Berlin, Germany - who had an industrial power contract to his flat.. Hehe
[08:53] Then again.. he had a cluster of machines in his flat :p
[09:58] ersi: Germans take their porn viewing seriously
[10:13] is batcave down?
[10:15] How many threads does that kenneth guy have going again? I'm close to 300 and nowhere NEAR his rate.
[10:18] ah, works again
[10:57] Wyatt|Wor, +1
[10:58] Nemo_bis: ?
[10:58] Wyatt|Wor, about kenneth
[10:58] Ah
[10:58] the good thing is, I don't need to do this job any longer if he goes on like this
[10:59] alard, this is the situation of incomplete users (+ about 5000 US users whose errors were not detected due to the locale problem when the site went down): http://p.defau.lt/?yd3MFeDi91WK6IU007B11A
[10:59] Nemo_bis: you can use dld-streamer.sh to start fixing those
[11:00] also, alard (and others), FYI, this is how heavy they are: http://p.defau.lt/?fCQq9XenY5_0s3yBtHse_Q -> 58207 MiB total
[11:00] find -name '.incomplete' | cut -d '/' -f 3,7 | tr '/' ':' >incompletes
[11:00] no, I won't
[11:00] ./dld-streamer.sh incompletes
[11:01] I don't have enough disk space, they were killing my memory and someone else with better tools should do such monster users apparently
[11:01] ah
[11:01] in that case
[11:01] they've been running for ten days in some cases
[11:01] build your list of usernames as above, then go to splinder.heroku.com/rescue-me :)
[11:02] db48x, what does it do?
[11:02] re-add to tracker?
[11:02] Hi all.
[11:02] hello
[11:02] yea, lets you add them back into the tracker
[11:02] ok
[11:02] they probably already were, but that way you'll know for sure
[11:02] Actually, rescue-me doesn't re-add things.
[11:03] oh?
[11:03] so, alard, can you put those users back in the queue?
[11:03] It's for adding unknown usernames to the tracker.
[11:03] ah
[11:03] (if they are not already, because you mentioned putting back 2+ days old users, and mine were 10 days old)
[11:03] (rescue-me is useful for mobileme, where people say 'could you save user X'?)
[11:04] Nemo_bis: As long as your client hasn't marked them done they will be added back to the queue.
[11:04] alard, those 5000 US users have been marked done
[11:05] Ah, I see. There may be more of those, so we'll have to do some checking later on. (Still time until January, right?)
[11:05] I'll add your list back to the todo list.
[11:08] does rsync want input list of files with null or newlines?
[11:09] Newlines, surely?
[11:10] Ah, you can configure it: -0, --from0
[11:10] This tells rsync that the rules/filenames it reads from a file
[11:10] are terminated by a null ('\0') character, not a NL, CR, or
[11:10] CR+LF
[11:10] ok
[11:12] Also: I've been playing with a new wget project yesterday, see http://www.archiveteam.org/index.php?title=Wget_with_Lua_hooks . If you have any suggestions or comments, please add them to the wiki.
[11:12] SketchCow, I'm uploading those 58 GiB of incomplete users to my splinder-broken directory in case someone can use them in some way (10 days of downloads, argh)
[11:13] only 250 KiB/s now :-/
[11:13] see you all
[11:17] All right, shift over. I'll let these run and catch everyone on the other side!
[12:18] thearchiveteam by proxy http://englishrussia.com/2011/11/25/things-that-must-not-be-forgotten/
[12:25] Soojin: cool
[12:31] wait, are there any unassigned users?
[12:34] NotGLaDOS: yes, but not many
[12:34] drat.
[12:34] heh
[12:34] there are 13000 left
[12:35] http://splinder.heroku.com/
[12:35] they might last another hour or two
[12:37] Well, I can always help!
[12:37] the more the merrier :)
[12:40] Plus, the server's in Romania, so it should help with latency or somethi- oh, wait, only Australians have a cable running along the seafloor
[12:40] Nevermind!
[12:43] I shall do... 10
[12:43] 10 concurrent sessions
[12:46] downloads*
[12:47] only 10?
[12:48] fine
[12:48] I'll do 1000
[12:49] * NotGLaDOS winds down script
[12:49] "NotGLaDOS it:maurizio71"
[12:49] \o/
[12:50] ...I got a 1GB user, didn't I
[12:50] if it said 1000MB, then yes
[12:51] most of them aren't that large though
[12:51] nope, it choked on a 0MB user.
[12:51] * NotGLaDOS is not amused
[12:51] what do you mean by choked?
[12:51] as in, decided to take as long as it wanted to
[12:51] oh, yea
[12:52] the server isn't very fast
[12:52] It shouldn't do that, as I'm not tunnelling through that server
[12:52] That's what I have Cameron_D for.
[12:52] and even the users with the least amount of data require several http connections
[12:52] ah.
[12:53] well, once this winds down, screen -dmS splinder ./dld-streamer.sh NotGLaDOS 1000
[12:53] While I wait, canabalt time!
[12:54] heh
[12:54] 1000 is pretty optimistic
[12:54] I'll probably starve my ZNC users.
[12:54] Oh well!
[12:55] indeed :)
[12:55] IT'S IN THE NAME OF ARCHIVING!
[12:55] and that's the best excuse there is
[12:56] Indeed!
[12:56] mmm
[12:56] this disk is 97%
[12:56] 4 to go until it's finished winding down.
[12:58] probably time for me to wind down as well
[13:00] wait, is it doing them simultaneous- of course it is.
[13:01] yea
[13:02] hrm
[13:02] oh dear
[13:02] I just get the feeling that the tracker has dumped a large one in as the last one for fun.
[13:02] hm?
[13:02] I have 56GB free, and although I just stopped my downloaders, I have 100 threads left winding down
[13:03] I'm doing the mobileme project, and those users are larger
[13:03] crap
[13:03] estimated size is 63GB
[13:03] yer screwed.
[13:03] yep
[13:03] "it:habbo"
[13:03] I feel sorry for that guy
[13:04] heh
[13:04] there have been a lot of weird usernames
[13:18] Mmm, popcorn
[13:19] Crap, this is going to finish spinning down by the time they're all gone, aren't they?
[13:19] * NotGLaDOS shakes fist at kenneth
[13:19] heh
[13:20] just run another one
[13:20] in another terminal
[13:20] ..that'll work?
[13:20] ooh!
[13:20] they won't get in each other's way
[13:20] * NotGLaDOS uses screen anyway
[13:20] indeed :)
[13:21] hmm
[13:21] time to do my 1000 connection dream, and knock myself off of anything that doesn't go through this HTTP proxy!
[13:21] bye, Cameron_D!
[13:22] I have 1000GB of friendster data
[13:22] "downloading it:chinachina"
[13:22] chinachinachinachinachinachinachinachina
[13:22] db48x: nice
[13:23] hrm
[13:23] Welp, there goes my dreams
[13:23] 995GB of it is already compressed though
[13:23] "Cannot allocate memory"
[13:23] "TERMINATE ALL THE SELFS"
[13:23] that is a lot of wgets
[13:24] ...right, forgot about that
[13:24] maybe 100?
[13:24] it got up to 170 before derping
[13:24] 100 is a good start
[13:24] check memory usage, iowait and bandwidth and then adjust
[13:25] ...wait, did that just allocate 170 users to me that will never complete?
[13:25] yes and no
[13:25] at some point we'll add them back to the queue
[13:25] Oh, phew.
[13:25] or you can collect the list (find -name '.incomplete' | cut -d '/' -f 3,7 | tr '/' ':' >incompletes) and run it through dld-streamer
[13:25] "180GB traffic"
[13:26] It was at 90 this morning!
[13:26] \o/
[13:26] :)
[13:26] Wait I haven't archived that much.
[13:28] Time to check my TCP buffers!
[13:28] Actually not that bad.
[13:28] bandwidth, however, just gets a 2Gigabit spike randomly \o/
[13:29] hehe, kernel memory just jumps to 70M
[13:34] 2 Gb? are splinder servers so robust?
[13:35] No idea
[13:35] They had over 1.2 billion users, they would've had to handle that traffic.
[13:36] Oh, I can just hear my server screaming for mercy.
[13:36] no, 1.3 million
[13:36] only off by three orders of magnitude
[13:36] drat.
[13:37] >MFW it's only number 99 and 100 doing the work
[13:37] Oh well, back to canabalt
[13:39] awesome, I got all the poems sorted
[13:40] Nice
[13:41] there are two tarballs left
[13:41] one contains their blog
[13:42] one contains a categorization of the poems
[13:48] oh, and a third that probably is a mixture of files from the site and poems, hrm
[13:52] "PID 5766 finished 'it:Barbabietole_Azzurre': Error - exited with status 6."
[13:52] My first error!
[13:53] * NotGLaDOS feels special
[13:54] :)
[13:54] And another one!
[14:03] Crap.
[14:03] I'm starting to get a lot of error 6
[14:11] Down to 1500 (for now)
[14:12] It'll go up.
[14:12] Oh I know
[14:12] Script crashed when I set limit to 1000
[14:12] There's about 180 users not done
[14:12] fact: I am a moron
[14:14] 25 GB free
[14:14] So here's a question: are the mobileme downloads larger in terms of filesize? I have a strong feeling that saving large files is going to be where a beefy datacentre connection will really shine.
[14:15] Wyatt: much bigger
[14:15] Wyatt: the current average is 650 MB/user
[14:16] http://memac.heroku.com/
[14:16] down from 652 MB/user an hour or two ago
[14:16] No, I mean, are individual files going to be larger?
[14:17] yea, it was a file syncing service, not a weblog host
[14:17] Or is it going to be a large number of small files (a lot of http requests; less benefit from fat pipes)
[14:17] Ahh
[14:17] Okay, neat
[14:17] There should be passwords in there..
[14:17] NO, DONT THINK LIKE THAT.
[14:18] I'm going to wind my script down.
[14:18] * db48x yawns
[14:18] yea, we're starting to overload it
[14:19] two status 6s in a row!
[14:20] I need to cough up for a new zfs array
[14:22] 200 to go!
[14:22] then I can have one for the archives and one for my own stuff
[14:22] Then we can re-add!
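A consolidated sketch of the requeue steps scattered through this log: collect the leftover .incomplete markers into a country:username list and feed that list to dld-streamer instead of asking the tracker. The cut field numbers assume the data/<country>/<letter prefixes>/<username> layout visible in the paths quoted further down; "incompletes" is just a scratch filename.

  cd splinder-grab
  # turn data/it/R/Re/Rei/Rei-chan/.incomplete into "it:Rei-chan", one entry per line
  find -name '.incomplete' | cut -d '/' -f 3,7 | tr '/' ':' > incompletes
  # hand the list back to the streamer rather than pulling new users from the tracker
  ./dld-streamer.sh incompletes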
[14:23] Soon, kenneth will run out of users, so I'll be all over it
[14:23] MUAHAHAHA
[14:23] :)
[14:23] wait, status 5?
[14:23] [2011-11-25 14:24:09+00:00] 84/100 PID 15466 finished 'it:Spazio_ai_Giovani': Error - exited with status 5.
[14:24] Now it's just spitting status 6s and 5s at me
[14:24] the proxies are dying
[14:24] ah
[14:25] happened a couple of days ago too
[14:28] ...so we're doing negative users now..
[15:09] * db48x yawns
[15:09] well, I must sleep
[15:09] happy archiving
[15:12] o/
[15:12] 10 processes to wind down, then sleeeeeep
[15:12] * PatC looks at the clock
[15:12] *1013*
[15:13] 2312 here.
[15:13] ahh
[15:13] Aussie?
[15:15] West Australian.
[15:15] Cool!
[15:37] meh, it:vanillaaa finish already!
[16:20] mornin
[16:26] Mornin'
[17:01] alard: That looks freakin' awesome
[17:03] (the lua hooks)
[17:25] here's the site for that Russian guy with the phonographs, I think http://staroeradio.ru/collection
[21:54] http://metaception.com/pepper
[22:26] dld-streamer automatically retries incompletes? Is that what I was told yesterday?
[22:28] automatically? not unless someone added that since wed.
[22:29] dld-single retries 5 times
[22:29] Ah, okay
[22:30] So do that find thing
[22:31] dld-streamer as of wednesday has an optional parameter to provide a list of items to fetch (rather than ask the tracker)
[22:35] I found something called js-wikireader
[22:36] https://github.com/antimatter15/js-wikireader
[22:39] godane: I went to governor's school with him!
[22:39] haha
[22:43] http://www.youtube.com/watch?v=e3KIyXuZJGY
[22:51] SketchCow: in which I learn that you are awesome, teresa has been a friend of mine for some years, we were talking about some things, I pointed her to a speech of yours in which you mentioned geocities, and she was like "hey i had some stuff on geocities" and I was like "yeah, talk to jason, he might have it"
[22:52] YOU HAD IT
[22:54] SketchCow: I also might be off work for a couple of weeks again, so if there's stuff that needs describing, I might be your man
[23:19] back from t-giving.. have splinder users still downloading, incredible
[23:20] a lot of them were injected back into the todo queue
[23:21] I still have downloads going on too, though, which is nuts
[23:22] [ec2-user@ip-10-80-146-172 Rei-chan]$ pwd
[23:22] /home/ec2-user/splinder-grab/data/it/R/Re/Rei/Rei-chan
[23:22] [ec2-user@ip-10-80-146-172 Rei-chan]$ ls -l *warc*
[23:22] -rw-rw-r-- 1 ec2-user ec2-user 110807580 Nov 25 23:22 splinder.com-Rei-chan-blog-touchingthestars.splinder.com.warc.gz
[23:24] heh, mine lasted up to ten days (and counting)
[23:24] man... I started the ec2 instance thinking "oh, this will only be 5 days at most..."
[23:24] er, wait
[23:24] Rei-chan is actually a busted account
[23:24] check this out: http://www.splinder.com/myblog/comment/list/4212591/48159251?from=400
[23:25] before splinder extended their closure date
[23:25] try to click any of the navigation links
[23:25] you will be sent to the same page
[23:25] wtf
[23:25] yay. spidertraps.
[23:25] is it?
[23:25] the account does have some legitimate content in it
[23:25] I haven't looked yet
[23:25] or something that looks human-generated
[23:26] I didn't mean an intentional trap
[23:26] oh
[23:26] friendster had a lot of accidental shit that created spider traps
[23:26] alard: I think we need a way of flagging accounts as "cannot archive fully" or some such; see http://www.splinder.com/myblog/comment/list/4212591/48159251 and click the navigation links for an example
[23:27] server on fire?
[23:27] well
[23:27] alard: come to think of it, maybe we don't, because iirc wget doesn't try to retrieve URLs it's already seen
[23:27] or does it
[23:28] it shouldn't
[23:28] I mean, it shouldn't, assuming that it assumes GET is idempotent
[23:28] and it shouldn't go offsite, either
[23:28] so this should complete at some point, it'll just be a fucking long grab
[23:28] which of those links goes to another splinder site?
[23:28] the spam links?
[23:28] I don't know
[23:28] none, as far as I can tell
[23:29] haha
[23:29] "penis van lesbian"
[23:29] is that like an ice cream truck?
[23:29] I was thinking Dick van Patten
[23:30] oh fuck
[23:30] 2011-11-25 23:28:42 URL:http://www.splinder.com/splinder_noconn.html [1402/1402] -> "./tmpfs/it/Rei-chan/www.splinder.com/splinder_noconn.html" [1]
[23:30] I hope that doesn't mean I missed something
[23:30] ...
[23:30] noconn? great
[23:31] yeah
[23:31] is that an overload error from a reverse proxy gateway?
[23:31] it's a maintenance page
[23:31] fuck on a stick
[23:31] fuck HTTP status codes, we're doing this web style
[23:32] I don't know what it is
[23:32] but I just saw it in the Rei-chan wget log
[23:32] that page looks similar to the US page
[23:32] checking others
[23:33] doing a massive rgrep
[23:34] er wait
[23:34] I just want the logs
[23:34] i r smrt
[23:34] well done
[23:34] ugh, this isn't looking good on my end
[23:35] https://gist.github.com/a15c7707ee666502a825
[23:38] looking quite bad here, too
[23:39] hmm
[23:39] not so bad for me it seems
[23:40] https://gist.github.com/0427b4ed12ae48f2fb5f
[23:40] at home. let's check the ec2
[23:41] Coderjoe: when did they start happening
[23:41] * Nemo_bis has some as well :-(
[23:42] WyattL yo, around?
[23:42] Wyatt: yo, around?
[23:45] closure: Yeah?
[23:57] any idea why the check/fix scripts only check the US profiles for 502/504?
[23:58] well, 500 errors, not just 502/504
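A rough version of the "massive rgrep" described above, for spotting users whose grab hit the splinder_noconn.html maintenance page or a 500-class error. It assumes per-user wget logs and saved pages end up somewhere under the splinder-grab working directory; the data/ path and the log contents matched here are assumptions about the scripts' layout, not something confirmed in the log.

  cd splinder-grab
  # files (wget logs or saved pages) that mention the maintenance placeholder
  grep -rl 'splinder_noconn\.html' data/ | sort -u
  # wget records failed requests as "ERROR 5xx", so catch 500-class responses too
  grep -rlE 'ERROR 50[0-9]' data/ | sort -u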