#archiveteam 2011-11-25,Fri


Time Nickname Message
01:47 🔗 PatC Hey i'm not sure if you guys heard
01:47 🔗 PatC "As of January 31, 2012, all waves will be read-only"
01:47 🔗 PatC "and the Wave service will be turned off on April 30, 2012"
01:47 🔗 PatC via google wave email
02:04 🔗 NotGLaDOS We have.
02:04 🔗 NotGLaDOS Coderjoe: whoops.
02:18 🔗 bsmith095 archiving google wave now? can i get in on that?
02:18 🔗 PatC same
03:29 🔗 Wyatt|Wor Looks like splinder is going to be cutting it close. How long do we got?
03:49 🔗 bsmith095 i'd like to know as well if we're going to make it
03:50 🔗 yipdw^ for Splinder?
03:50 🔗 PatC What is
03:50 🔗 PatC Splinder?
03:50 🔗 yipdw^ http://www.archiveteam.org/index.php?title=Splinder
03:51 🔗 PatC How big are the downloads?
03:52 🔗 yipdw^ highly variable
03:52 🔗 yipdw^ we've gotten most of the 1-10 MB ones already
03:52 🔗 PatC ok, i'll give it a shot
03:52 🔗 yipdw^ the accounts coming in now are much larger
03:52 🔗 yipdw^ we've got until January 31, 2012, though, so don't go crazy
03:52 🔗 SketchCow Poor waves
03:53 🔗 yipdw^ the last time we had a huge flood of people hitting Splinder, their infrastructure couldn't handle it
03:53 🔗 PatC SketchCow, agreed
03:53 🔗 yipdw^ and so we ended up with a lot of incompletes
03:53 🔗 SketchCow I think Splinder is well at hand.
03:53 🔗 yipdw^ yeah, it's good
03:53 🔗 PatC yipdw^, ok i'll wait a little
03:53 🔗 SketchCow We have plenty of people on it, enough to get it
03:53 🔗 PatC ok
03:55 🔗 yipdw^ was it actually possible to make a public wave?
03:55 🔗 yipdw^ the waves I've got are all restricted to a small group
03:56 🔗 Wyatt|Wor Oh, wait, really?
03:56 🔗 PatC That was my concern :/
03:57 🔗 Wyatt|Wor I thought Splinder was supposed to go down like...today or something.
03:57 🔗 yipdw^ the official announcement is January 31
03:59 🔗 PatC I didn't notice Splinder on the projects wiki page
03:59 🔗 yipdw^ oh, speaking of which, I wonder how bad my AWS bill now is
04:00 🔗 yipdw^ ooh.
04:00 🔗 yipdw^ well it's not as bad as the Friendster days
04:02 🔗 yipdw^ actually, I say that most of the small Splinder accounts have been grabbed, but according to http://splinder.heroku.com/ kennethre is still pulling a bunch of them
04:02 🔗 yipdw^ weird
04:02 🔗 yipdw^ maybe they're all requeues
04:04 🔗 Wyatt|Wor Yeah, a couple more people showed up and owned all over my measly 29GB ;) He's really kicking ass.
04:04 🔗 yipdw^ he's controlling something like 360 dynos at Heroky
04:04 🔗 yipdw^ er, Heroku
04:04 🔗 yipdw^ fuck LOIC, that man can nuke sites from orbit single-handedly
04:05 🔗 Wyatt|Wor Wow.
04:05 🔗 Wyatt|Wor I think I'm up to...150 threads or so?
04:06 🔗 Wyatt|Wor (Had to google for what a dyno was, heh)
04:06 🔗 yipdw^ oh heh
04:06 🔗 yipdw^ he might be doing more, it was all in the #splinder logs at one point
04:07 🔗 yipdw^ er, is in the logs
04:13 🔗 SketchCow Splinder shifted the date.
04:13 🔗 SketchCow That's what's missing.
04:18 🔗 Paradoks PatC: I took Splinder out of the "Projects with BASH scripts that need more people running them" when it hit 0 left. Even now, I think it's pretty well covered. Not that it shouldn't be elsewhere on the projects page.
04:21 🔗 PatC Ok
05:08 🔗 Wyatt|Wor Is there anything out there that is likely to have old DNS info cached or otherwise accessibly saved?
05:21 🔗 Coderjoe what info and how old?
05:25 🔗 Wyatt|Wor Oh, I'm trying to verify the IP for a site as of...about a month ago? Some reseller on our super legacy Sphera platform went and canceled or something and left one of his clients high and dry.
05:27 🔗 Coderjoe i'm not aware of anything that would have data that old
05:27 🔗 Wyatt|Wor I'm trying to figure out if we've even got a backup, but none of the usernames she's mentioned exist. And compared to how things are on some servers, we have amazingly good backups for this one. I've checked back to June.
05:28 🔗 Wyatt|Wor whois.sc claims to have some historic IP data, but that's $30/month for their stuff.
05:36 🔗 bsmith095 one month is old for dns
05:36 🔗 bsmith095 wow
05:36 🔗 underscor So, you know what would be cool
05:36 🔗 underscor Craigslist archive
05:37 🔗 DFJustin Wyatt|Wor: have you tried netcraft
05:37 🔗 Paradoks Isn't Craigslist actively hostile to pretty much anything that uses Craiglist?
05:37 🔗 Paradoks Not that that should stop us, but it would make it difficult, I think.
05:38 🔗 Wyatt|Wor DFJustin: Not yet; taking a look now
05:38 🔗 underscor Paradoks: WE FUCKIN' ARCHIVED POETRY.COM, GET THAT BLASPHEMY OUTTA HERE
05:38 🔗 underscor Just kidding
05:38 🔗 underscor But still, I bet we could do it
05:39 🔗 underscor I mean, it's not like we're posting
05:39 🔗 dnova Paradoks: yes
05:39 🔗 Wyatt|Wor That's an interesting moving target. Don't they delete older things?
05:39 🔗 dnova they are hostile to anything that uses craigslist
05:39 🔗 dnova they go after people who make abstraction layers that make craigslist more useful and I just don't get that
05:39 🔗 Wyatt|Wor I tried to put my desk on craigslist. It was surprisingly difficult.
05:39 🔗 bsmith095 can i get a copy of poetry.com? is that available yet?
05:39 🔗 dnova they don't insert ads or anything
05:39 🔗 Paradoks We archived a small portion of poetry.com, which was totally worth it. And it was totally cool how irritated we made the bastards by attempting to save the information they wanted to destroy.
05:40 🔗 dnova they wanted to /sell/
05:40 🔗 dnova the only worth that company had was their user content
05:40 🔗 bsmith095 is that archive public?
05:40 🔗 DFJustin http://www.archive.org/details/archiveteam-poetrydotcom
05:41 🔗 DFJustin this isn't everything we got though
05:41 🔗 bsmith095 which was already publicly available anyway, so how is it worth anything to sell it?
05:41 🔗 dnova without the content, what are they?
05:42 🔗 bsmith095 DFJustin: so where's the rest.. just curious
05:42 🔗 dnova a poetry website with no users and nothing to read
05:42 🔗 dnova i.e. worthless
05:42 🔗 underscor Paradoks: Go ahead and rejoin #Magicallydelicious
05:42 🔗 Paradoks bsmith: Theoretically they controlled the publicly available content.
05:42 🔗 DFJustin pending sketchcow organization
05:42 🔗 underscor (if you want)
05:42 🔗 db48x I'm nearly done organizing that
05:42 🔗 DFJustin there's some value in an english word domain name I'm sure
05:42 🔗 Paradoks Heheh. I mostly just checked in to see if you're still there. I do wonder if delicious will ever get suitably archived, though.
05:43 🔗 underscor Coderjoe: Spread some ops around? we're getting low
05:43 🔗 bsmith095 ok but theoretically aol owns happy birthday, doesn't mean they can meaningfully enforce that
05:43 🔗 dnova they CAN
05:43 🔗 dnova nobody in tv or movies can sing it without mpaying, for example
05:43 🔗 dnova -m
05:45 🔗 Paradoks I like to sing, "Good morning to you" and let people infer what's needed.
05:46 🔗 Paradoks bsmith: re: delicious -- It seemed quite likely, once upon a time, that Yahoo was going to shut down delicious and the various user-created lists would be lost. Where user-created content is threatened, there Archive Team will go.
05:49 🔗 dnova who here is responsible for the heroku deal
05:50 🔗 dnova I don't really understand what heroku is but it looks expensive!
05:51 🔗 underscor kennethre is
05:51 🔗 underscor But it's free because he's using the "one free dyno" thing
05:51 🔗 dnova awesome
05:51 🔗 Paradoks dnova: And Alard set up the splinder/anyhub/mobileme.heroku.com tracking things.
05:52 🔗 dnova alard is beyond amazing
05:53 🔗 Paradoks Agreed.
05:53 🔗 bsmith095 im running dld-streamer 40, am i making any kind of a dent at all?
05:53 🔗 Coderjoe underscor: spread ops where?
05:54 🔗 dnova bsmith095: you can run 200-500 threads on even pretty modest hardware if you are ok with using lots of cpu
05:54 🔗 Wyatt|Wor Oh yeah, anyhub. :( looks like we only got about half of it?
05:55 🔗 Paradoks bsmith095: You're still getting noticeable quantities of data, though. It's still useful, even if you don't make the top 10 downloaders.
05:56 🔗 dnova I wish the heroku thing had stats for all participants
06:00 🔗 dnova but yeah, bsmith, another thing is the deadline was extended bigtime so it's not a serious crunch for us now, so don't worry too much about going nuts with the # of threads
06:01 🔗 bsmith095 just restarted streamer up to 200 and wow this laptop is running great!
06:01 🔗 dnova ahhh excellent :D
06:01 🔗 dnova I wonder how many kenneth is running!!
06:01 🔗 Wyatt|Wor Huh, was there a heroku for mobileme?
06:02 🔗 Wyatt|Wor mobileme.h.c isn't working.
06:02 🔗 dnova sounds like the person doing it can only have 1 at a time
06:03 🔗 db48x Wyatt|Wor: memac.heroku.com
06:03 🔗 Wyatt|Wor Ah
06:04 🔗 Wyatt|Wor How's their bandwidth? I'll have to get in on that once I consolidate all my Splinder.
06:05 🔗 db48x Wyatt|Wor: how is whose bandwidth?
06:06 🔗 dnova alard is going to have to dump more claimed but unreturned users into the project again, right?
06:06 🔗 dnova probably a few times?
06:06 🔗 Wyatt|Wor I guess mobileme?
06:06 🔗 db48x I don't think we have seen any indication that mobileme was failing under the load
06:06 🔗 db48x but then we hadn't really put out the word yet when splinder came down the line
06:07 🔗 dnova yeah this is the first I'm hearing anything about mobileme
06:07 🔗 db48x I've got 130 megs of compressed poetry left to sort, out of 1.3 gigs
06:08 🔗 Wyatt|Wor Wiki says 200TB by June something...that's a hell of a pull.
06:08 🔗 dnova fuck.
06:08 🔗 Wyatt|Wor I'm gonna need more space on my VPS.
06:09 🔗 db48x heh, yea
06:09 🔗 db48x I'm running very low myself
06:09 🔗 db48x only 225 gigs left
06:10 🔗 bsmith095 even with today's obscenely huge storage capacity, that's still maybe 5 or 6 cubic feet of *really* huge drives
06:10 🔗 dnova that's only 50 hard drives
06:10 🔗 dnova I don't know if 50 hard drives is 5 cubic feet
06:10 🔗 bsmith095 wtf?!?! 50 where do you shop?
06:11 🔗 dnova 4tb hard drives are around
06:11 🔗 db48x hmm
06:11 🔗 db48x I've seen 4tb externals, but those were two drives in an enclosure
06:11 🔗 dnova there are real actual 4tb hard drives
06:12 🔗 Wyatt|Wor Filesystem overhead. Probably closer to 60 if you use exclusively 4TB disks.
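Rough arithmetic behind those figures, taking the 200 TB estimate at face value:

    200 TB / 4 TB per drive           = 50 drives (nominal)
    with ~5-10% filesystem overhead, each drive holds roughly 3.6-3.8 TB usable
    200 TB / ~3.6 TB usable per drive ≈ 56, hence "closer to 60" drives in practice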
06:12 🔗 dnova pff ok :P
06:12 🔗 bsmith095 *really*, i've heard of 2tb but never actually seen one, but i bought a 1500gb one for $65 on ebay, then it fell off a table, boom, click of death, about a dollar per mb to get it back?!
06:12 🔗 Wyatt|Wor That and things usually don't fit neatly.
06:12 🔗 bsmith095 pardon me, dollar per gig, but still, damn clean rooms
06:13 🔗 dnova 3tb also exists and has for a while
06:13 🔗 DFJustin just got my 2tb luckily before the floods hit
06:13 🔗 Wyatt|Wor I have a 2GB on my desk at home. It's at least half for archiveteam stuff
06:13 🔗 bsmith095 how do we keep doing this? i mean the tech specs of how they cram that many little magnetic flux thingies on a spinning platter?
06:13 🔗 DFJustin now I need to fill it with mobileme
06:14 🔗 bsmith095 Wyatt|Wor: u mean tb right?
06:14 🔗 Wyatt|Wor Err, I have a 2TB. I haven't had a 2GB in forever.
06:15 🔗 db48x hrm, there's a rar file in here
06:15 🔗 bsmith095 where and what collection
06:15 🔗 db48x in the poetry collection
06:15 🔗 db48x I'm unifying all of the archives produced by individual downloaders into a single collection
06:15 🔗 bsmith095 really? i just downloaded that
06:15 🔗 db48x the next one is a rar file
06:16 🔗 bsmith095 db48x: what do u mean unifying, they're only 4 chunks
06:16 🔗 bsmith095 3 chunks
06:16 🔗 DFJustin that's probably my piece
06:17 🔗 db48x bsmith095: there are lots of others that haven't yet been uploaded to archive.org
06:17 🔗 bsmith095 u mean the poetry.xom archive
06:17 🔗 bsmith095 -x +c
06:18 🔗 db48x yes
06:20 🔗 underscor http://washingtondc.craigslist.org/nva/sad/2714010065.html
06:20 🔗 underscor I really want to apply for that :V
06:22 🔗 dnova apply
06:23 🔗 underscor I have school haha
06:24 🔗 underscor Also, I don't have 7 years of "professional" linux experience
06:24 🔗 underscor I mean, I've used it for that long, but
06:24 🔗 dnova fuck that'
06:24 🔗 underscor haha
06:24 🔗 dnova you'd be the best person they interviewed if you were going to do it
06:34 🔗 Wyatt|Wor Wow, They're in VA?
06:34 🔗 Wyatt|Wor Interesting.
06:54 🔗 db48x doh
06:56 🔗 db48x my computer locked up :(
06:56 🔗 db48x I can't even ssh in
06:56 🔗 db48x on the other hand, it is still routing traffic
07:03 🔗 Wyatt|Wor An interesting issue.
07:16 🔗 db48x I guess I'm going to have to reboot it
07:17 🔗 db48x highly annoying
07:27 🔗 db48x although it does still have disk activity
07:28 🔗 Wyatt|Wor Oh my, now THIS is quite cool http://www.unseen64.net/
07:33 🔗 db48x Wyatt|Wor: cool
07:51 🔗 ersi underscor: the "experience" part is always bullshit
08:33 🔗 Nemo_bis a genius switched off power in my house, I have over 30 GiB of incomplete users which I won't be able to complete
08:33 🔗 Nemo_bis what should I do?
08:34 🔗 chronomex hibernate your computer
08:36 🔗 Nemo_bis uh?
08:36 🔗 Nemo_bis It was switched off and now rebooted
08:36 🔗 chronomex oh
08:36 🔗 chronomex hmmm.
08:37 🔗 ersi Yikes
08:37 🔗 chronomex I thought you were on UPS or something
08:38 🔗 * Nemo_bis hates power company that switches off your power when you go over 3 kW without any warning :-X
08:38 🔗 chronomex that's fucked, man
08:38 🔗 Nemo_bis no, that was supposed to be unneeded
08:38 🔗 chronomex fucked.
08:38 🔗 ersi lol, what?!
08:38 🔗 chronomex 3000W it kicks a breaker?
08:38 🔗 ersi So they.. kill your power.. because you're using it? Or have unpaid bills?
08:38 🔗 chronomex that's two heaters!
08:41 🔗 Nemo_bis no, just because you go over your power quota
08:41 🔗 Nemo_bis 4.5 kW costs more
08:41 🔗 chronomex where is this? I assume you live on an island or something
08:42 🔗 chronomex or in australia where utilities seem to suck universally
08:42 🔗 Nemo_bis Italy
08:43 🔗 Nemo_bis it's a long story, starting in 1960s with the nationalization of energy companies
08:43 🔗 chronomex aha.
08:44 🔗 chronomex where I live they charge by the kwh on an increasing scale and go after you eventually if you become unreasonable
08:44 🔗 Nemo_bis the same here
08:44 🔗 Nemo_bis but this is another issue
08:44 🔗 chronomex three fucking kilowatts
08:45 🔗 Nemo_bis it's a good thing, in general, but if only they gave you one or two beeps to warn you
08:45 🔗 Nemo_bis and it was worse before, the old mechanical counters were stricter
08:45 🔗 Cameron_D Only the internet sucks here in Australia
08:47 🔗 yipdw^ Nemo_bis: give alard the output of check-dld.sh
08:47 🔗 yipdw^ and just finish up as much as you can
08:47 🔗 Nemo_bis yipdw^, ok
08:47 🔗 Nemo_bis I'd tar the incomplete users and upload them to batcave, makes sense?
08:47 🔗 yipdw^ safest thing to do is to just requeue the incompletes
08:47 🔗 yipdw^ nah
08:48 🔗 Nemo_bis I don't want to just delete them
08:48 🔗 yipdw^ well, I don't think it makes sense -- we have until January 31 to get them
08:48 🔗 yipdw^ it's just that everything else on batcave is stuff that, until we get a closer look at it, is presumably complete
08:49 🔗 yipdw^ uploading known incomplete data throws a wrench into that
08:49 🔗 Nemo_bis I'd put them in a different directory and in a different format
08:50 🔗 yipdw^ I guess it's fine if it's labeled
08:50 🔗 yipdw^ just make sure SketchCow knows
08:50 🔗 ersi I think I get the same price per kwH.. unlimited..
08:51 🔗 ersi I mean, we have a contract which specifies a price per kilowatt hour..
08:51 🔗 chronomex that's reasonable
08:52 🔗 chronomex here there's a really low tier for like 100kwh, then it jumps by like 2x for the next few thousand kwh, then it kind of plateaus slightly lower
08:52 🔗 chronomex the jump in the middle is to pay for the cost of residential service
08:52 🔗 ersi I bet if I consume lots.. I'd get a discounted rate :)
08:53 🔗 chronomex yea but you'd have to pay for all that power
08:53 🔗 ersi I remember talking to some dude in Berlin, Germany - who had an industrial power contract to his flat.. Hehe
08:53 🔗 ersi Then again.. he had a cluster of machines in his flat :p
09:58 🔗 RedType ersi: germans take their porn viewing seriously
10:13 🔗 ndurner_w is batcave down?
10:15 🔗 Wyatt|Wor How many threads does that kenneth guy have going again? I'm close to 300 and nowhere NEAR his rate.
10:18 🔗 ndurner_w ah, works again
10:57 🔗 Nemo_bis Wyatt|Wor, +1
10:58 🔗 Wyatt|Wor Nemo_bis: ?
10:58 🔗 Nemo_bis Wyatt|Wor, about kenneth
10:58 🔗 Wyatt|Wor Ah
10:58 🔗 Nemo_bis the good thing is, I don't need to do this job any longer if he goes on like this
10:59 🔗 Nemo_bis alard, this is the situation of incomplete users (+ about 5000 us users whose errors were not detected due to the locale problem when the site went down): http://p.defau.lt/?yd3MFeDi91WK6IU007B11A
10:59 🔗 db48x Nemo_bis: you can use dld-streamer.sh to start fixing those
11:00 🔗 Nemo_bis also, alard (and others), FYI, this is how heavy they are: http://p.defau.lt/?fCQq9XenY5_0s3yBtHse_Q -> 58207 MiB total
11:00 🔗 db48x find -name '.incomplete' | cut -d '/' -f 3,7 | tr '/' ':' >incompletes
11:00 🔗 Nemo_bis no, I won't
11:00 🔗 db48x ./dld-streamer.sh <nick> <threads> incompletes
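A minimal sketch of that re-queue sequence chained together, assuming the splinder-grab scripts are run from their checkout directory and store users under ./data/<country>/<X>/<XY>/<XYZ>/<username>/ with a .incomplete marker (so field 3 of the find output is the country code and field 7 the username); the nick and thread count below are placeholders:

    cd ~/splinder-grab                         # wherever the grab scripts live (assumed)
    find . -name '.incomplete' \
      | cut -d '/' -f 3,7 \
      | tr '/' ':' > incompletes               # one "country:username" per line, e.g. it:Rei-chan
    ./dld-streamer.sh yournick 50 incompletes  # nick, concurrent downloads, optional list file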
11:01 🔗 Nemo_bis I don't have enough disk space, they were killing my memory, and apparently someone else with better tools should handle such monster users
11:01 🔗 db48x ah
11:01 🔗 db48x in that case
11:01 🔗 Nemo_bis they've been running for ten days in some cases
11:01 🔗 db48x build your list of usernames as above, then go to splinder.heroku.com/rescue-me :)
11:02 🔗 Nemo_bis db48x, what does it do?
11:02 🔗 Nemo_bis re-add to tracker?
11:02 🔗 alard Hi all.
11:02 🔗 Nemo_bis hello
11:02 🔗 db48x yea, lets you add them back into the tracker
11:02 🔗 Nemo_bis ok
11:02 🔗 db48x they probably already were, but that way you'll know for sure
11:02 🔗 alard Actually, rescue-me doesn't re-add things.
11:03 🔗 db48x oh?
11:03 🔗 Nemo_bis so, alard, can you put those users back in the queue?
11:03 🔗 alard It's for adding unknown usernames to the tracker.
11:03 🔗 db48x ah
11:03 🔗 Nemo_bis (if they are not already, because you mentioned putting back 2+ days old users, and mine were 10 days old)
11:03 🔗 alard (rescue-me is useful for mobileme, where people say 'could you save user X'?)
11:04 🔗 alard Nemo_bis: As long as your client hasn't marked them done they will be added back to the queue.
11:04 🔗 Nemo_bis alard, those 5000 us users have been marked done
11:05 🔗 alard Ah, I see. There may be more of those, so we'll have to do some checking later on. (Still time until January, right?)
11:05 🔗 alard I'll add your list back to the todo list.
11:08 🔗 Nemo_bis does rsync want input list of files with null or newlines?
11:09 🔗 alard Newlines, surely?
11:10 🔗 alard Ah, you can configure it: -0, --from0
11:10 🔗 alard This tells rsync that the rules/filenames it reads from a file
11:10 🔗 alard are terminated by a null (’\0’) character, not a NL, CR, or
11:10 🔗 alard CR+LF.
11:10 🔗 Nemo_bis ok
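A minimal sketch of how a null-terminated list could be fed to rsync for the incomplete users; the destination host and path are placeholders, and the options beyond -0/--files-from are assumptions rather than the project's actual upload command:

    # build a \0-separated list of the directories still holding a .incomplete marker
    find . -name '.incomplete' -printf '%h\0' > incomplete-dirs.lst
    # -0/--from0 marks the list as null-separated; -r is needed because
    # --files-from does not recurse into the listed directories by itself
    rsync -az -r -0 --files-from=incomplete-dirs.lst . user@batcave.example.org:splinder-broken/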
11:12 🔗 alard Also: I've been playing with a new wget project yesterday, see http://www.archiveteam.org/index.php?title=Wget_with_Lua_hooks . If you have any suggestions or comments, please add them to the wiki.
11:12 🔗 Nemo_bis SketchCow, I'm uploading those 58 GiB of incomplete users to my splinder-broken directory in case someone can use them in some way (10 days of downloads, argh)
11:13 🔗 Nemo_bis only 250 KiB/s now :-/
11:13 🔗 Nemo_bis see you all
11:17 🔗 Wyatt|Wor All right, shift over. I'll let these run and catch everyone on the other side!
12:18 🔗 Soojin thearchiveteam by proxy http://englishrussia.com/2011/11/25/things-that-must-not-be-forgotten/
12:25 🔗 db48x Soojin: cool
12:31 🔗 NotGLaDOS wait, are there any unassigned users?
12:34 🔗 db48x NotGLaDOS: yes, but not many
12:34 🔗 NotGLaDOS drat.
12:34 🔗 db48x heh
12:34 🔗 db48x there are 13000 left
12:35 🔗 db48x http://splinder.heroku.com/
12:35 🔗 db48x they might last another hour or two
12:37 🔗 NotGLaDOS Well, I can always help!
12:37 🔗 db48x the more the merrier :)
12:40 🔗 NotGLaDOS Plus, the server's in Romania, so it should help with latency or somethi- oh, wait, only Australians have a cable running along the seafloor
12:40 🔗 NotGLaDOS Nevermind!
12:43 🔗 NotGLaDOS I shall do... 10
12:43 🔗 NotGLaDOS 10 concurrent sessions
12:46 🔗 NotGLaDOS downloads*
12:47 🔗 Cameron_D only 10?
12:48 🔗 NotGLaDOS fine
12:48 🔗 NotGLaDOS I'll do 1000
12:49 🔗 * NotGLaDOS winds down script
12:49 🔗 NotGLaDOS "NotGLaDOS it:maurizio71"
12:49 🔗 NotGLaDOS \o/
12:50 🔗 NotGLaDOS ...I got a 1GB user, didn't I
12:50 🔗 db48x if it said 1000MB, then yes
12:51 🔗 db48x most of them aren't that large though
12:51 🔗 NotGLaDOS nope, it choked on a 0MB user.
12:51 🔗 * NotGLaDOS is not amused
12:51 🔗 db48x what do you mean by choked?
12:51 🔗 NotGLaDOS as in, decided to take as long as it wanted to
12:51 🔗 db48x oh, yea
12:52 🔗 db48x the server isn't very fast
12:52 🔗 NotGLaDOS It shouldn't do that, as I'm not tunnelling through that server
12:52 🔗 NotGLaDOS That's what I have Cameron_D for.
12:52 🔗 db48x and even the users with the least amount of data require several http connections
12:52 🔗 NotGLaDOS ah.
12:53 🔗 NotGLaDOS well, once this winds down, screen -dmS splinder ./dld-streamer.sh NotGLaDOS 1000
12:53 🔗 NotGLaDOS While I wait, canabalt time!
12:54 🔗 db48x heh
12:54 🔗 db48x 1000 is pretty optimistic
12:54 🔗 NotGLaDOS I'll probably starve my ZNC users.
12:54 🔗 NotGLaDOS Oh well!
12:55 🔗 db48x indeed :)
12:55 🔗 NotGLaDOS IT'S IN THE NAME OF ARCHIVING!
12:55 🔗 db48x and that's the best excuse there is
12:56 🔗 NotGLaDOS Indeed!
12:56 🔗 db48x mmm
12:56 🔗 db48x this disk is 97%
12:56 🔗 NotGLaDOS 4 to go until it's finished winding down.
12:58 🔗 db48x probably time for me to wind down as well
13:00 🔗 NotGLaDOS wait, is it doing them simultaneous- of course it is.
13:01 🔗 db48x yea
13:02 🔗 db48x hrm
13:02 🔗 db48x oh dear
13:02 🔗 NotGLaDOS I just get the feeling that the tracker has dumped a large one in as the last one for fun.
13:02 🔗 NotGLaDOS hm?
13:02 🔗 db48x I have 56GB free, and although I just stopped my downloaders, I have 100 threads left winding down
13:03 🔗 db48x I'm doing the mobileme project, and those users are larger
13:03 🔗 NotGLaDOS crap
13:03 🔗 db48x estimated size is 63GB
13:03 🔗 NotGLaDOS yer screwed.
13:03 🔗 db48x yep
13:03 🔗 NotGLaDOS "it:habbo"
13:03 🔗 NotGLaDOS I feel sorry for that guy
13:04 🔗 db48x heh
13:04 🔗 db48x there have been a lot of weird usernames
13:18 🔗 NotGLaDOS Mmm, popcorn
13:19 🔗 NotGLaDOS Crap, they're all going to be gone by the time this finishes spinning down, aren't they?
13:19 🔗 * NotGLaDOS shakes fist at kenneth
13:19 🔗 db48x heh
13:20 🔗 db48x just run another one
13:20 🔗 db48x in another terminal
13:20 🔗 NotGLaDOS ..that'll work?
13:20 🔗 NotGLaDOS ooh!
13:20 🔗 db48x they won't get in each other's way
13:20 🔗 * NotGLaDOS uses screen anyway
13:20 🔗 db48x indeed :)
13:21 🔗 db48x hmm
13:21 🔗 NotGLaDOS time to do my 1000 connection dream, and knock myself off of anything that doesn't go through this HTTP proxy!
13:21 🔗 NotGLaDOS bye, Cameron_D!
13:22 🔗 db48x I have 1000GB of friendster data
13:22 🔗 NotGLaDOS "downloading it:chinachina"
13:22 🔗 NotGLaDOS chinachinachinachinachinachinachinachina
13:22 🔗 NotGLaDOS db48x: nice
13:23 🔗 db48x hrm
13:23 🔗 NotGLaDOS Welp, there goes my dreams
13:23 🔗 db48x 995GB of it is already compressed though
13:23 🔗 NotGLaDOS "Cannot allocate memory"
13:23 🔗 NotGLaDOS "TERMINATE ALL THE SELFS"
13:23 🔗 db48x that is a lot of wgets
13:24 🔗 NotGLaDOS ...right, forgot about that
13:24 🔗 NotGLaDOS maybe 100?
13:24 🔗 NotGLaDOS it got up to 170 before derping
13:24 🔗 db48x 100 is a good start
13:24 🔗 db48x check memory usage, iowait and bandwidth and then adjust
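One way to watch roughly those numbers while the streamer runs, sketched here assuming the usual procps tools are installed (bandwidth would need something like iftop or ifstat on the side):

    # every 5 seconds: memory, a vmstat sample (iowait is the "wa" column), live wget count
    watch -n 5 'free -m | head -2; vmstat 1 2 | tail -1; echo "wget procs: $(pgrep -c wget)"'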
13:25 🔗 NotGLaDOS ...wait, did that just allocate 170 users to me that will never complete?
13:25 🔗 db48x yes and no
13:25 🔗 db48x at some point we'll add them back to the queue
13:25 🔗 NotGLaDOS Oh, phew.
13:25 🔗 db48x or you can collect the list (find -name '.incomplete' | cut -d '/' -f 3,7 | tr '/' ':' >incompletes) and run it through dld-streamer
13:25 🔗 NotGLaDOS "180GB traffic"
13:26 🔗 NotGLaDOS It was at 90 this morning!
13:26 🔗 NotGLaDOS \o/
13:26 🔗 db48x :)
13:26 🔗 NotGLaDOS Wait I haven't archived that much.
13:28 🔗 NotGLaDOS Time to check my TCP buffers!
13:28 🔗 NotGLaDOS Actually not that bad.
13:28 🔗 NotGLaDOS bandwidth, however, just gets a 2Gigabit spike randomly \o/
13:29 🔗 NotGLaDOS hehe, kernel memory just jumps to 70M
13:34 🔗 Nemo_bis 2 Gb? are splinder servers so robust?
13:35 🔗 NotGLaDOS No idea
13:35 🔗 NotGLaDOS They had over 1.2 billion users, they would've had to handle that traffic.
13:36 🔗 NotGLaDOS Oh, I can just hear my server screaming for mercy.
13:36 🔗 db48x no, 1.3 million
13:36 🔗 db48x only off by three orders of magnitude
13:36 🔗 NotGLaDOS drat.
13:37 🔗 NotGLaDOS >MFW it's only number 99 and 100 doing the work
13:37 🔗 NotGLaDOS Oh well, back to canabalt
13:39 🔗 db48x awesome, I got all the poems sorted
13:40 🔗 NotGLaDOS Nice
13:41 🔗 db48x there are two tarballs left
13:41 🔗 db48x one contains their blog
13:42 🔗 db48x one contains a categorization of the poems
13:48 🔗 db48x oh, and a third that probably is a mixture of files from the site and poems, hrm
13:52 🔗 NotGLaDOS "PID 5766 finished 'it:Barbabietole_Azzurre': Error - exited with status 6."
13:52 🔗 NotGLaDOS My first error!
13:53 🔗 * NotGLaDOS feels special
13:54 🔗 db48x :)
13:54 🔗 NotGLaDOS And another one!
14:03 🔗 NotGLaDOS Crap.
14:03 🔗 NotGLaDOS I'm starting to get a lot of error 6
14:11 🔗 Wyatt Down to 1500 (for now)
14:12 🔗 NotGLaDOS It'll go up.
14:12 🔗 Wyatt Oh I know
14:12 🔗 NotGLaDOS Script crashed when I set limit to 1000
14:12 🔗 NotGLaDOS There's about 180 users not done
14:12 🔗 NotGLaDOS fact: I am a moron
14:14 🔗 db48x 25 GB free
14:14 🔗 Wyatt So here's a question: are the mobileme downloads larger in terms of filesize? I have a strong feeling that saving large files is going to be where a beefy datacentre connection will really shine.
14:15 🔗 db48x Wyatt: much bigger
14:15 🔗 db48x Wyatt: the current average is 650 MB/user
14:16 🔗 db48x http://memac.heroku.com/
14:16 🔗 db48x down from 652 MB/user an hour or two ago
14:16 🔗 Wyatt No, I mean, are individual files going to be larger?
14:17 🔗 db48x yea, it was a file syncing service, not a weblog host
14:17 🔗 Wyatt Or is it going to be a large number of small files (a lot of http requests; less benefit from fat pipes)
14:17 🔗 Wyatt Ahh
14:17 🔗 Wyatt Okay, neat
14:17 🔗 NotGLaDOS There should be passwords in there..
14:17 🔗 NotGLaDOS NO, DONT THINK LIKE THAT.
14:18 🔗 NotGLaDOS I'm going to wind my script down.
14:18 🔗 * db48x yawns
14:18 🔗 db48x yea, we're starting to overload it
14:19 🔗 NotGLaDOS two status 6s in a row!
14:20 🔗 db48x I need to cough up for a new zfs array
14:22 🔗 NotGLaDOS 200 to go!
14:22 🔗 db48x then I can have one for the archives and one for my own stuff
14:22 🔗 NotGLaDOS Then we can re-add!
14:23 🔗 NotGLaDOS Soon, kenneth will run out of users, so I'll be all over it
14:23 🔗 NotGLaDOS MUAHAHAHA
14:23 🔗 db48x :)
14:23 🔗 NotGLaDOS wait, status 5?
14:23 🔗 NotGLaDOS [2011-11-25 14:24:09+00:00] 84/100 PID 15466 finished 'it:Spazio_ai_Giovani': Error - exited with status 5.
14:24 🔗 NotGLaDOS Now it's just spitting status 6s and 5s at me
14:24 🔗 db48x the proxies are dying
14:24 🔗 NotGLaDOS ah
14:25 🔗 db48x happened a couple of days ago too
14:28 🔗 NotGLaDOS ...so we're doing negative users now..
15:09 🔗 * db48x yawns
15:09 🔗 db48x well, I must sleep
15:09 🔗 db48x happy archiving
15:12 🔗 NotGLaDOS o/
15:12 🔗 NotGLaDOS 10 processes to wind down, then sleeeeeep
15:12 🔗 * PatC looks at the clock
15:12 🔗 PatC *1013*
15:13 🔗 NotGLaDOS 2312 here.
15:13 🔗 PatC ahh
15:13 🔗 PatC Aussy?
15:15 🔗 NotGLaDOS West Australian.
15:15 🔗 PatC Cool!
15:37 🔗 Schbirid meh, it:vanillaaa finish already!
16:20 🔗 dnova mornin
16:26 🔗 PatC Mornin'
17:01 🔗 underscor alard: That looks freakin' awesome
17:03 🔗 underscor (the lua hooks)
17:25 🔗 DFJustin here's the site for that russian guy with the phonographs, I think http://staroeradio.ru/collection
21:54 🔗 underscor http://metaception.com/pepper
22:26 🔗 Wyatt dld-streamer automatically retries incompletes? Is that what I was told yesterday?
22:28 🔗 Coderjoe automatically? not unless someone added that since wed.
22:29 🔗 Coderjoe dld-single retries 5 times
22:29 🔗 Wyatt Ah, okay
22:30 🔗 Wyatt So do that find thing
22:31 🔗 Coderjoe dld-streamer as of wednesday has an optional parameter to provide a list of items to fetch (rather than ask the tracker)
22:35 🔗 godane i found something called js-wikireader
22:36 🔗 godane https://github.com/antimatter15/js-wikireader
22:39 🔗 underscor godane: I went to governor's school with him!
22:39 🔗 underscor haha
22:43 🔗 godane http://www.youtube.com/watch?v=e3KIyXuZJGY
22:51 🔗 winr4r SketchCow: in which i learn that you are awesome, teresa has been a friend of mine for some years, we were talking about some things, i pointed her to a speech of yours in which you mentioned geocities, and she was like "hey i had some stuff on geocities" and i was like "yeah, talk to jason, he might have it"
22:52 🔗 winr4r YOU HAD IT
22:54 🔗 winr4r SketchCow: i also might be off work for a couple of weeks again, so if there's stuff that needs describing, i might be your man
23:19 🔗 closure back from t-giving.. have splinder users still downloading, incredible
23:20 🔗 yipdw^ a lot of them were injected back into the todo queue
23:21 🔗 yipdw^ i still have downloads going on too, though, which is nuts
23:22 🔗 yipdw^ [ec2-user@ip-10-80-146-172 Rei-chan]$ pwd
23:22 🔗 yipdw^ /home/ec2-user/splinder-grab/data/it/R/Re/Rei/Rei-chan
23:22 🔗 yipdw^ [ec2-user@ip-10-80-146-172 Rei-chan]$ ls -l *warc*
23:22 🔗 yipdw^ -rw-rw-r-- 1 ec2-user ec2-user 110807580 Nov 25 23:22 splinder.com-Rei-chan-blog-touchingthestars.splinder.com.warc.gz
23:24 🔗 Nemo_bis heh, mine lasted up to ten days (and counting)
23:24 🔗 Coderjoe man... I started the ec2 instance thinking "oh, this will only be 5 days at most..."
23:24 🔗 yipdw^ er, wait
23:24 🔗 yipdw^ Rei-chan is actually a busted account
23:24 🔗 yipdw^ check this out: http://www.splinder.com/myblog/comment/list/4212591/48159251?from=400
23:25 🔗 Coderjoe before splinder extended their closure date
23:25 🔗 yipdw^ try to click any of the navigation links
23:25 🔗 yipdw^ you will be sent to the same page
23:25 🔗 yipdw^ wtf
23:25 🔗 Coderjoe yay. spidertraps.
23:25 🔗 yipdw^ is it?
23:25 🔗 yipdw^ the account does have some legitimate content in it
23:25 🔗 Coderjoe I haven't looked yet
23:25 🔗 yipdw^ or something that looks human-generated
23:26 🔗 Coderjoe i didn't mean an intentional trap
23:26 🔗 yipdw^ oh
23:26 🔗 Coderjoe friendster had a lot of accidental shit that created spider traps
23:26 🔗 yipdw^ alard: I think we need a way of flagging accounts as "cannot archive fully" or some such; see http://www.splinder.com/myblog/comment/list/4212591/48159251 and click the navigation links for an example
23:27 🔗 Coderjoe server on fire?
23:27 🔗 yipdw^ well
23:27 🔗 yipdw^ alard: come to think of it, maybe we don't, because iirc wget doesn't try to retrieve URLs it's already seen
23:27 🔗 yipdw^ or does it
23:28 🔗 Coderjoe it shouldn't
23:28 🔗 yipdw^ I mean, it shouldn't, assuming that it assumes GET is idempotent
23:28 🔗 Coderjoe and it shouldn't go offsite, either
23:28 🔗 yipdw^ so this should complete at some point, it'll just be a fucking long grab
23:28 🔗 Coderjoe which of those links goes to another splinder site?
23:28 🔗 yipdw^ the spam links?
23:28 🔗 yipdw^ I don't know
23:28 🔗 yipdw^ none, as far as I can tell
23:29 🔗 Coderjoe haha
23:29 🔗 Coderjoe "penis van lesbian"
23:29 🔗 Coderjoe is that like an ice cream truck?
23:29 🔗 yipdw^ I was thinking Dick van Patten
23:30 🔗 yipdw^ oh fuck
23:30 🔗 yipdw^ 2011-11-25 23:28:42 URL:http://www.splinder.com/splinder_noconn.html [1402/1402] -> "./tmpfs/it/Rei-chan/www.splinder.com/splinder_noconn.html" [1]
23:30 🔗 yipdw^ I hope that doesn't mean I missed something
23:30 🔗 Coderjoe ...
23:30 🔗 Coderjoe noconn? great
23:31 🔗 yipdw^ yeah
23:31 🔗 Coderjoe is that an overload error from a reverse proxy gateway?
23:31 🔗 yipdw^ it's a maintenance page
23:31 🔗 Coderjoe fuck on a stick
23:31 🔗 yipdw^ fuck HTTP status codes, we're doing this web style
23:32 🔗 yipdw^ I don't know what it is
23:32 🔗 yipdw^ but I just saw it in the Rei-chan wget log
23:32 🔗 Coderjoe that page looks similar to the US page
23:32 🔗 yipdw^ checking others
23:33 🔗 Coderjoe doing a massive rgrep
23:34 🔗 Coderjoe er wait
23:34 🔗 Coderjoe I just want the logs
23:34 🔗 Coderjoe i r smrt
23:34 🔗 yipdw^ well done
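A sketch of the sort of log grep being set up here: list the user directories whose wget logs fetched the splinder_noconn maintenance page (the data directory and the *.log filename pattern are assumptions):

    grep -rl --include='*.log' 'splinder_noconn.html' data \
      | xargs -r -n1 dirname | sort -u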
23:34 🔗 yipdw^ ugh, this isn't looking good on my end
23:35 🔗 yipdw^ https://gist.github.com/a15c7707ee666502a825
23:38 🔗 Coderjoe looking quite bad here, too
23:39 🔗 Coderjoe hmm
23:39 🔗 Coderjoe not so bad for me it seems
23:40 🔗 Coderjoe https://gist.github.com/0427b4ed12ae48f2fb5f
23:40 🔗 Coderjoe at home. let's check the ec2
23:41 🔗 yipdw^ Coderjoe: when did they start happening
23:41 🔗 * Nemo_bis has some as well :-(
23:42 🔗 closure WyattL yo, around?
23:42 🔗 closure Wyatt: yo, around?
23:45 🔗 Wyatt|Wor closure: Yeah?
23:57 🔗 Coderjoe any idea why the check/fix scripts only check the us profiles for 502/504?
23:58 🔗 Coderjoe well, 500 errors, not just 502/504
