#archiveteam 2011-11-21,Mon

Time Nickname Message
00:07 🔗 bsmith094 quick question whats this? mode (+ooo bsmith094 chronomex Paradoks) by underscor
00:07 🔗 zetathust of quick save in
00:14 🔗 underscor bsmith094: op
00:15 🔗 bsmith094 and i dont speak irc codes?
00:15 🔗 zetathust residue codes from algebra to make it beep and shake
00:15 🔗 underscor Nothing, just means you're an operator of the channel
00:15 🔗 zetathust s operator overloading'gotchas'?
00:15 🔗 underscor !shutup
01:04 🔗 db48x underscor: don't tell me you brought a markov-chaining bot in here
01:05 🔗 zetathust brought to light that the deal with him
01:05 🔗 underscor db48x: Ok, I won't tell you
01:06 🔗 db48x heh
01:07 🔗 bsmith094 i just checked the project page, how in the name of sanity do you plan to coordinate the backup of 200 *terabytes* of data?
01:07 🔗 underscor Simple
01:08 🔗 underscor We're insane
01:08 🔗 underscor :D
01:08 🔗 underscor Fixes all the problems
01:08 🔗 db48x heh
01:09 🔗 underscor <ATidlebot> underscor was bucked from a horse. This terrible calamity has slowed them 0 days, 02:59:32 from level 39.
01:09 🔗 zetathust you tolkein about bieber slowed down http : //www
01:09 🔗 underscor Damn
01:32 🔗 underscor db48x: Do you have pushrights to splinder-grab?
01:32 🔗 db48x yes
01:33 🔗 underscor Will you add this to the end of dld-single.sh?
01:34 🔗 underscor After line 117
01:34 🔗 underscor curl http://71.126.138.142/done.php >/dev/null 2>&1 &
01:34 🔗 underscor :D
01:34 🔗 db48x hrm
01:34 🔗 db48x what does that do?
01:34 🔗 db48x besides slow everyone down?
01:35 🔗 underscor It doesn't slow anyone down?
01:35 🔗 db48x a little bit, yes
01:35 🔗 underscor Alright, fine. It's nothing important :P
01:36 🔗 db48x I'm not averse to it if it's interesting
01:36 🔗 zetathust or something interesting to talk about important issues, and a community needing buy in from $x
01:38 🔗 db48x also, are you on github?
01:39 🔗 underscor I think so
01:39 🔗 underscor Can't remember
01:40 🔗 underscor All it does is play a sound on my computer and flash the LEDs around my room
01:40 🔗 underscor So I can fall asleep to people saving splinder
01:41 🔗 db48x awesome
01:44 🔗 underscor whoever 98.207 is is doing a lot
01:49 🔗 db48x that's me :)
01:51 🔗 bsmith094 is there a quick way to see how many users i have and how much space they're taking up
01:51 🔗 db48x du -chs
01:52 🔗 bsmith094 wow only 3gb in over 20 hours thats insanely slow
01:53 🔗 db48x yea
01:53 🔗 db48x the site isn't all that fast
01:53 🔗 bsmith094 are their servers just belching blue smoke by now or what?
01:54 🔗 db48x heh
01:54 🔗 underscor probably
01:54 🔗 db48x remember that there is a lot of connection set up time for each file you download
01:55 🔗 bsmith094 could it just reuse the same connection like wget usually does
01:55 🔗 db48x yea, I'm not actually sure if wget does that
01:55 🔗 zetathust with wget
01:56 🔗 db48x anyway, all the more reason to run more clients
01:56 🔗 zetathust to irc clients are always crazy
02:02 🔗 bsmith094 im running 200 threads right now
02:11 🔗 yipdw^ so I just saw SketchCow's tweets about the OWS Library
02:12 🔗 yipdw^ I had no clue that was happening
02:12 🔗 yipdw^ and after reading some articles on it, I'm still not entirely sure what the hell it's about
02:37 🔗 SketchCow Short form.
02:37 🔗 SketchCow OWS had a library, a tent with donated books.
02:37 🔗 SketchCow Many of them rather red and protesty, but also just general books
02:38 🔗 SketchCow During the clear-out of the park, the books were mostly destroyed, a few saved.
02:38 🔗 SketchCow Small finger pointing, ALA made fun of the "librarians" for abandoning the library, librarians going "You weren't there, man... you didn't know..."
02:38 🔗 zetathust abandoning
02:38 🔗 SketchCow Anyway
02:38 🔗 SketchCow So they decided to rebuild it.
02:38 🔗 zetathust to rebuild
02:39 🔗 SketchCow I went "Oh come on."
02:39 🔗 SketchCow Twitter Fight results
02:39 🔗 yipdw^ ah ha
02:39 🔗 yipdw^ that's useful background info
02:39 🔗 SketchCow So they've got people going around in a bookmobile, i.e. a car, picking up books to drop back into the library.
02:39 🔗 SketchCow I have said they should 1. Put it on a van or something 2. Shouldn't put unique books there
02:41 🔗 SketchCow Anyway, boring, what I ended up doing was letting tons of people know about openlibrary.
02:41 🔗 yipdw^ yeah, that's another project I didn't know about until about half an hour ago
02:42 🔗 SketchCow It happens.
02:42 🔗 SketchCow All this crap has a publicity issue
02:42 🔗 SketchCow No Jimbo Wales or Cory Doctorow to crow about it.
02:42 🔗 SketchCow Well, not entirely.
02:42 🔗 SketchCow They got me. I'm something.
02:43 🔗 yipdw^ I find it all a bit surreal
02:43 🔗 yipdw^ OWS is only about 800 miles from me, and Occupy Chicago is two orders of magnitude closer, and yet I've no idea what's going on
02:43 🔗 yipdw^ uncomfortably insulated, etc.
02:57 🔗 rude___ what rare/unique books were known to be donated to OWS?
03:00 🔗 SketchCow The librarything listing of what was at OWS included older one-off books
03:01 🔗 SketchCow Additionally, there were custom books being made at the site
03:01 🔗 SketchCow Ok, I'm flinging a few more machines into the cause.
03:06 🔗 SketchCow 800,000 users done!!
03:07 🔗 underscor \o/
03:09 🔗 underscor It's looking like we're actually gonna make it
03:09 🔗 underscor \o/
03:14 🔗 Coderjoe not a whole lot of data going on for my clients
03:16 🔗 Coderjoe mmm
03:16 🔗 Coderjoe looks like a lot of urls going to www.sp... and files.sp...
03:25 🔗 SketchCow OK, so, this tracker thing does not like FreeBSD.
03:26 🔗 SketchCow Oh well. no contributing for me, yet.
03:27 🔗 SketchCow Not with underscor bogarting it
03:42 🔗 Coderjoe closure's bogarting the splinder stats
03:42 🔗 zetathust those stats
03:43 🔗 yipdw what is zetathust
03:43 🔗 zetathust s what i love when shit breaks hardcore
03:45 🔗 Coderjoe mirc 6.21, probably running some chatterbot script
03:45 🔗 Coderjoe ..
03:45 🔗 Coderjoe or a forged mirc reply
03:45 🔗 Coderjoe but i think it is a chatterbot. and underscor's fault.
03:47 🔗 PatC Hello everyone
03:54 🔗 underscor bogarting?
03:54 🔗 yipdw http://en.wiktionary.org/wiki/bogart
03:54 🔗 underscor Coderjoe: It's python actually
03:54 🔗 underscor So it's a forged reply
03:54 🔗 underscor :P
03:54 🔗 dashcloud stealing, hogging, etc
03:54 🔗 underscor oic
03:56 🔗 yipdw I'm surprised we haven't heard much from DADA S.p.A about the increased traffic
03:56 🔗 yipdw maybe, despite the crappy throughput, none of this is really having an impact on their operations
03:56 🔗 yipdw (alternative explanation: they don't give a shit)
03:57 🔗 dashcloud or maybe they are thrilled people are taking this much of an interest for the end of days?
03:58 🔗 yipdw I dunno, they'd have to be pretty dense to look at their traffic and logs and go "gee, I guess this means we're drawing a new audience"
03:58 🔗 zetathust the case would be useful target audience for the future
04:03 🔗 bsmith094 how do i find out whats chewing up my disk
04:03 🔗 bsmith094 hardrive spinning like crazy for 20 min now
04:04 🔗 db48x bsmith094: yea?
04:05 🔗 bsmith094 umm yeah will top sort by hd activity
04:05 🔗 db48x no, but iotop will
04:05 🔗 bsmith094 thanks
04:05 🔗 db48x you're welcome
04:06 🔗 bsmith094 ok all these little monitoring doodads should really be installed by default
04:06 🔗 db48x heh, yea :)
04:07 🔗 bsmith094 are "@" chars valid in profile names
04:10 🔗 db48x I think I've seen a few that use it
04:12 🔗 db48x dnova: what's your setup like?
04:13 🔗 yipdw bsmith094: yeah, see e.g. http://www.us.splinder.com/profile/br0ken@tipic.com
04:13 🔗 yipdw I've only seen them on Splinder US
04:27 🔗 Coderjoe oh boy
04:27 🔗 Coderjoe Dear Amazon EC2 Customer,
04:27 🔗 Coderjoe We are very excited to announce the Public Beta of a new Amazon EC2 Cluster Compute instance, Cluster Compute Eight Extra Large (cc2.8xlarge).
04:28 🔗 Coderjoe another way to burn money
04:28 🔗 db48x heh
04:28 🔗 yipdw Cluster Compute Eight sounds like a pretty awesome band name
04:29 🔗 yipdw PUBLIC BETA by DAVID GUETTA feat. C.C. EIGHT
04:33 🔗 chronomex hmmm.
04:35 🔗 SketchCow http://www.nytimes.com/2011/11/21/technology/quietly-google-puts-history-online.html?_r=3&pagewanted=all
04:36 🔗 underscor haha, yipdw
04:37 🔗 SketchCow THE DISK CAN'T EVEN HANDLE ME RIGHT NOW
04:55 🔗 db48x I wish iotop could narrow the list of processes down to just those causing activity on certain disks
05:18 🔗 chronomex hey you guys, I altered dld-profile and dld-client and dld-streamer to use a tmpfs in ./tmpfs . who is interested in this?
05:18 🔗 chronomex the only benefit is it thrashes your disk a lot less.
05:19 🔗 db48x I am
05:19 🔗 chronomex okay, one is enough ;)
05:19 🔗 db48x through a branch up on github?
05:19 🔗 db48x throw
05:19 🔗 Coderjoe what does it use the tmpfs for?
05:20 🔗 chronomex it's at https://github.com/chronomex/splinder-grab
05:20 🔗 Coderjoe and my list of problem profiles grows a bit longer. still all us profiles, though
05:20 🔗 chronomex you need to mount a tmpfs, read the beginning of the script to see what you ought do
05:21 🔗 Coderjoe O_o
05:21 🔗 Coderjoe filedir=/tmp/tmpfs/$username
05:21 🔗 chronomex wait did I fuck that up
05:21 🔗 chronomex holdon
05:22 🔗 chronomex no I did not.
05:22 🔗 chronomex what are you looking at?
05:22 🔗 chronomex you should not be seeing that.
05:22 🔗 Coderjoe your first tmpfs commit
05:22 🔗 chronomex yes, the first one is not the current one :)
05:22 🔗 Coderjoe not the "robustify" one
05:22 🔗 chronomex yes that was a proof of concept for my own use only
05:23 🔗 Coderjoe i know, but I figured the "robustify" would build on it
05:23 🔗 zetathust they figured out that trans fats means
05:23 🔗 yipdw chronomex: can you push that tmpfs work to a separate branch, and revert it from master?
05:23 🔗 yipdw I don't know how well it'll work on OpenSolaris
05:23 🔗 chronomex yipdw: hmm?
05:24 🔗 yipdw oh
05:24 🔗 Coderjoe O_o
05:24 🔗 yipdw I guess it doesn't die if you don't set up tmpfs
05:24 🔗 chronomex correct, that's what the robustify commit ensures.
05:24 🔗 yipdw ok
05:24 🔗 Coderjoe why is the "give underscor an epileptic seizure" addition in your "robustify" commit?
05:24 🔗 yipdw I'm just making sure I can update the code running on my Solaris box without getting weird errors
05:25 🔗 chronomex Coderjoe: uhhhh hm. I think that commit also happens to be a merge for some reason.
05:25 🔗 chronomex this is a dirty, badly done branch
05:25 🔗 chronomex because I did the version control wrong.
05:25 🔗 chronomex whatever.
05:25 🔗 yipdw wait
05:26 🔗 yipdw are you saying I can issue GET http://71.126.138.142/done.php and give underscor seizures
05:26 🔗 underscor :P
05:26 🔗 underscor Yes
05:26 🔗 zetathust yes
05:26 🔗 yipdw awesome
05:26 🔗 yipdw here comes ab -c 50
05:26 🔗 db48x lol
05:27 🔗 Coderjoe will this work without tmpfs?
05:27 🔗 yipdw it looks like it does
05:27 🔗 yipdw if there's no tmpfs directory, it just makes one and shoves files there
05:27 🔗 chronomex correct
05:28 🔗 chronomex it also complains.
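[Editor's note] The fallback yipdw and chronomex describe (no tmpfs directory present, so make one and complain) could look roughly like the sketch below. The function name, the default path and the warning text are assumptions for illustration; this is not copied from chronomex's scripts.

```shell
# Sketch only: ensure_tmpfs_dir, the ./tmpfs default and the message wording
# are hypothetical, not taken from dld-profile.sh or dld-client.sh.
ensure_tmpfs_dir() {
  dir=${1:-./tmpfs}
  if [ ! -d "$dir" ]; then
    # Complain on stderr, then fall back to an ordinary on-disk directory.
    echo "warning: $dir not found; creating a plain directory (no tmpfs speedup)" >&2
    mkdir -p "$dir"
  fi
}
```

Calling `ensure_tmpfs_dir` before the first download keeps the script working on machines where nobody mounted a tmpfs, at the cost of the disk thrashing the tmpfs was meant to avoid.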
05:45 🔗 underscor db48x: Grr, you added an e to my name
05:45 🔗 underscor ;)
05:45 🔗 zetathust : )
05:45 🔗 db48x heh
05:46 🔗 Coderjoe what is the largest download so far?
05:46 🔗 Coderjoe (not the warc size but the raw fetched data that would go in the tmpfs directory)
05:48 🔗 chronomex I don't know, tbh.
05:48 🔗 chronomex I'm seeing a steady state of about 1M per concurrent profile being downloaded for the tmpfs
05:49 🔗 underscor btw, adding 5 more machines with 500 threads each
05:49 🔗 chronomex it has a sawtooth wave pattern
05:49 🔗 chronomex underscor: jesus christ
05:49 🔗 zetathust hehe
05:50 🔗 underscor :(
05:51 🔗 chronomex "When in doubt, C4."
05:51 🔗 zetathust That wasn't very nice
05:51 🔗 chronomex okay, so what the fuck is splinder? is it really "italian geocities"?
05:51 🔗 underscor Pretty much
05:51 🔗 underscor It's a blog, and photo host
05:51 🔗 chronomex ok
05:51 🔗 chronomex brb.
05:55 🔗 Coderjoe woah
05:56 🔗 Coderjoe on nov 16 and 17, m1.small spot prices in us west spiked to $50
05:56 🔗 db48x over 100x increase?
05:57 🔗 underscor wow
05:57 🔗 underscor why????
05:57 🔗 db48x a typo perhaps?
05:57 🔗 Coderjoe it looks like the normal spot price is $0.036
05:57 🔗 db48x someone typed 50 instead of .50?
05:57 🔗 Coderjoe db48x: no. that wouldn't affect it
05:58 🔗 Coderjoe there just was a sudden demand for instances. (in which case the bid would have an effect.)
05:59 🔗 Coderjoe $50 spot bid... for when you REALLY don't want the instance to be killed due to price, but still want to (normally) save money
06:00 🔗 Coderjoe on the 18th, for a brief moment, m1.large spiked to $5
06:01 🔗 Coderjoe on the 11th, m1.large spiked to $40
06:02 🔗 chronomex Coderjoe: this was just for a little while, or all day?
06:02 🔗 Coderjoe as far as I can tell from the graphs, just a sample or two
06:02 🔗 chronomex hmmk
06:06 🔗 underscor yay, one machine running
06:06 🔗 underscor (crawl335)
06:06 🔗 chronomex you're not going to try and shoot up past closure?
06:07 🔗 Coderjoe spinning up some of the ia crawler nodes?
06:07 🔗 Cameron_D Hah, are you guys doing the scraping from amazon instances?
06:07 🔗 zetathust those instances btw
06:07 🔗 chronomex zetathust: get the fuck out, or shut up.
06:08 🔗 underscor Coderjoe: Yeah, I finally got access to them
06:08 🔗 underscor chronomex: That's not very nice
06:08 🔗 underscor !replyrate 1
06:08 🔗 Coderjoe Cameron_D: I was doing some mobileme scraping from an amazon instance. my wallet is not happy with me. I was tempted to spin up an instance for the splinder, but I don't know that I can afford it
06:08 🔗 chronomex !replyrate 0
06:09 🔗 Coderjoe although splinder is dead in a few days, so it might be worth it
06:09 🔗 zetathust Access denied to chronomex. You are on the "BITCHES, LEAVE ME BE" list and cannot control me
06:09 🔗 chronomex wtf
06:09 🔗 Coderjoe I really need to get a colocated box with cheap bandwidth
06:09 🔗 underscor ha
06:09 🔗 Cameron_D Coderjoe, ah
06:09 🔗 underscor Come on
06:09 🔗 underscor Don't kick it
06:09 🔗 chronomex no bots that talk
06:09 🔗 underscor :(
06:10 🔗 underscor Let's ask sj
06:10 🔗 underscor SketchCow*
06:10 🔗 chronomex it's still here, but can't speake.
06:10 🔗 Coderjoe I would have less of a problem with it if it was helping with maintaining ops or something
06:10 🔗 underscor He's off now
06:10 🔗 Coderjoe generally, bot should not speak unless spoken to, imo
06:11 🔗 chronomex correct
06:11 🔗 underscor :(
06:12 🔗 chronomex underscor: crawl335 has 500 threads?
06:12 🔗 underscor Yes
06:13 🔗 underscor I think it's iobound
06:13 🔗 chronomex yeah, that's why I added tmpfs :)
06:13 🔗 underscor This is running with tmpfs
06:13 🔗 chronomex o_o
06:14 🔗 Coderjoe any chance of the dashboard also giving the percentage of users remaining?
06:15 🔗 db48x it gives the number done and remaining, just divide them
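[Editor's note] A small sketch of the arithmetic db48x is pointing at. Strictly, the share remaining is remaining / (done + remaining), not done / remaining. The function name is made up for this note and the numbers in the usage line are placeholders, not dashboard figures.

```shell
# percent_remaining DONE REMAINING
# Prints the remaining share of the total as a percentage with one decimal.
# Hypothetical helper; the dashboard itself does not provide it.
percent_remaining() {
  awk -v d="$1" -v r="$2" 'BEGIN {
    total = d + r
    if (total == 0) { print "0.0"; exit }   # avoid dividing by zero
    printf "%.1f\n", 100 * r / total
  }'
}

# Example with placeholder numbers:
#   percent_remaining 750 250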
06:15 🔗 Coderjoe and I have yet to see my name on the dashboard :(
06:17 🔗 db48x yea, that is weird
06:19 🔗 bsmith094 this isnt really the channel for it, but im in here on pidgin and i keep seeing these wired 5 pointed stars things next to certain usernames in the list, what are they?
06:19 🔗 Cameron_D Haha, I just remembered I have a VPS that is currently idle
06:20 🔗 * Cameron_D puts it to use
06:20 🔗 chronomex bsmith094: like who?
06:20 🔗 bsmith094 alard chronomex coderjoe ersi
06:20 🔗 chronomex these people are channel operators, they can kick people out and otherwise enforce order
06:21 🔗 bsmith094 is there a tut channel for all the weird syntax irc seems to have?
06:21 🔗 chronomex bsmith094: it's called google
06:22 🔗 bsmith094 oy alright then
06:22 🔗 chronomex because explaining all this shit gets old quick
06:22 🔗 Coderjoe iirc, there is a site named irchelp.com or something
06:22 🔗 bsmith094 i can imagine :)
06:24 🔗 bsmith094 ive heard there are global emergency irc channels for natural disasters, is that just a myth
06:24 🔗 db48x there probably are
06:24 🔗 db48x but they wouldn't be very reliable
06:27 🔗 yipdw if there are, I've never heard of one
06:27 🔗 yipdw you *can* get reports of disasters via IRC
06:27 🔗 yipdw it's happened before, e.g. with news of the Gulf War
06:27 🔗 chronomex like "holy shit earthquake"
06:28 🔗 yipdw but as far as I know there's no government-operated IRC channel for that
06:28 🔗 yipdw these days, IRC is pretty crappy for that anyway
06:29 🔗 yipdw for disasters that don't impact the Internet infrastructure too badly you're probably better off with Twitter
06:29 🔗 yipdw for worse things, amateur radio
06:29 🔗 Cameron_D relevent XKCD http://xkcd.com/723/
06:29 🔗 Coderjoe also after 9/11/01, several people were outputting news channel caption feeds into various irc channels
06:29 🔗 yipdw for disasters that will kill off California and Twitter
06:29 🔗 yipdw amateur radio
06:29 🔗 yipdw followed by partying if you're up for that
06:29 🔗 Coderjoe when all else fails: amateur radio
06:30 🔗 chronomex woop woop woop off-topic siren
06:30 🔗 yipdw indeed
06:30 🔗 yipdw my Splinder downloaders are all clogged with failing US profiles
06:30 🔗 * Coderjoe fingers the shotgun while eying that siren
06:31 🔗 Coderjoe I keep getting downloader dying due to failing us profiles
06:31 🔗 yipdw there's an update that retries
06:31 🔗 yipdw maximum of five times
06:31 🔗 Coderjoe yes. it retries 5 times and then the script exits
06:31 🔗 yipdw oh
06:31 🔗 Coderjoe back to a prompt
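[Editor's note] The retry behaviour discussed here (try a profile up to five times, then give up) can be sketched as a generic wrapper that returns a failure code instead of exiting, so a caller can log the profile and move on. `retry` and `download_one` are hypothetical names, not functions from the splinder-grab scripts.

```shell
# retry MAX CMD [ARGS...]
# Runs CMD until it succeeds or MAX attempts have been used.
# Returns 0 on success, 1 if every attempt failed.
retry() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max failed: $*" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Possible use (download_one is a placeholder for the real per-profile step):
#   retry 5 download_one "$profile" || echo "$profile" >> failed-profiles.txt
```

Returning rather than exiting is a design choice: a long-running client would then survive a run of failing US profiles and keep a list of them for later.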
06:32 🔗 yipdw dld-streamer.sh collects that status
06:32 🔗 Coderjoe (I'm not using dld-streamer at the moment
06:32 🔗 yipdw oh
06:32 🔗 Coderjoe I've been collecting the failures in a text file
06:33 🔗 yipdw would it be useful to build a log of all of the profiles that have blogs with dashes in their subdomain?
06:33 🔗 yipdw I can collect those under OS X
06:33 🔗 yipdw probably under some other OSes too
06:33 🔗 yipdw I know there's a partial list in the Splinder wiki article, but something more comprehensive might be useful
06:34 🔗 Coderjoe oh look. I restarted a client after it died on a us profile... and it gets another failing us profile
06:39 🔗 db48x heh
06:41 🔗 chronomex ah, the joy of computers.
06:49 🔗 Coderjoe man
06:50 🔗 SketchCow No idea what happened, but wow, it's slamming up
06:50 🔗 Coderjoe i just spun up an instance and told it to do 500 threads. I still haven't broken 300
06:51 🔗 * BlueMax slaps chronomex. Insanity flows from his throat!
06:51 🔗 chronomex to finish splinder before 24-Nov 24:00 my time (it's 23:00 20-nov now), we'll have to average 2 users per second sustained.
06:52 🔗 chronomex this is looking doable
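[Editor's note] chronomex's estimate is a plain rate calculation: users still to fetch divided by seconds until the deadline. From 23:00 on 20 Nov to 24:00 on 24 Nov is 97 hours, or 349,200 seconds. The remaining-user count is not stated in the log; the 698,400 in the example below is only what a rate of 2 users per second would imply over that window. Function names are made up for this note.

```shell
# hours_to_seconds HOURS -> prints the number of seconds.
hours_to_seconds() {
  echo $(( $1 * 3600 ))
}

# required_rate REMAINING_USERS SECONDS_LEFT
# Prints the sustained users-per-second needed to finish in time.
required_rate() {
  awk -v n="$1" -v s="$2" 'BEGIN {
    if (s <= 0) { print "inf"; exit }   # deadline already passed
    printf "%.2f\n", n / s
  }'
}

# Example with an inferred (not logged) remaining count:
#   required_rate 698400 "$(hours_to_seconds 97)"
```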
06:52 🔗 yipdw alard: are you open to replicating the Redis backend for the tracker?
06:54 🔗 Coderjoe mmm
06:55 🔗 Coderjoe stalled waiting for the tracker
07:04 🔗 Coderjoe it would be nifty, for dld-streamer.sh startup, if the tracker supported a means of requesting a chunk of IDs with one request
07:08 🔗 yipdw actually
07:08 🔗 yipdw that raises a good point
07:08 🔗 * yipdw adds an integration test suite for universal-tracker
07:08 🔗 Coderjoe i just now got over 300
07:09 🔗 yipdw honestly, I think 300 is excessive
07:09 🔗 yipdw Splinder's servers don't seem particularly high-capacity
07:10 🔗 chronomex Coderjoe: hm.
07:10 🔗 chronomex for rev2, I want to push all the tracker interaction code into streamer or something similar
07:10 🔗 chronomex so it can manage it smartly
07:11 🔗 chronomex currently dld-streamer requests a username, and dld-profile marks it as done
07:11 🔗 yipdw not sure if you know this, but the tracker does have a service endpoint to release a work item
07:11 🔗 Coderjoe but that means not being able to run dld-single to retry a failed profile and have it mark it as done if successful
07:11 🔗 chronomex correct.
07:12 🔗 yipdw could be useful for e.g. constantly failing US profiles
07:12 🔗 chronomex I'm not sure what the best way to do it is
07:12 🔗 chronomex maybe a script that does both streamer and single modes, then.
07:12 🔗 Coderjoe you could put the tracker code in a library file as functions to call, and then source the library file
07:13 🔗 Coderjoe that way all the actual tracker interaction code is in one place, but still can be called by whichever scripts
07:13 🔗 chronomex right
07:57 🔗 Coderjoe ew
07:57 🔗 Coderjoe failing us profiles are going to clutter the tmpfs directory
07:58 🔗 Coderjoe as are any profiles that finish "with network errors"
07:58 🔗 chronomex I patched dld-profile.sh to fix that
07:59 🔗 Coderjoe after the version I grabbed then
07:59 🔗 chronomex yeah. git pull from my version should resolve it.
08:00 🔗 Coderjoe i'm not seeing a commit on your repo though
08:00 🔗 Coderjoe er
08:00 🔗 Coderjoe wait
08:00 🔗 Cameron_D "Robustify tmpfs support. Add reinforcement to dld-client.sh." "chronomex authored about 3 hours ago"
08:00 🔗 Coderjoe there it is
08:00 🔗 Cameron_D that one?
08:00 🔗 chronomex yes, that one.
08:01 🔗 chronomex commit nr b6a63644b55d01453930de7ddd2e0b6108a918e6
08:01 🔗 Coderjoe i'm running a version that should have been after that and I don't have it in my wc
08:01 🔗 * Coderjoe checks stuff
08:01 🔗 Cameron_D so is it possible to pull from the firrerent branch without stopping everything?
08:01 🔗 Cameron_D *different
08:01 🔗 chronomex ought to be
08:01 🔗 Coderjoe it also would have been nice if you could have put all the log files in a subdirectory and had git ignore that subdir
08:01 🔗 chronomex that would have been nice, yes.
08:02 🔗 Coderjoe ok. I have that as my current revision
08:02 🔗 db48x2 you can have git ignore all of those anyway
08:02 🔗 Coderjoe you missed some exits
08:02 🔗 chronomex hmmm, what line nrs?
08:03 🔗 SketchCow Your hips are fat
08:03 🔗 chronomex oh yeah I missed loads.
08:03 🔗 Coderjoe 131, 140, 146, 192, 201
08:03 🔗 chronomex haven't had any leaks yet tho :P
08:03 🔗 Coderjoe may I suggest perhaps using a trap
08:03 🔗 chronomex it's a trap!
08:04 🔗 Coderjoe 207
08:04 🔗 db48x2 I pushed a .gitignore file
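[Editor's note] Coderjoe's trap suggestion addresses the missed exits: instead of adding cleanup before every `exit`, register it once and let the shell run it on all exit paths. A minimal sketch, assuming the per-profile work happens in a temporary directory; `grab_profile` and its argument are invented for illustration and do not mirror dld-profile.sh.

```shell
# The body is wrapped in ( ... ) so it runs in a subshell and the EXIT trap
# fires when this function finishes, whichever exit path is taken.
grab_profile() (
  workdir=$(mktemp -d) || exit 1
  trap 'rm -rf "$workdir"' EXIT

  echo "$workdir"              # printed only so a caller can inspect the path
  if [ "$1" = "fail" ]; then
    exit 1                     # early exit: the trap still removes workdir
  fi
  exit 0
)
```

With this shape, a new `exit` added later cannot leak files into the tmpfs, because cleanup no longer depends on remembering to call it.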
09:01 🔗 yipdw alard: I've started some tests for the tracker code; see https://github.com/ArchiveTeam/universal-tracker/tree/integration-tests if you're interested
09:30 🔗 alard yipdw: Ah, good, I'll have a look later.
09:30 🔗 alard No problem with redis replication, by the way.
09:31 🔗 alard Although I'm replicating it to my own computer too, so I do have a copy of the data.
09:33 🔗 Nemo_bis 37 GiB of incomplete users :-/ http://p.defau.lt/?z0oxTE2k2iUjIHUj8LLyOA
09:34 🔗 Nemo_bis and I was downloading at 10 Mb/s until an hour ago, now superslow again, umpf
09:46 🔗 Nemo_bis 50/50 Getting next username from tracker... downloading us:kllin@tipic.com
09:46 🔗 Nemo_bis why does dld-streamer.sh do this?
09:46 🔗 Nemo_bis 50/50 PID 13128 finished 'it:Emyll': Success.
09:46 🔗 Nemo_bis find: "data/us/k/kl/kll/kllin@tipic.com/files/": File o directory non esistente
09:47 🔗 Nemo_bis 50/50 Getting next username from tracker... downloading it:ammazzarvi
09:47 🔗 Nemo_bis 50/50 PID 16984 finished 'us:kllin@tipic.com': Success.
09:47 🔗 db48x2 is the @ confusing it?
10:07 🔗 Nemo_bis hmm
10:07 🔗 Nemo_bis but why does it look for it with find?
10:08 🔗 Nemo_bis that's not the issue
10:08 🔗 Nemo_bis 50/50 Getting next username from tracker... downloading it:Sam1279
10:08 🔗 Nemo_bis 50/50 Getting next username from tracker... downloading us:studyqueensland
10:08 🔗 Nemo_bis 50/50 PID 1474 finished 'it:yhdts002': Success.
10:08 🔗 Nemo_bis 50/50 PID 31601 finished 'it:Alias78': Success.
10:08 🔗 Nemo_bis find: "data/us/s/st/stu/studyqueensland/files/": File o directory non esistente
10:08 🔗 Nemo_bis 50/50 PID 2333 finished 'us:studyqueensland': Success.
10:09 🔗 Nemo_bis perhaps it's just checking whether the files/ directory was successfully deleted or not
10:12 🔗 db48x2 not sure
10:13 🔗 db48x2 oh, I see
10:14 🔗 db48x2 that's coming from dld-profile.sh
10:15 🔗 Nemo_bis aww
10:15 🔗 Nemo_bis found it
10:16 🔗 Nemo_bis http://toolserver.org/~nemobis/wget-phase-1.log
10:16 🔗 db48x2 doh
10:16 🔗 db48x2 if grep -q "ERROR 50" "${userdir}/wget-phase-1.log"
10:17 🔗 Nemo_bis and dld-streamer says "success"; I hope it doesn't tell to tracker that the user is done
10:18 🔗 db48x2 it did
10:18 🔗 Nemo_bis so what?
10:18 🔗 db48x2 so we need to get a list of all of those users and redo them
10:18 🔗 Nemo_bis aww
10:19 🔗 Nemo_bis well, one "just" has to grep the whole data directory for those error 500, for each user check that there's no phase-2 or phase-3 and then launch a bunch of dld-single, I guess
10:20 🔗 Nemo_bis I have no time to do it now, though
10:20 🔗 db48x2 yes, that will work
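[Editor's note] The manual recovery Nemo_bis outlines (find users whose phase-1 log shows a 50x error and who have no phase-2 or phase-3 output) might be sketched as below. The `wget-phase-1.log` name and the `"ERROR 50"` pattern come from the log; the phase-2 and phase-3 file names are assumed to follow the same pattern, and the function name is made up. This is not the project's fix-dld.sh.

```shell
# find_http5xx_profiles [DATADIR]
# Prints each user directory whose phase-1 log contains "ERROR 50" and that
# has neither a phase-2 nor a phase-3 log (file names assumed, see note).
find_http5xx_profiles() {
  datadir=${1:-data}
  find "$datadir" -type f -name 'wget-phase-1.log' | while IFS= read -r log; do
    userdir=$(dirname "$log")
    if grep -q "ERROR 50" "$log" \
       && [ ! -e "$userdir/wget-phase-2.log" ] \
       && [ ! -e "$userdir/wget-phase-3.log" ]; then
      echo "$userdir"
    fi
  done
}
```

The output is a list of directories to re-run. Note that the pattern only matches untranslated wget messages, so logs written under another locale would need a different pattern.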
10:20 🔗 Nemo_bis but wait, I have infinite scrollback, I could just grep it
10:23 🔗 Nemo_bis http://toolserver.org/~nemobis/streamer500.log
10:32 🔗 db48x2 Nemo_bis: try setting LC_MESSAGES to en and see if you get english error messages out of wget
10:35 🔗 db48x2 Nemo_bis: also, fix-dld.sh will do that for you
10:39 🔗 Nemo_bis why do I need to change language?
10:42 🔗 Nemo_bis I guess I'll run it at the end
10:42 🔗 db48x2 you need to change the language because the downloader scripts are looking for specific strings that indicate that errors happeend
10:42 🔗 db48x2 happened
10:43 🔗 db48x2 if wget outputs translated error messages, then the downloader scripts will fail to recognize errors
10:43 🔗 db48x2 which means that a lot of the profiles you've reported as being done have actually not been done
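[Editor's note] One way to try the LC_MESSAGES idea is to run a single command with only its message locale forced to `C`, which gives untranslated messages from gettext-based programs. The wrapper name is invented, and whether this is safe for the downloader scripts is not settled in this discussion, so treat it as an experiment rather than a fix.

```shell
# with_c_messages CMD [ARGS...]
# Runs CMD in a subshell with untranslated program messages.
with_c_messages() (
  # LC_ALL takes precedence over LC_MESSAGES, so it has to be cleared first.
  # Side effect: other categories then fall back to LANG or their own LC_* values.
  unset LC_ALL
  LC_MESSAGES=C
  export LC_MESSAGES
  "$@"
)

# Possible use (command line is a placeholder):
#   with_c_messages wget --version
```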
10:45 🔗 Nemo_bis ah
10:46 🔗 Nemo_bis looks like I don't have en locale installed
10:46 🔗 Nemo_bis but does this produce any error that fix-dld won't fix?
10:53 🔗 db48x actually, setting LC_MESSAGES is probably bad
10:53 🔗 Nemo_bis hm
10:53 🔗 Nemo_bis so the only option is to adapt fix-dld and run it?
10:57 🔗 db48x I think recompiling wget-warc so that it only ever uses english would be best
10:57 🔗 db48x not that I can figure out how to do that
11:03 🔗 db48x but in the mean time fixing up your copy of fix-dld.sh would let you fix them
11:46 🔗 Nemo_bis would compiling wget-warc without nls (which seems also responsible of part of the dash problems) do the job?
14:38 🔗 dnova hey what is going on with splinder?
14:38 🔗 dnova looks like the project is stalled
14:39 🔗 Nemo_bis yep
14:39 🔗 Nemo_bis slower than ever
14:39 🔗 dnova neither of my boxes are doing anything
14:39 🔗 dnova it looks like they're still working
14:39 🔗 dnova ok it's not just me then?
14:41 🔗 Nemo_bis ah, that was another download
14:41 🔗 dnova what?
14:41 🔗 Nemo_bis all wget-warc instances are at 0.000 KiB/s for me now
14:41 🔗 dnova well hmph.
14:41 🔗 Nemo_bis and I can't open it in my browser
14:42 🔗 dnova damn.
14:44 🔗 Nemo_bis oh, but looks like they issued an official statement that bit will close "only" on the 31st January
15:37 🔗 Coderjoe mmm
15:39 🔗 Konklone SketchCow: may I have an rsync slot? I have a few accounts to upload
15:42 🔗 Konklone Oh, there's way more to do - I will chip in a bit from work here, I doubt they'd care
15:43 🔗 Konklone so I'll take that rsync slot later, after the job is done
15:44 🔗 db48x2 uh oh
15:44 🔗 db48x2 splinder users/hour is down to 0
15:44 🔗 Coderjoe see wiki
15:45 🔗 Coderjoe also [Nov 21 11 09:44] <Nemo_bis> oh, but looks like they issued an official statement that bit will close "only" on the 31st January
15:45 🔗 Konklone ah ha
15:45 🔗 Konklone should we still grab it?
15:45 🔗 Konklone oh it's down
15:45 🔗 db48x2 nice
15:45 🔗 Coderjoe also, they might have blackholed everyone that was doing mass downloading
15:45 🔗 Konklone well then maybe I do need that rsync slot, to give what I have so far
15:45 🔗 Konklone It's not much
15:45 🔗 Konklone but it is what I have.
15:46 🔗 db48x2 no, this computer here has never participated
15:46 🔗 alard https://twitter.com/#!/chepalleblog/status/138633712630902784
15:46 🔗 alard https://twitter.com/#!/Terrychan_/status/138639333560291328
15:46 🔗 alard Normal people have the same problem.
15:46 🔗 db48x2 are you saying that we are abnormal?
15:49 🔗 alard Well, visiting eight hundred thousand Splinder profiles in a week may be a little beyond what is normal... :)
15:51 🔗 alard It must be in a psychology manual somewhere.
16:05 🔗 SketchCow Wow, we've REALLY pulled ahead on the splinder.
16:06 🔗 tef well it is very very easy to help
16:07 🔗 SketchCow I agree.
16:07 🔗 SketchCow Definitely shows the benefits of a client.
16:08 🔗 SketchCow It's easier when you have a total-known.
16:09 🔗 tef well if the tracker just stored urls, and the clients could submit new ones
16:09 🔗 Konklone Using the client script made it extremely easy, I hadn't done this before
16:10 🔗 tef it would be a bit easier to adapt it
16:10 🔗 tef yeah I just convinced someone to join in because it's git-pull & shellscript
16:10 🔗 Konklone yeah, I guess the next step is generalizing it so you don't need a whole new codebase for each project
16:11 🔗 Konklone you could store the info server-side and have it fetch the project info from a project handle before it fetches project-specific stuff like the username
16:11 🔗 tef or make the client dumb
16:13 🔗 tef server stores list of things on the web to fetch, client just grabs it and uploads it. I guess the next bit would really be automating the uploads
16:14 🔗 SketchCow So wait, splinder is down?
16:14 🔗 Konklone yep :/
16:14 🔗 Konklone Or I'd be pulling more
16:16 🔗 tef I am seeing errors on this aws box
17:07 🔗 SketchCow So help me here... Splinder is down but we're still getting webpages?
17:07 🔗 SketchCow What's happening there.
17:07 🔗 Paradoks Konklone: I'm pretty sure that Alard wrote most of the script when dealing with me.com stuff, reused it again for Splinder, and again for Anyhub, so the codebase is getting a lot of use.
17:08 🔗 SketchCow Yes, it's generalized. I think alard is focusing first on reliability, then portability and configuration.
17:09 🔗 yipdw Konklone: fyi, the tracker code is here -> https://github.com/ArchiveTeam/universal-tracker
17:10 🔗 kennethre How long has the ArchiveTeam been on github?
17:10 🔗 kennethre I could do a shoutout on the github blog
17:10 🔗 yipdw May 7, 2011
17:11 🔗 kennethre well i mean actively using it
17:11 🔗 yipdw May 7, 2011 :P
17:11 🔗 kennethre maybe i'll do a shoutout for splinder-grab
17:11 🔗 yipdw the first project put up there was a set of scripts for archiving Goole Video
17:11 🔗 yipdw er, Google
17:13 🔗 yipdw hmm
17:13 🔗 yipdw alard: as a medium-term issue, we should look into making more tracker state accessible via HTTP
17:14 🔗 yipdw alard: I just noticed that a lot of integration tests rely on accessing tracker state via the tracker object, which I guess is fine for now but feels a bit weird if you consider integration testing from a black-box perspective
17:15 🔗 yipdw it
17:15 🔗 yipdw er, where the hell did that "it" come from
17:19 🔗 yipdw but!
17:19 🔗 yipdw until then, I guess I'll get the tracker running in Travis
17:29 🔗 SketchCow We've been doing a lot of things on github.
17:29 🔗 SketchCow Trying to build tools people will like.
17:32 🔗 kennethre SketchCow: i'll mention it in the blog today
17:32 🔗 kennethre SketchCow: maybe it'll get more people interested
17:34 🔗 db48x2 kennethre: cool
17:34 🔗 DoubleJ exit
17:34 🔗 DoubleJ d'oh
17:34 🔗 yipdw sudo -iu build bash
17:34 🔗 DoubleJ c-a d didn't make it through apparently
17:38 🔗 closure wow, a lot of new splinder grabbers now
17:38 🔗 closure and wow, we're like 66% done
17:40 🔗 kennethre how many concurrent streams would you run on a pretty decently co-lo'd box?
17:40 🔗 kennethre doing 10 currently
17:40 🔗 closure I've been to 300 on linode w/o problem
17:40 🔗 closure 1000 on real hardware
17:40 🔗 kennethre well then
17:40 🔗 kennethre cranking it up :)
17:53 🔗 db48x2 whoa, it jumped all the way to 10k users/hour, and then half way to 15k/hour
17:56 🔗 closure woah</reeves>
17:59 🔗 closure has splinder.com been entirely DOSed now?
18:01 🔗 tef preservation of service, please
18:01 🔗 closure guess it's up, but sloow
18:03 🔗 Coderjoe so is splinder back now?
18:05 🔗 db48x2 not sure
18:05 🔗 db48x2 actually, it might be back but not working at all
18:06 🔗 db48x2 http://www.us.splinder.com/profile/pjvkzdvWg@tipic.com/:
18:06 🔗 db48x2 2011-11-21 10:04:17 ERROR 404: Not Found.
18:08 🔗 db48x2 yea, all of these are broken
18:08 🔗 db48x2 everything we are downloading right now is useless
18:10 🔗 Coderjoe hey kid. i'm a computer. stop all the downloadin.
18:10 🔗 Coderjoe I stopped my downloads earlier when things were just not working at all
18:11 🔗 Coderjoe yipdw, alard: I haven't looked at the tracker code yet. is there a "no more work" response from the tracker?
18:25 🔗 yipdw Coderjoe: yeah, if there's no more work to be issued, then POST /request returns 404
18:26 🔗 yipdw at some point I guess it might be appropriate to change that to a different response, but meh
18:27 🔗 yipdw because it's technically not a client error
18:27 🔗 yipdw but what's there is good enough for now
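[Editor's sketch] The convention yipdw describes, where POST /request answers 404 once the tracker has nothing left to hand out, could be handled on the client side roughly as below. This is a minimal sketch, not the actual grabber code: the names `TrackerReply` and `classify_request_reply` are hypothetical, and treating any 2xx status as "work issued" is an assumption, since the log only states the 404 case.

```python
from enum import Enum


class TrackerReply(Enum):
    """What a client should do after POST /request (hypothetical names)."""
    WORK = "work"                  # tracker handed out an item
    NO_MORE_WORK = "no_more_work"  # tracker has nothing left to issue
    RETRY = "retry"                # anything else: back off and try again


def classify_request_reply(status_code):
    """Map the HTTP status of POST /request to a client action.

    Only the 404 meaning comes from the chat log; the 2xx and
    fallback branches are assumptions made for illustration.
    """
    if 200 <= status_code < 300:
        return TrackerReply.WORK
    if status_code == 404:
        return TrackerReply.NO_MORE_WORK
    return TrackerReply.RETRY
```

Keeping the mapping in one function means that if the tracker later switches from 404 to a different "no work" status, as yipdw suggests it might, only this branch has to change.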
18:30 🔗 db48x2 ok, next step is to update fix-dld.sh so that it detects downloads where the profile returned 404
18:30 🔗 chronomex should I stop?
18:30 🔗 db48x2 then we make dld-profile.sh bail in that case
18:30 🔗 db48x2 chronomex: yes
18:30 🔗 chronomex forcibly?
18:30 🔗 db48x2 doesn't matter
18:31 🔗 chronomex ok
18:31 🔗 db48x2 everyone needs to stop
18:31 🔗 chronomex ----------------------------------
18:32 🔗 chronomex EVERYBODY KILL YOUR SPLINDER DOWNLOADERS
18:32 🔗 chronomex ----------------------------------
18:32 🔗 chronomex FORCIBLY
18:32 🔗 chronomex ----------------------------------
18:32 🔗 yipdw HELP COMPUTER
18:32 🔗 yipdw STOP ALL THE DOWNLOADIN'
18:32 🔗 db48x2 I have a meeting, so someone else needs to jump in and make the changes
18:32 🔗 db48x2 I'll be back in an hour or so
18:33 🔗 yipdw oh yeah.
18:33 🔗 yipdw http://travis-ci.org/#!/ArchiveTeam/universal-tracker
18:43 🔗 naomy ciao
18:43 🔗 naomy list
18:50 🔗 alard yipdw: Yes, I wondered about using the tracker object in integration tests. Adding more info accessible over HTTP just to run a test also feels a bit weird.
18:50 🔗 closure I assume my existing dld's that are "stuck" getting large users don't have this problem, and can run to completion.
18:50 🔗 closure (all 500 of them, sigh)
18:52 🔗 dnova wow, who is crawl??
18:53 🔗 alard yipdw: Also, is rspec the way to go for testing the tracker object? I have a couple of little tests now, but haven't decided yet whether it's better to use stub methods or a real Redis connection.
18:54 🔗 yipdw alard: I think that if the Redis connection is an integral part of the tracker object, it is fair to use a real connection in unit tests
18:55 🔗 yipdw otherwise, the stubbing gets out of hand
18:55 🔗 yipdw in my experience
18:55 🔗 alard Yes, okay. It's also easier to use with the redis.pipelined method.
18:55 🔗 yipdw doing that tends to litter the test with a lot of stub/mock noise that gets in the way of understanding what the test means
18:57 🔗 yipdw alard: as far as the integration tests go -- yeah, providing more information over HTTP is one of those fuzzy "dunno if this is a good idea" things. my main justification for doing that is to specify what behaviors a user (without access to the source code or similar hooks) would see
18:58 🔗 yipdw so, "if the tracker has usernames left, POST /request gives me a username" and "if the tracker has no pending work, POST /request returns 404" would not test internal state of the tracker unless it was externally observable
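[Editor's sketch] The two externally observable behaviours quoted above can be written as black-box tests that look only at the status and body of the reply. The real tracker is a Ruby application tested with rspec; the Python below is only an illustration, and `FakeTracker` is a hypothetical in-memory stand-in whose 200 success status is an assumption.

```python
class FakeTracker:
    """Hypothetical in-memory stand-in for the tracker's POST /request."""

    def __init__(self, usernames):
        self._pending = list(usernames)

    def post_request(self):
        """Return (status, body) as an HTTP client would observe them."""
        if self._pending:
            return 200, self._pending.pop(0)  # 200 is an assumed success code
        return 404, ""


def test_gives_username_when_work_is_left():
    tracker = FakeTracker(["alice"])
    status, body = tracker.post_request()
    assert status == 200
    assert body == "alice"


def test_returns_404_when_no_work_is_pending():
    tracker = FakeTracker([])
    status, _body = tracker.post_request()
    assert status == 404
```

Neither test reaches into `_pending`; they assert only on what a user without source access would see, which is the point yipdw is making.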
18:58 🔗 alard Another thing: maybe it's better to force the test runner to create a redis_conf.rb. I accidentally lost my redis db because I placed the redis_conf.rb in the wrong directory. (No important damage done, but still.)
18:58 🔗 yipdw oh
18:58 🔗 yipdw yeah, that's fine
18:59 🔗 alard The rspec tests probably will need to share that configuration anyway.
18:59 🔗 yipdw we can throw that into config/database.yml or something
18:59 🔗 alard That may be useful, yes. Well, I'll be back later.
19:00 🔗 yipdw seeya
19:05 🔗 dnova stop? really?
19:07 🔗 chronomex yes
19:07 🔗 chronomex splinder is belly up today
19:08 🔗 chronomex they also claim they will be around a while longer
19:08 🔗 chronomex like to january
19:09 🔗 chronomex but mostly we broke the site so hold off until they can fix it
19:22 🔗 dnova I don't get it
19:22 🔗 dnova people are still pulling data, so it appears
19:23 🔗 db48x2 dnova: yes, but it's all empty
19:23 🔗 dnova shit.
19:23 🔗 db48x2 dnova: grabbing the profile results in a 404, so we don't download any blogs, images, etc
19:23 🔗 dnova why are some >1mb
19:23 🔗 db48x2 could be intermittent
19:24 🔗 Coderjoe I wonder if they are trying to move it to a more robust backend... would be weird if they are shutting down in a little over a month
19:24 🔗 dnova well, shit.
19:24 🔗 dnova ok.
19:24 🔗 db48x2 actually, it might just be the us ones that are broken
19:24 🔗 dnova well soooooo
19:24 🔗 db48x2 didn't actually check any it ones
19:25 🔗 db48x2 but it was giving an error message earlier
19:25 🔗 dnova might it be ok now?
19:25 🔗 db48x2 someone needs to update fix-dld.sh to go back and find these and redo them
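[Editor's sketch] One way to find the affected downloads is to scan each download log for a profile URL that is immediately followed by a 404 error, as in the two log lines db48x2 pasted at 18:06. The sketch below assumes that two-line log layout and a `/profile/` path segment; `profiles_with_404` is a hypothetical helper, not part of fix-dld.sh, which is a shell script.

```python
import re

# A log line holding only a profile URL followed by ":" (layout assumed
# from the lines pasted in the channel at 18:06).
PROFILE_LINE = re.compile(r"^(https?://\S+/profile/\S+):$")


def profiles_with_404(log_text):
    """Return profile URLs whose very next log line reports a 404."""
    failed = []
    last_url = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = PROFILE_LINE.match(line)
        if match:
            last_url = match.group(1)
            continue
        if last_url is not None and line.endswith("ERROR 404: Not Found."):
            failed.append(last_url)
        # Any other line ends the URL/error pairing.
        last_url = None
    return failed
```

A user whose log yields a non-empty list would be a candidate for re-downloading once the site is healthy again.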
19:25 🔗 db48x2 hmm
19:26 🔗 db48x2 the front page is back
19:27 🔗 Coderjoe has the tmpfs change been pulled into the main repo? (I'm at work and have not looked at the repos since last night)
19:28 🔗 dnova I am going to let my scripts continue unless I hear a more definitive statement about what's going on
19:31 🔗 SketchCow Obviously, a sanity checker is going to have to be implemented, to allow people to re-run stuff.
19:32 🔗 dnova yeah that's why I think I should keep going... it can't be 5mb of 404s can it?
19:32 🔗 dnova some of these are not trivially small
19:33 🔗 SketchCow I leave it to alard to decide, he knows the system better.
19:49 🔗 PepsiMax http://archiveteam.org/index.php?title=It_Died
19:49 🔗 PepsiMax Project status Online!?
19:49 🔗 PepsiMax whaaat
19:49 🔗 PepsiMax http://itdied.com/ "This domain has expired. Please renew it at Dynadot.com. "
19:50 🔗 Coderjoe oh no... metadeath
19:58 🔗 tef BANK OF ENGLAND NOTES:
19:58 🔗 tef welp
20:12 🔗 SketchCow So I suspect that we're likely to have people upload questionable sets into the rsync slots
20:12 🔗 SketchCow That's fine - as long as I have a script to run on there, I can shore them up.
20:12 🔗 ersi .. Like a boss
20:12 🔗 SketchCow talk to corporate
20:13 🔗 ersi I respectfully refuse
20:13 🔗 Angra greetings, I have a question about uploading. SketchCow directed me to ask alard, but I thought I would ask here so more people can see the answer
20:14 🔗 Angra I have been given a rsync command line
20:14 🔗 Angra I am running the streaming grabber
20:14 🔗 Angra can I run the rsync while the streaming grabber is running, or should I stop it first?
20:14 🔗 ersi You can run it at the same time
20:15 🔗 ersi Just make sure you run it more than once though
20:15 🔗 ersi Every time you run it, it'll sync what has changed since the last time you ran it
20:15 🔗 Angra thanks! Next question: is OK to run the same command line from multiple machines?
20:16 🔗 ersi Depends, I'd say :P
20:17 🔗 Angra OK - I am running the streaming grabber on 3 separate machines
20:32 🔗 yipdw wait, is Splinder even up?
20:33 🔗 yipdw I still can't contact it
20:34 🔗 yipdw oh, there it goes
20:34 🔗 yipdw wow, 17 second response time
20:46 🔗 PepsiMax Angra: have you tried running the script more than once on the same box
20:46 🔗 PepsiMax worked for anyhub and splinder
20:47 🔗 Angra yes, I have rerun from the same box, it seems to work OK
20:47 🔗 Angra I do seem to be getting some errors:
20:47 🔗 Angra sent 20146625 bytes received 490529 bytes 655147.75 bytes/sec total size is 4561197631 speedup is 221.02 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.8]
20:51 🔗 Coderjoe er
20:51 🔗 Coderjoe it will also upload the temporary files as well
20:52 🔗 Coderjoe there is a script for uploading just the complete stuff, though I don't know what state it is in
20:59 🔗 kennethre is it safe to run it now?
21:02 🔗 Angra Coderjoe: are you saying you'd rather me not upload while the stream downloader is running? I'll do whatever is best for the project
21:05 🔗 alard Angra: It's probably best to use the upload script from the git repository. It should only upload things that are done, and I think it should also work with the stream downloader.
21:07 🔗 alard cd to your script's directory ; ./upload-finished.sh <SERVER>::<MODULE>/splinder/
21:11 🔗 Angra http://www.archiveteam.org/index.php?title=Splinder implies that I should use batcave.textfiles.com for <SERVER> and Angra for <MODULE> is that right?
21:11 🔗 alard Yes.
21:12 🔗 Angra OK I get an error when doing that:
21:12 🔗 Angra :~/splinder-grab>./upload-finished.sh batcave.textfiles.com::Angra/splinder/ @ERROR: Unknown module 'Angra' rsync error: error starting client-server protocol (code 5) at main.c(1524) [sender=3.0.7]
21:12 🔗 alard Angra: Yes, I see, for module you should use the module name SketchCow has given to you.
21:12 🔗 alard From the list of modules on batcave, I believe it's angra with a small a.
21:13 🔗 Angra aha! yes!
21:13 🔗 Angra thank you. converting uploads to that script now
21:15 🔗 closure SketchCow: I'd promised you a Berlios finish up script to run.. here it is: http://216.41.255.233/berlios/burp
21:19 🔗 PatC Hello
21:32 🔗 SketchCow Yo, pat
21:34 🔗 bsmith094 are you gonna make the splinder deadline?
21:34 🔗 SketchCow They APPEAR to have shifted it.
21:43 🔗 closure http://www.procionegobbo.it/blog/2011/11/splinder-chiude/ does not seem very official
21:44 🔗 closure ah, 31 Gennaio 2012 on splinder.com now
22:37 🔗 Coderjoe haha
22:37 🔗 Coderjoe http://www.archive.org/post/401968/we-have-2tb-of-data-to-upload
22:42 🔗 dashcloud so it looks like splinder is still going?
22:52 🔗 underscor Coderjoe: lol
23:11 🔗 Ymgve Coderjoe: I don't see the problem as long as the 2 tb is public
23:11 🔗 Ymgve but that's probably not what they meant
23:25 🔗 db48x2 Coderjoe: heh
23:58 🔗 underscor Man
23:59 🔗 underscor Sublime Text 2 is the fucking shizzle