[00:07] quick question whats this? mode (+ooo bsmith094 chronomex Paradoks) by underscor [00:07] of quick save in [00:14] bsmith094: op [00:15] and i dont speak irc codes? [00:15] residue codes from algebra to make it beep and shake [00:15] Nothing, just means you're an operator of the channel [00:15] s operator overloading'gotchas'? [00:15] !shutup [01:04] underscor: don't tell me you brought a markov-chaining bot in here [01:05] brought to light that the deal with him [01:05] db48x: Ok, I won't tell you [01:06] heh [01:07] i just checked the project page, how in the name of sanity do you plane to coordinate the backup of 200 *terabytes* of data? [01:07] Simple [01:08] We're insane [01:08] :D [01:08] Fixes all the problems [01:08] heh [01:09] underscor was bucked from a horse. This terrible calamity has slowed them 0 days, 02:59:32 from level 39. [01:09] you tolkein about bieber slowed down http : //www [01:09] Damn [01:32] db48x: Do you have pushrights to splinder-grab? [01:32] yes [01:33] Will you add this to the end of dld-single.sh? [01:34] After line 117 [01:34] curl http://71.126.138.142/done.php >/dev/null 2>&1 & [01:34] :D [01:34] hrm [01:34] what does that do? [01:34] besides slow everyone down? [01:35] It doesn't slow anyone down? [01:35] a little bit, yes [01:35] Alright, fine. It's nothing important :P [01:36] I'm not averse to it if it's interesting [01:36] or something interesting to talk about important issues, and a community needing buy in from $x [01:38] also, are you on github? [01:39] I think so [01:39] Can't remember [01:40] All it does is play a sound on my computer and flash the LEDs around my room [01:40] So I can fall asleep to people saving splinder [01:41] awesome [01:44] whoever 98.207 is is doing a lot [01:49] that's me :) [01:51] is there a quick way to see how many useres i have and how much sapce there taking up [01:51] du -chs [01:52] wow only 3gb in over 20 hours thats insanely slow [01:53] yea [01:53] the site isn't all that fast [01:53] are their servers just belching blue smoke by now or what? [01:54] heh [01:54] probably [01:54] remember that there is a lot of connection set up time for each file you download [01:55] could it just reuse the same connection like wget usually does [01:55] yea, I'm not actually sure if wget does that [01:55] with wget [01:56] anyway, all the more reason to run more clients [01:56] to irc clients are always crazy [02:02] im running 200 threads right now [02:11] so I just saw SketchCow's tweets about the OWS Library [02:12] I had no clue that was happening [02:12] and after reading some articles on it, I'm still not entirely sure what the hell it's about [02:37] Short form. [02:37] OWS had a library, a tent with donated books. [02:37] Many of them rather red and protesty, but also just general books [02:38] During the clear-out of the park, the books were mostly destroyed, a few saved. [02:38] Small finger pointing, ALA made fun of the "librarians" for abandoning the library, librarians going "You weren't there, man... you didn't know..." [02:38] abandoning [02:38] Anyway [02:38] So they decided to rebuild it. [02:38] to rebuild [02:39] I went "Oh come on." [02:39] Twitter Fight results [02:39] ah ha [02:39] that's useful background info [02:39] So they've got people going around in a bookmobile, i.e. a car, picking up books to drop back into the library. [02:39] I have said they should 1. Put it on a van or something 2. Shouldn't put unique books there [02:41] Anyway, boring, what I ended up doing was letting tons of people know about openlibrary. [02:41] yeah, that's another project I didn't know about until about half an hour ago [02:42] It happens. [02:42] All this crap has a publicity issue [02:42] No Jimbo Wales or Cory Doctorow to crow about it. [02:42] Well, not entirely. [02:42] They got me. I'm something. [02:43] I find it all a bit surreal [02:43] OWS is only about 800 miles from me, and Occupy Chicago is two orders of magnitude closer, and yet I've no idea what's going on [02:43] uncomfortably insulated, etc. [02:57] what rare/unique books were known to be donated to OWS? [03:00] The librarything listing of what was at OWS included older one-off books [03:01] Additionally, there were custom books being made at the site [03:01] Ok, I'm flinging a few more machines into the cause. [03:06] 800,000 users done!! [03:07] \o/ [03:09] It's looking like we're actually gonna make it [03:09] \o/ [03:14] not a whole lot of data going on for my clients [03:16] mmm [03:16] looks like a lot of urls going to www.sp... and files.sp... [03:25] OK, so, this tracker thing does not like FreeBSD. [03:26] Oh well. no contributing for me, yet. [03:27] Not with underscor bogarting it [03:42] closure's bogarting the splinder stats [03:42] those stats [03:43] what is zetathust [03:43] s what i love when shit breaks hardcore [03:45] mirc 6.21, probably running some chatterbot script [03:45] .. [03:45] or a forged mirc reply [03:45] but i think it is a chatterbot. and underscor's fault. [03:47] Hello everyone [03:54] bogarting? [03:54] http://en.wiktionary.org/wiki/bogart [03:54] Coderjoe: It's python actually [03:54] So it's a forged reply [03:54] :P [03:54] stealing, hogging, etc [03:54] oic [03:56] I'm surprised we haven't heard much from DADA S.p.A about the increased traffic [03:56] maybe, despite the crappy throughput, none of this is really having an impact on their operations [03:56] (alternative explanation: they don't give a shit) [03:57] or maybe they are thrilled people are taking this much of an interest for the end of days? [03:58] I dunno, they'd have to be pretty dense to look at their traffic and logs and go "gee, I guess this means we're drawing a new audience" [03:58] the case would be useful target audience for the future [04:03] hoq so i find out whats chewing up my disk [04:03] hardrive spinning like crazy for 20 min now [04:04] bsmith094: yea? [04:05] umm yeah will top sort by hd activity [04:05] no, but iotop will [04:05] thanks [04:05] you're welcome [04:06] ok all these little monitoring doodads should really be installed by default [04:06] heh, yea :) [04:07] are "@" chars valid in profile names [04:10] I think I've seen a few that use it [04:12] dnova: what's your setup like? [04:13] bsmith094: yeah, see e.g. http://www.us.splinder.com/profile/br0ken@tipic.com [04:13] I've only seen them on Splinder US [04:27] oh boy [04:27] Dear Amazon EC2 Customer, [04:27] We are very excited to announce the Public Beta of a new Amazon EC2 Cluster Compute instance, Cluster Compute Eight Extra Large (cc2.8xlarge). [04:28] another way to burn money [04:28] heh [04:28] Cluster Compute Eight sounds like a pretty awesome band name [04:29] PUBLIC BETA by DAVID GUETTA feat. C.C. EIGHT [04:33] hmmm. [04:35] http://www.nytimes.com/2011/11/21/technology/quietly-google-puts-history-online.html?_r=3&pagewanted=all [04:36] haha, yipdw [04:37] THE DISK CAN'T EVEN HANDLE ME RIGHT NOW [04:55] I wish iotop could narrow the list of processes down to just those causing activity on certain disks [05:18] hey you guys, I altered dld-profile and dld-client and dld-streamer to use a tmpfs in ./tmpfs . who is interested in this? [05:18] the only benefit is it thrashes your disk a lot less. [05:19] I am [05:19] okay, one is enough ;) [05:19] through a branch up on github? [05:19] throw [05:19] what does it use the tmpfs for? [05:20] it's at https://github.com/chronomex/splinder-grab [05:20] and my list of problem profiles grows a bit longer. still all us profiles, though [05:20] you need to mount a tmpfs, read the beginning of the script to see what you ought do [05:21] O_o [05:21] filedir=/tmp/tmpfs/$username [05:21] wait did I fuck that up [05:21] holdon [05:22] no I did not. [05:22] what are you looking at? [05:22] you should not be seeing that. [05:22] your first tmpfs commit [05:22] yes, the first one is not the current one :) [05:22] not the "robustify" one [05:22] yes that was a proof of concept for my own use only [05:23] i know, but I figured the "robustify" would build on it [05:23] they figured out that trans fats means [05:23] chronomex: can you push that tmpfs work to a separate branch, and revert it from master? [05:23] I don't know how well it'll work on OpenSolaris [05:23] yipdw: hmm? [05:24] oh [05:24] O_o [05:24] I guess it doesn't die if you don't set up tmpfs [05:24] correct, that's what the robustify commit ensures. [05:24] ok [05:24] why is the "give underscor an epileptic seizure" addition in your "robustify" commit? [05:24] I'm just making sure I can update the code running on my Solaris box without getting weird errors [05:25] Coderjoe: uhhhh hm. I think that commit also happens to be a merge for some reason. [05:25] this is a dirty, badly done branch [05:25] because I did the version control wrong. [05:25] whatever. [05:25] wait [05:26] are you saying I can issue GET http://71.126.138.142/done.php and give underscor seizures [05:26] :P [05:26] Yes [05:26] yes [05:26] awesome [05:26] here comes ab -c 50 [05:26] lol [05:27] will this work without tmpfs? [05:27] it looks like it does [05:27] if there's no tmpfs directory, it just makes one and shoves files there [05:27] correct [05:28] it also complains. [05:45] db48x: Grr, you added an e to my name [05:45] ;) [05:45] : ) [05:45] heh [05:46] what is the largest download so far? [05:46] (not the warc size but the raw fetched data that would go in the tmpfs directory) [05:48] I don't know, tbh. [05:48] I'm seeing a steady state of about 1M per concurrent profile being downloaded for the tmpfs [05:49] btw, adding 5 more machines with 500 threads each [05:49] it has a sawtooth wave pattern [05:49] underscor: jesus christ [05:49] hehe [05:50] :( [05:51] "When in doubt, C4." [05:51] That wasn't very nice [05:51] okay, so what the fuck is splinder? is it really "italian geocities"? [05:51] Pretty much [05:51] It's a blog, and photo host [05:51] ok [05:51] brb. [05:55] woah [05:56] on nov 16 and 17, m1.small spot prices in us west spiked to $50 [05:56] over 100x increase? [05:57] wow [05:57] why???? [05:57] a typo perhaps? [05:57] it looks like the normal spot price is $0.036 [05:57] someone typed 50 instead of .50? [05:57] db48x: no. that wouldn't affect it [05:58] there just was a sudden demand for instances. (in which case the bid would have an effect.) [05:59] $50 spot bid... for when you REALLY don't want the instance to be killed due to price, but still want to (normally) save money [06:00] on the 18th, for a brief moment, m1.large spiked to $5 [06:01] on the 11th, m1.large spiked to $40 [06:02] Coderjoe: this was just for a little while, or all day? [06:02] as far as I can tell from the graphs, just a sample or two [06:02] hmmk [06:06] yay, one machine running [06:06] (crawl335) [06:06] you're not going to try and shoot up past closure? [06:07] spinning up some of the ia crawler nodes? [06:07] Hah, are you guys doing the scraping from amazon instances? [06:07] those instances btw [06:07] zetathust: get the fuck out, or shut up. [06:08] Coderjoe: Yeah, I finally got access to them [06:08] chronomex: That's not very nice [06:08] !replyrate 1 [06:08] Cameron_D: I was doing some mobileme scraping from an amazon instance. my wallet is not happy with me. I was tempted to spin up an instance for the splinder, but I don't know that I can afford it [06:08] !replyrate 0 [06:09] although splinder is dead in a few days, so it might be worth it [06:09] Access denied to chronomex. You are on the "BITCHES, LEAVE ME BE" list and cannot control me [06:09] wtf [06:09] I really need to get a colocated box with cheap bandwidth [06:09] ha [06:09] Coderjoe, ah [06:09] Come on [06:09] Don't kick it [06:09] no bots that talk [06:09] :( [06:10] Let's ask sj [06:10] SketchCow* [06:10] it's still here, but can't speake. [06:10] I would have less of a problem with it if it was helping with maintaining ops or something [06:10] He's off now [06:10] generally, bot should not speak unless spoken to, imo [06:11] correct [06:11] :( [06:12] underscor: crawl335 has 500 threads? [06:12] Yes [06:13] I think it's iobound [06:13] yeah, that's why I added tmpfs :) [06:13] This is running with tmpfs [06:13] o_o [06:14] any chance of the dashboard also giving the percentage of users remaining? [06:15] it gives the number done and remaining, just divide them [06:15] and I have yet to see my name on the dashboard :( [06:17] yea, that is weird [06:19] this isnt really the channel for it, but im in here on pidgin and i keep seeing these wired 5 pointed stars things next to certain usernames in the list, what are they? [06:19] Haha, I just remembered I have a VPS that is currently idle [06:20] * Cameron_D puts it to use [06:20] bsmith094: like who? [06:20] alard chronomex coderjoe ersi [06:20] these people are channel operators, they can kick people out and otherwise enforce order [06:21] is there a tut channel for all the weird syntax irc seems to have? [06:21] bsmith094: it's called google [06:22] oy alright then [06:22] because explaining all this shit gets old quick [06:22] iirc, there is a site named irchelp.com or something [06:22] i can imagine :) [06:24] ive heard there are global emergency irc channels for natural disasters, is that just a myth [06:24] there probably are [06:24] but they wouldn't be very reliable [06:27] if there are, I've never heard of one [06:27] you *can* get reports of disasters via IRC [06:27] it's happened before, e.g. with news of the Gulf War [06:27] like "holy shit earthquake" [06:28] but as far as I know there's no government-operated IRC channel for that [06:28] these days, IRC is pretty crappy for that anyway [06:29] for disasters that don't impact the Internet infrastructure too badly you're probably better off with Twitter [06:29] for worse things, amateur radio [06:29] relevent XKCD http://xkcd.com/723/ [06:29] also after 9/11/01, several people were outputting new channel caption feeds into various irc channels [06:29] for disasters that will kill off California and Twitter [06:29] amateur radio [06:29] followed by partying if you're up for that [06:29] when all else fails: amateur radio [06:30] woop woop woop off-topic siren [06:30] indeed [06:30] my Splinder downloaders are all clogged with failing US profiles [06:30] * Coderjoe fingers the shotgun while eying that siren [06:31] I keep getting downloader dying due to failing us profiles [06:31] there's an update that retries [06:31] maximum of five times [06:31] yes. it retries 5 times and then the script exits [06:31] oh [06:31] back to a prompt [06:32] dld-streamer.sh collects that status [06:32] (I'm not using dld-streamer at the moment [06:32] oh [06:32] I've been collecting the failures in a text file [06:33] would it be useful to build a log of all of the profiles that have blogs with dashes in their subdomain? [06:33] I can collect those under OS X [06:33] probably under some other OSes too [06:33] I know there's a partial list in the Splinder wiki article, but something more comprehensive might be useful [06:34] oh look. I restarted a client after it died on a us profile... and it gets another failing us profile [06:39] heh [06:41] ah, the joy of computers. [06:49] man [06:50] No idea what happened, but wow, it's slamming up [06:50] i just spun up an instance and told it to do 500 threads. I still haven't broken 300 [06:51] * BlueMax slaps chronomex. Insanity flows from his throat! [06:51] to finish splinder before 24-Nov 24:00 my time (it's 23:00 20-nov now), we'll have to average 2 users per second sustained. [06:52] this is looking doable [06:52] alard: are you open to replicating the Redis backend for the tracker? [06:54] mmm [06:55] stalled waiting for the tracker [07:04] it would be nifty, for dld-streamer.sh startup, if the tracker supported a means of requesting a chunk of IDs with one request [07:08] actually [07:08] that raises a good point [07:08] * yipdw adds an integration test suite for universal-tracker [07:08] i just now got over 300 [07:09] honestly, I think 300 is excessive [07:09] Splinder's servers don't seem particularly high-capacity [07:10] Coderjoe: hm. [07:10] for rev2, I want to push all the tracker interaction code into streamer or something similar [07:10] so it can manage it smartly [07:11] currently dld-streamer requests a username, and dld-profile marks it as done [07:11] not sure if you know this, but the tracker does have a service endpoint to release a work item [07:11] but that means not being able to run dld-single to retry a failed profile and have it mark it as done if successful [07:11] correct. [07:12] could be useful for e.g. constantly failing US profiles [07:12] I'm not sure what the best way to do it is [07:12] maybe a script that does both streamer and single modes, then. [07:12] you could put the tracker code in a library file as functions to call, and then source the library file [07:13] that way all the actual tracker interaction code is in one place, but still can be called by whichever scripts [07:13] right [07:57] ew [07:57] failing us profiles are going to clutter the tmpfs directory [07:58] as are any profiles that finish "with network errors" [07:58] I patched dld-profile.sh to fix that [07:59] after the version I grabbed then [07:59] yeah. git pull from my version should resolve it. [08:00] i'm not seeing a commit on your repo though [08:00] er [08:00] wait [08:00] "Robustify tmpfs support. Add reinforcement to dld-client.sh." "chronomex authored about 3 hours ago" [08:00] there it is [08:00] that one? [08:00] yes, that one. [08:01] commit nr b6a63644b55d01453930de7ddd2e0b6108a918e6 [08:01] i'm running a version that should have been after that and I don't have it in my wc [08:01] * Coderjoe checks stuff [08:01] so is it possible to pull from the firrerent branch without stopping everything? [08:01] *different [08:01] ought to be [08:01] it also would have been nice if you could have put all the log files in a subdirectory and had git ignore that subdir [08:01] that would have been nice, yes. [08:02] ok. I have that as my current revision [08:02] you can have git ignore all of those anyway [08:02] you missed some exits [08:02] hmmm, what line nrs? [08:03] Your hips are fat [08:03] oh yeah I missed loads. [08:03] 131, 140, 146, 192, 201 [08:03] haven't had any leaks yet tho :P [08:03] may I suggest perhaps using a trap [08:03] it's a trap! [08:04] 207 [08:04] I pushed a .gitignore file [09:01] alard: I've started some tests for the tracker code; see https://github.com/ArchiveTeam/universal-tracker/tree/integration-tests if you're interested [09:30] yipdw: Ah, good, I'll have a look later. [09:30] No problem with redis replication, by the way. [09:31] Although I'm replicating it to my own computer too, so I do have a copy of the data. [09:33] 37 GiB of incomplete users :-/ http://p.defau.lt/?z0oxTE2k2iUjIHUj8LLyOA [09:34] and I was downloading at 10 Mb/s until a hour ago, now superslow again, umpf [09:46] 50/50 Getting next username from tracker... downloading us:kllin@tipic.com [09:46] why does dld-streamer.sh do this? [09:46] 50/50 PID 13128 finished 'it:Emyll': Success. [09:46] find: "data/us/k/kl/kll/kllin@tipic.com/files/": File o directory non esistente [09:47] 50/50 Getting next username from tracker... downloading it:ammazzarvi [09:47] 50/50 PID 16984 finished 'us:kllin@tipic.com': Success. [09:47] is the @ confusing it? [10:07] hmm [10:07] but why does it look for it with find? [10:08] that's not the issue [10:08] 50/50 Getting next username from tracker... downloading it:Sam1279 [10:08] 50/50 Getting next username from tracker... downloading us:studyqueensland [10:08]