[00:07] mode (+ooo bsmith094 chronomex Paradoks) by underscor
[00:07] quick question, what's this?
[00:07] of quick save in
[00:14] bsmith094: op
[00:15] and i don't speak irc codes?
[00:15] residue codes from algebra to make it beep and shake
[00:15] Nothing, just means you're an operator of the channel
[00:15] s operator overloading'gotchas'?
[00:15] !shutup
[01:04] underscor: don't tell me you brought a markov-chaining bot in here
[01:05] brought to light that the deal with him
[01:05] db48x: Ok, I won't tell you
[01:06] heh
[01:07] i just checked the project page, how in the name of sanity do you plan to coordinate the backup of 200 *terabytes* of data?
[01:07] Simple
[01:08] We're insane
[01:08] :D
[01:08] Fixes all the problems
[01:08] heh
[01:09] underscor was bucked from a horse. This terrible calamity has slowed them 0 days, 02:59:32 from level 39.
[01:09] you tolkein about bieber slowed down http : //www
[01:09] Damn
[01:32] db48x: Do you have push rights to splinder-grab?
[01:32] yes
[01:33] Will you add this to the end of dld-single.sh?
[01:34] After line 117
[01:34] curl http://71.126.138.142/done.php >/dev/null 2>&1 &
[01:34] :D
[01:34] hrm
[01:34] what does that do?
[01:34] besides slow everyone down?
[01:35] It doesn't slow anyone down?
[01:35] a little bit, yes
[01:35] Alright, fine. It's nothing important :P
[01:36] I'm not averse to it if it's interesting
[01:36] or something interesting to talk about important issues, and a community needing buy in from $x
[01:38] also, are you on github?
[01:39] I think so
[01:39] Can't remember
[01:40] All it does is play a sound on my computer and flash the LEDs around my room
[01:40] So I can fall asleep to people saving splinder
[01:41] awesome
[01:44] whoever 98.207 is is doing a lot
[01:49] that's me :)
[01:51] is there a quick way to see how many users i have and how much space they're taking up?
[01:51] du -chs
[01:52] wow, only 3gb in over 20 hours, that's insanely slow
[01:53] yea
[01:53] the site isn't all that fast
[01:53] are their servers just belching blue smoke by now or what?
[01:54] heh
[01:54] probably
[01:54] remember that there is a lot of connection setup time for each file you download
[01:55] could it just reuse the same connection like wget usually does?
[01:55] yea, I'm not actually sure if wget does that
[01:55] with wget
[01:56] anyway, all the more reason to run more clients
[01:56] to irc clients are always crazy
[02:02] im running 200 threads right now
[02:11] so I just saw SketchCow's tweets about the OWS Library
[02:12] I had no clue that was happening
[02:12] and after reading some articles on it, I'm still not entirely sure what the hell it's about
[02:37] Short form.
[02:37] OWS had a library, a tent with donated books.
[02:37] Many of them rather red and protesty, but also just general books
[02:38] During the clear-out of the park, the books were mostly destroyed, a few saved.
[02:38] Small finger pointing, ALA made fun of the "librarians" for abandoning the library, librarians going "You weren't there, man... you didn't know..."
[02:38] abandoning
[02:38] Anyway
[02:38] So they decided to rebuild it.
[02:38] to rebuild
[02:39] I went "Oh come on."
[02:39] Twitter Fight results
[02:39] ah ha
[02:39] that's useful background info
[02:39] So they've got people going around in a bookmobile, i.e. a car, picking up books to drop back into the library.
[02:39] I have said they should 1. Put it on a van or something 2. Shouldn't put unique books there
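(A quick sketch of the disk-accounting question above. du -chs is verbatim from the channel; the data/ layout and directory depth are assumptions taken from the find errors quoted later in this log, e.g. data/us/k/kl/kll/kllin@tipic.com/.)

    # rough count of downloaded users, assuming data/<lang>/<a>/<ab>/<abc>/<user>/
    find data -mindepth 5 -maxdepth 5 -type d | wc -l
    # total space used: -c grand total, -h human-readable, -s summarize
    du -chs data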
[02:41] Anyway, boring, what I ended up doing was letting tons of people know about openlibrary.
[02:41] yeah, that's another project I didn't know about until about half an hour ago
[02:42] It happens.
[02:42] All this crap has a publicity issue
[02:42] No Jimbo Wales or Cory Doctorow to crow about it.
[02:42] Well, not entirely.
[02:42] They got me. I'm something.
[02:43] I find it all a bit surreal
[02:43] OWS is only about 800 miles from me, and Occupy Chicago is two orders of magnitude closer, and yet I've no idea what's going on
[02:43] uncomfortably insulated, etc.
[02:57] what rare/unique books were known to be donated to OWS?
[03:00] The librarything listing of what was at OWS included older one-off books
[03:01] Additionally, there were custom books being made at the site
[03:01] Ok, I'm flinging a few more machines into the cause.
[03:06] 800,000 users done!!
[03:07] \o/
[03:09] It's looking like we're actually gonna make it
[03:09] \o/
[03:14] not a whole lot of data going on for my clients
[03:16] mmm
[03:16] looks like a lot of urls going to www.sp... and files.sp...
[03:25] OK, so, this tracker thing does not like FreeBSD.
[03:26] Oh well. no contributing for me, yet.
[03:27] Not with underscor bogarting it
[03:42] closure's bogarting the splinder stats
[03:42] those stats
[03:43] what is zetathust
[03:43] s what i love when shit breaks hardcore
[03:45] mirc 6.21, probably running some chatterbot script
[03:45] ..
[03:45] or a forged mirc reply
[03:45] but i think it is a chatterbot. and underscor's fault.
[03:47] Hello everyone
[03:54] bogarting?
[03:54] http://en.wiktionary.org/wiki/bogart
[03:54] Coderjoe: It's python actually
[03:54] So it's a forged reply
[03:54] :P
[03:54] stealing, hogging, etc
[03:54] oic
[03:56] I'm surprised we haven't heard much from DADA S.p.A about the increased traffic
[03:56] maybe, despite the crappy throughput, none of this is really having an impact on their operations
[03:56] (alternative explanation: they don't give a shit)
[03:57] or maybe they are thrilled people are taking this much of an interest for the end of days?
[03:58] I dunno, they'd have to be pretty dense to look at their traffic and logs and go "gee, I guess this means we're drawing a new audience"
[03:58] the case would be useful target audience for the future
[04:03] how do i find out what's chewing up my disk?
[04:03] hard drive spinning like crazy for 20 min now
[04:04] bsmith094: yea?
[04:05] umm yeah, will top sort by hd activity?
[04:05] no, but iotop will
[04:05] thanks
[04:05] you're welcome
[04:06] ok, all these little monitoring doodads should really be installed by default
[04:06] heh, yea :)
[04:07] are "@" chars valid in profile names?
[04:10] I think I've seen a few that use it
[04:12] dnova: what's your setup like?
[04:13] bsmith094: yeah, see e.g. http://www.us.splinder.com/profile/br0ken@tipic.com
[04:13] I've only seen them on Splinder US
[04:27] oh boy
[04:27] Dear Amazon EC2 Customer,
[04:27] We are very excited to announce the Public Beta of a new Amazon EC2 Cluster Compute instance, Cluster Compute Eight Extra Large (cc2.8xlarge).
[04:28] another way to burn money
[04:28] heh
[04:28] Cluster Compute Eight sounds like a pretty awesome band name
[04:29] PUBLIC BETA by DAVID GUETTA feat. C.C. EIGHT
[04:33] hmmm.
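(The diagnosis from the exchange above, spelled out: top doesn't sort by disk activity, but iotop shows per-process I/O. The -o flag is real iotop; running it as root is usually required.)

    # show only the processes actually doing disk I/O right now
    sudo iotop -o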
[04:35] http://www.nytimes.com/2011/11/21/technology/quietly-google-puts-history-online.html?_r=3&pagewanted=all
[04:36] haha, yipdw
[04:37] THE DISK CAN'T EVEN HANDLE ME RIGHT NOW
[04:55] I wish iotop could narrow the list of processes down to just those causing activity on certain disks
[05:18] hey you guys, I altered dld-profile and dld-client and dld-streamer to use a tmpfs in ./tmpfs . who is interested in this?
[05:18] the only benefit is it thrashes your disk a lot less.
[05:19] I am
[05:19] okay, one is enough ;)
[05:19] throw a branch up on github?
[05:19] what does it use the tmpfs for?
[05:20] it's at https://github.com/chronomex/splinder-grab
[05:20] and my list of problem profiles grows a bit longer. still all us profiles, though
[05:20] you need to mount a tmpfs; read the beginning of the script to see what you ought to do
[05:21] O_o
[05:21] filedir=/tmp/tmpfs/$username
[05:21] wait, did I fuck that up?
[05:21] hold on
[05:22] no I did not.
[05:22] what are you looking at?
[05:22] you should not be seeing that.
[05:22] your first tmpfs commit
[05:22] yes, the first one is not the current one :)
[05:22] not the "robustify" one
[05:22] yes, that was a proof of concept for my own use only
[05:23] i know, but I figured the "robustify" would build on it
[05:23] they figured out that trans fats means
[05:23] chronomex: can you push that tmpfs work to a separate branch, and revert it from master?
[05:23] I don't know how well it'll work on OpenSolaris
[05:23] yipdw: hmm?
[05:24] oh
[05:24] O_o
[05:24] I guess it doesn't die if you don't set up tmpfs
[05:24] correct, that's what the robustify commit ensures.
[05:24] ok
[05:24] why is the "give underscor an epileptic seizure" addition in your "robustify" commit?
[05:24] I'm just making sure I can update the code running on my Solaris box without getting weird errors
[05:25] Coderjoe: uhhhh, hm. I think that commit also happens to be a merge for some reason.
[05:25] this is a dirty, badly done branch
[05:25] because I did the version control wrong.
[05:25] whatever.
[05:25] wait
[05:26] are you saying I can issue GET http://71.126.138.142/done.php and give underscor seizures?
[05:26] :P
[05:26] Yes
[05:26] yes
[05:26] awesome
[05:26] here comes ab -c 50
[05:26] lol
[05:27] will this work without tmpfs?
[05:27] it looks like it does
[05:27] if there's no tmpfs directory, it just makes one and shoves files there
[05:27] correct
[05:28] it also complains.
[05:45] db48x: Grr, you added an e to my name
[05:45] ;)
[05:45] : )
[05:45] heh
[05:46] what is the largest download so far?
[05:46] (not the warc size but the raw fetched data that would go in the tmpfs directory)
[05:48] I don't know, tbh.
[05:48] I'm seeing a steady state of about 1M per concurrent profile being downloaded for the tmpfs
[05:49] btw, adding 5 more machines with 500 threads each
[05:49] it has a sawtooth wave pattern
[05:49] underscor: jesus christ
[05:49] hehe
[05:50] :(
[05:51] "When in doubt, C4."
[05:51] That wasn't very nice
[05:51] okay, so what the fuck is splinder? is it really "italian geocities"?
[05:51] Pretty much
[05:51] It's a blog and photo host
[05:51] ok
[05:51] brb.
[05:55] woah
[05:56] on nov 16 and 17, m1.small spot prices in us west spiked to $50
[05:56] over 100x increase?
[05:57] wow
[05:57] why????
[05:57] a typo perhaps?
[05:57] it looks like the normal spot price is $0.036
[05:57] someone typed 50 instead of .50?
[05:57] db48x: no. that wouldn't affect it
[05:58] unless there just was a sudden demand for instances (in which case the bid would have an effect)
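(A minimal sketch of the tmpfs setup chronomex describes, assuming the patched scripts look for it at ./tmpfs inside the splinder-grab checkout; the 512m size is an arbitrary assumption, not from the scripts.)

    # mount a RAM-backed filesystem where the patched dld-* scripts stage
    # in-progress downloads, so the per-file writes don't thrash the disk
    mkdir -p tmpfs
    sudo mount -t tmpfs -o size=512m none ./tmpfs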
[05:59] $50 spot bid... for when you REALLY don't want the instance to be killed due to price, but still want to (normally) save money
[06:00] on the 18th, for a brief moment, m1.large spiked to $5
[06:01] on the 11th, m1.large spiked to $40
[06:02] Coderjoe: this was just for a little while, or all day?
[06:02] as far as I can tell from the graphs, just a sample or two
[06:02] hmmk
[06:06] yay, one machine running
[06:06] (crawl335)
[06:06] you're not going to try and shoot up past closure?
[06:07] spinning up some of the ia crawler nodes?
[06:07] Hah, are you guys doing the scraping from amazon instances?
[06:07] those instances btw
[06:07] zetathust: get the fuck out, or shut up.
[06:08] Coderjoe: Yeah, I finally got access to them
[06:08] chronomex: That's not very nice
[06:08] !replyrate 1
[06:08] Cameron_D: I was doing some mobileme scraping from an amazon instance. my wallet is not happy with me. I was tempted to spin up an instance for the splinder, but I don't know that I can afford it
[06:08] !replyrate 0
[06:09] although splinder is dead in a few days, so it might be worth it
[06:09] Access denied to chronomex. You are on the "BITCHES, LEAVE ME BE" list and cannot control me
[06:09] wtf
[06:09] I really need to get a colocated box with cheap bandwidth
[06:09] ha
[06:09] Coderjoe, ah
[06:09] Come on
[06:09] Don't kick it
[06:09] no bots that talk
[06:09] :(
[06:10] Let's ask sj
[06:10] SketchCow*
[06:10] it's still here, but can't speak.
[06:10] I would have less of a problem with it if it was helping with maintaining ops or something
[06:10] He's off now
[06:10] generally, a bot should not speak unless spoken to, imo
[06:11] correct
[06:11] :(
[06:12] underscor: crawl335 has 500 threads?
[06:12] Yes
[06:13] I think it's iobound
[06:13] yeah, that's why I added tmpfs :)
[06:13] This is running with tmpfs
[06:13] o_o
[06:14] any chance of the dashboard also giving the percentage of users remaining?
[06:15] it gives the number done and remaining, just divide them
[06:15] and I have yet to see my name on the dashboard :(
[06:17] yea, that is weird
[06:19] this isn't really the channel for it, but i'm in here on pidgin and i keep seeing these weird 5-pointed star things next to certain usernames in the list, what are they?
[06:19] Haha, I just remembered I have a VPS that is currently idle
[06:20] * Cameron_D puts it to use
[06:20] bsmith094: like who?
[06:20] alard chronomex coderjoe ersi
[06:20] these people are channel operators, they can kick people out and otherwise enforce order
[06:21] is there a tutorial channel for all the weird syntax irc seems to have?
[06:21] bsmith094: it's called google
[06:22] oy, alright then
[06:22] because explaining all this shit gets old quick
[06:22] iirc, there is a site named irchelp.com or something
[06:22] i can imagine :)
[06:24] i've heard there are global emergency irc channels for natural disasters, is that just a myth?
[06:24] there probably are
[06:24] but they wouldn't be very reliable
[06:27] if there are, I've never heard of one
[06:27] you *can* get reports of disasters via IRC
[06:27] it's happened before, e.g. with news of the Gulf War
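(The dashboard arithmetic from the exchange above — "just divide them" — spelled out. The 800,000 figure is the milestone from earlier in this log; the remaining count is an illustrative assumption.)

    # percentage done = done / (done + remaining) * 100
    awk 'BEGIN { done=800000; left=600000; printf "%.1f%% done\n", 100*done/(done+left) }'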
[06:27] like "holy shit earthquake"
[06:28] but as far as I know there's no government-operated IRC channel for that
[06:28] these days, IRC is pretty crappy for that anyway
[06:29] for disasters that don't impact the Internet infrastructure too badly you're probably better off with Twitter
[06:29] for worse things, amateur radio
[06:29] relevant XKCD: http://xkcd.com/723/
[06:29] also, after 9/11/01, several people were outputting news channel caption feeds into various irc channels
[06:29] for disasters that will kill off California and Twitter
[06:29] amateur radio
[06:29] followed by partying if you're up for that
[06:29] when all else fails: amateur radio
[06:30] woop woop woop off-topic siren
[06:30] indeed
[06:30] my Splinder downloaders are all clogged with failing US profiles
[06:30] * Coderjoe fingers the shotgun while eying that siren
[06:31] I keep getting downloaders dying due to failing us profiles
[06:31] there's an update that retries
[06:31] maximum of five times
[06:31] yes. it retries 5 times and then the script exits
[06:31] oh
[06:31] back to a prompt
[06:32] dld-streamer.sh collects that status
[06:32] (I'm not using dld-streamer at the moment)
[06:32] oh
[06:32] I've been collecting the failures in a text file
[06:33] would it be useful to build a log of all of the profiles that have blogs with dashes in their subdomain?
[06:33] I can collect those under OS X
[06:33] probably under some other OSes too
[06:33] I know there's a partial list in the Splinder wiki article, but something more comprehensive might be useful
[06:34] oh look. I restarted a client after it died on a us profile... and it gets another failing us profile
[06:39] heh
[06:41] ah, the joy of computers.
[06:49] man
[06:50] No idea what happened, but wow, it's slamming up
[06:50] i just spun up an instance and told it to do 500 threads. I still haven't broken 300
[06:51] * BlueMax slaps chronomex. Insanity flows from his throat!
[06:51] to finish splinder before 24-Nov 24:00 my time (it's 23:00 20-nov now), we'll have to average 2 users per second sustained.
[06:52] this is looking doable
[06:52] alard: are you open to replicating the Redis backend for the tracker?
[06:54] mmm
[06:55] stalled waiting for the tracker
[07:04] it would be nifty, for dld-streamer.sh startup, if the tracker supported a means of requesting a chunk of IDs with one request
[07:08] actually
[07:08] that raises a good point
[07:08] * yipdw adds an integration test suite for universal-tracker
[07:08] i just now got over 300
[07:09] honestly, I think 300 is excessive
[07:09] Splinder's servers don't seem particularly high-capacity
[07:10] Coderjoe: hm.
[07:10] for rev2, I want to push all the tracker interaction code into streamer or something similar
[07:10] so it can manage it smartly
[07:11] currently dld-streamer requests a username, and dld-profile marks it as done
[07:11] not sure if you know this, but the tracker does have a service endpoint to release a work item
[07:11] but that means not being able to run dld-single to retry a failed profile and have it mark it as done if successful
[07:11] correct.
[07:12] could be useful for e.g. constantly failing US profiles
[07:12] I'm not sure what the best way to do it is
[07:12] maybe a script that does both streamer and single modes, then.
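(A sketch of the retry workflow implied above, assuming the failures were collected one username per line in a text file, and assuming dld-single.sh takes your tracker nick plus a username — its exact arguments are an assumption, not from the repo.)

    # retry each collected failure a few times before giving up on it
    while read -r user; do
      for attempt in 1 2 3; do
        ./dld-single.sh yournick "$user" && break
      done
    done < failed-profiles.txt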
[07:12] you could put the tracker code in a library file as functions to call, and then source the library file
[07:13] that way all the actual tracker interaction code is in one place, but still can be called by whichever scripts
[07:13] right
[07:57] ew
[07:57] failing us profiles are going to clutter the tmpfs directory
[07:58] as are any profiles that finish "with network errors"
[07:58] I patched dld-profile.sh to fix that
[07:59] after the version I grabbed, then
[07:59] yeah. a git pull from my version should resolve it.
[08:00] i'm not seeing a commit on your repo though
[08:00] er
[08:00] wait
[08:00] "Robustify tmpfs support. Add reinforcement to dld-client.sh." "chronomex authored about 3 hours ago"
[08:00] there it is
[08:00] that one?
[08:00] yes, that one.
[08:01] commit nr b6a63644b55d01453930de7ddd2e0b6108a918e6
[08:01] i'm running a version that should have been after that and I don't have it in my wc
[08:01] * Coderjoe checks stuff
[08:01] so is it possible to pull from the different branch without stopping everything?
[08:01] ought to be
[08:01] it also would have been nice if you could have put all the log files in a subdirectory and had git ignore that subdir
[08:01] that would have been nice, yes.
[08:02] ok. I have that as my current revision
[08:02] you can have git ignore all of those anyway
[08:02] you missed some exits
[08:02] hmmm, what line nrs?
[08:03] Your hips are fat
[08:03] oh yeah, I missed loads.
[08:03] 131, 140, 146, 192, 201
[08:03] haven't had any leaks yet tho :P
[08:03] may I suggest perhaps using a trap
[08:03] it's a trap!
[08:04] 207
[08:04] I pushed a .gitignore file
[09:01] alard: I've started some tests for the tracker code; see https://github.com/ArchiveTeam/universal-tracker/tree/integration-tests if you're interested
[09:30] yipdw: Ah, good, I'll have a look later.
[09:30] No problem with redis replication, by the way.
[09:31] Although I'm replicating it to my own computer too, so I do have a copy of the data.
[09:33] 37 GiB of incomplete users :-/ http://p.defau.lt/?z0oxTE2k2iUjIHUj8LLyOA
[09:34] and I was downloading at 10 Mb/s until an hour ago, now superslow again, umpf
[09:46] 50/50 Getting next username from tracker... downloading us:kllin@tipic.com
[09:46] why does dld-streamer.sh do this?
[09:46] 50/50 PID 13128 finished 'it:Emyll': Success.
[09:46] find: "data/us/k/kl/kll/kllin@tipic.com/files/": File o directory non esistente [Italian: "No such file or directory"]
[09:47] 50/50 Getting next username from tracker... downloading it:ammazzarvi
[09:47] 50/50 PID 16984 finished 'us:kllin@tipic.com': Success.
[09:47] is the @ confusing it?
[10:07] hmm
[10:07] but why does it look for it with find?
[10:08] that's not the issue
[10:08] 50/50 Getting next username from tracker... downloading it:Sam1279
[10:08] 50/50 Getting next username from tracker... downloading us:studyqueensland
[10:08] 50/50 PID 1474 finished 'it:yhdts002': Success.
[10:08] 50/50 PID 31601 finished 'it:Alias78': Success.
[10:08] find: "data/us/s/st/stu/studyqueensland/files/": File o directory non esistente
[10:08] 50/50 PID 2333 finished 'us:studyqueensland': Success.
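(The trap suggestion from the exchange above, as a minimal sketch: instead of patching an rm before each of the missed exits at lines 131, 140, 146 and so on, one handler covers every exit path. The filedir name follows the variable quoted earlier in this log; treat the exact path as an assumption.)

    # remove this profile's staging directory no matter how the script exits
    filedir="tmpfs/$username"
    cleanup() { rm -rf "$filedir"; }
    trap cleanup EXIT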
[10:09] perhaps it's just checking whether the files/ directory was successfully deleted or not
[10:12] not sure
[10:13] oh, I see
[10:14] that's coming from dld-profile.sh
[10:15] aww
[10:15] found it
[10:16] http://toolserver.org/~nemobis/wget-phase-1.log
[10:16] doh
[10:16] if grep -q "ERROR 50" "${userdir}/wget-phase-1.log"
[10:17] and dld-streamer says "success"; I hope it doesn't tell the tracker that the user is done
[10:18] it did
[10:18] so what?
[10:18] so we need to get a list of all of those users and redo them
[10:18] aww
[10:19] well, one "just" has to grep the whole data directory for those error 500s, for each user check that there's no phase-2 or phase-3, and then launch a bunch of dld-single, I guess
[10:20] I have no time to do it now, though
[10:20] yes, that will work
[10:20] but wait, I have infinite scrollback, I could just grep it
[10:23] http://toolserver.org/~nemobis/streamer500.log
[10:32] Nemo_bis: try setting LC_MESSAGES to en and see if you get english error messages out of wget
[10:35] Nemo_bis: also, fix-dld.sh will do that for you
[10:39] why do I need to change language?
[10:42] I guess I'll run it at the end
[10:42] you need to change the language because the downloader scripts are looking for specific strings that indicate that errors happened
[10:43] if wget outputs translated error messages, then the downloader scripts will fail to recognize errors
[10:43] which means that a lot of the profiles you've reported as being done have actually not been done
[10:45] ah
[10:46] looks like I don't have the en locale installed
[10:46] but does this produce any error that fix-dld won't fix?
[10:53] actually, setting LC_MESSAGES is probably bad
[10:53] hm
[10:53] so the only option is to adapt fix-dld and run it?
[10:57] I think recompiling wget-warc so that it only ever uses english would be best
[10:57] not that I can figure out how to do that
[11:03] but in the meantime, fixing up your copy of fix-dld.sh would let you fix them
[11:46] would compiling wget-warc without nls (which also seems responsible for part of the dash problems) do the job?
[14:38] hey, what is going on with splinder?
[14:38] looks like the project is stalled
[14:39] yep
[14:39] slower than ever
[14:39] neither of my boxes are doing anything
[14:39] it looks like they're still working
[14:39] ok, it's not just me then?
[14:41] ah, that was another download
[14:41] what?
[14:41] all wget-warc instances are at 0.000 KiB/s for me now
[14:41] well hmph.
[14:41] and I can't open it in my browser
[14:42] damn.
[14:44] oh, but it looks like they issued an official statement that it will close "only" on the 31st of January
[15:37] mmm
[15:39] SketchCow: may I have an rsync slot? I have a few accounts to upload
[15:42] Oh, there's way more to do - I will chip in a bit from work here, I doubt they'd care
[15:43] so I'll take that rsync slot later, after the job is done
[15:44] uh oh
[15:44] splinder users/hour is down to 0
[15:44] see wiki
[15:45] also [Nov 21 11 09:44] oh, but it looks like they issued an official statement that it will close "only" on the 31st of January
[15:45] ah ha
[15:45] should we still grab it?
[15:45] oh, it's down
[15:45] nice
[15:45] also, they might have blackholed everyone that was doing mass downloading
[15:45] well then maybe I do need that rsync slot, to give what I have so far
[15:45] It's not much
[15:45] but it is what I have.
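(A sketch of the cleanup pass Nemo_bis describes above: grep for the 50x errors, skip users whose later phases ran, re-queue the rest. The directory depth and the wget-phase-2.log name are assumptions extrapolated from the paths and the wget-phase-1.log check quoted in this log.)

    # list user directories whose phase-1 wget log recorded a 50x error
    # but which never reached phase 2, so they can be fed to dld-single
    find data -name wget-phase-1.log -exec grep -l "ERROR 50" {} + |
    while read -r log; do
      userdir=$(dirname "$log")
      [ -e "$userdir/wget-phase-2.log" ] || echo "$userdir"
    done > redo-list.txt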
[15:46] no, this computer here has never participated
[15:46] https://twitter.com/#!/chepalleblog/status/138633712630902784
[15:46] https://twitter.com/#!/Terrychan_/status/138639333560291328
[15:46] Normal people have the same problem.
[15:46] are you saying that we are abnormal?
[15:49] Well, visiting eight hundred thousand Splinder profiles in a week may be a little beyond what is normal... :)
[15:51] It must be in a psychology manual somewhere.
[16:05] Wow, we've REALLY pulled ahead on the splinder.
[16:06] well, it is very very easy to help
[16:07] I agree.
[16:07] Definitely shows the benefits of a client.
[16:08] It's easier when you have a known total.
[16:09] well, if the tracker just stored urls, and the clients could submit new ones
[16:09] Using the client script made it extremely easy, I hadn't done this before
[16:10] it would be a bit easier to adapt it
[16:10] yeah, I just convinced someone to join in because it's git-pull & shellscript
[16:10] yeah, I guess the next step is generalizing it so you don't need a whole new codebase for each project
[16:11] you could store the info server-side and have it fetch the project info from a project handle before it fetches project-specific stuff like the username
[16:11] or make the client dumb
[16:13] server stores a list of things on the web to fetch, client just grabs it and uploads it. I guess the next bit would really be automating the uploads (see the sketch below)
[16:14] So wait, splinder is down?
[16:14] yep :/
[16:14] Or I'd be pulling more
[16:16] I am seeing errors on this aws box
[17:07] So help me here... Splinder is down but we're still getting webpages?
[17:07] What's happening there.
[17:07] Konklone: I'm pretty sure that Alard wrote most of the script when dealing with the me.com stuff, reused it again for Splinder, and again for Anyhub, so the codebase is getting a lot of use.
[17:08] Yes, it's generalized. I think alard is focusing first on reliability, then portability and configuration.
[17:09] Konklone: fyi, the tracker code is here -> https://github.com/ArchiveTeam/universal-tracker
[17:10] How long has the ArchiveTeam been on github?
[17:10] I could do a shoutout on the github blog
[17:10] May 7, 2011
[17:11] well, i mean actively using it
[17:11] May 7, 2011 :P
[17:11] maybe i'll do a shoutout for splinder-grab
[17:11] the first project put up there was a set of scripts for archiving Google Video
[17:13] hmm
[17:13] alard: as a medium-term issue, we should look into making more tracker state accessible via HTTP
[17:14] alard: I just noticed that a lot of integration tests rely on accessing tracker state via the tracker object, which I guess is fine for now but feels a bit weird if you consider integration testing from a black-box perspective
[17:15] it
[17:15] er, where the hell did that "it" come from
[17:19] but!
[17:19] until then, I guess I'll get the tracker running in Travis
[17:29] We've been doing a lot of things on github.
[17:29] Trying to build tools people will like.
[17:32] SketchCow: i'll mention it in the blog today
[17:32] SketchCow: maybe it'll get more people interested
[17:34] kennethre: cool
[17:34] exit
[17:34] d'oh
[17:34] sudo -iu build bash
[17:34] c-a d didn't make it through apparently
[17:38] wow, a lot of new splinder grabbers now
[17:38] and wow, we're like 66% done
[17:40] how many concurrent streams would you run on a pretty decently co-lo'd box?
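(A sketch of the "dumb client" idea floated above: the server stores a list of URLs, the client just fetches and uploads. Everything here is hypothetical — tracker.example, upload.example, the /done endpoint, and the form field are placeholders; the real tracker of this era only issued usernames via POST /request.)

    # hypothetical dumb-client loop: claim an item, fetch it, upload it, report it
    while item=$(curl -fs -d "downloader=yournick" http://tracker.example/request); do
      dir=$(printf '%s' "$item" | tr -c 'A-Za-z0-9._-' '_')   # crude per-item dir name
      wget -r -np -P "$dir" "$item" || continue               # grab whatever was listed
      rsync -a "$dir" upload.example::incoming/ \
        && curl -fs -d "item=$item" http://tracker.example/done >/dev/null
    done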
[17:40] doing 10 currently
[17:40] I've been to 300 on linode w/o problem
[17:40] 1000 on real hardware
[17:40] well then
[17:40] cranking it up :)
[17:53] whoa, it jumped all the way to 10k users/hour, and then halfway to 15k/hour
[17:56] woah
[17:59] has splinder.com been entirely DOSed now?
[18:01] preservation of service, please
[18:01] guess it's up, but sloow
[18:03] so is splinder back now?
[18:05] not sure
[18:05] actually, it might be back but not working at all
[18:06] http://www.us.splinder.com/profile/pjvkzdvWg@tipic.com/:
[18:06] 2011-11-21 10:04:17 ERROR 404: Not Found.
[18:08] yea, all of these are broken
[18:08] everything we are downloading right now is useless
[18:10] hey kid. i'm a computer. stop all the downloadin.
[18:10] I stopped my downloads earlier when things were just not working at all
[18:11] yipdw, alard: I haven't looked at the tracker code yet. is there a "no more work" response from the tracker?
[18:25] Coderjoe: yeah, if there's no more work to be issued, then POST /request returns 404
[18:26] at some point I guess it might be appropriate to change that to a different response, but meh
[18:27] because it's technically not a client error
[18:27] but what's there is good enough for now
[18:30] ok, next step is to update fix-dld.sh so that it detects downloads where the profile returned 404
[18:30] should I stop?
[18:30] then we make dld-profile.sh bail in that case
[18:30] chronomex: yes
[18:30] forcibly?
[18:30] doesn't matter
[18:31] ok
[18:31] everyone needs to stop
[18:31] ----------------------------------
[18:32] EVERYBODY KILL YOUR SPLINDER DOWNLOADERS
[18:32] ----------------------------------
[18:32] FORCIBLY
[18:32] ----------------------------------
[18:32] HELP COMPUTER
[18:32] STOP ALL THE DOWNLOADIN'
[18:32] I have a meeting, so someone else needs to jump in and make the changes
[18:32] I'll be back in an hour or so
[18:33] oh yeah.
[18:33] http://travis-ci.org/#!/ArchiveTeam/universal-tracker
[18:43] ciao
[18:43] list
[18:50] yipdw: Yes, I wondered about using the tracker object in integration tests. Adding more info accessible over HTTP just to run a test also feels a bit weird.
[18:50] I assume my existing dld's that are "stuck" getting large users don't have this problem, and can run to completion.
[18:50] (all 500 of them, sigh)
[18:52] wow, who is crawl??
[18:53] yipdw: Also, is rspec the way to go for testing the tracker object? I have a couple of little tests now, but haven't decided yet whether it's better to use stub methods or a real Redis connection.
[18:54] alard: I think that if the Redis connection is an integral part of the tracker object, it is fair to use a real connection in unit tests
[18:55] otherwise, the stubbing gets out of hand
[18:55] in my experience
[18:55] Yes, okay. It's also easier to use with the redis.pipelined method.
[18:55] doing that tends to litter the test with a lot of stub/mock noise that gets in the way of understanding what the test means
[18:57] alard: as far as the integration tests go -- yeah, providing more information over HTTP is one of those fuzzy "dunno if this is a good idea" things. my main justification for doing that is to specify what behaviors a user (without access to the source code or similar hooks) would see
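(The 404-detection step described above, sketched in the style of the grep -q "ERROR 50" check quoted earlier in this log; the exit code and message are assumptions, not the actual fix-dld.sh/dld-profile.sh patch.)

    # bail when the profile page itself 404ed, so the user isn't marked done
    if grep -q "ERROR 404: Not Found" "${userdir}/wget-phase-1.log"; then
      echo "profile in ${userdir} returned 404; needs re-queueing" >&2
      exit 6   # distinct exit code for "profile missing"; the value is arbitrary
    fi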
[18:58] so, "if the tracker has usernames left, POST /request gives me a username" and "if the tracker has no pending work, POST /request returns 404" would not test internal state of the tracker unless it was externally observable
[18:58] Another thing: maybe it's better to force the test runner to create a redis_conf.rb. I accidentally lost my redis db because I placed the redis_conf.rb in the wrong directory. (No important damage done, but still.)
[18:58] oh
[18:58] yeah, that's fine
[18:59] The rspec tests probably will need to share that configuration anyway.
[18:59] we can throw that into config/database.yml or something
[18:59] That may be useful, yes. Well, I'll be back later.
[19:00] seeya
[19:05] stop? really?
[19:07] yes
[19:07] splinder is belly up today
[19:08] they also claim they will be around a while longer
[19:08] like to january
[19:09] but mostly we broke the site, so hold off until they can fix it
[19:22] I don't get it
[19:22] people are still pulling data, so it appears
[19:23] dnova: yes, but it's all empty
[19:23] shit.
[19:23] dnova: grabbing the profile results in a 404, so we don't download any blogs, images, etc
[19:23] why are some >1mb?
[19:23] could be intermittent
[19:24] I wonder if they are trying to move it to a more robust backend... would be weird if they are shutting down in a little over a month
[19:24] well, shit.
[19:24] ok.
[19:24] actually, it might just be the us ones that are broken
[19:24] well soooooo
[19:24] didn't actually check any it ones
[19:25] but it was giving an error message earlier
[19:25] might it be ok now?
[19:25] someone needs to update fix-dld.sh to go back and find these and redo them
[19:25] hmm
[19:26] the front page is back
[19:27] has the tmpfs change been pulled into the main repo? (I'm at work and have not looked at the repos since last night)
[19:28] I am going to let my scripts continue unless I hear a more definitive statement about what's going on
[19:31] Obviously, a sanity checker is going to have to be implemented, to allow people to re-run stuff.
[19:32] yeah, that's why I think I should keep going... it can't be 5mb of 404s, can it?
[19:32] some of these are not trivially small
[19:33] I leave it to alard to decide, he knows the system better.
[19:49] http://archiveteam.org/index.php?title=It_Died
[19:49] Project status: Online!?
[19:49] whaaat
[19:49] http://itdied.com/ "This domain has expired. Please renew it at Dynadot.com."
[19:50] oh no... metadeath
[19:58] BANK OF ENGLAND NOTES:
[19:58] welp
[20:12] So I suspect that we're likely to have people upload questionable sets into the rsync slots
[20:12] That's fine - as long as I have a script to run on there, I can shore them up.
[20:12] .. Like a boss
[20:12] talk to corporate
[20:13] I respectfully refuse
[20:13] greetings, I have a question about uploading. SketchCow directed me to ask alard, but I thought I would ask here so more people can see the answer
[20:14] I have been given an rsync command line
[20:14] I am running the streaming grabber
[20:14] can I run the rsync while the streaming grabber is running, or should I stop it first?
[20:14] You can run it at the same time
[20:15] Just make sure you run it more than once, though
[20:15] Every time you run it, it'll sync what has changed since the last time you ran it
[20:15] thanks! Next question: is it OK to run the same command line from multiple machines?
[20:16] Depends, I'd say :P
[20:17] OK - I am running the streaming grabber on 3 separate machines
[20:32] wait, is Splinder even up?
[20:33] I still can't contact it
[20:34] oh, there it goes
[20:34] wow, 17 second response time
[20:46] Angra: have you tried running the script more than once on the same box?
[20:46] worked for anyhub and splinder
[20:47] yes, I have rerun from the same box, it seems to work OK
[20:47] I do seem to be getting some errors:
[20:47] sent 20146625 bytes  received 490529 bytes  655147.75 bytes/sec  total size is 4561197631  speedup is 221.02  rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.8]
[20:51] er
[20:51] it will also upload the temporary files as well
[20:52] there is a script for uploading just the complete stuff, though I don't know what state it is in
[20:59] is it safe to run it now?
[21:02] Coderjoe: are you saying you'd rather me not upload while the stream downloader is running? I'll do whatever is best for the project
[21:05] Angra: It's probably best to use the upload script from the git repository. It should only upload things that are done, and I think it should also work with the stream downloader.
[21:07] cd to your script's directory; ./upload-finished.sh <server>::<module>/splinder/
[21:11] http://www.archiveteam.org/index.php?title=Splinder implies that I should use batcave.textfiles.com for the server and Angra for the module, is that right?
[21:11] Yes.
[21:12] OK, I get an error when doing that:
[21:12] :~/splinder-grab>./upload-finished.sh batcave.textfiles.com::Angra/splinder/ @ERROR: Unknown module 'Angra' rsync error: error starting client-server protocol (code 5) at main.c(1524) [sender=3.0.7]
[21:12] Angra: Yes, I see, for the module you should use the module name SketchCow has given to you.
[21:12] From the list of modules on batcave, I believe it's angra with a small a.
[21:13] aha! yes!
[21:13] thank you. converting uploads to that script now
[21:15] SketchCow: I'd promised you a Berlios finish-up script to run.. here it is: http://216.41.255.233/berlios/burp
[21:19] Hello
[21:32] Yo, pat
[21:34] are you gonna make the splinder deadline?
[21:34] They APPEAR to have shifted it.
[21:43] http://www.procionegobbo.it/blog/2011/11/splinder-chiude/ does not seem very official
[21:44] ah, 31 Gennaio 2012 (31 January 2012) on splinder.com now
[22:37] haha
[22:37] http://www.archive.org/post/401968/we-have-2tb-of-data-to-upload
[22:42] so it looks like splinder is still going?
[22:52] Coderjoe: lol
[23:11] Coderjoe: I don't see the problem as long as the 2 tb is public
[23:11] but that's probably not what they meant
[23:25] Coderjoe: heh
[23:58] Man
[23:59] Sublime Text 2 is the fucking shizzle
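(The working invocation from the exchange above, with the lowercase module name that resolved the "Unknown module" error. Since rsync is incremental, re-running on a loop keeps uploads current while the grabber runs; the hourly interval is an arbitrary assumption.)

    # upload finished profiles periodically; only new/changed data is sent
    cd ~/splinder-grab
    while true; do
      ./upload-finished.sh batcave.textfiles.com::angra/splinder/
      sleep 3600
    done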