[00:04] This wget has been hanging for +12 hours now
[00:04] and the last few lines in the log are
[00:04] 2011-11-09 08:24:47 URL:http://web.me.com/bobfarmer/20110726Web%20Cards/ps01/ps01_446.htm [2326/2326] -> "data/b/bo/bob/bobfarmer/web.me.com/files/web.me.com/bobfarmer/20110726Web Cards/ps01/ps01_446.htm" [1]
[00:04] 2011-11-09 12:54:27 ERROR 404: Not Found.
[00:04] http://web.me.com/bobfarmer/20110726Web%20Cards/ps20/:
[00:04] http://web.me.com/bobfarmer/20110726Web%20Cards/ps20/feed.xml:
[00:04] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/:
[00:04] 2011-11-11 12:10:48 ERROR 404: Not Found.
[00:04] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/feed.xml:
[00:04] 2011-11-11 12:10:49 ERROR 404: Not Found.
[00:04] It is now 00:04 server time
[00:04] so it's been nearly 12 hours since the last update
[00:04] alard: Do you think it's dead or something?
[00:13] underscor: Maybe, maybe it's trying to download a very big file. Have you looked at the url list?
[00:16] alard: There'd be an entry for every file, right?
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/feed.xml
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/ps21_001.htm
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/ps21_002.htm
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/ps21_003.htm
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/ps21_004.htm
[00:16] http://web.me.com/bobfarmer/20110726Web%20Cards/ps21/ps21_005.htm
[00:17] None of those are particularly large
[00:17] No, so it's probably hanging.
[00:17] k
[00:17] ctrl-c, dld-single?
[00:17] Yes.
[00:18] k :)
[00:26] I can't read chinese no
[01:45] hola
[01:46] there is an old friend of mine that has been running a mailing list for a long time. I'm not sure how much longer he's going to do it.
[01:47] He just got the server back up, but it's pretty old
[01:47] http://www.team.net/archive/
[01:47] he may have even older email archives from before the "new" mailer
[01:48] mjb@autox.team.net
[01:48] can anyone help grab that stuff before mark peters out?
[01:53] DrFaustus: very probably
[01:53] I believe someone here already has a mailman archiver
[01:55] ah, it's even set up to let us download mbox files
[01:55] indeed
[01:56] you'd have to contact mark to see if he still has even older archive files around
[01:56] he's an old fart of a unix admin from university of utah
[01:56] so it'd likely be in an easy format
[01:57] well, mbox is never as easy as it should be
[01:57] but it's easy enough :)
[01:58] fair enough
[01:59] those lists hold about fifteen years of technical discussions on just about every classic british sports car made
[02:31] thanks guys
[03:47] whoa, when did the splinder grab go from 990,000 users to 1.3 million?
[03:47] oh, splinder.us
[03:47] I see
[04:05] where are the splinder scripts? github?
[04:41] db48x: https://github.com/ArchiveTeam/splinder-grab
[05:52] Wow, the splinder todo went way up!
[05:53] Luckily, splinder looks to only be about a terabyte and a half
[06:53] heh
[06:53] http://memac.heroku.com/ is reporting 666 MB/user
[06:53] I always knew Mac users were satanic
[06:54] hahaha
[09:55] damn. wget-warc is eating all my ram.
[10:10] shit, that fucker ran for a day and a half before it got OOM killed
[11:55] alard: I notice that when I visit memac.heroku.com, it's getting log messages about splinder :)
[11:56] db48x: It's the same source. (But it's not showing them, I hope?)
[12:11] So I just realised, I have a sizable ball of google groups to upload still.
[12:11] Also a few chunks of berlios
[12:25] I'm off to a Doc appointment now, but feel free to /msg me a place to rsync to; I'll get to it tonight. Sorry to be so late about this.
[13:12] alard: yea, it's not showing them
[13:13] alard: you should set up separate streams for them
[13:13] Why? It works?
[13:15] it's just extra work every time a message comes in
[13:16] anyway, the real reason I'm looking is that the splinder tracker kills my browser when it's open
[13:29] hmm. updating the chart is super expensive
[13:31] Yes, I'm looking at a way to have fewer points in the graph, that should help somewhat.
[13:35] it could just update less frequently
[14:21] db48x: Should be faster now.
[14:29] that it is
[15:24] alard: are you also alart? :)
[15:46] Yes, typo. :)
[15:46] Wyatt|Wor: you need to ask SketchCow for an rsync on batcave
[19:03] i think i will start logging http://store.steampowered.com/stats/
[19:03] might be interesting to make a 365 day graph
[19:10] Recipient: Bovine Ignition Systems
[19:10] Amount: $100.00
[19:10] lol
[19:30] haha
[20:19] huh, this is kind of weird
[20:19] https://gist.github.com/7723903aa5ff2c0fbeb3
[20:20] I got an error on the malacarne profile, when it attempted to download the blog from porno.splinder.com.
[20:21] oh, the docum profile has been made unavailable
[20:21] ok
[20:32] Paradoks: haha, nice name
[21:04] so it looks like we're downloading about 21.8k Splinder users/day
[21:04] somehow, that doesn't seem fast enough
[21:05] to the Amazon
[21:05] 'course, that's moot if we're maxing their pipe already :P
[21:06] when do we have til again?
[21:11] it was 14 days, right?
[21:11] that's only 300k
[21:12] uh oh
[21:12] how much data/bandwidth is it roughly if i joined as downloader?
[21:12] Schbirid: tiny
[21:12] Yeah
[21:12] Like 0.8 mb/user
[21:13] http://splinder.heroku.com/
[21:13] db48x: Download moar!
[21:13] I'm limited by iops
[21:13] I guess I can leave off sorting poetry for a while
[21:13] aww :(
[21:13] • :D
[21:14] I'm pulling 4mbps right now
[21:15] * db48x cancels three other tasks
[21:15] ok, how do i join in?
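The arithmetic behind the 21:04 to 21:12 exchange (21.8k users/day over 14 days is "only 300k") can be checked with a small shell sketch. The 1.3 million figure is the user count mentioned at 03:47; treating it as the full amount still to fetch is an assumption, and the function names are hypothetical, chosen only for illustration.

```shell
# projected_users RATE_PER_DAY DAYS
# Users fetched if the current rate holds for the whole period.
projected_users() { echo $(( $1 * $2 )); }

# required_rate TOTAL_USERS DAYS
# Users/day needed to finish in time, rounded up (integer ceiling division).
required_rate() { echo $(( ($1 + $2 - 1) / $2 )); }

projected_users 21800 14     # prints 305200, the "only 300k" in the log
required_rate 1300000 14     # prints 92858
```

Under those assumptions the grab would need to run at roughly four times the observed rate to finish in 14 days.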
[21:16] pull from the git repository
[21:16] I'm running 96 clients right now
[21:16] <3
[21:17] Schbirid: https://github.com/ArchiveTeam/splinder-grab
[21:17] 18-25% iowait, so that's probably just about perfectly balanced
[21:18] RX bytes:10762759777147 (10.7 TB) TX bytes:12421364281615 (12.4 TB)
[21:20] db48x: done! how interruptable is it?
[21:20] i switch off my pc at night
[21:20] touch STOP and it'll stop cleanly
[21:21] nice
[21:21] Schbirid: BLASPHEMY
[21:21] NO SHUTTING OFF IN HERE
[21:21] 21:21:52 up 18 days, 14:46, 1 user,
[21:21] 16:22:03 up 7 days, 23:35, 1 user,
[21:21] 13:22:22 up 26 days, 14:59, 2 users
[21:22] :)
[21:22] 15:24:01 up 18 days, 2:17, 2 users,
[21:22] 21:22:55 up 41 days, 14:43, 1 user, load average: 0.32, 0.13, 0.25
[21:22] me@avatar:~$ uptime
[21:22] yipdw: :(
[21:22] ouch, seems to want python2 or something?
[21:23] db48x: http://pastebin.com/vV9fu51i
[21:23] of course, what that really means is "41 days since last kernel upgrade"
[21:23] haha
[21:23] since who the hell uses ksplice etc
[21:23] ofc
[21:23] my python is 3.2.2 by default, 2 would be python2
[21:23] Yeah, it uses 2.[5-7].x, iirc
[21:24] uses/needs
[21:24] do i just add
[21:25] #!/usr/bin/python2
[21:25] to the soup py file?
[21:25] Oh, I see, you have it installed already
[21:25] Yeah, change it to wherever python 2.x lives
[21:25] Schbirid: substitute python2 for python at dld-profile.sh:88
[21:26] 0 1:24PM:abuie@teamarchive-0:/2/TBAG/mobileme-grab 3944 π du -sh data
[21:26] 1.3T data
[21:26] totally missed that, cheers
[21:26] ha
[21:26] underscor: good for you, my python is bigger though
[21:26] lol
[21:26] Just a *little* mobileme data
[21:27] mobileme is a name i never heard anyone call IT
[21:27] okok, i will stop ;D
[21:27] :P
[21:27] first time I've ever used the EU West EC2 region
[21:27] yipdw: Work well?
[21:27] dunno yet
[21:27] we'll see
[21:28] I wonder if a micro will be good enough
[21:28] yeah, probably
[21:29] working well now, thanks
[21:29] 102 hour tar?
[21:29] :(:(:(:(:(:(:(:(:(:(:(:(:(:(:(
[21:32] underscor: what do you use to manage downloader instances? GNU parallel or something?
[21:32] I figure if I'm going to get raped by Amazon EC2, I might as well deserve it
[21:33] yipdw: tmux panes
[21:33] Lemme take a screenshot
[21:33] oh
[21:34] hmm
[21:34] https://gist.github.com/3018d5389a62de4d2caa
[21:34] could be worse, I guess
[21:34] http://i.imgur.com/MpNcW.png
[21:35] yikes
[21:36] :D
[21:36] I like that I can still keep an eye on them though
[21:36] I guess
[21:36] I'm not likely to invest that much effort though :P
[21:36] hmm
[21:36] I guess I could have monit monitor them for me
[21:36] haha
[21:37] and periodically run dld-single on failed ones
[21:37] ABSTRACTION SOLVES LAZINESS
[21:38] Is monit good for this?
[21:38] it's overkill
[21:38] IMO
[21:38] I've never used it, but heard of it before
[21:39] I just want something to automatically restart clients that stop due to errors
[21:39] oh
[21:39] but a loop in bash does that just as wel
[21:39] l
[21:39] :P
[21:39] while true; do ./dld-client.sh yipdw; done
[21:39] Yeah
[21:39] hahah
[21:39] yeah, more or less
[21:39] it'll just screw up badly when we're done
[21:39] or, more precisely, when the tracker has nothing left
[21:42] yeah
[21:42] but hopefully you'll be around when we get closeish
[21:43] :D
[21:43] that heroku page totally needs a users/timeunit per participant :)
[21:43] yeah
[21:43] * underscor winds
[21:43] wins*
[21:43] hahah
[21:45] "Quadruple Extra Large Hi-Memory On-Demand Instance"
[21:45] jeez
[21:45] ha
[21:45] just call it "Super Size Bigass Instance With Extra Fries"
[21:45] "Now With More Molecules"
[21:45] hahahaha, the akamai edge servers I'm downloading from are 2 hops away
[21:45] It's basically "peering point"->
[21:46] Splinder uses Akamai?
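The restart loop discussed at 21:39, and the worry that it would "screw up badly" once the tracker has nothing left, can be sketched as a slightly more defensive wrapper. This is only a sketch under assumptions not confirmed by the log: that the client exits non-zero when it fails or has no work, and that the wrapper should honour the same STOP file the client checks. `restart_loop` and its parameters are hypothetical names.

```shell
# restart_loop MAX_FAILS DELAY CMD [ARGS...]
# Re-run CMD until a STOP file appears in the current directory, or until
# CMD has failed MAX_FAILS times in a row (assumed to mean "nothing left").
restart_loop() {
  max_fails=$1
  delay=$2
  shift 2
  fails=0
  while [ ! -e STOP ]; do
    if "$@"; then
      fails=0                       # a clean run resets the failure counter
    else
      fails=$((fails + 1))
      if [ "$fails" -ge "$max_fails" ]; then
        echo "giving up after $fails consecutive failures" >&2
        return 1
      fi
    fi
    sleep "$delay"                  # avoid hammering the tracker in a tight loop
  done
  return 0
}

# Example invocation (not executed here):
#   restart_loop 5 10 ./dld-client.sh yipdw
```

The consecutive-failure cap is a design choice rather than anything the scripts are known to do; it simply keeps the loop from spinning forever when every run fails.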
[21:46] "akamai's router"
[21:46] No, mobile-me does
[21:46] oh
[21:46] I was like "damn, I've been hitting the wrong thing"
[21:46] haha
[21:47] http://tracker.archive.org/tracker.png
[21:47] You can see where I stopped splinder overnight, haha
[21:47] And then just started up mobile me
[21:47] good job sir
[21:47] wow, 40 MB of Splinder data for henrymusica
[21:47] http://tracker.archive.org/batcave.png
[21:47] that's the biggest I've seen yet
[21:47] Mobileme's data goes straight to batcave
[21:48] Nice little peak where it's started
[21:48] Schbirid: I DON'T SEE YOU ON THE TRACKER YET.........;..........
[21:49] i see me and i am just passing db48x
[21:49] Oh, are you spirit?
[21:49] haha
[21:49] yes
[21:49] oh
[21:49] grr :P
[21:49] =(
[21:49] Use your irc nick!
[21:49] hehe
[21:50] only germans get it :\
[21:50] no idea why
[21:50] Google translate says nothing
[21:51] that is a more profound statement than you know
[21:51] :D
[21:51] its just spirit pronounshed like shad
[21:52] I like how the "Users downloaded" line on splinder pretty much follows my line at the beginning
[21:52] A global "users/hour" counter would be nice
[21:52] * underscor loves making all these feature requests for alard_
[21:54] !!!!!!
[21:54] I officially have the highest bandwidth-used port at IA
[21:56] heh
[21:57] for splinder, how many parallel instances should i run? bandwidth is tiny but maybe saturation is elsewhere?
[21:58] Saturation is disk io
[21:59] Run ~10 and see if your iowait shoots up
[22:15] hmm
[22:15] [ec2-user@ip-10-227-178-174 it]$ sudo iostat
[22:15] Linux 2.6.35.14-97.44.amzn1.x86_64 (ip-10-227-178-174) 11/12/2011 _x86_64_ (1 CPU)
[22:15] avg-cpu: %user %nice %system %iowait %steal %idle
[22:15] 2.63 0.00 2.91 2.19 19.40 72.87
[22:16] that's with 6 dld-clients on a t1.micro
[22:16] I guess I can double that
[22:19] underscor: http://splinder.heroku.com/
[22:20] alard_: I love you
[22:20] Remind me to buy you a beer when I turn 21
[22:20] underscor: I think you can do alcohol mail-order, the only person who needs to be over 21 is the recipient iiuc
[22:21] haha
[22:21] wow, we're only pulling 500 kB/s?
[22:21] I don't know how well international alcohol mail-order would go over
[22:21] yipdw: it uses a linear interpolation of reported data
[22:22] ahh, ok
[22:22] f.e. I've been downloading this one user for 4 days
[22:22] so I guess down clients will
[22:22] yeah, and that
[22:22] jeez
[22:22] yipdw: And I'm not even sure it's completely correct, so it may be helpful to check the numbers.
[22:22] how big is the WARC for that user?
[22:22] yipdw: huge. wget-warc died the first time around thanks to my OOM killer.
[22:23] damn
[22:23] 18G and growing
[22:23] one of splinder's top users
[22:23] web.me.com is "at least 23879 files"
[22:23] oh
[22:23] mobileme
[22:23] yeah, mobileme
[22:23] I was looking at the splinder dash
[22:24] * chronomex not doing splinder
[22:27] bwahaha
[22:27] underscor's monopoly on the splinder board is broken
[22:28] well, was
[22:28] What happened?
[22:28] a bunch of other download clients finished
[22:28] oic
[22:30] heh
[23:06] heh, oops
[23:06] just realized this about the micro EC2 instance I was running for splinder:
[23:06] Mem: 611252k total, 537012k used, 74240k free, 27696k buffers
[23:06] Swap: 0k total, 0k used, 0k free, 423180k cached
[23:35] Good news for anyone not underscor: you can click a name to hide that downloader from the graph, so you can see yourself a little better. http://splinder.heroku.com/
[23:35] Oh yeah, I haven't offered my congratulations yet. alard, great work on getting a patch accepted to wget!
[23:35] Thanks!
[23:35] alard: That's awesome
[23:35] Feels good to remove everyone else
[23:35] ;D
[23:36] underscor: I thought you already did?
[23:36] huh?
[23:36] Wait, what are we graphing here? Is there some large-scale fetch task I missed in the hurlyburly of moving?
[23:37] 's funny how the graph changes when I remove myself
[23:37] Wyatt|Wor: splinder.com is shutting down in like 13 days
[23:42] Oh, okay. The wiki page hasn't been updated. I take it I have to make an account first, then point these github scripts at my account and let it run?
[23:44] Don't need an account
[23:44] Clone the repo, ./get-wget-warc.sh, ./dld-client.sh Wyatt
[23:44] (run a few of the clients if your io can take it)
[23:48] Understood; I'll get on that then.
[23:51] underscor: Maybe it's time for some ops?
[23:52] :)
[23:55] is there a reason why I am not a member of github.com/archiveteam anymore?
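For the "if your io can take it" advice at 23:44, the check used earlier in the log was to read %iowait from `iostat` while adding clients. A minimal sketch of pulling that one number out of the report follows; `iowait_from_iostat` is a hypothetical helper name, and the parsing assumes the `avg-cpu:` header layout shown in the 22:15 paste.

```shell
# iowait_from_iostat: read iostat text on stdin, print the %iowait value
# from the first avg-cpu block.
iowait_from_iostat() {
  awk '
    /^avg-cpu:/ {
      # The header row has a leading "avg-cpu:" field that the value row
      # lacks, so the value column is the header column minus one.
      for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
      want = 1
      next
    }
    want && NF && col { print $col; exit }
  '
}

# Fed the report pasted at 22:15, this prints 2.19.
printf '%s\n' \
  'avg-cpu: %user %nice %system %iowait %steal %idle' \
  ' 2.63 0.00 2.91 2.19 19.40 72.87' | iowait_from_iostat
```

On a live machine the input would come from something like `iostat -c | iowait_from_iostat`. Note that the first report iostat prints is an average since boot, so a reading taken this way lags behind what the clients are doing right now.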