[03:13] Greetings all. Getting a curl error "curl: (22) The requested URL returned error: 500". whole error here http://pastebin.com/scMx6bkf
[03:15] seeing a bunch of those but it seems that eventually the upload does happen.
[03:16] dragondon: one (or more) of the upload endpoints is full
[03:16] I'll be fixing it asap
[03:16] Uploads should still work eventualyl
[03:16] ok, cool.
[03:16] as they're roundrobined between boxes in the cluster
[03:16] yeah, that's what I did see from others.
[03:16] eventually*
[03:18] bah. VirtualBox won't start up my warrior VM, or a newly downloaded warrior VM.
[03:20] What error?
[03:21] NS_ERROR_FAILURE
[03:21] "VBoxManage: error: The virtual machine 'archiveteam-warrior-2_1' has terminated unexpectedly during startup with exit code 0"
[03:21] can i play around with the timeouts?
[03:22] Doesn't seem to be particularly obvious. I guess this error covers a wide range of possible issues.
[03:22] nintendud: are you trying to run without X
[03:22] Sue: yup
[03:22] The warrior worked before.
[03:22] VBoxHeadless
[03:22] Oh. Crpa.
[03:23] Crap*
[03:23] VBoxManage dies without X unless you specify headless or start with VBoxHeadless
[03:23] I forgot that was the command.
[03:23] don't forget to start with &
[03:23] i had that same problem at first
[03:24] yeah, I have it running in screen
[03:25] Why not just run the pipeline outside?
[03:25] Sue: thanks for the help. herp derp on my end.
[03:26] underscor: fun; nintendud: np
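For anyone hitting the same problem: a minimal sketch of starting the warrior VM headless, assuming the VM name archiveteam-warrior-2_1 from the error above. Running it inside screen (or backgrounding with & and disown) keeps it alive after you log out.

    # run the VM without X, detached inside a screen session
    screen -dmS warrior VBoxHeadless --startvm "archiveteam-warrior-2_1"

    # or background it directly, then detach it from the shell
    VBoxHeadless --startvm "archiveteam-warrior-2_1" >/dev/null 2>&1 &
    disown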
[05:09] Umm, "No item received. Retrying after 30 seconds..." and "Retrying CurlUpload for Item gourmetsexpress after 30 seconds..." are all I am getting now
[05:10] 4 workers are getting "No item" and two are "retrying"
[05:13] restarted VM, now all are "retrying"
[05:16] that was for BT Internet homepages
[05:16] switched back to Webshots and downloading data now just fine
[05:33] apparently some servers full
[05:33] and they're working on it
[07:37] [04:23:22] < Sue> don't forget to start with &
[07:37] if you forget, ctrl z, then bg, then disown
[08:44] anyone able to kill http://www.us.archive.org/log_show.php?task_id=124346847 here?
[08:47] There's a timeout 87457739 in there.
[08:47] (Is quite a long time, probably.)
[08:53] That is only 1012 days
[08:55] lol
[09:27] it wasn't enough last time
[09:27] [ PDT: 2012-10-16 16:16:59 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml'
[09:28] sorry [ PDT: 2012-09-28 04:45:24 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml'
[09:29] [...] nice /usr/local/petabox/deriver/derive.php /var/tmp/autoclean/derive/EB1911WMF [...] failed with exit code: 9 [...] TASK FAILED AT UTC: 2012-10-02 18:11:37
[09:30] underscor?
[09:32] Nemo_bis: Why do you need it killed?
[09:38] underscor: because it will surely fail
[09:39] and I want to update the images split in volumes now, so that it will work
[09:39] *upload
[09:41] ah
[09:41] okay
[09:41] well, I'll kill it :P
[09:41] underscor: thanks
[09:42] Interrupting task for task_id: 124346847 1 derive.php SERVER iw600709.us.archive.org USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 5646 82.8 1.1 750384 414324 ? RN Oct16 517:23 python /usr/local/petabox/sw/books/ol_search/solr_post.py EB1911WMF /var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml KILLING 5646
[09:42] cc Nemo_bis
[09:47] underscor: thanks
[09:49] ?tf2
[09:49] oops, wrong channel
[09:52] hmpf only 650 kB/s upload even from a USA server
[10:55] looks like theblaze tv did some bad audio sync for episode 2012-10-15
[10:57] what is funny is that the real news and wilkow on-demand works fine
[11:10] i found a way to sync it
[11:10] its off by 1.5 seconds
[11:48] ffmpeg will resync stuff...
[12:36] Hmm, IGN is up for sale and there is some talk of them possibly closing the boards (85.8 million posts), so that might be something to keep an eye on
[12:38] :O
[12:39] what's a good setting for number of instances when you have 50mbit/25mbit internet speeds?
[12:44] I'm assuming that is for Webshots, in which case I'm not really sure, I haven't really been able to see how much bandwidth it uses
[12:46] It also depends on the distance to the webshots servers.
[12:46] 1mbit per thread generally
[12:46] yeah, for webshots
[12:46] appro
[12:47] aprox *
[12:47] ..
[12:47] approx ***
[12:47] at least, in my experience
[12:49] alard: also how do you stop it again? STOP file in the same dir as the pipeline.py?
[12:49] because it seems to be ignoring it
[12:49] It finishes the current jobs first.
[12:49] yeah, but it seems to keep doing jobs
[12:49] Can you use the web interface?
[12:49] what port does that run on?
[12:49] 8001
[12:52] And I think the STOP file needs to be in the directory you launched it from, not the pipeline directory
[12:52] ahh.
[12:53] that ... sounds like a possible bug :P
[12:56] That could be.
[12:56] curl -d "" http://127.0.0.1:8001/api/stop works too.
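Summing up the stop options above, a minimal sketch, assuming the default web-interface port 8001 mentioned earlier; either way the pipeline finishes its current jobs before exiting:

    # create a STOP file in the directory you launched the pipeline from,
    # not the directory pipeline.py lives in
    touch STOP

    # or ask it over the web interface
    curl -d "" http://127.0.0.1:8001/api/stop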
[12:56] <_case> if anyone feels like pondering a wget question re: hotlinked page requisites… http://stackoverflow.com/questions/12934528/recursive-wget-with-hotlinked-requisites
[12:58] Have you tried --no-parent? (Just a guess.)
[13:03] yeah, and now I'm killing disk IO
[13:23] is it safe to go back to working on the BT project yet?
[13:27] Is there anything left to do?
[13:40] No, BT is done, that is: we need more usernames.
[13:41] ok, stopped webshots project, started BT :)
[13:42] dragondon: There's nothing to do there. :)
[13:42] huh? all done?
[13:42] We've worked through our list of usernames.
[13:43] There might be users that are not on our list, but we'll have to discover those usernames first. That isn't done by the warrior.
[13:46] oh, that's what you meant. I thought you meant you needed more usernames completed.
[14:00] no work for happy workers :(
[18:39] Webshots numbers: S[h]O[r]T has uploaded 100,000 items; we've uploaded 20,000 GB. Hurray!
[18:51] <[1]deathy> I would give S[h]O[r]T the internet as a prize, but apparently he can download it by himself ...
[18:59] haha
[21:03] ugh my warrior vm crashed
[22:18] We need more usernames. It can't be so many.
[22:25] http://news.cnet.com/8301-1023_3-57533820-93/news-corp-puts-ign-entertainment-up-for-auction/
[22:25] Probably been linked here.
[22:26] The new IGN network will probably shutter/close multiple sites
[22:28] ah. just read above.
[22:37] https://docs.google.com/a/textfiles.com/spreadsheet/ccc?key=0ApQeH7pQrcBWdDZIUEVjR3d1UmRoU0lPSWZYX0Q1Ync#gid=0
[22:37] Watch as I do final signoff!
[22:37] Anything with the deep blue on the left is going into wayback!
[22:38] SketchCow: what is a MegaWARC?
[22:39] A MegaWARC is a concatenation of warc files, allowing us to pack thousands of individual warc grabs into one file.
[22:39] I see
[22:45] SketchCow: I still have MobileMe files that have not been uploaded yet. Problem with my hard drive. I cloned it to another and will finish recovering the files as soon as I can.
[22:47] Just letting you know so you don't put up an incomplete copy on WayBack
[22:49] I should be able to recover just about all of the files
[22:50] But a few might be impossible to retrieve.
[22:52] So I apologize in advance for my screw up. :)
[22:59] https://archive.org/details/archiveteam-qaudio-archive-1 etc. not going in?
[23:03] Sure it is.
[23:04] I'm sure some stuff has escaped my gaze, hence my asking people to look over my shoulder at the google doc.
[23:04] also 2-7
[23:07] Right.
[23:07] No, on it.
[23:07] They're all fine, though, they already were working.
[23:07] Now I'm just bundling them.
[23:08] http://archive.org/details/archiveteam-qaudio-archive will have it soon.
[23:15] http://archive.org/details/archiveteam-qaudio-archive now fixed.
[23:55] thanks for putting my isos in the linux format collection
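On the MegaWARC question above: WARC records are self-delimiting and a gzip file may contain multiple concatenated members, so gzipped warcs can simply be joined end to end. A minimal sketch of the idea, not the actual megawarc tooling:

    # combine individual gzipped grabs into one still-valid warc.gz
    cat grabs/*.warc.gz > mega.warc.gz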