[00:10] Are Warrior project scripts stored anywhere? Looking for the Tumblr one
[00:13] shaqfu: check github.com/archiveteam
[00:13] specifically https://github.com/ArchiveTeam/tumblr-grab
[00:14] Awesome, thanks
[00:24] having warrior troubles in #warrior
[00:24] please help
[00:36] hneio: looks like something alard has to fix when he gets back.
[00:37] OK
[00:44] wget 1.14 takes the regex engine of w/e's invoking it, right?
[00:48] (trying to adapt the wget from Warrior's tumblr script to bash)
[01:29] Odd - the command does its job, but Tumblr's segfaulting wget :(
[03:08] Oh, hm - does the Warrior project handle infinite scrolling?
[03:25] The warrior uses wget-lua to fetch and process pages. It can handle pagination but does not have JavaScript processing ability, so it cannot trigger the scroll event.
[03:35] Thanks; seems like a job for PhantomJS, then
[11:54] So I'm still getting the "INIT: Id "2" respawning too fast: disabled for 5 minutes" message and I don't know why.
[12:13] Howlin1: And it persists, even after rebooting the VM and such?
[12:13] Yep
[12:21] Hmmm
[12:25] Yea it started out of nowhere in the middle of a download
[13:23] Howlin1: If you log on your warrior and go to screen/tty2 (Alt+F2) - is there anything there?
[13:32] Yea http://i.imgur.com/ODbVpX4.png
[13:46] Howlin1: Ah, that explains it. I'll look into it
[13:55] Is it something wrong on my end?
[13:59] Howlin1: Could you go to tty3 (Alt+F3), log in and run curl http://warriorhq.archiveteam.org/
[13:59] It would be interesting to know if that returns something, or if it times out.
[14:03] curl -v http://warriorhq.archiveteam.org/ might even be more interesting
[14:05] What login do I use?
[14:05] root / archiveteam
[14:07] That returns the html to the screen
[14:08] Same goes if I use curl -v
[14:10] That's good
[14:12] Could you try rm /home/warrior/projects/config.json
[14:13] (and then restart the machine)
[14:14] (i.e. the warrior)
[14:23] Atm it's hanging on the 'Preparing the data partition' part
[14:31] And on tty2?
[14:32] It just says
[14:33] Starting the web interface on 0.0.0.0:8001
[14:33] Warrior ID ''
[14:33] Warrior ID ' '
[14:51] I checked localhost:8001 and it was up, but the second I started a project it crashed and now it's back to INIT: Id "2" respawning.....
[15:02] ArchiveTeam Warrior HQ?
[15:02] holy shizzle
[15:02] it's the botnet idea
[15:08] Howlin1: Strange. Your warrior ID should be a number.
[15:09] Oh it is, I just took that out; didn't know if it was meant to be a private number or not.
[15:10] No, it's not private, it's just a number to keep track of individual warriors. So you also removed it from your screenshot? (Your warrior ID looks like four spaces there.)
[15:10] It's also not important to know the number, but it's important to know that it is a number and not something else.
[15:11] It is a number: 2987
[15:13] So when exactly does it stop working? When you select a project?
[15:15] If I don't get the INIT: Id '2' thing then yes
[15:35] holy shit, it's working!
[15:35] I am backing up posterous
[15:44] :D
[15:49] posterous.com-youtuo471.posterous.com-20130512-154428.warc.gz
[15:49] :)
[15:51] Is there anything I can do that will get the warrior to work?
[15:55] I've been asking in #warrior
[15:55] alard may or may not be working on it
[15:56] Howlin1: try powering off the appliance
[15:56] and restarting it
[15:57] hrm, nerp
[15:57] still broken for me
[15:59] I have tried that and restarting the laptop and all, but nothing.
[15:59] When you were working on a project, which one did you do?
[16:09] archiveteam choice
[16:09] I think it was doing the formspring project
[16:10] Choice project is Formspring, yes.
[16:11] That's the one that causes the error for me, but choosing posterous doesn't
[16:12] Possible cause: the wget version used.
[16:12] Posterous uses 20130120, Formspring uses 20130427
[16:13] Ah, hmm. https://github.com/ArchiveTeam/formspring-grab/commit/0d9fe8166c82039b4779398f76243ba86ec9fa83#L0R96
[16:19] That exit() seems less than ideal
[16:20] It's from my local version of pipeline.py, where I used it to test the Wget version check.
[18:53] so, here's a very simple hacked-up script for dealing with DOS/Win floppy images: it lists the contents, volume label, and serial number (relies on mtools to do the actual work). http://paste.archivingyoursh.it/xihosirepu.bash
[19:02] alard: I am converting the mobilemes, which means the index may not work as well
[19:02] On the other hand, total warcness
[19:02] TOTAL WARC
[19:03] WAAARC IT UUUP
[19:05] And therefore every item will get indexed within a week.
[19:05] (They got the backlog fixed with the wayback)
[19:07] :O
[19:07] can we push the IGN stuff in now? XD
[19:17] What do you mean by push in?
[19:17] (I have a lot on the plate, reacquaint me)
[19:29] SketchCow, there are a few hundred gigs of content from the gaming sites going down. It is not a priority to upload, SmileyG is just impatient ;)
[19:29] Oh.
[19:30] Plus we have to check the warcs for bad last records before upload, and I gotta find the time to write a tool to do that; otherwise it just hangs on derive like it did for the uploads I already did
[19:34] atm some of them are on my account ;D
[20:13] I'm staying on top of the formspring and posterous uploads, so far.
[20:17] SketchCow: We should redo the mobileme index when you're done. That should be relatively easy, if you're uploading the json index files.
[20:19] http://archive.org/details/mobileme-hero-1343163513
[20:19] That's gone through the wringer, now being derived.
[20:21] Man, there's 4,500+ of these things
[20:22] Let's hope the tar files stay small.
[20:22] I hope to make this as automatic as possible.
[20:29] Not sure if it matters, but it looks like you're uploading the json.gz and tar first, and then the big warc.gz. I think it's easier on the system if you upload the largest file first.
[20:31] The hint file will help in the future.
[20:31] Hint setting.
[20:40] Yes, but I think uploading in the right order will also save you one derive task (and a 50GB internal rsync to the derive server).
[20:55] If I think I can do it, I'll do i
[20:55] t
[20:57] Spoiler: I can probably do it
[20:58] Next, I'll be putting your pump software in place and running, because otherwise doooooooom
[21:00] I'll be moving home in a few weeks and that means I'm back to a 10GB a month download limit, so I won't be able to help with the backing up, but is there anything else I could help with that doesn't require a lot of bandwidth?
[21:00] I'll tell you we can always use help cleaning up and maintaining the wiki, but it's rather unsexy and people burn out
[21:44] http://archive.org/details/mobileme-hero-1343163513
[21:44] woooosh
[21:54] chunk
[23:16] Someone kick Chelsea27, it's a spammer.
[23:17] mistym: get help in #help? or what is it on efnet...
[23:17] PepsiMax: Right, just meant in this channel. It tried to send me some fake porn when I joined.
[23:17] how long should one URL take?
[23:17] Step 3 of 7 - Downloaded 680 URLs
[23:17] fuuu
[23:18] mistym: /j #help
[23:18] bam
[23:19] o_O?
[23:19] :p