#archiveteam 2013-05-12,Sun


Time Nickname Message
00:10 🔗 shaqfu Are Warrior project scripts stored anywhere? Looking for the Tumblr one
00:13 🔗 balrog shaqfu: check github.com/archiveteam
00:13 🔗 balrog specifically https://github.com/ArchiveTeam/tumblr-grab
00:14 🔗 shaqfu Awesome, thanks
00:24 🔗 hneio having warrior troubles in #warrior
00:24 🔗 hneio please help
00:36 🔗 balrog hneio: looks like something alard has to fix when he gets back.
00:37 🔗 hneio OK
00:44 🔗 shaqfu wget 1.14 takes the regex engine of w/e's invoking it, right?
00:48 🔗 shaqfu (trying to adapt the wget from Warrior's tumblr script to bash)
01:29 🔗 shaqfu Odd - the command does its job, but Tumblr's segfaulting wget :(
03:08 🔗 shaqfu Oh, hm - does the Warrior project handle infinite scrolling?
03:25 🔗 omf_ The warrior uses wget-lua to fetch and process pages. It can handle pagination but does not have JavaScript processing ability so it cannot trigger the scroll event.
03:35 🔗 shaqfu Thanks; seems like a job for PhantomJS, then
11:54 🔗 Howlin1 So I'm still getting the "INIT: Id "2" respawning too fast: disabled for 5 minutes" message and I don't know why.
12:13 🔗 ersi Howlin1: And it persists, even after rebooting the VM and such?
12:13 🔗 Howlin1 Yep
12:21 🔗 ersi Hmmm
12:25 🔗 Howlin1 Yea it started out of nowhere in the middle of a download
13:23 🔗 ersi Howlin1: If you log on your warrior and go to screen/tty2 (Alt+F2) - is there anything there?
13:32 🔗 Howlin1 Yea http://i.imgur.com/ODbVpX4.png
13:46 🔗 ersi Howlin1: Ah, that explains it. I'll look into it
13:55 🔗 Howlin1 Is it something wrong on my end?
13:59 🔗 alard Howlin1: Could you go to tty3 (Alt+F3), log in and run curl http://warriorhq.archiveteam.org/
13:59 🔗 alard It would be interesting to know if that returns something, or if it times out.
14:03 🔗 ersi curl -v http://warriorhq.archiveteam.org/ might even be more interesting
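The question alard is asking ("does it return something, or does it time out?") can also be answered from a script. Below is a minimal Python sketch using only the standard library that reports whether a URL answers, returns an HTTP error, times out, or is unreachable. The function name `probe` and the 10-second default timeout are illustrative assumptions, not part of the Warrior.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch url once and summarise the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        return f"http error {exc.code}"
    except urllib.error.URLError as exc:
        # Connection-level failure; a timeout shows up as the reason.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return f"unreachable: {exc.reason}"
    except socket.timeout:
        # A timeout while reading the response can surface directly.
        return "timeout"
```

For example, `probe("http://warriorhq.archiveteam.org/")` should return a string starting with `ok` if the tracker is reachable from inside the VM.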
14:05 🔗 Howlin1 What login do I use?
14:05 🔗 alard root / archiveteam
14:07 🔗 Howlin1 That returns the html to the screen
14:08 🔗 Howlin1 Same goes if I use curl -v
14:10 🔗 ersi That's good
14:12 🔗 alard Could you try rm /home/warrior/projects/config.json
14:13 🔗 alard (and then restart the machine)
14:14 🔗 ersi (ie. the warrior)
14:23 🔗 Howlin1 Atm it's hanging on 'Preparing the data partition' part
14:31 🔗 alard And on tty2?
14:32 🔗 Howlin1 It just says
14:33 🔗 Howlin1 Starting the web interface on 0.0.0.0:8001
14:33 🔗 Howlin1 Warrior ID ''
14:33 🔗 Howlin1 Warrior ID ' '
14:51 🔗 Howlin1 I checked the localhost:8001 and it was up, but the second I started a project it crashed and now it's back to INIT: Id "2" respawning.....
15:02 🔗 PepsiMax ArchiveTeam Warrior HQ?
15:02 🔗 PepsiMax holy shizzle
15:02 🔗 PepsiMax it's the botnet idea
15:08 🔗 alard Howlin1: Strange. Your warrior ID should be a number.
15:09 🔗 Howlin1 Oh it is I just took that out, didn't know if it was to be a private number or not.
15:10 🔗 alard No, it's not private, it's just a number to keep track of individual warriors. So you also removed it from your screenshot? (Your warrior ID looks like four spaces there.)
15:10 🔗 alard It's also not important to know the number, but it's important to know that it is a number and not something else.
15:11 🔗 Howlin1 It is a number 2987
15:13 🔗 alard So when exactly does it stop working? When you select a project?
15:15 🔗 Howlin1 If I don't get the INIT: Id '2' thing then yes
15:35 🔗 PepsiMax holy shit, it working!
15:35 🔗 PepsiMax I am backing up posterous
15:44 🔗 Aranje :D
15:49 🔗 PepsiMax posterous.com-youtuo471.posterous.com-20130512-154428.warc.gz
15:49 🔗 PepsiMax :)
15:51 🔗 Howlin1 Is there anything I can do that will get the warrior to work?
15:55 🔗 hneio I've been asking in #warrior
15:55 🔗 hneio alard may or may not be working on it
15:56 🔗 hneio Howlin1: try powering off the appliance
15:56 🔗 hneio and restarting it
15:57 🔗 hneio hrm, nerp
15:57 🔗 hneio still broken for me
15:59 🔗 Howlin1 I have tried that and resetting the laptop and all, but nothing.
15:59 🔗 Howlin1 When you were working on a project, which one did you do?
16:09 🔗 hneio archiveteam choice
16:09 🔗 hneio I think it was doing the formspring project
16:10 🔗 GLaDOS Choice project is Formspring, yes.
16:11 🔗 Howlin1 That's the one that causes the error on it for me, but choosing posterous doesn't
16:12 🔗 GLaDOS Possible cause: the wget version used.
16:12 🔗 GLaDOS Posterous uses 20130120, Formspring uses 20130427
16:13 🔗 alard Ah, hmm. https://github.com/ArchiveTeam/formspring-grab/commit/0d9fe8166c82039b4779398f76243ba86ec9fa83#L0R96
16:19 🔗 Deewiant That exit() seems less than ideal
16:20 🔗 alard It's from my local version of pipeline.py, where I used it to test the Wget version check.
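Following GLaDOS's point that the two projects pin different wget-lua builds (20130120 vs 20130427) and Deewiant's objection to a bare `exit()`, here is a minimal sketch of a version check that returns a boolean the caller can act on instead of killing the process. It assumes, hypothetically, that the build date appears as an 8-digit `YYYYMMDD` number in the first line of `wget --version`; the version strings in the example are made up for illustration, and this is not the actual check in pipeline.py.

```python
import re
from typing import Optional


def build_date(version_line: str) -> Optional[str]:
    """Extract an 8-digit YYYYMMDD build date from a version line, if any."""
    match = re.search(r"(?<!\d)(\d{8})(?!\d)", version_line)
    return match.group(1) if match else None


def wget_is_new_enough(version_line: str, required: str = "20130427") -> bool:
    """True if the build date is at least `required`.

    YYYYMMDD strings sort chronologically, so a string comparison suffices.
    Returning False (rather than calling exit()) lets the caller log the
    problem and decide what to do.
    """
    date = build_date(version_line)
    return date is not None and date >= required
```

With the hypothetical line `"GNU Wget 1.14.lua.20130120"`, `wget_is_new_enough` returns `False`, so a Formspring-style pipeline could refuse that project while a Posterous-style one (requiring `"20130120"`) would accept it.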
18:53 🔗 dashcloud so, here's a very simple hacked up script for dealing with DOS/Win floppy images- it lists the contents, volume label, and serial number (relies on mtools to do the actual work). http://paste.archivingyoursh.it/xihosirepu.bash
19:02 🔗 SketchCow alard: I am converting the mobilemes, which means the index may not work as well
19:02 🔗 SketchCow On the other hand, total warcness
19:02 🔗 chronomex TOTAL WARC
19:03 🔗 ersi WAAARC IT UUUP
19:05 🔗 SketchCow And therefore every item will get indexed within a week.
19:05 🔗 SketchCow (They got the backlog fixed with the wayback)
19:07 🔗 SmileyG :O
19:07 🔗 SmileyG can we push the IGN stuff in now? XD
19:17 🔗 SketchCow What do you mean by push in.
19:17 🔗 SketchCow (I have a lot on the plate, reacquaint me)
19:29 🔗 omf_ SketchCow, there are a few hundred gigs of content from the gaming sites going down. It is not a priority to upload, SmileyG is just impatient ;)
19:29 🔗 SketchCow Oh.
19:30 🔗 omf_ Plus we have to check the warcs for bad last records before upload and I gotta find the time to write a tool to do that otherwise it just hangs on derive like it did for the uploads I already did
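The tool omf_ describes could start as something like the sketch below. It assumes that a "bad last record" means the file was cut off partway through, so either the gzip stream is truncated or the final WARC record holds fewer bytes than its `Content-Length` header announces. It decompresses the whole file in memory, which is fine for testing but would need a streaming reader for multi-gigabyte WARCs. The function name is hypothetical.

```python
import gzip
import zlib


def warc_records_complete(data: bytes) -> bool:
    """Return True if every record in a gzipped WARC blob is complete."""
    try:
        # gzip.decompress handles the multi-member files WARC tools write.
        raw = gzip.decompress(data)
    except (EOFError, OSError, zlib.error):
        # Truncated or corrupt gzip stream: the tail of the file is damaged.
        return False
    pos = 0
    while pos < len(raw):
        header_end = raw.find(b"\r\n\r\n", pos)
        if header_end == -1:
            return False  # header block never finished
        lines = raw[pos:header_end].split(b"\r\n")
        if not lines[0].startswith(b"WARC/"):
            return False  # not positioned at a record boundary
        length = None
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            return False
        body_start = header_end + 4
        # Each record is its content block followed by CRLF CRLF.
        record_end = body_start + length + 4
        if raw[body_start + length:record_end] != b"\r\n\r\n":
            return False  # short or missing final block
        pos = record_end
    return True
```

Running this over each `.warc.gz` before upload and setting aside any file that returns `False` would keep truncated files away from the derive step.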
19:34 🔗 SmileyG atm some of them are on my account ;D
20:13 🔗 SketchCow I'm staying on top of the formspring and posterous uploads, so far.
20:17 🔗 alard SketchCow: We should redo the mobileme index when you're done. That should be relatively easy, if you're uploading the json index files.
20:19 🔗 SketchCow http://archive.org/details/mobileme-hero-1343163513
20:19 🔗 SketchCow That's gone through the wringer, now being derived.
20:21 🔗 SketchCow Man, there's 4,500+ of these things
20:22 🔗 alard Let's hope the tar files stay small.
20:22 🔗 SketchCow I hope to make this as automatic as possible.
20:29 🔗 alard Not sure if it matters, but it looks like you're uploading the json.gz and tar first, and then the big warc.gz. I think it's easier on the system if you upload the largest file first.
20:31 🔗 SketchCow The hint file will help in the future.
20:31 🔗 SketchCow Hint setting.
20:40 🔗 alard Yes, but I think uploading in the right order will also save you one derive task (and a 50GB internal rsync to the derive server).
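alard's suggestion amounts to ordering each item's files by size, descending, before handing them to the uploader, so the large `warc.gz` lands first. A minimal sketch of just that ordering step, assuming the files are local paths; the upload tool itself is not shown in the log, and the file names in the example are hypothetical.

```python
import os
from typing import List


def upload_order(paths: List[str]) -> List[str]:
    """Return paths sorted largest first, so the big WARC is uploaded first."""
    return sorted(paths, key=os.path.getsize, reverse=True)
```

For an item with hypothetical files `item.json.gz`, `item.tar` and `item.warc.gz`, the WARC would come first as long as it is the largest file.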
20:55 🔗 SketchCow If I think I can do it, I'll do i
20:55 🔗 SketchCow t
20:57 🔗 SketchCow Spoiler: I can probably do it
20:58 🔗 SketchCow Next, I'll be putting your pump software in place and running, because otherwise doooooooom
21:00 🔗 Howlin1 I'll be moving home in a few weeks and that means I'm back to a 10gb a month download limit so I won't be able to help with the backing up, but is there anything else I could help with that doesn't require a lot of bandwidth?
21:00 🔗 SketchCow I'll tell you we can always use help cleaning up and maintaining the wiki, but it's rather unsexy and people burn out
21:44 🔗 SketchCow http://archive.org/details/mobileme-hero-1343163513
21:44 🔗 SketchCow woooosh
21:54 🔗 chronomex chunk
23:16 🔗 mistym Someone kick Chelsea27, it's a spammer.
23:17 🔗 PepsiMax mistym: get help in #help? or what is it on efnet...
23:17 🔗 mistym PepsiMax: Right, just meant in this channel. It tried to send me some fake porn when I joined.
23:17 🔗 PepsiMax how long should one URL take?
23:17 🔗 PepsiMax Step 3 of 7 - Downloaded 680 URLs
23:17 🔗 PepsiMax fuuu
23:18 🔗 PepsiMax mistym: /j #help
23:18 🔗 omf_ bam
23:19 🔗 SmileyG o_O?
23:19 🔗 PepsiMax :p
