Time |
Nickname |
Message |
00:10
🔗
|
shaqfu |
Are Warrior project scripts stored anywhere? Looking for the Tumblr one |
00:13
🔗
|
balrog |
shaqfu: check github.com/archiveteam |
00:13
🔗
|
balrog |
specifically https://github.com/ArchiveTeam/tumblr-grab |
00:14
🔗
|
shaqfu |
Awesome, thanks |
00:24
🔗
|
hneio |
having warrior troubles in #warrior |
00:24
🔗
|
hneio |
please help |
00:36
🔗
|
balrog |
hneio: looks like something alard has to fix when he gets back. |
00:37
🔗
|
hneio |
OK |
00:44
🔗
|
shaqfu |
wget 1.14 takes the regex engine of w/e's invoking it, right? |
00:48
🔗
|
shaqfu |
(trying to adapt the wget from Warrior's tumblr script to bash) |
01:29
🔗
|
shaqfu |
Odd - the command does its job, but Tumblr's segfaulting wget :( |
03:08
🔗
|
shaqfu |
Oh, hm - does the Warrior project handle infinite scrolling? |
03:25
🔗
|
omf_ |
The warrior uses wget-lua to fetch and process pages. It can handle pagination but does not have JavaScript processing ability so it cannot trigger the scroll event. |
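A minimal sketch of the pagination approach described above, written in Python rather than wget-lua. It assumes the blog links to older posts with hrefs ending in /page/N, which is an assumption and not something confirmed in this log. The names PageLinkFinder and next_page_url are hypothetical. No JavaScript runs, so content that only appears on a scroll event stays out of reach.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: older pages are linked with hrefs that end in /page/<number>.
PAGE_HREF = re.compile(r"/page/(\d+)/?$")


class PageLinkFinder(HTMLParser):
    """Collects every <a href=".../page/N"> link, keyed by page number."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pages = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        match = PAGE_HREF.search(href)
        if match:
            # Resolve relative links against the page they were found on.
            self.pages[int(match.group(1))] = urljoin(self.base_url, href)


def next_page_url(html, base_url, current_page=1):
    """Return the URL of the lowest-numbered page after current_page, or None."""
    finder = PageLinkFinder(base_url)
    finder.feed(html)
    later = [number for number in finder.pages if number > current_page]
    return finder.pages[min(later)] if later else None
```

A crawler would call next_page_url on each fetched page and stop when it returns None.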
03:35
🔗
|
shaqfu |
Thanks; seems like a job for PhantomJS, then |
11:54
🔗
|
Howlin1 |
So I'm still getting the "INIT: Id "2" respawning too fast: disabled for 5 minutes" message and I don't know why. |
12:13
🔗
|
ersi |
Howlin1: And it persists, even after rebooting the VM and such? |
12:13
🔗
|
Howlin1 |
Yep |
12:21
🔗
|
ersi |
Hmmm |
12:25
🔗
|
Howlin1 |
Yeah, it started out of nowhere in the middle of a download |
13:23
🔗
|
ersi |
Howlin1: If you log on your warrior and go to screen/tty2 (Alt+F2) - is there anything there? |
13:32
🔗
|
Howlin1 |
Yea http://i.imgur.com/ODbVpX4.png |
13:46
🔗
|
ersi |
Howlin1: Ah, that explains it. I'll look into it |
13:55
🔗
|
Howlin1 |
Is it something wrong on my end? |
13:59
🔗
|
alard |
Howlin1: Could you go to tty3 (Alt+F3), log in and run curl http://warriorhq.archiveteam.org/ |
13:59
🔗
|
alard |
It would be interesting to know if that returns something, or if it times out. |
14:03
🔗
|
ersi |
curl -v http://warriorhq.archiveteam.org/ might even be more interesting |
14:05
🔗
|
Howlin1 |
What login do I use? |
14:05
🔗
|
alard |
root / archiveteam |
14:07
🔗
|
Howlin1 |
That returns the html to the screen |
14:08
🔗
|
Howlin1 |
Same goes if I use curl -v |
14:10
🔗
|
ersi |
That's good |
14:12
🔗
|
alard |
Could you try rm /home/warrior/projects/config.json |
14:13
🔗
|
alard |
(and then restart the machine) |
14:14
🔗
|
ersi |
(ie. the warrior) |
14:23
🔗
|
Howlin1 |
Atm it's hanging on 'Preparing the data partition' part |
14:31
🔗
|
alard |
And on tty2? |
14:32
🔗
|
Howlin1 |
It just says |
14:33
🔗
|
Howlin1 |
Starting the web interface on 0.0.0.0:8001 |
14:33
🔗
|
Howlin1 |
Warrior ID '' |
14:33
🔗
|
Howlin1 |
Warrior ID ' ' |
14:51
🔗
|
Howlin1 |
I checked localhost:8001 and it was up, but the second I started a project it crashed and now it's back to INIT: Id "2" respawning..... |
15:02
🔗
|
PepsiMax |
ArchiveTeam Warrior HQ? |
15:02
🔗
|
PepsiMax |
holy shizzle |
15:02
🔗
|
PepsiMax |
it's the botnet idea |
15:08
🔗
|
alard |
Howlin1: Strange. Your warrior ID should be a number. |
15:09
🔗
|
Howlin1 |
Oh, it is; I just took that out because I didn't know if it was meant to be a private number or not. |
15:10
🔗
|
alard |
No, it's not private, it's just a number to keep track of individual warriors. So you also removed it from your screenshot? (Your warrior ID looks like four spaces there.) |
15:10
🔗
|
alard |
It's also not important to know the number, but it's important to know that it is a number and not something else. |
15:11
🔗
|
Howlin1 |
It is a number 2987 |
15:13
🔗
|
alard |
So when exactly does it stop working? When you select a project? |
15:15
🔗
|
Howlin1 |
If I don't get the INIT: Id '2' thing then yes |
15:35
🔗
|
PepsiMax |
holy shit, it's working! |
15:35
🔗
|
PepsiMax |
I am backing up posterous |
15:44
🔗
|
Aranje |
:D |
15:49
🔗
|
PepsiMax |
posterous.com-youtuo471.posterous.com-20130512-154428.warc.gz |
15:49
🔗
|
PepsiMax |
:) |
15:51
🔗
|
Howlin1 |
Is there anything I can do that will get the warrior to work? |
15:55
🔗
|
hneio |
I've been asking in #warrior |
15:55
🔗
|
hneio |
alard may or may not be working on it |
15:56
🔗
|
hneio |
Howlin1: try powering off the appliance |
15:56
🔗
|
hneio |
and restarting it |
15:57
🔗
|
hneio |
hrm, nerp |
15:57
🔗
|
hneio |
still broken for me |
15:59
🔗
|
Howlin1 |
I have tried that and resetting the laptop and all, but nothing. |
15:59
🔗
|
Howlin1 |
When you were working on a project, which one did you do? |
16:09
🔗
|
hneio |
archiveteam choice |
16:09
🔗
|
hneio |
I think it was doing the formspring project |
16:10
🔗
|
GLaDOS |
Choice project is Formspring, yes. |
16:11
🔗
|
Howlin1 |
That's the one that causes the error on it for me, but choosing posterous doesn't |
16:12
🔗
|
GLaDOS |
Possible cause: the wget version used. |
16:12
🔗
|
GLaDOS |
Posterous uses 20130120, Formspring uses 20130427 |
16:13
🔗
|
alard |
Ah, hmm. https://github.com/ArchiveTeam/formspring-grab/commit/0d9fe8166c82039b4779398f76243ba86ec9fa83#L0R96 |
16:19
🔗
|
Deewiant |
That exit() seems less than ideal |
16:20
🔗
|
alard |
It's from my local version of pipeline.py, where I used it to test the Wget version check. |
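A rough sketch of what a Wget version check could look like. This is not the code from pipeline.py. It assumes the --version banner contains a string such as "GNU Wget 1.14.lua.20130427", and the function names are hypothetical. Instead of calling exit(), it raises an exception that the caller can catch and report.

```python
import re
import subprocess

# Assumption: the banner looks like "GNU Wget 1.14.lua.20130427-...".
VERSION_RE = re.compile(r"GNU Wget (\d+\.\d+)\.lua\.(\d{8})")


def parse_wget_lua_build(banner):
    """Return the build date (e.g. "20130427") from a version banner, or None."""
    match = VERSION_RE.search(banner)
    return match.group(2) if match else None


def find_wget_lua(candidates, required_build):
    """Return the first candidate executable whose build date matches."""
    for path in candidates:
        try:
            result = subprocess.run(
                [path, "--version"],
                capture_output=True,
                text=True,
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Missing or unresponsive binary: try the next candidate.
            continue
        if parse_wget_lua_build(result.stdout) == required_build:
            return path
    raise RuntimeError("no wget-lua build %s found" % required_build)
```

With the builds mentioned above, a Formspring pipeline would ask for "20130427" and a Posterous one for "20130120".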
18:53
🔗
|
dashcloud |
so, here's a very simple hacked up script for dealing with DOS/Win floppy images- it lists the contents, volume label, and serial number (relies on mtools to do the actual work). http://paste.archivingyoursh.it/xihosirepu.bash |
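The linked script relies on mtools and is not reproduced here. As a different technique, the sketch below reads the volume label and serial number straight from the boot sector of a FAT12/FAT16 image. It assumes the image carries an extended boot signature (0x29), and it reads the boot-sector copy of the label, which may differ from the label stored in the root directory. It does not list the image contents. The function names are hypothetical.

```python
import struct


def read_fat_volume_info(boot_sector):
    """Return (label, serial) from a FAT12/FAT16 boot sector, or None if absent."""
    if len(boot_sector) < 62:
        return None
    # Offset 0x26 holds the extended boot signature; 0x29 means that the
    # serial number (0x27, 4 bytes) and the label (0x2B, 11 bytes) follow.
    if boot_sector[0x26] != 0x29:
        return None
    (serial,) = struct.unpack_from("<I", boot_sector, 0x27)
    label = boot_sector[0x2B:0x36].decode("ascii", errors="replace").rstrip()
    # Format the serial as two hexadecimal halves: XXXX-XXXX.
    return label, "%04X-%04X" % (serial >> 16, serial & 0xFFFF)


def read_image(path):
    """Read the first sector of an image file and extract its volume info."""
    with open(path, "rb") as handle:
        return read_fat_volume_info(handle.read(512))
```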
19:02
🔗
|
SketchCow |
alard: I am converting the mobilemes, which means the index may not work as well |
19:02
🔗
|
SketchCow |
On the other hand, total warcness |
19:02
🔗
|
chronomex |
TOTAL WARC |
19:03
🔗
|
ersi |
WAAARC IT UUUP |
19:05
🔗
|
SketchCow |
And therefore every item will get indexed within a week. |
19:05
🔗
|
SketchCow |
(They got the backlog fixed with the wayback) |
19:07
🔗
|
SmileyG |
:O |
19:07
🔗
|
SmileyG |
can we push the IGN stuff in now? XD |
19:17
🔗
|
SketchCow |
What do you mean by push in. |
19:17
🔗
|
SketchCow |
(I have a lot on the plate, reacquaint me) |
19:29
🔗
|
omf_ |
SketchCow, there are a few hundred gigs of content from the gaming sites going down. It is not a priority to upload, SmileyG is just impatient ;) |
19:29
🔗
|
SketchCow |
Oh. |
19:30
🔗
|
omf_ |
Plus we have to check the warcs for bad last records before upload, and I gotta find the time to write a tool to do that; otherwise it just hangs on derive like it did for the uploads I already did |
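One possible shape for such a checking tool, as a sketch only. It assumes the usual .warc.gz layout of one gzip member per record, and it checks gzip integrity alone: it does not parse WARC headers. A file that ends partway through its last member is reported as not OK. The function names are hypothetical.

```python
import zlib


def iter_chunks(path, chunk_size=1 << 20):
    """Yield the file at path in chunks of chunk_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk


def check_gzip_members(chunks):
    """Return (complete_members, ok) for a stream of concatenated gzip members.

    ok is False when the data is corrupt, empty, or ends inside a member.
    """
    members = 0
    in_member = False
    decomp = zlib.decompressobj(31)  # 31 = expect a gzip header and trailer
    for chunk in chunks:
        data = chunk
        while data:
            try:
                decomp.decompress(data)
            except zlib.error:
                return members, False
            in_member = True
            if decomp.eof:
                # This member ended; any leftover bytes start the next one.
                members += 1
                data = decomp.unused_data
                decomp = zlib.decompressobj(31)
                in_member = False
            else:
                data = b""
    return members, (members > 0 and not in_member)
```

Usage would be along the lines of check_gzip_members(iter_chunks("some-file.warc.gz")), where the file name is a placeholder.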
19:34
🔗
|
SmileyG |
atm some of them are on my account ;D |
20:13
🔗
|
SketchCow |
I'm staying on top of the formspring and posterous uploads, so far. |
20:17
🔗
|
alard |
SketchCow: We should redo the mobileme index when you're done. That should be relatively easy, if you're uploading the json index files. |
20:19
🔗
|
SketchCow |
http://archive.org/details/mobileme-hero-1343163513 |
20:19
🔗
|
SketchCow |
That's gone through the wringer, now being derived. |
20:21
🔗
|
SketchCow |
Man, there's 4,500+ of these things |
20:22
🔗
|
alard |
Let's hope the tar files stay small. |
20:22
🔗
|
SketchCow |
I hope to make this as automatic as possible. |
20:29
🔗
|
alard |
Not sure if it matters, but it looks like you're uploading the json.gz and tar first, and then the big warc.gz. I think it's easier on the system if you upload the largest file first. |
20:31
🔗
|
SketchCow |
The hint file will help in the future. |
20:31
🔗
|
SketchCow |
Hint setting. |
20:40
🔗
|
alard |
Yes, but I think uploading in the right order will also save you one derive task (and a 50GB internal rsync to the derive server). |
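The ordering suggested here can be sketched as below. Only the ordering is shown; the upload step itself is left out, and whether it saves a derive task is the claim made above, not something this sketch verifies. The function names and the file names in the usage note are hypothetical.

```python
import os


def sizes_on_disk(paths):
    """Map each path to its size in bytes."""
    return {path: os.path.getsize(path) for path in paths}


def upload_order(sizes):
    """Return file names sorted largest first, given a name-to-size mapping."""
    return sorted(sizes, key=lambda name: sizes[name], reverse=True)
```

For an item with a small json.gz, a mid-sized tar and a large warc.gz, upload_order(sizes_on_disk([...])) would put the warc.gz first.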
20:55
🔗
|
SketchCow |
If I think I can do it, I'll do i |
20:55
🔗
|
SketchCow |
t |
20:57
🔗
|
SketchCow |
Spoiler: I can probably do it |
20:58
🔗
|
SketchCow |
Next, I'll be putting your pump software in place and running, because otherwise doooooooom |
21:00
🔗
|
Howlin1 |
I'll be moving home in a few weeks and that means I'm back to a 10gb a month download limit so I won't be able to help with the backing up, but is there anything else I could help with that doesn't require a lot of bandwidth? |
21:00
🔗
|
SketchCow |
I'll tell you we can always use help cleaning up and maintaining the wiki, but it's rather unsexy and people burn out |
21:44
🔗
|
SketchCow |
http://archive.org/details/mobileme-hero-1343163513 |
21:44
🔗
|
SketchCow |
woooosh |
21:54
🔗
|
chronomex |
chunk |
23:16
🔗
|
mistym |
Someone kick Chelsea27, it's a spammer. |
23:17
🔗
|
PepsiMax |
mistym: get help in #help? or whatever it is on EFnet... |
23:17
🔗
|
mistym |
PepsiMax: Right, just meant in this channel. It tried to send me some fake porn when I joined. |
23:17
🔗
|
PepsiMax |
how long should one URL take? |
23:17
🔗
|
PepsiMax |
Step 3 of 7 - Downloaded 680 URLs |
23:17
🔗
|
PepsiMax |
fuuu |
23:18
🔗
|
PepsiMax |
mistym: /j #help |
23:18
🔗
|
omf_ |
bam |
23:19
🔗
|
SmileyG |
o_O? |
23:19
🔗
|
PepsiMax |
:p |