Time |
Nickname |
Message |
00:06
🔗
|
|
Coderjoe has quit IRC (Ping timeout: 600 seconds) |
00:06
🔗
|
|
wp494 has quit IRC (Ping timeout: 335 seconds) |
00:06
🔗
|
|
nyu has quit IRC (Quit: leaving) |
00:07
🔗
|
|
Coderjoe has joined #archiveteam |
00:13
🔗
|
|
ete_ has quit IRC (Read error: Connection reset by peer) |
00:13
🔗
|
|
wp494 has joined #archiveteam |
00:13
🔗
|
|
ete_ has joined #archiveteam |
00:17
🔗
|
|
brayden has joined #archiveteam |
01:20
🔗
|
chfoo |
installing a new nginx+passenger seems to have made the problem go away |
01:21
🔗
|
chfoo |
for continued tracker discussion, join #warrior |
01:24
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
01:38
🔗
|
|
Kazzy has quit IRC (Quit: ZNC - http://znc.in) |
01:46
🔗
|
|
Kazzy has joined #archiveteam |
01:58
🔗
|
|
primus104 has quit IRC (Leaving.) |
02:11
🔗
|
|
nyu has joined #archiveteam |
02:19
🔗
|
|
Ymgve has quit IRC () |
02:19
🔗
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
02:21
🔗
|
|
dashcloud has joined #archiveteam |
02:27
🔗
|
|
Kazzy has quit IRC (Quit: ZNC - http://znc.in) |
02:36
🔗
|
|
Kazzy has joined #archiveteam |
02:57
🔗
|
|
db48x has joined #archiveteam |
03:02
🔗
|
|
BiggieJon has joined #archiveteam |
03:08
🔗
|
|
khaoohs_ has joined #archiveteam |
03:08
🔗
|
|
khaoohs has quit IRC (Read error: Connection reset by peer) |
03:23
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
03:24
🔗
|
|
dashcloud has joined #archiveteam |
03:43
🔗
|
|
xk_id has quit IRC (Ping timeout: 480 seconds) |
04:18
🔗
|
|
nyu has quit IRC (leaving) |
04:29
🔗
|
SketchCow |
Did we grab ivillage? |
04:36
🔗
|
|
ete_ has quit IRC (Remote host closed the connection) |
04:43
🔗
|
|
mistym has joined #archiveteam |
04:44
🔗
|
yipdw |
SketchCow: still in progress, ~119,000 URLs to go |
04:44
🔗
|
SketchCow |
Thanks. |
04:57
🔗
|
|
Nertsy has quit IRC (Quit: Nertsy) |
05:03
🔗
|
|
Nertsy has joined #archiveteam |
05:05
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:11
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
05:16
🔗
|
|
brayden has joined #archiveteam |
05:27
🔗
|
VonScoot |
SketchCow: what's yer beef with Binstock? |
05:28
🔗
|
SketchCow |
ha ha HA ha ha ha |
05:28
🔗
|
SketchCow |
Imagine there's a limpwriting machine |
05:28
🔗
|
SketchCow |
imagine he sat under it, on high, for most of the day |
05:28
🔗
|
SketchCow |
and then wrote that editorial |
07:28
🔗
|
|
primus104 has joined #archiveteam |
07:48
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
07:59
🔗
|
|
primus104 has quit IRC (Leaving.) |
08:00
🔗
|
|
signius has quit IRC (Read error: Operation timed out) |
08:07
🔗
|
SketchCow |
VonScoot: It was an idiot editorial. Someone showing how completely idiot and clueless he was. |
08:07
🔗
|
SketchCow |
Which is fine, one does not expect the online-only endgame of a long-standing magazine to have a winner at the helm. |
08:13
🔗
|
|
signius has joined #archiveteam |
08:43
🔗
|
|
ersi has quit IRC (Read error: Operation timed out) |
08:45
🔗
|
|
ersi has joined #archiveteam |
08:45
🔗
|
|
swebb sets mode: +o ersi |
08:55
🔗
|
|
schbirid has joined #archiveteam |
09:30
🔗
|
SketchCow |
http://imgur.com/gallery/bpkHSif |
09:55
🔗
|
|
wp494 has quit IRC (Ping timeout: 272 seconds) |
10:11
🔗
|
|
primus104 has joined #archiveteam |
10:19
🔗
|
|
APerti has quit IRC () |
10:20
🔗
|
cadbury_ |
is it normal for the warrior to restart itself? |
10:24
🔗
|
midas |
yep |
10:25
🔗
|
|
wp494 has joined #archiveteam |
10:39
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:45
🔗
|
|
xk_id has joined #archiveteam |
11:02
🔗
|
|
fluff is now known as fluff_ |
11:32
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
11:34
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:35
🔗
|
ersi |
cadbury_: Yeah, it's to make sure it's updated and that the project code is updated. |
11:38
🔗
|
|
dashcloud has joined #archiveteam |
11:45
🔗
|
|
Ymgve has joined #archiveteam |
11:49
🔗
|
|
Selanda has quit IRC (Ping timeout: 252 seconds) |
11:50
🔗
|
cadbury_ |
nice, i shan't worry about it then |
11:50
🔗
|
cadbury_ |
presumably i can start as many warriors as i like? |
12:43
🔗
|
|
primus has quit IRC (Read error: Connection reset by peer) |
12:49
🔗
|
|
MorbusIff has quit IRC (Quit: http://www.disobey.com/) |
12:52
🔗
|
|
Morbus has joined #archiveteam |
12:52
🔗
|
|
brayden_ has joined #archiveteam |
12:58
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
13:00
🔗
|
|
brayden has joined #archiveteam |
13:01
🔗
|
|
Sellyme_ has quit IRC (Ping timeout: 246 seconds) |
13:04
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
13:06
🔗
|
|
Sellyme has joined #archiveteam |
13:07
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
13:11
🔗
|
balrog |
"Activist investors are pushing for a Yahoo-AOL merge"??? |
13:14
🔗
|
db48x |
cadbury_: in theory, sure. however, running too many things on one IP address can get that address banned |
13:15
🔗
|
db48x |
cadbury_: that said, if you want to get a bit more involved you can run the software outside of the warrior VM, where you'll have a lot more flexibility |
13:20
🔗
|
|
ruukasu has joined #archiveteam |
13:24
🔗
|
|
ruukasu has quit IRC (Client Quit) |
13:25
🔗
|
|
ruukasu has joined #archiveteam |
13:25
🔗
|
|
ruukasu has quit IRC (Client Quit) |
13:26
🔗
|
|
ruukasu has joined #archiveteam |
13:26
🔗
|
|
ruukasu has quit IRC (Client Quit) |
13:27
🔗
|
joepie91 |
balrog: oh god |
13:27
🔗
|
joepie91 |
that can only go wrong |
13:27
🔗
|
joepie91 |
horribly, horribly wrong |
13:29
🔗
|
|
ruukasu has joined #archiveteam |
13:37
🔗
|
|
sankin has joined #archiveteam |
13:42
🔗
|
midas |
how could that go wr. |
13:43
🔗
|
midas |
gone. |
13:43
🔗
|
midas |
all gone. |
13:55
🔗
|
|
brayden has joined #archiveteam |
14:13
🔗
|
|
brayden has quit IRC (Ping timeout: 606 seconds) |
14:14
🔗
|
w0rp |
God help us all. |
14:16
🔗
|
|
brayden has joined #archiveteam |
14:20
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
14:32
🔗
|
cadbury_ |
db48x: what does running outside of the warrior change/do? |
14:34
🔗
|
Kazzy |
gives you more control over what you run, essentially |
14:34
🔗
|
Kazzy |
instead of running 5 vm's, just run the scripts 5 times, less overhead |
14:35
🔗
|
cadbury_ |
oh, well that makes sense |
14:35
🔗
|
cadbury_ |
presumably you can have each script running on a different port for the webui or is that a separate process? |
14:35
🔗
|
Kazzy |
you can yes, or just disable it when you run the script |
14:39
🔗
|
db48x |
yea, you can ditch the web ui, and run more concurrent downloaders (and uploaders) than the web ui limits you to |
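The options discussed above (a separate web UI port per script, or no web UI at all, plus a higher concurrency setting) might be assembled like this. This is a minimal sketch: `build_commands` is a hypothetical helper, and the `--concurrent`, `--port` and `--disable-web-server` flags are assumed from Seesaw's `run-pipeline` tool, so check `run-pipeline --help` before relying on them. It only builds the command lines and does not start anything.

```python
import shlex


def build_commands(nick, instances, concurrent, base_port=None):
    """Return one run-pipeline command (as an argument list) per instance.

    With base_port set, each instance gets its own web UI port.
    With base_port=None, the web UI is switched off instead.
    The flag names are assumptions; verify them with `run-pipeline --help`.
    """
    commands = []
    for i in range(instances):
        cmd = ["run-pipeline", "pipeline.py", "--concurrent", str(concurrent)]
        if base_port is None:
            cmd.append("--disable-web-server")
        else:
            cmd += ["--port", str(base_port + i)]
        cmd.append(nick)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in build_commands("YOURNICKHERE", instances=2, concurrent=4, base_port=8001):
        print(shlex.join(cmd))
```

Running the scripts directly like this avoids the overhead of one VM per instance, at the cost of managing the processes yourself.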
14:39
🔗
|
cadbury_ |
is there much advantage to running more? |
14:39
🔗
|
db48x |
it depends |
14:40
🔗
|
db48x |
some projects are really banhappy |
14:40
🔗
|
db48x |
occasionally we've been able to download so fast that we filled up our staging area |
14:41
🔗
|
db48x |
in both cases we have the tracker apply really strong rate limits |
14:42
🔗
|
db48x |
which means that running more workers won't really get the work done faster, although you might be able to steal a larger slice of the work |
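A toy model of the point above: if the tracker hands out work at a fixed rate, adding workers changes your share but not the total. The round-robin hand-out and the names `simulate`, `tracker_rate` and `workers_by_user` are assumptions made for illustration, not how the real tracker schedules items.

```python
def simulate(tracker_rate, workers_by_user, minutes):
    """Count items claimed per user when the tracker caps the global rate.

    tracker_rate: items the tracker releases per minute, across everyone.
    workers_by_user: mapping of user name to number of workers they run.
    Items are dealt round-robin to worker slots (an illustrative assumption).
    """
    # One slot per worker, labelled with the owning user.
    slots = [user for user, n in workers_by_user.items() for _ in range(n)]
    claimed = {user: 0 for user in workers_by_user}
    if not slots:
        return claimed
    total_items = tracker_rate * minutes
    for i in range(total_items):
        claimed[slots[i % len(slots)]] += 1
    return claimed


if __name__ == "__main__":
    one_worker = simulate(10, {"you": 1, "others": 9}, minutes=60)
    five_workers = simulate(10, {"you": 5, "others": 9}, minutes=60)
    # The total is the same in both runs; only the split changes.
    print(one_worker, sum(one_worker.values()))
    print(five_workers, sum(five_workers.values()))
```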
14:43
🔗
|
cadbury_ |
i suppose one advantage would be being able to run 1 worker per project available |
14:44
🔗
|
db48x |
yea, you could do that |
14:45
🔗
|
db48x |
although I think only twitpic and urlteam are currently in progress |
14:45
🔗
|
cadbury_ |
multiple urlteam scrapers would probably work without a problem |
14:46
🔗
|
db48x |
yea, urlteam is an interesting case |
14:46
🔗
|
db48x |
with those they're often scraping multiple shorteners at the same time, and you can have one work unit assigned to you for each of them |
14:47
🔗
|
cadbury_ |
i don't have enough spare hardware left over for more VMs |
14:47
🔗
|
db48x |
ooh, looks like they're doing a bunch of shorteners right now, so you can go nuts |
14:49
🔗
|
db48x |
you can run the script directly: https://github.com/ArchiveTeam/terroroftinytown-client-grab |
14:50
🔗
|
|
Froggypwn has quit IRC (Read error: Connection reset by peer) |
14:51
🔗
|
cadbury_ |
the amount of code that actually makes that work is surprisingly small |
14:52
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
14:52
🔗
|
|
Froggypwn has joined #archiveteam |
14:55
🔗
|
db48x |
yep |
14:56
🔗
|
db48x |
pipeline.py contains the code that defines what steps are necessary to process a work unit |
14:56
🔗
|
db48x |
it's leaning heavily on Seesaw to provide most of the heavy lifting of running processes and managing concurrency and so on |
14:57
🔗
|
db48x |
the program that actually interrogates the url shortener comes from a different git repository, but it's not very long either |
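The idea of a pipeline as an ordered list of steps applied to each work unit can be sketched in plain Python. This does not use Seesaw's actual API; `Pipeline`, `claim_item`, `scrape` and `upload` are hypothetical stand-ins, and the scrape step fakes its result instead of contacting a shortener.

```python
class Pipeline:
    """Run each step in order, passing the work unit (a dict) along."""

    def __init__(self, *steps):
        self.steps = steps

    def run(self, item):
        for step in self.steps:
            item = step(item)
        return item


def claim_item(item):
    # Pretend the tracker assigned us a few short codes to check.
    return {**item, "codes": ["aaa", "aab", "aac"]}


def scrape(item):
    # Placeholder: a real scraper would request each short URL
    # and record where it redirects. Here every result is None.
    return {**item, "results": {code: None for code in item["codes"]}}


def upload(item):
    # Placeholder for sending the results to the staging area.
    return {**item, "uploaded": True}


pipeline = Pipeline(claim_item, scrape, upload)

if __name__ == "__main__":
    print(pipeline.run({"shortener": "example"}))
```

A more complex project would simply have more steps in the list, which is the kind of difference visible between the two repositories' `pipeline.py` files.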
15:01
🔗
|
db48x |
twitpic is here: https://github.com/ArchiveTeam/twitpic-grab2 |
15:03
🔗
|
db48x |
you can see that its pipeline is a bit more complex |
15:28
🔗
|
|
Emcy_ has quit IRC (Read error: Connection reset by peer) |
15:34
🔗
|
|
mistym has joined #archiveteam |
15:40
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
15:47
🔗
|
|
khaoohs_ has quit IRC (Read error: Connection reset by peer) |
15:52
🔗
|
|
khaoohs has joined #archiveteam |
16:01
🔗
|
|
fluff_ is now known as fluff |
16:02
🔗
|
|
mistym has joined #archiveteam |
16:03
🔗
|
|
Emcy has joined #archiveteam |
16:14
🔗
|
|
xk_id has joined #archiveteam |
16:17
🔗
|
|
Nemo_bis has joined #archiveteam |
16:20
🔗
|
|
SPF|Cloud has joined #archiveteam |
16:26
🔗
|
|
SPF|Cloud is now known as Southpark |
16:26
🔗
|
|
Southpark is now known as SPF|Cloud |
16:26
🔗
|
Nemo_bis |
The torrent of https://archive.org/details/URLTeamTorrentRelease2013July doesn't include any file |
16:27
🔗
|
Nemo_bis |
SketchCow: can you regenerate the torrent? |
16:27
🔗
|
SketchCow |
I just set it off. |
16:28
🔗
|
SketchCow |
It's been a hell of an emscripten-DOSBOX bender this week |
16:36
🔗
|
schbirid |
https://news.ycombinator.com/item?id=8767909 |
16:49
🔗
|
|
primus104 has quit IRC (Leaving.) |
16:51
🔗
|
|
aaaaaaaaa has joined #archiveteam |
16:54
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
17:11
🔗
|
|
mistym has joined #archiveteam |
17:29
🔗
|
|
primus104 has joined #archiveteam |
18:22
🔗
|
|
K4k has joined #archiveteam |
18:33
🔗
|
raylee |
SketchCow: you're porting dosbox to the browser? |
19:16
🔗
|
SketchCow |
Someone already has done it, and it's done. |
19:16
🔗
|
SketchCow |
Now I'm just trying to make it work with the archive.org structure, which has some unusual aspects. |
19:39
🔗
|
|
BlueMaxim has joined #archiveteam |
20:17
🔗
|
|
APerti has joined #archiveteam |
20:29
🔗
|
|
Ravenloft has joined #archiveteam |
20:35
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
20:53
🔗
|
|
Start has joined #archiveteam |
21:01
🔗
|
|
mistym has joined #archiveteam |
21:06
🔗
|
|
primus104 has quit IRC (Leaving.) |
21:10
🔗
|
joepie91 |
no data lost, but another Yahoo acquisition apparently |
21:10
🔗
|
joepie91 |
https://peercdn.com/ |
21:10
🔗
|
joepie91 |
PeerCDN Acquired by Yahoo! |
21:13
🔗
|
|
K4k has quit IRC (WeeChat 1.0.1) |
21:22
🔗
|
|
Start has quit IRC (Ping timeout: 365 seconds) |
21:23
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
21:27
🔗
|
|
Start has joined #archiveteam |
21:29
🔗
|
deathy |
that was quite a few months ago, actually |
21:32
🔗
|
deathy |
one of the founders started webtorrent I think right after. BitTorrent just using a browser |
21:40
🔗
|
|
Start has quit IRC (Quit: Leaving) |
21:40
🔗
|
godane |
i'm starting to upload more funny or die videos |
21:42
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
21:46
🔗
|
|
mistym has joined #archiveteam |
21:52
🔗
|
|
schbirid has joined #archiveteam |
21:53
🔗
|
|
sankin has quit IRC (Leaving.) |
21:58
🔗
|
|
primus104 has joined #archiveteam |
22:03
🔗
|
|
ruukasu has joined #archiveteam |
22:54
🔗
|
|
schbirid has quit IRC (Leaving) |
23:11
🔗
|
godane |
so nine to noon show on radionz is about 11gb a year |
23:21
🔗
|
godane |
good news is at this rate i will have the backlog of that show in the archive soon |
23:22
🔗
|
godane |
and then i just have to wait for christmas eve to start downloading the index of 2014 urls for that show |
23:22
🔗
|
godane |
they end on christmas eve and don't start back until jan ~20 |
23:39
🔗
|
|
rejon has joined #archiveteam |
23:43
🔗
|
|
APerti has quit IRC (Ping timeout: 370 seconds) |