Time |
Nickname |
Message |
00:08
🔗
|
|
mistym has joined #archiveteam |
00:09
🔗
|
|
nertzy has quit IRC (This computer has gone to sleep) |
00:45
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:02
🔗
|
|
godane has joined #archiveteam |
01:43
🔗
|
|
Ymgve has quit IRC () |
01:44
🔗
|
|
brayden has quit IRC (Read error: Connection reset by peer) |
01:45
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
01:46
🔗
|
|
mistym has joined #archiveteam |
01:47
🔗
|
|
brayden has joined #archiveteam |
02:04
🔗
|
|
garyrh has quit IRC (hub.se irc.ac.za) |
02:11
🔗
|
|
[BNC]gary has joined #archiveteam |
02:50
🔗
|
|
[BNC]gary is now known as garyrh |
03:02
🔗
|
|
kyan has quit IRC (Quit: Leaving) |
03:34
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
04:29
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:35
🔗
|
|
mistym has joined #archiveteam |
04:49
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
04:51
🔗
|
|
mistym has joined #archiveteam |
05:11
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
05:11
🔗
|
|
mistym has joined #archiveteam |
05:13
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
05:14
🔗
|
|
nertzy has joined #archiveteam |
05:42
🔗
|
|
mistym has joined #archiveteam |
06:47
🔗
|
|
SimpBrain has joined #archiveteam |
07:00
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
07:06
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
07:07
🔗
|
|
primus104 has joined #archiveteam |
07:09
🔗
|
|
dashcloud has joined #archiveteam |
07:38
🔗
|
|
rolfb has joined #archiveteam |
07:38
🔗
|
|
rolfb has quit IRC (Client Quit) |
07:43
🔗
|
|
primus104 has quit IRC (Leaving.) |
07:44
🔗
|
|
nertzy has quit IRC (This computer has gone to sleep) |
07:51
🔗
|
|
atomotic has joined #archiveteam |
08:00
🔗
|
|
mistym has joined #archiveteam |
08:09
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
08:10
🔗
|
|
schbirid has joined #archiveteam |
08:15
🔗
|
|
BlueMaxim has quit IRC (Ping timeout: 512 seconds) |
08:26
🔗
|
|
RichardG_ has joined #archiveteam |
08:28
🔗
|
|
john1 has quit IRC (Read error: Operation timed out) |
08:28
🔗
|
|
swebb has quit IRC (Ping timeout: 255 seconds) |
08:29
🔗
|
|
warthurto has quit IRC (Ping timeout: 255 seconds) |
08:29
🔗
|
|
okeuday has quit IRC (Read error: Operation timed out) |
08:32
🔗
|
|
RichardG has quit IRC (Ping timeout: 483 seconds) |
08:33
🔗
|
|
okeuday has joined #archiveteam |
08:35
🔗
|
|
warthurto has joined #archiveteam |
08:37
🔗
|
|
john1 has joined #archiveteam |
08:40
🔗
|
|
primus104 has joined #archiveteam |
08:58
🔗
|
|
berndj has quit IRC (Excess Flood) |
08:58
🔗
|
|
berndj has joined #archiveteam |
09:31
🔗
|
|
berndj has quit IRC (Remote host closed the connection) |
10:16
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
10:31
🔗
|
|
insane_al has joined #archiveteam |
10:55
🔗
|
|
Ymgve has joined #archiveteam |
11:28
🔗
|
|
Ctrl-S has quit IRC (Read error: Operation timed out) |
11:29
🔗
|
|
Ctrl-S has joined #archiveteam |
11:33
🔗
|
|
primus104 has quit IRC (Leaving.) |
12:04
🔗
|
|
atomotic has joined #archiveteam |
12:07
🔗
|
|
Control-S has joined #archiveteam |
12:12
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
12:15
🔗
|
|
Ctrl-S has quit IRC (Ping timeout: 600 seconds) |
12:15
🔗
|
|
Control-S is now known as Ctrl-S |
12:53
🔗
|
|
sankin has joined #archiveteam |
13:29
🔗
|
|
philpem has joined #archiveteam |
13:43
🔗
|
|
primus104 has joined #archiveteam |
13:52
🔗
|
|
Start has quit IRC (Disconnected.) |
13:55
🔗
|
|
primus104 has quit IRC (Leaving.) |
14:04
🔗
|
|
lrkj_ has joined #archiveteam |
14:06
🔗
|
|
mistym has joined #archiveteam |
14:06
🔗
|
|
lrkj has quit IRC (Ping timeout: 600 seconds) |
14:13
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
14:19
🔗
|
|
RichardG_ has quit IRC (Remote host closed the connection) |
14:22
🔗
|
|
scyther has joined #archiveteam |
14:24
🔗
|
|
RichardG has joined #archiveteam |
14:27
🔗
|
SketchCow |
Badoom |
14:28
🔗
|
SketchCow |
Greetings from the Internet Archive. You all have my fullest attention. |
15:06
🔗
|
|
Start has joined #archiveteam |
15:16
🔗
|
|
insane_al has quit IRC (Ping timeout: 306 seconds) |
15:17
🔗
|
|
signius has quit IRC (Ping timeout: 265 seconds) |
15:22
🔗
|
|
mistym has joined #archiveteam |
15:27
🔗
|
|
Start has quit IRC (Disconnected.) |
15:30
🔗
|
|
signius has joined #archiveteam |
15:37
🔗
|
|
nertzy has joined #archiveteam |
15:41
🔗
|
|
Start has joined #archiveteam |
15:41
🔗
|
|
K4k has joined #archiveteam |
15:47
🔗
|
|
SadDM has quit IRC (Remote host closed the connection) |
15:47
🔗
|
|
SadDM has joined #archiveteam |
15:51
🔗
|
|
Start has quit IRC (Disconnected.) |
15:56
🔗
|
|
aaaaaaaaa has joined #archiveteam |
16:03
🔗
|
|
helothere has joined #archiveteam |
16:03
🔗
|
helothere |
hello there |
16:04
🔗
|
helothere |
just a question, i'm just wondering, if you guys have the bandwidth and space to receive all payloads from volunteer warriors, why not do the scraping yourself? |
16:05
🔗
|
|
primus104 has joined #archiveteam |
16:06
🔗
|
SimpBrain |
more warriors = less problems |
16:06
🔗
|
augusztin |
helothere: banning :) |
16:06
🔗
|
DFJustin |
scraping takes more than just bandwidth and space - cpu and ram for example |
16:06
🔗
|
augusztin |
helothere: if 200 IPs scrape, it is less obvious than when 5 do the same |
16:07
🔗
|
SimpBrain |
easier for each person's warrior to scrape 1%. some sites get ban happy if you browse too many urls |
16:08
🔗
|
augusztin |
meanwhile Kenshin does all the Baraza LOL |
16:08
🔗
|
Nemo_bis |
filippo__: I had an amnesia about the name of your employer (missing a letter) and I found it only thanks to your github page. :P |
16:08
🔗
|
augusztin |
1372GB for Kenshin, 28GB for the 2nd guy :D |
16:08
🔗
|
SimpBrain |
he has a poll of ips |
16:08
🔗
|
SimpBrain |
pool |
16:08
🔗
|
Nemo_bis |
filippo__: They're lucky that they hire famous people saving their online reputation! :D |
16:09
🔗
|
augusztin |
SimpBrain: it still looks like a single man project mostly :P |
16:09
🔗
|
SimpBrain |
true, warrior can run multiple times, you can just tell it that node runs a certain ip |
16:10
🔗
|
SimpBrain |
well the pipeline script |
16:10
🔗
|
augusztin |
and actually scraping takes pretty much minimum bandwidth |
16:11
🔗
|
helothere |
I see. thanks for clarification |
16:11
🔗
|
augusztin |
even downloading furaffinity like right now only takes around 1-1.5MB/s downstream, which is nothing to me (150Mbps internet) |
16:11
🔗
|
augusztin |
so only ~10% of my downstream bandwidth is used |
16:12
🔗
|
augusztin |
(and i allowed the warrior to run up to 30Mbps) |
16:18
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
16:31
🔗
|
|
helothere has quit IRC (Quit: Page closed) |
16:38
🔗
|
|
mistym has joined #archiveteam |
16:49
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
16:53
🔗
|
|
nertzy has quit IRC (This computer has gone to sleep) |
17:00
🔗
|
|
atomotic has joined #archiveteam |
17:01
🔗
|
augusztin |
http://www.theverge.com/2015/5/4/8543605/google-plus-collections-announced how long till we are going to try to save this ? :D |
17:01
🔗
|
augusztin |
i give it 2 years, tops :D |
17:03
🔗
|
yipdw |
I bet Pinterest will be bought first and then Google will kill both |
17:03
🔗
|
xmc |
oh dear |
17:03
🔗
|
Infreq |
fair assessment |
17:04
🔗
|
Infreq |
lest yahoo get their hands on it first... hue |
17:04
🔗
|
augusztin |
well, at least google is killing their own projects :) |
17:05
🔗
|
augusztin |
yahoo is buying others, then killing them :D |
17:05
🔗
|
Infreq |
tumblr is still strong but all the policy changes have really changed it |
17:06
🔗
|
|
rejon has quit IRC (Read error: Operation timed out) |
17:10
🔗
|
Nemo_bis |
augusztin: that's because Google doesn't *allow* competing projects to exist in the first place, nothing to buy |
17:10
🔗
|
yipdw |
eh they allow it insofar as they can't keep track of their own shit |
17:11
🔗
|
yipdw |
or some district wins the Google Hunger Games that year |
17:12
🔗
|
yipdw |
wait a minute that'd be fucking awesome |
17:12
🔗
|
yipdw |
#-bs |
17:13
🔗
|
augusztin |
http://www.slate.com/articles/technology/map_of_the_week/2013/03/google_reader_joins_graveyard_of_dead_google_products.html glass graveyard hole was a bit premature it seems |
18:18
🔗
|
|
atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…) |
18:18
🔗
|
|
Jonimus has quit IRC (Ping timeout: 370 seconds) |
18:21
🔗
|
|
khaoohs has quit IRC (Read error: Connection reset by peer) |
18:22
🔗
|
|
khaoohs has joined #archiveteam |
18:23
🔗
|
|
atomotic has joined #archiveteam |
18:30
🔗
|
|
Jonimus has joined #archiveteam |
18:58
🔗
|
|
scyther has joined #archiveteam |
18:58
🔗
|
|
khaoohs has quit IRC (Read error: Connection reset by peer) |
18:59
🔗
|
|
khaoohs has joined #archiveteam |
19:07
🔗
|
|
primus104 has quit IRC (Read error: Operation timed out) |
19:08
🔗
|
|
primus104 has joined #archiveteam |
19:16
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
19:20
🔗
|
|
primus105 has joined #archiveteam |
19:27
🔗
|
|
primus104 has quit IRC (Read error: Operation timed out) |
19:30
🔗
|
|
mistym has joined #archiveteam |
19:39
🔗
|
|
Start has joined #archiveteam |
19:40
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
19:53
🔗
|
|
SN4T14_ has joined #archiveteam |
19:58
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
19:59
🔗
|
|
SN4T14__ has quit IRC (Ping timeout: 369 seconds) |
20:13
🔗
|
|
mistym has joined #archiveteam |
20:22
🔗
|
|
Start has quit IRC (Disconnected.) |
20:23
🔗
|
|
khaoohs has quit IRC (Read error: Connection reset by peer) |
20:24
🔗
|
|
khaoohs has joined #archiveteam |
20:27
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
20:29
🔗
|
|
Start has joined #archiveteam |
20:29
🔗
|
|
twrist has quit IRC (Ping timeout: 240 seconds) |
20:29
🔗
|
|
twrist has joined #archiveteam |
20:32
🔗
|
|
habi has joined #archiveteam |
20:33
🔗
|
|
Start has quit IRC (Client Quit) |
20:33
🔗
|
|
habi has left |
20:40
🔗
|
|
RedType has quit IRC (Quit: leaving) |
20:44
🔗
|
|
habi has joined #archiveteam |
20:45
🔗
|
|
scyther has quit IRC (Leaving) |
20:46
🔗
|
|
RedType has joined #archiveteam |
20:49
🔗
|
|
habi has left |
20:57
🔗
|
|
nertzy has joined #archiveteam |
20:59
🔗
|
|
K4k has quit IRC (Quit: WeeChat 1.0.1) |
21:02
🔗
|
|
sankin has quit IRC (Leaving.) |
21:03
🔗
|
|
BlueMaxim has joined #archiveteam |
21:11
🔗
|
|
Start has joined #archiveteam |
21:11
🔗
|
|
Start has quit IRC (Client Quit) |
21:18
🔗
|
|
SimpBrain has quit IRC (Quit: Leaving) |
21:24
🔗
|
|
skiy has joined #archiveteam |
21:40
🔗
|
|
joepie91_ is now known as joepie91c |
21:40
🔗
|
|
joepie91c is now known as joepie91_ |
22:25
🔗
|
|
skiy_ has joined #archiveteam |
22:32
🔗
|
|
skiy has quit IRC (Read error: Operation timed out) |
22:52
🔗
|
|
skiy_ has quit IRC (Read error: Connection reset by peer) |
22:55
🔗
|
|
za3k has joined #archiveteam |
22:56
🔗
|
za3k |
hbrowse.com gives the go-ahead to be archived (http://www.hbrowse.com/forum/index.php?topic=3885.0). I turn out not to have space for it, but if someone else wants to archive it, plain wget should work, and I think it's not so huge. |
22:58
🔗
|
za3k |
I have a copy of arXiv which appears to be complete, but its checksums consistently mismatch the official manifest. If anyone wants to help me diagnose why, I'd appreciate it; if anyone wants a copy of what I have, contact me. |
22:59
🔗
|
za3k |
Someone said I should post this here: https://github.com/mispy/twitter_ebooks. It has an 'archive' subcommand which grabs a twitter feed, archiving raw JSON as far back as the API allows. It handles updates in a bandwidth-efficient way. |
23:02
🔗
|
za3k |
I didn't write twitter_ebooks. I have a short how-to for transforming the JSON to plaintext here: https://blog.za3k.com/archiving-twitter/ |
23:03
🔗
|
|
za3k has quit IRC (Quit: Page closed) |
23:16
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |
23:23
🔗
|
|
nertzy has quit IRC (This computer has gone to sleep) |
23:27
🔗
|
|
slash` has joined #archiveteam |
23:30
🔗
|
|
robv has joined #archiveteam |
23:35
🔗
|
|
robv has quit IRC (Quit: [LINE TERMINATED]) |
23:56
🔗
|
|
Start has joined #archiveteam |
23:56
🔗
|
|
Start has quit IRC (Remote host closed the connection) |
23:57
🔗
|
|
Start has joined #archiveteam |
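
Editor's note on za3k's 22:58 message about checksums that do not match the official manifest: a first diagnostic step might be to list exactly which files differ and how. The sketch below is only an illustration under stated assumptions. It assumes the manifest has already been parsed into a plain mapping of relative file name to MD5 hex digest; the log does not say which hash or manifest format is involved, and the function names `md5_of_file` and `find_mismatches` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(manifest, root):
    """Compare files under root against a {relative_name: md5_hex} mapping.

    The mapping format is an assumption, not the real manifest layout.
    Returns {name: (expected, actual)}; actual is None when the file is missing.
    """
    root = Path(root)
    mismatches = {}
    for name, expected in manifest.items():
        path = root / name
        actual = md5_of_file(path) if path.is_file() else None
        if actual != expected.lower():
            mismatches[name] = (expected, actual)
    return mismatches
```

If every file shows up as a mismatch, the cause is more likely systematic (a different hash algorithm, or files re-compressed or re-packed in transit) than corruption of individual files.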