Time |
Nickname |
Message |
00:03
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
00:09
🔗
|
|
Start has joined #archiveteam-bs |
00:53
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
01:29
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
02:09
🔗
|
|
vitzli has joined #archiveteam-bs |
02:10
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
02:16
🔗
|
godane |
http://www.wsj.com/articles/the-librarian-who-saved-timbuktus-cultural-treasures-from-al-qaeda-1460729998 |
02:29
🔗
|
Yoshimura |
terrorists are governments. |
02:30
🔗
|
Yoshimura |
Those are dumbfcks, destroying what they can. |
02:32
🔗
|
godane |
Yoshimura: btw i'm always on the roll: https://archive.org/details/@chris85 |
02:36
🔗
|
godane |
SketchCow: some more info on SNET in cuba: https://cachivachemedia.com/de-la-comunidad-del-anillo-a-snet-las-redes-en-la-tierra-media-fe44b4a62319#.hff3ls3la |
02:36
🔗
|
godane |
that website is being saved in archivebot |
02:36
🔗
|
godane |
i also saved the 4 podcasts they had so far to archivebot |
02:42
🔗
|
Yoshimura |
godane: While English sounds good, this makes interesting chimes as the day changes. |
02:42
🔗
|
Yoshimura |
http://listen.hatnote.com/#no,uk,fr,sv,he,as,pa,ml,or,pl,sr,fi,de,ja,nl,ar,id,hi,te,mr,sa,mk,bg,ru,es,it,fa,zh,bn,ta,kn,gu,be,hu |
02:48
🔗
|
|
bwn has joined #archiveteam-bs |
02:58
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
03:55
🔗
|
JesseW |
thanks for the reminder of listen.hatnote |
03:58
🔗
|
* |
Yoshimura cheers. |
03:59
🔗
|
Yoshimura |
Btw, I did some re-evaluation on the Danish social site, and it seems there is much less data. A lot of deleted photos and stuff. |
04:00
🔗
|
Yoshimura |
But still a lot of links to check, or to go through or just regular crawl with smart extractor, but there is about one month remaining, I am currently running a fetch on the newest 300k (minus holes) images. Just to get their links. They have throttling for abusive bots, but have very very fast servers. |
04:01
🔗
|
Yoshimura |
So warrior should work really great in this case, but someone maybe said it would not be warrior. |
04:02
🔗
|
Yoshimura |
The regular throttling of archivebot would work, but not enough time to get millions of links and stuff. |
04:03
🔗
|
JesseW |
thank you for looking into it |
04:04
🔗
|
Yoshimura |
Aww. You so kind. |
04:05
🔗
|
JesseW |
eh, it's work that needs doing. |
04:09
🔗
|
|
Zebranky has quit IRC (Read error: Operation timed out) |
04:10
🔗
|
|
Kaz has quit IRC (Read error: Operation timed out) |
04:10
🔗
|
|
sigkell has quit IRC (Ping timeout: 260 seconds) |
04:10
🔗
|
Yoshimura |
JesseW: Yeah, work too, I have no idea how can I help more though. Related on main channel. ... Btw, running !ao only pipeline seems as best start for me and the pipes it seems. Would like to. Who can provide me with credentials for ssh to the tracker? |
04:10
🔗
|
|
sigkell has joined #archiveteam-bs |
04:11
🔗
|
|
bauruine has quit IRC (Ping timeout: 260 seconds) |
04:11
🔗
|
Yoshimura |
(It does not say what it is or what's it for, the readme, but apparent) |
04:13
🔗
|
|
Kazzy has quit IRC (Ping timeout: 260 seconds) |
04:13
🔗
|
Yoshimura |
Got machine next to 3x10Gbit backbone, the machine itself 100Mbit, so it is sad to watch it do 1Mbit/s xD |
04:17
🔗
|
JesseW |
heh; one thing you can do independently is run wpull jobs yourself, and upload them to IA. They won't (currently) go into the Wayback Machine, but it's a good way to grab stuff that needs saving. |
04:17
🔗
|
Yoshimura |
Yeah, but sounds dumb. |
04:17
🔗
|
JesseW |
clarify? |
04:17
🔗
|
Yoshimura |
Well, I can run a node, I do not fear anything running AO only pipeline. |
04:18
🔗
|
Yoshimura |
With regular I would fear running out of space. |
04:18
🔗
|
|
Kazzy has joined #archiveteam-bs |
04:19
🔗
|
JesseW |
ah, I was suggesting running *smaller*, manually selected wpull jobs |
04:19
🔗
|
Yoshimura |
So having ao only node to learn plus contribute seems like fine start. And yes, I can run crawls, I would maybe modify wpull though. I for example do not understand why wget saves crap temp file just to delete it. |
04:19
🔗
|
Yoshimura |
-O ... .tmp, instead -O - > /dev/null |
04:19
🔗
|
Yoshimura |
Did not figure out if it's used for anything ever. And not sure if wpull does the same stupid thing or not. |
04:20
🔗
|
* |
JesseW is not familiar with that detail |
04:20
🔗
|
|
JesseW has left |
04:20
🔗
|
Yoshimura |
There is --delete-after, but sounds like that removes it after whole crawl, not each file. |
04:21
🔗
|
yipdw_ |
wget's main use case is saving responses in files; the --output-document option isn't that relevant in WARC mode, but data is still cached there for recording |
04:21
🔗
|
yipdw_ |
wpull also generates temporary request and response buffers and you'll have to ask chfoo about that one |
04:22
🔗
|
yipdw_ |
for WARC, you need an entire response before you can write the record, anyway |
04:22
🔗
|
yipdw_ |
so the buffering strategy is not that stupid |
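A rough sketch of why the buffering is needed: a WARC record header carries a Content-Length (and usually a digest) that comes before the payload, so the full response has to be in hand before the header can be written. This is not wget's or wpull's actual code. The function name `write_response_record` and the 1 MiB spill threshold are made up for illustration, and the record is simplified (mandatory fields such as WARC-Record-ID and WARC-Date are left out).

```python
import base64
import hashlib
import io
import tempfile


def write_response_record(chunks, out, target_uri):
    """Buffer a streamed response, then emit one simplified WARC-style record.

    Hypothetical helper: `chunks` is any iterable of bytes, `out` is a
    binary file-like object. Returns the payload length in bytes.
    """
    sha1 = hashlib.sha1()
    length = 0
    # Keep small responses in memory and spill larger ones to a temp file
    # (the 1 MiB limit is an arbitrary assumption).
    with tempfile.SpooledTemporaryFile(max_size=1024 * 1024) as buf:
        for chunk in chunks:
            buf.write(chunk)
            sha1.update(chunk)
            length += len(chunk)

        # Only now are the length and digest known, so only now can the
        # record header be written.
        digest = base64.b32encode(sha1.digest()).decode("ascii")
        header = (
            "WARC/1.0\r\n"
            "WARC-Type: response\r\n"
            f"WARC-Target-URI: {target_uri}\r\n"
            f"WARC-Block-Digest: sha1:{digest}\r\n"
            f"Content-Length: {length}\r\n"
            "\r\n"
        ).encode("utf-8")
        out.write(header)

        # Copy the buffered response into the record body.
        buf.seek(0)
        while True:
            piece = buf.read(65536)
            if not piece:
                break
            out.write(piece)
        # A record ends with two CRLF pairs.
        out.write(b"\r\n\r\n")
    return length


if __name__ == "__main__":
    sink = io.BytesIO()
    write_response_record([b"HTTP/1.1 200 OK\r\n\r\n", b"hello"], sink,
                          "http://example.com/")
    print(sink.getvalue().decode("utf-8"))
```

The temp buffer here lives for one response only and is dropped as soon as the record is written, which is the per-response buffering described above rather than a whole-crawl file.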
04:22
🔗
|
Yoshimura |
yipdw_: Yeah, I know it's not relevant in wget, that is why I wondered why it does still save it. |
04:23
🔗
|
yipdw_ |
it's an implementation detail of the WARC writer |
04:23
🔗
|
Yoshimura |
Buffering the single response yeah, that is fine, but whole crawl (several GB) does not sound right. |
04:23
🔗
|
yipdw_ |
--truncate-output may be useful |
04:23
🔗
|
Yoshimura |
Have to look more in depth how large files the warrior saves. |
04:24
🔗
|
yipdw_ |
depends on the project |
04:24
🔗
|
Yoshimura |
But it is tiring.. |
04:24
🔗
|
Yoshimura |
I meant the temporary files, not the warcs. |
04:24
🔗
|
Yoshimura |
If its single request or all of them. |
04:25
🔗
|
Yoshimura |
yipdw_: You can grant me user account for archive bot, btw? |
04:25
🔗
|
Yoshimura |
Or someone else is it? |
04:25
🔗
|
yipdw_ |
pipelines connect via SSH keys |
04:26
🔗
|
Yoshimura |
Yeah, that is what I meant. |
04:26
🔗
|
yipdw_ |
they're really just read/writing a Redis databas |
04:26
🔗
|
yipdw_ |
e |
04:26
🔗
|
yipdw_ |
and yes I have access to add those |
04:26
🔗
|
Yoshimura |
I realize it's a tunnel to Redis : |
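For readers unfamiliar with the idea, a key-authenticated SSH tunnel to a Redis port could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the helper name `tunnel_command`, the user, host, key path, and the use of Redis's default port 6379 on both ends. It is not ArchiveBot's actual setup; only the `ssh` options `-N`, `-i`, and `-L` are standard OpenSSH.

```python
def tunnel_command(user, host, key_path, local_port=6379, remote_port=6379):
    """Build an ssh argv that forwards a local port to Redis on the remote side.

    Hypothetical helper; all argument values are supplied by the caller.
    """
    return [
        "ssh",
        "-N",                 # forward ports only, run no remote command
        "-i", key_path,       # authenticate with this private key
        "-L", f"{local_port}:127.0.0.1:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Placeholder values, not real hosts or accounts.
    print(" ".join(tunnel_command("pipeline", "tracker.example",
                                  "/home/me/.ssh/id_ed25519")))
```

With such a tunnel up, a local process would talk to `127.0.0.1:6379` as if Redis were running on the same machine.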
04:26
🔗
|
|
bauruine has joined #archiveteam-bs |
04:26
🔗
|
Yoshimura |
Would you do that please? Would like to start soon, but small running !ao only pipeline. |
04:27
🔗
|
yipdw_ |
I can do it once I have your public key |
04:27
🔗
|
Yoshimura |
Or if there is other way to first test, I can, but followed the install.md. Wonderful, how should I share it? |
04:27
🔗
|
|
Zebranky has joined #archiveteam-bs |
04:27
🔗
|
yipdw_ |
anything is fine |
04:28
🔗
|
yipdw_ |
you can test a pipeline without registering with the #archivebot instance |
04:28
🔗
|
yipdw_ |
unfortunately that requires backend setup, which I never simplified or automated |
04:29
🔗
|
yipdw_ |
move this to #archivebot |
04:34
🔗
|
|
Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client) |
04:35
🔗
|
|
Kaz has joined #archiveteam-bs |
04:43
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
04:48
🔗
|
|
Yoshimura has joined #archiveteam-bs |
04:52
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:59
🔗
|
|
Sk1d has joined #archiveteam-bs |
05:27
🔗
|
|
JesseW has joined #archiveteam-bs |
05:37
🔗
|
JesseW |
Arghghgh -- finished practicing "er-" in Dutch... |
05:42
🔗
|
godane |
we are finally at 20k videos with funny or die archive |
05:45
🔗
|
|
vitzli has joined #archiveteam-bs |
05:49
🔗
|
|
bwn_ has joined #archiveteam-bs |
05:49
🔗
|
|
bwn has quit IRC (Read error: Connection reset by peer) |
05:49
🔗
|
DFJustin |
the clown is everywhere https://i.imgur.com/GTP7Z9G.jpg |
06:17
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:34
🔗
|
|
bwn_ has quit IRC (Read error: Connection reset by peer) |
06:58
🔗
|
|
Medowar has joined #archiveteam-bs |
07:01
🔗
|
|
bwn_ has joined #archiveteam-bs |
07:04
🔗
|
|
schbirid has joined #archiveteam-bs |
07:18
🔗
|
|
metalcamp has joined #archiveteam-bs |
07:24
🔗
|
|
logchfoo2 starts logging #archiveteam-bs at Wed Apr 20 07:24:11 2016 |
07:24
🔗
|
|
logchfoo2 has joined #archiveteam-bs |
08:56
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
09:42
🔗
|
SketchCow |
Grabbing every single thing out of http://www.mixtapetorrent.com/ |
09:42
🔗
|
SketchCow |
What could go wrong. |
09:44
🔗
|
schbirid |
yo bytes might go down da drizzle |
09:48
🔗
|
SketchCow |
My torrent ripper is definitely going to have a field day. |
09:48
🔗
|
SketchCow |
There's 1,123 pages, each with 3-5 torrents on them |
10:39
🔗
|
|
brayden has joined #archiveteam-bs |
10:39
🔗
|
|
swebb sets mode: +o brayden |
10:43
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
10:57
🔗
|
|
VADemon has joined #archiveteam-bs |
11:08
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
11:23
🔗
|
|
RichardG has joined #archiveteam-bs |
11:23
🔗
|
godane |
SketchCow: you may want to move this one to one of the gaming collection instead of having it in Ephemeral VHS : https://archive.org/details/Bethesda_2015_E3_Showcase |
11:27
🔗
|
godane |
i'm starting to upload DTIC Archive stuff again: https://archive.org/details/DTIC_ADA036301 |
12:39
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
12:39
🔗
|
|
RichardG has joined #archiveteam-bs |
13:17
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
13:29
🔗
|
|
ErkDog has quit IRC (Remote host closed the connection) |
13:30
🔗
|
|
ErkDog has joined #archiveteam-bs |
13:37
🔗
|
|
beardicus has quit IRC (Read error: Operation timed out) |
13:38
🔗
|
|
beardicus has joined #archiveteam-bs |
14:02
🔗
|
|
pwnsrv has joined #archiveteam-bs |
14:04
🔗
|
|
pwnsrv_ has quit IRC (Ping timeout: 250 seconds) |
14:12
🔗
|
|
ErkDog has quit IRC (Remote host closed the connection) |
14:13
🔗
|
|
ErkDog has joined #archiveteam-bs |
14:43
🔗
|
|
Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client) |
14:44
🔗
|
|
Yoshimura has joined #archiveteam-bs |
14:45
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
14:46
🔗
|
|
Yoshimura has quit IRC (Client Quit) |
14:48
🔗
|
|
Yoshimura has joined #archiveteam-bs |
14:50
🔗
|
|
JesseW has joined #archiveteam-bs |
15:15
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
15:38
🔗
|
|
Honno has joined #archiveteam-bs |
15:45
🔗
|
|
Start has joined #archiveteam-bs |
16:06
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:51
🔗
|
|
atrocity has quit IRC () |
17:03
🔗
|
|
Start has joined #archiveteam-bs |
17:27
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
18:20
🔗
|
|
BnA-Rob1n has quit IRC (Ping timeout: 244 seconds) |
18:21
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
18:24
🔗
|
|
BnA-Rob1n has joined #archiveteam-bs |
18:34
🔗
|
|
BnA-Rob1n has quit IRC (Ping timeout: 244 seconds) |
18:36
🔗
|
|
BnA-Rob1n has joined #archiveteam-bs |
18:44
🔗
|
|
useretail has quit IRC (Ping timeout: 244 seconds) |
18:45
🔗
|
|
ring has quit IRC (Read error: Operation timed out) |
18:45
🔗
|
|
ring has joined #archiveteam-bs |
18:52
🔗
|
|
useretail has joined #archiveteam-bs |
18:56
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
19:02
🔗
|
|
metalcamp has joined #archiveteam-bs |
19:13
🔗
|
|
BnA-Rob1n has quit IRC (Ping timeout: 244 seconds) |
19:15
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
19:17
🔗
|
|
bwn_ has joined #archiveteam-bs |
19:34
🔗
|
|
BnA-Rob1n has joined #archiveteam-bs |
20:08
🔗
|
|
Medowar has quit IRC (Quit: Connection closed for inactivity) |
20:30
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
20:31
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
20:34
🔗
|
|
VADemon has quit IRC (left4dead) |
20:36
🔗
|
|
chfoo has quit IRC (Read error: Operation timed out) |
20:43
🔗
|
|
chfoo has joined #archiveteam-bs |
20:46
🔗
|
|
Medowar has joined #archiveteam-bs |
20:53
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
20:55
🔗
|
|
ErkDog has quit IRC (Quit: ECAN Solutions) |
20:58
🔗
|
|
ErkDog has joined #archiveteam-bs |
21:18
🔗
|
|
Stiletto has joined #archiveteam-bs |
21:24
🔗
|
|
Start has joined #archiveteam-bs |
21:24
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:36
🔗
|
|
atrocity has joined #archiveteam-bs |
21:51
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
22:08
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
22:15
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
22:26
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
22:53
🔗
|
|
RichardG has quit IRC (Ping timeout: 260 seconds) |
23:08
🔗
|
|
Medowar has quit IRC (Quit: Connection closed for inactivity) |
23:11
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 261 seconds) |
23:30
🔗
|
atrocity |
wednesday tom...what??? |
23:30
🔗
|
atrocity |
today is wednesday? wtf! |
23:30
🔗
|
atrocity |
i was working on my weekly wednesday youtube video (lootcrate!) and just realized that's today |
23:32
🔗
|
xmc |
oooops |
23:33
🔗
|
atrocity |
yeah, i'm seriously missing a day in my head from the last week. at least i skipped a day of work, lol |
23:43
🔗
|
SketchCow |
17,000 torrents |
23:45
🔗
|
atrocity |
do trackers even allow you to connect to that many at once? lol |
23:45
🔗
|
Frogging |
yes |
23:47
🔗
|
atrocity |
would a consumer router allow you to have that many routes to track? lol |
23:49
🔗
|
Yoshimura |
atrocity: Tracker connection is not a prob, and number of torrents do not affect that. Unless you run all at the same time, of course. Good consumer grade do 4k since 2005/8, today likely much more with or without custom firmware and with enough power. |
23:53
🔗
|
atrocity |
i had to kill yuku. it was in an endless loop for the past few hours and hit over 100k files on 2 sessions |
23:53
🔗
|
Yoshimura |
few? |
23:53
🔗
|
Yoshimura |
Mine did loop two days. |
23:54
🔗
|
atrocity |
yeah, i've been waiting to reboot, and just checking in over and over hoping it would get it |
23:54
🔗
|
atrocity |
didn't, so now i'm waiting on this 1GB video to upload |
23:55
🔗
|
Yoshimura |
it will take up space, until you remove manually or reinit disk btw. |
23:56
🔗
|
Yoshimura |
Python syntax is breaking my mind. *puke* |
23:57
🔗
|
atrocity |
eh, that's not a problem, but i have to reboot, lol! |
23:57
🔗
|
atrocity |
brb |
23:57
🔗
|
|
atrocity has quit IRC () |