Time |
Nickname |
Message |
00:05
🔗
|
|
qw3rty113 has joined #archiveteam-bs |
00:07
🔗
|
|
icedice has joined #archiveteam-bs |
00:09
🔗
|
|
icedice has quit IRC (Client Quit) |
00:11
🔗
|
|
qw3rty112 has quit IRC (Ping timeout: 600 seconds) |
01:37
🔗
|
|
pizzaiolo has quit IRC (Remote host closed the connection) |
01:43
🔗
|
|
Jens has quit IRC (Remote host closed the connection) |
01:44
🔗
|
|
Jens has joined #archiveteam-bs |
02:02
🔗
|
|
antomatic has quit IRC (Read error: Connection reset by peer) |
02:03
🔗
|
|
antomatic has joined #archiveteam-bs |
02:03
🔗
|
|
swebb sets mode: +o antomatic |
02:24
🔗
|
|
VerifiedJ has left |
02:28
🔗
|
|
zhongfu has quit IRC (Ping timeout: 260 seconds) |
02:31
🔗
|
|
zhongfu has joined #archiveteam-bs |
02:57
🔗
|
|
ld1 has quit IRC (Quit: ld1) |
03:01
🔗
|
|
ld1 has joined #archiveteam-bs |
03:39
🔗
|
|
Smiley has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
BnAboyZ has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
kisspunch has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Zebranky has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
MrRadar2 has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
BnARobin has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
jtn2 has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Tenebrae has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Fusl has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
hook54321 has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
ez has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Polylith has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Sk1d has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Boppen has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
nyany has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Kagee has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
altlabel has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Xibalba has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
klondike has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
antomatic has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
robogoat has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
SN4T14 has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Lord_Nigh has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Rai-chan has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Aoede has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
Gfy has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
svchfoo1 has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
tsr has quit IRC (ircd.choopa.net se.hub) |
03:39
🔗
|
|
nightpool has quit IRC (ircd.choopa.net se.hub) |
03:52
🔗
|
godane |
SketchCow: i'm doing an update of the mic.com grabs |
03:52
🔗
|
godane |
i need to grab the 110001 to 180000 range at least |
03:53
🔗
|
godane |
current number is 187844 based on the front page |
03:53
🔗
|
godane |
that article number is from 3 hours ago |
04:14
🔗
|
|
antomatic has joined #archiveteam-bs |
04:14
🔗
|
|
robogoat has joined #archiveteam-bs |
04:14
🔗
|
|
nyany has joined #archiveteam-bs |
04:14
🔗
|
|
klondike has joined #archiveteam-bs |
04:14
🔗
|
|
SN4T14 has joined #archiveteam-bs |
04:14
🔗
|
|
Kagee has joined #archiveteam-bs |
04:14
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
04:14
🔗
|
|
altlabel has joined #archiveteam-bs |
04:14
🔗
|
|
Sk1d has joined #archiveteam-bs |
04:14
🔗
|
|
Rai-chan has joined #archiveteam-bs |
04:14
🔗
|
|
Aoede has joined #archiveteam-bs |
04:14
🔗
|
|
Gfy has joined #archiveteam-bs |
04:14
🔗
|
|
Xibalba has joined #archiveteam-bs |
04:14
🔗
|
|
svchfoo1 has joined #archiveteam-bs |
04:14
🔗
|
|
Boppen has joined #archiveteam-bs |
04:14
🔗
|
|
tsr has joined #archiveteam-bs |
04:14
🔗
|
|
nightpool has joined #archiveteam-bs |
04:14
🔗
|
|
se.hub sets mode: +oo antomatic svchfoo1 |
04:14
🔗
|
|
swebb sets mode: +o antomatic |
04:15
🔗
|
|
Smiley has joined #archiveteam-bs |
04:15
🔗
|
|
BnAboyZ has joined #archiveteam-bs |
04:15
🔗
|
|
kisspunch has joined #archiveteam-bs |
04:15
🔗
|
|
Zebranky has joined #archiveteam-bs |
04:15
🔗
|
|
MrRadar2 has joined #archiveteam-bs |
04:15
🔗
|
|
BnARobin has joined #archiveteam-bs |
04:15
🔗
|
|
jtn2 has joined #archiveteam-bs |
04:15
🔗
|
|
Tenebrae has joined #archiveteam-bs |
04:15
🔗
|
|
Fusl has joined #archiveteam-bs |
04:15
🔗
|
|
hook54321 has joined #archiveteam-bs |
04:15
🔗
|
|
ez has joined #archiveteam-bs |
04:15
🔗
|
|
Polylith has joined #archiveteam-bs |
04:21
🔗
|
|
qw3rty114 has joined #archiveteam-bs |
04:21
🔗
|
|
ndiddy has quit IRC () |
04:25
🔗
|
|
qw3rty113 has quit IRC (Read error: Operation timed out) |
05:24
🔗
|
godane |
i'm at 10,450 items this month now |
06:04
🔗
|
|
Lord_Nigh has quit IRC (Ping timeout: 250 seconds) |
06:09
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
06:23
🔗
|
riking |
so I was doing some archiving stuff and realized I needed to archive some mspfa.com content. |
06:23
🔗
|
riking |
the funny thing is, all the images are hosted externally. |
06:27
🔗
|
|
mabynogy has joined #archiveteam-bs |
06:28
🔗
|
riking |
so you have to parse through the json file for all the urls |
06:28
🔗
|
riking |
and i thought to myself, "this should really be automated." |
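The "parse through the json file for all the urls" step can be automated roughly like this. It is a minimal sketch that assumes the image links appear as plain http(s) URLs somewhere inside string values of the JSON, possibly wrapped in markup such as `[img]...[/img]`; the actual mspfa.com JSON layout is not described here, so the function names and the URL regex are hypothetical.

```python
import json
import re
from typing import Any, Iterator

# Loose URL pattern (an assumption): stops at whitespace, quotes, angle
# brackets and square brackets, so "[img]https://...[/img]" yields just the URL.
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]]+")


def iter_urls(node: Any) -> Iterator[str]:
    """Recursively walk a parsed JSON tree and yield every URL found in strings."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)
    elif isinstance(node, str):
        yield from URL_RE.findall(node)


def urls_from_json(text: str) -> list[str]:
    """Parse JSON text and return the unique URLs in first-seen order."""
    seen: dict[str, None] = {}
    for url in iter_urls(json.loads(text)):
        seen.setdefault(url, None)
    return list(seen)
```

Writing the result one URL per line to a file lets it be fed to `wget -i urls.txt`, which is how the image list was fetched later in the log.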
06:29
🔗
|
riking |
so then I came here to ask about what problems I should expect |
06:48
🔗
|
|
Boppen has quit IRC (Ping timeout: 186 seconds) |
06:51
🔗
|
|
Sk1d has quit IRC (Ping timeout: 186 seconds) |
06:51
🔗
|
|
Boppen has joined #archiveteam-bs |
06:51
🔗
|
|
Sk1d has joined #archiveteam-bs |
07:03
🔗
|
|
Boppen has quit IRC (Ping timeout: 186 seconds) |
07:04
🔗
|
|
Boppen has joined #archiveteam-bs |
07:11
🔗
|
|
Boppen has quit IRC (Read error: Connection reset by peer) |
07:11
🔗
|
|
Boppen has joined #archiveteam-bs |
07:36
🔗
|
|
ld1 has quit IRC (Ping timeout: 260 seconds) |
07:50
🔗
|
|
ld1 has joined #archiveteam-bs |
08:14
🔗
|
|
schbirid has joined #archiveteam-bs |
09:28
🔗
|
|
mabynogy has quit IRC (Quit: dpt.slasheva.com) |
09:30
🔗
|
jrwr |
riking: make your scrape look like legit traffic and try to save it in the WARC format |
09:40
🔗
|
riking |
I did notice that running wget on a list of image URLs was ridiculously fast.. |
09:41
🔗
|
|
BlueMax has quit IRC (Leaving) |
10:13
🔗
|
|
schbirid has quit IRC (Ping timeout: 260 seconds) |
10:22
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
11:21
🔗
|
|
schbirid has joined #archiveteam-bs |
11:59
🔗
|
|
icedice has joined #archiveteam-bs |
12:36
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
12:41
🔗
|
|
ivan has quit IRC (Read error: Operation timed out) |
12:41
🔗
|
|
REiN^ has quit IRC (Read error: Operation timed out) |
12:41
🔗
|
|
chfoo has quit IRC (Read error: Operation timed out) |
12:42
🔗
|
|
twigfoot has quit IRC (Read error: Operation timed out) |
12:42
🔗
|
|
ivan has joined #archiveteam-bs |
12:42
🔗
|
|
Odd0002 has quit IRC (Read error: Operation timed out) |
12:42
🔗
|
|
twigfoot has joined #archiveteam-bs |
12:42
🔗
|
|
JAA has quit IRC (Read error: Operation timed out) |
12:42
🔗
|
|
RKenshin has joined #archiveteam-bs |
12:43
🔗
|
|
beardicus has quit IRC (Read error: Operation timed out) |
12:43
🔗
|
|
rsznik has joined #archiveteam-bs |
12:43
🔗
|
|
squires has quit IRC (Read error: Operation timed out) |
12:43
🔗
|
|
bsmith093 has quit IRC (Read error: Operation timed out) |
12:43
🔗
|
|
sep332_ has quit IRC (Read error: Operation timed out) |
12:43
🔗
|
|
unlobito has quit IRC (Read error: Operation timed out) |
12:43
🔗
|
|
w0rp has quit IRC (Read error: Operation timed out) |
12:43
🔗
|
|
Dimtree has quit IRC (Read error: Operation timed out) |
12:44
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
12:44
🔗
|
|
will has quit IRC (Read error: Operation timed out) |
12:44
🔗
|
|
rolfoid has quit IRC (Read error: Operation timed out) |
12:44
🔗
|
|
JAA has joined #archiveteam-bs |
12:44
🔗
|
|
swebb sets mode: +o JAA |
12:44
🔗
|
|
Kenshin has quit IRC (Read error: Operation timed out) |
12:44
🔗
|
|
RKenshin is now known as Kenshin |
12:44
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
12:44
🔗
|
|
C4K3 has quit IRC (Read error: Operation timed out) |
12:45
🔗
|
|
PotcFdk has quit IRC (Read error: Operation timed out) |
12:46
🔗
|
|
rsznick has quit IRC (Read error: Operation timed out) |
12:46
🔗
|
|
PoorHomie has quit IRC (Read error: Operation timed out) |
12:46
🔗
|
|
Jusque has quit IRC (Read error: Operation timed out) |
12:46
🔗
|
|
qw3rty114 has quit IRC (Read error: Operation timed out) |
12:47
🔗
|
|
robink has quit IRC (Read error: Operation timed out) |
12:47
🔗
|
|
will has joined #archiveteam-bs |
12:48
🔗
|
|
Odd0002 has joined #archiveteam-bs |
12:49
🔗
|
|
chfoo has joined #archiveteam-bs |
12:50
🔗
|
|
unlobito has joined #archiveteam-bs |
12:50
🔗
|
|
svchfoo1 sets mode: +o chfoo |
12:51
🔗
|
|
robink has joined #archiveteam-bs |
12:51
🔗
|
|
w0rp has joined #archiveteam-bs |
12:51
🔗
|
|
balrog_ has joined #archiveteam-bs |
12:51
🔗
|
|
swebb sets mode: +o balrog_ |
12:52
🔗
|
|
Jusque has joined #archiveteam-bs |
12:53
🔗
|
|
balrog has quit IRC (Read error: Operation timed out) |
12:53
🔗
|
|
balrog_ is now known as balrog |
12:57
🔗
|
|
bsmith093 has joined #archiveteam-bs |
13:17
🔗
|
|
qw3rty114 has joined #archiveteam-bs |
13:17
🔗
|
|
beardicus has joined #archiveteam-bs |
13:21
🔗
|
|
C4K3 has joined #archiveteam-bs |
13:22
🔗
|
|
REiN^ has joined #archiveteam-bs |
13:22
🔗
|
|
rolfoid has joined #archiveteam-bs |
13:22
🔗
|
|
PoorHomie has joined #archiveteam-bs |
13:23
🔗
|
|
mabynogy has joined #archiveteam-bs |
13:27
🔗
|
|
squires has joined #archiveteam-bs |
13:28
🔗
|
|
bwn has joined #archiveteam-bs |
13:34
🔗
|
|
Mayonaise has joined #archiveteam-bs |
13:38
🔗
|
|
Dimtree has joined #archiveteam-bs |
13:44
🔗
|
|
PotcFdk has joined #archiveteam-bs |
13:53
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
15:57
🔗
|
|
RichardG has quit IRC (Ping timeout: 252 seconds) |
16:08
🔗
|
|
RichardG has joined #archiveteam-bs |
16:14
🔗
|
|
octothorp has quit IRC (Ping timeout: 252 seconds) |
16:22
🔗
|
|
octothorp has joined #archiveteam-bs |
16:31
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 255 seconds) |
16:31
🔗
|
|
Mateon1 has joined #archiveteam-bs |
16:56
🔗
|
|
rsznick has joined #archiveteam-bs |
16:59
🔗
|
|
rsznik has quit IRC (Read error: Operation timed out) |
17:17
🔗
|
|
schbirid has quit IRC (Leaving) |
17:18
🔗
|
|
schbirid has joined #archiveteam-bs |
17:30
🔗
|
|
c4rc4s has quit IRC (Quit: words) |
17:39
🔗
|
|
icedice has joined #archiveteam-bs |
17:39
🔗
|
|
icedice has quit IRC (Client Quit) |
17:45
🔗
|
|
c4rc4s has joined #archiveteam-bs |
17:48
🔗
|
|
jschwart has joined #archiveteam-bs |
18:06
🔗
|
|
Jens has quit IRC (Remote host closed the connection) |
18:06
🔗
|
|
Jens has joined #archiveteam-bs |
18:48
🔗
|
|
ola_norsk has joined #archiveteam-bs |
18:48
🔗
|
|
sep332_ has joined #archiveteam-bs |
18:58
🔗
|
|
Stilett0 has joined #archiveteam-bs |
19:24
🔗
|
ola_norsk |
is it just me or does using torrent to upload rather huge items work better than using IA tool or web interface? |
19:25
🔗
|
|
DedSec has quit IRC (Ping timeout: 260 seconds) |
19:27
🔗
|
ola_norsk |
(having less chance of item "breaking", i mean) |
19:27
🔗
|
ola_norsk |
example https://archive.org/details/2017-Phone_Losers_of_America_PLA_Media_Pack/ |
19:28
🔗
|
Smiley |
depends how you define huge, but maybe |
19:28
🔗
|
ola_norsk |
~140GB |
19:28
🔗
|
|
DedSec has joined #archiveteam-bs |
19:31
🔗
|
ola_norsk |
i'm guessing the number of files is ~3000+ |
19:32
🔗
|
Smiley |
well I know web uploading over 50GB is advised against. |
19:35
🔗
|
ola_norsk |
aye. But with torrent, perhaps it gives IA the ability to start/stop and prioritize at will, without relying on a user/browser being kept "alive" on the other end?...i don't know. |
19:38
🔗
|
ola_norsk |
i just noticed that item is doing well, while some others (much smaller) i've uploaded using the web interface got broken (example https://archive.org/details/2813_d64_C64_roms_wwwC64com ) |
19:49
🔗
|
ola_norsk |
Not to mention, i guess there's also the benefit that if someone had already uploaded that same torrent in the past (or future?), the same data wouldn't need to be transferred a second time(?) |
19:52
🔗
|
ola_norsk |
E.g. since the torrent hash is the same, IA's Transmission would only process the torrent that already exists |
19:53
🔗
|
|
ola_norsk has quit IRC (Torrents..The future is naowww!) |
19:55
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
20:00
🔗
|
|
Ravenloft has joined #archiveteam-bs |
20:01
🔗
|
|
RichardG has joined #archiveteam-bs |
20:08
🔗
|
riking |
jrwr: for things I'm planning on doing incremental archives of, how should I tell wget to not save files we already have? |
20:08
🔗
|
riking |
If I use the same WARC filename, it'll delete the old WARC. |
20:09
🔗
|
riking |
If I run a --mirror twice in the same directory, it creates .1, .2, .3 files |
20:10
🔗
|
riking |
ah, sorry for the ping, that question's for anyone. |
20:11
🔗
|
riking |
actually wait, was I really running it with --mirror? |
20:20
🔗
|
riking |
but anyways, how should I handle incremental WARCs? take a new one and merge them afterwards? |
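On the "merge them afterwards" option: a WARC file is a sequence of records, and wget gzips each record as its own gzip member, so two `.warc.gz` files can usually be merged by plain byte concatenation (the same as `cat a.warc.gz b.warc.gz > merged.warc.gz`). The sketch below assumes all inputs use the same compression; the merged file simply contains one warcinfo record per original run. The helper name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable


def concat_warcs(inputs: Iterable[Path], output: Path) -> int:
    """Byte-concatenate WARC files into `output` and return the bytes written.

    Assumes every input is compressed the same way (all per-record .warc.gz
    or all uncompressed .warc); mixing the two would produce a broken file.
    """
    total = 0
    with output.open("wb") as out:
        for path in inputs:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)  # stream, don't load whole WARCs into memory
            total += path.stat().st_size
    return total
```

Keeping the per-run WARCs separate and never merging is equally valid; concatenation only matters if one file per site is wanted.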
20:24
🔗
|
riking |
Okay I wasn't running with --mirror that was my problem. |
20:25
🔗
|
riking |
still curious about the WARC thing. just create a new one every time? |
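One way to do "a new one every time" without storing unchanged files twice is GNU Wget's own WARC options: `--warc-file=PREFIX` names the output (wget appends `.warc.gz`), `--warc-cdx` writes a CDX index alongside it, and `--warc-dedup=FILE` tells wget not to store records already listed in that CDX file. The sketch below builds such a command with a timestamped prefix per run. It assumes the index ends up as `PREFIX.cdx` next to the WARC; the helper names are hypothetical and it has not been tested against a live site.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def build_wget_args(url: str, prefix: str, previous_cdx: Optional[Path]) -> list[str]:
    """Build a wget command line for one incremental WARC run."""
    args = [
        "wget",
        "--mirror",
        f"--warc-file={prefix}",  # wget adds the .warc.gz extension itself
        "--warc-cdx",             # write an index the next run can dedup against
    ]
    # Only dedup if an index from an earlier run actually exists.
    if previous_cdx is not None and previous_cdx.exists():
        args.append(f"--warc-dedup={previous_cdx}")
    args.append(url)
    return args


def run_incremental(url: str, name: str, previous_cdx: Optional[Path] = None) -> str:
    """Run wget once with a UTC-timestamped WARC prefix and return that prefix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    prefix = f"{name}-{stamp}"
    subprocess.run(build_wget_args(url, prefix, previous_cdx), check=True)
    return prefix  # next run: pass Path(f"{prefix}.cdx") as previous_cdx (assumed name)
```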
20:25
🔗
|
|
jschwart has quit IRC (Quit: Konversation terminated!) |
20:29
🔗
|
JAA |
riking: wpull has --warc-append which solves that issue, but I think that doesn't exist in wget. |
20:29
🔗
|
JAA |
If you want to use wpull, make sure to use version 1.2.3. The 2.0.x versions are broken. |
20:32
🔗
|
riking |
oh hey, special handling for youtube links. that was also on my list |
20:40
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
20:46
🔗
|
Jens |
Is anyone else's wpull using 100% CPU time? |
20:51
🔗
|
|
BlueMax has joined #archiveteam-bs |
20:59
🔗
|
riking |
uh oh. AttributeError: 'module' object has no attribute 'A' |
21:04
🔗
|
riking |
ERROR [Errno 36] File name too long: '2302/files/px.srvcs.tumblr.com/impixu?T=1518123827&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDpcL1wvdmFzdGVycm9yLnR1bWJsci5jb21cLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiXC8iLCJub3NjcmlwdCI6MX0=&U=GBL1eab3833' |
21:06
🔗
|
Jens |
wpull also eats 100% in my test vm. |
21:06
🔗
|
Jens |
Running newsgrabber. |
21:07
🔗
|
|
icedice has joined #archiveteam-bs |
21:07
🔗
|
riking |
Ooooo i'm running this in an ecryptfs |
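That would explain the `[Errno 36] File name too long`: eCryptfs with filename encryption allows noticeably shorter names than the usual 255-byte limit (around 143 bytes is the figure usually quoted), and URL-derived names like the tumblr `impixu?...` one above exceed it. A common workaround when saving files yourself is to truncate long names and append a hash so distinct URLs stay distinct. This is a sketch only; the 140-byte default and the helper name are assumptions, not a wget or wpull option.

```python
import hashlib


def safe_component(name: str, max_bytes: int = 140) -> str:
    """Return `name` unchanged if it fits in max_bytes of UTF-8, otherwise a
    truncated prefix plus "_" and the first 16 hex digits of its SHA-1."""
    raw = name.encode("utf-8")
    if len(raw) <= max_bytes:
        return name
    digest = hashlib.sha1(raw).hexdigest()[:16]
    keep = max_bytes - len(digest) - 1  # leave room for "_" + digest
    # errors="ignore" drops a multibyte character cut in half by the slice.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}_{digest}"
```

Because the digest is computed over the full original name, two long URLs that share the same first ~120 bytes still map to different file names.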
21:18
🔗
|
|
mabynogy has quit IRC (Quit: dpt.slasheva.com) |
21:24
🔗
|
JAA |
Jens: Yes, wpull is frequently using 100% CPU here as well. |
21:24
🔗
|
Jens |
Bit tedious on my 1 CPU VM :/ |
21:24
🔗
|
JAA |
Switching the HTML parser tends to help, but not always. |
21:24
🔗
|
JAA |
(Don't do that though. Never run modified project code.) |
21:25
🔗
|
Jens |
Newsgrabber uses some precompiled wpull executable, so it's impossible to tinker with. |
21:25
🔗
|
JAA |
The HTML parser is controlled through an option. |
21:26
🔗
|
JAA |
But I don't know if the warrior has lxml installed, so... |
21:26
🔗
|
Jens |
Haven't used the warrior in ages. |
21:26
🔗
|
JAA |
Also, while the lxml parser is faster than the default html5lib, it's not as robust and might misparse in some edge cases. |
21:28
🔗
|
|
BlueMax has quit IRC (Leaving) |
21:30
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
21:31
🔗
|
|
RichardG has joined #archiveteam-bs |
21:48
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
22:03
🔗
|
|
Stilett0 has quit IRC () |
22:10
🔗
|
riking |
Hah, youtube-dl runs so much faster limited to 3MB/sec |
22:11
🔗
|
riking |
actually.. question; what processing does archive.org do on video files? |
22:11
🔗
|
riking |
should I even bother trying to download multiple video qualities? |
22:21
🔗
|
astrid |
nah just download the highest quality |
22:22
🔗
|
astrid |
IA will downscale it to various bitrates as necessary |
22:23
🔗
|
Igloo |
I really fancy a Five Guys right now |
22:23
🔗
|
Igloo |
Just, a big tasty burger, with cheese |
22:23
🔗
|
Igloo |
and *ALL* the toppings |
22:23
🔗
|
astrid |
-> #-ot |
22:23
🔗
|
Igloo |
Wrong channel :x |
23:05
🔗
|
|
ranavalon has quit IRC (Quit: Leaving) |
23:16
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
23:31
🔗
|
|
VerifiedJ has left |