#archiveteam-bs 2018-02-08,Thu


Time Nickname Message
00:05 🔗 qw3rty113 has joined #archiveteam-bs
00:07 🔗 icedice has joined #archiveteam-bs
00:09 🔗 icedice has quit IRC (Client Quit)
00:11 🔗 qw3rty112 has quit IRC (Ping timeout: 600 seconds)
01:37 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
01:43 🔗 Jens has quit IRC (Remote host closed the connection)
01:44 🔗 Jens has joined #archiveteam-bs
02:02 🔗 antomatic has quit IRC (Read error: Connection reset by peer)
02:03 🔗 antomatic has joined #archiveteam-bs
02:03 🔗 swebb sets mode: +o antomatic
02:24 🔗 VerifiedJ has left
02:28 🔗 zhongfu has quit IRC (Ping timeout: 260 seconds)
02:31 🔗 zhongfu has joined #archiveteam-bs
02:57 🔗 ld1 has quit IRC (Quit: ld1)
03:01 🔗 ld1 has joined #archiveteam-bs
03:39 🔗 Smiley has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 BnAboyZ has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 kisspunch has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Zebranky has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 MrRadar2 has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 BnARobin has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 jtn2 has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Tenebrae has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Fusl has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 hook54321 has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 ez has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Polylith has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Sk1d has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Boppen has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 nyany has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Kagee has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 altlabel has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Xibalba has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 klondike has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 antomatic has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 robogoat has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 SN4T14 has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Lord_Nigh has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Rai-chan has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Aoede has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 Gfy has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 svchfoo1 has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 tsr has quit IRC (ircd.choopa.net se.hub)
03:39 🔗 nightpool has quit IRC (ircd.choopa.net se.hub)
03:52 🔗 godane SketchCow: i'm doing a update of mic.com grabs
03:52 🔗 godane i need to grab the 110001 to 180000 at least
03:53 🔗 godane current number is 187844 based on the front page
03:53 🔗 godane that article number is from 3 hours ago
04:14 🔗 antomatic has joined #archiveteam-bs
04:14 🔗 robogoat has joined #archiveteam-bs
04:14 🔗 nyany has joined #archiveteam-bs
04:14 🔗 klondike has joined #archiveteam-bs
04:14 🔗 SN4T14 has joined #archiveteam-bs
04:14 🔗 Kagee has joined #archiveteam-bs
04:14 🔗 Lord_Nigh has joined #archiveteam-bs
04:14 🔗 altlabel has joined #archiveteam-bs
04:14 🔗 Sk1d has joined #archiveteam-bs
04:14 🔗 Rai-chan has joined #archiveteam-bs
04:14 🔗 Aoede has joined #archiveteam-bs
04:14 🔗 Gfy has joined #archiveteam-bs
04:14 🔗 Xibalba has joined #archiveteam-bs
04:14 🔗 svchfoo1 has joined #archiveteam-bs
04:14 🔗 Boppen has joined #archiveteam-bs
04:14 🔗 tsr has joined #archiveteam-bs
04:14 🔗 nightpool has joined #archiveteam-bs
04:14 🔗 se.hub sets mode: +oo antomatic svchfoo1
04:14 🔗 swebb sets mode: +o antomatic
04:15 🔗 Smiley has joined #archiveteam-bs
04:15 🔗 BnAboyZ has joined #archiveteam-bs
04:15 🔗 kisspunch has joined #archiveteam-bs
04:15 🔗 Zebranky has joined #archiveteam-bs
04:15 🔗 MrRadar2 has joined #archiveteam-bs
04:15 🔗 BnARobin has joined #archiveteam-bs
04:15 🔗 jtn2 has joined #archiveteam-bs
04:15 🔗 Tenebrae has joined #archiveteam-bs
04:15 🔗 Fusl has joined #archiveteam-bs
04:15 🔗 hook54321 has joined #archiveteam-bs
04:15 🔗 ez has joined #archiveteam-bs
04:15 🔗 Polylith has joined #archiveteam-bs
04:21 🔗 qw3rty114 has joined #archiveteam-bs
04:21 🔗 ndiddy has quit IRC ()
04:25 🔗 qw3rty113 has quit IRC (Read error: Operation timed out)
05:24 🔗 godane i'm at 10,450 items this month now
06:04 🔗 Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
06:09 🔗 Lord_Nigh has joined #archiveteam-bs
06:23 🔗 riking so I was doing some archiving stuff and realized I needed to archive some mspfa.com content.
06:23 🔗 riking the funny thing is, all the images are hosted externally.
06:27 🔗 mabynogy has joined #archiveteam-bs
06:28 🔗 riking so you have to parse through the json file for all the urls
06:28 🔗 riking and i thought to myself, "this should really be automated."
06:29 🔗 riking so then I came here to ask about what problems I should expect
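Editor's sketch of the step riking describes (walking a JSON file to pull out every externally hosted URL). The JSON layout is not given in the log, so the sample structure and the key names `p` and `b` are hypothetical; the code only assumes that URLs appear somewhere inside string values. The helper names `collect_urls` and `unique_in_order` are also made up for this illustration.

```python
import json
import re

# Matches http(s) URLs and stops at whitespace, quotes, angle brackets and
# square brackets, so BBCode-style wrappers such as [img]...[/img] are excluded.
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]]+")


def collect_urls(node, found=None):
    """Recursively walk parsed JSON and collect URLs found in string values."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            collect_urls(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_urls(value, found)
    elif isinstance(node, str):
        found.extend(URL_RE.findall(node))
    return found


def unique_in_order(urls):
    """Drop duplicate URLs while keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    # Hypothetical sample; a real export may be shaped differently.
    sample = json.loads(
        '{"p": [{"b": "[img]https://example.com/a.png[/img] and '
        'https://example.org/b.gif"}, {"b": "https://example.com/a.png"}]}'
    )
    for url in unique_in_order(collect_urls(sample)):
        print(url)
```

The printed list can be written to a text file and handed to a downloader as a URL list, which is the workflow mentioned later in the log.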
06:48 🔗 Boppen has quit IRC (Ping timeout: 186 seconds)
06:51 🔗 Sk1d has quit IRC (Ping timeout: 186 seconds)
06:51 🔗 Boppen has joined #archiveteam-bs
06:51 🔗 Sk1d has joined #archiveteam-bs
07:03 🔗 Boppen has quit IRC (Ping timeout: 186 seconds)
07:04 🔗 Boppen has joined #archiveteam-bs
07:11 🔗 Boppen has quit IRC (Read error: Connection reset by peer)
07:11 🔗 Boppen has joined #archiveteam-bs
07:36 🔗 ld1 has quit IRC (Ping timeout: 260 seconds)
07:50 🔗 ld1 has joined #archiveteam-bs
08:14 🔗 schbirid has joined #archiveteam-bs
09:28 🔗 mabynogy has quit IRC (Quit: dpt.slasheva.com)
09:30 🔗 jrwr riking: make your scrape look like legit traffic and try to save in the WARC format
09:40 🔗 riking I did notice that running wget on a list of image URLs was ridiculously fast..
09:41 🔗 BlueMax has quit IRC (Leaving)
10:13 🔗 schbirid has quit IRC (Ping timeout: 260 seconds)
10:22 🔗 pizzaiolo has joined #archiveteam-bs
11:21 🔗 schbirid has joined #archiveteam-bs
11:59 🔗 icedice has joined #archiveteam-bs
12:36 🔗 icedice has quit IRC (Quit: Leaving)
12:41 🔗 ivan has quit IRC (Read error: Operation timed out)
12:41 🔗 REiN^ has quit IRC (Read error: Operation timed out)
12:41 🔗 chfoo has quit IRC (Read error: Operation timed out)
12:42 🔗 twigfoot has quit IRC (Read error: Operation timed out)
12:42 🔗 ivan has joined #archiveteam-bs
12:42 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
12:42 🔗 twigfoot has joined #archiveteam-bs
12:42 🔗 JAA has quit IRC (Read error: Operation timed out)
12:42 🔗 RKenshin has joined #archiveteam-bs
12:43 🔗 beardicus has quit IRC (Read error: Operation timed out)
12:43 🔗 rsznik has joined #archiveteam-bs
12:43 🔗 squires has quit IRC (Read error: Operation timed out)
12:43 🔗 bsmith093 has quit IRC (Read error: Operation timed out)
12:43 🔗 sep332_ has quit IRC (Read error: Operation timed out)
12:43 🔗 unlobito has quit IRC (Read error: Operation timed out)
12:43 🔗 w0rp has quit IRC (Read error: Operation timed out)
12:43 🔗 Dimtree has quit IRC (Read error: Operation timed out)
12:44 🔗 bwn has quit IRC (Read error: Operation timed out)
12:44 🔗 will has quit IRC (Read error: Operation timed out)
12:44 🔗 rolfoid has quit IRC (Read error: Operation timed out)
12:44 🔗 JAA has joined #archiveteam-bs
12:44 🔗 swebb sets mode: +o JAA
12:44 🔗 Kenshin has quit IRC (Read error: Operation timed out)
12:44 🔗 RKenshin is now known as Kenshin
12:44 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
12:44 🔗 C4K3 has quit IRC (Read error: Operation timed out)
12:45 🔗 PotcFdk has quit IRC (Read error: Operation timed out)
12:46 🔗 rsznick has quit IRC (Read error: Operation timed out)
12:46 🔗 PoorHomie has quit IRC (Read error: Operation timed out)
12:46 🔗 Jusque has quit IRC (Read error: Operation timed out)
12:46 🔗 qw3rty114 has quit IRC (Read error: Operation timed out)
12:47 🔗 robink has quit IRC (Read error: Operation timed out)
12:47 🔗 will has joined #archiveteam-bs
12:48 🔗 Odd0002 has joined #archiveteam-bs
12:49 🔗 chfoo has joined #archiveteam-bs
12:50 🔗 unlobito has joined #archiveteam-bs
12:50 🔗 svchfoo1 sets mode: +o chfoo
12:51 🔗 robink has joined #archiveteam-bs
12:51 🔗 w0rp has joined #archiveteam-bs
12:51 🔗 balrog_ has joined #archiveteam-bs
12:51 🔗 swebb sets mode: +o balrog_
12:52 🔗 Jusque has joined #archiveteam-bs
12:53 🔗 balrog has quit IRC (Read error: Operation timed out)
12:53 🔗 balrog_ is now known as balrog
12:57 🔗 bsmith093 has joined #archiveteam-bs
13:17 🔗 qw3rty114 has joined #archiveteam-bs
13:17 🔗 beardicus has joined #archiveteam-bs
13:21 🔗 C4K3 has joined #archiveteam-bs
13:22 🔗 REiN^ has joined #archiveteam-bs
13:22 🔗 rolfoid has joined #archiveteam-bs
13:22 🔗 PoorHomie has joined #archiveteam-bs
13:23 🔗 mabynogy has joined #archiveteam-bs
13:27 🔗 squires has joined #archiveteam-bs
13:28 🔗 bwn has joined #archiveteam-bs
13:34 🔗 Mayonaise has joined #archiveteam-bs
13:38 🔗 Dimtree has joined #archiveteam-bs
13:44 🔗 PotcFdk has joined #archiveteam-bs
13:53 🔗 VerifiedJ has joined #archiveteam-bs
15:57 🔗 RichardG has quit IRC (Ping timeout: 252 seconds)
16:08 🔗 RichardG has joined #archiveteam-bs
16:14 🔗 octothorp has quit IRC (Ping timeout: 252 seconds)
16:22 🔗 octothorp has joined #archiveteam-bs
16:31 🔗 Mateon1 has quit IRC (Ping timeout: 255 seconds)
16:31 🔗 Mateon1 has joined #archiveteam-bs
16:56 🔗 rsznick has joined #archiveteam-bs
16:59 🔗 rsznik has quit IRC (Read error: Operation timed out)
17:17 🔗 schbirid has quit IRC (Leaving)
17:18 🔗 schbirid has joined #archiveteam-bs
17:30 🔗 c4rc4s has quit IRC (Quit: words)
17:39 🔗 icedice has joined #archiveteam-bs
17:39 🔗 icedice has quit IRC (Client Quit)
17:45 🔗 c4rc4s has joined #archiveteam-bs
17:48 🔗 jschwart has joined #archiveteam-bs
18:06 🔗 Jens has quit IRC (Remote host closed the connection)
18:06 🔗 Jens has joined #archiveteam-bs
18:48 🔗 ola_norsk has joined #archiveteam-bs
18:48 🔗 sep332_ has joined #archiveteam-bs
18:58 🔗 Stilett0 has joined #archiveteam-bs
19:24 🔗 ola_norsk is it just me or does using torrent to upload rather huge items work better than using IA tool or web interface?
19:25 🔗 DedSec has quit IRC (Ping timeout: 260 seconds)
19:27 🔗 ola_norsk (having less chance of item "breaking", i mean)
19:27 🔗 ola_norsk example https://archive.org/details/2017-Phone_Losers_of_America_PLA_Media_Pack/
19:28 🔗 Smiley depends how you define huge, but maybe
19:28 🔗 ola_norsk ~140Gb
19:28 🔗 DedSec has joined #archiveteam-bs
19:31 🔗 ola_norsk i'm guessing the number of files is ~3000+
19:32 🔗 Smiley well I know web uploading over 50Gb is advised against.
19:35 🔗 ola_norsk aye. But with torrent, perhaps it gives IA ability to start/stop and prioritize at will, without it relying on a user/browser being kept "alive" on the other end?...i don't know.
19:38 🔗 ola_norsk i just notice that item is doing well, while some others (much smaller) i've uploaded using web interface got broken (example https://archive.org/details/2813_d64_C64_roms_wwwC64com )
19:49 🔗 ola_norsk Not to mention, i guess there's also the benefit that if someone had already uploaded that same torrent in the past (or future?), the same data wouldn't need to be transferred a second time(?)
19:52 🔗 ola_norsk E.g. since the torrent hash is the same, IA's Transmission would only process the torrent that already exists
19:53 🔗 ola_norsk has quit IRC (Torrents..The future is naowww!)
19:55 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
20:00 🔗 Ravenloft has joined #archiveteam-bs
20:01 🔗 RichardG has joined #archiveteam-bs
20:08 🔗 riking jrwr: for things I'm planning on doing incremental archives of, how should I tell wget to not save files we already have?
20:08 🔗 riking If I use the same WARC filename, it'll delete the old WARC.
20:09 🔗 riking If I run a --mirror twice in the same directory, it creates .1 .2 .3 files
20:10 🔗 riking ah sorry for ping that question's for anyone.
20:11 🔗 riking actually wait, was I really running it with --mirror
20:20 🔗 riking but anyways, how should I handle incremental WARCs? take a new one and merge them afterwards?
20:24 🔗 riking Okay I wasn't running with --mirror that was my problem.
20:25 🔗 riking still curious about the WARC thing. just create a new one every time?
20:25 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
20:29 🔗 JAA riking: wpull has --warc-append which solves that issue, but I think that doesn't exist in wget.
20:29 🔗 JAA If you want to use wpull, make sure to use version 1.2.3. The 2.0.x versions are broken.
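Editor's sketch of the "create a new one every time" approach riking asks about: give each run its own timestamped WARC name so an earlier file is never overwritten. This is one possible workaround, not something the channel prescribed. The helper names, the `mspfa` prefix and `urls.txt` are hypothetical; `--input-file` and `--warc-file` are standard wget options.

```python
from datetime import datetime, timezone


def warc_basename(prefix, now=None):
    """Return a WARC base name that is unique per run (UTC timestamp suffix)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}-{now.strftime('%Y%m%dT%H%M%SZ')}"


def wget_command(url_list_path, prefix, now=None):
    """Build a wget argument list that writes to a fresh WARC on every run."""
    return [
        "wget",
        f"--input-file={url_list_path}",
        f"--warc-file={warc_basename(prefix, now)}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(wget_command("urls.txt", "mspfa")))
```

With wpull, the `--warc-append` option JAA mentions would be the alternative to separate per-run files.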
20:32 🔗 riking oh hey, special handling for youtube links. that was also on my list
20:40 🔗 schbirid has quit IRC (Quit: Leaving)
20:46 🔗 Jens Is anyone else's wpull using 100% CPU time?
20:51 🔗 BlueMax has joined #archiveteam-bs
20:59 🔗 riking uh oh. AttributeError: 'module' object has no attribute 'A'
21:04 🔗 riking ERROR [Errno 36] File name too long: '2302/files/px.srvcs.tumblr.com/impixu?T=1518123827&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDpcL1wvdmFzdGVycm9yLnR1bWJsci5jb21cLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiXC8iLCJub3NjcmlwdCI6MX0=&U=GBL1eab3833'
21:06 🔗 Jens wpull also eats 100% in my test vm.
21:06 🔗 Jens Running newsgrabber.
21:07 🔗 icedice has joined #archiveteam-bs
21:07 🔗 riking Ooooo i'm running this in an ecryptfs
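Editor's sketch of one way to sidestep "File name too long" when URL-derived names land on a filesystem with a short per-component limit: truncate the component and append a short hash so distinct URLs still map to distinct names. The `max_bytes` default of 143 is an assumption about eCryptfs with filename encryption, not something stated in the log, and `shorten_component` is a hypothetical helper rather than a wget or wpull feature.

```python
import hashlib


def shorten_component(name, max_bytes=143):
    """Shorten one path component to at most max_bytes bytes of UTF-8.

    Names that already fit are returned unchanged. Longer names keep a
    truncated prefix plus a 16-character SHA-1 digest of the full name.
    """
    raw = name.encode("utf-8")
    if len(raw) <= max_bytes:
        return name
    digest = hashlib.sha1(raw).hexdigest()[:16]
    keep = max_bytes - len(digest) - 1  # leave room for "-" and the digest
    # errors="ignore" drops a multi-byte character cut in half by truncation.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}-{digest}"


if __name__ == "__main__":
    long_name = "impixu?T=1&J=" + "a" * 300
    short = shorten_component(long_name)
    print(len(short.encode("utf-8")), short)
```

Renaming only affects the loose files on disk; a WARC written alongside them still records the original URLs.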
21:18 🔗 mabynogy has quit IRC (Quit: dpt.slasheva.com)
21:24 🔗 JAA Jens: Yes, wpull is frequently using 100% CPU here as well.
21:24 🔗 Jens Bit tedious on my 1 CPU VM :/
21:24 🔗 JAA Switching the HTML parser tends to help, but not always.
21:24 🔗 JAA (Don't do that though. Never run modified project code.)
21:25 🔗 Jens Newsgrabber uses some precompiled wpull executable, so it's impossible to tinker with.
21:25 🔗 JAA The HTML parser is controlled through an option.
21:26 🔗 JAA But I don't know if the warrior has lxml installed, so...
21:26 🔗 Jens Haven't used the warrior in ages.
21:26 🔗 JAA Also, while the lxml parser is faster than the default html5lib, it's not as resistant and might misparse in some edge cases.
21:28 🔗 BlueMax has quit IRC (Leaving)
21:30 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
21:31 🔗 RichardG has joined #archiveteam-bs
21:48 🔗 icedice has quit IRC (Quit: Leaving)
22:03 🔗 Stilett0 has quit IRC ()
22:10 🔗 riking Hah, youtube-dl runs so much faster limited to 3MB/sec
22:11 🔗 riking actually.. question; what processing does archive.org do on video files?
22:11 🔗 riking should I even bother trying to download multiple video qualities?
22:21 🔗 astrid nah just download the highest quality
22:22 🔗 astrid IA will downscale it to various bitrates as necessary
22:23 🔗 Igloo I really fancy a five guys right now
22:23 🔗 Igloo Just, a big tasty burger, with cheese
22:23 🔗 Igloo and *ALL* the toppings
22:23 🔗 astrid -> #-ot
22:23 🔗 Igloo Wrong channel :x
23:05 🔗 ranavalon has quit IRC (Quit: Leaving)
23:16 🔗 ZexaronS has quit IRC (Quit: Leaving)
23:31 🔗 VerifiedJ has left
