[00:05] *** qw3rty113 has joined #archiveteam-bs [00:07] *** icedice has joined #archiveteam-bs [00:09] *** icedice has quit IRC (Client Quit) [00:11] *** qw3rty112 has quit IRC (Ping timeout: 600 seconds) [01:37] *** pizzaiolo has quit IRC (Remote host closed the connection) [01:43] *** Jens has quit IRC (Remote host closed the connection) [01:44] *** Jens has joined #archiveteam-bs [02:02] *** antomatic has quit IRC (Read error: Connection reset by peer) [02:03] *** antomatic has joined #archiveteam-bs [02:03] *** swebb sets mode: +o antomatic [02:24] *** VerifiedJ has left [02:28] *** zhongfu has quit IRC (Ping timeout: 260 seconds) [02:31] *** zhongfu has joined #archiveteam-bs [02:57] *** ld1 has quit IRC (Quit: ld1) [03:01] *** ld1 has joined #archiveteam-bs [03:39] *** Smiley has quit IRC (ircd.choopa.net se.hub) [03:39] *** BnAboyZ has quit IRC (ircd.choopa.net se.hub) [03:39] *** kisspunch has quit IRC (ircd.choopa.net se.hub) [03:39] *** Zebranky has quit IRC (ircd.choopa.net se.hub) [03:39] *** MrRadar2 has quit IRC (ircd.choopa.net se.hub) [03:39] *** BnARobin has quit IRC (ircd.choopa.net se.hub) [03:39] *** jtn2 has quit IRC (ircd.choopa.net se.hub) [03:39] *** Tenebrae has quit IRC (ircd.choopa.net se.hub) [03:39] *** Fusl has quit IRC (ircd.choopa.net se.hub) [03:39] *** hook54321 has quit IRC (ircd.choopa.net se.hub) [03:39] *** ez has quit IRC (ircd.choopa.net se.hub) [03:39] *** Polylith has quit IRC (ircd.choopa.net se.hub) [03:39] *** Sk1d has quit IRC (ircd.choopa.net se.hub) [03:39] *** Boppen has quit IRC (ircd.choopa.net se.hub) [03:39] *** nyany has quit IRC (ircd.choopa.net se.hub) [03:39] *** Kagee has quit IRC (ircd.choopa.net se.hub) [03:39] *** altlabel has quit IRC (ircd.choopa.net se.hub) [03:39] *** Xibalba has quit IRC (ircd.choopa.net se.hub) [03:39] *** klondike has quit IRC (ircd.choopa.net se.hub) [03:39] *** antomatic has quit IRC (ircd.choopa.net se.hub) [03:39] *** robogoat has quit IRC (ircd.choopa.net se.hub) [03:39] *** 
SN4T14 has quit IRC (ircd.choopa.net se.hub) [03:39] *** Lord_Nigh has quit IRC (ircd.choopa.net se.hub) [03:39] *** Rai-chan has quit IRC (ircd.choopa.net se.hub) [03:39] *** Aoede has quit IRC (ircd.choopa.net se.hub) [03:39] *** Gfy has quit IRC (ircd.choopa.net se.hub) [03:39] *** svchfoo1 has quit IRC (ircd.choopa.net se.hub) [03:39] *** tsr has quit IRC (ircd.choopa.net se.hub) [03:39] *** nightpool has quit IRC (ircd.choopa.net se.hub) [03:52] SketchCow: i'm doing an update of mic.com grabs [03:52] i need to grab 110001 to 180000 at least [03:53] current number is 187844 based on the front page [03:53] that article number is from 3 hours ago [04:14] *** antomatic has joined #archiveteam-bs [04:14] *** robogoat has joined #archiveteam-bs [04:14] *** nyany has joined #archiveteam-bs [04:14] *** klondike has joined #archiveteam-bs [04:14] *** SN4T14 has joined #archiveteam-bs [04:14] *** Kagee has joined #archiveteam-bs [04:14] *** Lord_Nigh has joined #archiveteam-bs [04:14] *** altlabel has joined #archiveteam-bs [04:14] *** Sk1d has joined #archiveteam-bs [04:14] *** Rai-chan has joined #archiveteam-bs [04:14] *** Aoede has joined #archiveteam-bs [04:14] *** Gfy has joined #archiveteam-bs [04:14] *** Xibalba has joined #archiveteam-bs [04:14] *** svchfoo1 has joined #archiveteam-bs [04:14] *** Boppen has joined #archiveteam-bs [04:14] *** tsr has joined #archiveteam-bs [04:14] *** nightpool has joined #archiveteam-bs [04:14] *** se.hub sets mode: +oo antomatic svchfoo1 [04:14] *** swebb sets mode: +o antomatic [04:15] *** Smiley has joined #archiveteam-bs [04:15] *** BnAboyZ has joined #archiveteam-bs [04:15] *** kisspunch has joined #archiveteam-bs [04:15] *** Zebranky has joined #archiveteam-bs [04:15] *** MrRadar2 has joined #archiveteam-bs [04:15] *** BnARobin has joined #archiveteam-bs [04:15] *** jtn2 has joined #archiveteam-bs [04:15] *** Tenebrae has joined #archiveteam-bs [04:15] *** Fusl has joined #archiveteam-bs [04:15] *** hook54321 has 
joined #archiveteam-bs [04:15] *** ez has joined #archiveteam-bs [04:15] *** Polylith has joined #archiveteam-bs [04:21] *** qw3rty114 has joined #archiveteam-bs [04:21] *** ndiddy has quit IRC () [04:25] *** qw3rty113 has quit IRC (Read error: Operation timed out) [05:24] i'm at 10,450 items this month now [06:04] *** Lord_Nigh has quit IRC (Ping timeout: 250 seconds) [06:09] *** Lord_Nigh has joined #archiveteam-bs [06:23] so I was doing some archiving stuff and realized I needed to archive some mspfa.com content. [06:23] the funny thing is, all the images are hosted externally. [06:27] *** mabynogy has joined #archiveteam-bs [06:28] so you have to parse through the json file for all the urls [06:28] and i thought to myself, "this should really be automated." [06:29] so then I came here to ask about what problems I should expect [06:48] *** Boppen has quit IRC (Ping timeout: 186 seconds) [06:51] *** Sk1d has quit IRC (Ping timeout: 186 seconds) [06:51] *** Boppen has joined #archiveteam-bs [06:51] *** Sk1d has joined #archiveteam-bs [07:03] *** Boppen has quit IRC (Ping timeout: 186 seconds) [07:04] *** Boppen has joined #archiveteam-bs [07:11] *** Boppen has quit IRC (Read error: Connection reset by peer) [07:11] *** Boppen has joined #archiveteam-bs [07:36] *** ld1 has quit IRC (Ping timeout: 260 seconds) [07:50] *** ld1 has joined #archiveteam-bs [08:14] *** schbirid has joined #archiveteam-bs [09:28] *** mabynogy has quit IRC (Quit: dpt.slasheva.com) [09:30] riking: make your scrape look like legit traffic and try to save in the WARC format [09:40] I did notice that running wget on a list of image URLs was ridiculously fast.. 
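A rough sketch of the "parse through the json file for all the urls" step from [06:28], in case it helps anyone automating this. The layout of the mspfa.com JSON is not described in the log, so this makes no assumption about field names: it walks the whole decoded structure and pulls anything that looks like an http(s) URL out of every string. The names `URL_RE`, `iter_urls`, `extract_urls`, and the sample data are all made up for illustration.

```python
import json
import re

# Hypothetical pattern: stops at whitespace, quotes, and bracket characters so
# that URLs embedded in markup such as [img]...[/img] are cut off cleanly.
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]()]+")


def iter_urls(node):
    """Yield every http(s) URL found in any string inside a decoded JSON value."""
    if isinstance(node, str):
        yield from URL_RE.findall(node)
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)
    # Numbers, booleans, and null carry no URLs and are skipped.


def extract_urls(json_text):
    """Return the unique URLs in a JSON document, in first-seen order."""
    return list(dict.fromkeys(iter_urls(json.loads(json_text))))


if __name__ == "__main__":
    # Made-up sample; the real file layout is an unknown here.
    sample = json.dumps({
        "pages": [
            {"body": "[img]http://example.com/a.png[/img]"},
            {"body": "text https://example.org/b.gif text", "n": 3},
        ],
        "icon": "http://example.com/a.png",
    })
    for url in extract_urls(sample):
        print(url)
```

The printed list can be saved to a text file and handed to wget with `--input-file`, together with `--warc-file` to get the WARC output suggested at [09:30]. The regex is deliberately crude and may clip URLs that legitimately contain brackets or parentheses.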
[09:41] *** BlueMax has quit IRC (Leaving) [10:13] *** schbirid has quit IRC (Ping timeout: 260 seconds) [10:22] *** pizzaiolo has joined #archiveteam-bs [11:21] *** schbirid has joined #archiveteam-bs [11:59] *** icedice has joined #archiveteam-bs [12:36] *** icedice has quit IRC (Quit: Leaving) [12:41] *** ivan has quit IRC (Read error: Operation timed out) [12:41] *** REiN^ has quit IRC (Read error: Operation timed out) [12:41] *** chfoo has quit IRC (Read error: Operation timed out) [12:42] *** twigfoot has quit IRC (Read error: Operation timed out) [12:42] *** ivan has joined #archiveteam-bs [12:42] *** Odd0002 has quit IRC (Read error: Operation timed out) [12:42] *** twigfoot has joined #archiveteam-bs [12:42] *** JAA has quit IRC (Read error: Operation timed out) [12:42] *** RKenshin has joined #archiveteam-bs [12:43] *** beardicus has quit IRC (Read error: Operation timed out) [12:43] *** rsznik has joined #archiveteam-bs [12:43] *** squires has quit IRC (Read error: Operation timed out) [12:43] *** bsmith093 has quit IRC (Read error: Operation timed out) [12:43] *** sep332_ has quit IRC (Read error: Operation timed out) [12:43] *** unlobito has quit IRC (Read error: Operation timed out) [12:43] *** w0rp has quit IRC (Read error: Operation timed out) [12:43] *** Dimtree has quit IRC (Read error: Operation timed out) [12:44] *** bwn has quit IRC (Read error: Operation timed out) [12:44] *** will has quit IRC (Read error: Operation timed out) [12:44] *** rolfoid has quit IRC (Read error: Operation timed out) [12:44] *** JAA has joined #archiveteam-bs [12:44] *** swebb sets mode: +o JAA [12:44] *** Kenshin has quit IRC (Read error: Operation timed out) [12:44] *** RKenshin is now known as Kenshin [12:44] *** Mayonaise has quit IRC (Read error: Operation timed out) [12:44] *** C4K3 has quit IRC (Read error: Operation timed out) [12:45] *** PotcFdk has quit IRC (Read error: Operation timed out) [12:46] *** rsznick has quit IRC (Read error: Operation timed out) 
[12:46] *** PoorHomie has quit IRC (Read error: Operation timed out) [12:46] *** Jusque has quit IRC (Read error: Operation timed out) [12:46] *** qw3rty114 has quit IRC (Read error: Operation timed out) [12:47] *** robink has quit IRC (Read error: Operation timed out) [12:47] *** will has joined #archiveteam-bs [12:48] *** Odd0002 has joined #archiveteam-bs [12:49] *** chfoo has joined #archiveteam-bs [12:50] *** unlobito has joined #archiveteam-bs [12:50] *** svchfoo1 sets mode: +o chfoo [12:51] *** robink has joined #archiveteam-bs [12:51] *** w0rp has joined #archiveteam-bs [12:51] *** balrog_ has joined #archiveteam-bs [12:51] *** swebb sets mode: +o balrog_ [12:52] *** Jusque has joined #archiveteam-bs [12:53] *** balrog has quit IRC (Read error: Operation timed out) [12:53] *** balrog_ is now known as balrog [12:57] *** bsmith093 has joined #archiveteam-bs [13:17] *** qw3rty114 has joined #archiveteam-bs [13:17] *** beardicus has joined #archiveteam-bs [13:21] *** C4K3 has joined #archiveteam-bs [13:22] *** REiN^ has joined #archiveteam-bs [13:22] *** rolfoid has joined #archiveteam-bs [13:22] *** PoorHomie has joined #archiveteam-bs [13:23] *** mabynogy has joined #archiveteam-bs [13:27] *** squires has joined #archiveteam-bs [13:28] *** bwn has joined #archiveteam-bs [13:34] *** Mayonaise has joined #archiveteam-bs [13:38] *** Dimtree has joined #archiveteam-bs [13:44] *** PotcFdk has joined #archiveteam-bs [13:53] *** VerifiedJ has joined #archiveteam-bs [15:57] *** RichardG has quit IRC (Ping timeout: 252 seconds) [16:08] *** RichardG has joined #archiveteam-bs [16:14] *** octothorp has quit IRC (Ping timeout: 252 seconds) [16:22] *** octothorp has joined #archiveteam-bs [16:31] *** Mateon1 has quit IRC (Ping timeout: 255 seconds) [16:31] *** Mateon1 has joined #archiveteam-bs [16:56] *** rsznick has joined #archiveteam-bs [16:59] *** rsznik has quit IRC (Read error: Operation timed out) [17:17] *** schbirid has quit IRC (Leaving) [17:18] *** schbirid 
has joined #archiveteam-bs [17:30] *** c4rc4s has quit IRC (Quit: words) [17:39] *** icedice has joined #archiveteam-bs [17:39] *** icedice has quit IRC (Client Quit) [17:45] *** c4rc4s has joined #archiveteam-bs [17:48] *** jschwart has joined #archiveteam-bs [18:06] *** Jens has quit IRC (Remote host closed the connection) [18:06] *** Jens has joined #archiveteam-bs [18:48] *** ola_norsk has joined #archiveteam-bs [18:48] *** sep332_ has joined #archiveteam-bs [18:58] *** Stilett0 has joined #archiveteam-bs [19:24] is it just me or does using a torrent to upload rather huge items work better than using the IA tool or web interface? [19:25] *** DedSec has quit IRC (Ping timeout: 260 seconds) [19:27] (having less chance of the item "breaking", i mean) [19:27] example https://archive.org/details/2017-Phone_Losers_of_America_PLA_Media_Pack/ [19:28] depends how you define huge, but maybe [19:28] ~140GB [19:28] *** DedSec has joined #archiveteam-bs [19:31] i'm guessing the number of files is ~3000+ [19:32] well I know web uploading over 50GB is advised against. [19:35] aye. But with a torrent, perhaps it gives IA the ability to start/stop and prioritize at will, without it relying on a user/browser being kept "alive" on the other end?...i don't know. [19:38] i just noticed that item is doing well, while some others (much smaller) i've uploaded using the web interface got broken (example https://archive.org/details/2813_d64_C64_roms_wwwC64com ) [19:49] Not to mention, i guess there's also the benefit that if someone had already uploaded that same torrent in the past (or future?), the same data wouldn't need to be transferred a second time(?) [19:52] E.g. since the torrent hash is the same, IA's Transmission would only process the torrent that already exists [19:53] *** ola_norsk has quit IRC (Torrents..The future is naowww!) 
[19:55] *** RichardG has quit IRC (Read error: Connection reset by peer) [20:00] *** Ravenloft has joined #archiveteam-bs [20:01] *** RichardG has joined #archiveteam-bs [20:08] jrwr: for things I'm planning on doing incremental archives of, how should I tell wget to not save files we already have? [20:08] If I use the same WARC filename, it'll delete the old WARC. [20:09] If I run a --mirror twice in the same directory, it creates .1 .2 .3 files [20:10] ah sorry for the ping, that question's for anyone. [20:11] actually wait, was I really running it with --mirror [20:20] but anyways, how should I handle incremental WARCs? take a new one and merge them afterwards? [20:24] Okay I wasn't running with --mirror, that was my problem. [20:25] still curious about the WARC thing. just create a new one every time? [20:25] *** jschwart has quit IRC (Quit: Konversation terminated!) [20:29] riking: wpull has --warc-append which solves that issue, but I think that doesn't exist in wget. [20:29] If you want to use wpull, make sure to use version 1.2.3. The 2.0.x versions are broken. [20:32] oh hey, special handling for youtube links. that was also on my list [20:40] *** schbirid has quit IRC (Quit: Leaving) [20:46] Is anyone else's wpull using 100% CPU time? [20:51] *** BlueMax has joined #archiveteam-bs [20:59] uh oh. AttributeError: 'module' object has no attribute 'A' [21:04] ERROR [Errno 36] File name too long: '2302/files/px.srvcs.tumblr.com/impixu?T=1518123827&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDpcL1wvdmFzdGVycm9yLnR1bWJsci5jb21cLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiXC8iLCJub3NjcmlwdCI6MX0=&U=GBL1eab3833' [21:06] wpull also eats 100% in my test vm. [21:06] Running newsgrabber. [21:07] *** icedice has joined #archiveteam-bs [21:07] Ooooo i'm running this in an ecryptfs [21:18] *** mabynogy has quit IRC (Quit: dpt.slasheva.com) [21:24] Jens: Yes, wpull is frequently using 100% CPU here as well. 
[21:24] Bit tedious on my 1 CPU VM :/ [21:24] Switching the HTML parser tends to help, but not always. [21:24] (Don't do that though. Never run modified project code.) [21:25] Newsgrabber uses some precompiled wpull executable, so it's impossible to tinker with. [21:25] The HTML parser is controlled through an option. [21:26] But I don't know if the warrior has lxml installed, so... [21:26] Haven't used the warrior in ages. [21:26] Also, while the lxml parser is faster than the default html5lib, it's not as resistant and might misparse in some edge cases. [21:28] *** BlueMax has quit IRC (Leaving) [21:30] *** RichardG has quit IRC (Read error: Connection reset by peer) [21:31] *** RichardG has joined #archiveteam-bs [21:48] *** icedice has quit IRC (Quit: Leaving) [22:03] *** Stilett0 has quit IRC () [22:10] Hah, youtube-dl runs so much faster limited to 3MB/sec [22:11] actually.. question; what processing does archive.org do on video files? [22:11] should I even bother trying to download multiple video qualities? [22:21] nah just download the highest quality [22:22] IA will downscale it to various bitrates as necessary [22:23] I really fancy a five guys right now [22:23] Just, a big tasty burger, with cheese [22:23] and *ALL* the toppings [22:23] -> #-ot [22:23] Wrong channel :x [23:05] *** ranavalon has quit IRC (Quit: Leaving) [23:16] *** ZexaronS has quit IRC (Quit: Leaving) [23:31] *** VerifiedJ has left