[00:02] *** pizzaiolo has joined #archiveteam-bs
[00:18] sho: gpodder, it even has a cli shell with gpo
[00:18] literally what it was built to do
[00:19] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[00:30] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
[00:31] godane: Good job
[00:35] *** drumstick has quit IRC (Read error: Operation timed out)
[00:52] godane, https://www.reddit.com/r/DataHoarder/comments/6r3dc5/youtube_request_so_the_channel_that_one_video/dl2jow8
[00:54] godane, started uploading those playlists already; derive is taking its sweet time, so they are there but 'unpublished' so far
[01:00] godane, last uploaded was: https://archive.org/history/youtube-20hZnSkhDgs
[01:10] *** schbirid2 has joined #archiveteam-bs
[01:13] *** Swizzle has joined #archiveteam-bs
[01:14] *** schbirid has quit IRC (Read error: Operation timed out)
[01:40] *** Asparagir has joined #archiveteam-bs
[01:46] *** j08nY has quit IRC (Quit: Leaving)
[01:57] *** Swizzle has quit IRC (Quit: Leaving)
[02:01] *** drumstick has joined #archiveteam-bs
[02:03] *** dboard has quit IRC (Remote host closed the connection!)
[02:04] *** dboard has joined #archiveteam-bs
[02:04] *** dboard has quit IRC (Read error: Connection reset by peer)
[02:12] *** dboard has joined #archiveteam-bs
[02:12] *** dboard has quit IRC (Connection closed)
[02:13] odemg: i found tons of Late Night with David Letterman on youtube
[02:13] how much is tons
[02:14] https://www.youtube.com/channel/UCqkkzIyGnwkEShBIGYRRgqQ/videos
[02:15] he's currently uploading too.. expect more!!
[02:15] that is over 100 videos there
[02:15] i know
[02:17] 512 videos
[02:18] https://pastebin.com/raw/stksw9Y7
[02:20] *** dboard2 has joined #archiveteam-bs
[02:32] *** drumstick has quit IRC (Read error: Operation timed out)
[02:36] *** drumstick has joined #archiveteam-bs
[02:50] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[03:11] *** drumstick has quit IRC (Read error: Operation timed out)
[03:35] *** kristian_ has joined #archiveteam-bs
[04:17] *** kristian_ has quit IRC (Quit: Leaving)
[04:18] *** wabu has quit IRC (Read error: Operation timed out)
[04:33] *** wabu has joined #archiveteam-bs
[04:37] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:44] *** Sk1d has joined #archiveteam-bs
[05:04] *** drumstick has joined #archiveteam-bs
[06:15] *** Asparagir has quit IRC (Asparagir)
[06:17] *** Mateon1 has quit IRC (Read error: Operation timed out)
[06:17] *** Mateon1 has joined #archiveteam-bs
[06:23] *** qw3rty14 has joined #archiveteam-bs
[06:27] *** qw3rty13 has quit IRC (Read error: Operation timed out)
[06:43] *** odemg has quit IRC (Read error: Operation timed out)
[06:49] *** drumstick has quit IRC (Read error: Operation timed out)
[06:52] *** REiN^ has joined #archiveteam-bs
[06:56] *** odemg has joined #archiveteam-bs
[07:44] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[07:45] *** BlueMaxim has joined #archiveteam-bs
[07:49] *** drumstick has joined #archiveteam-bs
[08:39] *** drumstick has quit IRC (Ping timeout: 633 seconds)
[08:57] leffi: i have a problem with the youtube comment downloader not taking youtube ids with a dash (-) in front of them
[08:57] -- doesn't work
[08:57] \ doesn't work
[08:57] " and ' don't work
[09:00] godane: do you run it in a linux shell?
[09:01] if so, try putting a single - in front of the id and make the id the last thing on your line, e.g. "youtube-dl -this --that - -ABCA" if "-ABCA" were such an id
[09:02] *** drumstick has joined #archiveteam-bs
[09:10] python downloader.py --youtube-dl -KaK2SOsiw4 --output -KaK2SOsiw4.json
[09:10] i'm using this: https://github.com/egbertbouman/youtube-comment-downloader
[09:11] ah crap, two times the id
[09:13] ah, won't work here
[10:00] Hmm. Anyone here good with Heritrix at all please?
[10:31] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[10:32] *** BlueMaxim has joined #archiveteam-bs
[10:55] *** j08nY has joined #archiveteam-bs
[11:09] *** username1 has joined #archiveteam-bs
[11:14] *** schbirid2 has quit IRC (Read error: Operation timed out)
[11:28] *** drumstick has quit IRC (Ping timeout: 246 seconds)
[12:00] *** odemg has quit IRC (Read error: Operation timed out)
[12:16] *** odemg has joined #archiveteam-bs
[12:48] *** kristian_ has joined #archiveteam-bs
[12:53] *** Stiletti has quit IRC (Read error: Connection reset by peer)
[12:53] *** Stiletti has joined #archiveteam-bs
[13:12] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[13:18] *** j08nY has quit IRC (Quit: Leaving)
[13:38] *** Stiletti has quit IRC (Read error: Operation timed out)
[13:38] *** Stiletti has joined #archiveteam-bs
[14:03] *** Stiletti has quit IRC (Read error: Operation timed out)
[14:03] *** Stiletti has joined #archiveteam-bs
[14:30] *** Stiletti has quit IRC (Read error: Operation timed out)
[14:30] *** Stiletti has joined #archiveteam-bs
[14:57] *** kristian_ has quit IRC (Quit: Leaving)
[16:03] *** Stiletti has quit IRC (Read error: Operation timed out)
[16:03] *** Stiletti has joined #archiveteam-bs
[16:42] *** Stiletti has quit IRC (Read error: Operation timed out)
[16:43] *** Stiletti has joined #archiveteam-bs
[16:52] *** pizzaiolo has joined #archiveteam-bs
[16:53] *** pizzaiolo has left
[17:11] *** Mateon1 has quit IRC (Remote host closed the connection)
[17:12] *** Mateon1 has joined #archiveteam-bs
[17:14] *** pizzaiolo has joined #archiveteam-bs
[17:24] *** JensRex has quit IRC (Remote host closed the connection)
[17:24] *** JensRex has joined #archiveteam-bs
[18:08] *** Stiletti has quit IRC (Read error: Operation timed out)
[18:08] *** Stiletti has joined #archiveteam-bs
[18:11] *** Mateon1 has quit IRC (Ping timeout: 250 seconds)
[18:34] *** Stiletti has quit IRC (Read error: Operation timed out)
[18:34] *** Stiletti has joined #archiveteam-bs
[19:29] *** Aranje has joined #archiveteam-bs
[19:42] *** Asparagir has joined #archiveteam-bs
[19:50] Anyone have a suggested tool for crawling urls off sites?
[19:59] Do you mean crawling links that a website *links to*? Or only the site itself, plus its outbound links? If the latter, try wpull.
[20:01] the latter
[20:02] Okay
[20:17] *** Mateon1 has joined #archiveteam-bs
[20:34] *** pikhq has quit IRC (Ping timeout: 245 seconds)
[20:40] *** pikhq has joined #archiveteam-bs
[20:53] *** username1 is now known as schbirid
[21:09] *** Stiletti has quit IRC (Read error: Operation timed out)
[21:09] *** Stiletti has joined #archiveteam-bs
[21:53] Asparagir: "don't need much computing power" -- I thought others (FalconK?) said before that pipelines were mainly CPU-bound?
[21:54] I've been okay running the 4 GB memory / 60 GB disk space droplets on Digital Ocean. But bigger is better, especially if they happen to be running a lot of phantomjs jobs.
[21:54] But you can't really control if you happen to get a lot of those phantomjs jobs or not.
[21:55] Also, wpull has known (but still not patched) memory leaks. So you need a little wiggle room...and probably need to restart the whole shebang once every few months.
[21:58] Hopefully more often for security updates
[21:58] But yeah
[22:29] *** Administr has joined #archiveteam-bs
[22:33] *** HCross has quit IRC (Ping timeout: 268 seconds)
[22:43] *** Administr has quit IRC (Ping timeout: 268 seconds)
[22:44] *** HarryCros has joined #archiveteam-bs
[22:55] *** drumstick has joined #archiveteam-bs
[23:04] *** HCross has joined #archiveteam-bs
[23:05] *** HarryCros has quit IRC (Ping timeout: 268 seconds)
[23:31] *** HCross has quit IRC (Read error: Connection reset by peer)
[23:32] *** HarryCros has joined #archiveteam-bs
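
On the 08:57 question about video IDs that start with a dash: one likely cause is that Python's argparse treats any token beginning with - as an option name. Quotes and backslashes are removed by the shell before the program sees the argument, which would explain why they made no difference. The following is a minimal sketch, assuming the downloader parses its arguments with argparse. The parser is a hypothetical stand-in, and the option names --youtubeid and --output are assumptions, not the tool's confirmed interface. Joining the option and its value with = appears to be the simplest way to pass such an ID through intact.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the downloader's command line.
    # The option names below are assumptions made for illustration.
    parser = argparse.ArgumentParser(prog="downloader.py")
    parser.add_argument("--youtubeid", required=True, help="YouTube video ID")
    parser.add_argument("--output", required=True, help="output file path")
    return parser


def parse(argv):
    # Parse an explicit argument list instead of sys.argv so the
    # behaviour can be checked without a shell.
    return build_parser().parse_args(argv)


def space_form_fails(argv):
    # argparse reports a usage error by raising SystemExit.
    try:
        parse(argv)
    except SystemExit:
        return True
    return False


if __name__ == "__main__":
    # Space-separated form: the ID is mistaken for an unknown option.
    print(space_form_fails(["--youtubeid", "-KaK2SOsiw4", "--output", "out.json"]))

    # "=" form: the value is attached to the option and kept as-is.
    args = parse(["--youtubeid=-KaK2SOsiw4", "--output=./-KaK2SOsiw4.json"])
    print(args.youtubeid, args.output)
```

With a parser like this, the call would be written as `python downloader.py --youtubeid=-KaK2SOsiw4 --output=./-KaK2SOsiw4.json`. Prefixing the output path with ./ keeps the file name itself from starting with a dash, which also makes the file easier to handle with other tools later.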