[00:07] *** ats has joined #archiveteam
[00:07] leffi: how do i save youtube comments?
[00:08] i'm using youtube-dl and there is no comment option to save them with
[00:13] *** BlueMaxim has joined #archiveteam
[00:20] *** ld1 has quit IRC (Ping timeout: 260 seconds)
[00:30] *** ld1 has joined #archiveteam
[00:32] https://www.reddit.com/r/TheNewRight/comments/6r4f7e/ama_with_paul_nehlen_tonight_9pm_et_running/?ref=share&ref_source=link
[01:14] godane: use this python script: https://github.com/egbertbouman/youtube-comment-downloader
[01:30] *** username1 has joined #archiveteam
[01:33] *** schbirid2 has quit IRC (Read error: Operation timed out)
[01:35] leffi: thanks i'm grabbing them now
[01:35] i had to run this to get the youtube id codes: cat *.info.json | sed 's|, |\n|g' | grep '"id"' | grep -v '"0"' | sed 's|.*: "||g' | sed 's|"||g'
[01:40] *** j08nY has quit IRC (Quit: Leaving)
[02:02] *** Stiletti has quit IRC (Read error: Operation timed out)
[02:08] *** ZexaronS has quit IRC (Quit: Leaving)
[02:12] *** etudier has quit IRC (Ping timeout: 260 seconds)
[02:31] *** ZexaronS has joined #archiveteam
[03:51] *** qw3rty12 has joined #archiveteam
[03:56] *** qw3rty11 has quit IRC (Read error: Operation timed out)
[04:10] *** ZexaronS has quit IRC (Quit: Leaving)
[04:11] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[04:38] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:45] *** Sk1d has joined #archiveteam
[05:14] *** kevinr has quit IRC (Ping timeout: 260 seconds)
[05:14] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:20] *** dashcloud has joined #archiveteam
[05:24] *** Asparagir has quit IRC (Asparagir)
[05:25] *** toohighto is now known as Nilgrah
[05:25] *** Nilgrah is now known as RICKROSS
[05:25] *** RICKROSS is now known as toohighto
[05:39] *** toohighto has quit IRC (Ping timeout: 268 seconds)
[06:10] *** Honno has joined #archiveteam
[06:31] *** toohigh has joined #archiveteam
[06:35] *** MMovie2 has quit IRC (Read error: Connection reset by peer)
[06:56] godane: Just for future reference, you can use `jq` to parse json on the command line, it's much nicer than hacking together sed/grep.
[06:59] *** kevinr has joined #archiveteam
[07:15] *** username1 is now known as schbirid
[07:23] *** zino has quit IRC (Read error: Operation timed out)
[07:26] *** zino has joined #archiveteam
[08:06] *** toohigh is now known as toohighto
[08:08] *** Atom has quit IRC (Read error: Operation timed out)
[08:42] *** fhs has quit IRC (Read error: Operation timed out)
[08:55] *** kitties has quit IRC (Quit: Connection closed for inactivity)
[09:28] *** Honno has quit IRC (Read error: Operation timed out)
[10:18] *** Mateon1 has quit IRC (Ping timeout: 268 seconds)
[10:18] *** Mateon1 has joined #archiveteam
[10:40] *** j08nY has joined #archiveteam
[11:00] SketchCow: In case you're not aware, FOS is still down (due to the power outage, I assume).
[11:07] *** Froggypwn has joined #archiveteam
[11:31] *** toohighto has quit IRC (Ping timeout: 268 seconds)
[11:32] *** toohighto has joined #archiveteam
[11:43] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[11:47] *** toohighto has quit IRC (Ping timeout: 268 seconds)
[11:48] *** Stiletti has joined #archiveteam
[12:10] *** pizzaiolo has joined #archiveteam
[12:13] *** qwerty0 has joined #archiveteam
[12:13] Yes, FOS is down until Andy can fix the baremetal host.
[12:13] Some time today.
[12:14] *** schbirid2 has joined #archiveteam
[12:14] Hey, I'm having trouble finding a Warrior project with items available. What's active right now?
[12:15] *** schbirid has quit IRC (Read error: Operation timed out)
[12:17] *** ajshell1 has joined #archiveteam
[12:20] *** toohighto has joined #archiveteam
[12:21] atrocity: Looks like he's removing 120 older episodes, not all videos. All of it should still be archived of course.
[12:23] godane: If you get a good workflow for grabbing videos and comments, let me know. On my list of things to do is trying to archive all the Polaris videos that have been unlisted. Planning to use youtube-sync and patch it to include comment dumps.
[12:42] *** Stiletti is now known as Stiletto
[13:09] *** ajshell1 has quit IRC (Quit: Leaving)
[13:23] *** ZexaronS has joined #archiveteam
[13:26] *** Atom has joined #archiveteam
[13:34] *** Atom-- has joined #archiveteam
[13:38] *** Atom has quit IRC (Read error: Operation timed out)
[14:06] *** RichardG_ has joined #archiveteam
[14:08] haha, came on here to ask about that
[14:09] I wonder what formats should be archived?
[14:11] *** RichardG has quit IRC (Read error: Operation timed out)
[14:20] *** RichardG_ is now known as RichardG
[14:30] *** username1 has joined #archiveteam
[14:32] *** schbirid2 has quit IRC (Read error: Operation timed out)
[14:35] *** fhs has joined #archiveteam
[14:54] *** nwf_ has quit IRC (Read error: Operation timed out)
[14:59] *** fhs has quit IRC (Quit: leaving)
[15:06] *** nwf_ has joined #archiveteam
[15:15] *** TheLovina has quit IRC (Read error: Operation timed out)
[15:28] *** TC01 has quit IRC (Read error: Operation timed out)
[16:07] *** username1 is now known as schbirid
[16:22] voltagex: Format of YouTube video? The source video is not available, so I think everyone is taking whatever "youtube-dl -f bestvideo+bestaudio" spits out. The container might be more debatable. I use mkv, and so does upstream youtube-sync.
[16:23] I don't know if anyone has formalized putting page+comments+video in a warc yet, but I think arkiver (?) was working on a videobot, so maybe he has ideas.
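Editor's sketch, not part of the channel log: the sed/grep pipeline quoted at [01:35] pulls video IDs out of youtube-dl's `*.info.json` files, and the [06:56] reply suggests parsing the JSON properly instead. Below is one minimal way to do that in Python. It assumes each file is a JSON object with a top-level "id" field holding the video ID, as the pipeline implies; the function name `video_ids` is hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def video_ids(directory="."):
    """Return the top-level "id" of every *.info.json file in `directory`.

    Assumption: each file is a JSON object whose top-level "id" is the
    video ID. Nested "id" fields are never looked at, so no filtering of
    stray values is needed.
    """
    ids = []
    for path in sorted(Path(directory).glob("*.info.json")):
        with path.open(encoding="utf-8") as handle:
            info = json.load(handle)
        # Skip files that are not JSON objects or that lack an "id".
        if isinstance(info, dict) and info.get("id"):
            ids.append(info["id"])
    return ids


if __name__ == "__main__":
    # Print one ID per line, ready to feed to a comment downloader.
    for video_id in video_ids():
        print(video_id)
```

Reading only the top-level key avoids the `grep -v '"0"'` step of the original pipeline, which was presumably there to discard "id" values from nested objects.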
[16:39] *** HCross has quit IRC (Read error: Connection reset by peer)
[16:45] *** HarryCros has joined #archiveteam
[16:50] *** bitBaron has joined #archiveteam
[17:02] *** bitBaron has quit IRC (Read error: Connection reset by peer)
[17:03] *** bitBaron has joined #archiveteam
[17:05] *** MMovie has joined #archiveteam
[17:11] *** n00b278 has joined #archiveteam
[17:12] *** ndiddy has quit IRC (Read error: Operation timed out)
[17:31] *** Morbus has quit IRC (Ping timeout: 260 seconds)
[17:33] *** Morbus has joined #archiveteam
[17:34] anyone know anything about the urlte.am tracker being down?
[17:35] zino: yes :) good you're mentioning it
[17:36] I'll be at SHA until tuesday, youtube support including comments would be a nice project during my time there
[17:49] *** bitBaron has quit IRC (Read error: Connection reset by peer)
[17:56] *** biller has joined #archiveteam
[18:01] hi. I am currently running grab-site (https://github.com/ludios/grab-site). After downloading 50GB I would like to change a command-line parameter (size of warc file). Is it possible to stop it (Ctrl+C) and resume it with different command-line arguments but without starting from the beginning?
[18:02] i.e. continue with the URLs that were in the queue (crawl frontier) when the old crawler stopped?
[18:08] zino: true, still a lot of video data to download
[18:32] *** godane has left
[18:56] *** biller has quit IRC (Ping timeout: 268 seconds)
[18:59] *** schbirid2 has joined #archiveteam
[19:00] *** n00b278 has quit IRC (Quit: Page closed)
[19:01] *** schbirid has quit IRC (Read error: Operation timed out)
[19:14] *** username1 has joined #archiveteam
[19:14] *** godane has joined #archiveteam
[19:14] *** username1 is now known as schbirid
[19:17] *** schbirid2 has quit IRC (Read error: Operation timed out)
[19:19] *** godane has quit IRC (Quit: Leaving.)
[19:21] *** schbirid2 has joined #archiveteam
[19:25] *** schbirid has quit IRC (Read error: Operation timed out)
[19:28] *** username1 has joined #archiveteam
[19:29] *** biller has joined #archiveteam
[19:30] *** schbirid2 has quit IRC (Read error: Operation timed out)
[19:31] *** Honno has joined #archiveteam
[19:39] *** ZexaronS has quit IRC (Quit: Leaving)
[19:39] FOS is back
[19:39] It hates you
[19:39] But it's back
[19:41] *** Honno_ has joined #archiveteam
[19:42] *** ZexaronS has joined #archiveteam
[19:44] thanks
[19:44] dayviews is starting on it soon :D
[19:49] *** Honno has quit IRC (Read error: Operation timed out)
[20:17] *** username1 is now known as schbirid
[20:17] *** Asparagir has joined #archiveteam
[20:19] *** kristian_ has joined #archiveteam
[20:30] *** Asparagir has quit IRC (Read error: Operation timed out)
[20:33] *** Asparagir has joined #archiveteam
[20:54] *** Balrog_ has joined #archiveteam
[20:55] *** Balrog_ is now known as fazeeka
[20:59] http://tilde.town/~xkeeper/#rhdn the RHDN admins are setting their footguns to full auto
[21:01] assuming they are actually having funding problems and this isn't a cash grab, it might be a good idea to archive the hacks, translations, and utilities on the site
[21:01] even if it is a cash grab it would still be a good idea to grab the stuff before the hacks themselves get paywalled
[21:02] I have all downloads, and the site is in ArchiveBot.
[21:02] Unfortunately, they're *heavily* ratelimiting, so the latter is very, very slow.
[21:03] gee I wonder why
[21:04] incidentally, now that tumblr is censoring NSFW blogs I am reminded of the last time I was here; has anyone shown interest in developing grabbing tools for soup.io?
[21:04] it might suddenly become popular
[21:27] fazeeka, they're actually censoring or just making users log in and set the mature filter to 'on'?
[21:28] they're making you log in according to the archiveteam page
[21:29] I haven't been able to reproduce but for now I'm taking the editor's word for it
[21:29] most of the NSFW blogs I know of have gotten banhammered in the past few months for some dumb reason
[21:30] I am wondering because I've not noticed any change at all browsing my feed and it is rather nsfw for sure..
[21:30] you may have gotten "grandfathered in"
[21:31] try signing out
[21:31] you might even want to try signing out and then using a VPN
[21:32] it's inconsistent, I've seen a lot of blogs with RTA headers and x-tumblr-whatever header set to nsfw or adult which still don't require login and work as before
[21:33] could be A/B testing on yahoo's part
[21:34] sorta like how facebook will randomly select demographics to test new "features" on
[21:35] they might be walling off a sample of the NSFW blogs and seeing who walks over it
[21:41] *** biller has quit IRC (Ping timeout: 268 seconds)
[21:53] *** j08nY has quit IRC (Remote host closed the connection)
[21:55] *** j08nY has joined #archiveteam
[22:02] *** Froggypwn has quit IRC (Read error: Connection reset by peer)
[22:05] *** BlueMaxim has joined #archiveteam
[22:17] *** Swizzle has joined #archiveteam
[22:28] *** kitties has joined #archiveteam
[22:37] *** Amigatari has joined #archiveteam
[22:41] *** toohighto has quit IRC (Read error: Connection reset by peer)
[22:49] *** bwn has quit IRC (Ping timeout: 268 seconds)
[22:51] *** HCross has joined #archiveteam
[22:52] *** fazeeka has quit IRC (Read error: Operation timed out)
[22:54] *** HarryCros has quit IRC (Ping timeout: 268 seconds)
[22:56] *** bwn has joined #archiveteam
[23:24] arkiver: Did you manage to do anything about Dayviews?
[23:25] working on it right now actually
[23:25] it will be here https://github.com/ArchiveTeam/dayviews-grab
[23:25] and then the project will be started
[23:26] Oh, cool!
[23:26] I'm relieved. :)
[23:26] it's only not really easy to split up profiles into parts for download
[23:26] I understand that.
[23:26] so one item on the tracker will be a whole profile
[23:27] and profiles will be discovered while we archive
[23:27] Alright, I see.
[23:27] Any other interesting technical details?
[23:27] not really
[23:27] it's really not a horrible website fortunately
[23:27] Yeah.
[23:27] I quite like it.
[23:28] But it has been dying for quite a while.
[23:28] I was the guy who added it to the wiki like a year ago.
[23:28] I knew it was dying.
[23:28] no crazy POSTs anywhere, no horrible scripts
[23:28] nice
[23:28] There's a bunch of homegrown Swedish sites that died down, most famously Lunarstorm.
[23:28] I'm pretty sure that site died before Archive Team was a thing though.
[23:28] And you needed to be logged in to see any content at all.
[23:28] And it changed quite often and shit.
[23:28] why did you think this site was dying?
[23:29] also I'm not planning on doing any logging in on dayviews
[23:29] in that case we will not get the images behind a login
[23:29] Well, most other Swedish community sites have been dying for a while.
[23:29] yeah there's been another one that reminds me of this site
[23:29] As far as I remember, you can access those that are behind a login wall by using the Googlebot user agent.
[23:29] forgot the name though
[23:30] This is one of the last ones I think.
[23:30] ah, thanks for that
[23:30] let's use googlebot :P
[23:30] * arkiver loves google
[23:30] Hamsterpaj.net still exists and has some similarities with Lunarstorm.
[23:31] that site gives me 0 bytes
[23:31] kamrat.nu looks stylistically quite close to Lunarstorm. That site is today mostly known for pedophilic grooming being prevalent there though.
[23:32] we should look into these sites
[23:32] for signs of them dying slowly
[23:33] * arkiver is afk for some food
[23:33] also, let's move this to #archiveteam-bs
[23:35] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[23:35] *** Stiletti has joined #archiveteam
[23:42] *** BubuAnabe has quit IRC (Ping timeout: 268 seconds)
[23:54] *** fazeeka has joined #archiveteam
[23:56] *** Amigatari has quit IRC (Quit: Leaving)