#archiveteam 2017-08-03,Thu


Time Nickname Message
00:07 🔗 ats has joined #archiveteam
00:07 🔗 godane leffi: how do i save youtube comments?
00:08 🔗 godane i'm using youtube-dl and there is no comment option to save them with
00:13 🔗 BlueMaxim has joined #archiveteam
00:20 🔗 ld1 has quit IRC (Ping timeout: 260 seconds)
00:30 🔗 ld1 has joined #archiveteam
00:32 🔗 toohighto https://www.reddit.com/r/TheNewRight/comments/6r4f7e/ama_with_paul_nehlen_tonight_9pm_et_running/?ref=share&ref_source=link
01:14 🔗 leffi godane: use this python script: https://github.com/egbertbouman/youtube-comment-downloader
01:30 🔗 username1 has joined #archiveteam
01:33 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
01:35 🔗 godane leffi: thanks i'm grabbing them now
01:35 🔗 godane i had to run this to get the youtube id codes: cat *.info.json | sed 's|, |\n|g' | grep '"id"' | grep -v '"0"' | sed 's|.*: "||g' | sed 's|"||g'
01:40 🔗 j08nY has quit IRC (Quit: Leaving)
02:02 🔗 Stiletti has quit IRC (Read error: Operation timed out)
02:08 🔗 ZexaronS has quit IRC (Quit: Leaving)
02:12 🔗 etudier has quit IRC (Ping timeout: 260 seconds)
02:31 🔗 ZexaronS has joined #archiveteam
03:51 🔗 qw3rty12 has joined #archiveteam
03:56 🔗 qw3rty11 has quit IRC (Read error: Operation timed out)
04:10 🔗 ZexaronS has quit IRC (Quit: Leaving)
04:11 🔗 pizzaiolo has quit IRC (Quit: pizzaiolo)
04:38 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:45 🔗 Sk1d has joined #archiveteam
05:14 🔗 kevinr has quit IRC (Ping timeout: 260 seconds)
05:14 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:20 🔗 dashcloud has joined #archiveteam
05:24 🔗 Asparagir has quit IRC (Asparagir)
05:25 🔗 toohighto is now known as Nilgrah
05:25 🔗 Nilgrah is now known as RICKROSS
05:25 🔗 RICKROSS is now known as toohighto
05:39 🔗 toohighto has quit IRC (Ping timeout: 268 seconds)
06:10 🔗 Honno has joined #archiveteam
06:31 🔗 toohigh has joined #archiveteam
06:35 🔗 MMovie2 has quit IRC (Read error: Connection reset by peer)
06:56 🔗 Mateon1 godane: Just for future reference, you can use `jq` to parse json on the command line, it's much nicer than hacking together sed/grep.
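A structured parse along the lines Mateon1 suggests could look like the sketch below, written in Python rather than `jq`. It assumes each `*.info.json` file written by youtube-dl is a single JSON object whose top-level `id` field is the video ID; the function name `video_ids` is made up for illustration. With `jq` itself, something like `jq -r .id *.info.json` should give roughly the same result.

```python
import glob
import json


def video_ids(pattern="*.info.json"):
    """Yield the top-level "id" of each youtube-dl metadata file matching pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            info = json.load(handle)
        # Only the top-level "id" is read. Nested objects (for example
        # thumbnail entries) may carry their own "id" values, which is
        # presumably what the grep -v '"0"' step was filtering out.
        if isinstance(info, dict) and "id" in info:
            yield info["id"]


if __name__ == "__main__":
    for video_id in video_ids():
        print(video_id)
```

Parsing the JSON avoids depending on how the file happens to be split across lines, which is the fragile part of the sed/grep pipeline.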
06:59 🔗 kevinr has joined #archiveteam
07:15 🔗 username1 is now known as schbirid
07:23 🔗 zino has quit IRC (Read error: Operation timed out)
07:26 🔗 zino has joined #archiveteam
08:06 🔗 toohigh is now known as toohighto
08:08 🔗 Atom has quit IRC (Read error: Operation timed out)
08:42 🔗 fhs has quit IRC (Read error: Operation timed out)
08:55 🔗 kitties has quit IRC (Quit: Connection closed for inactivity)
09:28 🔗 Honno has quit IRC (Read error: Operation timed out)
10:18 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
10:18 🔗 Mateon1 has joined #archiveteam
10:40 🔗 j08nY has joined #archiveteam
11:00 🔗 JAA SketchCow: In case you're not aware, FOS is still down (due to the power outage, I assume).
11:07 🔗 Froggypwn has joined #archiveteam
11:31 🔗 toohighto has quit IRC (Ping timeout: 268 seconds)
11:32 🔗 toohighto has joined #archiveteam
11:43 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
11:47 🔗 toohighto has quit IRC (Ping timeout: 268 seconds)
11:48 🔗 Stiletti has joined #archiveteam
12:10 🔗 pizzaiolo has joined #archiveteam
12:13 🔗 qwerty0 has joined #archiveteam
12:13 🔗 SketchCow Yes, FOS is down until Andy can fix the baremetal host.
12:13 🔗 SketchCow Some time today.
12:14 🔗 schbirid2 has joined #archiveteam
12:14 🔗 qwerty0 Hey, I'm having trouble finding a Warrior project with items available. What's active right now?
12:15 🔗 schbirid has quit IRC (Read error: Operation timed out)
12:17 🔗 ajshell1 has joined #archiveteam
12:20 🔗 toohighto has joined #archiveteam
12:21 🔗 zino atrocity: Looks like he's removing 120 older episodes, not all videos. All of it should still be archived of course.
12:23 🔗 zino godane: If you get a good workflow for grabbing videos and comments, let me know. On my list of things to do is trying to archive all the Polaris videos that have been unlisted. Planning to use youtube-sync and patch it to include comment dumps.
12:42 🔗 Stiletti is now known as Stiletto
13:09 🔗 ajshell1 has quit IRC (Quit: Leaving)
13:23 🔗 ZexaronS has joined #archiveteam
13:26 🔗 Atom has joined #archiveteam
13:34 🔗 Atom-- has joined #archiveteam
13:38 🔗 Atom has quit IRC (Read error: Operation timed out)
14:06 🔗 RichardG_ has joined #archiveteam
14:08 🔗 voltagex haha, came on here to ask about that
14:09 🔗 voltagex I wonder what formats should be archived?
14:11 🔗 RichardG has quit IRC (Read error: Operation timed out)
14:20 🔗 RichardG_ is now known as RichardG
14:30 🔗 username1 has joined #archiveteam
14:32 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
14:35 🔗 fhs has joined #archiveteam
14:54 🔗 nwf_ has quit IRC (Read error: Operation timed out)
14:59 🔗 fhs has quit IRC (Quit: leaving)
15:06 🔗 nwf_ has joined #archiveteam
15:15 🔗 TheLovina has quit IRC (Read error: Operation timed out)
15:28 🔗 TC01 has quit IRC (Read error: Operation timed out)
16:07 🔗 username1 is now known as schbirid
16:22 🔗 zino voltagex: Format of YouTube video? The source video is not available, so I think everyone is taking whatever "youtube-dl -f bestvideo+bestaudio" spits out. The container might be more debatable. I use mkv, and so does upstream youtube-sync.
16:23 🔗 zino I don't know if anyone has formalized putting page+comments+video in a warc yet, but I think arkiver (?) was working on a videobot, so maybe he has ideas.
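The youtube-dl invocation zino describes could be wrapped roughly as follows. This is only a sketch: `build_command` is a hypothetical helper, and the `--merge-output-format` and `--write-info-json` options are additions assumed here to get an mkv container and a metadata file; only `-f bestvideo+bestaudio` comes from the discussion above.

```python
import subprocess
import sys


def build_command(url, container="mkv"):
    """Return a youtube-dl argument list for best video + best audio."""
    return [
        "youtube-dl",
        "-f", "bestvideo+bestaudio",         # format selection quoted in the log
        "--merge-output-format", container,  # assumption: remux the merged streams to mkv
        "--write-info-json",                 # assumption: keep metadata next to the video
        url,
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Requires youtube-dl (and ffmpeg for merging) on PATH.
    subprocess.run(build_command(sys.argv[1]), check=True)
```

Passing the arguments as a list rather than a shell string keeps URLs containing `&` or `?` from being interpreted by the shell.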
16:39 🔗 HCross has quit IRC (Read error: Connection reset by peer)
16:45 🔗 HarryCros has joined #archiveteam
16:50 🔗 bitBaron has joined #archiveteam
17:02 🔗 bitBaron has quit IRC (Read error: Connection reset by peer)
17:03 🔗 bitBaron has joined #archiveteam
17:05 🔗 MMovie has joined #archiveteam
17:11 🔗 n00b278 has joined #archiveteam
17:12 🔗 ndiddy has quit IRC (Read error: Operation timed out)
17:31 🔗 Morbus has quit IRC (Ping timeout: 260 seconds)
17:33 🔗 Morbus has joined #archiveteam
17:34 🔗 n00b278 anyone know anything about the urlte.am tracker being down?
17:35 🔗 arkiver zino: yes :) good you're mentioning it
17:36 🔗 arkiver I'll be at SHA until tuesday, youtube support including comments would be a nice project during my time there
17:49 🔗 bitBaron has quit IRC (Read error: Connection reset by peer)
17:56 🔗 biller has joined #archiveteam
18:01 🔗 biller hi. I am currently running grab-site (https://github.com/ludios/grab-site). After downloading 50GB I would like to change a command-line parameter (size of warc file). Is it possible to stop it (Ctrl+C) and resume it with different command-line arguments but without starting from the beginning?
18:02 🔗 biller i.e. continue with the urls that were in the queue (crawl frontier) when the old crawler stopped?
18:08 🔗 atrocity zino: true, still a lot of video data to download
18:32 🔗 godane has left
18:56 🔗 biller has quit IRC (Ping timeout: 268 seconds)
18:59 🔗 schbirid2 has joined #archiveteam
19:00 🔗 n00b278 has quit IRC (Quit: Page closed)
19:01 🔗 schbirid has quit IRC (Read error: Operation timed out)
19:14 🔗 username1 has joined #archiveteam
19:14 🔗 godane has joined #archiveteam
19:14 🔗 username1 is now known as schbirid
19:17 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
19:19 🔗 godane has quit IRC (Quit: Leaving.)
19:21 🔗 schbirid2 has joined #archiveteam
19:25 🔗 schbirid has quit IRC (Read error: Operation timed out)
19:28 🔗 username1 has joined #archiveteam
19:29 🔗 biller has joined #archiveteam
19:30 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
19:31 🔗 Honno has joined #archiveteam
19:39 🔗 ZexaronS has quit IRC (Quit: Leaving)
19:39 🔗 SketchCow FOS is back
19:39 🔗 SketchCow It hates you
19:39 🔗 SketchCow But it's back
19:41 🔗 Honno_ has joined #archiveteam
19:42 🔗 ZexaronS has joined #archiveteam
19:44 🔗 arkiver thanks
19:44 🔗 arkiver dayviews is starting on it soon :D
19:49 🔗 Honno has quit IRC (Read error: Operation timed out)
20:17 🔗 username1 is now known as schbirid
20:17 🔗 Asparagir has joined #archiveteam
20:19 🔗 kristian_ has joined #archiveteam
20:30 🔗 Asparagir has quit IRC (Read error: Operation timed out)
20:33 🔗 Asparagir has joined #archiveteam
20:54 🔗 Balrog_ has joined #archiveteam
20:55 🔗 Balrog_ is now known as fazeeka
20:59 🔗 fazeeka http://tilde.town/~xkeeper/#rhdn the RHDN admins are setting their footguns to full auto
21:01 🔗 fazeeka assuming they are actually having funding problems and this isn't a cash grab, it might be a good idea to archive the hacks, translations, and utilities on the site
21:01 🔗 fazeeka even if it is a cash grab it would still be a good idea to grab the stuff before the hacks themselves get paywalled
21:02 🔗 JAA I have all downloads, and the site is in ArchiveBot.
21:02 🔗 JAA Unfortunately, they're *heavily* ratelimiting, so the latter is very, very slow.
21:03 🔗 fazeeka gee I wonder why
21:04 🔗 fazeeka incidentally, now that tumblr is censoring NSFW blogs I am reminded of the last time I was here; has anyone shown interest in developing grabbing tools for soup.io?
21:04 🔗 fazeeka it might suddenly become popular
21:27 🔗 omglolbah fazeeka, they're actually censoring or just making users log in and set the mature filter to 'on'?
21:28 🔗 fazeeka they're making you log in according to the archiveteam page
21:29 🔗 fazeeka I haven't been able to reproduce but for now I'm taking the editor's word for it
21:29 🔗 fazeeka most of the NSFW blogs I know of have gotten banhammered in the past few months for some dumb reason
21:30 🔗 omglolbah I am wondering because I've not noticed any change at all browsing my feed and it is rather nsfw for sure..
21:30 🔗 fazeeka you may have gotten "grandfathered in"
21:31 🔗 fazeeka try signing out
21:31 🔗 fazeeka you might even want to try signing out and then using a VPN
21:32 🔗 klg it's inconsistent, I've seen a lot of blogs with RTA headers and x-tumblr-whatever header set to nsfw or adult which still don't require login and work as before
21:33 🔗 fazeeka could be A/B testing on yahoo's part
21:34 🔗 fazeeka sorta like how facebook will randomly select demographics to test new "features" on
21:35 🔗 fazeeka they might be walling off a sample of the NSFW blogs and seeing who walks over it
21:41 🔗 biller has quit IRC (Ping timeout: 268 seconds)
21:53 🔗 j08nY has quit IRC (Remote host closed the connection)
21:55 🔗 j08nY has joined #archiveteam
22:02 🔗 Froggypwn has quit IRC (Read error: Connection reset by peer)
22:05 🔗 BlueMaxim has joined #archiveteam
22:17 🔗 Swizzle has joined #archiveteam
22:28 🔗 kitties has joined #archiveteam
22:37 🔗 Amigatari has joined #archiveteam
22:41 🔗 toohighto has quit IRC (Read error: Connection reset by peer)
22:49 🔗 bwn has quit IRC (Ping timeout: 268 seconds)
22:51 🔗 HCross has joined #archiveteam
22:52 🔗 fazeeka has quit IRC (Read error: Operation timed out)
22:54 🔗 HarryCros has quit IRC (Ping timeout: 268 seconds)
22:56 🔗 bwn has joined #archiveteam
23:24 🔗 Amigatari arkiver: Did you manage to do anything about Dayviews?
23:25 🔗 arkiver working on it right now actually
23:25 🔗 arkiver it will be here https://github.com/ArchiveTeam/dayviews-grab
23:25 🔗 arkiver and then the project will be started
23:26 🔗 Amigatari Oh, cool!
23:26 🔗 Amigatari I'm relieved. :)
23:26 🔗 arkiver it's only not really easy to split up profiles into parts for download
23:26 🔗 Amigatari I understand that.
23:26 🔗 arkiver so one item on the tracker will be a whole profile
23:27 🔗 arkiver and profiles will be discovered while we archive
23:27 🔗 Amigatari Alright, I see.
23:27 🔗 Amigatari Any other interesting technical details?
23:27 🔗 arkiver not really
23:27 🔗 arkiver it's really not a horrible website fortunately
23:27 🔗 Amigatari Yeah.
23:27 🔗 Amigatari I quite like it.
23:28 🔗 Amigatari But it has been dying for quite a while.
23:28 🔗 Amigatari I was the guy who added it to the wiki like a year ago.
23:28 🔗 Amigatari I knew it was dying.
23:28 🔗 arkiver no crazy POSTs anywhere, no horrible scripts
23:28 🔗 arkiver nice
23:28 🔗 Amigatari There's a bunch of homegrown Swedish sites that died down, most famously Lunarstorm.
23:28 🔗 Amigatari I'm pretty sure that site died before Archive Team was a thing though.
23:28 🔗 Amigatari And you needed to be logged in to see any content at all.
23:28 🔗 Amigatari And it changed quite often and shit.
23:28 🔗 arkiver why did you think this site was dying?
23:29 🔗 arkiver also I'm not planning on doing any logging in on dayviews
23:29 🔗 arkiver in that case we will not get the images behind a login
23:29 🔗 Amigatari Well, most other Swedish community sites have been dying for a while.
23:29 🔗 arkiver yeah there's been another one that reminds me of this site
23:29 🔗 Amigatari As far as I remember, you can access those that are behind a login wall by using the Googlebot user agent.
23:29 🔗 arkiver forgot the name though
23:30 🔗 Amigatari This is one of the last ones I think.
23:30 🔗 arkiver ah, thanks for that
23:30 🔗 arkiver let's use googlebot :P
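Requesting a page with the Googlebot user agent, as suggested above, might be sketched like this. The user-agent string is the commonly published Googlebot one; `build_request` is a hypothetical helper, and whether Dayviews actually serves login-walled images to this user agent is only Amigatari's recollection, not something verified here.

```python
import urllib.request

# Commonly published Googlebot user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Return a urllib Request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch url with the Googlebot user agent and return the response body as bytes."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read()
```

An actual grab would set the same header in the project's own crawler configuration; the point here is only that the change is a single request header.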
23:30 🔗 * arkiver loves google
23:30 🔗 Amigatari Hamsterpaj.net still exists and has some similarities with Lunarstorm.
23:31 🔗 arkiver that site gives me 0 bytes
23:31 🔗 Amigatari kamrat.nu looks stylistically quite close to Lunarstorm. That site is today mostly known for pedophilic grooming being prevalent there though.
23:32 🔗 arkiver we should look into these sites
23:32 🔗 arkiver for signs of them dying slowly
23:33 🔗 * arkiver is afk for some food
23:33 🔗 arkiver also, let's move this to #archiveteam-bs
23:35 🔗 Stiletto has quit IRC (Read error: Connection reset by peer)
23:35 🔗 Stiletti has joined #archiveteam
23:42 🔗 BubuAnabe has quit IRC (Ping timeout: 268 seconds)
23:54 🔗 fazeeka has joined #archiveteam
23:56 🔗 Amigatari has quit IRC (Quit: Leaving)
