[00:07] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[00:11] *** marvinw is now known as ivan
[00:12] <ivan> JAA: want to write another web crawler using https://github.com/GoogleChrome/puppeteer? ;)
[00:15] <JAA> ivan: PurpleSym has been working on something like that. (Not with Node though, fortunately.)
[00:16] <JAA> https://github.com/PromyLOPh/crocoite
[00:17] <ivan> cool
[00:19] <JAA> I'm still considering building a minimal tool with aiohttp and warcio though, primarily for relatively simple stuff like APIs where the heavy machinery of a browser isn't needed.
[00:20] <JAA> But for now I'm busy rescuing the crashed CompuServe ArchiveBot job.
[00:22] <CoolCanuk> why anothe web crawler/
[00:26] <JAA> The browser-based one because wpull doesn't handle heavily scripted websites well. It also means we can support HTTP/2, which may improve performance as a side-effect. Furthermore, the traffic will look much more real, so it might help getting around (some) bans.
[00:26] <JAA> The other one because wpull has a ton of ugly bugs that make it quite annoying to work with at times.
[00:26] <CoolCanuk> :/
[00:26] *** bithippo has joined #archiveteam-bs
[00:27] <JAA> And I think aiohttp+warcio would also be more lightweight than wpull. It would be a quite simplistic and specialist tool for certain use cases only, in particular API archiving (I've done a few of those with wpull).
[00:27] <JAA> I have no intention of rebuilding wpull.
[00:27] <JAA> That time would be better spent in debugging wpull instead. It's a great tool, it just suffers from a number of issues that make it barely usable really.
[00:45] <godane> so i did 2 tapes
[00:45] <godane> one is Loch Ness Discovered tape
[00:45] <godane> another is called Titanic
[00:45] <godane> that aired on A&E
[00:46] <godane> both are from 1994
[00:49] *** pizzaiolo has joined #archiveteam-bs
[00:54] *** ZexaronS- has joined #archiveteam-bs
[00:55] *** ZexaronS has quit IRC (Read error: Operation timed out)
[01:00] <JAA> I'm impressed. My IP is still banned at Wine.Woot (since three weeks now).
[01:33] *** bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
[01:37] *** zalgo has joined #archiveteam-bs
[01:41] *** zalgo has quit IRC (Remote host closed the connection)
[01:45] *** zalgo has joined #archiveteam-bs
[02:03] *** pizzaiolo has quit IRC (Remote host closed the connection)
[02:06] *** zalgo has quit IRC (Remote host closed the connection)
[02:22] <ivan> I use grab-site and wpull 1.2.3 all the time and while it's useful only like 90% of the time I wouldn't call it barely usable
[02:23] <ivan> sure it would be nice to archive everything but there's plenty of stuff it can archive
[02:25] <ivan> without it maybe you'd be dealing with heritrix or httrack :-)
[02:27] <JAA> Oh yeah, 1.2.3 is pretty good. The plugin interface of 2.0 is much better though.
[02:27] <JAA> And yes, it's definitely the best tool we have.
[02:28] <JAA> (I'm using 1.2.3 for most of my manual grabs as well.)
[02:54] *** ZexaronS- has quit IRC (Leaving)
[03:22] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[03:36] *** BlueMaxim has joined #archiveteam-bs
[04:09] *** qw3rty116 has joined #archiveteam-bs
[04:15] *** qw3rty115 has quit IRC (Read error: Operation timed out)
[04:35] *** ranavalon has quit IRC (Remote host closed the connection)
[04:36] *** ranavalon has joined #archiveteam-bs
[05:43] *** Dimtree has quit IRC (Read error: Operation timed out)
[06:01] *** Dimtree has joined #archiveteam-bs
[06:02] *** Dimtree has quit IRC (Client Quit)
[06:03] *** Dimtree has joined #archiveteam-bs
[06:37] *** Dimtree has quit IRC (Read error: Operation timed out)
[06:38] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[06:41] *** Pixi has joined #archiveteam-bs
[06:49] *** Mateon1 has quit IRC (Remote host closed the connection)
[06:50] *** kimmer2 has quit IRC (Ping timeout: 633 seconds)
[06:50] *** Mateon1 has joined #archiveteam-bs
[07:00] *** kimmer2 has joined #archiveteam-bs
[07:10] *** omglolbah has quit IRC (Ping timeout: 250 seconds)
[07:10] *** tuluu has quit IRC (Read error: Operation timed out)
[07:11] *** tuluu has joined #archiveteam-bs
[07:12] *** Mateon1 has quit IRC (Read error: Operation timed out)
[07:13] *** Mateon1 has joined #archiveteam-bs
[07:15] *** Dimtree has joined #archiveteam-bs
[07:27] *** Asparagir has quit IRC (Asparagir)
[08:02] *** Mateon1 has quit IRC (Remote host closed the connection)
[08:04] *** Mateon1 has joined #archiveteam-bs
[08:14] <PurpleSym> JAA: Distributed archiving using celery and an IRC bot are functional, but not checked in yet. I’m setting up a testing environment currently.
[08:33] *** du_ has quit IRC (Ping timeout: 260 seconds)
[08:42] *** Mateon1 has quit IRC (Remote host closed the connection)
[08:43] *** Mateon1 has joined #archiveteam-bs
[10:14] *** godane has quit IRC (Read error: Operation timed out)
[10:25] *** godane has joined #archiveteam-bs
[10:57] *** pizzaiolo has joined #archiveteam-bs
[11:40] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:56] *** refeed has joined #archiveteam-bs
[11:59] *** refeed has quit IRC (Client Quit)
[11:59] *** refeed has joined #archiveteam-bs
[12:06] *** _refeed_ has joined #archiveteam-bs
[12:06] *** refeed has quit IRC (Read error: Connection reset by peer)
[12:08] *** Mateon1 has quit IRC (Remote host closed the connection)
[12:09] *** Mateon1 has joined #archiveteam-bs
[12:20] *** __refeed_ has joined #archiveteam-bs
[12:20] *** _refeed_ has quit IRC (Read error: Connection reset by peer)
[12:29] *** _refeed_ has joined #archiveteam-bs
[12:29] *** __refeed_ has quit IRC (Read error: Connection reset by peer)
[12:33] *** refeed has joined #archiveteam-bs
[12:33] *** refeed has quit IRC (Connection closed)
[12:34] *** _refeed_ has quit IRC (Read error: Connection reset by peer)
[12:34] *** refeed has joined #archiveteam-bs
[12:35] *** refeed has quit IRC (Client Quit)
[12:48] *** Mateon1 has quit IRC (Remote host closed the connection)
[12:49] *** Mateon1 has joined #archiveteam-bs
[13:10] *** Nugamus has joined #archiveteam-bs
[13:18] *** godane has quit IRC (Quit: Leaving.)
[13:23] *** TheLovina has quit IRC (Read error: Operation timed out)
[14:05] *** du_ has joined #archiveteam-bs
[14:21] *** K4k has quit IRC (Ping timeout: 260 seconds)
[14:29] *** K4k has joined #archiveteam-bs
[14:32] *** Stilett0 has joined #archiveteam-bs
[14:35] *** K4k has quit IRC (Ping timeout: 250 seconds)
[14:35] <JAA> SketchCow: Another random "a" file in item archiveteam_archivebot_go_20171210190001 (see my messages in here about three days ago). This time, it's a WARC again.
[14:35] <JAA> See also #archivebot
[14:40] *** K4k has joined #archiveteam-bs
[14:48] <JAA> Figured it out, it was a typo on one of the ArchiveBot pipelines. I hope it didn't overwrite any files.
[14:49] <JAA> I'll send you a list of what each of those files should be named via email.
[15:05] *** Mateon1 has quit IRC (Remote host closed the connection)
[15:06] *** Stilett0 is now known as Stiletto
[15:32] <DrasticAc> Okay, my first pass of the parsed Miiverse Database finished up over the weekend. I have to redownload some WARCs from IA that were corrupted and validate one more time to make sure I got everything, but it's a pretty good representation of what we got last month. Total size so far of the database is 191 GB.
[15:33] <DrasticAc> We saved 2,117,420 deleted posts (by marking their non-existence), 128,727,868 posts, and 206,473,819 replies. 
[15:33] <DrasticAc> We saved 69,955,548 drawings, of which 27,086,692 were posted in replies, which Nintendo didn't send to users in their "archive". 
[15:33] <DrasticAc> Likewise, we saved 72,985,588 screenshots, with 14,089,107 posted in replies.
[15:34] <DrasticAc> Here's a spreadsheet of the amount of posts and replies per day. https://usercontent.irccloud-cdn.com/file/UJeNuOnR/TotalTimestamp.xlsx
[15:37] <DrasticAc> Once I validate the database, I'll throw it on IA.
[15:44] *** refeed has joined #archiveteam-bs
[15:44] *** refeed has quit IRC (Connection closed)
[15:45] *** refeed has joined #archiveteam-bs
[15:56] *** _refeed_ has joined #archiveteam-bs
[16:00] *** refeed has quit IRC (Read error: Connection reset by peer)
[16:31] *** __refeed_ has joined #archiveteam-bs
[16:34] *** refeed has joined #archiveteam-bs
[16:35] *** _refeed_ has quit IRC (Read error: Connection reset by peer)
[16:38] *** __refeed_ has quit IRC (Read error: Connection reset by peer)
[16:38] <SketchCow> JAA: HI
[16:38] <SketchCow> Yeah, I'm all for fixing these after the fact
[16:38] <SketchCow> FOS is dying under this Manga Plus Everything Else
[16:39] *** _refeed_ has joined #archiveteam-bs
[16:39] <SketchCow> So I'm going to just upload and then we'll issue fixes and re-checks
[16:39] <Igloo> SketchCow: i've raised the first pull to fix the 2.0.3 issue
[16:39] <Igloo> I'm writing the changes to uploader now
[16:42] *** refeed has quit IRC (Read error: Connection reset by peer)
[16:45] <SketchCow> Great
[16:46] <SketchCow> But yeah, we are due lots of audit of already uploaded shiznat, frankly
[16:46] <SketchCow> I do it when I can
[16:46] <SketchCow> Right now, though, FOS is definitely heaving and I'm trying to fix that
[16:46] <SketchCow> Some of it is just because it's the new FOS and automatic upload processes weren't running
[16:46] <SketchCow> Other is MANGA
[16:46] <SketchCow> MANGA baby
[16:47] <JAA> Sure, we can fix that down the line as well.
[16:48] <JAA> Can we prevent files being overwritten though?
[16:48] <JAA> We probably lost about a third of Cracked due to that, for example.
[16:48] <SketchCow> Is cracked deleted?
[16:48] <SketchCow> I mean, we can re-do it
[16:49] <JAA> No, it's still online.
[16:49] <SketchCow> But everything I see is that it is on the rsync command side, not the server side.
[16:49] <SketchCow> So some --ignore-existing or something
[16:49] <JAA> Hmm.
[16:49] <SketchCow> So it's the pipeline, really
[16:49] <JAA> Yeah, it should be possible to do something there.
[16:49] <SketchCow> I don't THINK there's an rsyncd setting (or rsyncd.conf setting) to stop overwriting, but I'll look
[16:50] <SketchCow> https://download.samba.org/pub/rsync/rsyncd.conf.html
[16:50] <SketchCow> If someone sees something, I'll toss it in
[16:51] <SketchCow> I mean, I looked and it REALLY looks like it's on the rsync command's side to not be the asshole
[16:54] <JAA> Maybe something could be done with "filter".
[16:54] <JAA> "Files excluded by the daemon filter chain (daemon-excluded files) are treated as non-existent if the client tries to pull them, are skipped with an error message if the client tries to push them (triggering exit code 23), and are never deleted from the module."
[16:56] <JAA> That filter would have to be updated with the current file list after every upload, I think.
[16:56] <SketchCow> This is too much work.
[16:56] <SketchCow> Client just needs a new setting.
[16:56] *** __refeed_ has joined #archiveteam-bs
[16:57] <sec0nd> https://storify.com/faq-eol
[16:59] <SketchCow> ha ha wow
[17:00] *** _refeed_ has quit IRC (Read error: Connection reset by peer)
[17:02] <JAA> SketchCow: I can come up with a number of ugly workarounds. For example, periodically moving the files to a different directory with mv -n. The rsync target directory would just be an "inbox". If that move command is running every few minutes, the probability of anything being overwritten should be very small.
[17:02] *** CoolCanuk has joined #archiveteam-bs
[17:02] <JAA> Or chattr +i every file. (Needs to be undone to delete it after upload to the IA systems though, obviously.)
[17:03] <CoolCanuk> I heard JAA's IP is banned from wine.woot
[17:03] <JAA> In any case, yes, we should add a flag to the client side, but I'm not a fan of relying on that.
[17:03] <SketchCow> I think that if we don't rely on the clients being functional, we're sunk anyway
[17:04] <JAA> Well yeah, but the clients should only be able to do what they're supposed to do, i.e. push new data onto FOS.
[17:05] <joepie91> how about just using `incoming chmod` to set `ugo-w` for each transfer, preventing further writes to that file?
[17:05] <joepie91> not sure if that'll work
[17:05] *** K4k has quit IRC (Read error: Operation timed out)
[17:06] <joepie91> it doesn't seem to have an incoming chown option or I'd have suggested changing its owner to a non-rsync owner
[17:06] <JAA> I'd be surprised if it did, to be honest.
[17:06] <joepie91> worth a shot?
[17:06] <JAA> It's the write permission of the directory that dictates whether you're able to overwrite a file.
[17:06] <joepie91> I... don't think so?
[17:07] <JAA> touch file && chmod -w file && rm -f file
[17:07] <JAA> I don't think rm will restore the write permission before it deletes the file.
[17:07] <JAA> Deleting a file simply means removing the directory entry, so...
[17:07] *** sep332 has joined #archiveteam-bs
[17:07] <JAA> And you *can* overwrite read-only files with rsync, so...
[17:07] <joepie91> hm
[17:10] <Igloo> JAA: i'll update the uploader.py too
[17:10] <JAA> Thanks
[17:11] *** bwn has quit IRC (Read error: Operation timed out)
[17:22] *** Cyn_ has joined #archiveteam-bs
[17:24] *** BnAboyZ has quit IRC (Quit: The Lounge - https://thelounge.github.io)
[17:48] *** icedice has joined #archiveteam-bs
[17:49] *** schbirid has joined #archiveteam-bs
[17:56] *** _refeed_ has joined #archiveteam-bs
[18:01] *** __refeed_ has quit IRC (Read error: Connection reset by peer)
[18:07] *** K4k has joined #archiveteam-bs
[18:25] *** icedice has quit IRC (Read error: Operation timed out)
[18:29] *** icedice has joined #archiveteam-bs
[18:31] *** icedice2 has joined #archiveteam-bs
[18:35] *** icedice has quit IRC (Ping timeout: 260 seconds)
[18:54] *** bwn has joined #archiveteam-bs
[18:56] *** __refeed_ has joined #archiveteam-bs
[19:01] *** _refeed_ has quit IRC (Read error: Connection reset by peer)
[19:02] <hook54321> Does anyone know if myspleen invites are open?
[19:15] *** icedice has joined #archiveteam-bs
[19:16] *** icedice has quit IRC (Client Quit)
[19:20] *** icedice2 has quit IRC (Read error: Operation timed out)
[19:30] *** omglolbah has joined #archiveteam-bs
[19:37] *** jschwart has joined #archiveteam-bs
[19:37] *** ola_norsk has joined #archiveteam-bs
[19:38] <ola_norsk> does anyone know if "<item>/history/file/*~1~" serve a special purpose on internet archive?
[19:38] <astrid> example?
[19:39] <ola_norsk> the 'history' folder does not seem to show up in item editor
[19:39] <astrid> pls link to an example, i have not seen this before
[19:39] <ola_norsk> 1 sec
[19:43] <ola_norsk> in the item sowhat-vidme_archive, "ia ls <item>" returns several files listed with such a directory (history/files/),  e.g the file "history/files/20171111_YouTubers explain YT Censorship_Iv3WZ.description.~1~"
[19:44] <ola_norsk> and the ".~1~" seems to have been added to original filename
[19:44] <ola_norsk> filenames*
[19:45] <astrid> looks like it retains changes in a directory /history/files/ https://archive.org/download/sowhat-vidme_archive/history/files/
[19:45] <astrid> but other items that i uploaded and modified don't have the directory
[19:46] <ola_norsk> first time i've seen it
[19:46] <astrid> i know .~1~ is used as a version suffix for backup files that emacs and some other tools create
[19:47] <ola_norsk> is it possible it's just a case of wrong detection of file format?
[19:47] <ola_norsk> or do you mean e.g -re-uploaed files?
[19:47] <astrid> thinking the second one
[19:48] <Igloo> On that job a lot of work was re-done
[19:48] <Igloo> So likely
[19:48] <ola_norsk> the guy apparently has 1080+ videos on his vidme
[19:48] <ola_norsk> so 'a lot of work' might also make sense :D
[19:49] <Igloo> We've done somewhere around 70TB of data from vid.me
[19:49] <Igloo> It's around 1.2PB in size
[19:50] <ola_norsk> i simply focused on some channels with videos that i know are not on youtube.. E.g "AfterPrisonJoe" from "AfterPrisonShow" on youtube
[19:51] <ola_norsk> wasn't it 600TB ?
[19:51] <Igloo> That was our estimate. We got hold of someone who worked there, It's a lot more...
[19:51] <ola_norsk> doh
[19:54] <ola_norsk> maybe 'job scripts' should simply be made the moment a new service pops up :D
[19:55] <ola_norsk> though there's no doubt a majority of that 1.2PB is most likely already existing on youtube
[19:56] *** _refeed_ has joined #archiveteam-bs
[20:00] *** __refeed_ has quit IRC (Read error: Connection reset by peer)
[20:01] <ola_norsk> a bit is the users fault though, as i see it. There's e.g no reason for a having a news commentary video in 1080p, max bitrate, and at 60fps..
[20:02] <ola_norsk> example, the 'reading scary creepypasta stories'
[20:03] <ola_norsk> most often composed of someone narrating over what's basically slideshows of still images
[20:04] <JAA> That should compress extremely well though.
[20:04] <ola_norsk> yeah, but if there's smoke effects over it
[20:06] *** __refeed_ has joined #archiveteam-bs
[20:08] <ola_norsk> i don't know how compression on vidme worked. But most of the files in the item i mentioned at join are basically the person browsing german newspapers while delivering english commentary. Some are 1 hour +
[20:09] <ola_norsk> and i think i saw one of those 1+h videoes being 1+ GB
[20:09] <ola_norsk> i'm not saying i pirate movies, but 1.3GB seems to be well for a 1080p movie
[20:10] *** _refeed_ has quit IRC (Read error: Connection reset by peer)
[20:12] <Frogging> 1.3GB for a full length movie at 1080p is bitrate starved
[20:14] <ola_norsk> 'starved' ?
[20:20] <Aoede> it looks like shit
[20:20] <Frogging> yeah, as in, not enough bitrate to maintain a high level of quality at that resolution, at least for a live action movie. if a 1.3GB video is 2 hours long, the average bitrate is about 1.4Mbps. it's subjective and does depend on the codec, but to me it seems low for 1080p.
[20:21] <ola_norsk> "eye of the beholder then" :)
[20:21] <zino> It measurably bad in the psychovisual model.
[20:22] <ola_norsk> 'measurably worse'...'bad' is relative
[20:22] <ola_norsk> but anyway, i'm not codec expert :D
[20:23] <JAA> Who doesn't like nice blocks and banding? /s
[20:24] <Frogging> JAA: VP9 pisses me off with how much banding it creates in dark scenes 
[20:24] <Frogging> even at relatively high nitrates 
[20:25] <Frogging> bit
[20:29] <JAA> Don't have much experience with VP9.
[20:29] <Frogging> feel like it's a step backwards from H.264 at least in that record 
[20:29] <Frogging> regard* (autocorrect) 
[20:30] <JAA> Uhm, did IA just change the /save page?
[20:30] <ola_norsk> :/
[20:30] <ola_norsk> web.archive.org/save ?
[20:31] <JAA> I'm getting a page with a textarea "List of archived elements in reverse order" instead of the usual save page.
[20:31] <JAA> Yeah
[20:31] <ola_norsk> i hope not
[20:31] <JAA> It also doesn't actually archive anything for me.
[20:31] <JAA> As far as I can tell anyway.
[20:31] <Igloo> Shows same as normal to me
[20:32] <ola_norsk> as long as it's just the looks
[20:33] *** Coderjo has quit IRC (Read error: Operation timed out)
[20:33] *** yipdw has quit IRC (Read error: Operation timed out)
[20:33] <ola_norsk> how else could i be capturing rather repetitiv tweets about #netneutrality :/
[20:33] <Igloo> #archivebot
[20:33] <Igloo> ;-)
[20:33] *** me has joined #archiveteam-bs
[20:33] <ola_norsk> wouldn't that mean i had to download it first?
[20:34] <JAA> https://share.riseup.net/#tuQ-OKPcwjop11jI_XzhzA
[20:34] <Igloo> Nope, Just feed it URLs
[20:34] <JAA> It's just sitting there forever.
[20:34] <Igloo> And then it does the rest
[20:34] <MrRadar2> For Twitter, make sure you use --phantomjs with archivebot
[20:34] <MrRadar2> Since Twitter uses JS for pretty much everything
[20:34] <ola_norsk> it would autoscroll?
[20:34] <Igloo> Indeed
[20:34] <JAA> No, because it's broken.
[20:35] <Igloo> Use the right pipeline and it works ;-)
[20:35] <JAA> Well, yeah.
[20:35] <JAA> Does it work on yours?
[20:35] <Igloo> On CA it does
[20:35] <Igloo> I haven't tested the rest
[20:36] <JAA> Let's move this to over there.
[20:36] *** atomicthu has quit IRC (Ping timeout: 260 seconds)
[20:36] <JAA> Nope, can't save anything into the Wayback Machine right now. Ew.
[20:38] *** atomicthu has joined #archiveteam-bs
[20:41] *** kimmer1 has joined #archiveteam-bs
[20:42] *** Coderjo has joined #archiveteam-bs
[20:46] *** kimmer12 has quit IRC (Read error: Operation timed out)
[20:46] <CoolCanuk> :O
[20:46] <CoolCanuk> did WE break it?!?
[20:47] <JAA> Lol no
[20:48] <Frogging> /save/ appears broken for me too
[20:52] <hook54321> same
[20:56] *** _refeed_ has joined #archiveteam-bs
[20:57] <JAA> Good to see that I'm not the only one. I've sent an email to info@ just in case they don't know about it already.
[21:00] *** __refeed_ has quit IRC (Read error: Connection reset by peer)
[21:11] *** _refeed_ has quit IRC (Leaving)
[21:20] <hook54321> Somebody2: I noticed in #archivebot logs a few months ago you tagged me and a few other people and said that the owner of autistics.org wanted to pass the site onto someone else, and that we could redirect stuff to pages on the wayback machine. If that's still open I'd be interested in that.
[21:36] <ola_norsk> the problem with 'tweep' is that it seems to be oriented to specified user account
[21:37] <ola_norsk> oooops', no it doesn't "python tweep.py -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets."
[21:40] <JAA> Yep, and it looks like you can just search for the hashtag there.
[21:41] <JAA> (For those out of the loop, we're talking about https://github.com/haccer/tweep )
[21:44] <ola_norsk> what if it just bounce around on twitter domain, checking every tweet? :D that could take some time :D
[21:45] <JAA> Have fun with that.
[21:45] <ola_norsk> skål!
[22:17] <JAA> Got a reply from Mark, they're "testing an update to page of our Save Page Now service". He asked me to try again, and my saves went through this time.
[22:23] *** Cyn_ has quit IRC (Ping timeout: 262 seconds)
[22:25] <ola_norsk> tweep appears to be broken :/
[22:25] <antomatic> Nyaaaa! YouTube won't let accounts upload more than 100 videos per day.
[22:26] <antomatic> Progress on the news archive is gonna be slow. :)
[22:27] <ola_norsk> antomatic: at least that's not a sign it's all going to hell :)
[22:30] *** schbirid has quit IRC (Quit: Leaving)
[22:31] *** PoorHomes has joined #archiveteam-bs
[22:32] *** PoorHomie has quit IRC (Quit: No)
[22:33] <ola_norsk> it's about high time users adopt a pessim-op-tic view of "the internet" https://youtu.be/1VD_pJOFnZ0
[22:33] <antomatic> If I upload 100 items each and every day, then I think I can get everything up to the end of 2017 uploaded... by the end of 2018. :)
[22:33] *** PoorHomes is now known as PoorHomie
[22:34] <ola_norsk> antomatic: if i had to guess, the cause might be users of several nations trying to reupload their catalog that they moved to vidme
[22:35] <antomatic> Heh. I think it's just a generic dont-spam-us-too-hard measure. :)
[22:36] <ola_norsk> antomatic: that, or youtube's wallet is getting a bit scrawny for Google to be happy with.. :/
[22:36] <antomatic> This is a waste of time anyway with the 3 strikes rule, because /someone/ is bound to object hard enough to it and the whole lot comes down
[22:37] <antomatic> still, gives the opportunity to trim the recordings nicely, collate some metadata and run out the caption transcripts.
[22:37] <hook54321> antomatic: Can you upload them to archive.org?
[22:38] <ola_norsk> the entire internet needs a trimming, there's too much monopoly going on
[22:38] <antomatic> I don't know if archive.org necessarily wants 30tb+ of news bulletins though. :)
[22:39] <astrid> did you email info@ like we discussed
[22:39] <antomatic> did we discussed? sorry, must have missed that
[22:40] <hook54321> Don't they already record tv news though?
[22:40] <antomatic> US/International yes, I think, but I don't know if they record UK domestic 
[22:41] <astrid> yeah we spoke about this a few days ago
[22:41] *** pizzaiolo has quit IRC (Ping timeout: 246 seconds)
[22:41] <astrid> look in your logs around 5th december
[22:41] <astrid> you and i
[22:42] <hook54321> I asked Jason if IA records Catalan TV once and he said "ish"
[22:43] <ola_norsk> 'so and so' recording?
[22:43] <antomatic> astird: ah, yes, I did see that. 
[22:43] <antomatic> thx
[22:44] <antomatic> wanted to check whether I had a decent process for transcoding & transcribing it all first
[22:44] <ola_norsk> where can i find which channels IA records? most of NRK (norwegian state television) is nation locked
[22:45] <antomatic> oops, sorry, astrid. thanks. typo. :)
[22:45] <astrid> IA will transcode it
[22:45] <astrid> transcription is something else
[22:46] <ola_norsk> youtube caption autogenerated is horrible :(
[22:46] <antomatic> Mm, some of the source files are a little proprietary, but I can bounce them out to .TS which I believe IA accepts
[22:46] <antomatic> I can extract the live captioning to SRT, which should be /something/ 
[22:47] <astrid> oh that's good
[22:47] <antomatic> although the accuracy is a little ropey. seems to have improved some over the last few years
[22:47] <antomatic> YouTube's autosubs are actually *amazingly* good - in their own way - compared to the broadcast subs 
[22:48] <ola_norsk> not only is it vtt, but their translatiation is shit. Even at rather clean english (i think?) https://youtu.be/fpZD_3D8_WQ
[22:48] <antomatic> But they totally fall to bits when background noise reaches a certain level
[22:48] <ola_norsk> ah...
[22:48] <antomatic> They've improved *enormously* over the last year or so
[22:48] <ola_norsk> that video got that :D
[22:48] <antomatic> still a long way from perfect though
[22:49] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[22:49] <JAA> hook54321: CCMA?
[22:50] *** Pixi has joined #archiveteam-bs
[22:51] <ola_norsk> antomatic: and yet, there's Google Deepmind AI able to master chess after just 4 hours..
[22:51] * ola_norsk is not impressed
[22:54] <ola_norsk> if i was more Alex Jonesy, i'd be likely to say that when these supermachines beat human champs in Chess and Go, it's got to do with money paid to the looser...
[22:54] <antomatic> Did you see that video of Alex Jones 'interviewing' an Amazon Echo? :)
[22:54] <ola_norsk> lol no
[22:55] <ola_norsk> I like Alex 'coloidal silver niacent' Jones :D
[22:55] <antomatic> He's all like 'Alexa, who is Jeff Bezos?', 'Alexa, you are lying to me', etc. Nuts. :)
[22:56] <ola_norsk> Alex's got 'Brain Power' to prevent being beaten by any disinformation or misleading
[22:57] <antomatic> Hehe. :)
[22:57] <MrRadar2> Do you have a link to that "interview", sounds hilarious
[22:57] <ola_norsk> aye, givvus link
[22:58] <antomatic> https://www.youtube.com/watch?v=u5kNP7tyhk8
[22:59] <MrRadar2> thx
[22:59] <MrRadar2> I'll be watching that in a private browsing window so as to not pollute my Youtube history
[22:59] *** JAA sets mode: +b *!*@185.143.40.157
[23:00] <JAA> (Just a preventive ban for a spammer that showed up in #vidmeh and #miiworse)
[23:00] <ola_norsk> lol, i love how he's stern when saying "Alexa! ; .." ..like talking to a little girl child :D
[23:00] <MrRadar2> "Alexa, I have mainstream news articles that Amazon is owned by the CIA" XD
[23:01] <ola_norsk> "Aleeexaaaa; did you steal <item> out of fridge?"
[23:01] *** BlueMaxim has joined #archiveteam-bs
[23:02] <MrRadar2> LOL, Alex Jones fails the Turing test
[23:02] <antomatic> That *look* on his face, like he's seriously thinking everything through and knows that if he asks the right question he'll blow the whole thing open at any moment. :)
[23:03] <ola_norsk> lol
[23:03] <ola_norsk> "Alexaaaaa! Have you finished your homework?"
[23:04] <ola_norsk> "Alexaaa! Did you take out the trash?"
[23:04] <MrRadar2> Wait, did he just spin that whole crazy video out of that deal that Amazon made to have a dedicatd AWS region for the federal government???
[23:06] <ola_norsk> well, i guess it's settled, it's the "ultimate control system"
[23:06] <antomatic> Mm, there was a viral [jukin] the day before where someone asked their Alexa about the CIA and it turned off, apparently. 
[23:07] <antomatic> (well, just didn't answer, really)
[23:07] <ola_norsk> for it to detect even "Alexa!", it DOES mean that it's listening all the time though..
[23:08] <antomatic> mm, it's listening locally (not streaming to the cloud) until it hears 'Alexa', then it sends your voice  to the cloud to be recognised/answered. 
[23:08] <antomatic> supposedly.
[23:09] <antomatic> but of course, it's an internet-connected mic and how it works is just down to the software. 
[23:11] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[23:11] <PoorHomie> But TBF, the mute button on the top is a hardwired switch
[23:11] <antomatic> ah, nice
[23:12] <PoorHomie> Which shows at least a little bit about security
[23:12] *** CoolCanuk has joined #archiveteam-bs
[23:12] <PoorHomie> Which shows at least Amazon cares a little bit about security *
[23:12] <antomatic> important for when Alex is discussing international affairs. :)
[23:12] <antomatic> Whoa. Alex. Alexa. How deep does the conspiracy go? :)
[23:13] <ola_norsk> someone needs to check out Alexa with wireshark! See when she says something to the internet and when she doesn't!
[23:13] <ola_norsk> lol
[23:13] <PoorHomie> it's been checked to death
[23:13] <ola_norsk> aye
[23:13] <antomatic> "Alexa. Order ladies' panties."
[23:13] <PoorHomie> it only streams when you say Alexa
[23:13] <PoorHomie> period
[23:14] <PoorHomie> But of course, whoever controls the update keys can change that at any second, without your knowledge
[23:14] *** JAA sets mode: +b botnickna!*@*
[23:14] *** JAA sets mode: +b *!*@185.143.40.*
[23:14] <astrid> do not adjust your set. we control the signing, we control the update.
[23:14] <ola_norsk> they should pick starwars names like R2D2 and shit like that..There's people named Siri and Alexa :D
[23:15] <PoorHomie> You can change the trigger word to "Amazon"
[23:15] <antomatic> Also 'Echo' and now 'Computer' apparently, which is especially excellent.
[23:15] <PoorHomie> but in my household that would be even more annoying
[23:16] <PoorHomie> we use the shit out of prime so every 4th word is amazon lol
[23:16] <astrid> PoorHomie: are you here to archive or to talk about irrelevant shit
[23:19] <ola_norsk> is there anyway to figure out at what point 'tweep' (python) fails ?
[23:19] <ola_norsk> without having to rewrite the entire damn thing :/
[23:20] *** JAA sets mode: +b *!*@37.237.65.*
[23:20] *** JAA sets mode: +b Ya_ALLAH_!*@*
[23:20] <ola_norsk> a python debugger of sorts?.. :/
[23:21] <PoorHomie> @astrid, excuse me? 
[23:21] <JAA> ola_norsk: How does it fail?
[23:21] <ola_norsk> JAA: quietly :/
[23:22] <ola_norsk> JAA: no output whatsoever, no file, no text output
[23:25] <JAA> Hmm, that's odd.
[23:25] <ola_norsk> aye
[23:26] <ola_norsk> the 'pip install image' command in requisties section on their page is apparently outdated though
[23:27] <ola_norsk> so that failed, and apparently that is reference to PIL, witch is currently 'Pillow" :/
[23:28] <ola_norsk> when running 'pip install image' it even looked for Django..so something is odd
[23:28] <ola_norsk> could be my machine is messed up though
[23:29] <JAA> You could just comment out the from PIL import Image line in the script if you don't use --pics.
[23:31] <ola_norsk> i've only tried the 2 non-user specific commands "python tweep.py -s pineapple" , "python tweep.py -s "Donald Trump" --verified --users"
[23:32] <ola_norsk> no --picture argument added to them
[23:32] <ola_norsk> both fail without any output
[23:32] <JAA> Hmm
[23:33] <ola_norsk> if there was output, i'd have something to go by
[23:33] <ola_norsk> JAA: what python version you use?
[23:34] *** Atom has quit IRC (Read error: Connection reset by peer)
[23:34] <ola_norsk> i tried using it on 2.7.14
[23:34] <JAA> ola_norsk: I can reproduce that with 2.7.something.
[23:34] *** BnAboyZ has joined #archiveteam-bs
[23:35] *** MrDignity has joined #archiveteam-bs
[23:36] <ola_norsk> PoorHomie: let's rolle the pinapple https://youtu.be/zwF_mGA-YQg
[23:36] <ola_norsk> PoorHomie: ;)
[23:38] <ola_norsk> PoorHomie: just tell astrid to mute you at discretion, and you'd be fine :)
[23:38] <JAA> ola_norsk: pip install lxml
[23:38] <JAA> Undocumented requirement of the script, apparently.
[23:38] <vantec> ola_norsk: pip install django==1.11.8
[23:38] <JAA> Then it works when commenting out the PIL line.
[23:39] <ola_norsk> ty
[23:39] <vantec> django2.0+ is for Python 3
[23:40] <JAA> And that's excellent. Python 2 needs to die already.
[23:41] *** Atom has joined #archiveteam-bs
[23:42] <ola_norsk> might as well used c++ the strict way python 3 is going :D
[23:45] <ola_norsk> Collecting lmxl
[23:45] <ola_norsk>   Could not find a version that satisfies the requirement lmxl (from versions: )
[23:45] <ola_norsk> No matching distribution found for lmxl
[23:45] <ola_norsk> JAA: what distro are you using?
[23:45] <JAA> Debian
[23:46] <ola_norsk> ill try that before i mess up even more
[23:46] <JAA> What are you running?
[23:46] <ola_norsk> lubuntu
[23:46] <JAA> Hmm
[23:47] <JAA> lxml should be in the repos, I think.
[23:47] <ola_norsk> lubuntu Ubuntu 17.10 \n \l
[23:47] <JAA> You could try installing the python-lxml package.
[23:47] <JAA> Otherwise, you might need a bunch of -dev packages of various libraries to install from source.
[23:48] <ola_norsk> ty
[23:48] <JAA> This will install it system-wide, obviously. Just so you're aware of that.
[23:48] <ola_norsk> yeah but it doesn't matter
[23:48] <JAA> I'm using pyenv to handle numerous independent installations of Python etc.
[23:49] <ola_norsk> that worked though
[23:49] <ola_norsk> i'm not sure what's scrolling by at the momment, but somethings working
[23:50] <JAA> It just prints tweet ID, date, username, and message.
[23:50] <ola_norsk> aye, "python tweep.py -s "Donald Trump" --verified --users" seems to be usernames
[23:51] <ola_norsk> it's scrolling as fuck though :D
[23:51] <ola_norsk> ty!
[23:53] <ola_norsk> if e.g "940731016880312321" is the id of a tweet, urls can be constructed from that
[23:53] <JAA> Yep, together with the username.
[23:53] <ola_norsk> 940731016880312321 2017-12-12 23:52:07 CET <drkbri> u.s. congress after they vote to have net neutrality repealed then realize they gotta pay the package deals just like the rest of us. (go save net neutrality y’all: http://battleforthenet.com )pic.twitter.com/WdM
[23:54] <ola_norsk> maybe <drkbri> is user?
[23:54] <ola_norsk> 940731013398904832 2017-12-12 23:52:06 CET <m4rwaosm> Keaton Jones ™ is a distraction put by the US Government to distract you from Net Neutrality
[23:54] <JAA> Yes, that's the username.
[23:54] <ola_norsk> this is badass
[23:55] <ola_norsk> waybackmachine is going to feel this :D
[23:55] <JAA> So the URL for the first one would be https://twitter.com/drkbri/status/940731016880312321
[23:56] <ola_norsk> wget get requisites getting, i think this is a winner
[23:56] <ola_norsk> with*
[23:56] <JAA> ola_norsk: python tweep.py -s pineapple | head | awk '{print "https://twitter.com/" substr($5, 2, length($5) - 2) "/status/" $1}'
[23:59] <ola_norsk> anyone good at math to calculate what a file containing link to one tweet, back to ~2006 might require of storage?
[23:59] <JAA> I don't understand the question.