#archiveteam-bs 2017-12-12,Tue

Time Nickname Message
00:07 BlueMaxim has quit IRC (Read error: Connection reset by peer)
00:11 marvinw is now known as ivan
00:12 ivan JAA: want to write another web crawler using https://github.com/GoogleChrome/puppeteer? ;)
00:15 JAA ivan: PurpleSym has been working on something like that. (Not with Node though, fortunately.)
00:16 JAA https://github.com/PromyLOPh/crocoite
00:17 ivan cool
00:19 JAA I'm still considering building a minimal tool with aiohttp and warcio though, primarily for relatively simple stuff like APIs where the heavy machinery of a browser isn't needed.
00:20 JAA But for now I'm busy rescuing the crashed CompuServe ArchiveBot job.
00:22 CoolCanuk why another web crawler?
00:26 JAA The browser-based one because wpull doesn't handle heavily scripted websites well. It also means we can support HTTP/2, which may improve performance as a side-effect. Furthermore, the traffic will look much more real, so it might help with getting around (some) bans.
00:26 JAA The other one because wpull has a ton of ugly bugs that make it quite annoying to work with at times.
00:26 CoolCanuk :/
00:26 bithippo has joined #archiveteam-bs
00:27 JAA And I think aiohttp+warcio would also be more lightweight than wpull. It would be a quite simplistic and specialist tool for certain use cases only, in particular API archiving (I've done a few of those with wpull).
00:27 JAA I have no intention of rebuilding wpull.
00:27 JAA That time would be better spent in debugging wpull instead. It's a great tool, it just suffers from a number of issues that make it barely usable really.
00:45 godane so i did 2 tapes
00:45 godane one is Loch Ness Discovered tape
00:45 godane another is called Titanic
00:45 godane that aired on A&E
00:46 godane both are from 1994
00:49 pizzaiolo has joined #archiveteam-bs
00:54 ZexaronS- has joined #archiveteam-bs
00:55 ZexaronS has quit IRC (Read error: Operation timed out)
01:00 JAA I'm impressed. My IP is still banned at Wine.Woot (for three weeks now).
01:33 bithippo has quit IRC (My MacBook Air has gone to sleep. ZZZzzz…)
01:37 zalgo has joined #archiveteam-bs
01:41 zalgo has quit IRC (Remote host closed the connection)
01:45 zalgo has joined #archiveteam-bs
02:03 pizzaiolo has quit IRC (Remote host closed the connection)
02:06 zalgo has quit IRC (Remote host closed the connection)
02:22 ivan I use grab-site and wpull 1.2.3 all the time and while it's useful only like 90% of the time I wouldn't call it barely usable
02:23 ivan sure it would be nice to archive everything but there's plenty of stuff it can archive
02:25 ivan without it maybe you'd be dealing with heritrix or httrack :-)
02:27 JAA Oh yeah, 1.2.3 is pretty good. The plugin interface of 2.0 is much better though.
02:27 JAA And yes, it's definitely the best tool we have.
02:28 JAA (I'm using 1.2.3 for most of my manual grabs as well.)
02:54 ZexaronS- has quit IRC (Leaving)
03:22 CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
03:36 BlueMaxim has joined #archiveteam-bs
04:09 qw3rty116 has joined #archiveteam-bs
04:15 qw3rty115 has quit IRC (Read error: Operation timed out)
04:35 ranavalon has quit IRC (Remote host closed the connection)
04:36 ranavalon has joined #archiveteam-bs
05:43 Dimtree has quit IRC (Read error: Operation timed out)
06:01 Dimtree has joined #archiveteam-bs
06:02 Dimtree has quit IRC (Client Quit)
06:03 Dimtree has joined #archiveteam-bs
06:37 Dimtree has quit IRC (Read error: Operation timed out)
06:38 Pixi has quit IRC (Ping timeout: 255 seconds)
06:41 Pixi has joined #archiveteam-bs
06:49 Mateon1 has quit IRC (Remote host closed the connection)
06:50 kimmer2 has quit IRC (Ping timeout: 633 seconds)
06:50 Mateon1 has joined #archiveteam-bs
07:00 kimmer2 has joined #archiveteam-bs
07:10 omglolbah has quit IRC (Ping timeout: 250 seconds)
07:10 tuluu has quit IRC (Read error: Operation timed out)
07:11 tuluu has joined #archiveteam-bs
07:12 Mateon1 has quit IRC (Read error: Operation timed out)
07:13 Mateon1 has joined #archiveteam-bs
07:15 Dimtree has joined #archiveteam-bs
07:27 Asparagir has quit IRC (Asparagir)
08:02 Mateon1 has quit IRC (Remote host closed the connection)
08:04 Mateon1 has joined #archiveteam-bs
08:14 PurpleSym JAA: Distributed archiving using celery and an IRC bot are functional, but not checked in yet. I’m setting up a testing environment currently.
08:33 du_ has quit IRC (Ping timeout: 260 seconds)
08:42 Mateon1 has quit IRC (Remote host closed the connection)
08:43 Mateon1 has joined #archiveteam-bs
10:14 godane has quit IRC (Read error: Operation timed out)
10:25 godane has joined #archiveteam-bs
10:57 pizzaiolo has joined #archiveteam-bs
11:40 BlueMaxim has quit IRC (Quit: Leaving)
11:56 refeed has joined #archiveteam-bs
11:59 refeed has quit IRC (Client Quit)
11:59 refeed has joined #archiveteam-bs
12:06 _refeed_ has joined #archiveteam-bs
12:06 refeed has quit IRC (Read error: Connection reset by peer)
12:08 Mateon1 has quit IRC (Remote host closed the connection)
12:09 Mateon1 has joined #archiveteam-bs
12:20 __refeed_ has joined #archiveteam-bs
12:20 _refeed_ has quit IRC (Read error: Connection reset by peer)
12:29 _refeed_ has joined #archiveteam-bs
12:29 __refeed_ has quit IRC (Read error: Connection reset by peer)
12:33 refeed has joined #archiveteam-bs
12:33 refeed has quit IRC (Connection closed)
12:34 _refeed_ has quit IRC (Read error: Connection reset by peer)
12:34 refeed has joined #archiveteam-bs
12:35 refeed has quit IRC (Client Quit)
12:48 Mateon1 has quit IRC (Remote host closed the connection)
12:49 Mateon1 has joined #archiveteam-bs
13:10 Nugamus has joined #archiveteam-bs
13:18 godane has quit IRC (Quit: Leaving.)
13:23 TheLovina has quit IRC (Read error: Operation timed out)
14:05 du_ has joined #archiveteam-bs
14:21 K4k has quit IRC (Ping timeout: 260 seconds)
14:29 K4k has joined #archiveteam-bs
14:32 Stilett0 has joined #archiveteam-bs
14:35 K4k has quit IRC (Ping timeout: 250 seconds)
14:35 JAA SketchCow: Another random "a" file in item archiveteam_archivebot_go_20171210190001 (see my messages in here about three days ago). This time, it's a WARC again.
14:35 JAA See also #archivebot
14:40 K4k has joined #archiveteam-bs
14:48 JAA Figured it out, it was a typo on one of the ArchiveBot pipelines. I hope it didn't overwrite any files.
14:49 JAA I'll send you a list of what each of those files should be named via email.
15:05 Mateon1 has quit IRC (Remote host closed the connection)
15:06 Stilett0 is now known as Stiletto
15:32 DrasticAc Okay, my first pass of the parsed Miiverse Database finished up over the weekend. I have to redownload some WARCs from IA that were corrupted and validate one more time to make sure I got everything, but it's a pretty good representation of what we got last month. Total size so far of the database is 191 GB.
15:33 DrasticAc We saved 2,117,420 deleted posts (by marking their non-existence), 128,727,868 posts, and 206,473,819 replies.
15:33 DrasticAc We saved 69,955,548 drawings, of which 27,086,692 were posted in replies, which Nintendo didn't send to users in their "archive".
15:33 DrasticAc Likewise, we saved 72,985,588 screenshots, with 14,089,107 posted in replies.
15:34 DrasticAc Here's a spreadsheet of the number of posts and replies per day. https://usercontent.irccloud-cdn.com/file/UJeNuOnR/TotalTimestamp.xlsx
15:37 DrasticAc Once I validate the database, I'll throw it on IA.
15:44 refeed has joined #archiveteam-bs
15:44 refeed has quit IRC (Connection closed)
15:45 refeed has joined #archiveteam-bs
15:56 _refeed_ has joined #archiveteam-bs
16:00 refeed has quit IRC (Read error: Connection reset by peer)
16:31 __refeed_ has joined #archiveteam-bs
16:34 refeed has joined #archiveteam-bs
16:35 _refeed_ has quit IRC (Read error: Connection reset by peer)
16:38 __refeed_ has quit IRC (Read error: Connection reset by peer)
16:38 SketchCow JAA: HI
16:38 SketchCow Yeah, I'm all for fixing these after the fact
16:38 SketchCow FOS is dying under this Manga Plus Everything Else
16:39 _refeed_ has joined #archiveteam-bs
16:39 SketchCow So I'm going to just upload and then we'll issue fixes and re-checks
16:39 Igloo SketchCow: i've raised the first pull to fix the 2.0.3 issue
16:39 Igloo I'm writing the changes to uploader now
16:42 refeed has quit IRC (Read error: Connection reset by peer)
16:45 SketchCow Great
16:46 SketchCow But yeah, we are due lots of audit of already uploaded shiznat, frankly
16:46 SketchCow I do it when I can
16:46 SketchCow Right now, though, FOS is definitely heaving and I'm trying to fix that
16:46 SketchCow Some of it is just because it's the new FOS and automatic upload processes weren't running
16:46 SketchCow Other is MANGA
16:46 SketchCow MANGA baby
16:47 JAA Sure, we can fix that down the line as well.
16:48 JAA Can we prevent files being overwritten though?
16:48 JAA We probably lost about a third of Cracked due to that, for example.
16:48 SketchCow Is cracked deleted?
16:48 SketchCow I mean, we can re-do it
16:49 JAA No, it's still online.
16:49 SketchCow But everything I see is that it is on the rsync command side, not the server side.
16:49 SketchCow So some --ignore-existing or something
16:49 JAA Hmm.
16:49 SketchCow So it's the pipeline, really
16:49 JAA Yeah, it should be possible to do something there.
16:49 SketchCow I don't THINK there's an rsyncd setting (or rsyncd.conf setting) to stop overwriting, but I'll look
16:50 SketchCow https://download.samba.org/pub/rsync/rsyncd.conf.html
16:50 SketchCow If someone sees something, I'll toss it in
16:51 SketchCow I mean, I looked and it REALLY looks like it's on the rsync command's side to not be the asshole
16:54 JAA Maybe something could be done with "filter".
16:54 JAA "Files excluded by the daemon filter chain (daemon-excluded files) are treated as non-existent if the client tries to pull them, are skipped with an error message if the client tries to push them (triggering exit code 23), and are never deleted from the module."
16:56 JAA That filter would have to be updated with the current file list after every upload, I think.
16:56 SketchCow This is too much work.
16:56 SketchCow Client just needs a new setting.
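A minimal sketch of the client-side fix SketchCow describes, assuming the upload step shells out to rsync. `--ignore-existing` is a real rsync option that skips any file already present on the receiving side, so a name collision leaves the copy on FOS untouched. The helper name, file name, and rsync module URL are hypothetical placeholders, not the actual ArchiveBot pipeline code.

```python
import shlex


def build_upload_command(local_path, remote_target):
    """Build an rsync argv that never replaces a file already on the server.

    Hypothetical helper for illustration only. --ignore-existing makes rsync
    skip files that already exist on the receiver instead of updating them.
    """
    return ["rsync", "-av", "--ignore-existing", local_path, remote_target]


# Hypothetical local file and rsync module URL.
cmd = build_upload_command("example.warc.gz", "rsync://fos.example.org/archivebot/")
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

A skipped file is not treated as an error, so the colliding upload simply never arrives; the pipeline would still have to notice that. And, as JAA says, the protection only holds for clients that actually pass the flag.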
16:56 __refeed_ has joined #archiveteam-bs
16:57 sec0nd https://storify.com/faq-eol
16:59 SketchCow ha ha wow
17:00 _refeed_ has quit IRC (Read error: Connection reset by peer)
17:02 JAA SketchCow: I can come up with a number of ugly workarounds. For example, periodically moving the files to a different directory with mv -n. The rsync target directory would just be an "inbox". If that move command is running every few minutes, the probability of anything being overwritten should be very small.
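JAA's "inbox" workaround could look roughly like this in Python. It is a hedged illustration, not an existing ArchiveTeam script. It assumes the inbox and the store directory are on the same filesystem, and that rsync uses its default behaviour of receiving into a hidden, dot-prefixed temporary file that is renamed into place when complete, so dotfiles are skipped. `os.link` refuses to create a name that already exists, which turns the existence check and the move into a single step.

```python
import os
from pathlib import Path


def sweep_inbox(inbox, store):
    """Move completed uploads from inbox to store without replacing anything.

    Returns the names left behind in the inbox because a file with the same
    name already exists in the store (these need a human to look at them).
    """
    collisions = []
    for entry in sorted(Path(inbox).iterdir()):
        # Skip directories and hidden files, e.g. rsync's in-progress temp files.
        if not entry.is_file() or entry.name.startswith("."):
            continue
        target = Path(store) / entry.name
        try:
            # Creating the hard link fails with FileExistsError if the target
            # exists, so a file already in the store is never overwritten.
            os.link(entry, target)
        except FileExistsError:
            collisions.append(entry.name)
            continue
        os.unlink(entry)  # the data now lives only under the store directory
    return collisions
```

Run from cron every few minutes, this leaves only the window JAA mentions: two uploads with the same name reaching the inbox between two sweeps, where the second still replaces the first.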
17:02 CoolCanuk has joined #archiveteam-bs
17:02 JAA Or chattr +i every file. (Needs to be undone to delete it after upload to the IA systems though, obviously.)
17:03 CoolCanuk I heard JAA's IP is banned from wine.woot
17:03 JAA In any case, yes, we should add a flag to the client side, but I'm not a fan of relying on that.
17:03 SketchCow I think that if we don't rely on the clients being functional, we're sunk anyway
17:04 JAA Well yeah, but the clients should only be able to do what they're supposed to do, i.e. push new data onto FOS.
17:05 joepie91 how about just using `incoming chmod` to set `ugo-w` for each transfer, preventing further writes to that file?
17:05 joepie91 not sure if that'll work
17:05 K4k has quit IRC (Read error: Operation timed out)
17:06 joepie91 it doesn't seem to have an incoming chown option or I'd have suggested changing its owner to a non-rsync owner
17:06 JAA I'd be surprised if it did, to be honest.
17:06 joepie91 worth a shot?
17:06 JAA It's the write permission of the directory that dictates whether you're able to overwrite a file.
17:06 joepie91 I... don't think so?
17:07 JAA touch file && chmod -w file && rm -f file
17:07 JAA I don't think rm will restore the write permission before it deletes the file.
17:07 JAA Deleting a file simply means removing the directory entry, so...
17:07 sep332 has joined #archiveteam-bs
17:07 JAA And you *can* overwrite read-only files with rsync, so...
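JAA's point can be checked with a short Python experiment on a POSIX system. Deleting a file and replacing it by renaming a new file over it are both operations on the directory entry, so they need write permission on the directory, not on the file. rsync replaces files the second way by default: it receives into a temporary file and renames it into place unless `--inplace` is used. This sketch only illustrates those semantics; it is not a test of FOS itself.

```python
import os
import stat
import tempfile

# Equivalent of `chmod ugo-w` on a 0644 file: r--r--r--.
READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def replace_via_rename(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp, path)  # needs write permission on the directory only


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "upload.warc.gz")
        with open(target, "wb") as handle:
            handle.write(b"original")
        os.chmod(target, READ_ONLY)

        # The read-only mode does not stop the rename-based replacement.
        replace_via_rename(target, b"replacement")
        with open(target, "rb") as handle:
            content = handle.read()

        # The `rm -f` case: removal ignores the file's own mode as well.
        os.chmod(target, READ_ONLY)
        os.remove(target)
        return content, os.path.exists(target)


print(demo())  # (b'replacement', False)
```

This suggests that `incoming chmod = ugo-w` on its own would not stop a later upload with the same name from replacing the file, which is consistent with JAA's objection.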
17:07 joepie91 hm
17:10 Igloo JAA: i'll update the uploader.py too
17:10 JAA Thanks
17:11 bwn has quit IRC (Read error: Operation timed out)
17:22 Cyn_ has joined #archiveteam-bs
17:24 BnAboyZ has quit IRC (Quit: The Lounge - https://thelounge.github.io)
17:48 icedice has joined #archiveteam-bs
17:49 schbirid has joined #archiveteam-bs
17:56 _refeed_ has joined #archiveteam-bs
18:01 __refeed_ has quit IRC (Read error: Connection reset by peer)
18:07 K4k has joined #archiveteam-bs
18:25 icedice has quit IRC (Read error: Operation timed out)
18:29 icedice has joined #archiveteam-bs
18:31 icedice2 has joined #archiveteam-bs
18:35 icedice has quit IRC (Ping timeout: 260 seconds)
18:54 bwn has joined #archiveteam-bs
18:56 __refeed_ has joined #archiveteam-bs
19:01 _refeed_ has quit IRC (Read error: Connection reset by peer)
19:02 hook54321 Does anyone know if myspleen invites are open?
19:15 icedice has joined #archiveteam-bs
19:16 icedice has quit IRC (Client Quit)
19:20 icedice2 has quit IRC (Read error: Operation timed out)
19:30 omglolbah has joined #archiveteam-bs
19:37 jschwart has joined #archiveteam-bs
19:37 ola_norsk has joined #archiveteam-bs
19:38 ola_norsk does anyone know if "<item>/history/files/*~1~" serves a special purpose on internet archive?
19:38 astrid example?
19:39 ola_norsk the 'history' folder does not seem to show up in the item editor
19:39 astrid pls link to an example, i have not seen this before
19:39 ola_norsk 1 sec
19:43 ola_norsk in the item sowhat-vidme_archive, "ia ls <item>" returns several files listed with such a directory (history/files/), e.g. the file "history/files/20171111_YouTubers explain YT Censorship_Iv3WZ.description.~1~"
19:44 ola_norsk and the ".~1~" seems to have been added to original filename
19:44 ola_norsk filenames*
19:45 astrid looks like it retains changes in a directory /history/files/ https://archive.org/download/sowhat-vidme_archive/history/files/
19:45 astrid but other items that i uploaded and modified don't have the directory
19:46 ola_norsk first time i've seen it
19:46 astrid i know .~1~ is used as a version suffix for backup files that emacs and some other tools create
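If the pattern seen in this item holds (an assumption based on a single `ia ls` listing, not documented Internet Archive behaviour), earlier versions of a replaced file are kept as `history/files/<original name>.~N~`, the numbered-backup suffix astrid mentions. A small Python sketch that separates current files from those older versions in a list of names; in the example listing, the first entry is a hypothetical current copy of the file.

```python
import re

# Assumed layout, inferred from the sowhat-vidme_archive listing in this log.
HISTORY_NAME = re.compile(r"^history/files/(?P<name>.+)\.~(?P<version>\d+)~$")


def split_history(names):
    """Return (current_files, {original_name: [version numbers]})."""
    current, history = [], {}
    for name in names:
        match = HISTORY_NAME.match(name)
        if match:
            versions = history.setdefault(match.group("name"), [])
            versions.append(int(match.group("version")))
        else:
            current.append(name)
    return current, history


listing = [
    "20171111_YouTubers explain YT Censorship_Iv3WZ.description",  # hypothetical
    "history/files/20171111_YouTubers explain YT Censorship_Iv3WZ.description.~1~",
]
print(split_history(listing))
```

The names could come from `ia ls <item>`, one per line.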
19:47 ola_norsk is it possible it's just a case of wrong detection of file format?
19:47 ola_norsk or do you mean e.g. re-uploaded files?
19:47 astrid thinking the second one
19:48 Igloo On that job a lot of work was re-done
19:48 Igloo So likely
19:48 ola_norsk the guy apparently has 1080+ videos on his vidme
19:48 ola_norsk so 'a lot of work' might also make sense :D
19:49 Igloo We've done somewhere around 70TB of data from vid.me
19:49 Igloo It's around 1.2PB in size
19:50 ola_norsk i simply focused on some channels with videos that i know are not on youtube.. E.g. "AfterPrisonJoe" from "AfterPrisonShow" on youtube
19:51 ola_norsk wasn't it 600TB ?
19:51 Igloo That was our estimate. We got hold of someone who worked there. It's a lot more...
19:51 ola_norsk doh
19:54 ola_norsk maybe 'job scripts' should simply be made the moment a new service pops up :D
19:55 ola_norsk though there's no doubt a majority of that 1.2PB is most likely already existing on youtube
19:56 _refeed_ has joined #archiveteam-bs
20:00 __refeed_ has quit IRC (Read error: Connection reset by peer)
20:01 ola_norsk a bit is the users' fault though, as i see it. There's e.g. no reason for having a news commentary video in 1080p, max bitrate, and at 60fps..
20:02 ola_norsk example, the 'reading scary creepypasta stories'
20:03 ola_norsk most often composed of someone narrating over what's basically slideshows of still images
20:04 JAA That should compress extremely well though.
20:04 ola_norsk yeah, but if there's smoke effects over it
20:06 __refeed_ has joined #archiveteam-bs
20:08 ola_norsk i don't know how compression on vidme worked. But most of the files in the item i mentioned at join are basically the person browsing german newspapers while delivering english commentary. Some are 1 hour +
20:09 ola_norsk and i think i saw one of those 1+h videos being 1+ GB
20:09 ola_norsk i'm not saying i pirate movies, but 1.3GB seems to be fine for a 1080p movie
20:10 _refeed_ has quit IRC (Read error: Connection reset by peer)
20:12 Frogging 1.3GB for a full length movie at 1080p is bitrate starved
20:14 ola_norsk 'starved' ?
20:20 Aoede it looks like shit
20:20 Frogging yeah, as in, not enough bitrate to maintain a high level of quality at that resolution, at least for a live action movie. if a 1.3GB video is 2 hours long, the average bitrate is about 1.4Mbps. it's subjective and does depend on the codec, but to me it seems low for 1080p.
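Frogging's figure is plain arithmetic: average bitrate is the file size in bits divided by the duration in seconds. A Python sketch, assuming "GB" means decimal gigabytes (10^9 bytes) and that the size covers the whole file, audio included:

```python
def average_bitrate_mbps(size_bytes, duration_seconds):
    """Average bitrate in megabits per second (10**6 bits/s)."""
    return size_bytes * 8 / duration_seconds / 1_000_000


rate = average_bitrate_mbps(1.3e9, 2 * 60 * 60)  # 1.3 GB over 2 hours
print(f"{rate:.2f} Mbps")  # 1.44 Mbps
```

If the 1.3 GB were binary gibibytes (1.3 × 2^30 bytes), the same calculation gives about 1.55 Mbps.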
20:21 ola_norsk "eye of the beholder then" :)
20:21 zino It's measurably bad in the psychovisual model.
20:22 ola_norsk 'measurably worse'...'bad' is relative
20:22 ola_norsk but anyway, i'm not a codec expert :D
20:23 JAA Who doesn't like nice blocks and banding? /s
20:24 Frogging JAA: VP9 pisses me off with how much banding it creates in dark scenes
20:24 Frogging even at relatively high nitrates
20:25 Frogging bit
20:29 JAA Don't have much experience with VP9.
20:29 Frogging feel like it's a step backwards from H.264 at least in that record
20:29 Frogging regard* (autocorrect)
20:30 JAA Uhm, did IA just change the /save page?
20:30 ola_norsk :/
20:30 ola_norsk web.archive.org/save ?
20:31 JAA I'm getting a page with a textarea "List of archived elements in reverse order" instead of the usual save page.
20:31 JAA Yeah
20:31 ola_norsk i hope not
20:31 JAA It also doesn't actually archive anything for me.
20:31 JAA As far as I can tell anyway.
20:31 Igloo Shows same as normal to me
20:32 ola_norsk as long as it's just the looks
20:33 Coderjo has quit IRC (Read error: Operation timed out)
20:33 yipdw has quit IRC (Read error: Operation timed out)
20:33 ola_norsk how else could i be capturing rather repetitive tweets about #netneutrality :/
20:33 Igloo #archivebot
20:33 Igloo ;-)
20:33 me has joined #archiveteam-bs
20:33 ola_norsk wouldn't that mean i had to download it first?
20:34 JAA https://share.riseup.net/#tuQ-OKPcwjop11jI_XzhzA
20:34 Igloo Nope, just feed it URLs
20:34 JAA It's just sitting there forever.
20:34 Igloo And then it does the rest
20:34 MrRadar2 For Twitter, make sure you use --phantomjs with archivebot
20:34 MrRadar2 Since Twitter uses JS for pretty much everything
20:34 ola_norsk it would autoscroll?
20:34 Igloo Indeed
20:34 JAA No, because it's broken.
20:35 Igloo Use the right pipeline and it works ;-)
20:35 JAA Well, yeah.
20:35 JAA Does it work on yours?
20:35 Igloo On CA it does
20:35 Igloo I haven't tested the rest
20:36 JAA Let's move this to over there.
20:36 atomicthu has quit IRC (Ping timeout: 260 seconds)
20:36 JAA Nope, can't save anything into the Wayback Machine right now. Ew.
20:38 atomicthu has joined #archiveteam-bs
20:41 kimmer1 has joined #archiveteam-bs
20:42 Coderjo has joined #archiveteam-bs
20:46 kimmer12 has quit IRC (Read error: Operation timed out)
20:46 CoolCanuk :O
20:46 CoolCanuk did WE break it?!?
20:47 JAA Lol no
20:48 Frogging /save/ appears broken for me too
20:52 hook54321 same
20:56 _refeed_ has joined #archiveteam-bs
20:57 JAA Good to see that I'm not the only one. I've sent an email to info@ just in case they don't know about it already.
21:00 __refeed_ has quit IRC (Read error: Connection reset by peer)
21:11 _refeed_ has quit IRC (Leaving)
21:20 hook54321 Somebody2: I noticed in #archivebot logs a few months ago you tagged me and a few other people and said that the owner of autistics.org wanted to pass the site onto someone else, and that we could redirect stuff to pages on the wayback machine. If that's still open I'd be interested in that.
21:36 ola_norsk the problem with 'tweep' is that it seems to be oriented to a specified user account
21:37 ola_norsk oooops, no it doesn't "python tweep.py -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets."
21:40 JAA Yep, and it looks like you can just search for the hashtag there.
21:41 JAA (For those out of the loop, we're talking about https://github.com/haccer/tweep )
21:44 ola_norsk what if it just bounced around on the twitter domain, checking every tweet? :D that could take some time :D
21:45 JAA Have fun with that.
21:45 ola_norsk skål!
22:17 JAA Got a reply from Mark, they're "testing an update to page of our Save Page Now service". He asked me to try again, and my saves went through this time.
22:23 Cyn_ has quit IRC (Ping timeout: 262 seconds)
22:25 ola_norsk tweep appears to be broken :/
22:25 antomatic Nyaaaa! YouTube won't let accounts upload more than 100 videos per day.
22:26 antomatic Progress on the news archive is gonna be slow. :)
22:27 ola_norsk antomatic: at least that's not a sign it's all going to hell :)
22:30 schbirid has quit IRC (Quit: Leaving)
22:31 PoorHomes has joined #archiveteam-bs
22:32 PoorHomie has quit IRC (Quit: No)
22:33 ola_norsk it's about high time users adopt a pessim-op-tic view of "the internet" https://youtu.be/1VD_pJOFnZ0
22:33 antomatic If I upload 100 items each and every day, then I think I can get everything up to the end of 2017 uploaded... by the end of 2018. :)
22:33 PoorHomes is now known as PoorHomie
22:34 ola_norsk antomatic: if i had to guess, the cause might be users of several nations trying to reupload their catalog that they moved to vidme
22:35 antomatic Heh. I think it's just a generic dont-spam-us-too-hard measure. :)
22:36 ola_norsk antomatic: that, or youtube's wallet is getting a bit scrawny for Google to be happy with.. :/
22:36 antomatic This is a waste of time anyway with the 3 strikes rule, because /someone/ is bound to object hard enough to it and the whole lot comes down
22:37 antomatic still, gives the opportunity to trim the recordings nicely, collate some metadata and run out the caption transcripts.
22:37 hook54321 antomatic: Can you upload them to archive.org?
22:38 ola_norsk the entire internet needs a trimming, there's too much monopoly going on
22:38 antomatic I don't know if archive.org necessarily wants 30tb+ of news bulletins though. :)
22:39 astrid did you email info@ like we discussed
22:39 antomatic did we discussed? sorry, must have missed that
22:40 hook54321 Don't they already record tv news though?
22:40 antomatic US/International yes, I think, but I don't know if they record UK domestic
22:41 astrid yeah we spoke about this a few days ago
22:41 pizzaiolo has quit IRC (Ping timeout: 246 seconds)
22:41 astrid look in your logs around 5th december
22:41 astrid you and i
22:42 hook54321 I asked Jason if IA records Catalan TV once and he said "ish"
22:43 ola_norsk 'so and so' recording?
22:43 antomatic astird: ah, yes, I did see that.
22:43 antomatic thx
22:44 antomatic wanted to check whether I had a decent process for transcoding & transcribing it all first
22:44 ola_norsk where can i find which channels IA records? most of NRK (norwegian state television) is nation locked
22:45 antomatic oops, sorry, astrid. thanks. typo. :)
22:45 astrid IA will transcode it
22:45 astrid transcription is something else
22:46 ola_norsk youtube caption autogenerated is horrible :(
22:46 antomatic Mm, some of the source files are a little proprietary, but I can bounce them out to .TS which I believe IA accepts
22:46 antomatic I can extract the live captioning to SRT, which should be /something/
22:47 astrid oh that's good
22:47 antomatic although the accuracy is a little ropey. seems to have improved some over the last few years
22:47 antomatic YouTube's autosubs are actually *amazingly* good - in their own way - compared to the broadcast subs
22:48 ola_norsk not only is it vtt, but their translation is shit. Even at rather clean english (i think?) https://youtu.be/fpZD_3D8_WQ
22:48 antomatic But they totally fall to bits when background noise reaches a certain level
22:48 ola_norsk ah...
22:48 antomatic They've improved *enormously* over the last year or so
22:48 ola_norsk that video got that :D
22:48 antomatic still a long way from perfect though
22:49 Pixi has quit IRC (Ping timeout: 255 seconds)
22:49 JAA hook54321: CCMA?
22:50 Pixi has joined #archiveteam-bs
22:51 ola_norsk antomatic: and yet, there's Google Deepmind AI able to master chess after just 4 hours..
22:51 * ola_norsk is not impressed
22:54 ola_norsk if i was more Alex Jonesy, i'd be likely to say that when these supermachines beat human champs in Chess and Go, it's got to do with money paid to the loser...
22:54 antomatic Did you see that video of Alex Jones 'interviewing' an Amazon Echo? :)
22:54 ola_norsk lol no
22:55 ola_norsk I like Alex 'colloidal silver nascent' Jones :D
22:55 antomatic He's all like 'Alexa, who is Jeff Bezos?', 'Alexa, you are lying to me', etc. Nuts. :)
22:56 ola_norsk Alex's got 'Brain Power' to prevent being beaten by any disinformation or misleading
22:57 antomatic Hehe. :)
22:57 MrRadar2 Do you have a link to that "interview", sounds hilarious
22:57 ola_norsk aye, givvus link
22:58 antomatic https://www.youtube.com/watch?v=u5kNP7tyhk8
22:59 MrRadar2 thx
22:59 MrRadar2 I'll be watching that in a private browsing window so as to not pollute my Youtube history
22:59 JAA sets mode: +b *!*@185.143.40.157
23:00 JAA (Just a preventive ban for a spammer that showed up in #vidmeh and #miiworse)
23:00 ola_norsk lol, i love how he's stern when saying "Alexa! ; .." ..like talking to a little girl child :D
23:00 MrRadar2 "Alexa, I have mainstream news articles that Amazon is owned by the CIA" XD
23:01 ola_norsk "Aleeexaaaa; did you steal <item> out of fridge?"
23:01 BlueMaxim has joined #archiveteam-bs
23:02 MrRadar2 LOL, Alex Jones fails the Turing test
23:02 antomatic That *look* on his face, like he's seriously thinking everything through and knows that if he asks the right question he'll blow the whole thing open at any moment. :)
23:03 ola_norsk lol
23:03 ola_norsk "Alexaaaaa! Have you finished your homework?"
23:04 ola_norsk "Alexaaa! Did you take out the trash?"
23:04 MrRadar2 Wait, did he just spin that whole crazy video out of that deal that Amazon made to have a dedicated AWS region for the federal government???
23:06 ola_norsk well, i guess it's settled, it's the "ultimate control system"
23:06 antomatic Mm, there was a viral [jukin] the day before where someone asked their Alexa about the CIA and it turned off, apparently.
23:07 antomatic (well, just didn't answer, really)
23:07 ola_norsk for it to detect even "Alexa!", it DOES mean that it's listening all the time though..
23:08 antomatic mm, it's listening locally (not streaming to the cloud) until it hears 'Alexa', then it sends your voice to the cloud to be recognised/answered.
23:08 antomatic supposedly.
23:09 antomatic but of course, it's an internet-connected mic and how it works is just down to the software.
23:11 CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
23:11 PoorHomie But TBF, the mute button on the top is a hardwired switch
23:11 antomatic ah, nice
23:12 PoorHomie Which shows at least a little bit about security
23:12 CoolCanuk has joined #archiveteam-bs
23:12 PoorHomie Which shows at least Amazon cares a little bit about security *
23:12 antomatic important for when Alex is discussing international affairs. :)
23:12 antomatic Whoa. Alex. Alexa. How deep does the conspiracy go? :)
23:13 ola_norsk someone needs to check out Alexa with wireshark! See when she says something to the internet and when she doesn't!
23:13 ola_norsk lol
23:13 PoorHomie it's been checked to death
23:13 ola_norsk aye
23:13 antomatic "Alexa. Order ladies' panties."
23:13 PoorHomie it only streams when you say Alexa
23:13 PoorHomie period
23:14 PoorHomie But of course, whoever controls the update keys can change that at any second, without your knowledge
23:14 JAA sets mode: +b botnickna!*@*
23:14 JAA sets mode: +b *!*@185.143.40.*
23:14 astrid do not adjust your set. we control the signing, we control the update.
23:14 ola_norsk they should pick starwars names like R2D2 and shit like that..There's people named Siri and Alexa :D
23:15 PoorHomie You can change the trigger word to "Amazon"
23:15 antomatic Also 'Echo' and now 'Computer' apparently, which is especially excellent.
23:15 PoorHomie but in my household that would be even more annoying
23:16 PoorHomie we use the shit out of prime so every 4th word is amazon lol
23:16 astrid PoorHomie: are you here to archive or to talk about irrelevant shit
23:19 ola_norsk is there any way to figure out at what point 'tweep' (python) fails ?
23:19 ola_norsk without having to rewrite the entire damn thing :/
23:20 JAA sets mode: +b *!*@37.237.65.*
23:20 JAA sets mode: +b Ya_ALLAH_!*@*
23:20 ola_norsk a python debugger of sorts?.. :/
23:21 PoorHomie @astrid, excuse me?
23:21 JAA ola_norsk: How does it fail?
23:21 ola_norsk JAA: quietly :/
23:22 ola_norsk JAA: no output whatsoever, no file, no text output
23:25 JAA Hmm, that's odd.
23:25 ola_norsk aye
23:26 ola_norsk the 'pip install image' command in the requisites section on their page is apparently outdated though
23:27 ola_norsk so that failed, and apparently that is a reference to PIL, which is currently 'Pillow' :/
23:28 ola_norsk when running 'pip install image' it even looked for Django..so something is odd
23:28 ola_norsk could be my machine is messed up though
23:29 JAA You could just comment out the from PIL import Image line in the script if you don't use --pics.
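Instead of commenting the line out, a script can make Pillow optional and fail only when the feature that needs it is actually requested. A minimal sketch of that pattern; `require_pillow` is a hypothetical helper and this is not tweep's actual code. Pillow is the maintained fork of PIL and still installs the `PIL` package.

```python
try:
    # Pillow provides the PIL package; only picture downloads need it.
    from PIL import Image
except ImportError:
    Image = None


def require_pillow():
    """Return the Image module, or explain how to get it."""
    if Image is None:
        raise RuntimeError("picture downloads need Pillow: pip install Pillow")
    return Image
```

A `--pics` code path would call `require_pillow()` first; every other mode runs without Pillow installed.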
23:31 ola_norsk i've only tried the 2 non-user specific commands "python tweep.py -s pineapple", "python tweep.py -s "Donald Trump" --verified --users"
23:32 ola_norsk no --picture argument added to them
23:32 ola_norsk both fail without any output
23:32 JAA Hmm
23:33 ola_norsk if there was output, i'd have something to go by
23:33 ola_norsk JAA: what python version do you use?
23:34 Atom has quit IRC (Read error: Connection reset by peer)
23:34 ola_norsk i tried using it on 2.7.14
23:34 JAA ola_norsk: I can reproduce that with 2.7.something.
23:34 BnAboyZ has joined #archiveteam-bs
23:35 MrDignity has joined #archiveteam-bs
23:36 ola_norsk PoorHomie: let's roll the pineapple https://youtu.be/zwF_mGA-YQg
23:36 ola_norsk PoorHomie: ;)
23:38 ola_norsk PoorHomie: just tell astrid to mute you at discretion, and you'd be fine :)
23:38 JAA ola_norsk: pip install lxml
23:38 JAA Undocumented requirement of the script, apparently.
23:38 vantec ola_norsk: pip install django==1.11.8
23:38 JAA Then it works when commenting out the PIL line.
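tweep failed without any output while lxml was missing. A generic way to make that kind of failure loud is a preflight check before any work starts. A sketch using `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found; the requirement list is hypothetical apart from lxml, the dependency identified here.

```python
import importlib.util
import sys

# Hypothetical requirement list; lxml is the undocumented dependency from this log.
REQUIRED_MODULES = ["lxml"]


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    # Report clearly instead of failing silently later on.
    print("missing modules: " + ", ".join(missing), file=sys.stderr)
```

A real script would also exit with a non-zero status at that point, e.g. `sys.exit(1)`.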
23:39 ola_norsk ty
23:39 vantec django2.0+ is for Python 3
23:40 JAA And that's excellent. Python 2 needs to die already.
23:41 Atom has joined #archiveteam-bs
23:42 ola_norsk might as well use c++ the strict way python 3 is going :D
23:45 ola_norsk Collecting lmxl
23:45 ola_norsk Could not find a version that satisfies the requirement lmxl (from versions: )
23:45 ola_norsk No matching distribution found for lmxl
23:45 ola_norsk JAA: what distro are you using?
23:45 JAA Debian
23:46 ola_norsk i'll try that before i mess up even more
23:46 JAA What are you running?
23:46 ola_norsk lubuntu
23:46 JAA Hmm
23:47 JAA lxml should be in the repos, I think.
23:47 ola_norsk lubuntu Ubuntu 17.10 \n \l
23:47 JAA You could try installing the python-lxml package.
23:47 JAA Otherwise, you might need a bunch of -dev packages of various libraries to install from source.
23:48 ola_norsk ty
23:48 JAA This will install it system-wide, obviously. Just so you're aware of that.
23:48 ola_norsk yeah but it doesn't matter
23:48 JAA I'm using pyenv to handle numerous independent installations of Python etc.
23:49 ola_norsk that worked though
23:49 ola_norsk i'm not sure what's scrolling by at the moment, but something's working
23:50 JAA It just prints tweet ID, date, username, and message.
23:50 ola_norsk aye, "python tweep.py -s "Donald Trump" --verified --users" seems to be usernames
23:51 ola_norsk it's scrolling as fuck though :D
23:51 ola_norsk ty!
23:53 ola_norsk if e.g. "940731016880312321" is the id of a tweet, urls can be constructed from that
23:53 JAA Yep, together with the username.
23:53 ola_norsk 940731016880312321 2017-12-12 23:52:07 CET <drkbri> u.s. congress after they vote to have net neutrality repealed then realize they gotta pay the package deals just like the rest of us. (go save net neutrality y’all: http://battleforthenet.com )pic.twitter.com/WdM
23:54 ola_norsk maybe <drkbri> is user?
23:54 ola_norsk 940731013398904832 2017-12-12 23:52:06 CET <m4rwaosm> Keaton Jones ™ is a distraction put by the US Government to distract you from Net Neutrality
23:54 JAA Yes, that's the username.
23:54 ola_norsk this is badass
23:55 ola_norsk waybackmachine is going to feel this :D
23:55 JAA So the URL for the first one would be https://twitter.com/drkbri/status/940731016880312321
23:56 ola_norsk wget get requisites getting, i think this is a winner
23:56 ola_norsk with*
23:56 JAA ola_norsk: python tweep.py -s pineapple | head | awk '{print "https://twitter.com/" substr($5, 2, length($5) - 2) "/status/" $1}'
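JAA's awk one-liner relies on tweep's line layout: field 1 is the tweet ID and field 5 is the username in angle brackets. A Python sketch of the same transformation, assuming the layout of the output lines pasted above. Unlike the awk version, it skips lines that don't look like tweets instead of printing a malformed URL.

```python
def tweet_url(line):
    """Build a tweet URL from one tweep output line, or return None.

    Assumed layout: "<id> <date> <time> <timezone> <username> <message...>".
    """
    fields = line.split()
    if len(fields) < 5:
        return None
    tweet_id, user_field = fields[0], fields[4]
    is_user = len(user_field) > 2 and user_field[0] == "<" and user_field[-1] == ">"
    if not (tweet_id.isdigit() and is_user):
        return None
    return f"https://twitter.com/{user_field[1:-1]}/status/{tweet_id}"


def urls_from_lines(lines):
    """Yield URLs for every line that parses, skipping the rest."""
    for line in lines:
        url = tweet_url(line)
        if url is not None:
            yield url


sample = "940731016880312321 2017-12-12 23:52:07 CET <drkbri> u.s. congress after they vote"
print(tweet_url(sample))  # https://twitter.com/drkbri/status/940731016880312321
```

To use it in the same pipe as the awk version, iterate over `urls_from_lines(sys.stdin)` and print each URL.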
23:59 ola_norsk anyone good at math to calculate what a file containing link to one tweet, back to ~2006 might require of storage?
23:59 JAA I don't understand the question.
