#archiveteam-ot 2020-06-05,Fri


Time Nickname Message
00:00 🔗 Mateon1 So I'm asking here to see if anyone has any relevant knowledge to contribute
00:01 🔗 JAA Mateon1: I wrote a tool to grab comment data a while ago: https://git.kiska.pw/JustAnotherArchivist/youtube-comments
00:01 🔗 Mateon1 Has anyone looked into the structure of continuation tokens? There are (at least) two levels of base64-encoded data
00:01 🔗 Mateon1 Including some binary format stuff
00:01 🔗 JAA It just follows the pagination and throws everything into a WARC. I haven't gotten around to writing a tool that processes that data into something usable.
00:03 🔗 JAA I haven't used it in the past few months though, and YouTube has tightened its rate limits massively, so it might be running into issues now.
00:03 🔗 HCross has quit IRC (Read error: Connection reset by peer)
00:04 🔗 JAA Also, it's based on qwarc, which can at best be considered beta-level code.
00:04 🔗 JAA But if you just want to understand how the pagination needs to be done, it should be useful.
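A minimal sketch of the pagination JAA describes: fetch a page, read its continuation token, request the next page with that token passed back unchanged, and stop when no token is returned. The tokens are treated as opaque. `fetch_page` and the in-memory `pages` table are hypothetical stand-ins for real HTTP requests; the actual youtube-comments tool and its qwarc-based WARC writing are not reproduced here.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A page is (items, next_token); next_token is None on the last page.
Page = Tuple[List[str], Optional[str]]


def follow_pagination(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every item, passing each continuation token back unchanged (treated as opaque)."""
    token: Optional[str] = None
    seen = set()
    while True:
        items, next_token = fetch_page(token)
        yield from items
        # Stop on the last page, or if the server repeats a token we already followed.
        if next_token is None or next_token in seen:
            return
        seen.add(next_token)
        token = next_token


# Hypothetical stand-in for real HTTP requests: token -> page.
pages: Dict[Optional[str], Page] = {
    None: (["c1", "c2"], "tokA"),
    "tokA": (["c3"], "tokB"),
    "tokB": (["c4"], None),
}
print(list(follow_pagination(lambda t: pages[t])))  # ['c1', 'c2', 'c3', 'c4']
```

The `seen` set is a defensive choice so that a server handing back the same token twice cannot trap the crawler in an endless loop.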
00:04 🔗 Mateon1 Huh, how recently was the rate-limiting implemented? I recently ran a crawl on just the watch pages, and had no issue making hundreds of megabytes per second in requests
00:05 🔗 JAA NB, I did look into the actual pagination tokens back then as well. It's base64-encoded protobuf, but I couldn't be arsed to implement that.
00:05 🔗 JAA There has always been rate limiting, but in the last few months, they made it much stricter.
00:06 🔗 JAA I'm not sure if it applies to just the watch page or only if you try to fetch more things like the actual videos.
00:06 🔗 JAA I didn't have issues with my comments crawler back when I wrote it, but I haven't used it since.
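Given the stricter rate limits mentioned above, a crawler will usually need to back off and retry. The sketch below uses exponential backoff with full jitter, a common generic technique; it is an assumption, not necessarily what JAA's tool does. `RateLimited` and the `flaky` request are hypothetical; in practice the exception would be raised when the server answers with something like HTTP 429.

```python
import random
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical: raised by a request when the server signals rate limiting (e.g. HTTP 429)."""


def with_backoff(request: Callable[[], bytes], max_tries: int = 6, base: float = 1.0,
                 cap: float = 60.0, sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call request(), retrying on RateLimited with exponential backoff and full jitter."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(max_tries):
        try:
            return request()
        except RateLimited:
            if attempt == max_tries - 1:
                raise  # out of retries: let the caller see the rate limit
            # Full jitter: wait a random time between 0 and the exponential ceiling.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")


# Hypothetical request that is rate-limited twice and then succeeds.
calls = {"n": 0}


def flaky() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return b"ok"


waits: list = []
print(with_backoff(flaky, sleep=waits.append))  # b'ok'
print(len(waits))  # 2
```

Injecting `sleep` keeps the function testable without real delays; the jitter spreads retries out so that many workers hitting the same limit do not all retry at once.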
00:07 🔗 mateon has joined #archiveteam-ot
00:07 🔗 chr1sm has quit IRC (Read error: Connection reset by peer)
00:07 🔗 qw3rty__ has quit IRC (Read error: Connection reset by peer)
00:07 🔗 mateon Sorry, my other client lagged out
00:08 🔗 Ctrl-S___ has quit IRC (Read error: Connection reset by peer)
00:08 🔗 mateon Anyway, the protobuf includes yet another base64 token
00:08 🔗 qw3rty has joined #archiveteam-ot
00:08 🔗 mateon If you decode it you seem to get garbage binary data
00:09 🔗 mateon Except the first 5 bytes seem to follow a pattern, and further continuation tokens have more of that binary data
00:09 🔗 JAA Hmm, I don't remember that, but it's been a few months, and I quickly dismissed that idea anyway and just went with using the tokens as magic.
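To poke at the token structure discussed above, a schema-less reader for the protobuf wire format is enough to split a decoded token into (field number, wire type, value) triples; a length-delimited value can then be base64-decoded again to inspect the nested token. This is a sketch under assumptions: the tokens are assumed to be URL-safe base64, possibly with `=` padding stripped or percent-encoded as `%3D`, and the demo token is built by hand rather than taken from YouTube. If the innermost layer is opaque binary, as mateon observed, `parse_fields` may raise `ValueError` or return meaningless fields.

```python
import base64
from typing import List, Tuple, Union

Field = Tuple[int, int, Union[int, bytes]]  # (field number, wire type, value)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new position)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> List[Field]:
    """Split a protobuf message into raw fields without knowing its schema."""
    fields: List[Field] = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        num, wtype = key >> 3, key & 7
        val: Union[int, bytes]
        if wtype == 0:    # varint
            val, pos = read_varint(buf, pos)
        elif wtype == 1:  # fixed 64-bit
            val, pos = buf[pos:pos + 8], pos + 8
        elif wtype == 2:  # length-delimited: bytes, string or nested message
            length, pos = read_varint(buf, pos)
            val, pos = buf[pos:pos + length], pos + length
        elif wtype == 5:  # fixed 32-bit
            val, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wtype}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((num, wtype, val))
    return fields


def b64decode_loose(s: str) -> bytes:
    """URL-safe base64 decode that tolerates stripped or percent-encoded '=' padding."""
    s = s.replace("%3D", "=").rstrip("=")
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


# Hand-built demo token: an outer message whose field 3 holds another base64 token.
inner = bytes([0x08, 0x96, 0x01])                       # field 1, varint 150
outer = b"\x1a\x04" + base64.urlsafe_b64encode(inner)   # field 3, bytes b"CJYB"
token = base64.urlsafe_b64encode(outer).decode().rstrip("=")

fields = parse_fields(b64decode_loose(token))
print(fields)  # [(3, 2, b'CJYB')]
nested = fields[0][2]
assert isinstance(nested, bytes)
print(parse_fields(b64decode_loose(nested.decode())))  # [(1, 0, 150)]
```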
00:09 🔗 Meli has quit IRC (Read error: Operation timed out)
00:11 🔗 Meli has joined #archiveteam-ot
00:11 🔗 robogoat has quit IRC (Ping timeout: 272 seconds)
00:12 🔗 mateon Oh well, it was an interesting idea, I do already treat playlist continuation tokens as magic, so I guess that just works the same
00:12 🔗 robogoat has joined #archiveteam-ot
00:13 🔗 JAA Yeah, that's why I tried it as well back then. It would be nice to just hammer the comments API endpoint directly.
00:15 🔗 JAA By the way, I also wrote scripts for grabbing live chat, both as it happens and from replay. Figured you may be interested in that as well.
00:15 🔗 Ctrl-S___ has joined #archiveteam-ot
00:15 🔗 JAA https://git.kiska.pw/JustAnotherArchivist/youtube-livechat and https://git.kiska.pw/JustAnotherArchivist/youtube-livechatreplay respectively.
00:16 🔗 Mateon1 has quit IRC (Remote host closed the connection)
00:18 🔗 mateon Thanks, that's quite useful
00:18 🔗 HCross has joined #archiveteam-ot
00:19 🔗 mateon I just checked, and it seems that Youtube has set favorite videos lists and liked videos lists completely private for everyone
00:19 🔗 godane has quit IRC (Read error: Operation timed out)
00:19 🔗 mateon I can't access any of those lists and don't see any option to make my own ones public
00:19 🔗 godane has joined #archiveteam-ot
00:20 🔗 JAA Yes, that happened in December.
00:20 🔗 JAA https://www.archiveteam.org/index.php/YouTube#Liked_lists_.28December_2019.29
00:20 🔗 JAA (And yeah, that page could use an update.)
00:20 🔗 mateon This really hurts archiving content 'average users' care about, as most don't collect things in separate playlists
00:20 🔗 mateon And this hurts unlisted video discovery
00:21 🔗 HCross has quit IRC (Read error: Connection reset by peer)
00:21 🔗 Ctrl-S___ has quit IRC (Read error: Connection reset by peer)
00:21 🔗 t3 has quit IRC (Read error: Connection reset by peer)
00:21 🔗 Ryz Yeah, I had to call for help trying to do those because it was on short notice, like a month or so...
00:22 🔗 BlueMax has joined #archiveteam-ot
00:25 🔗 ephemer0l has joined #archiveteam-ot
00:30 🔗 HCross has joined #archiveteam-ot
00:38 🔗 Ivy has joined #archiveteam-ot
00:39 🔗 justcool3 has joined #archiveteam-ot
00:40 🔗 justcool3 has quit IRC (Read error: Connection reset by peer)
00:40 🔗 Ivy has quit IRC (Read error: Connection reset by peer)
00:40 🔗 HCross has quit IRC (Read error: Connection reset by peer)
00:45 🔗 Ivy has joined #archiveteam-ot
00:46 🔗 hook54321 has joined #archiveteam-ot
00:47 🔗 justcool3 has joined #archiveteam-ot
00:47 🔗 HCross has joined #archiveteam-ot
00:48 🔗 Meli has quit IRC (Read error: Operation timed out)
00:50 🔗 Meli has joined #archiveteam-ot
00:52 🔗 Ctrl-S___ has joined #archiveteam-ot
00:55 🔗 HCross has quit IRC (Read error: Connection reset by peer)
00:55 🔗 t3 has joined #archiveteam-ot
00:55 🔗 Ctrl-S___ has quit IRC (Read error: Connection reset by peer)
00:55 🔗 t3 has quit IRC (Read error: Connection reset by peer)
00:55 🔗 justcool3 has quit IRC (Read error: Connection reset by peer)
00:55 🔗 hook54321 has quit IRC (Read error: Connection reset by peer)
00:55 🔗 Ivy has quit IRC (Read error: Connection reset by peer)
01:07 🔗 HCross has joined #archiveteam-ot
01:08 🔗 chr1sm has joined #archiveteam-ot
01:09 🔗 HCross has quit IRC (Read error: Connection reset by peer)
01:09 🔗 chr1sm has quit IRC (Read error: Connection reset by peer)
01:10 🔗 mgrytbak has joined #archiveteam-ot
01:11 🔗 t3 has joined #archiveteam-ot
01:11 🔗 revi has joined #archiveteam-ot
01:12 🔗 Ivy has joined #archiveteam-ot
01:12 🔗 Kaz has joined #archiveteam-ot
01:12 🔗 hook54321 has joined #archiveteam-ot
01:12 🔗 chr1sm has joined #archiveteam-ot
01:13 🔗 Ctrl-S___ has joined #archiveteam-ot
01:15 🔗 HCross has joined #archiveteam-ot
01:17 🔗 xit has joined #archiveteam-ot
01:17 🔗 tech234a has joined #archiveteam-ot
01:18 🔗 deathy__ has joined #archiveteam-ot
01:18 🔗 justcool3 has joined #archiveteam-ot
01:20 🔗 horkermon has joined #archiveteam-ot
01:37 🔗 Vito` has joined #archiveteam-ot
01:44 🔗 Vito` has quit IRC (Read error: Connection reset by peer)
01:46 🔗 Vito` has joined #archiveteam-ot
01:55 🔗 mateon has quit IRC (Quit: leaving)
01:58 🔗 picklefac has joined #archiveteam-ot
02:05 🔗 amelia386 has joined #archiveteam-ot
02:05 🔗 DrasticAc has joined #archiveteam-ot
02:06 🔗 pnJay has joined #archiveteam-ot
02:06 🔗 starlord has joined #archiveteam-ot
02:08 🔗 diggan has joined #archiveteam-ot
03:01 🔗 qw3rty_ has joined #archiveteam-ot
03:09 🔗 qw3rty has quit IRC (Read error: Operation timed out)
03:59 🔗 qw3rty__ has joined #archiveteam-ot
04:06 🔗 qw3rty_ has quit IRC (Read error: Operation timed out)
04:42 🔗 ephemer0l has quit IRC (Read error: Connection reset by peer)
05:56 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
06:01 🔗 Meli has quit IRC (Remote host closed the connection)
06:02 🔗 schbirid has quit IRC (Quit: Leaving)
06:24 🔗 fuzzy8021 has joined #archiveteam-ot
06:26 🔗 fuzzy802 has joined #archiveteam-ot
06:26 🔗 Igloo has quit IRC (Read error: Operation timed out)
06:28 🔗 fuzzy8021 has quit IRC (Ping timeout: 260 seconds)
06:31 🔗 fuzzy8021 has joined #archiveteam-ot
06:31 🔗 fuzzy802 has quit IRC (Ping timeout: 260 seconds)
06:38 🔗 Mateon1 has joined #archiveteam-ot
06:44 🔗 ephemer0l has joined #archiveteam-ot
06:48 🔗 Igloo has joined #archiveteam-ot
07:11 🔗 Arcorann has quit IRC (Remote host closed the connection)
07:12 🔗 Arcorann has joined #archiveteam-ot
07:17 🔗 benjins has quit IRC (Remote host closed the connection)
07:18 🔗 benjins has joined #archiveteam-ot
07:19 🔗 benjins has quit IRC (Remote host closed the connection)
07:20 🔗 benjins has joined #archiveteam-ot
07:22 🔗 betamax has quit IRC (Read error: Operation timed out)
07:22 🔗 betamax has joined #archiveteam-ot
07:22 🔗 Arcorann_ has joined #archiveteam-ot
07:22 🔗 britmob_ has joined #archiveteam-ot
07:24 🔗 yawkat has quit IRC (Read error: Operation timed out)
07:24 🔗 benjinsmi has joined #archiveteam-ot
07:24 🔗 yawkat has joined #archiveteam-ot
07:26 🔗 auctus has quit IRC (Read error: Operation timed out)
07:26 🔗 auctus_ has joined #archiveteam-ot
07:27 🔗 Terbium_ has joined #archiveteam-ot
07:28 🔗 britmob has quit IRC (Read error: Operation timed out)
07:28 🔗 Terbium has quit IRC (Read error: Operation timed out)
07:28 🔗 Arcorann has quit IRC (Read error: Operation timed out)
07:29 🔗 sknebel has joined #archiveteam-ot
07:31 🔗 benjins has quit IRC (Ping timeout: 610 seconds)
07:31 🔗 sknebel_ has quit IRC (Ping timeout: 610 seconds)
07:54 🔗 drcd has quit IRC (Ping timeout: 272 seconds)
07:54 🔗 drcd has joined #archiveteam-ot
07:56 🔗 Gfy has joined #archiveteam-ot
07:56 🔗 Hecatz has quit IRC (Ping timeout: 272 seconds)
07:56 🔗 Coderjo_ has quit IRC (Ping timeout: 272 seconds)
07:56 🔗 Gfy_ has quit IRC (Ping timeout: 272 seconds)
07:56 🔗 Laverne has quit IRC (Ping timeout: 272 seconds)
07:56 🔗 NatarajBt has quit IRC (Ping timeout: 272 seconds)
07:56 🔗 sHATNER has quit IRC (Ping timeout: 272 seconds)
07:57 🔗 Terbium_ has quit IRC (Ping timeout: 272 seconds)
07:58 🔗 Coderjo has joined #archiveteam-ot
07:58 🔗 Hecatz has joined #archiveteam-ot
08:03 🔗 Terbium has joined #archiveteam-ot
08:47 🔗 sHATNER has joined #archiveteam-ot
08:47 🔗 NatarajBt has joined #archiveteam-ot
08:48 🔗 Laverne has joined #archiveteam-ot
09:43 🔗 Meli has joined #archiveteam-ot
10:48 🔗 Arcorann_ has quit IRC (Read error: Connection reset by peer)
10:57 🔗 Arcorann has joined #archiveteam-ot
11:29 🔗 Stiletto has quit IRC (Read error: Operation timed out)
11:53 🔗 HP_Archiv has quit IRC (Quit: Leaving)
12:23 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
13:43 🔗 godane has quit IRC (Quit: Leaving.)
14:06 🔗 icedice has quit IRC (Leaving)
15:13 🔗 ola_norsk has joined #archiveteam-ot
15:14 🔗 ola_norsk Could anyone tell me why this item is now "You need to log in to view it" ? https://archive.org/details/BitChute-KqKoN4dQp7TP
15:14 🔗 JAA "Archive Team: We're not archive.org"
15:15 🔗 ola_norsk I get that, but you guys have some experience with it
15:17 🔗 JAA I bet the reason is what's written in the review.
15:18 🔗 ola_norsk "A documentary full of false conspiracy theories" ?
15:18 🔗 JAA Yes
15:20 🔗 ola_norsk I've seen the interview, and yes, it's even edited and cut blatantly in quotation clips, ... But i'm fairly certain i could pull up many documentaries that do the very same thing
15:22 🔗 ola_norsk has quit IRC (Quit: GET A FUCKING GRIP ON YOURSELF, AMERICA!)
15:22 🔗 JAA Of course this isn't the only video that ever spread misinformation. (NB, I haven't watched it, but PolitiFact is quite reliable, and the description certainly screams bullshit.)
15:36 🔗 Arcorann has quit IRC (Read error: Connection reset by peer)
15:52 🔗 asdf0101 has quit IRC (Remote host closed the connection)
15:56 🔗 asdf0101 has joined #archiveteam-ot
17:20 🔗 ola_norsk has joined #archiveteam-ot
17:24 🔗 ola_norsk I am not quite sure how to formulate this question; But is my understanding correct, in that archive.org have begun 'darkening' items , based off of tertiary sources such as politico?
17:27 🔗 JAA You'd have to ask IA.
17:28 🔗 ola_norsk What is your feel about it though?
17:28 🔗 JAA Maybe?
17:29 🔗 JAA They've started putting warnings in the WBM for content that has since been removed from the target website for ToS violations (e.g. Medium), so it doesn't seem unreasonable.
17:33 🔗 ola_norsk I believed in this Kahle quote .. https://imgur.com/a/ZCyCVTJ
17:35 🔗 JAA Nothing's being deleted or even made inaccessible though.
17:35 🔗 JAA Just like Trump's tweet didn't get deleted, only a warning attached for spouting nonsense.
17:36 🔗 ola_norsk though the 'facts' are to be found elsewhere?
17:36 🔗 ola_norsk on some other site?
17:36 🔗 JAA Wat?
17:38 🔗 ola_norsk It is my understanding that rather significant study got retracted recently .. https://www.politifact.com/article/2020/may/08/fact-checking-plandemic-documentary-full-false-con/ article does not reflect that
17:38 🔗 * ola_norsk has a tendency to skim-read though
17:39 🔗 ola_norsk source: https://youtu.be/48Xh9p7Q6Xs
17:39 🔗 ola_norsk The politico article say 7th May
17:42 🔗 phuzion_ ola_norsk: ArchiveTeam is not the Internet Archive. If you have specific questions for the Internet Archive, I'd suggest you reach out to the Internet Archive and ask them directly. https://archive.org/about/contact.php
17:44 🔗 ola_norsk phuzion_: I am aware of that. I am asking as a contributor, to what i expect are you guys as contributors
17:44 🔗 JAA 1. Upload stuff that's worth preserving to IA. 2. ??? 3. PROFIT
17:50 🔗 ola_norsk Im not sure how to word the worry a thoughts of a politically filtered archive brings me
17:50 🔗 JAA 17:35:12 <@JAA> Nothing's being deleted or even made inaccessible though.
17:50 🔗 JAA It isn't filtered.
17:50 🔗 JAA Unless you have evidence to the contrary.
17:53 🔗 ola_norsk I don't , but seeing a completely harmless interview, albeit silly or lacking factual statements being 'darkened' and suddenly requiring login to view, as if it was some dangerous thing, is worrying. Especially, if the decision to do so was done by trusting 'fact checkers' who can't even keep their own facts in line.
17:54 🔗 JAA Yes, spreading misinformation that may lead to preventable deaths is dangerous.
17:56 🔗 ola_norsk Then if it's the case that the item got darkened, based on the politico 'fact check', some note should be taken that politicos facts arent up to date
17:56 🔗 ola_norsk IMO
17:56 🔗 JAA Nothing got darkened.
17:57 🔗 ola_norsk are there any items of the intervew that does not have 'noindex: true' set?
17:59 🔗 JAA Have you tried searching? https://archive.org/search.php?query=plandemic
18:00 🔗 godane has joined #archiveteam-ot
18:01 🔗 ola_norsk Since they are all the same, why are some "login required"
18:01 🔗 JAA Why do you keep asking things here that only IA can answer?
18:01 🔗 ola_norsk Theoretical
18:01 🔗 JAA Because nobody noticed yet or the scripts that find them haven't run recently or or or
18:02 🔗 ola_norsk i guess my concern boils down to, "who factchecks the factcheckers" ..
18:03 🔗 phuzion_ You seem to have a lot of questions about IA, none of which we have the correct answers to, because as I will reiterate for the third time, we are not the Internet Archive. If you have specific questions for the Internet Archive, ask them directly. They have an email address info@archive.org that you can email and ask.
18:03 🔗 ola_norsk but anyway, i've ran out of english diction on such a difficult topic
18:04 🔗 ola_norsk phuzion_: If you scroll up a bit, you will notice i wrote "I am aware"
18:04 🔗 phuzion_ ola_norsk: And yet you continue to ask questions to people who do not have the answers.
18:04 🔗 JAA So you're arguing for the sake of arguing, got it.
18:04 🔗 ola_norsk not arguing, asking for opinions
18:05 🔗 ola_norsk has quit IRC (Quit: leaving)
18:06 🔗 phuzion_ is now known as phuzion
18:10 🔗 JAA Now I wonder how much space is getting wasted by these people throwing up dozens of copies of the same interview "because it gets suppressed".
18:18 🔗 JAA Oh, neat, this could be useful for covering elections: https://help.twitter.com/en/using-twitter/election-labels
19:58 🔗 SynMonger has quit IRC (Quit: Wait, what?)
19:59 🔗 SynMonger has joined #archiveteam-ot
20:00 🔗 SynMonger has quit IRC (Client Quit)
20:02 🔗 SynMonger has joined #archiveteam-ot
21:18 🔗 SynMonger has quit IRC (Quit: Wait, what?)
21:21 🔗 SynMonger has joined #archiveteam-ot
21:39 🔗 ephemer0l has quit IRC (Read error: Connection reset by peer)
21:51 🔗 DogsRNice has joined #archiveteam-ot
22:49 🔗 HP_Archiv has joined #archiveteam-ot
23:44 🔗 ephemer0l has joined #archiveteam-ot