[00:00] So I'm asking here to see if anyone has any relevant knowledge to contribute
[00:01] Mateon1: I wrote a tool to grab comment data a while ago: https://git.kiska.pw/JustAnotherArchivist/youtube-comments
[00:01] Has anyone looked into the structure of continuation tokens? There are (at least) two levels of base64-encoded data
[00:01] Including some binary format stuff
[00:01] It just follows the pagination and throws everything into a WARC. I haven't gotten around to writing a tool that processes that data into something usable.
[00:03] I haven't used it in the past few months though, and YouTube has tightened its rate limits massively, so it might be running into issues now.
[00:03] *** HCross has quit IRC (Read error: Connection reset by peer)
[00:04] Also, it's based on qwarc, which can at best be considered beta-level code.
[00:04] But if you just want to understand how the pagination needs to be done, it should be useful.
[00:04] Huh, how recently was the rate-limiting implemented? I recently ran a crawl on just the watch pages, and had no issue making hundreds of megabytes per second in requests
[00:05] NB, I did look into the actual pagination tokens back then as well. It's base64-encoded protobuf, but I couldn't be arsed to implement that.
[00:05] There has always been rate limiting, but in the last few months, they made it much stricter.
[00:06] I'm not sure if it applies to just the watch page or only if you try to fetch more things like the actual videos.
[00:06] I didn't have issues with my comments crawler back when I wrote it, but I haven't used it since.
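The thread only establishes that continuation tokens are base64-encoded protobuf, with a second base64 token nested inside. A minimal Python sketch of peeling off the outer layer, assuming URL-safe base64 (possibly with `%3D`-escaped padding) and the standard protobuf wire format. It lists only raw (field number, wire type, value) triples, because what each field means is not known from this discussion; the function names are hypothetical.

```python
from __future__ import annotations

import base64


def b64decode_loose(token: str) -> bytes:
    """Decode URL-safe base64, tolerating %3D-escaped or missing '=' padding."""
    token = token.replace("%3D", "=").rstrip("=")
    token += "=" * (-len(token) % 4)  # restore padding to a multiple of 4
    return base64.urlsafe_b64decode(token)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a protobuf base-128 varint; return (value, new position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> list[tuple[int, int, object]]:
    """Walk one protobuf message without a schema, returning raw fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 2:  # length-delimited: bytes, string or nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_no, wire_type, value))
    return fields
```

A length-delimited value that looks like base64 text could be fed back through `b64decode_loose` to reach the inner token, which, per the discussion, mostly decodes to opaque binary apart from a recurring 5-byte prefix.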
[00:07] *** mateon has joined #archiveteam-ot
[00:07] *** chr1sm has quit IRC (Read error: Connection reset by peer)
[00:07] *** qw3rty__ has quit IRC (Read error: Connection reset by peer)
[00:07] Sorry, my other client lagged out
[00:08] *** Ctrl-S___ has quit IRC (Read error: Connection reset by peer)
[00:08] Anyway, the protobuf includes yet another base64 token
[00:08] *** qw3rty has joined #archiveteam-ot
[00:08] If you decode it you seem to get garbage binary data
[00:09] Except the first 5 bytes seem to follow a pattern, and further continuation tokens have more of that binary data
[00:09] Hmm, I don't remember that, but it's been a few months, and I quickly dismissed that idea anyway and just went with using the tokens as magic.
[00:09] *** Meli has quit IRC (Read error: Operation timed out)
[00:11] *** Meli has joined #archiveteam-ot
[00:11] *** robogoat has quit IRC (Ping timeout: 272 seconds)
[00:12] Oh well, it was an interesting idea, I do already treat playlist continuation tokens as magic, so I guess that just works the same
[00:12] *** robogoat has joined #archiveteam-ot
[00:13] Yeah, that's why I tried it as well back then. It would be nice to just hammer the comments API endpoint directly.
[00:15] By the way, I also wrote scripts for grabbing live chat, both as it happens and from replay. Figured you may be interested in that as well.
[00:15] *** Ctrl-S___ has joined #archiveteam-ot
[00:15] https://git.kiska.pw/JustAnotherArchivist/youtube-livechat and https://git.kiska.pw/JustAnotherArchivist/youtube-livechatreplay respectively.
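The comments crawler, per the thread, "just follows the pagination", and both participants ended up treating continuation tokens as magic. A minimal Python sketch of that opaque-token loop, assuming a hypothetical `fetch_page` callable that hides the actual HTTP request and response parsing (neither is described here). The optional `delay` nods to the stricter rate limits mentioned above, but its value is an assumption, not a known threshold.

```python
import time
from typing import Any, Callable, Iterator, List, Optional, Tuple

# A page is (items, next_token); next_token is None on the last page.
Page = Tuple[List[Any], Optional[str]]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    delay: float = 0.0,
) -> Iterator[Any]:
    """Yield every item, passing each continuation token back unmodified.

    fetch_page is a hypothetical callable: given None it requests the first
    page, given a token it requests the page that token points to.
    """
    token: Optional[str] = None
    seen = set()
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return
        if token in seen:
            # Receiving the same token twice would loop forever; bail out instead.
            raise RuntimeError("continuation token repeated")
        seen.add(token)
        # Pause between requests; the right value is a guess, not a known limit.
        time.sleep(delay)
```

Because the token is never parsed, the loop keeps working even if the protobuf layout inside it changes, which is the practical argument for treating it as magic.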
[00:16] *** Mateon1 has quit IRC (Remote host closed the connection)
[00:18] Thanks, that's quite useful
[00:18] *** HCross has joined #archiveteam-ot
[00:19] I just checked, and it seems that Youtube has set favorite videos lists and liked videos lists completely private for everyone
[00:19] *** godane has quit IRC (Read error: Operation timed out)
[00:19] I can't access any of those lists and don't see any option to make my own ones public
[00:19] *** godane has joined #archiveteam-ot
[00:20] Yes, that happened in December.
[00:20] https://www.archiveteam.org/index.php/YouTube#Liked_lists_.28December_2019.29
[00:20] (And yeah, that page could use an update.)
[00:20] This really hurts archiving content 'average users' care about, as most don't collect things in separate playlists
[00:20] And this hurts unlisted video discovery
[00:21] *** HCross has quit IRC (Read error: Connection reset by peer)
[00:21] *** Ctrl-S___ has quit IRC (Read error: Connection reset by peer)
[00:21] *** t3 has quit IRC (Read error: Connection reset by peer)
[00:21] Yeah, I had to call for help trying to do those because it was on short notice, like a month or so...
[00:22] *** BlueMax has joined #archiveteam-ot
[00:25] *** ephemer0l has joined #archiveteam-ot
[00:30] *** HCross has joined #archiveteam-ot
[00:38] *** Ivy has joined #archiveteam-ot
[00:39] *** justcool3 has joined #archiveteam-ot
[00:40] *** justcool3 has quit IRC (Read error: Connection reset by peer)
[00:40] *** Ivy has quit IRC (Read error: Connection reset by peer)
[00:40] *** HCross has quit IRC (Read error: Connection reset by peer)
[00:45] *** Ivy has joined #archiveteam-ot
[00:46] *** hook54321 has joined #archiveteam-ot
[00:47] *** justcool3 has joined #archiveteam-ot
[00:47] *** HCross has joined #archiveteam-ot
[00:48] *** Meli has quit IRC (Read error: Operation timed out)
[00:50] *** Meli has joined #archiveteam-ot
[00:52] *** Ctrl-S___ has joined #archiveteam-ot
[00:55] *** HCross has quit IRC (Read error: Connection reset by peer)
[00:55] *** t3 has joined #archiveteam-ot
[00:55] *** Ctrl-S___ has quit IRC (Read error: Connection reset by peer)
[00:55] *** t3 has quit IRC (Read error: Connection reset by peer)
[00:55] *** justcool3 has quit IRC (Read error: Connection reset by peer)
[00:55] *** hook54321 has quit IRC (Read error: Connection reset by peer)
[00:55] *** Ivy has quit IRC (Read error: Connection reset by peer)
[01:07] *** HCross has joined #archiveteam-ot
[01:08] *** chr1sm has joined #archiveteam-ot
[01:09] *** HCross has quit IRC (Read error: Connection reset by peer)
[01:09] *** chr1sm has quit IRC (Read error: Connection reset by peer)
[01:10] *** mgrytbak has joined #archiveteam-ot
[01:11] *** t3 has joined #archiveteam-ot
[01:11] *** revi has joined #archiveteam-ot
[01:12] *** Ivy has joined #archiveteam-ot
[01:12] *** Kaz has joined #archiveteam-ot
[01:12] *** hook54321 has joined #archiveteam-ot
[01:12] *** chr1sm has joined #archiveteam-ot
[01:13] *** Ctrl-S___ has joined #archiveteam-ot
[01:15] *** HCross has joined #archiveteam-ot
[01:17] *** xit has joined #archiveteam-ot
[01:17] *** tech234a has joined #archiveteam-ot
[01:18] *** deathy__ has joined #archiveteam-ot
[01:18] *** justcool3 has joined #archiveteam-ot
[01:20] *** horkermon has joined #archiveteam-ot
[01:37] *** Vito` has joined #archiveteam-ot
[01:44] *** Vito` has quit IRC (Read error: Connection reset by peer)
[01:46] *** Vito` has joined #archiveteam-ot
[01:55] *** mateon has quit IRC (Quit: leaving)
[01:58] *** picklefac has joined #archiveteam-ot
[02:05] *** amelia386 has joined #archiveteam-ot
[02:05] *** DrasticAc has joined #archiveteam-ot
[02:06] *** pnJay has joined #archiveteam-ot
[02:06] *** starlord has joined #archiveteam-ot
[02:08] *** diggan has joined #archiveteam-ot
[03:01] *** qw3rty_ has joined #archiveteam-ot
[03:09] *** qw3rty has quit IRC (Read error: Operation timed out)
[03:59] *** qw3rty__ has joined #archiveteam-ot
[04:06] *** qw3rty_ has quit IRC (Read error: Operation timed out)
[04:42] *** ephemer0l has quit IRC (Read error: Connection reset by peer)
[05:56] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[06:01] *** Meli has quit IRC (Remote host closed the connection)
[06:02] *** schbirid has quit IRC (Quit: Leaving)
[06:24] *** fuzzy8021 has joined #archiveteam-ot
[06:26] *** fuzzy802 has joined #archiveteam-ot
[06:26] *** Igloo has quit IRC (Read error: Operation timed out)
[06:28] *** fuzzy8021 has quit IRC (Ping timeout: 260 seconds)
[06:31] *** fuzzy8021 has joined #archiveteam-ot
[06:31] *** fuzzy802 has quit IRC (Ping timeout: 260 seconds)
[06:38] *** Mateon1 has joined #archiveteam-ot
[06:44] *** ephemer0l has joined #archiveteam-ot
[06:48] *** Igloo has joined #archiveteam-ot
[07:11] *** Arcorann has quit IRC (Remote host closed the connection)
[07:12] *** Arcorann has joined #archiveteam-ot
[07:17] *** benjins has quit IRC (Remote host closed the connection)
[07:18] *** benjins has joined #archiveteam-ot
[07:19] *** benjins has quit IRC (Remote host closed the connection)
[07:20] *** benjins has joined #archiveteam-ot
[07:22] *** betamax has quit IRC (Read error: Operation timed out)
[07:22] *** betamax has joined #archiveteam-ot
[07:22] *** Arcorann_ has joined #archiveteam-ot
[07:22] *** britmob_ has joined #archiveteam-ot
[07:24] *** yawkat has quit IRC (Read error: Operation timed out)
[07:24] *** benjinsmi has joined #archiveteam-ot
[07:24] *** yawkat has joined #archiveteam-ot
[07:26] *** auctus has quit IRC (Read error: Operation timed out)
[07:26] *** auctus_ has joined #archiveteam-ot
[07:27] *** Terbium_ has joined #archiveteam-ot
[07:28] *** britmob has quit IRC (Read error: Operation timed out)
[07:28] *** Terbium has quit IRC (Read error: Operation timed out)
[07:28] *** Arcorann has quit IRC (Read error: Operation timed out)
[07:29] *** sknebel has joined #archiveteam-ot
[07:31] *** benjins has quit IRC (Ping timeout: 610 seconds)
[07:31] *** sknebel_ has quit IRC (Ping timeout: 610 seconds)
[07:54] *** drcd has quit IRC (Ping timeout: 272 seconds)
[07:54] *** drcd has joined #archiveteam-ot
[07:56] *** Gfy has joined #archiveteam-ot
[07:56] *** Hecatz has quit IRC (Ping timeout: 272 seconds)
[07:56] *** Coderjo_ has quit IRC (Ping timeout: 272 seconds)
[07:56] *** Gfy_ has quit IRC (Ping timeout: 272 seconds)
[07:56] *** Laverne has quit IRC (Ping timeout: 272 seconds)
[07:56] *** NatarajBt has quit IRC (Ping timeout: 272 seconds)
[07:56] *** sHATNER has quit IRC (Ping timeout: 272 seconds)
[07:57] *** Terbium_ has quit IRC (Ping timeout: 272 seconds)
[07:58] *** Coderjo has joined #archiveteam-ot
[07:58] *** Hecatz has joined #archiveteam-ot
[08:03] *** Terbium has joined #archiveteam-ot
[08:47] *** sHATNER has joined #archiveteam-ot
[08:47] *** NatarajBt has joined #archiveteam-ot
[08:48] *** Laverne has joined #archiveteam-ot
[09:43] *** Meli has joined #archiveteam-ot
[10:48] *** Arcorann_ has quit IRC (Read error: Connection reset by peer)
[10:57] *** Arcorann has joined #archiveteam-ot
[11:29] *** Stiletto has quit IRC (Read error: Operation timed out)
[11:53] *** HP_Archiv has quit IRC (Quit: Leaving)
[12:23] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[13:43] *** godane has quit IRC (Quit: Leaving.)
[14:06] *** icedice has quit IRC (Leaving)
[15:13] *** ola_norsk has joined #archiveteam-ot
[15:14] Could anyone tell me why this item is now "You need to log in to view it"? https://archive.org/details/BitChute-KqKoN4dQp7TP
[15:14] "Archive Team: We're not archive.org"
[15:15] I get that, but you guys have some experience with it
[15:17] I bet the reason is what's written in the review.
[15:18] "A documentary full of false conspiracy theories"?
[15:18] Yes
[15:20] I've seen the interview, and yes, it's even blatantly edited and cut in quotation clips... But i'm fairly certain i could pull up many documentaries that do the very same thing
[15:22] *** ola_norsk has quit IRC (Quit: GET A FUCKING GRIP ON YOURSELF, AMERICA!)
[15:22] Of course this isn't the only video that ever spread misinformation. (NB, I haven't watched it, but PolitiFact is quite reliable, and the description certainly screams bullshit.)
[15:36] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[15:52] *** asdf0101 has quit IRC (Remote host closed the connection)
[15:56] *** asdf0101 has joined #archiveteam-ot
[17:20] *** ola_norsk has joined #archiveteam-ot
[17:24] I am not quite sure how to formulate this question; but is my understanding correct, in that archive.org has begun 'darkening' items based off of tertiary sources such as politico?
[17:27] You'd have to ask IA.
[17:28] What is your feel about it though?
[17:28] Maybe?
[17:29] They've started putting warnings in the WBM for content that has since been removed from the target website for ToS violations (e.g. Medium), so it doesn't seem unreasonable.
[17:33] I believed in this Kahle quote .. https://imgur.com/a/ZCyCVTJ
[17:35] Nothing's being deleted or even made inaccessible though.
[17:35] Just like Trump's tweet didn't get deleted, only a warning attached for spouting nonsense.
[17:36] though the 'facts' are to be found elsewhere?
[17:36] on some other site?
[17:36] Wat?
[17:38] It is my understanding that a rather significant study got retracted recently .. https://www.politifact.com/article/2020/may/08/fact-checking-plandemic-documentary-full-false-con/ article does not reflect that
[17:38] * ola_norsk has a tendency to skim-read though
[17:39] source: https://youtu.be/48Xh9p7Q6Xs
[17:39] The politico article says 7th May
[17:42] ola_norsk: ArchiveTeam is not the Internet Archive. If you have specific questions for the Internet Archive, I'd suggest you reach out to the Internet Archive and ask them directly. https://archive.org/about/contact.php
[17:44] phuzion_: I am aware of that. I am asking as a contributor, to what i expect are you guys as contributors
[17:44] 1. Upload stuff that's worth preserving to IA. 2. ??? 3. PROFIT
[17:50] I'm not sure how to word the worry that the thought of a politically filtered archive brings me
[17:50] 17:35:12 <@JAA> Nothing's being deleted or even made inaccessible though.
[17:50] It isn't filtered.
[17:50] Unless you have evidence to the contrary.
[17:53] I don't, but seeing a completely harmless interview, albeit silly or lacking factual statements, being 'darkened' and suddenly requiring login to view, as if it was some dangerous thing, is worrying. Especially if the decision to do so was made by trusting 'fact checkers' who can't even keep their own facts in line.
[17:54] Yes, spreading misinformation that may lead to preventable deaths is dangerous.
[17:56] Then if it's the case that the item got darkened based on the politico 'fact check', some note should be taken that politico's facts aren't up to date
[17:56] IMO
[17:56] Nothing got darkened.
[17:57] are there any items of the interview that do not have 'noindex: true' set?
[17:59] Have you tried searching? https://archive.org/search.php?query=plandemic
[18:00] *** godane has joined #archiveteam-ot
[18:01] Since they are all the same, why are some "login required"?
[18:01] Why do you keep asking things here that only IA can answer?
[18:01] Theoretical
[18:01] Because nobody noticed yet or the scripts that find them haven't run recently or or or
[18:02] i guess my concern boils down to, "who fact-checks the fact-checkers"..
[18:03] You seem to have a lot of questions about IA, none of which we have the correct answers to, because as I will reiterate for the third time, we are not the Internet Archive. If you have specific questions for the Internet Archive, ask them directly. They have an email address info@archive.org that you can email and ask.
[18:03] but anyway, i've run out of English diction on such a difficult topic
[18:04] phuzion_: If you scroll up a bit, you will notice i wrote "I am aware"
[18:04] ola_norsk: And yet you continue to ask questions to people who do not have the answers.
[18:04] So you're arguing for the sake of arguing, got it.
[18:04] not arguing, asking for opinions
[18:05] *** ola_norsk has quit IRC (Quit: leaving)
[18:06] *** phuzion_ is now known as phuzion
[18:10] Now I wonder how much space is getting wasted by these people throwing up dozens of copies of the same interview "because it gets suppressed".
[18:18] Oh, neat, this could be useful for covering elections: https://help.twitter.com/en/using-twitter/election-labels
[19:58] *** SynMonger has quit IRC (Quit: Wait, what?)
[19:59] *** SynMonger has joined #archiveteam-ot
[20:00] *** SynMonger has quit IRC (Client Quit)
[20:02] *** SynMonger has joined #archiveteam-ot
[21:18] *** SynMonger has quit IRC (Quit: Wait, what?)
[21:21] *** SynMonger has joined #archiveteam-ot
[21:39] *** ephemer0l has quit IRC (Read error: Connection reset by peer)
[21:51] *** DogsRNice has joined #archiveteam-ot
[22:49] *** HP_Archiv has joined #archiveteam-ot
[23:44] *** ephemer0l has joined #archiveteam-ot