[00:10] *** BlueMax has quit IRC (Quit: Leaving)
[00:35] *** HashbangI has quit IRC (Read error: Connection reset by peer)
[00:45] *** hshashash has joined #archiveteam-bs
[00:48] has there been any discussion about archiving podcasts? very culturally significant and they're not going to be around forever
[00:49] scraping large podcast hosting sites would get the bulk of it with relatively little effort
[01:09] *** BlueMax has joined #archiveteam-bs
[02:19] i think it'd be cool if someone had/is archiving NPR's hourly news updates radio stream (3~5 minute mp3 each hour)
[02:19] url never changes
[02:22] http://public.npr.org/anon.npr-mp3/npr/news/newscast.mp3
[02:31] *** Raccoon has quit IRC (Read error: Connection reset by peer)
[02:31] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[02:32] *** Raccoon has joined #archiveteam-bs
[02:39] *** shashasha has joined #archiveteam-bs
[02:39] *** yano has quit IRC (Read error: Connection reset by peer)
[02:39] *** yano_ has joined #archiveteam-bs
[02:40] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[02:41] *** fuzzy8021 has joined #archiveteam-bs
[02:41] *** mundus20- has quit IRC (Read error: Operation timed out)
[02:41] *** mundus201 has joined #archiveteam-bs
[02:42] *** hshashash has quit IRC (Read error: Connection reset by peer)
[02:43] *** paul2520 has quit IRC (Read error: Operation timed out)
[02:43] *** paul2520 has joined #archiveteam-bs
[02:45] *** shashasha has quit IRC (Remote host closed the connection)
[02:45] *** qw3rty112 has joined #archiveteam-bs
[02:47] *** systwi_ has joined #archiveteam-bs
[02:47] *** d5f4a3622 has quit IRC (Read error: Connection reset by peer)
[02:47] *** qw3rty111 has quit IRC (Read error: Operation timed out)
[02:47] *** d5f4a3622 has joined #archiveteam-bs
[02:47] *** Flashfloo has quit IRC (Ping timeout: 601 seconds)
[02:47] *** TigerbotH has quit IRC (Ping timeout: 601 seconds)
[02:49] *** shashasha has joined #archiveteam-bs
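A minimal sketch of the NPR newscast idea raised at [02:19]-[02:22], assuming the file at the fixed URL is simply overwritten each hour and that a plain timestamped copy is good enough. The names `snapshot_name` and `fetch_newscast` are hypothetical, and this saves a bare mp3 rather than a WARC, so it is not what an ArchiveTeam pipeline would produce.

```python
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# URL as posted in the channel at [02:22]
NEWSCAST_URL = "http://public.npr.org/anon.npr-mp3/npr/news/newscast.mp3"


def snapshot_name(now: datetime) -> str:
    """Build a file name keyed to the UTC hour of a timezone-aware datetime."""
    hour = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return f"newscast-{hour:%Y%m%dT%H%MZ}.mp3"


def fetch_newscast(dest_dir: Path, now: Optional[datetime] = None) -> Path:
    """Download the current newscast once per hour (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / snapshot_name(now)
    if not target.exists():  # skip if this hour was already saved
        with urllib.request.urlopen(NEWSCAST_URL, timeout=60) as resp:
            target.write_bytes(resp.read())
    return target
```

Calling `fetch_newscast(Path("npr"))` from an hourly cron job or timer would keep one file per hour; running it more often is harmless because existing hours are skipped.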
[02:49] *** shashasha has quit IRC (Remote host closed the connection)
[02:49] *** kiska11 has joined #archiveteam-bs
[02:50] *** Igloo_ has joined #archiveteam-bs
[02:53] *** ndiddy has quit IRC (Ping timeout: 602 seconds)
[02:53] *** TigerbotH has joined #archiveteam-bs
[02:54] *** paul2520 has quit IRC (Ping timeout: 602 seconds)
[02:54] *** systwi has quit IRC (Ping timeout: 602 seconds)
[02:54] *** luckcolor has quit IRC (Ping timeout: 602 seconds)
[02:54] *** Kenshin has quit IRC (Ping timeout: 602 seconds)
[02:54] *** luckcolor has joined #archiveteam-bs
[02:54] *** kiska1 has quit IRC (Ping timeout: 600 seconds)
[02:54] *** Igloo has quit IRC (Ping timeout: 600 seconds)
[02:54] *** shashasha has joined #archiveteam-bs
[02:54] *** abstract has quit IRC (Ping timeout: 600 seconds)
[02:55] *** ivan- has joined #archiveteam-bs
[02:55] *** Fusl sets mode: +o ivan-
[02:56] *** Kenshin has joined #archiveteam-bs
[02:56] *** Fusl sets mode: +o Kenshin
[02:56] *** ndiddy has joined #archiveteam-bs
[02:57] *** paul2520 has joined #archiveteam-bs
[02:57] *** joepie91 has quit IRC (Ping timeout: 600 seconds)
[02:58] *** ivan_ has quit IRC (Ping timeout: 600 seconds)
[02:58] *** abstract has joined #archiveteam-bs
[02:59] *** PotcFdk has quit IRC (Ping timeout: 600 seconds)
[02:59] *** ivan- is now known as ivan_
[03:03] *** joepie91 has joined #archiveteam-bs
[03:06] *** PotcFdk has joined #archiveteam-bs
[03:20] *** step has quit IRC (Read error: Operation timed out)
[03:47] *** wyatt8740 has joined #archiveteam-bs
[03:52] *** wyatt8740 has quit IRC (Ceci n'est pas un IRC quit message.)
[03:53] *** wyatt8740 has joined #archiveteam-bs
[03:56] *** qw3rty113 has joined #archiveteam-bs
[03:58] *** systwi_ is now known as systwi
[04:00] *** qw3rty112 has quit IRC (Ping timeout: 600 seconds)
[04:19] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[04:37] *** kiskabak has quit IRC (Remote host closed the connection)
[04:37] *** kiskabak has joined #archiveteam-bs
[04:37] *** Fusl sets mode: +o kiskabak
[04:38] *** kiska11 is now known as kiska1
[05:12] *** fredgido has quit IRC (Ping timeout: 252 seconds)
[05:12] *** fredgido has joined #archiveteam-bs
[05:15] *** Rotietip has joined #archiveteam-bs
[05:16] Hello everyone, there are some sites of which I am a little worried about their future/long-term stability, could anyone initiate a work in the Warrior for http://www.mundoparanormal.com http://grupoelron.org/ and http://www.videosporno.tv/ ?
[05:27] *** wyatt8740 has joined #archiveteam-bs
[05:40] *** terry1 has quit IRC (Read error: Operation timed out)
[05:50] *** terry1 has joined #archiveteam-bs
[05:50] *** PhrackD has quit IRC (Read error: Operation timed out)
[05:52] Well, the last one is a straight up porn site
[05:52] *** PhrackD has joined #archiveteam-bs
[05:53] Other two went in
[05:56] I know, but it has not been updated for more than a year and I am worried that it will be closed at any time, in addition to the fact that it only has videos embedded from external tubes, so the backup should not weigh too much.
[05:57] Use the save now feature is my suggestion. I dont speak for archiveteam but I havent seen us do much to save porn
[05:57] Except for EroShare I think it was
[06:06] Flashfire: It is a site with thousands of entries, if you refer to the tool that has archive.org it would take me days to save the whole site but it doesn't matter. By the way, can ArchiveBot save individual boards from 8ch.net (including the threads archive that is loaded with javascript)?
[06:06] https://web.archive.org/save/ I was refering to this yes
[06:10] Rotietip https://web.archive.org/web/*/www.videosporno.tv/* I wouldnt worry too much about that site seems reasonably covered for a site that is mostly embeds from other sites
[06:11] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[06:15] At least that leaves me calmer, I imagined it would be a lesser known site (like the other two) and therefore I thought I would not be so indexed.
[06:15] last I checked IA doesn't take 8ch
[06:16] oh never mind it's in wayback
[06:21] *** m007a83 has joined #archiveteam-bs
[06:32] *** step has joined #archiveteam-bs
[06:33] 8ch.net is like 4chan but everyone can create their own board (so there are thousands of all kinds). The issue is that if the owner of a board does not appear in two weeks it becomes "reclaimable" by anyone. There is also the fact that old threads are indexed in a file (although this is configurable) like this: https://8ch.net/hisparefugio/archive/index.html (if it takes a long time to load, it is because there are thousands of threads which are in JSON format within index.html).
[06:33] The idea would be to save some boards currently inactive and with valuable content before some asshole claims them and sends everything to hell for "lulz/trolling"
[06:42] I cant help there at all its blocked by the Australian Government
[06:46] No problem, however it is possible that 8chan requires more specialized tools than wget or ArchiveBot (at least to get the thread links)
[06:56] I would assume chromebot it being headless Chromium might be able to do it
[07:32] Speaking of Chromebot, with that would it be possible to archive a site like https://radio.garden/ ?
[07:40] Possible, Rotietip. The site needs a lot of user interaction though, so custom click selectors are necessary.
[07:41] *** Dimtree has quit IRC (Ping timeout: 745 seconds)
[08:06] *** Dimtree has joined #archiveteam-bs
[08:30] *** Sokar has quit IRC (Quit: KVIrc 4.9.3 Aria http://www.kvirc.net/)
[08:52] can someone add http://www.fuzzymemories.tv/ to archivebot and possibly chromebot looks unique and interesting though not sure how the videos will go
[09:58] what do you all think about archiving https://picosong.com/ ? i did a quick dive through a majestic report of 51k urls and saw a lot of content that may get DMCAd when its up at the IA
[09:59] JAA Kaz SketchCow?
[10:03] *** killsushi has quit IRC (Quit: Leaving)
[10:09] *** Dj-Wawa has joined #archiveteam-bs
[10:26] *** t3 has quit IRC (Quit: Connection closed for inactivity)
[10:29] Does a DMCA cost IA anything? Would the benefits outweigh the risks?
[10:49] In the case of radio.garden the links of the radios that appear in the lower right corner can be detected with ".channel-list-item-link", for cities with many radios you can go to the next page with ".channel-list-navigation > .mod-next" and the most difficult thing would be to get the green dots to click since it seems to be all drawn on a canvas element (which can be found with ".cesium-widget")
[10:54] A project like that is not something I can dedicate a lot of time to at the moment but there I leave the idea in case someone wants to make a script to emulate the necessary interactions with the site to be able to archive it with Chromebot.
[11:18] *** Rotietip has quit IRC (Read error: Operation timed out)
[11:27] *** BlueMax has quit IRC (Quit: Leaving)
[11:31] *** Rotietip has joined #archiveteam-bs
[12:15] *** TC01 has joined #archiveteam-bs
[12:17] *** Rotietip has quit IRC (Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/)
[12:38] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[12:43] *** Dj-Wawa has joined #archiveteam-bs
[12:57] *** yano_ is now known as yano
[13:01] *** killsushi has joined #archiveteam-bs
[13:38] Oh youuuuuuuu
[13:45] https://www.pgmusic.com/forums/ubbthreads.php?ubb=showflat&Number=319856
[13:45] My attitude about Picosong is into the wayback.
[13:47] *** Igloo_ is now known as Igloo
[13:48] This sounds like a task for qwarc, unless someone wants to set up a warrior project.
[13:49] I can look when I get home as I thought qwarc was buggy?
[13:49] #notsosung ? :)
[13:53] Yes, it is.
[13:53] (But I blame aiohttp.)
[14:00] Based on Fusl's Majestic list, the song IDs are [0-9A-Za-z]{,4}. There are also some w[0-9A-Za-z]{4} (more recent uploads it seems). Inexistent IDs redirect back to the homepage.
[14:02] ,5
[14:02] not ,4
[14:02] some of them are 5 chars
[14:03] Yes, but all of them start with "w".
[14:03] Well, all that I found.
[14:04] The IDs also appear to be somewhat sequential.
[14:04] Got an example?
[14:04] weird
[14:04] picosong sounds nice
[14:04] does it require a warrior project?
[14:05] so it's w?[0-9A-Za-z]{1,4}
[14:05] Yup
[14:05] arkiver: Don't start another one. :-P
[14:05] well if we need it I´m fine with getting one running
[14:05] I have time
[14:05] Yeah, let's get the others off the ground first.
[14:05] but if we don´t need it, then fine too
[14:05] This one will stay online until October.
[14:06] yeah
[14:06] but let´s not forget about it
[14:06] We have time, I almost have a full list of freeml too
[14:06] Which is due to due in December.
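A rough sketch of the picosong ID space discussed at [14:00]-[14:05], assuming IDs are 1 to 4 alphanumeric characters plus the 5-character IDs that start with "w", and that song pages live at https://picosong.com/ID/ as in the links posted in the channel. The function names are made up for illustration; this is not how qwarc itself is driven.

```python
import re
import string
from itertools import product
from typing import Iterator

# 62 characters: 0-9, A-Z, a-z
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Pattern as summarised in the channel at [14:05]
ID_PATTERN = re.compile(r"w?[0-9A-Za-z]{1,4}")


def candidate_ids() -> Iterator[str]:
    """Yield every candidate ID exactly once."""
    # IDs of 1 to 4 characters
    for length in range(1, 5):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)
    # 5-character IDs: "w" followed by 4 characters.
    # "w" plus 1 to 3 characters is already covered by the loop above.
    for chars in product(ALPHABET, repeat=4):
        yield "w" + "".join(chars)


def candidate_count() -> int:
    """Count the candidates without enumerating them."""
    n = len(ALPHABET)
    return sum(n**k for k in range(1, 5)) + n**4


def song_url(song_id: str) -> str:
    """Build the page URL for one ID (URL shape taken from the posted links)."""
    return f"https://picosong.com/{song_id}/"
```

Under these assumptions `candidate_count()` gives 29,794,906, which is in line with the roughly 30 million (2 * 62**4) estimate mentioned at [14:07].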
[14:06] *to die
[14:07] hmm yes
[14:07] Should be ~30 million IDs to attempt on picosong (2 * 62**4). I'll try with qwarc for everything but the actual audio files (because qwarc's very slow and leaking memory left and right on processing large files).
[14:10] yes
[14:11] "I have time" there's still sony sketch and reddit :P
[14:12] Sketch is the important one as that deadline is looming
[14:15] alright
[14:17] There are also Disqus comments on picosong. Do we have a method for grabbing those yet?
[14:21] no
[14:21] afaik
[14:24] :-/
[14:24] At least it seems that most songs don't have any comments.
[14:24] Found two examples so far with one each: https://picosong.com/ytid/ https://picosong.com/jLBh/
[14:26] Ok, 19 on this one: https://picosong.com/wK3ag/
[14:26] Anyway
[14:26] I'll just leave https://disqus.com/home/forum/picosong/ here.
[14:44] *** kiska sets mode: +o Igloo
[14:55] *** Mayonaise has quit IRC (Read error: Connection reset by peer)
[14:56] *** Mayonaise has joined #archiveteam-bs
[17:18] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[17:24] *** bithippo has joined #archiveteam-bs
[17:30] *** bithippo has quit IRC (Textual IRC Client: www.textualapp.com)
[17:57] *** schbirid has joined #archiveteam-bs
[18:19] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[18:24] *** wyatt8740 has joined #archiveteam-bs
[18:52] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[19:05] *** wyatt8740 has joined #archiveteam-bs
[19:18] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[19:44] *** schbirid has quit IRC (Remote host closed the connection)
[22:15] Fusl: Do we have lingering "need admin IA work done" tasks
[22:15] yes
[22:15] sonysketch
[22:15] and archiveteam_sonysketchimg
[22:16] i PMd you the email addresses list on twitter
[22:16] and we'll need another one for reddit soon i believe
[22:21] It's a huge pain in the ass
[22:21] I wish I could make it a command line but it's not possible
[22:22] I'm thinking of making an archiveteam holdingbay collection
[22:22] Then all your shit can go in there, and I write scripts to transfer over to final resting place.
[22:22] Wayback doesn't care
[22:24] Yes, I'm doing this
[22:25] Less pain down the line and you're not all sitting around kicking the can waiting for me
[22:26] https://archive.org/details/archiveteam_inbox
[22:29] Yeah, seems like a good idea.
[22:29] Fusl, Igloo, HCross and Kiska all have access to this collection now.
[22:30] Is there anyone else doing high-grade uploads we need to add to it
[22:30] * Kaz waves
[22:30] checking address now
[22:31] pm'd
[22:31] Me too.
[22:31] Also PM'd.
[22:36] SketchCow: By the way, you missed this beautiful question about the Tumblr collection over in #tumbledown yesterday:
[22:36] 2019-07-22 20:50:35 UTC < rbarreto> hi everyone - i'm a researcher on the impacts of SESTA / FOSTA and I'm trying to get data on the blogs that were impacted by NSFW
[22:36] 2019-07-22 20:50:55 UTC < rbarreto> but when I look at what the internet archive has ...it's pretty much just images of this one guy
[22:49] what
[22:49] not enough tumblr porn on IA? I doubt this very much
[22:53] They're referring to the images on this page: https://archive.org/details/archiveteam_tumblr
[22:55] haha
[22:55] people take thumbnails very seriously these days
[22:57] *** second has quit IRC (Quit: ZNC 1.6.5 - http://znc.in)
[23:06] *** second has joined #archiveteam-bs
[23:27] *** BlueMax has joined #archiveteam-bs
[23:34] *** wyatt8740 has joined #archiveteam-bs
[23:41] *** VerifiedJ has quit IRC (Quit: Leaving)
[23:45] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[23:47] *** wyatt8740 has joined #archiveteam-bs