[00:04] *** HP_Archiv has quit IRC (Client Quit) [00:10] *** BlueMax has joined #archiveteam-bs [00:26] *** step has joined #archiveteam-bs [01:51] *** Smiley has joined #archiveteam-bs [01:53] *** BlueMaxim has joined #archiveteam-bs [01:53] *** BlueMax has quit IRC (Read error: Connection reset by peer) [01:54] *** SmileyG has quit IRC (Read error: Operation timed out) [02:00] *** ndiddy_ has joined #archiveteam-bs [02:00] *** ndiddy_ has left [02:01] *** Arcorann has quit IRC (Read error: Connection reset by peer) [02:03] *** Arcorann has joined #archiveteam-bs [03:33] *** xit has joined #archiveteam-bs [03:45] *** qw3rty_ has joined #archiveteam-bs [03:53] *** qw3rty__ has quit IRC (Read error: Operation timed out) [04:05] *** godane has quit IRC (Ping timeout: 260 seconds) [04:20] *** godane has joined #archiveteam-bs [04:28] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [04:33] !ao https://www.theblaze.com/news/kim-kardashian-kanye-west-mental-health [04:44] *** nepeat has quit IRC (Quit: ZNC 1.7.5 - https://znc.in) [04:45] *** nepeat has joined #archiveteam-bs [05:03] *** fuzzy802 has joined #archiveteam-bs [05:05] *** fuzzy8021 has quit IRC (Read error: Operation timed out) [05:05] Wrong channel [05:13] *** fuzzy802 is now known as fuzzy8021 [05:17] *** fuzzy8021 has quit IRC (Read error: Connection reset by peer) [05:17] *** fuzzy8021 has joined #archiveteam-bs [05:19] *** nepeat has quit IRC (Quit: ZNC 1.7.5 - https://znc.in) [05:19] *** nepeat has joined #archiveteam-bs [05:26] *** BlueMax has joined #archiveteam-bs [06:26] *** antomati_ has joined #archiveteam-bs [06:27] *** Wingy has quit IRC (Read error: Operation timed out) [06:27] *** asdf0101 has quit IRC (Read error: Operation timed out) [06:28] *** SynMonger has quit IRC (Read error: Operation timed out) [06:28] *** Jake has quit IRC (Read error: Operation timed out) [06:28] *** qw3rty has joined #archiveteam-bs [06:28] *** SynMonger has joined #archiveteam-bs [06:34] *** 
Terbium has quit IRC (Read error: Operation timed out) [06:34] *** dxrt_ has quit IRC (Read error: Operation timed out) [06:34] *** antomatic has quit IRC (Read error: Operation timed out) [06:34] *** Mayeau has quit IRC (Read error: Operation timed out) [06:34] *** Terbium has joined #archiveteam-bs [06:34] *** sembiance has quit IRC (Read error: Operation timed out) [06:34] *** qw3rty_ has quit IRC (Read error: Operation timed out) [06:34] *** Mayeau has joined #archiveteam-bs [06:35] *** t3 has quit IRC (Quit: Connection closed for inactivity) [06:35] *** systwi has quit IRC (Read error: Operation timed out) [06:36] *** sembiance has joined #archiveteam-bs [06:36] *** Jake7 has joined #archiveteam-bs [06:36] *** asdf0101 has joined #archiveteam-bs [06:36] *** systwi has joined #archiveteam-bs [06:37] *** Jake8 has joined #archiveteam-bs [06:37] *** paul2520 has quit IRC (Ping timeout: 622 seconds) [06:38] *** dxrt_ has joined #archiveteam-bs [06:38] *** dxrt sets mode: +o dxrt_ [06:38] *** Jake8 has quit IRC (Client Quit) [06:39] *** Jake1 has joined #archiveteam-bs [06:40] *** paul2520 has joined #archiveteam-bs [06:40] *** Jake1 has quit IRC (Client Quit) [06:41] *** Jake2 has joined #archiveteam-bs [06:42] *** Jake2 has quit IRC (Client Quit) [06:43] *** Jake1 has joined #archiveteam-bs [06:44] *** Jake1 has quit IRC (Client Quit) [06:45] *** Jake4 has joined #archiveteam-bs [06:46] *** Jake4 has quit IRC (Client Quit) [06:47] *** Jake7 has quit IRC (Ping timeout: 622 seconds) [06:47] *** Jake9 has joined #archiveteam-bs [06:49] *** Jake3 has joined #archiveteam-bs [06:50] *** mtntmnky has quit IRC (Read error: Operation timed out) [06:50] *** robogoat has quit IRC (Write error: Broken pipe) [06:50] *** prq has quit IRC (Write error: Broken pipe) [06:51] *** robogoat has joined #archiveteam-bs [06:51] *** jrwr has quit IRC (Ping timeout: 260 seconds) [06:51] *** nyany has quit IRC (Read error: Operation timed out) [06:51] *** Raccoon` has joined 
#archiveteam-bs [06:52] *** phirephl- has quit IRC (Read error: Operation timed out) [06:52] *** Pixi` has joined #archiveteam-bs [06:52] *** twigfoot has quit IRC (Read error: Operation timed out) [06:52] *** DigiDigi has quit IRC (Read error: Operation timed out) [06:53] *** Kaz has quit IRC (Ping timeout: 260 seconds) [06:53] *** revi has quit IRC (Ping timeout: 260 seconds) [06:53] *** Igloo has quit IRC (Read error: Operation timed out) [06:53] *** Darkstar has quit IRC (Read error: Operation timed out) [06:53] *** svchfoo1 has quit IRC (Read error: Operation timed out) [06:54] *** dxrt has quit IRC (Read error: Operation timed out) [06:54] *** dxrt has joined #archiveteam-bs [06:54] *** Iglooop1 has quit IRC (Read error: Operation timed out) [06:54] *** Raccoon has quit IRC (Ping timeout: 376 seconds) [06:54] *** chfoo has quit IRC (Read error: Operation timed out) [06:55] *** svchfoo3 sets mode: +o dxrt [06:55] *** Pixi has quit IRC (Read error: Operation timed out) [06:55] *** chfoo has joined #archiveteam-bs [06:56] *** svchfoo3 sets mode: +o chfoo [06:56] *** Larsenv_ has joined #archiveteam-bs [06:56] *** phirephly has joined #archiveteam-bs [06:56] *** lennier2 has joined #archiveteam-bs [06:57] *** Kaz has joined #archiveteam-bs [06:57] *** Meli has quit IRC (Ping timeout: 272 seconds) [06:57] *** Raccoon` has quit IRC (Remote host closed the connection) [06:58] *** kisspunch has quit IRC (Ping timeout: 272 seconds) [06:58] *** Gfy has quit IRC (Ping timeout: 272 seconds) [06:58] *** LordNigh2 has joined #archiveteam-bs [06:58] *** omglolba- has joined #archiveteam-bs [06:58] *** Larsenv has quit IRC (Read error: Operation timed out) [06:59] *** ndiddy has quit IRC (Ping timeout: 272 seconds) [06:59] *** zerkalo has quit IRC (Ping timeout: 272 seconds) [06:59] *** omglolbah has quit IRC (Ping timeout: 272 seconds) [06:59] *** Laverne has quit IRC (Ping timeout: 272 seconds) [06:59] *** sHATNER has quit IRC (Ping timeout: 272 seconds) [06:59] *** Jake9 
has quit IRC (Ping timeout: 622 seconds) [06:59] *** lennier1 has quit IRC (Ping timeout: 272 seconds) [06:59] *** Lord_Nigh has quit IRC (Ping timeout: 272 seconds) [06:59] *** LordNigh2 is now known as Lord_Nigh [06:59] *** Maylay has quit IRC (Ping timeout: 272 seconds) [06:59] *** brayden has quit IRC (Ping timeout: 272 seconds) [06:59] *** lennier2 is now known as lennier1 [07:01] *** Gfy has joined #archiveteam-bs [07:02] *** Maylay has joined #archiveteam-bs [07:04] *** kisspunch has joined #archiveteam-bs [07:04] *** Jake4 has joined #archiveteam-bs [07:07] *** Jake3 has quit IRC (Remote host closed the connection) [07:07] *** Jake1 has joined #archiveteam-bs [07:09] *** Jake1 has quit IRC (Client Quit) [07:09] *** Arcorann has quit IRC (Read error: Connection reset by peer) [07:09] *** Jake2 has joined #archiveteam-bs [07:10] *** twigfoot has joined #archiveteam-bs [07:10] *** Arcorann has joined #archiveteam-bs [07:11] *** Jake2 has quit IRC (Client Quit) [07:11] *** Jake67 has joined #archiveteam-bs [07:11] *** DigiDigi has joined #archiveteam-bs [07:11] *** Darkstar has joined #archiveteam-bs [07:11] *** nyany has joined #archiveteam-bs [07:12] *** svchfoo3 sets mode: +o nyany [07:13] *** Jake67 has quit IRC (Client Quit) [07:13] *** prq has joined #archiveteam-bs [07:13] *** Jake2 has joined #archiveteam-bs [07:15] *** Igloo has joined #archiveteam-bs [07:15] *** Jake2 has quit IRC (Client Quit) [07:16] *** jrwr has joined #archiveteam-bs [07:16] *** svchfoo3 sets mode: +o jrwr [07:17] *** Jake4 has quit IRC (Ping timeout: 622 seconds) [07:17] *** Jake4 has joined #archiveteam-bs [07:18] *** mtntmnky has joined #archiveteam-bs [07:18] *** revi has joined #archiveteam-bs [07:26] *** Brayconn has joined #archiveteam-bs [07:27] Brayconn: Could you provide a link to the site? [07:27] Yeah, let me get something else here quick... [07:28] *** z8f6Px98C has joined #archiveteam-bs [07:28] 👋 [07:29] Oh nice, I can send unicode stuff in here [07:29] sup! 
[07:29] Ok, so site link is https://theartistunion.com/
[07:30] Seems like a bunch of copyrighted music?
[07:30] It's music hidden behind a download gate. The artist offers a free download (mostly .wav and 320kbps) of a track, but it's hidden behind a "like, repost, follow" button.
[07:31] I wrote something that needs a dummy SoundCloud account and can pull the file hidden behind those buttons, don't know if Brayconn already sent that one.
[07:32] Currently the biggest issue is indexing everything.
[07:32] (I haven't sent any script links yet)
[07:32] Hm okay, the site is all JavaScript, which is a problem for easy recursive archival. We can get certain pages, but probably not a recursive crawl on the whole thing.
[07:32] So we'll have to be selective about important pages. But it seems like the focus is mainly on the songs and not the webpages?
[07:32] It uses Algolia as a backend, that makes it easier. I've polled 720 000 search results so far
[07:33] you can only make each search return 1000 results max, the upper limit cannot be changed since it requires the admin api key and we only have access to the search one
[07:33] hold on i'm gonna upload those scripts to wetransfer
[07:34] my current approach would have been to split the results into chunks of 1k results and then spread that across multiple computers
[07:34] https://we.tl/t-Ogt8pfwBNN
[07:35] search.py generates strings with a specified length and goes from 'aaa' to '999' for example. it outputs dicts, which can then be assembled into a single big list.
[07:37] indexing.py removes duplicates and can sort everything into chunks of 1000. it can also sort the results by the number of downloads, since i feel like the ones with many downloads should probably have priority over those with 0 downloads
[07:37] so far i've only gathered all the search results for strings with up to a length of 3, which is already 2.5gb worth of json.
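Editor's sketch of the approach described above, not the actual search.py or indexing.py (those are only linked, not quoted, in the log). It assumes the search alphabet is lowercase letters followed by digits, which matches the stated 'aaa' to '999' range, and it assumes each result dict carries an `id` and a `downloads` field; both field names are hypothetical. No network calls are made.

```python
import itertools
import string

# Assumed alphabet: 'a'..'z' then '0'..'9', so length-3 terms run 'aaa'..'999'.
ALPHABET = string.ascii_lowercase + string.digits
# Per-query result cap mentioned in the log.
MAX_HITS_PER_QUERY = 1000


def search_terms(length):
    """Yield every search string of the given length over ALPHABET, in order."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)


def dedupe(hits, key="id"):
    """Keep the first occurrence of each hit, identified by a (hypothetical) key field."""
    seen = {}
    for hit in hits:
        seen.setdefault(hit[key], hit)
    return list(seen.values())


def chunk_by_downloads(hits, size=MAX_HITS_PER_QUERY, field="downloads"):
    """Sort hits by download count (highest first) and split them into fixed-size chunks."""
    ranked = sorted(hits, key=lambda h: h.get(field, 0), reverse=True)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


if __name__ == "__main__":
    terms = list(search_terms(3))
    print(len(terms), terms[0], terms[-1])

    # Made-up records, only to show the dedupe and chunking steps.
    sample = [
        {"id": 1, "downloads": 5},
        {"id": 2, "downloads": 9},
        {"id": 1, "downloads": 5},
        {"id": 3},
    ]
    for chunk in chunk_by_downloads(dedupe(sample), size=2):
        print([hit["id"] for hit in chunk])
```

Each chunk could then be handed to a different machine, as suggested in the conversation. With a 36-character alphabet, every extra character of search-term length multiplies the number of queries by 36.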
[07:39] And Brayconn mentioned that there are expected to be 713945-ish songs in total on the site?
[07:39] 730k
[07:40] *** PovAddict has quit IRC (Quit: Konversation terminated!)
[07:40] Actually, I said that having only seen the 3 letter index
[07:40] Do we know if there'd be more searching 4 letters + ?
[07:40] *** sHATNER has joined #archiveteam-bs
[07:40] total search results were 2.2 million, but with all the duplicates removed 730k are left
[07:41] there are probably more, but searching for 4-character combinations will take 36 times the amount of time
[07:41] *** brayden has joined #archiveteam-bs
[07:41] *** Laverne has joined #archiveteam-bs
[07:42] i can upload the result files if anyone wants me to
[07:42] And the songs come in the form of .mp3s or similar, right? (I'm just looking at my network tab.)
[07:43] mp3, wav, it's up to the artist to decide. the api, when looking at the network tab, exposes an mp3 file, but that's only the preview
[07:44] as an example, take this url: https://theartistunion.com/api/v3/users/ruvlo/tracks.json
[07:45] *** Meli has joined #archiveteam-bs
[07:45] the audio_source element inside each list item exposes the mp3: "https://d2tml28x3t0b85.cloudfront.net/tracks/stream_files/001/240/053/original/BLVK%20SHEEP%20&%20NVADRZ%20-%20STATIC%20%28RUVLO%20REMIX%29.mp3?1594765017"
[07:46] even if it says mp3 the downloadable file is a wav
[07:46] "url":"https://d2tml28x3t0b85.cloudfront.net/tracks/original_files/001/127/461/original/RUVLO%20-%20DAMNED%20%28ELEMENT%20115%29.wav?1571166021"
[07:48] Okay, cool. I was wondering if it was some streaming/playlist/partitioned setup (with M3Us or something). But it looks like it's not?
[07:48] nope, just the raw files
[07:49] by the way, i feel like i should mention this. theartistunion has a reputation for being incredibly unreliable. it will oftentimes return 504 gateway errors because the site went down temporarily, often just for like a minute.
this happens sporadically and that's been the case ever since the site launched
[07:52] Okay, good to know.
[07:53] So this seems like a Warrior project to me (https://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior).
[07:53] Probably both for the enumeration of the song links (trying different search combinations) and the downloading of the songs.
[07:54] arkiver: ^
[07:55] yup, agreed.
[07:57] Thank you for your efforts. arkiver will probably take a while to respond.
[07:58] i'm always happy to help :)
[07:58] Should one of us make a wiki page for the site?
[07:59] Sure, that would be helpful. I'll add it to the Deathwatch page.
[07:59] in the meantime here are all the 2.2 million search results that can be organized using indexing.py
[07:59] https://we.tl/t-uM4qbH4ibK
[08:00] send help I can't find the new page button
[08:01] oh ok wait I got to it in a roundabout way
[08:02] Also turns out it already has a page, and has supposedly been saved...?!
[08:04] *** Meli has quit IRC (Quit: After 1w 1d 16h 32m 33s of wasteful lurking, 's brain 63gf4u1ted! X_x)
[08:04] Seems as though we already grabbed the site: Warrior tracker here? https://tracker.archiveteam.org/theartistunion/
[08:04] Ha, that's really funny
[08:04] Guess that's why you check the wiki first
[08:05] oof
[08:05] Well uh... I guess we can check everything's there for sure :P
[08:05] (I'm saying that to myself as well, I should've known better lol)
[08:05] Users we grabbed: https://github.com/ArchiveTeam/theartistunion-items
[08:06] As well as the code: https://github.com/ArchiveTeam/theartistunion-grab
[08:07] It looks like it was grabbed around August/September 2019. How much new music might there be to grab?
[08:07] Also, wiki link for reference --> https://www.archiveteam.org/index.php?title=The_Artist_Union
[08:08] Yeah, that would be the question. Should be pretty easy to find out and also shouldn't be too hard to just re-grab those users?
[08:08] i'm guessing it'll be a couple hundred gigs at most.
[08:09] most people have moved on to toneden by now
[08:09] Looks like there's an IRC channel set up for it: #theabandonoftheartists
[08:09] So I suggest we move the conversation there.
[08:10] Agreed
[08:26] We did this project already...
[08:26] #theabandonoftheartists
[08:27] Yep, figured that out a minute ago
[08:27] One step ahead, several steps behind
[08:28] *** Meli has joined #archiveteam-bs
[08:29] Well I am not home...
[08:31] *** Meli has quit IRC (Remote host closed the connection)
[08:31] Oh oops, I was referring to the people like myself who were talking about this without checking for a wiki page first.
[08:31] Not you lol.
[08:39] *** Meli has joined #archiveteam-bs
[09:09] *** Dj-Wawa has quit IRC (Ping timeout: 745 seconds)
[09:09] *** Dj-Wawa has joined #archiveteam-bs
[10:26] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:39] *** jshoard has joined #archiveteam-bs
[12:06] *** jshoard_ has joined #archiveteam-bs
[12:06] *** jshoard has quit IRC (Read error: Connection reset by peer)
[13:20] *** lunik1 has quit IRC (Quit: :x)
[13:21] *** lunik1 has joined #archiveteam-bs
[14:20] My IRC client has this icon today, I think this is an easter egg or something
[14:20] https://share.getcloudapp.com/9Zuj5qKm
[14:20] *** Larsenv_ is now known as Larsenv
[14:20] oops my nick
[14:21] https://superuser.com/questions/786630/why-does-textual-have-a-party-theme-icon-for-no-obvious-reason
[14:42] I did not realise Textual was open source
[15:05] Laverne: It is, but if you want the already compiled version and updates you can buy it from the Mac App Store
[15:05] interesting huh?
[15:05] I chose to buy it
[15:05] I purchased it a year or two ago.
It's the best one for the Mac and not that expensive [15:06] I had to juggle ZNC modules around a bit to get the scroll backs working nicely but I haven't had to fiddle with it since then :-) [15:35] *** Arcorann has quit IRC (Read error: Connection reset by peer) [16:28] *** Raccoon has joined #archiveteam-bs [17:42] *** Brayconn has quit IRC (Ping timeout: 252 seconds) [18:00] *** lennier1 has quit IRC (Read error: Connection reset by peer) [18:00] *** lennier1 has joined #archiveteam-bs [18:07] *** PovAddict has joined #archiveteam-bs [18:27] *** systwi_ has joined #archiveteam-bs [18:32] *** systwi has quit IRC (Read error: Operation timed out) [18:33] For the record, here are z8f6Px98C's scripts on a non-sucky host: https://transfer.notkiska.pw/16ddh4/indexing.py https://transfer.notkiska.pw/14nt6X/search.py https://transfer.notkiska.pw/30X9B/theartistunion.py [19:58] *** z8f6Px98C has quit IRC (Remote host closed the connection) [19:58] *** z8f6Px98C has joined #archiveteam-bs [20:01] *** fredgido has joined #archiveteam-bs [20:13] *** z8f6Px98C has quit IRC (Ping timeout: 622 seconds) [20:41] *** z8f6Px98C has joined #archiveteam-bs [20:48] *** z8f6Px98C has quit IRC (Read error: Operation timed out) [21:08] *** Nikchemny has joined #archiveteam-bs [21:08] It's 2:08 for me [21:09] And 208 is my favourite number [21:09] Hm [21:09] *** lennier2 has joined #archiveteam-bs [21:13] *** PovAddict has quit IRC (Quit: router restart) [21:15] *** lennier1 has quit IRC (Ping timeout: 496 seconds) [21:15] *** lennier2 is now known as lennier1 [21:25] *** Nikchemny has quit IRC (Quit: Page closed) [21:29] Doctissimo shifted their deletion of most Sexualité forums from tomorrow to 1 Sept: https://forum.doctissimo.fr/doctissimo/Addiction-sexuelle/nouvelles-importantes-suppression-sujet_10316_1.htm [21:36] *** jshoard_ has quit IRC (Leaving) [21:57] *** revi has quit IRC (Read error: Connection reset by peer) [21:57] *** revi has joined #archiveteam-bs [21:57] *** Kaz has 
quit IRC (Read error: Connection reset by peer) [21:59] *** Kaz has joined #archiveteam-bs [22:24] *** systwi_ is now known as systwi [22:34] *** exoire has joined #archiveteam-bs [22:35] *** exoire has left [22:41] *** lennier2 has joined #archiveteam-bs [22:44] *** lennier1 has quit IRC (Ping timeout: 260 seconds) [22:45] *** lennier2 is now known as lennier1 [23:10] *** britmob_ has quit IRC (Read error: Connection reset by peer) [23:12] *** britmob has joined #archiveteam-bs [23:12] *** Arcorann has joined #archiveteam-bs [23:49] *** Ravenloft has joined #archiveteam-bs [23:58] *** godane has quit IRC (Ping timeout: 265 seconds)