#archiveteam-bs 2020-07-23,Thu

Time Nickname Message
00:04 πŸ”— HP_Archiv has quit IRC (Client Quit)
00:10 πŸ”— BlueMax has joined #archiveteam-bs
00:26 πŸ”— step has joined #archiveteam-bs
01:51 πŸ”— Smiley has joined #archiveteam-bs
01:53 πŸ”— BlueMaxim has joined #archiveteam-bs
01:53 πŸ”— BlueMax has quit IRC (Read error: Connection reset by peer)
01:54 πŸ”— SmileyG has quit IRC (Read error: Operation timed out)
02:00 πŸ”— ndiddy_ has joined #archiveteam-bs
02:00 πŸ”— ndiddy_ has left
02:01 πŸ”— Arcorann has quit IRC (Read error: Connection reset by peer)
02:03 πŸ”— Arcorann has joined #archiveteam-bs
03:33 πŸ”— xit has joined #archiveteam-bs
03:45 πŸ”— qw3rty_ has joined #archiveteam-bs
03:53 πŸ”— qw3rty__ has quit IRC (Read error: Operation timed out)
04:05 πŸ”— godane has quit IRC (Ping timeout: 260 seconds)
04:20 πŸ”— godane has joined #archiveteam-bs
04:28 πŸ”— BlueMaxim has quit IRC (Read error: Connection reset by peer)
04:33 πŸ”— godane !ao https://www.theblaze.com/news/kim-kardashian-kanye-west-mental-health
04:44 πŸ”— nepeat has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
04:45 πŸ”— nepeat has joined #archiveteam-bs
05:03 πŸ”— fuzzy802 has joined #archiveteam-bs
05:05 πŸ”— fuzzy8021 has quit IRC (Read error: Operation timed out)
05:05 πŸ”— jodizzle Wrong channel
05:13 πŸ”— fuzzy802 is now known as fuzzy8021
05:17 πŸ”— fuzzy8021 has quit IRC (Read error: Connection reset by peer)
05:17 πŸ”— fuzzy8021 has joined #archiveteam-bs
05:19 πŸ”— nepeat has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
05:19 πŸ”— nepeat has joined #archiveteam-bs
05:26 πŸ”— BlueMax has joined #archiveteam-bs
06:26 πŸ”— antomati_ has joined #archiveteam-bs
06:27 πŸ”— Wingy has quit IRC (Read error: Operation timed out)
06:27 πŸ”— asdf0101 has quit IRC (Read error: Operation timed out)
06:28 πŸ”— SynMonger has quit IRC (Read error: Operation timed out)
06:28 πŸ”— Jake has quit IRC (Read error: Operation timed out)
06:28 πŸ”— qw3rty has joined #archiveteam-bs
06:28 πŸ”— SynMonger has joined #archiveteam-bs
06:34 πŸ”— Terbium has quit IRC (Read error: Operation timed out)
06:34 πŸ”— dxrt_ has quit IRC (Read error: Operation timed out)
06:34 πŸ”— antomatic has quit IRC (Read error: Operation timed out)
06:34 πŸ”— Mayeau has quit IRC (Read error: Operation timed out)
06:34 πŸ”— Terbium has joined #archiveteam-bs
06:34 πŸ”— sembiance has quit IRC (Read error: Operation timed out)
06:34 πŸ”— qw3rty_ has quit IRC (Read error: Operation timed out)
06:34 πŸ”— Mayeau has joined #archiveteam-bs
06:35 πŸ”— t3 has quit IRC (Quit: Connection closed for inactivity)
06:35 πŸ”— systwi has quit IRC (Read error: Operation timed out)
06:36 πŸ”— sembiance has joined #archiveteam-bs
06:36 πŸ”— Jake7 has joined #archiveteam-bs
06:36 πŸ”— asdf0101 has joined #archiveteam-bs
06:36 πŸ”— systwi has joined #archiveteam-bs
06:37 πŸ”— Jake8 has joined #archiveteam-bs
06:37 πŸ”— paul2520 has quit IRC (Ping timeout: 622 seconds)
06:38 πŸ”— dxrt_ has joined #archiveteam-bs
06:38 πŸ”— dxrt sets mode: +o dxrt_
06:38 πŸ”— Jake8 has quit IRC (Client Quit)
06:39 πŸ”— Jake1 has joined #archiveteam-bs
06:40 πŸ”— paul2520 has joined #archiveteam-bs
06:40 πŸ”— Jake1 has quit IRC (Client Quit)
06:41 πŸ”— Jake2 has joined #archiveteam-bs
06:42 πŸ”— Jake2 has quit IRC (Client Quit)
06:43 πŸ”— Jake1 has joined #archiveteam-bs
06:44 πŸ”— Jake1 has quit IRC (Client Quit)
06:45 πŸ”— Jake4 has joined #archiveteam-bs
06:46 πŸ”— Jake4 has quit IRC (Client Quit)
06:47 πŸ”— Jake7 has quit IRC (Ping timeout: 622 seconds)
06:47 πŸ”— Jake9 has joined #archiveteam-bs
06:49 πŸ”— Jake3 has joined #archiveteam-bs
06:50 πŸ”— mtntmnky has quit IRC (Read error: Operation timed out)
06:50 πŸ”— robogoat has quit IRC (Write error: Broken pipe)
06:50 πŸ”— prq has quit IRC (Write error: Broken pipe)
06:51 πŸ”— robogoat has joined #archiveteam-bs
06:51 πŸ”— jrwr has quit IRC (Ping timeout: 260 seconds)
06:51 πŸ”— nyany has quit IRC (Read error: Operation timed out)
06:51 πŸ”— Raccoon` has joined #archiveteam-bs
06:52 πŸ”— phirephl- has quit IRC (Read error: Operation timed out)
06:52 πŸ”— Pixi` has joined #archiveteam-bs
06:52 πŸ”— twigfoot has quit IRC (Read error: Operation timed out)
06:52 πŸ”— DigiDigi has quit IRC (Read error: Operation timed out)
06:53 πŸ”— Kaz has quit IRC (Ping timeout: 260 seconds)
06:53 πŸ”— revi has quit IRC (Ping timeout: 260 seconds)
06:53 πŸ”— Igloo has quit IRC (Read error: Operation timed out)
06:53 πŸ”— Darkstar has quit IRC (Read error: Operation timed out)
06:53 πŸ”— svchfoo1 has quit IRC (Read error: Operation timed out)
06:54 πŸ”— dxrt has quit IRC (Read error: Operation timed out)
06:54 πŸ”— dxrt has joined #archiveteam-bs
06:54 πŸ”— Iglooop1 has quit IRC (Read error: Operation timed out)
06:54 πŸ”— Raccoon has quit IRC (Ping timeout: 376 seconds)
06:54 πŸ”— chfoo has quit IRC (Read error: Operation timed out)
06:55 πŸ”— svchfoo3 sets mode: +o dxrt
06:55 πŸ”— Pixi has quit IRC (Read error: Operation timed out)
06:55 πŸ”— chfoo has joined #archiveteam-bs
06:56 πŸ”— svchfoo3 sets mode: +o chfoo
06:56 πŸ”— Larsenv_ has joined #archiveteam-bs
06:56 πŸ”— phirephly has joined #archiveteam-bs
06:56 πŸ”— lennier2 has joined #archiveteam-bs
06:57 πŸ”— Kaz has joined #archiveteam-bs
06:57 πŸ”— Meli has quit IRC (Ping timeout: 272 seconds)
06:57 πŸ”— Raccoon` has quit IRC (Remote host closed the connection)
06:58 πŸ”— kisspunch has quit IRC (Ping timeout: 272 seconds)
06:58 πŸ”— Gfy has quit IRC (Ping timeout: 272 seconds)
06:58 πŸ”— LordNigh2 has joined #archiveteam-bs
06:58 πŸ”— omglolba- has joined #archiveteam-bs
06:58 πŸ”— Larsenv has quit IRC (Read error: Operation timed out)
06:59 πŸ”— ndiddy has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— zerkalo has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— omglolbah has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— Laverne has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— sHATNER has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— Jake9 has quit IRC (Ping timeout: 622 seconds)
06:59 πŸ”— lennier1 has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— LordNigh2 is now known as Lord_Nigh
06:59 πŸ”— Maylay has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— brayden has quit IRC (Ping timeout: 272 seconds)
06:59 πŸ”— lennier2 is now known as lennier1
07:01 πŸ”— Gfy has joined #archiveteam-bs
07:02 πŸ”— Maylay has joined #archiveteam-bs
07:04 πŸ”— kisspunch has joined #archiveteam-bs
07:04 πŸ”— Jake4 has joined #archiveteam-bs
07:07 πŸ”— Jake3 has quit IRC (Remote host closed the connection)
07:07 πŸ”— Jake1 has joined #archiveteam-bs
07:09 πŸ”— Jake1 has quit IRC (Client Quit)
07:09 πŸ”— Arcorann has quit IRC (Read error: Connection reset by peer)
07:09 πŸ”— Jake2 has joined #archiveteam-bs
07:10 πŸ”— twigfoot has joined #archiveteam-bs
07:10 πŸ”— Arcorann has joined #archiveteam-bs
07:11 πŸ”— Jake2 has quit IRC (Client Quit)
07:11 πŸ”— Jake67 has joined #archiveteam-bs
07:11 πŸ”— DigiDigi has joined #archiveteam-bs
07:11 πŸ”— Darkstar has joined #archiveteam-bs
07:11 πŸ”— nyany has joined #archiveteam-bs
07:12 πŸ”— svchfoo3 sets mode: +o nyany
07:13 πŸ”— Jake67 has quit IRC (Client Quit)
07:13 πŸ”— prq has joined #archiveteam-bs
07:13 πŸ”— Jake2 has joined #archiveteam-bs
07:15 πŸ”— Igloo has joined #archiveteam-bs
07:15 πŸ”— Jake2 has quit IRC (Client Quit)
07:16 πŸ”— jrwr has joined #archiveteam-bs
07:16 πŸ”— svchfoo3 sets mode: +o jrwr
07:17 πŸ”— Jake4 has quit IRC (Ping timeout: 622 seconds)
07:17 πŸ”— Jake4 has joined #archiveteam-bs
07:18 πŸ”— mtntmnky has joined #archiveteam-bs
07:18 πŸ”— revi has joined #archiveteam-bs
07:26 πŸ”— Brayconn has joined #archiveteam-bs
07:27 πŸ”— jodizzle Brayconn: Could you provide a link to the site?
07:27 πŸ”— Brayconn Yeah, let me get something else here quick...
07:28 πŸ”— z8f6Px98C has joined #archiveteam-bs
07:28 πŸ”— Brayconn 👋
07:29 πŸ”— Brayconn Oh nice, I can send unicode stuff in here
07:29 πŸ”— z8f6Px98C sup!
07:29 πŸ”— Brayconn Ok, so site link is https://theartistunion.com/
07:30 πŸ”— Jake4 Seems like a bunch of copyrighted music?
07:30 πŸ”— z8f6Px98C It's music hidden behind a download gate. The artist offers a free download (mostly .wav and 320kbps) of a track, but it's hidden behind a "like, repost, follow" button.
07:31 πŸ”— z8f6Px98C I wrote something that needs a dummy SoundCloud account and can pull the file hidden behind those buttons, don't know if Brayconn already sent that one.
07:32 πŸ”— z8f6Px98C Currently the biggest issue is indexing everything.
07:32 πŸ”— Brayconn (I haven't sent any script links yet)
07:32 πŸ”— jodizzle Hm okay, the site is all Javascript, which is a problem for easy recursive archival. We can get certain pages, but probably not a recursive crawl on the whole thing.
07:32 πŸ”— jodizzle So we'll have to be selective about important pages. But it seems like the focus is mainly on the songs and not the webpages?
07:32 πŸ”— z8f6Px98C It uses algolia as a backend, that makes it easier. I've polled 720 000 search results as we speak
07:33 πŸ”— z8f6Px98C you can only make each search return 1000 results max, the upper limit cannot be changed since it requires the admin api key and we only have access to the search one
07:33 πŸ”— z8f6Px98C hold on i'm gonna upload those scripts to wetransfer
07:34 πŸ”— z8f6Px98C my current approach would have been to split the results into chunks of 1k results and then spread that across multiple computers
07:34 πŸ”— z8f6Px98C https://we.tl/t-Ogt8pfwBNN
07:35 πŸ”— z8f6Px98C search.py generates strings with a specified length and goes from 'aaa' to '999' for example. it outputs dicts, which can then be assembled into a single big list.
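A minimal sketch of the query enumeration described above, not the actual search.py. It assumes the alphabet is the lowercase letters followed by the digits, which matches the 'aaa' to '999' range and the "36 times" estimate given later in the log; the name `search_terms` is hypothetical.

```python
import itertools
import string

# Assumed alphabet: lowercase letters, then digits, so that a length-3 run
# starts at 'aaa' and ends at '999' as described in the log.
ALPHABET = string.ascii_lowercase + string.digits


def search_terms(length):
    """Yield every string of the given length over ALPHABET, in order."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)


terms = list(search_terms(3))
print(terms[0], terms[-1], len(terms))  # aaa 999 46656
```

Each extra character multiplies the number of queries by the alphabet size (36 under this assumption).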
07:37 πŸ”— z8f6Px98C indexing.py removes duplicates and can sort everything into chunks of 1000. it can also sort the results by the number of downloads, since i feel like the ones with many downloads should probably have priority over those with 0 downloads
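A rough sketch of the three steps described above (deduplicate, sort by downloads, split into chunks of 1000), not the actual indexing.py. The field names `id` and `downloads` and the sample records are hypothetical; the real search results may use different keys.

```python
def dedupe(results, key="id"):
    """Keep the first occurrence of each record, identified by `key`."""
    seen = set()
    unique = []
    for record in results:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def by_downloads(results, field="downloads"):
    """Order records so the most-downloaded ones come first."""
    return sorted(results, key=lambda r: r.get(field, 0), reverse=True)


def chunks(items, size=1000):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Made-up records, for illustration only.
sample = [
    {"id": 1, "downloads": 5},
    {"id": 2, "downloads": 0},
    {"id": 1, "downloads": 5},
    {"id": 3, "downloads": 12},
]
ordered = by_downloads(dedupe(sample))
print([r["id"] for r in ordered])  # [3, 1, 2]
```

Each chunk could then be handed to a different machine, as suggested earlier in the conversation.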
07:37 πŸ”— z8f6Px98C so far i've only gathered all the search results for strings with up to a length of 3, which is already 2.5gb worth of json.
07:39 πŸ”— jodizzle And Brayconn mentioned that there are expected to be 713945-ish songs in total on the site?
07:39 πŸ”— z8f6Px98C 730k
07:40 πŸ”— PovAddict has quit IRC (Quit: Konversation terminated!)
07:40 πŸ”— Brayconn Actually, I said that having only seen the 3 letter index
07:40 πŸ”— Brayconn Do we know if there'd be more searching 4 letters + ?
07:40 πŸ”— sHATNER has joined #archiveteam-bs
07:40 πŸ”— z8f6Px98C total search results were 2.2 million, but with all the duplicates removed 730k are left
07:41 πŸ”— z8f6Px98C there are probably more, but searching for 4 digit combinations will take 36 times the amount of time
07:41 πŸ”— brayden has joined #archiveteam-bs
07:41 πŸ”— Laverne has joined #archiveteam-bs
07:42 πŸ”— z8f6Px98C i can upload the result files if anyone wants me to
07:42 πŸ”— jodizzle And the songs come in the form of .mp3s or similar, right? (I'm just looking at my network tab.)
07:43 πŸ”— z8f6Px98C mp3, wav, it's up to the artist to decide. the api, when looking at the network tab, exposes an mp3 file, but that's only the preview
07:44 πŸ”— z8f6Px98C as an example, take this url: https://theartistunion.com/api/v3/users/ruvlo/tracks.json
07:45 πŸ”— Meli has joined #archiveteam-bs
07:45 πŸ”— z8f6Px98C the audio_source element inside each list item exposes the mp3:"https://d2tml28x3t0b85.cloudfront.net/tracks/stream_files/001/240/053/original/BLVK%20SHEEP%20&%20NVADRZ%20-%20STATIC%20%28RUVLO%20REMIX%29.mp3?1594765017"
07:46 πŸ”— z8f6Px98C even if it says mp3 the downloadable file is a wav
07:46 πŸ”— z8f6Px98C "url":"https://d2tml28x3t0b85.cloudfront.net/tracks/original_files/001/127/461/original/RUVLO%20-%20DAMNED%20%28ELEMENT%20115%29.wav?1571166021"
07:48 πŸ”— jodizzle Okay, cool. I was wondering if it was some streaming/playlist/partitioned setup (with M3U's or something). But it looks like it's not?
07:48 πŸ”— z8f6Px98C nope, just the raw files
07:49 πŸ”— z8f6Px98C by the way, i feel like i should mention this. theartistunion has a reputation for being incredibly unreliable. it will often return 504 gateway errors because the site went down temporarily, usually just for like a minute. this happens sporadically and that's been the case ever since the site launched
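One way a downloader might cope with the sporadic 504s mentioned above is to retry with a growing pause. This is a sketch under assumptions, not part of any script shared in the channel: the function name, the attempt count, and the delays are all hypothetical, and `opener` and `sleep` are parameters only so the logic can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=5, delay=5.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying only on HTTP 504 with exponential backoff.

    Any other HTTP error, or a 504 on the final attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=60) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 504 or attempt == attempts - 1:
                raise
            # Wait 5s, 10s, 20s, ... before trying again (assumed values).
            sleep(delay * 2 ** attempt)
    return None  # Only reached when attempts is 0.
```

Since the outages are said to last about a minute, the default delays here add up to a little over a minute before giving up.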
07:52 πŸ”— jodizzle Okay, good to know.
07:53 πŸ”— jodizzle So this seems like a Warrior project to me (https://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior).
07:53 πŸ”— jodizzle Probably both for the enumeration of the song links (trying different search combinations) and the downloading of the songs.
07:54 πŸ”— jodizzle arkiver: ^
07:55 πŸ”— z8f6Px98C yup, agreed.
07:57 πŸ”— jodizzle Thank you for your efforts. arkiver will probably take a while to respond.
07:58 πŸ”— z8f6Px98C i'm always happy to help :)
07:58 πŸ”— Brayconn Should one of us make a wiki page for the site?
07:59 πŸ”— jodizzle Sure, that would be helpful. I'll add it to the Deathwatch page.
07:59 πŸ”— z8f6Px98C in the meantime here are all the 2.2 million search results that can be organized using indexing.py
07:59 πŸ”— z8f6Px98C https://we.tl/t-uM4qbH4ibK
08:00 πŸ”— Brayconn send help I can't find the new page button
08:01 πŸ”— Brayconn oh ok wait I got to it in a roundabout way
08:02 πŸ”— Brayconn Also turns out it already has a page, and has supposedly been saved...?!
08:04 πŸ”— Meli has quit IRC (Quit: After 1w 1d 16h 32m 33s of wasteful lurking, 's brain 63gf4u1ted! X_x)
08:04 πŸ”— Jake4 Seems as though we already grabbed the site: Warrior tracker here? https://tracker.archiveteam.org/theartistunion/
08:04 πŸ”— jodizzle Ha, that's really funny
08:04 πŸ”— jodizzle Guess that's why you check the wiki first
08:05 πŸ”— Brayconn oof
08:05 πŸ”— Brayconn Well uh... I guess we can check everything's there for sure :P
08:05 πŸ”— jodizzle (I'm saying that to myself as well, I should've known better lol)
08:05 πŸ”— Jake4 Users we grabbed: https://github.com/ArchiveTeam/theartistunion-items
08:06 πŸ”— Jake4 As well as the code: https://github.com/ArchiveTeam/theartistunion-grab
08:07 πŸ”— Arcorann It looks like it was grabbed around August/September 2019. How much new music might there be to grab?
08:07 πŸ”— Arcorann Also, wiki link for reference --> https://www.archiveteam.org/index.php?title=The_Artist_Union
08:08 πŸ”— Jake4 Yeah, that would be the question. Should be pretty easy to find out and also shouldn't be too hard to just re-grab those users?
08:08 πŸ”— z8f6Px98C i'm guessing it'll be a couple hundred gigs at most.
08:09 πŸ”— z8f6Px98C most people have moved on to toneden by now
08:09 πŸ”— jodizzle Looks like there's an IRC channel set up for it: #theabandonoftheartists
08:09 πŸ”— jodizzle So I suggest we move the conversation there.
08:10 πŸ”— Brayconn Agreed
08:26 πŸ”— kiska We did this project already...
08:26 πŸ”— kiska #theabandonoftheartists
08:27 πŸ”— jodizzle Yep, figured that out a minute ago
08:27 πŸ”— jodizzle One step ahead, several steps behind
08:28 πŸ”— Meli has joined #archiveteam-bs
08:29 πŸ”— kiska Well I am not home...
08:31 πŸ”— Meli has quit IRC (Remote host closed the connection)
08:31 πŸ”— jodizzle Oh oops, I was referring to the people like myself who were talking about this without checking for a wiki page first.
08:31 πŸ”— jodizzle Not you lol.
08:39 πŸ”— Meli has joined #archiveteam-bs
09:09 πŸ”— Dj-Wawa has quit IRC (Ping timeout: 745 seconds)
09:09 πŸ”— Dj-Wawa has joined #archiveteam-bs
10:26 πŸ”— BlueMax has quit IRC (Read error: Connection reset by peer)
11:39 πŸ”— jshoard has joined #archiveteam-bs
12:06 πŸ”— jshoard_ has joined #archiveteam-bs
12:06 πŸ”— jshoard has quit IRC (Read error: Connection reset by peer)
13:20 πŸ”— lunik1 has quit IRC (Quit: :x)
13:21 πŸ”— lunik1 has joined #archiveteam-bs
14:20 πŸ”— Larsenv_ My IRC client has this icon today, I think this is an easter egg or something
14:20 πŸ”— Larsenv_ https://share.getcloudapp.com/9Zuj5qKm
14:20 πŸ”— Larsenv_ is now known as Larsenv
14:20 πŸ”— Larsenv oops my nick
14:21 πŸ”— Larsenv https://superuser.com/questions/786630/why-does-textual-have-a-party-theme-icon-for-no-obvious-reason
14:42 πŸ”— Laverne I did not realise Textual was open source
15:05 πŸ”— Larsenv Laverne: It is, but if you want the already compiled version and updates you can buy it from the Mac App Store
15:05 πŸ”— Larsenv interesting huh?
15:05 πŸ”— Larsenv I chose to buy it
15:05 πŸ”— Laverne I've purchased it a year or two ago. It's the best one for the Mac and not that expensive
15:06 πŸ”— Laverne I had to juggle ZNC modules around a bit to get the scroll backs working nicely but I haven't had to fiddle with it since then :-)
15:35 πŸ”— Arcorann has quit IRC (Read error: Connection reset by peer)
16:28 πŸ”— Raccoon has joined #archiveteam-bs
17:42 πŸ”— Brayconn has quit IRC (Ping timeout: 252 seconds)
18:00 πŸ”— lennier1 has quit IRC (Read error: Connection reset by peer)
18:00 πŸ”— lennier1 has joined #archiveteam-bs
18:07 πŸ”— PovAddict has joined #archiveteam-bs
18:27 πŸ”— systwi_ has joined #archiveteam-bs
18:32 πŸ”— systwi has quit IRC (Read error: Operation timed out)
18:33 πŸ”— JAA For the record, here are z8f6Px98C's scripts on a non-sucky host: https://transfer.notkiska.pw/16ddh4/indexing.py https://transfer.notkiska.pw/14nt6X/search.py https://transfer.notkiska.pw/30X9B/theartistunion.py
19:58 πŸ”— z8f6Px98C has quit IRC (Remote host closed the connection)
19:58 πŸ”— z8f6Px98C has joined #archiveteam-bs
20:01 πŸ”— fredgido has joined #archiveteam-bs
20:13 πŸ”— z8f6Px98C has quit IRC (Ping timeout: 622 seconds)
20:41 πŸ”— z8f6Px98C has joined #archiveteam-bs
20:48 πŸ”— z8f6Px98C has quit IRC (Read error: Operation timed out)
21:08 πŸ”— Nikchemny has joined #archiveteam-bs
21:08 πŸ”— Nikchemny It's 2:08 for me
21:09 πŸ”— Nikchemny And 208 is my favourite number
21:09 πŸ”— Nikchemny Hm
21:09 πŸ”— lennier2 has joined #archiveteam-bs
21:13 πŸ”— PovAddict has quit IRC (Quit: router restart)
21:15 πŸ”— lennier1 has quit IRC (Ping timeout: 496 seconds)
21:15 πŸ”— lennier2 is now known as lennier1
21:25 πŸ”— Nikchemny has quit IRC (Quit: Page closed)
21:29 πŸ”— JAA Doctissimo shifted their deletion of most Sexualité forums from tomorrow to 1 Sept: https://forum.doctissimo.fr/doctissimo/Addiction-sexuelle/nouvelles-importantes-suppression-sujet_10316_1.htm
21:36 πŸ”— jshoard_ has quit IRC (Leaving)
21:57 πŸ”— revi has quit IRC (Read error: Connection reset by peer)
21:57 πŸ”— revi has joined #archiveteam-bs
21:57 πŸ”— Kaz has quit IRC (Read error: Connection reset by peer)
21:59 πŸ”— Kaz has joined #archiveteam-bs
22:24 πŸ”— systwi_ is now known as systwi
22:34 πŸ”— exoire has joined #archiveteam-bs
22:35 πŸ”— exoire has left
22:41 πŸ”— lennier2 has joined #archiveteam-bs
22:44 πŸ”— lennier1 has quit IRC (Ping timeout: 260 seconds)
22:45 πŸ”— lennier2 is now known as lennier1
23:10 πŸ”— britmob_ has quit IRC (Read error: Connection reset by peer)
23:12 πŸ”— britmob has joined #archiveteam-bs
23:12 πŸ”— Arcorann has joined #archiveteam-bs
23:49 πŸ”— Ravenloft has joined #archiveteam-bs
23:58 πŸ”— godane has quit IRC (Ping timeout: 265 seconds)
