#archiveteam-bs 2019-03-23,Sat


Time Nickname Message
00:17 🔗 Dj-Wawa has joined #archiveteam-bs
00:26 🔗 Exairnous has joined #archiveteam-bs
00:40 🔗 wp494 has quit IRC (Ping timeout: 364 seconds)
00:44 🔗 wp494 has joined #archiveteam-bs
00:50 🔗 godane Cat Say No! : https://www.youtube.com/watch?v=cMESRatAG04
00:59 🔗 killsushi has quit IRC (Quit: Leaving)
01:00 🔗 Stilett0 has joined #archiveteam-bs
01:03 🔗 Stiletto has quit IRC (Ping timeout: 492 seconds)
01:22 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
01:23 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
01:32 🔗 Odd0002 has joined #archiveteam-bs
01:43 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
01:43 🔗 Odd0002 has joined #archiveteam-bs
04:11 🔗 MrRadar chfoo: I have submitted pull request #113 against seesaw-kit to fix the "rsync missing folder" issue
04:11 🔗 MrRadar That seems to be plaguing the Google Plus grab in particular
04:13 🔗 * MrRadar AFK
04:21 🔗 chfoo nice, i'll take a look at it soon
04:24 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
04:27 🔗 qw3rty117 has joined #archiveteam-bs
04:33 🔗 qw3rty116 has quit IRC (Read error: Operation timed out)
04:37 🔗 SimpBrain has quit IRC (Remote host closed the connection)
04:38 🔗 SimpBrain has joined #archiveteam-bs
04:44 🔗 odemgi has joined #archiveteam-bs
04:46 🔗 odemgi_ has quit IRC (Ping timeout: 252 seconds)
04:52 🔗 odemg has quit IRC (Ping timeout: 615 seconds)
04:53 🔗 ndiddy has quit IRC ()
04:59 🔗 odemg has joined #archiveteam-bs
05:28 🔗 dhyan_nat has joined #archiveteam-bs
05:37 🔗 Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
06:05 🔗 S1mpbrain has joined #archiveteam-bs
06:06 🔗 SimpBrain has quit IRC (Remote host closed the connection)
06:07 🔗 d5f4a3622 has joined #archiveteam-bs
06:43 🔗 MrRadar_ has joined #archiveteam-bs
06:45 🔗 MrRadar has quit IRC (Read error: Operation timed out)
07:06 🔗 Mateon1 has joined #archiveteam-bs
07:07 🔗 Despatche has quit IRC (Read error: Operation timed out)
07:57 🔗 Odd0002_ has joined #archiveteam-bs
07:59 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
07:59 🔗 Odd0002_ is now known as Odd0002
08:05 🔗 Exairnous has quit IRC (Ping timeout: 265 seconds)
08:22 🔗 schbirid lol oh man the paradisebay comments
08:22 🔗 schbirid > So basically all the money we've spent on this game is gone lmao wow
08:23 🔗 schbirid yeah, you spent money on digital accessories
08:23 🔗 schbirid *in a third-party's world
08:24 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
08:54 🔗 S1mpbrain has quit IRC (Remote host closed the connection)
08:54 🔗 SimpBrain has joined #archiveteam-bs
09:05 🔗 eientei95 schbirid: please
09:06 🔗 eientei95 They spent money on flipping bits
09:43 🔗 wp494 has quit IRC (Ping timeout: 364 seconds)
09:45 🔗 wp494 has joined #archiveteam-bs
09:45 🔗 halt has quit IRC (hub.efnet.us efnet.deic.eu)
10:04 🔗 kode54 has quit IRC (Quit: ZNC 1.7.2 - https://znc.in)
10:10 🔗 kode54 has joined #archiveteam-bs
10:43 🔗 Dj-Wawa has joined #archiveteam-bs
11:16 🔗 dhyan_nat has joined #archiveteam-bs
11:33 🔗 VerifiedJ has quit IRC (Read error: Connection reset by peer)
11:33 🔗 VerifiedJ has joined #archiveteam-bs
12:02 🔗 MrRadar_ is now known as MrRadar
12:04 🔗 BlueMax has quit IRC (Quit: Leaving)
12:06 🔗 Stilett0 has quit IRC ()
12:28 🔗 SimpBrain has quit IRC (Read error: Operation timed out)
12:35 🔗 SimpBrain has joined #archiveteam-bs
12:37 🔗 MrRadar2 sets mode: +o MrRadar
12:43 🔗 netsound has quit IRC (Leaving)
12:58 🔗 Hani has quit IRC (Read error: Operation timed out)
13:03 🔗 Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
13:04 🔗 Hani has joined #archiveteam-bs
13:21 🔗 S1mpbrain has joined #archiveteam-bs
13:21 🔗 SimpBrain has quit IRC (Remote host closed the connection)
13:28 🔗 JAA So I wrote a little tool to search Reddit using the Pushshift API and extract URLs from the results. It found about 5300 mixtape.moe links. (Still needs some manual cleanup because parsing Markdown is hard.)
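A minimal sketch of the URL-extraction step described above, assuming the search results are already available as plain comment text. This is not JAA's tool: the regular expression and the function name `extract_mixtape_urls` are illustrative assumptions. Brackets and parentheses are excluded from the match so that Markdown links of the form `[label](url)` do not leak into the URL.

```python
import re

# Assumed pattern: any host ending in mixtape.moe, followed by a path.
# Brackets and parentheses are excluded so Markdown link syntax is not captured.
MIXTAPE_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*mixtape\.moe/[^\s<>\"'()\[\]]+",
    re.IGNORECASE,
)


def extract_mixtape_urls(text):
    """Return the sorted, de-duplicated mixtape.moe URLs found in text."""
    urls = set()
    for match in MIXTAPE_URL.finditer(text):
        # Drop sentence punctuation that often trails a pasted link.
        urls.add(match.group(0).rstrip(".,;:!?*"))
    return sorted(urls)
```

Results from such a pattern would still need the manual cleanup mentioned above, since escaped characters and truncated links cannot be repaired by a regex alone.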
13:32 🔗 ats has quit IRC (Ping timeout: 252 seconds)
13:35 🔗 icedice JAA: Well done! Can you do the same with archived.moe? There seems to be a lot of mixtape.moe links there as well.
13:35 🔗 JAA Yep, that's the plan. Should be easier there actually.
13:38 🔗 icedice Nice
13:38 🔗 icedice Are we going to do search engine crawls as well?
13:46 🔗 lindalap has joined #archiveteam-bs
13:47 🔗 JAA I'm scraping Bing.
13:48 🔗 lindalap Mixtape.moe stats: "14,594 GBs of files", "5,846,150 uploaded files", "68,350 files uploaded this month (since the 1st)", etc.
13:48 🔗 lindalap shutting down in a week
13:49 🔗 lindalap > Mixtape serves over 450 Terabytes of files to over 7,300,000 unique visitors per month. 65% of our traffic is webm video files, 12% gif, 10% are other images (jpg, png).
13:49 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
13:50 🔗 icedice Ok, nice
13:50 🔗 JAA lindalap: Yep, working on it already.
13:50 🔗 icedice Searx.me could be usable if you get a custom URL working since it covers all other search engines when configured properly
13:50 🔗 JAA It'll be impossible to get everything because there's no index and bruteforce is infeasible.
13:51 🔗 JAA icedice: I won't stop you from doing that. :-)
13:51 🔗 lindalap I know Drybones and could ask for a file list
13:53 🔗 icedice Is it possible to grab the mixtape.moe links on Twitter as well?
13:53 🔗 JAA Oh. Well, if they're willing to do that, would be nice.
13:53 🔗 icedice Jaa: I can look into it after I wake up
13:54 🔗 icedice 450 TB x $2000 = $900 000
13:55 🔗 MrRadar2 I think 450 TB was the monthly bandwidth use. lindalap said they currently have 14.5 TB of files
13:55 🔗 icedice Ah, that's much more manageable
13:56 🔗 lindalap Yeah, also it's a Pomf.se clone and we've done that before
13:56 🔗 icedice I didn't sleep last night, so my brain is mush now
13:56 🔗 lindalap Would be a Warrior job, but with practically the same code if possible
13:56 🔗 icedice So I'm going to take a nap now
13:57 🔗 lindalap Drybones @ Rizon IRC, if you need to contact the admin of Mixtape.moe btw. I already sent an email and IRC query.
13:58 🔗 lindalap #pomfret on Rizon IRC, but I'm not there currently (looks like they enabled identification requirement)
13:58 🔗 lindalap I asked lesderid for an invite to that channel
13:58 🔗 lindalap https://twitter.com/drybones_5 on Twitter
13:58 🔗 lindalap (he tweets too much)
14:00 🔗 swebb has quit IRC (Read error: Operation timed out)
14:01 🔗 swebb has joined #archiveteam-bs
14:01 🔗 lindalap https://drybones.me/ btw
14:03 🔗 lindalap > We will not publicly share our user data and we will not sell the domain, so don't bother asking.
14:03 🔗 lindalap Doesn't seem likely, then
14:03 🔗 lindalap despite the files technically being publicly available (to those knowing the URL, which is semi-bruteforceable)
14:13 🔗 lindalap I was going to mention about a 3.1 TB archive of probably the biggest collection of console games, but looks like there was already a grab on 2018-08-13. :)
14:14 🔗 lindalap s/biggest/extensive/
14:17 🔗 JAA Not bruteforceable in this limited amount of time probably.
14:17 🔗 lindalap Yeah, it's not.
14:17 🔗 JAA There are several servers/subdomains, [a-z]{6} codes, and you need to know the file extension.
14:17 🔗 lindalap I proposed to Drybones having a copy at the IA with a file "locked" status as a compromise, we'll see...
14:17 🔗 JAA Which is about an order of magnitude too much to bruteforce.
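The size of the search space can be sketched with some quick arithmetic. Only the `[a-z]{6}` code format comes from the discussion; the server count, extension count, and request rate used in the usage note are hypothetical placeholders, not measured values.

```python
# Number of possible [a-z]{6} codes.
CODES = 26 ** 6  # 308,915,776


def requests_needed(servers, extensions):
    """Requests required to try every code on every server with every extension."""
    return CODES * servers * extensions


def days_needed(total_requests, requests_per_second):
    """Wall-clock days at a sustained request rate."""
    return total_requests / requests_per_second / 86400
```

For example, `days_needed(requests_needed(3, 5), 1000)` uses assumed inputs of 3 servers, 5 extensions, and 1000 requests per second, and shows how quickly the total grows past a one-week deadline.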
14:19 🔗 lindalap "I was going to mention about a 3.1 TB archive" They even mentioned Archive Team as a reason for operating. :)
14:19 🔗 lindalap Currently 20 TB in total
14:19 🔗 S1mpbrain has quit IRC (Remote host closed the connection)
14:21 🔗 lindalap Oh, actually all three recursive attempts have been aborted at some point.
14:21 🔗 lindalap JAA knows about the site, though I can query to remind what it is (don't want to publicly mention it here)
14:23 🔗 icedice Is it possible to crawl Twitter for Mixtape.moe links as well?
14:26 🔗 JAA archived.moe scrape running now.
14:27 🔗 icedice Nice
14:28 🔗 lindalap I'd guess reddit is a big one for Mixtape.moe links too.
14:28 🔗 lindalap Particularly anime subreddits.
14:28 🔗 icedice That was the first one we did
14:28 🔗 lindalap Nice.
14:28 🔗 icedice Or JAA handled it, to be specific
14:29 🔗 icedice Why is Bing more scrape friendly than other search engines btw?
14:30 🔗 VerifiedJ has quit IRC (Read error: Connection reset by peer)
14:31 🔗 VerifiedJ has joined #archiveteam-bs
14:31 🔗 JAA Yup, 5320 links from Reddit.
14:31 🔗 JAA Bing scrape's done and found 223 URLs.
14:32 🔗 MrRadar2 I'm searching my last scrape of Something Awful (only circa 2016) and I'll probably have a few hundred more to contribute as well when that finishes
14:32 🔗 JAA Because Bing wouldn't get any users if it weren't for scrapers? Idk...
14:32 🔗 icedice 8chan probably has some links
14:32 🔗 icedice I'd be a bit concerned with what some of those links might contain though...
14:33 🔗 icedice Hmm
14:33 🔗 icedice Discord probably has links
14:33 🔗 icedice No centralized way of searching it though
14:34 🔗 icedice So someone would have to join all of the major anime and manga Discord servers and put together a file list
14:34 🔗 lindalap Not sure what the point is, if Discord supports attaching files anyway.
14:34 🔗 monika discord only allows ~8mb for non-nitro users
14:34 🔗 monika mixtape allows ~100mb
14:35 🔗 lindalap If you could, email Drybones and ask for help with archiving for IA... more requests the better? Come up with a compromise, maybe.
14:38 🔗 ats has joined #archiveteam-bs
14:42 🔗 icedice Any file that has over a certain number of downloads maybe?
14:42 🔗 icedice I mean, if a file has 100 downloads I'd hardly consider it private
14:43 🔗 icedice * downloads or streams
14:43 🔗 icedice Not that the server logs would know the difference
14:43 🔗 icedice Oh
14:43 🔗 JAA archived.moe scrape done, 5644 URLs found.
14:43 🔗 icedice They might not store that info
14:44 🔗 icedice So then they would not know how many times a file has been accessed
14:44 🔗 lindalap 3 days logs, IIRC
14:49 🔗 icedice Well, that could still get us some additional files if Drybones thinks it's reasonable
14:51 🔗 lindalap Previously, Drybones hasn't been as cooperative as other Pomf clones from my experience. For example, refused many times to take my advice to register a copyright agent at the USCO to avoid DMCA liability. That's the kind of operator to deal with here...
14:53 🔗 lindalap In other things to discuss: Can someone !a http://kartat.kapsi.fi/ for me, please? Maps from Finnish government, mirrored from https://www.maanmittauslaitos.fi/kartat-ja-paikkatieto .
14:53 🔗 lindalap Lack of IA coverage for files.
14:54 🔗 lindalap Actually, most of it seems to be there at IA with few missing.
14:55 🔗 JAA That thing's huge.
14:55 🔗 JAA There are several datasets with hundreds of GB and a few of multiple TB.
14:56 🔗 lindalap I can get a selective list of what's missing from IA.
14:59 🔗 lindalap Getting that list now.
15:00 🔗 MrRadar2 I've got 712 unique mixtape.moe URLs from my Something Awful crawl
15:01 🔗 JAA There are few duplicates between my three scrapes, by the way. Fewer than 100 out of over 11k discovered.
15:02 🔗 MrRadar2 https://2by2.info/blip/mixtape.moe.sorted.txt
15:08 🔗 lindalap Re: kartat.kapsi.fi: Some indexes are archived, but the files are not. A bit moot of me trying to handpick datasets based on their last indexed time at IA...
15:11 🔗 lindalap Except for aerial photography ("ortoilmakuva"), it's not "too big"
15:13 🔗 lindalap Some links are duplicate with same target URI, btw.
15:14 🔗 JAA It might be best to grab that as files with rsync and upload it as items. Especially considering it's only a mirror, not the original site (according to what you said above) and probably not in immediate risk. But in any case, it's large enough to first check with IA probably.
15:15 🔗 JAA There's also "Laserkeilausaineisto" with 1.7 TB and "Korkeusmalli" with over 700 GB.
15:15 🔗 lindalap 1415 GB seems to cover the other 271 GB for Laserkeilausaineisto.
15:15 🔗 lindalap So it's only ~1.4 TB
15:16 🔗 lindalap Likewise, aerial photography is 3839 GB not 12.4 TB
15:16 🔗 lindalap at least so it seems
15:19 🔗 JAA Ah yeah, duplicate entries and entries for subdirectories, great...
15:21 🔗 HashbangI has quit IRC (Remote host closed the connection)
15:25 🔗 icedice I copy pasted Mixtape.moe links from a few Discord servers I'm in: https://pastebin.com/raw/fywM0VPC
15:25 🔗 icedice ^ What was the recommended command for archiving a paste?
15:25 🔗 icedice Something with < or >
15:25 🔗 JH88 has quit IRC (Read error: Connection reset by peer)
15:26 🔗 lindalap I believe !ao < $URL
15:26 🔗 HashbangI has joined #archiveteam-bs
15:27 🔗 lindalap which requires +o or +v on the channel, I believe.
15:28 🔗 lindalap I believe there's also undocumented !a < $URL, which requires +o
15:28 🔗 icedice Thanks
15:28 🔗 JH88 has joined #archiveteam-bs
15:30 🔗 icedice I wonder if https://github.com/tsudoko/pullcord could be modified so that it output lists of URLs on the Discord servers it's run on
15:30 🔗 icedice Because copy pasting one by one is a pain in the ass
15:37 🔗 JAA icedice: Please just give links to me instead of throwing them into ArchiveBot already. I'll combine everything, ensure there are no duplicates, and grab them.
15:39 🔗 JAA Well, unless someone tells me not to for whatever reason.
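Combining link lists from several contributors while dropping duplicates could look roughly like the sketch below. It is an assumption about one simple way to do it, not a description of JAA's actual process; `merge_url_lists` is a hypothetical name, and URLs are compared by exact string match after trimming whitespace.

```python
def merge_url_lists(*lists):
    """Merge several URL lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for urls in lists:
        for url in urls:
            url = url.strip()
            # Skip blank lines and anything already collected.
            if not url or url in seen:
                continue
            seen.add(url)
            merged.append(url)
    return merged
```

Exact matching treats `http://` and `https://` variants of the same file as different entries, so a normalization step would be needed if those should be collapsed.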
15:40 🔗 ats has quit IRC (Read error: Operation timed out)
15:47 🔗 icedice Sorry, I already ran the archive job and it already finished
15:47 🔗 icedice The links are here: https://pastebin.com/raw/fywM0VPC
15:56 🔗 icedice JAA: Do you think any of these other 4chan archivers might have earlier archived posts than archived.moe?
15:56 🔗 icedice https://www.archiveteam.org/index.php?title=4chan#List_of_Fuuka_Archivers_by_board
15:57 🔗 JAA Possibly, but I have no idea.
15:58 🔗 lindalap yuki.la isn't on that list, btw
15:59 🔗 lindalap 2008-02-02 – today
15:59 🔗 lindalap but it's not Fuuka-based, afaik
15:59 🔗 lindalap but yuki.la is also a bit delayed to display the most recent results (a day or two)
16:01 🔗 lindalap btw, yuki.la hasn't been AB'd at all
16:02 🔗 lindalap My friend also hosts a private BASC (?) based archive of 4chan boards.
16:02 🔗 lindalap It's not public.
16:02 🔗 lindalap large datasets
16:03 🔗 arkiver JAA: I guess contacting mixtape.moe yielded no results?
16:03 🔗 icedice https://desuarchive.org/_/search/text/my.mixtape.moe/
16:03 🔗 icedice "Returning only first 5000 of 22156 results found."
16:03 🔗 lindalap It's 11:03 AM or earlier in the US, give it time
16:03 🔗 icedice I'm going to say, that's a yes
16:03 🔗 asie make sure to filter by board
16:03 🔗 lindalap Actually Drybones tweeted one hour ago
16:04 🔗 ats has joined #archiveteam-bs
16:05 🔗 icedice https://fireden.net/ and https://warosu.org/ probably also have links
16:06 🔗 lindalap 2560 results on rbt.asia / archive.rebeccablacktech.com
16:06 🔗 icedice If we get those three we will have covered all active 4chan archivers that archive boards related to anime/manga/gaming/Japanese culture
16:07 🔗 icedice Ah, yeah Rebecca Black Tech seems to have some relevant boards as well
16:08 🔗 JAA Scraping desuarchive.org now, but it'll be incomplete since even the per-board searches produce more than 5k results at least for /a/.
16:09 🔗 icedice Yeah, https://yuki.la/ looks good as well
16:09 🔗 icedice Seems like they have most of /a/ back to 2008 archived
16:12 🔗 asie fwiw desuarchive also allows setting date limits, maybe that's a solution for /a/
16:13 🔗 icedice JAA: Isn't it possible to just go to page number X, archive that, go to the next one, and then the next one, and so on
16:13 🔗 asie working on scraping github repos for mixtape links
16:14 🔗 JAA asie: Oh, how does that work?
16:14 🔗 JAA icedice: That's what I'm doing, but it only returns 5000 results in total, i.e. there is no page 201 and up.
16:14 🔗 icedice Ah, ok
16:15 🔗 asie JAA: i'm not sure yet, but i did an initial search on github and there are 7000 (probably non-unique) mixtape URLs across git repos
16:15 🔗 asie i'm looking at the github search api now
16:15 🔗 JAA asie: I mean the date ranges on desuarchive.
16:15 🔗 asie oh
16:15 🔗 JAA And I assume that's a general feature of FoolFuuka?
16:15 🔗 asie "Date Start/Date End" on the menu, in URL they seem to be f.e. ".../start/2016-01-01/"
16:16 🔗 asie also i think so? i haven't analyzed it that deeply
16:16 🔗 asie sorry
16:16 🔗 asie and yes it does still display the first "5000" for a given date range so i presume you can use this to circumvent the limit
16:17 🔗 asie just set a date range ending at the earliest post scraped and keep going... someone'd have to try it
16:18 🔗 JAA Thanks, will try that in a bit.
16:23 🔗 icedice We could try getting some Mixtape links from 8chan by running "my.mixtape.moe site:8ch.net/v/", "my.mixtape.moe site:8ch.net/a/", "my.mixtape.moe site:8ch.net/co/", "my.mixtape.moe site:8ch.net/g/", and other Japanese or tech related boards in DuckDuckGo or Bing
16:24 🔗 icedice minus the " "
16:38 🔗 VerifiedJ has quit IRC (Read error: Connection reset by peer)
16:38 🔗 VerifiedJ has joined #archiveteam-bs
17:07 🔗 lindalap https://old.reddit.com/user/Farnsworth_The_Dog/ may get banned for reinstating r/WPD discussion, a sub which was banned and had two other sub moderators suspended for 3 days
17:07 🔗 lindalap Also "Only reason I'm still here is for r/WatchRedditDie and our site now, otherwise I would have nuked this account."
17:07 🔗 lindalap oops, forgot #shreddit
17:13 🔗 JAA asie: I did some digging in the FoolFuuka source, and it looks like the "start" and "end" search options have existed since version 2.0.1, which was released over 4 years ago. :-)
17:14 🔗 JAA I'll rewrite my scraper to use those.
17:14 🔗 JAA Well, the "end" one.
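The idea of stepping the "end" date backwards to get past the 5000-result cap can be sketched as below. This is not JAA's scraper: `fetch` is a hypothetical callable standing in for one capped search request, assumed to return `(post_id, post_date)` pairs dated on or before the given end date, newest first.

```python
def scrape_all(fetch, latest, limit=5000):
    """Collect every post by repeatedly moving the end date back.

    fetch(end) is assumed to return at most `limit` (post_id, post_date)
    pairs dated on or before `end`, ordered newest first.
    """
    seen = {}
    end = latest
    while True:
        batch = fetch(end)
        new = [post for post in batch if post[0] not in seen]
        for post_id, post_date in new:
            seen[post_id] = post_date
        # A short batch means nothing older remains; no new posts means
        # a single date window already exceeds the cap, so stop.
        if len(batch) < limit or not new:
            break
        # The oldest date may be only partly covered, so query it again.
        end = min(post_date for _, post_date in batch)
    return seen
```

With day-level granularity, any single day holding more results than the cap would still be truncated; the loop stops rather than spinning in that case.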
17:29 🔗 asie JAA: https://paste.asie.pl/raw/pFdt github mixtape.moe scrape
17:30 🔗 asie wait n
17:30 🔗 asie two corrupt URLs crept in; https://paste.asie.pl/raw/FNFt is better
17:34 🔗 lindalap I'm shallow-grabbing that saidit.net sub I attempted at #archivebot manually.
17:34 🔗 lindalap with over18 cookie
17:35 🔗 lindalap Downloaded: 89 files, 1.3M in 0.3s (3.82 MB/s)
17:42 🔗 JAA asie: Thanks!
17:46 🔗 JAA Rescraping archived.moe and desuarchive.org now.
17:48 🔗 JAA On another note, apparently some my.mixtape.moe URLs redirect to track# servers instead. I don't have any example handy right now though.
17:50 🔗 asie It's true
17:50 🔗 asie I *think* it's all of them, actually.
17:50 🔗 asie The IDs should be unique across all of Mixtape, but we can check that once we have a large set of URLs
17:51 🔗 JAA Nope, I just tested a few links and got a direct download.
17:52 🔗 asie Interesting.
18:01 🔗 ndiddy has joined #archiveteam-bs
18:02 🔗 peanut has joined #archiveteam-bs
18:04 🔗 killsushi has joined #archiveteam-bs
18:04 🔗 icedice It depends on what kind of file you're downloading iirc
18:04 🔗 icedice zip --> direct download
18:04 🔗 icedice image or video file --> track# URL
18:05 🔗 icedice I think it was like that, at least
18:05 🔗 icedice Yeah, I remember now
18:06 🔗 icedice For the zip file, I just needed to run the my.mixtape.moe URL through the Wayback Machine
18:06 🔗 JAA The ones I tested were WEBM and MP4 videos.
18:06 🔗 icedice While a TIFF file redirected to track# and then I had to archive the track# URL
18:14 🔗 icedice Has https://old.reddit.com/r/WatchRedditDie been archived?
18:14 🔗 icedice If not, it should be
18:25 🔗 icedice I just realized that we can get thousands of links from the major game, anime, and manga forums
18:26 🔗 icedice I'll put together a list when I get back
18:27 🔗 asie out of curiosity, should there be a channel/article for mixtape or is it too early?
18:27 🔗 asie err, not article, wiki page
18:32 🔗 netsound has joined #archiveteam-bs
18:34 🔗 asie JAA: https://paste.asie.pl/raw/voD5 very small assortment of mixtape URLs from the Polish Reddit-like "wykop.pl"; they have no real search facility for the relevant areas so i had to improvise
18:43 🔗 wp494 has quit IRC (Read error: Operation timed out)
18:43 🔗 wp494 has joined #archiveteam-bs
18:44 🔗 asie also, a suggestion to !a http://sbnc.khobbits.co.uk/log/logs/ - it's a comprehensive archive of 2011-2017 IRC logs of the most important channels of the Minecraft modding scene; the bot has been down for over a year so the site might go down at some point
18:45 🔗 asie and besides most recently everyone moved to Discord anyhow...
18:45 🔗 qwebirc96 has joined #archiveteam-bs
18:46 🔗 asie (not just the modding scene; #minecraft itself and key channels for say vanilla server administrators are present too)
19:08 🔗 qwebirc96 has quit IRC (Ping timeout: 260 seconds)
19:22 🔗 dhyan_nat has joined #archiveteam-bs
19:45 🔗 t3 has quit IRC (Quit: Connection closed for inactivity)
19:52 🔗 peanut has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
19:53 🔗 Exairnous has joined #archiveteam-bs
19:56 🔗 Despatche has joined #archiveteam-bs
20:00 🔗 VerifiedJ has quit IRC (Read error: Connection reset by peer)
20:00 🔗 VerifiedJ has joined #archiveteam-bs
20:31 🔗 t3 has joined #archiveteam-bs
20:42 🔗 JAA asie: Thanks. And I've thrown the Minecraft logs into ArchiveBot.
21:01 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
21:02 🔗 Despatche has joined #archiveteam-bs
21:12 🔗 delirein has joined #archiveteam-bs
21:13 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
22:18 🔗 JAA archived.moe and desuarchive.org reprocessed, 21711 URLs from there.
22:21 🔗 BlueMax has joined #archiveteam-bs
22:38 🔗 ndiddy has quit IRC ()
22:41 🔗 Soni has joined #archiveteam-bs
22:45 🔗 a_spook_ has joined #archiveteam-bs
22:54 🔗 a_spook_ JAA: might be silly/redundant/not a lot but I found a searchable thing to add to the mixtape.moe pile? http://index.commoncrawl.org/CC-MAIN-2019-09-index?url=*.mixtape.moe&output=json http://index.commoncrawl.org/
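A rough sketch of how the Common Crawl index query linked above might be consumed. The endpoint and the `url` and `output` parameters are taken from that link; the assumption that the response is one JSON object per line with a `url` field, and the helper names, are illustrative and should be checked against the index server's documentation.

```python
import json
from urllib.parse import urlencode

# Index endpoint as given in the link above.
INDEX = "http://index.commoncrawl.org/CC-MAIN-2019-09-index"


def build_query(pattern="*.mixtape.moe"):
    """Build the index query URL for a wildcard host pattern."""
    return INDEX + "?" + urlencode({"url": pattern, "output": "json"})


def parse_index_lines(body):
    """Extract unique URLs from a line-delimited JSON response body."""
    urls = set()
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed field name for the captured URL.
        urls.add(record["url"])
    return sorted(urls)
```

The body passed to `parse_index_lines` would come from fetching `build_query()` with any HTTP client; that step is left out so the sketch runs without network access.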
23:17 🔗 HashbangI has quit IRC (net_error)
23:18 🔗 HashbangI has joined #archiveteam-bs
23:27 🔗 BlueMax has quit IRC (Quit: Leaving)
23:28 🔗 JAA Drybones, the owner of Mixtape, posted on /r/DataHoarder by the way, and someone asked them if they're willing to archive everything onto IA/WBM: https://old.reddit.com/r/DataHoarder/comments/b4b7km/mixtapemoe_shutting_down/
23:30 🔗 JAA s/posted/commented/
