[00:01] so i think snscrape is working
[00:01] thanks for telling me about it, cause i thought there was no good way to do it
[00:03] :-)
[00:17] good news is the list was not that much more than what i had
[00:18] 2835 vs my 2397
[00:23] *** Meli has quit IRC (Quit: After 1d 8h 51m 19s of wasteful lurking, 's brain 63gf4u1ted! X_x)
[00:23] *** Meli has joined #archiveteam-bs
[00:42] *** BlueMax has joined #archiveteam-bs
[01:17] *** lunik1 has quit IRC (Ping timeout: 265 seconds)
[01:17] *** lunik1 has joined #archiveteam-bs
[02:05] *** HP_Archiv has joined #archiveteam-bs
[02:07] *** HP_Archiv has quit IRC (Client Quit)
[02:14] *** HP_Archiv has joined #archiveteam-bs
[02:49] *** DopefishJ has quit IRC (Remote host closed the connection)
[02:55] *** DFJustin has joined #archiveteam-bs
[03:14] *** HP_Archiv has quit IRC (Quit: Leaving)
[03:28] *** atbk has quit IRC (Remote host closed the connection)
[03:53] *** qw3rty_ has joined #archiveteam-bs
[04:00] *** qw3rty__ has quit IRC (Read error: Operation timed out)
[04:15] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[05:07] *** nicolas17 has quit IRC (Ping timeout: 745 seconds)
[06:56] *** mtntmnky has quit IRC (Remote host closed the connection)
[06:56] *** mtntmnky has joined #archiveteam-bs
[08:18] *** schbirid has joined #archiveteam-bs
[08:26] *** benjinsmi has quit IRC (Read error: Operation timed out)
[08:27] *** benjins has joined #archiveteam-bs
[08:59] *** BlueMax has quit IRC (Quit: Leaving)
[10:01] https://trixter.oldskool.org/2020/07/14/how-to-reasonably-archive-color-magazines-to-pdf/
[11:20] *** OrIdow6 has quit IRC (Quit: Quitting.)
[11:30] *** mtntmnky has quit IRC (Remote host closed the connection)
[11:30] *** mtntmnky has joined #archiveteam-bs
[11:43] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[11:53] *** OrIdow6 has joined #archiveteam-bs
[11:59] *** schbirid has quit IRC (Quit: Leaving)
[12:12] *** kiska has quit IRC (Remote host closed the connection)
[12:13] *** kiska has joined #archiveteam-bs
[12:38] *** schbirid has joined #archiveteam-bs
[14:32] *** pokemonpr has joined #archiveteam-bs
[15:00] *** pokemonpr has quit IRC (Ping timeout: 622 seconds)
[15:06] *** pokemonpr has joined #archiveteam-bs
[15:33] *** OrIdow6^2 has joined #archiveteam-bs
[15:34] *** OrIdow6 has quit IRC (Ping timeout: 265 seconds)
[15:35] *** OrIdow6^2 has quit IRC (Client Quit)
[15:35] *** schbirid has quit IRC (Quit: Leaving)
[15:38] *** OrIdow6 has joined #archiveteam-bs
[15:40] *** pokemonpr has quit IRC (Ping timeout: 265 seconds)
[15:51] I ran a discovery on the Dell downloads the other day (#effteepee): ~222k directories https://transfer.notkiska.pw/nIhCW/downloads.dell.com-directories and 16 TB of data in 431k files https://transfer.notkiska.pw/15CPUx/downloads.dell.com-files.gz ("filename (size)")
[15:51] This is probably missing a few things where downloads.dell.com returns a page instead of a directory listing, but it should be close to the actual number.
[15:55] Also, ftp.dell.com == ftp.ins.dell.com is the backend for downloads.dell.com. It's accessible through FTP, HTTP, and HTTPS. On HTTP(S), the homepage redirects to downloads.dell.com.
[16:03] (Method: I retrieved the list of directories in the root via FTP, then everything else via https://downloads.dell.com/ because the FTP was quite slow for me.)
[16:05] *** fuzzy8021 has quit IRC (Read error: Connection reset by peer)
[16:05] HCross is looking into grabbing a copy of the FTP server, I'll try to do the downloads.dell.com site (AB job crashed).
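Editor's sketch: the file list above is described as one "filename (size)" entry per line. A minimal way to count the entries and total the sizes might look like the following. It assumes the size is a plain integer number of bytes in trailing parentheses, which the log does not confirm; the function name `summarize_listing` is hypothetical.

```python
import re
from typing import Iterable, Tuple

# Assumption: each entry looks like "some/path/file.exe (12345)",
# with the size as an integer byte count in the final parentheses.
ENTRY_RE = re.compile(r"^(?P<name>.+) \((?P<size>\d+)\)$")


def summarize_listing(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of files, total size) for a 'filename (size)' listing.

    Lines that do not match the assumed format are skipped.
    """
    count = 0
    total = 0
    for line in lines:
        match = ENTRY_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a "filename (size)" entry
        count += 1
        total += int(match.group("size"))
    return count, total
```

For the gzip-compressed list, the lines could be read with `gzip.open(path, "rt")` and passed straight to `summarize_listing`.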
[16:06] *** fuzzy8021 has joined #archiveteam-bs
[16:06] *** nepeat_ has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
[16:06] *** nepeat has joined #archiveteam-bs
[17:10] *** godane1 has quit IRC (Read error: Connection reset by peer)
[17:30] *** Ctrl has quit IRC (Read error: Operation timed out)
[17:57] *** nicolas17 has joined #archiveteam-bs
[18:07] *** Ctrl has joined #archiveteam-bs
[18:21] Huh, Twitter blocked accounts with blue checkmarks from posting during the hack: https://www.cnet.com/news/twitters-verified-accounts-are-muzzled-and-the-jokes-go-wild/
[18:22] that was unblocked like 1 hour later
[18:22] The fact there's stuff made like this https://twitter.com/TempNBCNews is worth an archive
[18:23] Ryz: apparently last night a city in the US had a tornado warning and the official city government account couldn't tweet about it
[18:23] Very very big oof in the timing
[18:24] they should have rescheduled the tornado for a more convenient time :p
[18:24] There was apparently a workaround in that brief time in which blue checkmark accounts were still able to retweet
[18:24] *** Wiedi has quit IRC (Ping timeout: 265 seconds)
[18:24] yeah RTs worked
[18:25] Ryz: https://pbs.twimg.com/media/EdBBsGWX0AABx4N.jpg
[18:30] Should probably do a proactive archive of those temporary accounts
[18:31] Let's move all of our official announcements to that private platform, WCGW?
[18:32] Also, lots of discussion about that was in -ot as it happened.
[18:48] *** Wiedi has joined #archiveteam-bs
[19:24] does twitter have a 'tweets I have liked' search filter? :/
[19:39] Twitter doesn't even have a 'tweets I have retweeted' filter (that works), so I'd be surprised if it had that.
[20:10] *** Nikchemny has joined #archiveteam-bs
[20:11] JAA or SketchCow : Does AT have plans for archive.st?
[20:12] wut https://github.com/github/archive-program/issues/36
[20:13] VoynichCr ?
[20:14] ?
[20:15] Will AT save archive.st or not?
Looks like it doesn't have a search tool, so it must be like archive.st/aaaa, archive.st/aaab etc.
[20:19] Nope, the URL pattern is https://archive.st/archive/YYYY/M/URL, so kind of similar to the WBM.
[20:19] Is it shutting down or something?
[20:19] JAA: I made https://archive.st/archive/2020/7/archiveteam.org/v9gd/
[20:20] Yeah, there's an ID in it as well. Impossible to enumerate.
[20:20] JAA: Nope, but there was peeep.us, which is dead now. It saved the page as the user saw it.
[20:20] JAA: Article about peeepus http://wikireality.ru/wiki/Peeep.us
[20:20] VoynichCr: Someone doesn't understand the concept of open-source software, I guess.
[20:21] Nikchemny: Sure, and how is that relevant for this service?
[20:21] IDK, maybe they'll close their service too
[20:22] You know, I thought about making a wikipage for it.
[20:22] *page on AT wiki
[20:23] Like the pages for IA and archive.today
[20:24] JAA: Btw, I tried to save at.org again and it wrote: " ERROR! URL has already been archived. Visit the archive here: http://Archive.st/v9gd Sure you want a new copy? Click archive."
[20:38] JAA: And in February I saved ED's main page ( https://archive.st/archive/2020/2/encyclopediadramatica.wiki/82g8/ ). Now I saved it again ( https://archive.st/archive/2020/7/encyclopediadramatica.wiki/snw0/ ) and it wrote nothing. Like the previous version doesn't exist.
[20:42] Btw, the link is like https://archive.st/archive/YYYY/M/DOMAIN/abcd
[21:17] Looks like it's actually https://archive.st/archive/YYYY/M/DOMAIN/CODE/URL if it's not the homepage.
[21:18] Or at least some weird stuff comes after the code.
[21:18] JAA: https://archive.st/archive/2020/7/lurkmore.to/i21b/
[21:19] Yes, look at the "archive here" link.
[21:19] Anyway, yeah, the short URLs could be enumerated.
[21:19] I don't think it's worth it now though. There's too much other stuff that's *actually* at risk currently.
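Editor's sketch: the pattern worked out above, https://archive.st/archive/YYYY/M/DOMAIN/CODE/URL, can be split into its parts with a few lines of Python. This is only an illustration of the layout as guessed in the channel, not a documented archive.st format; the names `ArchiveStUrl` and `parse_archive_st` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class ArchiveStUrl:
    year: int
    month: int
    domain: str
    code: str   # the short ID, e.g. "v9gd"
    rest: str   # whatever follows the code; empty for a bare snapshot link


def parse_archive_st(url: str) -> Optional[ArchiveStUrl]:
    """Split a long-form archive.st URL into its assumed components.

    Returns None for anything that does not fit the pattern
    /archive/YYYY/M/DOMAIN/CODE/..., including short links.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != "archive.st":
        return None
    # "/archive/2020/7/archiveteam.org/v9gd/" splits into
    # ['', 'archive', '2020', '7', 'archiveteam.org', 'v9gd', '']
    segments = parts.path.split("/")
    if len(segments) < 6 or segments[1] != "archive":
        return None
    year, month, domain, code = segments[2:6]
    if not (year.isdigit() and month.isdigit()):
        return None
    return ArchiveStUrl(int(year), int(month), domain, code, "/".join(segments[6:]))
```

Called on https://archive.st/archive/2020/7/archiveteam.org/v9gd/ this yields year 2020, month 7, domain archiveteam.org and code v9gd, while a short link such as http://Archive.st/v9gd returns None because it lacks the /archive/ prefix.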
[21:20] https://archive.st/archive/YYYY/M/domain/abcd/domain/index.html - literally the link of the copy
[21:20] Yes but not always.
[21:20] https://archive.st/archive/2020/6/www.wsj.com/7f0f/
[21:20] -> https://archive.st/archive/2020/6/www.wsj.com/7f0f/www.wsj.com/articles/california-is-examining-amazons-business-practices-11591987233.html
[21:21] It's messy.
[21:21] JAA, there is https://archive.st/archive/2020/7/lurkmore.to/i21b/lurkmore.to/index.html , not https://archive.st/archive/2020/7/lurkmore.to/i21b/lurkmore.to/2ch.ru or https://archive.st/archive/2020/7/lurkmore.to/i21b/lurkmore.to/Двач
[21:21] Hmm
[21:22] Maybe becues it's Russian
[21:22] *because
[21:24] JAA: hmm, the link is full English, but still index.html: https://archive.st/archive/2020/7/lurkmore.net/aj19/
[21:25] Btw, in the article anyone can read "Google remembers everything". That's stupid and funny
[21:30] Mhm
[21:35] Why do we need WBM, archive.today, megalodon.jp and (sometimes) archive.st when we have Google Cache? Ah, it's a waste of time, Google remembers e-ve-ry-thing!
[21:53] Web archival is nonsense anyway. As we all know, what goes on the internet stays on the internet forever.
[21:55] That would be great if we could say this for every VK-user's page...
[21:55] And every VK-community's page...
[21:58] https://www.google.com/search?q=site%3Ae-reading.club Mmm, so many pages for e-reading.club!! ;)) Literally EVERYTHING! Oh, robots.txt isn't real, yos
[22:01] Sigh, I guess it's time to do the company acquisitions and shutdowns... it's been a week or two since doing this *shudders in annoyance and/or perceived pain* :c
[22:01] These things really eat up that amount of time over time... ><;
[22:03] I think that we need to stop using the word "everything" and start using the cool word "something". George Harrison named one beautiful song "Something".
[22:08] JAA: Btw, looks like VK is mobile on archive.st https://archive.st/archive/2019/4/vk.com/eexa/vk.com/index.html
[22:09] https://archive.st/archive/2019/4/vk.com/eexa/April272019328pm-mix5cctaroftz5elr4dze86qc1ansvov.jpg
[22:40] *** Mateon1 has quit IRC (Ping timeout: 272 seconds)
[22:41] *** Nikchemny has quit IRC (Quit: https://mibbit.com Online IRC Client)
[22:44] *** lennier2 has joined #archiveteam-bs
[22:45] *** lennier2 has quit IRC (Read error: Connection reset by peer)
[22:45] *** lennier2 has joined #archiveteam-bs
[22:53] *** lennier1 has quit IRC (Read error: Operation timed out)
[22:53] *** lennier2 is now known as lennier1
[23:19] *** lennier1 has quit IRC (Quit: Going offline, see ya! (www.adiirc.com))
[23:26] *** lennier1 has joined #archiveteam-bs
[23:30] *** notroot2 has quit IRC (ZNC 1.8.1 - https://znc.in)