[00:04] *** BlueMax has joined #archiveteam-bs
[00:04] *** BlueMaxim has joined #archiveteam-bs
[00:08] *** BlueMaxim has quit IRC (Client Quit)
[00:13] *** BlueMax has quit IRC (Quit: Leaving)
[00:13] *** BlueMax has joined #archiveteam-bs
[00:49] *** xit has quit IRC (Quit: Connection closed for inactivity)
[01:24] *** katocala has quit IRC (Ping timeout: 496 seconds)
[01:25] *** katocala has joined #archiveteam-bs
[01:34] *** lennier2 has joined #archiveteam-bs
[01:40] *** lennier1 has quit IRC (Ping timeout: 492 seconds)
[01:40] *** lennier2 is now known as lennier1
[02:49] *** HP_Archiv has joined #archiveteam-bs
[03:21] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:30] Neither size or unique URLs
[03:30] The biggest is Alexa Internet.
[03:31] *** VADemon has joined #archiveteam-bs
[03:42] *** DLoader_ has joined #archiveteam-bs
[03:47] *** Raccoon has quit IRC (Ping timeout: 745 seconds)
[03:53] *** DLoader has quit IRC (Ping timeout: 745 seconds)
[03:53] *** DLoader_ is now known as DLoader
[04:20] SketchCow, how much does Alexa Internet contribute to the overall daily content ingestion into WBM? By percentage and by the filesize?
[04:23] *** vitzli has joined #archiveteam-bs
[04:54] *** Raccoon has joined #archiveteam-bs
[05:34] Yeah..... I'm not going to go into that.
[05:35] I'll just tell you, they contribute more.
[05:38] *** Raccoon has quit IRC (Ping timeout: 745 seconds)
[05:39] *** vitzli has quit IRC (Leaving)
[05:58] *** kiska1825 has quit IRC (Remote host closed the connection)
[05:58] *** Ryz has quit IRC (Remote host closed the connection)
[05:59] *** Ryz has joined #archiveteam-bs
[05:59] *** kiska1825 has joined #archiveteam-bs
[07:08] *** PovAddict has joined #archiveteam-bs
[07:08] *** nicolas17 has quit IRC (Read error: Connection reset by peer)
[08:00] *** jshoard has joined #archiveteam-bs
[08:11] *** atomicthu has quit IRC (Read error: Operation timed out)
[08:40] *** atomicthu has joined #archiveteam-bs
[09:24] *** atomicthu has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[09:27] *** atomicthu has joined #archiveteam-bs
[09:49] *** Tugboat has joined #archiveteam-bs
[09:49] *** qw3rty has joined #archiveteam-bs
[10:08] *** BeefyBoot has joined #archiveteam-bs
[10:09] hey y'all please let me know if this isn't a good place to ask for such
[10:09] recently a large number of subreddits were banned by the admins of Reddit. Full list here: https://redd.it/hi41t2
[10:09] I'd be interested to know if there's any way to retrieve top posts from a banned sub
[10:10] Especially for NSFW ones, archive.org doesn't seem to be grabbing content from reddit.com/r/xyzsubname/top/all because of the NSFW confirmation being on top
[11:59] *** BlueMax has quit IRC (Quit: Leaving)
[12:47] *** maxfan8 has quit IRC (Ping timeout: 260 seconds)
[12:58] *** maxfan8 has joined #archiveteam-bs
[12:59] you mean retrieve from reddit? once it's banned, it's gone...
[13:00] as for archiving it, that's being tracked https://www.archiveteam.org/index.php?title=Reddit and the project has a dedicated IRC channel
[13:38] *** Raccoon has joined #archiveteam-bs
[13:53] ty
[14:10] *** jshoard has quit IRC (Leaving)
[14:17] *** godane has quit IRC (Read error: Operation timed out)
[14:18] *** godane has joined #archiveteam-bs
[14:54] *** fredgido has joined #archiveteam-bs
[14:56] *** hook54321 has quit IRC ()
[14:57] *** hook54321 has joined #archiveteam-bs
[14:58] *** fredgido_ has quit IRC (Ping timeout: 744 seconds)
[15:12] *** maxfan8 has quit IRC (Quit: WeeChat 2.8)
[15:12] *** maxfan8 has joined #archiveteam-bs
[15:19] *** maxfan8 has quit IRC (WeeChat 2.8)
[15:19] *** maxfan8 has joined #archiveteam-bs
[15:26] *** Arcorann_ has quit IRC (Read error: Operation timed out)
[15:57] UrbanDictionary surviving as long as it has was a bit of a miracle/obtuseness. It's been worth a mirroring for years.
[15:58] Also, "SJWitis" for "I realize half my site is offensive shit" is... poor
[16:00] urbandictionary already had an obscure moderation system in place for the last handful of years at least
[16:00] Mega obscure. Made them 10% less 4channy
[16:02] All I can say about the moderation system was that a couple memey entries I had made were irrelevant by the time they looked over them to decide if they would be approved
[16:04] It's likely that a lot of content had already started being cut out by the time moderation made it into the site. Not sure if they were removing older racist terms from that time also.
[16:39] BeefyBoot: try https://github.com/pushshift/api
[16:43] VADemon: it was already suggested to me in the IRC for https://www.archiveteam.org/index.php?title=Reddit :D
[16:43] but thanks a ton
[16:46] *** VADemon has quit IRC (left4dead)
[17:02] *** HP_Archiv has quit IRC (Quit: Leaving)
[17:19] *** VerifiedJ has joined #archiveteam-bs
[17:19] I think there's value in *describing* racist terms, as distinct from overt racism
[17:21] the quality of definitions on UD is er... variable
[17:21] to say the least :p
[18:14] *** godane has quit IRC (Ping timeout: 265 seconds)
[18:21] UD is a really good example of an early 2000s site that just didn't die
[18:29] *** godane has joined #archiveteam-bs
[19:01] How are we gonna archive https://www.urbandictionary.com/ ? It might be too big for AB~ Unsure if qwarc would work
[19:02] sounds like a good candidate for scripts
[19:03] *** VoynichCr has left
[19:03] Ryz: I intend to try it once I finish my current iteration of qwarc dev work.
[19:06] *** VoynichCr has joined #archiveteam-bs
[19:36] *** BeefyBoot has quit IRC (Quit: Connection closed for inactivity)
[19:45] *** mtntmnky has quit IRC (Read error: Operation timed out)
[19:56] *** i0npulse has quit IRC (Ping timeout: 265 seconds)
[20:22] *** VerifiedJ has quit IRC (Quit: Leaving)
[21:07] So it turns out that downloads.dell.com has indices for the subdirectories, just not the main dir. E.g. http://downloads.dell.com/FOLDER01478021M/
[21:08] I'll make sure everything in all known directories is archived after the AB job is done.
[21:12] For the record, it's case-insensitive: https://downloads.dell.com/manuals/ == https://downloads.dell.com/Manuals/
[21:13] Actually, nevermind that regarding listing the main dir: ftp://ftp.ins.dell.com/ (warning, *HUGE*)
[21:26] *** godane has quit IRC (Ping timeout: 272 seconds)
[21:40] *** godane has joined #archiveteam-bs
[22:31] *** lennier2 has joined #archiveteam-bs
[22:34] *** katocala has quit IRC (Ping timeout: 265 seconds)
[22:38] *** lennier1 has quit IRC (Read error: Operation timed out)
[22:38] *** lennier2 is now known as lennier1
[22:38] *** katocala has joined #archiveteam-bs
[22:58] *** VADemon has joined #archiveteam-bs
[23:10] *** Arcorann_ has joined #archiveteam-bs
[23:10] *** Arcorann_ has quit IRC (Remote host closed the connection)
[23:11] *** Arcorann_ has joined #archiveteam-bs