[00:23] @JAA: Hmm... bit of a problem: the bethsoft scrape is running over HTTPS. However, those webpages request some content over HTTP. Will that content get archived OK? Chrome is refusing to render those URLs correctly: the stylesheets and scripts refuse to load.
[00:24] I'm guessing that once it's in the WBM, it won't matter. Noob question.
[00:24] vesi: Yes, should all be retrieved correctly. ArchiveBot (or rather the underlying tool, wpull) doesn't care about that distinction.
[00:26] Thanks, good to know. And when the content is loaded into the WBM, will it matter if someone looks up a URL using HTTP or HTTPS?
[00:27] nope
[00:43] what if we moved to discord
[00:43] Fuck Discord
[00:45] Want to play a game and figure out how many of Discord's terms we'd potentially be breaking with our activities?
[00:45] as long as i don't have to do shots every time
[00:46] I can see at least three already just from quickly glancing over it.
[00:46] *** trumad_ has joined #archiveteam-bs
[00:46] Not to mention that it's all proprietary etc.
[00:46] #archiveteamquorum
[00:47] *** trumad has quit IRC (Ping timeout: 276 seconds)
[01:07] ops are looking a little thin, i forget who gets ops these days
[01:15] *** vesi has quit IRC (Ping timeout: 260 seconds)
[01:33] *** icedice has joined #archiveteam-bs
[02:02] *** vesi has joined #archiveteam-bs
[02:03] Hypothetically, if I had scrapes that I've made of certain sites, how should I go about sharing them with others?
[02:19] Interesting... http://forums.bethsoft.com/topic/1231593-
[02:19] Response time increased again by the way. We'll see how stable this is.
[02:20] vesi: I mean, depends on who you want to share it with?
[02:32] *** Ctrl has quit IRC (Read error: Operation timed out)
[03:06] *** systwi has quit IRC (Read error: Connection reset by peer)
[03:07] *** equant has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** thuban3 has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** bsmith093 has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** Datechnom has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** nyany_ has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** svchfoo3 has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** Ryz has quit IRC (ircd.choopa.net irc.mzima.net)
[03:07] *** pikami has quit IRC (ircd.choopa.net irc.mzima.net)
[03:08] *** Ctrl has joined #archiveteam-bs
[03:08] *** systwi has joined #archiveteam-bs
[03:18] *** equant has joined #archiveteam-bs
[03:18] *** thuban3 has joined #archiveteam-bs
[03:18] *** bsmith093 has joined #archiveteam-bs
[03:18] *** Datechnom has joined #archiveteam-bs
[03:18] *** nyany_ has joined #archiveteam-bs
[03:18] *** svchfoo3 has joined #archiveteam-bs
[03:18] *** Ryz has joined #archiveteam-bs
[03:18] *** pikami has joined #archiveteam-bs
[03:18] *** irc.mzima.net sets mode: +o svchfoo3
[03:28] *** mtntmnky has quit IRC (Remote host closed the connection)
[03:29] *** mtntmnky has joined #archiveteam-bs
[04:03] *** Darkstar has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[04:07] *** slyphic_ has joined #archiveteam-bs
[04:07] *** slyphic has quit IRC (Read error: Connection reset by peer)
[04:07] *** NIC007a83 has quit IRC (Read error: Operation timed out)
[04:08] *** NIC007a83 has joined #archiveteam-bs
[04:13] *** slyphic_ has quit IRC (Remote host closed the connection)
[04:13] *** MrRadar has quit IRC (Read error: Connection reset by peer)
[04:14] *** mistym has quit IRC (Ping timeout: 360 seconds)
[04:14] *** Frogging has quit IRC (Read error: Operation timed out)
[04:14] *** Frogging has joined #archiveteam-bs
[04:14] *** jrwr has quit IRC (Ping timeout: 264 seconds)
[04:15] *** odemgi_ has joined #archiveteam-bs
[04:15] *** jodizzle has quit IRC (Read error: Operation timed out)
[04:16] *** Maylay_ has joined #archiveteam-bs
[04:16] *** pie_[bnc] has quit IRC (Ping timeout: 360 seconds)
[04:17] *** mundus201 has quit IRC (Read error: Operation timed out)
[04:17] *** jodizzle has joined #archiveteam-bs
[04:18] *** mundus201 has joined #archiveteam-bs
[04:18] *** odemgi has quit IRC (Ping timeout: 276 seconds)
[04:18] *** Fionera has joined #archiveteam-bs
[04:19] *** pie_[bnc] has joined #archiveteam-bs
[04:19] *** mistym has joined #archiveteam-bs
[04:19] *** Igloo has quit IRC (Read error: Operation timed out)
[04:20] *** Selavi has quit IRC (Ping timeout: 264 seconds)
[04:20] *** Selavi has joined #archiveteam-bs
[04:21] *** Igloo has joined #archiveteam-bs
[04:21] *** svchfoo1 sets mode: +o Igloo
[04:21] *** svchfoo3 sets mode: +o Igloo
[04:22] *** Fionera_ has quit IRC (Read error: Connection reset by peer)
[04:23] *** slyphic has joined #archiveteam-bs
[04:23] *** jrwr has joined #archiveteam-bs
[04:27] *** Darkstar has joined #archiveteam-bs
[04:27] *** MrRadar has joined #archiveteam-bs
[04:28] *** Maylay has quit IRC (Read error: Operation timed out)
[04:35] *** qw3rty__ has joined #archiveteam-bs
[04:37] *** vesi has quit IRC (Ping timeout: 260 seconds)
[04:39] *** qw3rty_ has quit IRC (Ping timeout: 276 seconds)
[05:07] *** Ravenloft has quit IRC (Ping timeout: 360 seconds)
[05:09] *** Ravenloft has joined #archiveteam-bs
[05:10] *** DFJustin has quit IRC (Remote host closed the connection)
[05:14] *** DFJustin has joined #archiveteam-bs
[05:14] *** DFJustin has quit IRC (Remote host closed the connection)
[05:15] *** DFJustin has joined #archiveteam-bs
[07:04] *** SJon__ has quit IRC (Ping timeout: 264 seconds)
[07:04] *** SJon__ has joined #archiveteam-bs
[08:52] *** britmob_ has joined #archiveteam-bs
[08:56] *** RKenshin has joined #archiveteam-bs
[08:56] *** kiska3 has joined #archiveteam-bs
[08:56] *** Auctus_ has joined #archiveteam-bs
[08:57] *** mundus201 has quit IRC (Ping timeout: 610 seconds)
[08:58] *** mundus201 has joined #archiveteam-bs
[08:58] *** Auctus has quit IRC (Read error: Operation timed out)
[08:58] *** britmob has quit IRC (Read error: Operation timed out)
[08:59] *** Kenshin has quit IRC (Read error: Operation timed out)
[08:59] *** RKenshin is now known as Kenshin
[08:59] *** Maylay_ has quit IRC (Read error: Operation timed out)
[09:00] *** Maylay has joined #archiveteam-bs
[09:01] *** underscor has joined #archiveteam-bs
[09:02] *** ShellyRol has joined #archiveteam-bs
[09:06] *** Raccoon has quit IRC (Remote host closed the connection)
[09:12] *** halt has joined #archiveteam-bs
[09:26] *** lennier1 has quit IRC (Remote host closed the connection)
[09:45] *** swirve has quit IRC (Ping timeout: 745 seconds)
[09:51] *** swirve has joined #archiveteam-bs
[11:38] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:38] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[11:42] *** ShellyRol has joined #archiveteam-bs
[12:34] My qwarc retrieval of forums.bethsoft.com is now a bit over 50 % done.
[12:44] *** Ctrl has quit IRC (Read error: Operation timed out)
[12:48] *** Ctrl has joined #archiveteam-bs
[13:10] *** yano has quit IRC (Remote host closed the connection)
[13:11] *** yano has joined #archiveteam-bs
[14:10] *** Ryz has quit IRC (Remote host closed the connection)
[14:11] *** Ryz has joined #archiveteam-bs
[14:11] *** kiska18 has joined #archiveteam-bs
[15:47] Welp, first Twitter, now Facebook is going for a revamp in the future; can this affect snscrape/socialbot? https://www.ghacks.net/2020/02/11/this-is-how-facebooks-new-desktop-design-looks-and-how-you-can-restore-the-old-facebook/
[15:48] JAA ^
[15:51] It could, depending on how they implement it.
[16:02] *** VerifiedJ has joined #archiveteam-bs
[16:19] what do you guys think about a project for archiving YouTube captions (auto-generated and/or subtitles)? obviously there's several problems: a) obtaining/processing massive amounts of video/channel IDs. b) maybe the collective size of the captions will be too huge. c) the usual argument that more content is uploaded per minute than we can keep up with -- however we can cap it at views >1k or >10k etc. But I think such an archive
[16:19] can be useful so we can at least have a record of what was said even after videos get taken down.
[16:28] d) Rate limiting
[16:34] JAA: do you perhaps have a grasp of the scope of "a)" or a dataset for "a)" beyond what this archive managed? https://archive.org/download/Youtube_metadata_02_2019
[16:35] a) #youtubearchive would be a good starting point.
[16:35] b) should not be an issue.
[16:46] you mean the collective size of the captions should not be an issue?
[17:02] Yeah, since text compresses very well.
[17:02] I see
[17:03] should I repost my question to #youtubearchive? or perhaps to hackint#youtubearchive? :D I don't want to spam
[17:11] *** thuban3 is now known as thuban
[17:25] Well, #youtubearchive is an archive of actual videos, and there's a bunch of bot noise in there, so possibly not the best place to discuss this. There's also #down-the-tube on hackint which was used for the liked playlist thingy recently. Could reuse that.
[17:25] I mentioned #youtubearchive because it's a good source of video IDs.
[17:34] *** asdf01018 has joined #archiveteam-bs
[17:36] *** asdf0101 has quit IRC (Read error: Operation timed out)
[17:36] *** asdf01018 is now known as asdf0101
[18:01] *** Craigle has quit IRC (Ping timeout: 745 seconds)
[18:14] JAA: thanks, seems it was fruitful
[18:24] *** Stiletto has joined #archiveteam-bs
[18:27] *** DogsRNice has joined #archiveteam-bs
[18:29] *** BartoCH_ has joined #archiveteam-bs
[18:31] *** BartoCH has quit IRC (Read error: Connection reset by peer)
[18:32] *** TC01_ has joined #archiveteam-bs
[18:40] *** TC01 has quit IRC (Ping timeout: 745 seconds)
[18:54] *** Craigle has joined #archiveteam-bs
[20:15] *** thuban1 has joined #archiveteam-bs
[20:16] *** BlueMax has joined #archiveteam-bs
[20:17] *** thuban has quit IRC (Read error: Operation timed out)
[20:37] *** Raccoon has joined #archiveteam-bs
[21:13] *** Ctrl has quit IRC (Read error: Operation timed out)
[21:22] *** n00b272 has joined #archiveteam-bs
[21:28] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[21:45] *** prq has quit IRC (Remote host closed the connection)
[21:56] *** Ctrl has joined #archiveteam-bs
[22:55] *** godane has joined #archiveteam-bs
[23:39] *** VerifiedJ has quit IRC (Quit: Leaving)
[23:41] *** n00b272 has quit IRC (Ping timeout: 260 seconds)
[23:43] *** astrid has left ][