[00:11] JAA: Nope - I've only seen stickies in the page itseldf
[00:11] *itself
[00:28] Oh, another fun bug of the boards: https://boards.na.leagueoflegends.com/api/PEr1qIcT/discussions?sort_type=recently_replied&num_loaded=1377400 claims to return 21 results but the HTML string is empty.
[00:29] (It's supposed to return the 21 discussions that were least recently active ever.)
[00:51] *** godane has joined #archiveteam-bs
[02:04] Oh, fantastic. After finally working around all that rubbish, I did a first load test. Guess what, under load, the API returns an empty list instead of data, still with HTTP 200 though. (╯°□°)╯︵ ┻━┻
[02:37] *** RichardG_ has quit IRC (Quit: Keyboard not found, press F1 to continue)
[02:38] *** RichardG has joined #archiveteam-bs
[03:24] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[03:26] Running okay now. Estimated 400 GB and ~50 hours for the NA EN boards (presumably the largest ones). So assuming it goes okay now, it should finish on time.
[03:35] Ok
[03:45] *** wp494 has joined #archiveteam-bs
[04:15] *** qw3rty_ has joined #archiveteam-bs
[04:23] *** qw3rty__ has quit IRC (Read error: Operation timed out)
[04:24] *** katocala has joined #archiveteam-bs
[04:24] *** katocala has left
[04:43] Discovered more edge cases, trying to solve them now and will have to restart then.
[05:01] *** ephemer0l has quit IRC (Read error: Connection reset by peer)
[05:11] *** ephemer0l has joined #archiveteam-bs
[05:47] Discussion counts on each board according to the API: https://transfer.notkiska.pw/inline/av40k/lol-boards-discussion-counts
[05:48] Apparently the English EU forums are available under both eune and euw.
[06:52] This is fun. ~2k req/s, only a few dozen errors per minute :-)
[06:59] *** DFJustin has quit IRC (Remote host closed the connection)
[06:59] *** DFJustin has joined #archiveteam-bs
[07:16] *** HP_Archiv has quit IRC (Quit: Leaving)
[10:22] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[10:24] *** ShellyRol has joined #archiveteam-bs
[11:54] *** PlsHelp has joined #archiveteam-bs
[11:56] Also, regarding xbooru.com purging photos: comments at Xbooru is exclusive content so that's another reason to archive it.
[11:56] Please read http://blog.booru.org/?p=209
[12:12] *** mtntmnky has quit IRC (Remote host closed the connection)
[12:13] *** mtntmnky has joined #archiveteam-bs
[12:26] FOCUS: https://closeup.booru.org/index.php?page=post&s=list&pid=26440
[12:27] FOCUS: https://cheesecake.booru.org/index.php?page=post&s=list
[12:34] latest digitize tapes : https://www.patreon.com/posts/digitize-tapes-34874417
[12:35] FOCUS: https://vulva.booru.org/index.php?page=post&s=list&tags=all - these websites which are part of the booru.org network will be PURGED soon!
[12:38] https://joy2.booru.org/index.php?page=post&s=list&tags=all
[12:49] https://celeb-fake-nude.booru.org/index.php?page=post&s=list&pid=5940
[12:50] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[13:11] *** zhongfu has quit IRC (Ping timeout: 745 seconds)
[13:12] *** zhongfu has joined #archiveteam-bs
[13:16] *** PlsHelp has quit IRC (Quit: Page closed)
[15:49] *** MaximeleG has joined #archiveteam-bs
[16:10] SketchCow: so i found this looking for pdfs on twitter : https://twitter.com/rohweroutpost
[16:11] its a magazine/newspaper from Rohwer Arkansas Japanese Relocation Camp
[17:12] All of the LoL forums except the NA ones are done (save for errors to be handled soon). The smaller boards (eune_ro, eune_hu, eune_cs, eune_el, euw_de, jp_ja-jp, pbe_en, oce_en) are also done. The other boards are still running.
[17:13] On the forums, I'm saving the thread pages and the forum pagination.
I'm bruteforcing thread IDs up to {'na': 4924000, 'euw': 2106000, 'eune': 807000, 'ru': 30000, 'br': 382000, 'lan': 96000, 'las': 183000, 'oce': 89000, 'tr': 137000}
[17:14] On the boards, I'm saving the homepage pagination, discussions including pagination in flat view, and user profiles.
[17:14] As mentioned before, no images, outlinks, etc.
[17:30] *** Smiley has joined #archiveteam-bs
[17:33] *** SmileyG has quit IRC (Read error: Operation timed out)
[18:41] *** idk has joined #archiveteam-bs
[18:42] Hey JAA, after this, I assume it would be ideal to run the non-targeted images, outlinks, etc in AB after the League of Legends forums grab is finished?
[18:45] *** idk has quit IRC (Ping timeout: 260 seconds)
[18:47] Ryz: If anyone wants to extract that information from the WARCs, sure. I'm not planning to do so.
[19:02] *** arkiver_ has quit IRC (Read error: Connection reset by peer)
[19:02] *** arkiver_ has joined #archiveteam-bs
[19:06] *** DogsRNice has joined #archiveteam-bs
[19:07] *** MaximeleG has quit IRC (Ping timeout: 745 seconds)
[19:30] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[19:37] *** SynMonger has quit IRC (Ping timeout: 276 seconds)
[19:40] *** SynMonger has joined #archiveteam-bs
[19:50] *** Craigle has quit IRC (Ping timeout: 276 seconds)
[19:50] *** anarcat has quit IRC (Ping timeout: 276 seconds)
[19:50] *** benjins has quit IRC (Ping timeout: 276 seconds)
[19:50] *** fredgido has quit IRC (Ping timeout: 276 seconds)
[19:50] *** pew has quit IRC (Ping timeout: 276 seconds)
[19:50] *** klg has quit IRC (Ping timeout: 276 seconds)
[19:50] *** logchfoo2 has quit IRC (Ping timeout: 276 seconds)
[19:51] *** logchfoo3 starts logging #archiveteam-bs at Sat Mar 14 19:51:55 2020
[19:51] *** logchfoo3 has joined #archiveteam-bs
[19:52] *** logchfoo2 has quit IRC (Ping timeout: 276 seconds)
[19:52] *** Dallas has quit IRC (Ping timeout: 276 seconds)
[19:52] *** Hooloovoo has quit IRC (Ping timeout: 276 seconds)
[19:52] *** arkiver_ has quit IRC (Ping timeout: 276 seconds)
[19:52] *** dxrt has quit IRC (Ping timeout: 276 seconds)
[19:52] *** Fionera has quit IRC (Ping timeout: 276 seconds)
[19:53] *** pew has joined #archiveteam-bs
[19:54] *** godane has quit IRC (Ping timeout: 276 seconds)
[19:54] *** coderobe has quit IRC (Ping timeout: 276 seconds)
[19:56] *** synm0nger has joined #archiveteam-bs
[19:58] *** Xibalba has quit IRC (Ping timeout: 276 seconds)
[19:59] *** Xibalba has joined #archiveteam-bs
[20:01] *** SynMonger has quit IRC (Read error: Operation timed out)
[20:03] *** OrIdow6 has quit IRC (Ping timeout: 276 seconds)
[20:03] *** VoynichCr has quit IRC (Ping timeout: 276 seconds)
[20:05] *** mtntmnky_ has joined #archiveteam-bs
[20:05] *** is-_ has joined #archiveteam-bs
[20:06] *** pew has quit IRC (se.hub irc.underworld.no)
[20:06] *** jmtd has quit IRC (se.hub irc.underworld.no)
[20:06] *** is- has quit IRC (se.hub irc.underworld.no)
[20:06] *** Frogging has quit IRC (se.hub irc.underworld.no)
[20:06] *** purplebot has quit IRC (se.hub irc.underworld.no)
[20:07] *** Jon| has joined #archiveteam-bs
[20:21] *** mtntmnky has quit IRC (Remote host closed the connection)
[20:23] *** dxrt has joined #archiveteam-bs
[20:24] *** svchfoo1 sets mode: +o dxrt
[20:26] *** klg has joined #archiveteam-bs
[20:28] *** VoynichCr has joined #archiveteam-bs
[20:30] *** Samizdat has joined #archiveteam-bs
[20:41] *** pew has joined #archiveteam-bs
[21:00] *** Samizdat has quit IRC (Read error: Connection reset by peer)
[21:32] *** opticnerv has joined #archiveteam-bs
[21:50] *** ephemer0l has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
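(Editor's note on the 02:04 message: an API that answers HTTP 200 with an empty payload under load cannot be told apart from a genuinely empty page by status code alone. A minimal sketch of one way to guard against that, assuming the response is a JSON object whose discussion HTML sits under a key called "discussions". That key name, needs_retry and fetch_confirmed are hypothetical and not taken from the log or from the real API.)

```python
import json
from typing import Callable, Tuple


def needs_retry(status: int, body: str, html_key: str = "discussions") -> bool:
    """Return True if the response should not be trusted as final.

    html_key is a hypothetical field name; the real API's schema is not
    given in the log.
    """
    if status != 200:
        return True
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return True
    if not isinstance(data, dict):
        return True
    payload = data.get(html_key)
    if isinstance(payload, str):
        return not payload.strip()
    # Missing key, None, or an empty list all count as "no data".
    return not payload


def fetch_confirmed(fetch: Callable[[], Tuple[int, str]],
                    attempts: int = 3) -> Tuple[int, str]:
    """Call fetch() until it returns data or the attempts run out.

    The last response is returned either way, so a page that stays empty
    after every attempt is accepted as genuinely empty.
    """
    last = (0, "")
    for _ in range(attempts):
        last = fetch()
        if not needs_retry(*last):
            break
    return last
```

(fetch is any zero-argument callable returning a (status, body) pair, so the check stays independent of the HTTP client in use. A real crawl would presumably also back off between attempts.)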
[21:54] *** ephemer0l has joined #archiveteam-bs
[22:07] *** Smiley has quit IRC (Read error: Connection reset by peer)
[22:07] *** Smiley has joined #archiveteam-bs
[22:07] *** antomati_ has joined #archiveteam-bs
[22:09] *** antomatic has quit IRC (Read error: Operation timed out)
[22:15] *** HP_Archiv has joined #archiveteam-bs
[22:28] *** OrIdow6 has joined #archiveteam-bs
[22:39] *** Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
[22:48] *** opticnerv has quit IRC (Quit: Leaving)
[22:50] *** Craigle has joined #archiveteam-bs
[23:13] *** BlueMax has joined #archiveteam-bs
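(Editor's note on the 17:13 message: the per-region upper bounds quoted there can be turned into a list of candidate thread URLs roughly as sketched below. The bounds are copied from the log. The URL template, the choice to start at ID 1, and the inclusive upper bound are assumptions made for illustration, since the log does not state the forum URL scheme or the exact range.)

```python
from itertools import islice
from typing import Dict, Iterator

# Upper bounds per region, as quoted in the 17:13 message.
MAX_THREAD_ID: Dict[str, int] = {
    "na": 4924000, "euw": 2106000, "eune": 807000, "ru": 30000,
    "br": 382000, "lan": 96000, "las": 183000, "oce": 89000, "tr": 137000,
}

# Hypothetical template: the real forum URL scheme is not given in the log.
URL_TEMPLATE = "https://forums.example.invalid/{region}/thread/{thread_id}"


def candidate_urls(limits: Dict[str, int],
                   template: str = URL_TEMPLATE) -> Iterator[str]:
    """Yield one candidate URL per (region, thread ID) pair.

    IDs run from 1 to the region's bound inclusive (an assumption). A
    generator is used so the full list is never held in memory.
    """
    for region, max_id in limits.items():
        for thread_id in range(1, max_id + 1):
            yield template.format(region=region, thread_id=thread_id)


if __name__ == "__main__":
    print("candidate URLs in total:", sum(MAX_THREAD_ID.values()))
    for url in islice(candidate_urls(MAX_THREAD_ID), 3):
        print(url)
```

(Most candidate IDs in a brute-force range like this will presumably not exist, so whatever consumes the URLs has to treat "not found" responses as a normal outcome rather than an error.)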