[01:19] *** Maylay_ has quit IRC (Read error: Operation timed out)
[01:53] *** Maylay has joined #archiveteam-bs
[01:53] *** Maylay has quit IRC (Remote host closed the connection)
[01:54] *** Maylay has joined #archiveteam-bs
[01:54] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[02:04] *** wp494 has joined #archiveteam-bs
[02:04] *** wp494 has quit IRC (Remote host closed the connection)
[02:11] *** wp494 has joined #archiveteam-bs
[03:04] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[03:15] *** wp494 has joined #archiveteam-bs
[04:02] *** BlueMax has quit IRC (Quit: Leaving)
[04:28] *** qw3rty_ has joined #archiveteam-bs
[04:33] *** qw3rty__ has quit IRC (Ping timeout: 276 seconds)
[04:49] *** thuban has joined #archiveteam-bs
[04:56] *** thuban4 has quit IRC (Read error: Operation timed out)
[05:06] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[05:42] (Going off of wp494 in #archiveteam, my initial impressions of the League of Legends forums & boards) - The forums, which have been in read-only mode since October 6, 2014 (http://forums.na.leagueoflegends.com/board/showthread.php?t=4909234), are here http://forums.na.leagueoflegends.com/board/
[05:43] The boards are here https://boards.pbe.leagueoflegends.com/en/ - some boards are not listed in the sidebar (requires JS to see), e.g.
https://boards.pbe.leagueoflegends.com/en/c/champions-gameplay-feedback?sort_type=recent
[05:46] There are several of what look like continent/region names in the domains - among these are na, euw, oce, and pbe - looks like these are separate sites with separate sets of posts
[06:00] League has different login regions that accounts are bound to
[06:00] accounts can be transferred between regions, but for an RP fee
[06:10] should be mentioned: PBE is a beta environment where things are tested prior to being released to the live game; it functions similarly to Dota 2 test/Blizzard PTRs/etc
[06:17] wp494: Do you know if this list (https://support-leagueoflegends.riotgames.com/hc/en-us/articles/201751684-League-of-Legends-Servers) + pbe is comprehensive?
[06:18] To quote my script, "Total is 8748215" threads on the forums, as long as I haven't made any mistakes there
[06:22] *** abartov__ has quit IRC (Read error: Connection reset by peer)
[06:22] *** abartov__ has joined #archiveteam-bs
[06:23] Some regions, namely southeast Asia, are outsourced to Garena, but for all Riot-controlled regions yes
[06:25] There is a Riot-controlled Korea region but I'm not too sure what the deal is with that one and why it's otherwise excluded from that list
[06:25] Garena regions are as follows: Singapore, Indonesia, Philippines, Taiwan, Vietnam, Thailand
[06:25] (but given that it's Garena infrastructure I'm reasonably sure they wouldn't have their own boards regions, doesn't hurt to check but I doubt it)
[06:32] *** systwi has quit IRC (Read error: Operation timed out)
[06:45] https://leagueoflegends.fandom.com/wiki/Servers looks like it includes the other ones, then
[06:57] Doesn't look like any of those new ones have forums
[06:58] There is a hole of ~10000 thread IDs at the beginning of "lan"'s space
[06:59] *** systwi has joined #archiveteam-bs
[07:16] *** triggerh3 has quit IRC (Ping timeout: 496 seconds)
[07:21] In addition to the division by region, and obviously the division
into subforums and threads, there is a level directly under region consisting of language
[07:21] In some regions (e.g. jp), there is only one
[07:22] What languages are listed is determined through JS - I don't want to look into it right now
[07:27] I'm talking about boards there - forums have 1 language per region, as far as I can tell, & I use "subforum" in the context of boards to refer to the groups of threads that I think may be properly called "categories"
[08:41] *** NIC007a83 has quit IRC (Ping timeout: 745 seconds)
[08:42] *** NIC007a83 has joined #archiveteam-bs
[09:02] *** Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
[09:11] *** godane has quit IRC (Ping timeout: 255 seconds)
[11:45] *** britmob_ has quit IRC (Read error: Operation timed out)
[13:09] *** triggerh3 has joined #archiveteam-bs
[13:30] *** Silvan has quit IRC (Ping timeout: 186 seconds)
[13:31] *** SilSte has joined #archiveteam-bs
[13:54] if AB is used on a big forum, I suggest feeding AB !ao lists of thread URLs that are spread across multiple AB jobs so they can run in parallel. (This assumes that each AB job is the rate-limiting factor, and not the forum site. Disregard this if a dedicated project effort is in the works that uses qwarc or other means that don't have AB's limits.)
[14:18] atphoenix: wouldn't that mean stuff past the first page of a thread wouldn't be grabbed?
[14:24] hook54321, that depends on how the thread URLs are provided to !ao
[14:26] I'm suggesting that every page of every thread be fed as a separate line item to AB via a list. That still will require figuring out how long each thread is, but that presumably can be determined by looking at the first page of each thread.
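The per-page list idea from [13:54] and [14:26] could be sketched roughly as below. This is only an illustration, not anyone's actual tooling: the function names, the `page_counts` mapping (thread ID to number of pages, which would have to come from reading each thread's first page), and the `&page=N` URL parameter are all assumptions. Only the `showthread.php?t=` form appears in the log above.

```python
from typing import Dict, List

# Base URL taken from the forum link in the log; the "&page=N" suffix used
# below is an ASSUMPTION about how later pages of a thread are addressed.
BASE = "http://forums.na.leagueoflegends.com/board/showthread.php"


def page_urls(thread_id: int, page_count: int) -> List[str]:
    """Return one URL per page of a thread (hypothetical URL scheme)."""
    urls = [f"{BASE}?t={thread_id}"]
    urls += [f"{BASE}?t={thread_id}&page={n}" for n in range(2, page_count + 1)]
    return urls


def split_round_robin(urls: List[str], job_count: int) -> List[List[str]]:
    """Deal URLs out across job_count lists so the lists stay similar in size."""
    jobs: List[List[str]] = [[] for _ in range(job_count)]
    for i, url in enumerate(urls):
        jobs[i % job_count].append(url)
    return jobs


def build_job_lists(page_counts: Dict[int, int], job_count: int) -> List[List[str]]:
    """Expand every thread into per-page URLs, then split them across jobs."""
    all_urls: List[str] = []
    for thread_id in sorted(page_counts):
        all_urls.extend(page_urls(thread_id, page_counts[thread_id]))
    return split_round_robin(all_urls, job_count)
```

Each inner list would then be written to its own text file and handed to a separate job, one URL per line.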
[15:05] *** Craigle has joined #archiveteam-bs
[15:15] it might make more sense to qwarc it or run it outside of archivebot with over 6 concurrency
[15:18] ^
[15:25] *** ats has quit IRC (Ping timeout: 258 seconds)
[15:32] *** ats has joined #archiveteam-bs
[16:07] I did mention qwarc. That parallel AB idea may be more 'accessible' for cases where people don't know how to set up qwarc for a custom project.
[16:34] *** bitbit has joined #archiveteam-bs
[16:43] *** HP_Archiv has joined #archiveteam-bs
[16:45] *** ats has quit IRC (Ping timeout: 258 seconds)
[16:46] *** ats_ has joined #archiveteam-bs
[16:53] *** icedice has quit IRC (Ping timeout: 622 seconds)
[17:06] *** Pixi` has quit IRC (Read error: Connection reset by peer)
[17:06] *** Pixi has joined #archiveteam-bs
[17:31] *** Pixi has quit IRC (Ping timeout: 1212 seconds)
[17:43] *** Pixi has joined #archiveteam-bs
[17:45] *** Pixi has quit IRC (Read error: Connection reset by peer)
[17:45] *** Pixi has joined #archiveteam-bs
[20:29] Threads are indexed by sequential numbers, should be easy enough
[20:32] Pitfalls are 1. the "hole" in numbers I describe enough, which I think has a good chance of being replicated in other regions &/or other places 2. some non-OK HTTP status code (rate limiting?)
that my script encountered at one point, even though it was a valid thread
[20:32] *describe above
[20:53] *** HP_Archiv has quit IRC (Quit: Leaving)
[21:26] *** obskyr has quit IRC (Read error: Operation timed out)
[21:26] *** obskyr has joined #archiveteam-bs
[21:27] *** Lord_Nigh has quit IRC (Read error: Connection reset by peer)
[21:27] *** benjinsmi has joined #archiveteam-bs
[21:29] *** katocala has joined #archiveteam-bs
[21:30] *** benjins has quit IRC (Ping timeout: 610 seconds)
[21:31] *** halt_ has joined #archiveteam-bs
[21:34] *** halt has quit IRC (Ping timeout: 610 seconds)
[22:05] *** britmob has joined #archiveteam-bs
[22:16] *** BlueMax has joined #archiveteam-bs
[22:45] *** britmob_ has joined #archiveteam-bs
[22:51] *** britmob has quit IRC (Read error: Operation timed out)
[22:55] *** katocala has quit IRC (Read error: Operation timed out)
[22:56] *** katocala has joined #archiveteam-bs
[22:59] *** Wingy has quit IRC (Read error: Operation timed out)
[23:02] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[23:05] *** Wingy has joined #archiveteam-bs
[23:05] *** RichardG has joined #archiveteam-bs
[23:07] *** logchfoo1 starts logging #archiveteam-bs at Mon Mar 02 23:07:06 2020
[23:07] *** logchfoo1 has joined #archiveteam-bs
[23:07] *** scorche has quit IRC (Ping timeout: 255 seconds)
[23:07] *** Wingy has quit IRC (Client Quit)
[23:08] *** Wingy has joined #archiveteam-bs
[23:09] *** scorche has joined #archiveteam-bs
[23:13] *** Wingy has quit IRC (Read error: Operation timed out)
[23:14] *** achip has joined #archiveteam-bs
[23:14] *** Wingy has joined #archiveteam-bs
[23:49] *** katocala has left
[23:50] *** katocala has joined #archiveteam-bs
[23:52] *** katocala has left
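The two pitfalls listed at [20:32] (holes in the sequential ID space, and an occasional non-OK status on a valid thread) suggest a scan that keeps going past missing IDs and retries anything ambiguous. A minimal sketch under stated assumptions follows: `scan_thread_ids` and `fetch_status` are hypothetical names, and treating 200 as "thread exists" and 404 as "hole" is an assumption, since the log does not say how a missing thread actually shows up on these forums.

```python
import time
from typing import Callable, Iterable, List, Tuple


def scan_thread_ids(
    ids: Iterable[int],
    fetch_status: Callable[[int], int],
    max_retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[int], List[int], List[int]]:
    """Classify thread IDs as found, missing, or failed.

    fetch_status is a caller-supplied (hypothetical) function returning an
    HTTP status code for a thread ID. ASSUMPTION: 200 means the thread
    exists and 404 means a hole; anything else is retried with backoff.
    """
    found: List[int] = []
    missing: List[int] = []
    failed: List[int] = []
    for thread_id in ids:
        for attempt in range(max_retries + 1):
            status = fetch_status(thread_id)
            if status == 200:
                found.append(thread_id)
                break
            if status == 404:
                # A hole: record it and move on rather than stopping the scan.
                missing.append(thread_id)
                break
            if attempt < max_retries:
                # Possibly rate limiting: wait longer before each retry.
                sleep(delay * (2 ** attempt))
        else:
            # Every attempt returned an unexpected status; keep for a later pass.
            failed.append(thread_id)
    return found, missing, failed
```

Keeping the `failed` IDs separate from `missing` means a thread that was only temporarily unavailable can be re-queued instead of being mistaken for part of a hole.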