[00:23] *** icedice has quit IRC (Leaving)
[00:29] *** hdch has joined #archiveteam-bs
[00:33] *** hdch has quit IRC (Client Quit)
[00:33] *** headacheb has joined #archiveteam-bs
[00:33] *** headacheb has quit IRC (Remote host closed the connection)
[00:34] *** hdch has joined #archiveteam-bs
[01:31] *** VerifiedJ has quit IRC (Quit: Leaving)
[01:58] *** Kitaru has joined #archiveteam-bs
[02:36] *** BlueMax has quit IRC (Quit: Leaving)
[02:37] *** Sk1d has quit IRC (Read error: Operation timed out)
[02:42] *** Sk1d has joined #archiveteam-bs
[02:45] *** dashcloud has quit IRC (Ping timeout: 268 seconds)
[02:58] *** Kitaru has quit IRC (Read error: Connection reset by peer)
[02:58] *** Kitaru has joined #archiveteam-bs
[03:05] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[03:05] *** Kitaru has joined #archiveteam-bs
[03:41] *** BlueMax has joined #archiveteam-bs
[03:54] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[04:25] *** odemgi has joined #archiveteam-bs
[04:25] *** qw3rty113 has joined #archiveteam-bs
[04:28] *** odemgi_ has quit IRC (Read error: Operation timed out)
[04:28] *** odemg has quit IRC (Ping timeout: 265 seconds)
[04:32] *** qw3rty112 has quit IRC (Read error: Operation timed out)
[04:41] *** odemg has joined #archiveteam-bs
[05:20] *** Kitaru has joined #archiveteam-bs
[05:37] *** Sk1d has quit IRC (Read error: Operation timed out)
[05:40] *** Sk1d has joined #archiveteam-bs
[05:41] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[05:46] *** Kitaru has joined #archiveteam-bs
[06:33] *** Sk1d has quit IRC (Read error: Operation timed out)
[06:37] *** Sk1d has joined #archiveteam-bs
[07:11] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[07:41] *** Sk1d has quit IRC (Read error: Operation timed out)
[07:46] *** Sk1d has joined #archiveteam-bs
[07:54] *** alex___ has quit IRC (Quit: ZZzzz)
[08:10] *** icedice has joined #archiveteam-bs
[08:44] *** icedice has quit IRC (Read error: Operation timed out)
[08:50] eientei95: When I actually go to the lists, though, it seems like there's not much available?
[08:51] Yeah, there's not much in the lists, but there's a lot of content that's available, such as that 1930 article I chucked as !ao
[08:51] * 1932
[08:51] https://www.foreignaffairs.com/articles/japan/1932-07-01/memoirs-viscount-ishii
[08:51] Ahhh okay.
[08:52] eientei95: Seems like there's a really robust sitemap.xml: https://www.foreignaffairs.com/sitemap.xml
[08:53] http://web.archive.org/web/20151030142558/https://www.foreignaffairs.com/articles/united-states/1932-07-01/great-depression Yeah, I think we should grab these
[08:53] Seems like it enumerates all the articles.
[08:53] *** jschwart has quit IRC (Konversation terminated!)
[08:54] *** Sk1d has quit IRC (Read error: Operation timed out)
[08:57] eientei95: Would it help if I scraped up all the articles from the sitemap into a list?
[08:57] Or can you just throw archivebot directly at the sitemap?
[08:58] Are the links available on the site? If so, we can chuck the entire site into AB
[08:58] * Flashfire throws sitemap into archivebot
[08:58] *** Sk1d has joined #archiveteam-bs
[08:58] kiska: Well, my point is that I think I could get direct links to the articles if you want to avoid recursive crawls.
[08:59] Like the stuff on this page: https://www.foreignaffairs.com/sitemap.xml?page=151
[09:03] Or to get the most crucial content faster or whatever.
[09:06] *** hdch has quit IRC (Remote host closed the connection)
[09:08] jodizzle: Extracted the links from the sitemap, sorted them from oldest to newest, and chucked them as !ao
[09:09] eientei95: Cool. Though it looks like Flashfire threw the whole sitemap in as well.
[09:10] Yeah, but that'd mean it gets queued
[09:10] The ao job is already on the pipeline and is starting now
[09:11] *** Sk1d has quit IRC (Read error: Operation timed out)
[09:13] Is it okay to have totally redundant crawls like that?
[09:15] *** Sk1d has joined #archiveteam-bs
[09:24] Double the data double the fun
[09:24] Plus the sitemap may grab other stuff
[09:29] *** Sk1d has quit IRC (Read error: Operation timed out)
[09:32] *** Sk1d has joined #archiveteam-bs
[09:36] *** TC01 has quit IRC (Read error: Operation timed out)
[09:39] *** TC01 has joined #archiveteam-bs
[09:51] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[10:07] *** Sk1d has quit IRC (Read error: Operation timed out)
[10:09] *** Sk1d has joined #archiveteam-bs
[10:10] *** alex___ has joined #archiveteam-bs
[10:14] *** schbirid has joined #archiveteam-bs
[10:19] Flashfire: Yeah, the sitemap will grab other stuff, but the ao job will get what we need asap :P
[10:20] *** alex___ has quit IRC (Ping timeout: 260 seconds)
[10:21] *** alex___ has joined #archiveteam-bs
[10:22] *** Sk1d has quit IRC (Read error: Operation timed out)
[10:26] *** Sk1d has joined #archiveteam-bs
[10:31] jodizzle: For urgent things, some duplication is fine.
[10:32] Flashfire: FYI, there's no need to explicitly throw in /sitemap.xml. ArchiveBot automatically retrieves that anyway and extracts any links from it.
[10:32] So just the homepage is fine.
[10:33] (This doesn't necessarily apply when the sitemap is under a custom URL. If that URL isn't linked on the site or in robots.txt, ArchiveBot might not find it.)
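Note: the sitemap-to-list step described at [09:08] (extract the links, sort oldest to newest) could look roughly like the sketch below. This is not the script that was actually used, which the log does not show. The function name `extract_article_urls` is hypothetical, the `/YYYY-MM-DD/` date pattern is an assumption inferred from the two article URLs quoted above, and the `example.com` entries in the sample are made up for illustration. Fetching the paginated sitemap (`sitemap.xml?page=N`) is left out; the function only takes sitemap XML that has already been downloaded.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: article URLs carry an ISO date segment such as /1932-07-01/.
DATE_RE = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def extract_article_urls(sitemap_xml: str) -> list[str]:
    """Return dated URLs from one sitemap page, sorted oldest to newest."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Compare the local tag name so the sitemap XML namespace is ignored.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    # Keep only URLs with a date segment; ISO dates sort correctly as strings.
    dated = [u for u in urls if DATE_RE.search(u)]
    return sorted(dated, key=lambda u: DATE_RE.search(u).group(1))


# Sample input: two URLs from the log plus two hypothetical example.com entries.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/articles/demo/2001-01-01/later</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://www.foreignaffairs.com/articles/japan/1932-07-01/memoirs-viscount-ishii</loc></url>
  <url><loc>https://www.foreignaffairs.com/articles/united-states/1932-07-01/great-depression</loc></url>
</urlset>"""

if __name__ == "__main__":
    # One URL per line, ready to be saved as a plain-text list.
    print("\n".join(extract_article_urls(SAMPLE)))
```

Sorting on the date string rather than parsing it keeps the sketch short. URLs without a date segment are dropped, which may or may not be what you want for a full grab.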
[10:39] *** Sk1d has quit IRC (Read error: Operation timed out)
[10:43] *** Sk1d has joined #archiveteam-bs
[10:56] *** Mateon1 has quit IRC (Read error: Operation timed out)
[10:56] *** Mateon1 has joined #archiveteam-bs
[14:17] *** wp494 has quit IRC (Ping timeout: 633 seconds)
[14:17] *** VerifiedJ has joined #archiveteam-bs
[14:18] *** wp494 has joined #archiveteam-bs
[14:41] *** VerifiedJ has quit IRC (Quit: Leaving)
[14:56] *** Smiley has quit IRC (Ping timeout: 252 seconds)
[14:59] *** Smiley has joined #archiveteam-bs
[15:18] *** dashcloud has joined #archiveteam-bs
[16:25] *** Isanami has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[17:24] *** schbirid has quit IRC (Remote host closed the connection)
[18:11] *** VerifiedJ has joined #archiveteam-bs
[18:34] *** Stilett0 has joined #archiveteam-bs
[18:38] *** Stiletto has quit IRC (Read error: Operation timed out)
[19:07] *** hdch has joined #archiveteam-bs
[19:32] *** zeronet has joined #archiveteam-bs
[20:15] *** Sk1d has quit IRC (Read error: Operation timed out)
[20:18] *** Sk1d has joined #archiveteam-bs
[20:33] *** Sk1d has quit IRC (Read error: Operation timed out)
[20:36] *** Sk1d has joined #archiveteam-bs
[20:36] *** Kitaru has joined #archiveteam-bs
[20:49] *** Sk1d has quit IRC (Read error: Operation timed out)
[20:52] *** Sk1d has joined #archiveteam-bs
[20:58] *** VerifiedJ has quit IRC (Quit: Leaving)
[21:06] *** Sk1d has quit IRC (Read error: Operation timed out)
[21:06] *** icedice has joined #archiveteam-bs
[21:11] *** Sk1d has joined #archiveteam-bs
[21:14] *** alex___ has quit IRC (Read error: Connection reset by peer)
[21:18] *** alex___ has joined #archiveteam-bs
[21:23] *** Sk1d has quit IRC (Read error: Operation timed out)
[21:25] *** Pixi has quit IRC (Read error: Operation timed out)
[21:28] *** Sk1d has joined #archiveteam-bs
[21:30] *** Pixi has joined #archiveteam-bs
[21:44] *** BlueMax has joined #archiveteam-bs
[22:16] *** Sk1d has quit IRC (Read error: Operation timed out)
[22:18] *** Sk1d has joined #archiveteam-bs
[22:26] *** hdch has quit IRC (Remote host closed the connection)
[22:26] *** hdch has joined #archiveteam-bs
[22:30] *** Sk1d has quit IRC (Read error: Operation timed out)
[22:33] *** Sk1d has joined #archiveteam-bs
[22:47] *** Sk1d has quit IRC (Read error: Operation timed out)
[22:51] *** Sk1d has joined #archiveteam-bs
[22:59] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[23:04] *** Sk1d has quit IRC (Read error: Operation timed out)
[23:07] *** Sk1d has joined #archiveteam-bs
[23:16] We should grab stuff related to George Papadopoulos; it looks like he is going to prison
[23:21] *** Sk1d has quit IRC (Read error: Operation timed out)
[23:21] *** Isanami has joined #archiveteam-bs
[23:23] *** Sk1d has joined #archiveteam-bs
[23:31] I threw his Twitter account into ArchiveBot/chromebot.
[23:38] *** VerifiedJ has joined #archiveteam-bs
[23:58] *** Kitaru has joined #archiveteam-bs