#archiveteam-bs 2018-11-25,Sun


Time Nickname Message
00:23 🔗 icedice has quit IRC (Leaving)
00:29 🔗 hdch has joined #archiveteam-bs
00:33 🔗 hdch has quit IRC (Client Quit)
00:33 🔗 headacheb has joined #archiveteam-bs
00:33 🔗 headacheb has quit IRC (Remote host closed the connection)
00:34 🔗 hdch has joined #archiveteam-bs
01:31 🔗 VerifiedJ has quit IRC (Quit: Leaving)
01:58 🔗 Kitaru has joined #archiveteam-bs
02:36 🔗 BlueMax has quit IRC (Quit: Leaving)
02:37 🔗 Sk1d has quit IRC (Read error: Operation timed out)
02:42 🔗 Sk1d has joined #archiveteam-bs
02:45 🔗 dashcloud has quit IRC (Ping timeout: 268 seconds)
02:58 🔗 Kitaru has quit IRC (Read error: Connection reset by peer)
02:58 🔗 Kitaru has joined #archiveteam-bs
03:05 🔗 Kitaru has quit IRC (Quit: This computer has gone to sleep)
03:05 🔗 Kitaru has joined #archiveteam-bs
03:41 🔗 BlueMax has joined #archiveteam-bs
03:54 🔗 Kitaru has quit IRC (Quit: This computer has gone to sleep)
04:25 🔗 odemgi has joined #archiveteam-bs
04:25 🔗 qw3rty113 has joined #archiveteam-bs
04:28 🔗 odemgi_ has quit IRC (Read error: Operation timed out)
04:28 🔗 odemg has quit IRC (Ping timeout: 265 seconds)
04:32 🔗 qw3rty112 has quit IRC (Read error: Operation timed out)
04:41 🔗 odemg has joined #archiveteam-bs
05:20 🔗 Kitaru has joined #archiveteam-bs
05:37 🔗 Sk1d has quit IRC (Read error: Operation timed out)
05:40 🔗 Sk1d has joined #archiveteam-bs
05:41 🔗 Kitaru has quit IRC (Quit: This computer has gone to sleep)
05:46 🔗 Kitaru has joined #archiveteam-bs
06:33 🔗 Sk1d has quit IRC (Read error: Operation timed out)
06:37 🔗 Sk1d has joined #archiveteam-bs
07:11 🔗 Kitaru has quit IRC (Quit: This computer has gone to sleep)
07:41 🔗 Sk1d has quit IRC (Read error: Operation timed out)
07:46 🔗 Sk1d has joined #archiveteam-bs
07:54 🔗 alex___ has quit IRC (Quit: ZZzzz)
08:10 🔗 icedice has joined #archiveteam-bs
08:44 🔗 icedice has quit IRC (Read error: Operation timed out)
08:50 🔗 jodizzle eientei95: When I actually go to the lists though it seems like there's not much available?
08:51 🔗 eientei95 Yeah, there's not much on the lists, but there's a lot of content that's available, such as that 1930 article I chucked as !ao
08:51 🔗 eientei95 * 1932
08:51 🔗 eientei95 https://www.foreignaffairs.com/articles/japan/1932-07-01/memoirs-viscount-ishii
08:51 🔗 jodizzle Ahhh okay.
08:52 🔗 jodizzle eientei95: Seems like there's a really robust sitemap.xml: https://www.foreignaffairs.com/sitemap.xml
08:53 🔗 eientei95 http://web.archive.org/web/20151030142558/https://www.foreignaffairs.com/articles/united-states/1932-07-01/great-depression Yeah, I think we should grab these
08:53 🔗 jodizzle Seems like it enumerates all the articles.
08:53 🔗 jschwart has quit IRC (Konversation terminated!)
08:54 🔗 Sk1d has quit IRC (Read error: Operation timed out)
08:57 🔗 jodizzle eientei95: Would it help if I scraped up all the articles from the sitemap into a list?
08:57 🔗 jodizzle Or can you just throw archivebot directly at the sitemap?
08:58 🔗 kiska Are the links available in the site? If so we can chuck the entire site into AB
08:58 🔗 * Flashfire throws sitemap into archivebot
08:58 🔗 Sk1d has joined #archiveteam-bs
08:58 🔗 jodizzle kiska: Well, my point is that I think I could get direct links to the articles if you want to avoid recursive crawls.
08:59 🔗 jodizzle Like the stuff on this page: https://www.foreignaffairs.com/sitemap.xml?page=151
09:03 🔗 jodizzle Or to get the most crucial content faster or whatever.
09:06 🔗 hdch has quit IRC (Remote host closed the connection)
09:08 🔗 eientei95 jodizzle: Extracted the links from sitemap, sorted them in order from oldest to newest and chucked as !ao
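The extraction step eientei95 describes (pull the links out of the sitemap, then order them oldest to newest) could look roughly like the sketch below. This is not the script that was actually used. It assumes the standard sitemaps.org namespace and assumes article URLs carry a /YYYY-MM-DD/ path segment, as the foreignaffairs.com article URLs quoted above do. The example.com URLs in the usage example are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Standard namespace from the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: article URLs contain a date segment such as /1932-07-01/.
DATE_RE = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def extract_locs(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL in a sitemap document, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def sort_oldest_first(urls: List[str]) -> List[str]:
    """Sort dated URLs oldest first; URLs without a date go last."""
    def key(url: str):
        m = DATE_RE.search(url)
        # ISO dates compare correctly as plain strings; the URL breaks ties.
        return (m is None, m.group(1) if m else "", url)
    return sorted(urls, key=key)


# Usage with a small hypothetical sitemap.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/articles/b/1950-01-01/later</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/articles/a/1932-07-01/earlier</loc></url>
</urlset>"""

for url in sort_oldest_first(extract_locs(sample)):
    print(url)
```

The resulting list, one URL per line, is the kind of file that would then be handed to ArchiveBot as a list job; a paginated sitemap (like sitemap.xml?page=151) would need each page fetched and parsed the same way.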
09:09 🔗 jodizzle eientei95: Cool. Though it looks like Flashfire threw the whole sitemap in as well.
09:10 🔗 eientei95 Yeah, but that'd mean it gets queued
09:10 🔗 eientei95 The ao job is already on the pipeline and is starting now
09:11 🔗 Sk1d has quit IRC (Read error: Operation timed out)
09:13 🔗 jodizzle Is it okay to have totally redundant crawls like that?
09:15 🔗 Sk1d has joined #archiveteam-bs
09:24 🔗 Flashfire Double the data double the fun
09:24 🔗 Flashfire Plus the sitemap may grab other stuff
09:29 🔗 Sk1d has quit IRC (Read error: Operation timed out)
09:32 🔗 Sk1d has joined #archiveteam-bs
09:36 🔗 TC01 has quit IRC (Read error: Operation timed out)
09:39 🔗 TC01 has joined #archiveteam-bs
09:51 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
10:07 🔗 Sk1d has quit IRC (Read error: Operation timed out)
10:09 🔗 Sk1d has joined #archiveteam-bs
10:10 🔗 alex___ has joined #archiveteam-bs
10:14 🔗 schbirid has joined #archiveteam-bs
10:19 🔗 eientei95 Flashfire: Yeah, the sitemap will grab other stuff, but the ao job will get what we need asap :P
10:20 🔗 alex___ has quit IRC (Ping timeout: 260 seconds)
10:21 🔗 alex___ has joined #archiveteam-bs
10:22 🔗 Sk1d has quit IRC (Read error: Operation timed out)
10:26 🔗 Sk1d has joined #archiveteam-bs
10:31 🔗 JAA jodizzle: For urgent things, some duplication is fine.
10:32 🔗 JAA Flashfire: FYI, there's no need to explicitly throw in /sitemap.xml. ArchiveBot automatically retrieves that anyway and extracts any links from it.
10:32 🔗 JAA So just the homepage is fine.
10:33 🔗 JAA (This doesn't necessarily apply when the sitemap is under a custom URL. If that URL isn't linked on the site or in robots.txt, ArchiveBot might not find it.)
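JAA's point is that a crawler can find a sitemap in two ways: the well-known /sitemap.xml location, or a Sitemap: line in robots.txt. A sitemap at a custom URL that appears in neither place (and is not linked from the site) can be missed. The sketch below only illustrates that discovery logic; it is not ArchiveBot's code, and the function name and example.com URLs are hypothetical.

```python
from typing import List
from urllib.parse import urljoin


def candidate_sitemaps(site_root: str, robots_txt: str) -> List[str]:
    """List sitemap URLs a crawler could discover without following site links."""
    # The well-known default location.
    candidates = [urljoin(site_root, "/sitemap.xml")]
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        field, sep, value = line.partition(":")
        # Field names in robots.txt are matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in candidates:
                candidates.append(url)
    return candidates


robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/custom/map.xml\n"
print(candidate_sitemaps("https://example.com/", robots))
```

A custom sitemap that is declared in robots.txt shows up in the list; one that is only reachable by guessing its URL does not, which is the case where it is worth submitting the sitemap URL explicitly.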
10:39 🔗 Sk1d has quit IRC (Read error: Operation timed out)
10:43 🔗 Sk1d has joined #archiveteam-bs
10:56 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
10:56 🔗 Mateon1 has joined #archiveteam-bs
14:17 🔗 wp494 has quit IRC (Ping timeout: 633 seconds)
14:17 🔗 VerifiedJ has joined #archiveteam-bs
14:18 🔗 wp494 has joined #archiveteam-bs
14:41 🔗 VerifiedJ has quit IRC (Quit: Leaving)
14:56 🔗 Smiley has quit IRC (Ping timeout: 252 seconds)
14:59 🔗 Smiley has joined #archiveteam-bs
15:18 🔗 dashcloud has joined #archiveteam-bs
16:25 🔗 Isanami has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
17:24 🔗 schbirid has quit IRC (Remote host closed the connection)
18:11 🔗 VerifiedJ has joined #archiveteam-bs
18:34 🔗 Stilett0 has joined #archiveteam-bs
18:38 🔗 Stiletto has quit IRC (Read error: Operation timed out)
19:07 🔗 hdch has joined #archiveteam-bs
19:32 🔗 zeronet has joined #archiveteam-bs
20:15 🔗 Sk1d has quit IRC (Read error: Operation timed out)
20:18 🔗 Sk1d has joined #archiveteam-bs
20:33 🔗 Sk1d has quit IRC (Read error: Operation timed out)
20:36 🔗 Sk1d has joined #archiveteam-bs
20:36 🔗 Kitaru has joined #archiveteam-bs
20:49 🔗 Sk1d has quit IRC (Read error: Operation timed out)
20:52 🔗 Sk1d has joined #archiveteam-bs
20:58 🔗 VerifiedJ has quit IRC (Quit: Leaving)
21:06 🔗 Sk1d has quit IRC (Read error: Operation timed out)
21:06 🔗 icedice has joined #archiveteam-bs
21:11 🔗 Sk1d has joined #archiveteam-bs
21:14 🔗 alex___ has quit IRC (Read error: Connection reset by peer)
21:18 🔗 alex___ has joined #archiveteam-bs
21:23 🔗 Sk1d has quit IRC (Read error: Operation timed out)
21:25 🔗 Pixi has quit IRC (Read error: Operation timed out)
21:28 🔗 Sk1d has joined #archiveteam-bs
21:30 🔗 Pixi has joined #archiveteam-bs
21:44 🔗 BlueMax has joined #archiveteam-bs
22:16 🔗 Sk1d has quit IRC (Read error: Operation timed out)
22:18 🔗 Sk1d has joined #archiveteam-bs
22:26 🔗 hdch has quit IRC (Remote host closed the connection)
22:26 🔗 hdch has joined #archiveteam-bs
22:30 🔗 Sk1d has quit IRC (Read error: Operation timed out)
22:33 🔗 Sk1d has joined #archiveteam-bs
22:47 🔗 Sk1d has quit IRC (Read error: Operation timed out)
22:51 🔗 Sk1d has joined #archiveteam-bs
22:59 🔗 Kitaru has quit IRC (Quit: This computer has gone to sleep)
23:04 🔗 Sk1d has quit IRC (Read error: Operation timed out)
23:07 🔗 Sk1d has joined #archiveteam-bs
23:16 🔗 Flashfire We should grab stuff related to George Papadopoulos; it looks like he is going to prison
23:21 🔗 Sk1d has quit IRC (Read error: Operation timed out)
23:21 🔗 Isanami has joined #archiveteam-bs
23:23 🔗 Sk1d has joined #archiveteam-bs
23:31 🔗 JAA I threw his Twitter account into ArchiveBot/chromebot.
23:38 🔗 VerifiedJ has joined #archiveteam-bs
23:58 🔗 Kitaru has joined #archiveteam-bs
