| Time |
Nickname |
Message |
|
00:23
🔗
|
|
icedice has quit IRC (Leaving) |
|
00:29
🔗
|
|
hdch has joined #archiveteam-bs |
|
00:33
🔗
|
|
hdch has quit IRC (Client Quit) |
|
00:33
🔗
|
|
headacheb has joined #archiveteam-bs |
|
00:33
🔗
|
|
headacheb has quit IRC (Remote host closed the connection) |
|
00:34
🔗
|
|
hdch has joined #archiveteam-bs |
|
01:31
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
|
01:58
🔗
|
|
Kitaru has joined #archiveteam-bs |
|
02:36
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
|
02:37
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
02:42
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
02:45
🔗
|
|
dashcloud has quit IRC (Ping timeout: 268 seconds) |
|
02:58
🔗
|
|
Kitaru has quit IRC (Read error: Connection reset by peer) |
|
02:58
🔗
|
|
Kitaru has joined #archiveteam-bs |
|
03:05
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
|
03:05
🔗
|
|
Kitaru has joined #archiveteam-bs |
|
03:41
🔗
|
|
BlueMax has joined #archiveteam-bs |
|
03:54
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
|
04:25
🔗
|
|
odemgi has joined #archiveteam-bs |
|
04:25
🔗
|
|
qw3rty113 has joined #archiveteam-bs |
|
04:28
🔗
|
|
odemgi_ has quit IRC (Read error: Operation timed out) |
|
04:28
🔗
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
|
04:32
🔗
|
|
qw3rty112 has quit IRC (Read error: Operation timed out) |
|
04:41
🔗
|
|
odemg has joined #archiveteam-bs |
|
05:20
🔗
|
|
Kitaru has joined #archiveteam-bs |
|
05:37
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
05:40
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
05:41
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
|
05:46
🔗
|
|
Kitaru has joined #archiveteam-bs |
|
06:33
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
06:37
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
07:11
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
|
07:41
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
07:46
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
07:54
🔗
|
|
alex___ has quit IRC (Quit: ZZzzz) |
|
08:10
🔗
|
|
icedice has joined #archiveteam-bs |
|
08:44
🔗
|
|
icedice has quit IRC (Read error: Operation timed out) |
|
08:50
🔗
|
jodizzle |
eientei95: When I actually go to the lists though it seems like there's not much available? |
|
08:51
🔗
|
eientei95 |
Yeah, there's not much in the lists, but there's a lot of content that's available, such as that 1930 article I chucked as !ao |
|
08:51
🔗
|
eientei95 |
* 1932 |
|
08:51
🔗
|
eientei95 |
https://www.foreignaffairs.com/articles/japan/1932-07-01/memoirs-viscount-ishii |
|
08:51
🔗
|
jodizzle |
Ahhh okay. |
|
08:52
🔗
|
jodizzle |
eientei95: Seems like there's a really robust sitemap.xml: https://www.foreignaffairs.com/sitemap.xml |
|
08:53
🔗
|
eientei95 |
http://web.archive.org/web/20151030142558/https://www.foreignaffairs.com/articles/united-states/1932-07-01/great-depression Yeah, I think we should grab these |
|
08:53
🔗
|
jodizzle |
Seems like it enumerates all the articles. |
|
08:53
🔗
|
|
jschwart has quit IRC (Konversation terminated!) |
|
08:54
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
08:57
🔗
|
jodizzle |
eientei95: Would it help if I scraped up all the articles from the sitemap into a list? |
|
08:57
🔗
|
jodizzle |
Or can you just throw archivebot directly at the sitemap? |
|
08:58
🔗
|
kiska |
Are the links available in the site? If so we can chuck the entire site into AB |
|
08:58
🔗
|
* |
Flashfire throws sitemap into archivebot |
|
08:58
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
08:58
🔗
|
jodizzle |
kiska: Well, my point is that I think I could get direct links to the articles if you want to avoid recursive crawls. |
|
08:59
🔗
|
jodizzle |
Like the stuff on this page: https://www.foreignaffairs.com/sitemap.xml?page=151 |
|
09:03
🔗
|
jodizzle |
Or to get the most crucial content faster or whatever. |
|
09:06
🔗
|
|
hdch has quit IRC (Remote host closed the connection) |
|
09:08
🔗
|
eientei95 |
jodizzle: Extracted the links from sitemap, sorted them in order from oldest to newest and chucked as !ao |
|
09:09
🔗
|
jodizzle |
eientei95: Cool. Though it looks like Flashfire threw the whole sitemap in as well. |
|
09:10
🔗
|
eientei95 |
Yeah, but that'd mean it gets queued |
|
09:10
🔗
|
eientei95 |
The ao job is already on the pipeline and is starting now |
|
09:11
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
09:13
🔗
|
jodizzle |
Is it okay to have totally redundant crawls like that? |
|
09:15
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
09:24
🔗
|
Flashfire |
Double the data double the fun |
|
09:24
🔗
|
Flashfire |
Plus the sitemap may grab other stuff |
|
09:29
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
09:32
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
09:36
🔗
|
|
TC01 has quit IRC (Read error: Operation timed out) |
|
09:39
🔗
|
|
TC01 has joined #archiveteam-bs |
|
09:51
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
|
10:07
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
10:09
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
10:10
🔗
|
|
alex___ has joined #archiveteam-bs |
|
10:14
🔗
|
|
schbirid has joined #archiveteam-bs |
|
10:19
🔗
|
eientei95 |
Flashfire: Yeah, the sitemap will grab other stuff, but the ao job will get what we need asap :P |
|
10:20
🔗
|
|
alex___ has quit IRC (Ping timeout: 260 seconds) |
|
10:21
🔗
|
|
alex___ has joined #archiveteam-bs |
|
10:22
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
10:26
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
10:31
🔗
|
JAA |
jodizzle: For urgent things, some duplication is fine. |
|
10:32
🔗
|
JAA |
Flashfire: FYI, there's no need to explicitly throw in /sitemap.xml. ArchiveBot automatically retrieves that anyway and extracts any links from it. |
|
10:32
🔗
|
JAA |
So just the homepage is fine. |
|
10:33
🔗
|
JAA |
(This doesn't necessarily apply when the sitemap is under a custom URL. If that URL isn't linked on the site or in robots.txt, ArchiveBot might not find it.) |
|
10:39
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
10:43
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
10:56
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
|
10:56
🔗
|
|
Mateon1 has joined #archiveteam-bs |
|
14:17
🔗
|
|
wp494 has quit IRC (Ping timeout: 633 seconds) |
|
14:17
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
|
14:18
🔗
|
|
wp494 has joined #archiveteam-bs |
|
14:41
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
|
14:56
🔗
|
|
Smiley has quit IRC (Ping timeout: 252 seconds) |
|
14:59
🔗
|
|
Smiley has joined #archiveteam-bs |
|
15:18
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
16:25
🔗
|
|
Isanami has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
|
17:24
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
|
18:11
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
|
18:34
🔗
|
|
Stilett0 has joined #archiveteam-bs |
|
18:38
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
|
19:07
🔗
|
|
hdch has joined #archiveteam-bs |
|
19:32
🔗
|
|
zeronet has joined #archiveteam-bs |
|
20:15
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
20:18
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
20:33
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
20:36
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
20:36
🔗
|
|
Kitaru has joined #archiveteam-bs |
|
20:49
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
20:52
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
20:58
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
|
21:06
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
21:06
🔗
|
|
icedice has joined #archiveteam-bs |
|
21:11
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
21:14
🔗
|
|
alex___ has quit IRC (Read error: Connection reset by peer) |
|
21:18
🔗
|
|
alex___ has joined #archiveteam-bs |
|
21:23
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
21:25
🔗
|
|
Pixi has quit IRC (Read error: Operation timed out) |
|
21:28
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
21:30
🔗
|
|
Pixi has joined #archiveteam-bs |
|
21:44
🔗
|
|
BlueMax has joined #archiveteam-bs |
|
22:16
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
22:18
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
22:26
🔗
|
|
hdch has quit IRC (Remote host closed the connection) |
|
22:26
🔗
|
|
hdch has joined #archiveteam-bs |
|
22:30
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
22:33
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
22:47
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
22:51
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
22:59
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
|
23:04
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
23:07
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
23:16
🔗
|
Flashfire |
We should grab stuff in relation to George Papadopoulos; it looks like he is going to prison |
|
23:21
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
|
23:21
🔗
|
|
Isanami has joined #archiveteam-bs |
|
23:23
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
23:31
🔗
|
JAA |
I threw his Twitter account into ArchiveBot/chromebot. |
|
23:38
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
|
23:58
🔗
|
|
Kitaru has joined #archiveteam-bs |
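
The sitemap step eientei95 describes at 09:08 (extract the links, sort them oldest to newest, submit as an !ao list) could look roughly like the sketch below. It is not the script that was actually used. It assumes that article URLs carry a YYYY-MM-DD path segment, as the two URLs quoted in the log do, and that sorting on that segment is what "oldest to newest" meant. The function names and the sample document are made up for illustration, and fetching the paginated sitemap is left out.

```python
import re
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: article URLs embed a YYYY-MM-DD date as a path segment,
# e.g. /articles/japan/1932-07-01/memoirs-viscount-ishii
DATE_RE = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def extract_locs(sitemap_xml):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.iterfind(".//sm:loc", SITEMAP_NS)
        if el.text
    ]


def sort_oldest_first(urls):
    """Keep only URLs with a date segment and sort them oldest to newest."""
    dated = []
    for url in urls:
        match = DATE_RE.search(url)
        if match:
            dated.append((match.group(1), url))
    # ISO dates sort correctly as plain strings; ties fall back to the URL.
    return [url for _, url in sorted(dated)]


# Small stand-in for one sitemap page. The first two URLs are quoted in the
# log; the third is hypothetical and only there to exercise the sort.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.foreignaffairs.com/articles/united-states/1932-07-01/great-depression</loc></url>
  <url><loc>https://www.foreignaffairs.com/articles/japan/1932-07-01/memoirs-viscount-ishii</loc></url>
  <url><loc>https://www.foreignaffairs.com/articles/example/1922-09-15/hypothetical-article</loc></url>
  <url><loc>https://www.foreignaffairs.com/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for article_url in sort_oldest_first(extract_locs(SAMPLE)):
        print(article_url)
```

Written one URL per line to a text file, the output is the kind of list an !ao job can be pointed at. As JAA notes at 10:32, a recursive job started from the homepage picks up /sitemap.xml on its own, so a list like this only helps when the most important pages need to be fetched first.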