Time |
Nickname |
Message |
00:23
🔗
|
|
icedice has quit IRC (Leaving) |
00:29
🔗
|
|
hdch has joined #archiveteam-bs |
00:33
🔗
|
|
hdch has quit IRC (Client Quit) |
00:33
🔗
|
|
headacheb has joined #archiveteam-bs |
00:33
🔗
|
|
headacheb has quit IRC (Remote host closed the connection) |
00:34
🔗
|
|
hdch has joined #archiveteam-bs |
01:31
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
01:58
🔗
|
|
Kitaru has joined #archiveteam-bs |
02:36
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
02:37
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
02:42
🔗
|
|
Sk1d has joined #archiveteam-bs |
02:45
🔗
|
|
dashcloud has quit IRC (Ping timeout: 268 seconds) |
02:58
🔗
|
|
Kitaru has quit IRC (Read error: Connection reset by peer) |
02:58
🔗
|
|
Kitaru has joined #archiveteam-bs |
03:05
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
03:05
🔗
|
|
Kitaru has joined #archiveteam-bs |
03:41
🔗
|
|
BlueMax has joined #archiveteam-bs |
03:54
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
04:25
🔗
|
|
odemgi has joined #archiveteam-bs |
04:25
🔗
|
|
qw3rty113 has joined #archiveteam-bs |
04:28
🔗
|
|
odemgi_ has quit IRC (Read error: Operation timed out) |
04:28
🔗
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
04:32
🔗
|
|
qw3rty112 has quit IRC (Read error: Operation timed out) |
04:41
🔗
|
|
odemg has joined #archiveteam-bs |
05:20
🔗
|
|
Kitaru has joined #archiveteam-bs |
05:37
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
05:40
🔗
|
|
Sk1d has joined #archiveteam-bs |
05:41
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
05:46
🔗
|
|
Kitaru has joined #archiveteam-bs |
06:33
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
06:37
🔗
|
|
Sk1d has joined #archiveteam-bs |
07:11
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
07:41
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
07:46
🔗
|
|
Sk1d has joined #archiveteam-bs |
07:54
🔗
|
|
alex___ has quit IRC (Quit: ZZzzz) |
08:10
🔗
|
|
icedice has joined #archiveteam-bs |
08:44
🔗
|
|
icedice has quit IRC (Read error: Operation timed out) |
08:50
🔗
|
jodizzle |
eientei95: When I actually go to the lists though it seems like there's not much available? |
08:51
🔗
|
eientei95 |
Yeah, there's not much in the lists, but there's a lot of content that's available, such as that 1930 article I chucked as !ao |
08:51
🔗
|
eientei95 |
* 1932 |
08:51
🔗
|
eientei95 |
https://www.foreignaffairs.com/articles/japan/1932-07-01/memoirs-viscount-ishii |
08:51
🔗
|
jodizzle |
Ahhh okay. |
08:52
🔗
|
jodizzle |
eientei95: Seems like there's a really robust sitemap.xml: https://www.foreignaffairs.com/sitemap.xml |
08:53
🔗
|
eientei95 |
http://web.archive.org/web/20151030142558/https://www.foreignaffairs.com/articles/united-states/1932-07-01/great-depression Yeah, I think we should grab these |
08:53
🔗
|
jodizzle |
Seems like it enumerates all the articles. |
08:53
🔗
|
|
jschwart has quit IRC (Konversation terminated!) |
08:54
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
08:57
🔗
|
jodizzle |
eientei95: Would it help if I scraped up all the articles from the sitemap into a list? |
08:57
🔗
|
jodizzle |
Or can you just throw archivebot directly at the sitemap? |
08:58
🔗
|
kiska |
Are the links available on the site? If so we can chuck the entire site into AB |
08:58
🔗
|
* |
Flashfire throws sitemap into archivebot |
08:58
🔗
|
|
Sk1d has joined #archiveteam-bs |
08:58
🔗
|
jodizzle |
kiska: Well, my point is that I think I could get direct links to the articles if you want to avoid recursive crawls. |
08:59
🔗
|
jodizzle |
Like the stuff on this page: https://www.foreignaffairs.com/sitemap.xml?page=151 |
09:03
🔗
|
jodizzle |
Or to get the most crucial content faster or whatever. |
09:06
🔗
|
|
hdch has quit IRC (Remote host closed the connection) |
09:08
🔗
|
eientei95 |
jodizzle: Extracted the links from sitemap, sorted them in order from oldest to newest and chucked as !ao |
09:09
🔗
|
jodizzle |
eientei95: Cool. Though it looks like Flashfire threw the whole sitemap in as well. |
09:10
🔗
|
eientei95 |
Yeah, but that'd mean it gets queued |
09:10
🔗
|
eientei95 |
The ao job is already on the pipeline and is starting now |
09:11
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
09:13
🔗
|
jodizzle |
Is it okay to have totally redundant crawls like that? |
09:15
🔗
|
|
Sk1d has joined #archiveteam-bs |
09:24
🔗
|
Flashfire |
Double the data double the fun |
09:24
🔗
|
Flashfire |
Plus the sitemap may grab other stuff |
09:29
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
09:32
🔗
|
|
Sk1d has joined #archiveteam-bs |
09:36
🔗
|
|
TC01 has quit IRC (Read error: Operation timed out) |
09:39
🔗
|
|
TC01 has joined #archiveteam-bs |
09:51
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
10:07
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
10:09
🔗
|
|
Sk1d has joined #archiveteam-bs |
10:10
🔗
|
|
alex___ has joined #archiveteam-bs |
10:14
🔗
|
|
schbirid has joined #archiveteam-bs |
10:19
🔗
|
eientei95 |
Flashfire: Yeah, the sitemap will grab other stuff, but the ao job will get what we need asap :P |
10:20
🔗
|
|
alex___ has quit IRC (Ping timeout: 260 seconds) |
10:21
🔗
|
|
alex___ has joined #archiveteam-bs |
10:22
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
10:26
🔗
|
|
Sk1d has joined #archiveteam-bs |
10:31
🔗
|
JAA |
jodizzle: For urgent things, some duplication is fine. |
10:32
🔗
|
JAA |
Flashfire: FYI, there's no need to explicitly throw in /sitemap.xml. ArchiveBot automatically retrieves that anyway and extracts any links from it. |
10:32
🔗
|
JAA |
So just the homepage is fine. |
10:33
🔗
|
JAA |
(This doesn't necessarily apply when the sitemap is under a custom URL. If that URL isn't linked on the site or in robots.txt, ArchiveBot might not find it.) |
10:39
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
10:43
🔗
|
|
Sk1d has joined #archiveteam-bs |
10:56
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
10:56
🔗
|
|
Mateon1 has joined #archiveteam-bs |
14:17
🔗
|
|
wp494 has quit IRC (Ping timeout: 633 seconds) |
14:17
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
14:18
🔗
|
|
wp494 has joined #archiveteam-bs |
14:41
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
14:56
🔗
|
|
Smiley has quit IRC (Ping timeout: 252 seconds) |
14:59
🔗
|
|
Smiley has joined #archiveteam-bs |
15:18
🔗
|
|
dashcloud has joined #archiveteam-bs |
16:25
🔗
|
|
Isanami has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
17:24
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
18:11
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
18:34
🔗
|
|
Stilett0 has joined #archiveteam-bs |
18:38
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
19:07
🔗
|
|
hdch has joined #archiveteam-bs |
19:32
🔗
|
|
zeronet has joined #archiveteam-bs |
20:15
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
20:18
🔗
|
|
Sk1d has joined #archiveteam-bs |
20:33
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
20:36
🔗
|
|
Sk1d has joined #archiveteam-bs |
20:36
🔗
|
|
Kitaru has joined #archiveteam-bs |
20:49
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
20:52
🔗
|
|
Sk1d has joined #archiveteam-bs |
20:58
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
21:06
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
21:06
🔗
|
|
icedice has joined #archiveteam-bs |
21:11
🔗
|
|
Sk1d has joined #archiveteam-bs |
21:14
🔗
|
|
alex___ has quit IRC (Read error: Connection reset by peer) |
21:18
🔗
|
|
alex___ has joined #archiveteam-bs |
21:23
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
21:25
🔗
|
|
Pixi has quit IRC (Read error: Operation timed out) |
21:28
🔗
|
|
Sk1d has joined #archiveteam-bs |
21:30
🔗
|
|
Pixi has joined #archiveteam-bs |
21:44
🔗
|
|
BlueMax has joined #archiveteam-bs |
22:16
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
22:18
🔗
|
|
Sk1d has joined #archiveteam-bs |
22:26
🔗
|
|
hdch has quit IRC (Remote host closed the connection) |
22:26
🔗
|
|
hdch has joined #archiveteam-bs |
22:30
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
22:33
🔗
|
|
Sk1d has joined #archiveteam-bs |
22:47
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
22:51
🔗
|
|
Sk1d has joined #archiveteam-bs |
22:59
🔗
|
|
Kitaru has quit IRC (Quit: This computer has gone to sleep) |
23:04
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
23:07
🔗
|
|
Sk1d has joined #archiveteam-bs |
23:16
🔗
|
Flashfire |
We should grab stuff related to George Papadopoulos; it looks like he is going to prison |
23:21
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
23:21
🔗
|
|
Isanami has joined #archiveteam-bs |
23:23
🔗
|
|
Sk1d has joined #archiveteam-bs |
23:31
🔗
|
JAA |
I threw his Twitter account into ArchiveBot/chromebot. |
23:38
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
23:58
🔗
|
|
Kitaru has joined #archiveteam-bs |
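
The sitemap step discussed between 08:52 and 09:08 (pull the article links out of the sitemap, sort them oldest to newest, feed them in as a list) could look roughly like the sketch below. This is not the script anyone in the channel actually ran. It assumes the sitemap follows the standard sitemaps.org XML schema and that article URLs carry a `/YYYY-MM-DD/` path segment like the ones quoted in the log; the function names `extract_urls` and `sort_oldest_first` are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be what the site uses).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Matches a /YYYY-MM-DD/ path segment, as seen in the article URLs quoted in the log.
DATE_SEGMENT = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def sort_oldest_first(urls: list[str]) -> list[str]:
    """Sort URLs by the date in their path; URLs without a date go last."""

    def key(url: str):
        match = DATE_SEGMENT.search(url)
        # ISO-style dates compare correctly as plain strings.
        return (match is None, match.group(1) if match else "", url)

    return sorted(urls, key=key)
```

Fetching each `sitemap.xml?page=N` page and writing the sorted result one URL per line is left out; the sketch only covers the extract-and-sort part.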