Time |
Nickname |
Message |
00:07
🔗
|
|
i0npulse has quit IRC (Ping timeout: 252 seconds) |
00:14
🔗
|
|
Zerote_ has joined #archiveteam-bs |
00:14
🔗
|
|
Zerote_ has quit IRC (Read error: Connection reset by peer) |
00:15
🔗
|
|
Zerote_ has joined #archiveteam-bs |
00:15
🔗
|
|
Zerote has quit IRC (Leaving) |
00:16
🔗
|
|
britmob has joined #archiveteam-bs |
00:33
🔗
|
|
i0npulse has joined #archiveteam-bs |
00:35
🔗
|
paul2520 |
would it be possible to get archivebot to crawl https://en.wikipedia.org/wiki/Timeline_of_the_2019_Turkish_offensive_into_north-eastern_Syria? I was hoping the IA save-page-now would crawl the references, but it didn't... |
00:52
🔗
|
|
LowLevelM has joined #archiveteam-bs |
00:52
🔗
|
LowLevelM |
betamax: Thanks, there are probably enough warriors running urlteam right now |
00:53
🔗
|
betamax |
well, we could always do with more people completing CAPTCHAs to help save Yahoo Groups (shutting down in December) : https://github.com/davidferguson/yahoogroups-joiner |
00:53
🔗
|
betamax |
disclaimer: I'm helping coordinate that project |
00:55
🔗
|
LowLevelM |
Sounds cool, I will do that. |
00:55
🔗
|
LowLevelM |
Thanks |
00:55
🔗
|
betamax |
channel is #yahoosucks |
00:56
🔗
|
betamax |
and there's a leaderboard: http://tinyurl.com/ygleaders |
00:56
🔗
|
jodizzle |
paul2520: You can run that page with !ao, and that should grab all the outgoing links. |
00:57
🔗
|
jodizzle |
Also, IA save-page-now can do it now, I think, but you might have to select some tick boxes before saving the page. |
01:12
🔗
|
|
Raccoon has quit IRC (Remote host closed the connection) |
01:21
🔗
|
JAA |
jodizzle: !ao doesn't grab any links. |
01:24
🔗
|
jodizzle |
Oh. Does it just get page requisites then? |
01:25
🔗
|
JAA |
Yes |
01:26
🔗
|
JAA |
You'll want to create a list of the outlinks and run those with !ao <. |
01:38
🔗
|
markedL |
does wikiteam provide better coverage of wikipedia? |
01:40
🔗
|
jodizzle |
paul2520: I threw the citation links in as an '!ao <' job |
01:53
🔗
|
JAA |
There are regular dumps of all Wikimedia projects, which are also uploaded to IA by someone on here, I believe. And IA continuously archives all Wikipedia outlinks as they're added to articles. |
01:53
🔗
|
JAA |
Or at least that used to be the case; haven't checked it in a while. |
01:54
🔗
|
britmob |
I assume it was one of you, but that page is in the wayback machine now. |
02:07
🔗
|
|
LowLevelM has quit IRC (Ping timeout: 260 seconds) |
02:58
🔗
|
|
Raccoon has joined #archiveteam-bs |
03:25
🔗
|
|
Ivy has joined #archiveteam-bs |
03:26
🔗
|
|
manjaro-u has quit IRC (Ping timeout: 252 seconds) |
03:28
🔗
|
paul2520 |
thanks jodizzle... the save-page-now has "all outlinks" as a checkbox, but I noticed many of the URLs in the references didn't appear even after everything seemed finished |
03:29
🔗
|
paul2520 |
I appreciate you running the !ao... I was under the impression I do not have privileges to actually kick-off archivebot jobs myself. Should I try !ao for small, one-off requests like this in the future? |
03:29
🔗
|
paul2520 |
...or is there a way to get archivebot privileges? |
03:34
🔗
|
jodizzle |
paul2520: You should be able to use '!ao' and '!ao <' without special privileges |
03:35
🔗
|
jodizzle |
Using '!a' does require special privileges |
03:37
🔗
|
jodizzle |
Someone with #archivebot ops would need to decide to give those privileges to you |
04:05
🔗
|
|
bluefoo has quit IRC (Ping timeout: 252 seconds) |
04:11
🔗
|
|
bluefoo has joined #archiveteam-bs |
04:36
🔗
|
|
qw3rty has joined #archiveteam-bs |
04:41
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
04:46
🔗
|
|
qw3rty2 has quit IRC (Ping timeout: 745 seconds) |
05:19
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
05:19
🔗
|
HP_Archiv |
Heya |
05:19
🔗
|
HP_Archiv |
Anyone here? |
05:47
🔗
|
godane |
so if Twitter shows you no images in Firefox, you just have to clear your cache history |
06:31
🔗
|
|
bluefoo has quit IRC (Read error: Connection reset by peer) |
06:32
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
06:33
🔗
|
|
bluefoo has joined #archiveteam-bs |
06:43
🔗
|
|
bluefoo has quit IRC (Quit: bluefoo) |
06:49
🔗
|
|
bluefoo has joined #archiveteam-bs |
07:48
🔗
|
|
bluefoo has quit IRC (Quit: bluefoo) |
07:51
🔗
|
|
bluefoo has joined #archiveteam-bs |
08:23
🔗
|
|
bluefoo has quit IRC (Ping timeout: 252 seconds) |
08:35
🔗
|
|
Ivy has quit IRC (Quit: Connection closed for inactivity) |
09:19
🔗
|
|
bluefoo has joined #archiveteam-bs |
09:28
🔗
|
|
bluefoo has quit IRC (Ping timeout: 255 seconds) |
09:29
🔗
|
|
bluefoo has joined #archiveteam-bs |
10:00
🔗
|
|
HP_Archiv has quit IRC (Quit: Page closed) |
10:01
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
11:02
🔗
|
|
bluefoo has quit IRC (Read error: Operation timed out) |
11:20
🔗
|
|
HP_Archiv has quit IRC (Ping timeout: 260 seconds) |
11:24
🔗
|
|
bluefoo has joined #archiveteam-bs |
11:45
🔗
|
|
Smiley has quit IRC (Read error: Operation timed out) |
11:48
🔗
|
|
Smiley has joined #archiveteam-bs |
12:31
🔗
|
|
Smiley has quit IRC (Ping timeout: 496 seconds) |
12:45
🔗
|
|
Smiley has joined #archiveteam-bs |
13:33
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
15:04
🔗
|
SketchCow |
JAA: Agreed, and they're going to move. |
15:04
🔗
|
SketchCow |
I just needed to get them into the system and functioning. |
15:09
🔗
|
|
schbirid has joined #archiveteam-bs |
15:16
🔗
|
|
Stilettoo has joined #archiveteam-bs |
15:17
🔗
|
|
Ravenloft has joined #archiveteam-bs |
15:24
🔗
|
|
manjaro-u has joined #archiveteam-bs |
15:26
🔗
|
|
Stiletto has quit IRC (Ping timeout: 745 seconds) |
15:36
🔗
|
|
odemgi has quit IRC (Remote host closed the connection) |
15:38
🔗
|
SketchCow |
Looks like Archivebot backlog on FOS has diminished |
15:51
🔗
|
|
Stiletto has joined #archiveteam-bs |
15:51
🔗
|
|
Stilettoo has quit IRC (Read error: Operation timed out) |
16:15
🔗
|
SketchCow |
And is now gone. |
16:15
🔗
|
SketchCow |
Also, apparently my archivebot screenshotter service got rebooted two weeks ago. |
16:17
🔗
|
SketchCow |
Olympics items are now being moved out of archivebot, and into archiveteam_inbox, where they belong. |
16:17
🔗
|
SketchCow |
Didn't follow my own rules! |
16:17
🔗
|
SketchCow |
And the archivebot items that were added to inbox through alternate pipelines are now being automatically redirected to archivebot. |
16:18
🔗
|
JAA |
Sweet, thanks. |
16:19
🔗
|
SketchCow |
I'll be creating collections/redirections for the inbox items that need a home. |
16:20
🔗
|
SketchCow |
The screenshots in the archivebot are looking really sweet |
16:21
🔗
|
SketchCow |
I have a fundamental question as to whether the way I currently screenshot the items, given the time it takes, can keep up with the speed at which new items arrive. |
16:21
🔗
|
SketchCow |
Currently 7-10 arrive. I think that's more than this thing can do in a day. |
16:22
🔗
|
Igloo |
We could always do it at upload SketchCow |
16:23
🔗
|
SketchCow |
What, generate screenshots? It seems like a needless burden |
16:23
🔗
|
SketchCow |
I'll put it this way |
16:24
🔗
|
SketchCow |
If my thing sees something has screenshots, it skips and moves on. If someone is generating them and uploading them, then my thing won't do its work. |
16:24
🔗
|
SketchCow |
I would call it an "archivebot 3.0 nicety" |
16:25
🔗
|
SketchCow |
Like, if people are running archivebot pipelines, and then they run what I'm running to generate the screenshots (and I'm doing a pretty hacky thing) and then include them... great. |
16:25
🔗
|
SketchCow |
I just don't want to add more friction to an already 10% fragile process |
16:26
🔗
|
SketchCow |
My fun little post-processing of items comes way after everyone is already using the data. |
16:27
🔗
|
Igloo |
Makes sense. Hard for anyone to do it outside of IA's network due to needing to download the WARC |
16:28
🔗
|
Igloo |
And the delays that come with it |
16:28
🔗
|
SketchCow |
Yeah. |
16:28
🔗
|
SketchCow |
I am downloading 5 GB WARC sets to do this |
16:28
🔗
|
SketchCow |
To generate a single screenshot |
16:28
🔗
|
SketchCow |
Take that, carbon footprint |
16:32
🔗
|
SketchCow |
I wrote a script to generate a list of the top 20-30 pages of the archivebot collection and do the items in there first, so the collection already looks pretty nice for 99% of the manual browsings that will happen in it. |
16:32
🔗
|
SketchCow |
Brewster was happy, I was happy. |
16:34
🔗
|
|
manjaro-u has quit IRC (Quit: Konversation terminated!) |
16:36
🔗
|
SketchCow |
https://archive.org/details/archiveteam_2018_olympics |
16:43
🔗
|
SketchCow |
https://archive.org/details/archiveteam_24syv |
16:45
🔗
|
|
eientei95 has quit IRC (Read error: Connection reset by peer) |
16:45
🔗
|
|
eientei95 has joined #archiveteam-bs |
16:50
🔗
|
|
Ravenloft has quit IRC (Read error: Operation timed out) |
16:51
🔗
|
|
mistym- has joined #archiveteam-bs |
16:53
🔗
|
|
mistym has quit IRC (Ping timeout: 745 seconds) |
16:55
🔗
|
SketchCow |
https://archive.org/details/archiveteam_yourshot |
17:26
🔗
|
|
DogsRNice has joined #archiveteam-bs |
17:32
🔗
|
|
apache2 has quit IRC (Remote host closed the connection) |
17:32
🔗
|
|
Mateon1 has quit IRC (Write error: Broken pipe) |
17:32
🔗
|
|
Mateon1 has joined #archiveteam-bs |
17:32
🔗
|
|
apache2 has joined #archiveteam-bs |
17:36
🔗
|
SketchCow |
https://archive.org/search.php?query=mediatype%3Acollection%20description%3A%2Aforthcoming%2A |
17:36
🔗
|
SketchCow |
All of these are collections I need to add descriptions to |
18:07
🔗
|
|
systwi_ is now known as systwi |
18:30
🔗
|
|
X-Scale` has joined #archiveteam-bs |
18:32
🔗
|
|
X-Scale has quit IRC (Ping timeout: 252 seconds) |
18:32
🔗
|
|
X-Scale` is now known as X-Scale |
18:44
🔗
|
|
odemgi has joined #archiveteam-bs |
18:59
🔗
|
Raccoon |
what's the background history on "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD" |
19:00
🔗
|
Raccoon |
btw, duckduckgo gives a few archiveteam results, then launches into some russian cp comment, before arriving at shakespeare :) |
19:06
🔗
|
|
manjaro-u has joined #archiveteam-bs |
19:22
🔗
|
|
sotty has joined #archiveteam-bs |
19:56
🔗
|
|
X-Scale` has joined #archiveteam-bs |
19:57
🔗
|
|
X-Scale has quit IRC (Ping timeout: 252 seconds) |
19:57
🔗
|
|
X-Scale` is now known as X-Scale |
20:10
🔗
|
|
legoktm has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
20:11
🔗
|
|
legoktm has joined #archiveteam-bs |
20:22
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:15
🔗
|
|
icedice has quit IRC (Ping timeout: 252 seconds) |
21:21
🔗
|
|
wyatt8740 has joined #archiveteam-bs |
21:29
🔗
|
|
icedice has joined #archiveteam-bs |
22:10
🔗
|
|
Panasonic has joined #archiveteam-bs |
22:36
🔗
|
paul2520 |
thanks jodizzle |
22:47
🔗
|
paul2520 |
what did you do to get the https://transfer.notkiska.pw/oPAdn/wikipedia-Timeline_of_the_2019_Turkish_offensive_into_north-eastern_Syria-citations.txt file? |
22:51
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
22:54
🔗
|
JAA |
Gah, why are government sites always so awful? |
22:55
🔗
|
JAA |
Looking into https://www.fbo.gov/ currently. It does POST requests, stores the search parameters in a session store I assume and then does pagination with GET and cookies. :-| |
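The pattern JAA describes (a POST that stores the search in a server-side session, then cookie-backed GETs for pagination) can be sketched roughly as below. This is not qwarc and not FBO's actual interface: `make_session`, `fetch_search_pages`, and every URL or parameter passed in are hypothetical placeholders, and the standard library's `urllib` stands in for whatever client is really used.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_search_pages(opener, search_url, params, page_url_template, last_page):
    """POST the search once, then GET pages 2..last_page in the same session.

    All URLs and parameter names are placeholders supplied by the caller.
    """
    body = urllib.parse.urlencode(params).encode("ascii")
    pages = []
    # Passing data= makes urllib send a POST; any session cookie the server
    # sets ends up in the jar attached to this opener.
    with opener.open(search_url, data=body) as response:
        pages.append(response.read())
    # Later pages are plain GETs; the cookie ties them to the stored search.
    for number in range(2, last_page + 1):
        with opener.open(page_url_template.format(page=number)) as response:
            pages.append(response.read())
    return pages
```

Because the search state lives on the server, each search needs its own cookie jar; sharing one opener between two searches would likely mix up their result pages.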
22:56
🔗
|
JAA |
It's being migrated to an SPA site. Not sure what's wrose. |
22:56
🔗
|
JAA |
worse* |
22:57
🔗
|
jodizzle |
Yeah, going through government websites is like taking a trip through different web design patterns. |
22:57
🔗
|
jodizzle |
Probably a consequence of cheap contracting, in a lot of cases. |
22:57
🔗
|
jodizzle |
Some of them are pretty nice though. No JS, lightweight |
22:58
🔗
|
jodizzle |
paul2520: I did some work in a Python shell to request that Wikipedia page and fetch the citation links with CSS selectors. |
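A minimal sketch of the kind of extraction jodizzle describes, not the actual code used. To keep it free of third-party packages it swaps CSS selectors for the standard library's `html.parser`. It assumes the citations sit in an `<ol class="references">` list and that outgoing links carry an `external` class; `CitationLinkParser` and `extract_citation_links` are made-up names. Fetching the page itself is left out.

```python
from html.parser import HTMLParser


class CitationLinkParser(HTMLParser):
    """Collect external hrefs found inside <ol class="references"> lists."""

    def __init__(self):
        super().__init__()
        self.ol_depth = 0  # > 0 while inside a references list
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ol" and (self.ol_depth or "references" in classes):
            # Count nested lists too, so the matching </ol> is tracked.
            self.ol_depth += 1
        elif tag == "a" and self.ol_depth and "external" in classes:
            href = attrs.get("href")
            if href and href not in self.links:
                self.links.append(href)  # keep first-seen order, no duplicates

    def handle_endtag(self, tag):
        if tag == "ol" and self.ol_depth:
            self.ol_depth -= 1


def extract_citation_links(html):
    """Return the unique external links inside reference lists."""
    parser = CitationLinkParser()
    parser.feed(html)
    return parser.links


sample = """
<p><a class="external text" href="https://example.org/not-a-citation">x</a></p>
<ol class="references">
  <li><a class="external text" href="https://example.org/a">A</a></li>
  <li><a href="#cite_ref-1">^</a>
      <a class="external text" href="https://example.org/b">B</a></li>
</ol>
"""
print("\n".join(extract_citation_links(sample)))
```

Writing the result one URL per line gives the sort of text file that an '!ao <' job takes as input.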
22:59
🔗
|
JAA |
I'll be grabbing FBO with qwarc shortly, and it'll be a pain. |
22:59
🔗
|
JAA |
I wonder how well the pagination of 3 million results will work. |
23:08
🔗
|
paul2520 |
that sounds great jodizzle -- feel like putting it in a gist? |
23:10
🔗
|
jodizzle |
paul2520: Sure, I can try and dig up what I did later. |
23:26
🔗
|
|
BlueMax has joined #archiveteam-bs |
23:56
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
23:56
🔗
|
|
BartoCH has quit IRC (Remote host closed the connection) |
23:59
🔗
|
|
RichardG has joined #archiveteam-bs |