#archiveteam-bs 2019-11-10,Sun


Time Nickname Message
00:07 🔗 i0npulse has quit IRC (Ping timeout: 252 seconds)
00:14 🔗 Zerote_ has joined #archiveteam-bs
00:14 🔗 Zerote_ has quit IRC (Read error: Connection reset by peer)
00:15 🔗 Zerote_ has joined #archiveteam-bs
00:15 🔗 Zerote has quit IRC (Leaving)
00:16 🔗 britmob has joined #archiveteam-bs
00:33 🔗 i0npulse has joined #archiveteam-bs
00:35 🔗 paul2520 would it be possible to get archivebot to crawl https://en.wikipedia.org/wiki/Timeline_of_the_2019_Turkish_offensive_into_north-eastern_Syria? I was hoping the IA save-page-now would crawl the references, but it didn't...
00:52 🔗 LowLevelM has joined #archiveteam-bs
00:52 🔗 LowLevelM betamax: Thanks, there are probably enough warriors running urlteam right now
00:53 🔗 betamax well, we could always do with more people completing CAPTCHAs to help save Yahoo Groups (shutting down in December) : https://github.com/davidferguson/yahoogroups-joiner
00:53 🔗 betamax disclaimer: I'm helping coordinate that project
00:55 🔗 LowLevelM Sounds cool, I will do that.
00:55 🔗 LowLevelM Thanks
00:55 🔗 betamax channel is #yahoosucks
00:56 🔗 betamax and there's a leaderboard: http://tinyurl.com/ygleaders
00:56 🔗 jodizzle paul2520: You can run that page with !ao, and that should grab all the outgoing links.
00:57 🔗 jodizzle Also, IA save-page-now can do it now, I think, but you might have to select some tick boxes before saving the page.
01:12 🔗 Raccoon has quit IRC (Remote host closed the connection)
01:21 🔗 JAA jodizzle: !ao doesn't grab any links.
01:24 🔗 jodizzle Oh. Does it just get page requisites then?
01:25 🔗 JAA Yes
01:26 🔗 JAA You'll want to create a list of the outlinks and run those with !ao <.
01:38 🔗 markedL does wikiteam provide better coverage of wikipedia?
01:40 🔗 jodizzle paul2520: I threw the citation links in as an '!ao <' job
01:53 🔗 JAA There are regular dumps of all Wikimedia projects which are also uploaded to IA by someone on here I believe. And all Wikipedia outlinks get archived continuously as they're added to articles by IA.
01:53 🔗 JAA Or at least that used to be the case; haven't checked it in a while.
01:54 🔗 britmob I assume it was one of you, but that page is in the wayback machine now.
02:07 🔗 LowLevelM has quit IRC (Ping timeout: 260 seconds)
02:58 🔗 Raccoon has joined #archiveteam-bs
03:25 🔗 Ivy has joined #archiveteam-bs
03:26 🔗 manjaro-u has quit IRC (Ping timeout: 252 seconds)
03:28 🔗 paul2520 thanks jodizzle... the save-page-now has "all outlinks" as a checkbox, but I noticed many of the URLs in the references didn't appear even after everything seemed finished
03:29 🔗 paul2520 I appreciate you running the !ao... I was under the impression I do not have privileges to actually kick-off archivebot jobs myself. Should I try !ao for small, one-off requests like this in the future?
03:29 🔗 paul2520 ...or is there a way to get archivebot privileges?
03:34 🔗 jodizzle paul2520: You should be able to use '!ao' and '!ao <' without special privileges
03:35 🔗 jodizzle Using '!a' does require special privileges
03:37 🔗 jodizzle Someone with #archivebot ops would need to decide to give those privileges to you
04:05 🔗 bluefoo has quit IRC (Ping timeout: 252 seconds)
04:11 🔗 bluefoo has joined #archiveteam-bs
04:36 🔗 qw3rty has joined #archiveteam-bs
04:41 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
04:46 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
05:19 🔗 HP_Archiv has joined #archiveteam-bs
05:19 🔗 HP_Archiv Heya
05:19 🔗 HP_Archiv Anyone here?
05:47 🔗 godane so if twitter shows you no images in firefox, you just have to clear your cache history
06:31 🔗 bluefoo has quit IRC (Read error: Connection reset by peer)
06:32 🔗 killsushi has quit IRC (Quit: Leaving)
06:33 🔗 bluefoo has joined #archiveteam-bs
06:43 🔗 bluefoo has quit IRC (Quit: bluefoo)
06:49 🔗 bluefoo has joined #archiveteam-bs
07:48 🔗 bluefoo has quit IRC (Quit: bluefoo)
07:51 🔗 bluefoo has joined #archiveteam-bs
08:23 🔗 bluefoo has quit IRC (Ping timeout: 252 seconds)
08:35 🔗 Ivy has quit IRC (Quit: Connection closed for inactivity)
09:19 🔗 bluefoo has joined #archiveteam-bs
09:28 🔗 bluefoo has quit IRC (Ping timeout: 255 seconds)
09:29 🔗 bluefoo has joined #archiveteam-bs
10:00 🔗 HP_Archiv has quit IRC (Quit: Page closed)
10:01 🔗 HP_Archiv has joined #archiveteam-bs
11:02 🔗 bluefoo has quit IRC (Read error: Operation timed out)
11:20 🔗 HP_Archiv has quit IRC (Ping timeout: 260 seconds)
11:24 🔗 bluefoo has joined #archiveteam-bs
11:45 🔗 Smiley has quit IRC (Read error: Operation timed out)
11:48 🔗 Smiley has joined #archiveteam-bs
12:31 🔗 Smiley has quit IRC (Ping timeout: 496 seconds)
12:45 🔗 Smiley has joined #archiveteam-bs
13:33 🔗 BlueMax has quit IRC (Quit: Leaving)
15:04 🔗 SketchCow JAA: Agreed, and they're going to move.
15:04 🔗 SketchCow I just needed to get them into the system and functioning.
15:09 🔗 schbirid has joined #archiveteam-bs
15:16 🔗 Stilettoo has joined #archiveteam-bs
15:17 🔗 Ravenloft has joined #archiveteam-bs
15:24 🔗 manjaro-u has joined #archiveteam-bs
15:26 🔗 Stiletto has quit IRC (Ping timeout: 745 seconds)
15:36 🔗 odemgi has quit IRC (Remote host closed the connection)
15:38 🔗 SketchCow Looks like Archivebot backlog on FOS has diminished
15:51 🔗 Stiletto has joined #archiveteam-bs
15:51 🔗 Stilettoo has quit IRC (Read error: Operation timed out)
16:15 🔗 SketchCow And is now gone.
16:15 🔗 SketchCow Also, apparently my archivebot screenshotter service got rebooted two weeks ago.
16:17 🔗 SketchCow Olympics items are now being moved out of archivebot, and into archiveteam_inbox, where they belong.
16:17 🔗 SketchCow Didn't follow my own rules!
16:17 🔗 SketchCow And, the added archivebot items through alternate pipelines in inbox are now being automatically redirected to archivebot.
16:18 🔗 JAA Sweet, thanks.
16:19 🔗 SketchCow I'll be creating collections/redirections for the inbox items that need a home.
16:20 🔗 SketchCow The screenshots in the archivebot are looking really sweet
16:21 🔗 SketchCow I have a fundamental question as to whether the way I currently screenshot the items, given the time it takes, can keep up with the speed at which new items arrive.
16:21 🔗 SketchCow Currently 7-10 arrive. I think that's more than this thing can do in a day.
16:22 🔗 Igloo We could always do it at upload SketchCow
16:23 🔗 SketchCow What, generate screenshots? It seems like a needless burden
16:23 🔗 SketchCow I'll put it this way
16:24 🔗 SketchCow If my thing sees something has screenshots, it skips and moves on. If someone is generating them and uploading them, then my thing won't do its work.
16:24 🔗 SketchCow I would call it an "archivebot 3.0 nicety"
16:25 🔗 SketchCow Like, if people are running archivebot pipelines, and then they run what I'm running to generate the screenshots (and I'm doing a pretty hacky thing) then include them... great.
16:25 🔗 SketchCow I just don't want to add more friction to an already 10% fragile process
16:26 🔗 SketchCow My fun little post-processing of items comes way after everyone is already using the data.
16:27 🔗 Igloo Makes sense, Hard for anyone to do it outside of IA's network due to needing to download the WARC
16:28 🔗 Igloo And the delays that come with it
16:28 🔗 SketchCow Yeah.
16:28 🔗 SketchCow I am downloading 5gb WARC sets to do this
16:28 🔗 SketchCow To generate a single screenshot
16:28 🔗 SketchCow Take that, carbon footprint
16:32 🔗 SketchCow I wrote a script to generate a list of the top 20-30 pages of the archivebot collection and do the items in there first, so the collection already looks pretty nice for 99% of the manual browsings that will happen in it.
16:32 🔗 SketchCow Brewster was happy, I was happy.
16:34 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
16:36 🔗 SketchCow https://archive.org/details/archiveteam_2018_olympics
16:43 🔗 SketchCow https://archive.org/details/archiveteam_24syv
16:45 🔗 eientei95 has quit IRC (Read error: Connection reset by peer)
16:45 🔗 eientei95 has joined #archiveteam-bs
16:50 🔗 Ravenloft has quit IRC (Read error: Operation timed out)
16:51 🔗 mistym- has joined #archiveteam-bs
16:53 🔗 mistym has quit IRC (Ping timeout: 745 seconds)
16:55 🔗 SketchCow https://archive.org/details/archiveteam_yourshot
17:26 🔗 DogsRNice has joined #archiveteam-bs
17:32 🔗 apache2 has quit IRC (Remote host closed the connection)
17:32 🔗 Mateon1 has quit IRC (Write error: Broken pipe)
17:32 🔗 Mateon1 has joined #archiveteam-bs
17:32 🔗 apache2 has joined #archiveteam-bs
17:36 🔗 SketchCow https://archive.org/search.php?query=mediatype%3Acollection%20description%3A%2Aforthcoming%2A
17:36 🔗 SketchCow All of these are collections I need to add descriptions to
18:07 🔗 systwi_ is now known as systwi
18:30 🔗 X-Scale` has joined #archiveteam-bs
18:32 🔗 X-Scale has quit IRC (Ping timeout: 252 seconds)
18:32 🔗 X-Scale` is now known as X-Scale
18:44 🔗 odemgi has joined #archiveteam-bs
18:59 🔗 Raccoon what's the background history on "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD"
19:00 🔗 Raccoon btw, duckduckgo gives a few archiveteam results, then launches into some russian cp comment, before arriving at shakespeare :)
19:06 🔗 manjaro-u has joined #archiveteam-bs
19:22 🔗 sotty has joined #archiveteam-bs
19:56 🔗 X-Scale` has joined #archiveteam-bs
19:57 🔗 X-Scale has quit IRC (Ping timeout: 252 seconds)
19:57 🔗 X-Scale` is now known as X-Scale
20:10 🔗 legoktm has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
20:11 🔗 legoktm has joined #archiveteam-bs
20:22 🔗 schbirid has quit IRC (Quit: Leaving)
21:15 🔗 icedice has quit IRC (Ping timeout: 252 seconds)
21:21 🔗 wyatt8740 has joined #archiveteam-bs
21:29 🔗 icedice has joined #archiveteam-bs
22:10 🔗 Panasonic has joined #archiveteam-bs
22:36 🔗 paul2520 thanks jodizzle
22:47 🔗 paul2520 what did you do to get the https://transfer.notkiska.pw/oPAdn/wikipedia-Timeline_of_the_2019_Turkish_offensive_into_north-eastern_Syria-citations.txt file?
22:51 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
22:54 🔗 JAA Gah, why are government sites always so awful?
22:55 🔗 JAA Looking into https://www.fbo.gov/ currently. It does POST requests, stores the search parameters in a session store I assume and then does pagination with GET and cookies. :-|
22:56 🔗 JAA It's being migrated to an SPA site. Not sure what's wrose.
22:56 🔗 JAA worse*
22:57 🔗 jodizzle Yeah, going through government websites is like taking a trip through different web design patterns.
22:57 🔗 jodizzle Probably a consequence of cheap contracting, in a lot of cases.
22:57 🔗 jodizzle Some of them are pretty nice though. No JS, lightweight
22:58 🔗 jodizzle paul2520: I did some work in a Python shell to request that Wikipedia page and fetch the citation links with CSS selectors.
22:59 🔗 JAA I'll be grabbing FBO with qwarc shortly, and it'll be a pain.
22:59 🔗 JAA I wonder how well the pagination of 3 million results will work.
23:08 🔗 paul2520 that sounds great jodizzle -- feel like putting it in a gist?
23:10 🔗 jodizzle paul2520: Sure, I can try and dig up what I did later.
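
A minimal sketch of the kind of approach jodizzle describes above (request the Wikipedia page, pull out the citation links, save them as a list for '!ao <'). This is not jodizzle's actual script, which was not posted in this log. It swaps the CSS-selector library for Python's standard-library `html.parser`, emulating a selector along the lines of `ol.references a.external`. The class names `references` and `external`, the helper names `CitationLinkParser`, `citation_links` and `fetch`, and the User-Agent string are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class CitationLinkParser(HTMLParser):
    """Collect external hrefs found inside <ol class="references"> lists.

    Assumption: Wikipedia renders footnotes as <ol class="references"> and
    marks outgoing links with class "external". Adjust if the markup differs.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # <ol> nesting depth while inside a references list
        self.links = []  # unique hrefs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ol":
            # Enter a references list, or track a nested <ol> inside one.
            if self.depth or "references" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and "external" in classes:
            href = attrs.get("href")
            if href:
                if href.startswith("//"):
                    # Protocol-relative link; assume https.
                    href = "https:" + href
                if href not in self.links:
                    self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "ol" and self.depth:
            self.depth -= 1


def citation_links(html):
    """Return the de-duplicated citation URLs found in an HTML string."""
    parser = CitationLinkParser()
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page as text (hypothetical helper; needs network access)."""
    req = Request(url, headers={"User-Agent": "citation-list-sketch/0.1"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Small offline sample so the sketch can be tried without hitting Wikipedia.
SAMPLE = """
<p><a class="external text" href="https://example.org/not-a-citation">x</a></p>
<ol class="references">
<li><cite><a rel="nofollow" class="external text" href="https://example.com/a">A</a></cite></li>
<li><cite><a class="external text" href="//example.com/b">B</a></cite></li>
<li><a href="#cite_ref-1">back-reference</a></li>
<li><a class="external text" href="https://example.com/a">duplicate</a></li>
</ol>
"""

if __name__ == "__main__":
    print("\n".join(citation_links(SAMPLE)))
```

For a real page, something like `citation_links(fetch(page_url))` would produce the list; writing it out one URL per line and uploading the text file somewhere reachable gives a file of the sort that can be passed to '!ao <'.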
23:26 🔗 BlueMax has joined #archiveteam-bs
23:56 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
23:56 🔗 BartoCH has quit IRC (Remote host closed the connection)
23:59 🔗 RichardG has joined #archiveteam-bs
