[00:07] *** i0npulse has quit IRC (Ping timeout: 252 seconds)
[00:14] *** Zerote_ has joined #archiveteam-bs
[00:14] *** Zerote_ has quit IRC (Read error: Connection reset by peer)
[00:15] *** Zerote_ has joined #archiveteam-bs
[00:15] *** Zerote has quit IRC (Leaving)
[00:16] *** britmob has joined #archiveteam-bs
[00:33] *** i0npulse has joined #archiveteam-bs
[00:35] <paul2520> would it be possible to get archivebot to crawl https://en.wikipedia.org/wiki/Timeline_of_the_2019_Turkish_offensive_into_north-eastern_Syria? I was hoping the IA save-page-now would crawl the references, but it didn't...
[00:52] *** LowLevelM has joined #archiveteam-bs
[00:52] <LowLevelM> betamax: Thanks, there are probably enough warriors running urlteam right now
[00:53] <betamax> well, we could always do with more people completing CAPTCHAs to help save Yahoo Groups (shutting down in December) : https://github.com/davidferguson/yahoogroups-joiner
[00:53] <betamax> disclaimer: I'm helping coordinate that project
[00:55] <LowLevelM> Sounds cool, I will do that.
[00:55] <LowLevelM> Thanks
[00:55] <betamax> channel is #yahoosucks
[00:56] <betamax> and there's a leaderboard: http://tinyurl.com/ygleaders
[00:56] <jodizzle> paul2520: You can run that page with !ao, and that should grab all the outgoing links.
[00:57] <jodizzle> Also, IA save-page-now can do it now, I think, but you might have to select some tick boxes before saving the page.
[01:12] *** Raccoon has quit IRC (Remote host closed the connection)
[01:21] <JAA> jodizzle: !ao doesn't grab any links.
[01:24] <jodizzle> Oh.  Does it just get page requisites then?
[01:25] <JAA> Yes
[01:26] <JAA> You'll want to create a list of the outlinks and run those with !ao <.
[01:38] <markedL> does wikiteam provide better coverage of wikipedia? 
[01:40] <jodizzle> paul2520: I threw the citation links in as an '!ao <' job
[01:53] <JAA> There are regular dumps of all Wikimedia projects which are also uploaded to IA by someone on here I believe. And all Wikipedia outlinks get archived continuously as they're added to articles by IA.
[01:53] <JAA> Or at least that used to be the case; haven't checked it in a while.
[01:54] <britmob> I assume it was one of you, but that page is in the wayback machine now.
[02:07] *** LowLevelM has quit IRC (Ping timeout: 260 seconds)
[02:58] *** Raccoon has joined #archiveteam-bs
[03:25] *** Ivy has joined #archiveteam-bs
[03:26] *** manjaro-u has quit IRC (Ping timeout: 252 seconds)
[03:28] <paul2520> thanks jodizzle... the save-page-now has "all outlinks" as a checkbox, but I noticed many of the URLs in the references didn't appear even after everything seemed finished
[03:29] <paul2520> I appreciate you running the !ao... I was under the impression I do not have privileges to actually kick-off archivebot jobs myself. Should I try !ao for small, one-off requests like this in the future?
[03:29] <paul2520> ...or is there a way to get archivebot privileges?
[03:34] <jodizzle> paul2520: You should be able to use '!ao' and '!ao <' without special privileges
[03:35] <jodizzle> Using '!a' does require special privileges
[03:37] <jodizzle> Someone with #archivebot ops would need to decide to give those privileges to you
[04:05] *** bluefoo has quit IRC (Ping timeout: 252 seconds)
[04:11] *** bluefoo has joined #archiveteam-bs
[04:36] *** qw3rty has joined #archiveteam-bs
[04:41] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[04:46] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[05:19] *** HP_Archiv has joined #archiveteam-bs
[05:19] <HP_Archiv> Heya
[05:19] <HP_Archiv> Anyone here?
[05:47] <godane> so if Twitter shows anyone no images in Firefox, you just have to clear your cache
[06:31] *** bluefoo has quit IRC (Read error: Connection reset by peer)
[06:32] *** killsushi has quit IRC (Quit: Leaving)
[06:33] *** bluefoo has joined #archiveteam-bs
[06:43] *** bluefoo has quit IRC (Quit: bluefoo)
[06:49] *** bluefoo has joined #archiveteam-bs
[07:48] *** bluefoo has quit IRC (Quit: bluefoo)
[07:51] *** bluefoo has joined #archiveteam-bs
[08:23] *** bluefoo has quit IRC (Ping timeout: 252 seconds)
[08:35] *** Ivy has quit IRC (Quit: Connection closed for inactivity)
[09:19] *** bluefoo has joined #archiveteam-bs
[09:28] *** bluefoo has quit IRC (Ping timeout: 255 seconds)
[09:29] *** bluefoo has joined #archiveteam-bs
[10:00] *** HP_Archiv has quit IRC (Quit: Page closed)
[10:01] *** HP_Archiv has joined #archiveteam-bs
[11:02] *** bluefoo has quit IRC (Read error: Operation timed out)
[11:20] *** HP_Archiv has quit IRC (Ping timeout: 260 seconds)
[11:24] *** bluefoo has joined #archiveteam-bs
[11:45] *** Smiley has quit IRC (Read error: Operation timed out)
[11:48] *** Smiley has joined #archiveteam-bs
[12:31] *** Smiley has quit IRC (Ping timeout: 496 seconds)
[12:45] *** Smiley has joined #archiveteam-bs
[13:33] *** BlueMax has quit IRC (Quit: Leaving)
[15:04] <SketchCow> JAA: Agreed, and they're going to move.
[15:04] <SketchCow> I just needed to get them into the system and functioning.
[15:09] *** schbirid has joined #archiveteam-bs
[15:16] *** Stilettoo has joined #archiveteam-bs
[15:17] *** Ravenloft has joined #archiveteam-bs
[15:24] *** manjaro-u has joined #archiveteam-bs
[15:26] *** Stiletto has quit IRC (Ping timeout: 745 seconds)
[15:36] *** odemgi has quit IRC (Remote host closed the connection)
[15:38] <SketchCow> Looks like Archivebot backlog on FOS has diminished
[15:51] *** Stiletto has joined #archiveteam-bs
[15:51] *** Stilettoo has quit IRC (Read error: Operation timed out)
[16:15] <SketchCow> And is now gone.
[16:15] <SketchCow> Also, apparently my archivebot screenshotter service got rebooted two weeks ago.
[16:17] <SketchCow> Olympics items are now being moved out of archivebot, and into archiveteam_inbox, where they belong.
[16:17] <SketchCow> Didn't follow my own rules!
[16:17] <SketchCow> And the archivebot items that were added to inbox through alternate pipelines are now being automatically redirected to archivebot.
[16:18] <JAA> Sweet, thanks.
[16:19] <SketchCow> I'll be creating collections/redirections for the inbox items that need a home.
[16:20] <SketchCow> The screenshots in the archivebot are looking really sweet
[16:21] <SketchCow> I have a fundamental question as to whether the way I currently screenshot the items, given the time it takes, can keep up with the speed at which new items arrive.
[16:21] <SketchCow> Currently 7-10 arrive. I think that's more than this thing can do in a day.
[16:22] <Igloo> We could always do it at upload SketchCow
[16:23] <SketchCow> What, generate screenshots? It seems like a needless burden
[16:23] <SketchCow> I'll put it this way
[16:24] <SketchCow> If my thing sees something has screenshots, it skips and moves on. If someone is generating them and uploading them, then my thing won't do its work.
[16:24] <SketchCow> I would call it an "archivebot 3.0 nicety"
[16:25] <SketchCow> Like, if people are building archivebot pipelines, and then they run what I'm running to generate the screenshots (and I'm doing a pretty hacky thing) and include them... great.
[16:25] <SketchCow> I just don't want to add more friction to an already 10% fragile process
[16:26] <SketchCow> My fun little post-processing of items comes way after everyone is already using the data.
[16:27] <Igloo> Makes sense. Hard for anyone to do it outside of IA's network due to needing to download the WARC
[16:28] <Igloo> And the delays that come with it
[16:28] <SketchCow> Yeah.
[16:28] <SketchCow> I am downloading 5gb WARC sets to do this
[16:28] <SketchCow> To generate a single screenshot
[16:28] <SketchCow> Take that, carbon footprint
[16:32] <SketchCow> I wrote a script to generate a list of the top 20-30 pages of the archivebot collection and do the items in there first, so the collection already looks pretty nice for 99% of the manual browsings that will happen in it.
[16:32] <SketchCow> Brewster was happy, I was happy.
[16:34] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
[16:36] <SketchCow> https://archive.org/details/archiveteam_2018_olympics
[16:43] <SketchCow> https://archive.org/details/archiveteam_24syv
[16:45] *** eientei95 has quit IRC (Read error: Connection reset by peer)
[16:45] *** eientei95 has joined #archiveteam-bs
[16:50] *** Ravenloft has quit IRC (Read error: Operation timed out)
[16:51] *** mistym- has joined #archiveteam-bs
[16:53] *** mistym has quit IRC (Ping timeout: 745 seconds)
[16:55] <SketchCow> https://archive.org/details/archiveteam_yourshot
[17:26] *** DogsRNice has joined #archiveteam-bs
[17:32] *** apache2 has quit IRC (Remote host closed the connection)
[17:32] *** Mateon1 has quit IRC (Write error: Broken pipe)
[17:32] *** Mateon1 has joined #archiveteam-bs
[17:32] *** apache2 has joined #archiveteam-bs
[17:36] <SketchCow> https://archive.org/search.php?query=mediatype%3Acollection%20description%3A%2Aforthcoming%2A
[17:36] <SketchCow> All of these are collections I need to add descriptions to
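(Editor's note) The search link above is just a percent-encoded query; decoded, it reads `mediatype:collection description:*forthcoming*`, i.e. collections whose description still contains the "forthcoming" placeholder. A minimal sketch of building such a link with the standard library, assuming only the `search.php?query=` form seen in the URL above (`search_url` is a hypothetical helper name):

```python
from urllib.parse import quote

# Base taken from the search link pasted in the channel
SEARCH_BASE = "https://archive.org/search.php?query="


def search_url(query):
    """Hypothetical helper: percent-encode a search query into a search.php link."""
    # safe="" so ":" and "*" are encoded too (%3A, %2A); spaces become %20
    return SEARCH_BASE + quote(query, safe="")
```

For example, `search_url("mediatype:collection description:*forthcoming*")` reproduces the link SketchCow posted.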
[18:07] *** systwi_ is now known as systwi
[18:30] *** X-Scale` has joined #archiveteam-bs
[18:32] *** X-Scale has quit IRC (Ping timeout: 252 seconds)
[18:32] *** X-Scale` is now known as X-Scale
[18:44] *** odemgi has joined #archiveteam-bs
[18:59] <Raccoon> what's the background history on "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD"
[19:00] <Raccoon> btw, duckduckgo gives a few archiveteam results, then launches into some russian cp comment, before arriving at shakespeare :)
[19:06] *** manjaro-u has joined #archiveteam-bs
[19:22] *** sotty has joined #archiveteam-bs
[19:56] *** X-Scale` has joined #archiveteam-bs
[19:57] *** X-Scale has quit IRC (Ping timeout: 252 seconds)
[19:57] *** X-Scale` is now known as X-Scale
[20:10] *** legoktm has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[20:11] *** legoktm has joined #archiveteam-bs
[20:22] *** schbirid has quit IRC (Quit: Leaving)
[21:15] *** icedice has quit IRC (Ping timeout: 252 seconds)
[21:21] *** wyatt8740 has joined #archiveteam-bs
[21:29] *** icedice has joined #archiveteam-bs
[22:10] *** Panasonic has joined #archiveteam-bs
[22:36] <paul2520> thanks jodizzle 
[22:47] <paul2520> what did you do to get the https://transfer.notkiska.pw/oPAdn/wikipedia-Timeline_of_the_2019_Turkish_offensive_into_north-eastern_Syria-citations.txt file?
[22:51] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[22:54] <JAA> Gah, why are government sites always so awful?
[22:55] <JAA> Looking into https://www.fbo.gov/ currently. It does POST requests, stores the search parameters in a session store I assume and then does pagination with GET and cookies. :-|
[22:56] <JAA> It's being migrated to an SPA site. Not sure what's worse.
[22:57] <jodizzle> Yeah, going through government websites is like taking a trip through different web design patterns.
[22:57] <jodizzle> Probably a consequence of cheap contracting, in a lot of cases.
[22:57] <jodizzle> Some of them are pretty nice though.  No JS, lightweight
[22:58] <jodizzle> paul2520: I did some work in a Python shell to request that Wikipedia page and fetch the citation links with CSS selectors.
[22:59] <JAA> I'll be grabbing FBO with qwarc shortly, and it'll be a pain.
[22:59] <JAA> I wonder how well the pagination of 3 million results will work.
[23:08] <paul2520> that sounds great jodizzle -- feel like putting it in a gist?
[23:10] <jodizzle> paul2520: Sure, I can try and dig up what I did later.
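(Editor's note) jodizzle's actual shell session is not in the log. The sketch below is a stand-in for the workflow JAA described (make a list of the outlinks, then feed it to `!ao <`), and it swaps CSS selectors for the standard library's `html.parser` so it has no dependencies. It assumes Wikipedia renders the reference list as `<ol class="references">` and marks outbound links with the `external` class; `citation_links` and `to_url_list` are hypothetical names.

```python
from html.parser import HTMLParser


class CitationLinkParser(HTMLParser):
    """Collect hrefs of <a class="external ..."> tags inside <ol class="references">."""

    def __init__(self):
        super().__init__()
        self.ol_depth = 0  # >0 while inside a references list (counts nested <ol>s)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "ol" and (self.ol_depth > 0 or "references" in classes):
            self.ol_depth += 1
        elif tag == "a" and self.ol_depth > 0 and "external" in classes:
            href = attr_map.get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "ol" and self.ol_depth > 0:
            self.ol_depth -= 1


def citation_links(html):
    """Return unique citation URLs in page order."""
    parser = CitationLinkParser()
    parser.feed(html)
    parser.close()
    unique = []
    for href in parser.links:
        if href.startswith("//"):  # protocol-relative link: assume https
            href = "https:" + href
        if href not in unique:
            unique.append(href)
    return unique


def to_url_list(links):
    """One URL per line, the shape of a plain-text URL list."""
    return "\n".join(links) + "\n"
```

Usage: fetch the article HTML (for example with `urllib.request.urlopen(url).read().decode("utf-8")`), pass it to `citation_links`, write `to_url_list(...)` to a text file, upload it somewhere reachable, and hand that file's URL to `!ao <`.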
[23:26] *** BlueMax has joined #archiveteam-bs
[23:56] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[23:56] *** BartoCH has quit IRC (Remote host closed the connection)
[23:59] *** RichardG has joined #archiveteam-bs