[00:01] *** enowaldo has joined #archiveteam-bs
[00:10] *** icedice has joined #archiveteam-bs
[00:12] *** enowaldo has quit IRC (Read error: Operation timed out)
[00:16] *** DogsRNice has quit IRC (Quit: Leaving)
[00:26] *** tomaspark has quit IRC (Read error: Connection reset by peer)
[01:09] <godane> SketchCow: so this may be interesting to you
[01:09] <godane> japanese manuals : http://gizport.jp/manual/common/6478
[01:10] <Flashfire> http://outofprintarchive.com godane
[01:15] <godane> i think i was grabbing those but cause of the remastering i stopped grabbing those
[01:15] <Flashfire> Remastering?
[01:15] *** icedice has quit IRC (Ping timeout: 506 seconds)
[01:16] *** Oddly2 has joined #archiveteam-bs
[01:16] <godane> they're doing rescans of magazines they scanned
[01:17] *** tomaspark has joined #archiveteam-bs
[01:32] *** Oddly2 has quit IRC (Ping timeout: 360 seconds)
[01:51] <godane> so metadata can be grabbed from those manual pages
[01:52] <godane> right now it's manual brute force but i can change that with this : http://gizport.jp/manual/1/?id=80380
[01:53] <godane> even though the manual/1/ should be manual/1797514/ i can get the right metadata with just manual/1/
[01:53] <godane> no need to get the product number right witch is good
[01:54] <godane> *which
[02:03] *** enowaldo has joined #archiveteam-bs
[02:07] *** enowaldo has quit IRC (Read error: Operation timed out)
[02:08] *** Zerote has quit IRC (Ping timeout: 600 seconds)
[03:15] *** qw3rty117 has joined #archiveteam-bs
[03:21] *** qw3rty116 has quit IRC (Read error: Operation timed out)
[03:23] *** enowaldo has joined #archiveteam-bs
[03:27] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[03:49] *** cfarquhar has quit IRC (The Lounge - https://thelounge.chat)
[03:50] *** cfarquhar has joined #archiveteam-bs
[03:52] *** odemgi_ has joined #archiveteam-bs
[03:55] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[05:06] *** asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
[05:06] *** markedL has quit IRC (The Lounge - https://thelounge.chat)
[05:06] *** markedL has joined #archiveteam-bs
[05:06] *** asdf0101 has joined #archiveteam-bs
[05:12] *** wp494 has quit IRC (Read error: Operation timed out)
[05:12] *** wp494 has joined #archiveteam-bs
[05:12] *** enowaldo has joined #archiveteam-bs
[05:20] *** enowaldo has quit IRC (Read error: Operation timed out)
[05:22] *** enowaldo has joined #archiveteam-bs
[05:38] *** enowaldo has quit IRC (Read error: Operation timed out)
[05:48] *** killsushi has quit IRC (Quit: Leaving)
[05:54] *** killsushi has joined #archiveteam-bs
[06:40] *** Zerote has joined #archiveteam-bs
[07:40] *** benjins has quit IRC (Read error: Connection reset by peer)
[08:13] *** enowaldo has joined #archiveteam-bs
[08:13] *** Zerote has quit IRC (Read error: Operation timed out)
[08:16] *** enowaldo has quit IRC (Read error: Connection reset by peer)
[08:19] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[08:24] *** Zerote has joined #archiveteam-bs
[08:28] *** killsushi has quit IRC (Quit: Leaving)
[09:08] *** Gfy has quit IRC (Ping timeout: 265 seconds)
[09:11] *** BlueMax has quit IRC (Quit: Leaving)
[09:47] *** eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in)
[09:54] *** eientei95 has joined #archiveteam-bs
[11:12] *** VerifiedJ has joined #archiveteam-bs
[11:18] *** Despatche has joined #archiveteam-bs
[11:23] *** Oddly has joined #archiveteam-bs
[11:50] *** enowaldo has joined #archiveteam-bs
[11:51] <godane> ok i found a better way to grab metadata from these pdfs
[11:52] <godane> these pdfs have title and author metadata in them so i grab the info from that
[11:53] <godane> it may not be the best all the time but better than nothing
[11:53] <godane> also these are going to be called japanese manual $number : $title1 or something
[11:55] *** enowaldo_ has joined #archiveteam-bs
[11:57] *** enowaldo has quit IRC (Read error: Operation timed out)
[12:11] <eientei95> Flashfire: https://transfer.notkiska.pw/NE9ZZ/outofprintarchive.txt There's the file links to all of outofprintarchive's scans
[12:56] *** enowaldo_ has quit IRC (Ping timeout: 252 seconds)
[13:07] <godane> SketchCow: i'm starting to upload some of the japanese manuals : https://archive.org/details/japanese-manual-2549
[13:07] <godane> note that not all are japanese language cause i have noticed some english manuals on the website
[13:08] <godane> but most are japanese
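The naming scheme godane describes above (a title of the form "japanese manual $number : $title" and an identifier like japanese-manual-2549) could be sketched roughly as follows. This is only an illustration: the function name, the returned dictionary keys, and the fallback for a missing PDF title are assumptions, not godane's actual script.

```python
def item_metadata(number: int, pdf_title: str) -> dict:
    """Build an item identifier and title from a manual number and PDF title.

    Hypothetical helper: the fallback for an empty title is an assumption,
    added because the embedded PDF title "may not be the best all the time".
    """
    identifier = f"japanese-manual-{number}"
    cleaned = " ".join(pdf_title.split())  # collapse stray whitespace
    if cleaned:
        title = f"japanese manual {number} : {cleaned}"
    else:
        title = f"japanese manual {number}"
    return {"identifier": identifier, "title": title}
```

For example, item_metadata(2549, "") falls back to the bare title "japanese manual 2549" while still producing the identifier japanese-manual-2549.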
[13:10] *** wyatt8740 has joined #archiveteam-bs
[13:26] <godane> so here's where all the manuals are going : https://archive.org/search.php?query=subject%3A%22japanese+manuals%22
[13:27] <godane> they're in godaneinbox for now but i put a keyword so people can see what i have added
[13:54] *** Gfy has joined #archiveteam-bs
[14:12] *** odemgi_ has quit IRC (Read error: Connection reset by peer)
[14:12] *** odemgi_ has joined #archiveteam-bs
[14:18] *** enowaldo has joined #archiveteam-bs
[14:23] *** tomaspark has quit IRC (Read error: Operation timed out)
[14:24] *** tomaspark has joined #archiveteam-bs
[14:45] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[15:00] *** marked1 has quit IRC (Read error: Operation timed out)
[15:01] *** enowaldo has quit IRC (Read error: Operation timed out)
[15:08] *** marked1 has joined #archiveteam-bs
[15:12] *** Zerote has quit IRC (Ping timeout: 600 seconds)
[15:38] *** Zerote has joined #archiveteam-bs
[16:28] <betamax> JAA: I see you noted the issue of infinite loops when archiving twitter with snscrape recently (https://github.com/JustAnotherArchivist/snscrape/issues/40)
[16:29] <betamax> any known workarounds? (I have a list of ~450 EU election-related twitter accounts, and am now only realising how much I depend on snscrape!)
[16:29] <JAA> Yup, and fixed it. I guess I should release a new version though. Maybe I'll do that later today.
[16:29] <JAA> You can install snscrape from git if you're in a hurry.
[16:31] <betamax> Is that the "pip3 install git+...." method of installation?
[16:31] <JAA> Yep
[16:31] <betamax> Thanks, will try it
[16:31] <JAA> You might have to uninstall it first, not sure how pip handles that exactly.
[16:32] <JAA> (And --user if you don't want a system-wide install and aren't using pyenv.)
[16:32] <betamax> Also, I can't mention enough what a great tool snscrape is - fantastic work!
[16:32] <JAA> Thanks, glad it's proving useful. :-)
[16:32] <JAA> It can now also extract Twitter outlinks, by the way.
[16:33] <betamax> At risk of causing wrath as I haven't yet looked at the docs, how do I do that?
[16:33] <JAA> On another note, please add those Twitter accounts to the relevant wiki pages under https://archiveteam.org/index.php?title=ArchiveBot/2019_European_Union_parliamentary_elections
[16:33] <JAA> Haha, cute, thinking there are docs.
[16:34] <JAA> ;-)
[16:34] <JAA> Check out https://github.com/JustAnotherArchivist/little-things/blob/master/snscrape-twitter-user
[16:36] <betamax> Ah yes - will add the accounts to the wiki ASAP
[16:37] <betamax> On another note, if I had a list of ~170 candidate sites (ie: personal campaign sites for a specific candidate) and wanted to do a full (recursive) crawl with archivebot, what's the best way to feed in all 170-odd sites?
[16:37] <betamax> Or is that generally an overload and I should archive (and generate warcs) locally, and upload to IA after (although they won't get in wayback that way)
[16:48] <JAA> Oh, awesome!
[16:49] <JAA> Well, we can do !a < (undocumented, ops-only, do not use unless you know what you're doing), but then that job will get huge and might not finish in time. I don't know how much we can handle through AB in time. One way to find out.
[16:50] <JAA> Candidates should probably get a separate page per country.
[16:53] <JAA> Wait no, that can go on the "election" page for each country.
[16:55] <betamax> Seeing as I don't have ops (or even voice right now) in archivebot, I've stuck the links (164 total) in pastebin for you/someone with ops to do: https://pastebin.com/raw/x3FCE89w
[17:02] <JAA> I'll start feeding them through ArchiveBot and see how quickly that goes. All of those are UK candidates I presume?
[17:05] *** HashbangI has quit IRC (Remote host closed the connection)
[17:06] <betamax> Yes, all UK
[17:08] <betamax> If there is an archivebot regex that can deal with wordpress-style calendars (where each day is its own page, even if there are no events for that day, stretching back for years and years) then I'd recommend using that - those calendars were a real problem when trying to archive candidate websites in the US midterms
[17:08] <betamax> *deal with => ignore
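A minimal sketch of the kind of ignore pattern betamax is asking about, written as a plain Python regex so it can be tested locally before use. The query parameter names and path layout are assumptions about what such calendar URLs look like, not a pattern taken from ArchiveBot or from any of the candidate sites.

```python
import re

# Hypothetical ignore pattern for per-day/per-month calendar pages.
# Assumed URL shapes (not from any real site):
#   .../calendar/?month=2009-11        (date in a query parameter)
#   .../events/2014-03-07/             (date as the last path segment)
CALENDAR_IGNORE = re.compile(
    r"^https?://[^/]+.*"
    r"(?:[?&](?:month|date|eventDate)=\d{4}-\d{2}(?:-\d{2})?"
    r"|/(?:calendar|events)/\d{4}-\d{2}(?:-\d{2})?/?$)"
)


def should_ignore(url: str) -> bool:
    """Return True if the URL looks like a generated calendar page."""
    return CALENDAR_IGNORE.search(url) is not None
```

Checking the pattern against a sample of real URLs from a site first is worthwhile, since a pattern that is too broad would also drop ordinary dated news posts.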
[17:21] *** HashbangI has joined #archiveteam-bs
[18:27] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[18:47] *** JH88 has joined #archiveteam-bs
[19:15] *** DogsRNice has joined #archiveteam-bs
[19:30] *** enowaldo has joined #archiveteam-bs
[19:50] <eythian> Hey, is there a way to archive things that link to Dropbox? The case I just saw was here: https://languagelog.ldc.upenn.edu/nll/?p=42639
[19:50] *** zhongfu has quit IRC (Ping timeout: 615 seconds)
[19:52] <ealgase> eythian: it's excluded from WBM
[19:52] <ealgase> but archivebot probably still will get it
[20:01] <JAA> Depends on how you queue it.
[20:01] <eythian> I don't think I can drive archivebot in a way that'll fetch it
[20:01] <JAA> If you !ao a Dropbox ?dl=1 URL, it won't grab the file.
[20:02] <JAA> If you !a a site that links to Dropbox ?dl=1 URLs, it should grab the files.
[20:03] <eythian> This links to a bitly link that links to a dl=0 URL
[20:03] <eythian> Should be a law... :)
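For the dl=0 case eythian mentions, one possible workaround is to rewrite the share link into its ?dl=1 form before queueing it. The sketch below assumes the bitly redirect has already been resolved to the Dropbox URL; the function name and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url: str) -> str:
    """Return the URL with its dl parameter set to 1 (direct download)."""
    parts = urlsplit(url)
    # Keep every query parameter except any existing dl value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Given a hypothetical link such as https://www.dropbox.com/s/abc123/file.pdf?dl=0, this returns the same URL ending in ?dl=1. As JAA notes above, whether the file is then actually grabbed still depends on how the job is queued.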
[20:28] *** benjins has joined #archiveteam-bs
[21:02] *** jut has quit IRC (Ping timeout: 252 seconds)
[21:05] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[21:15] *** VerifiedJ has quit IRC (Quit: Leaving)
[21:29] *** jut has joined #archiveteam-bs
[21:42] *** BartoCH has joined #archiveteam-bs
[21:42] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[21:57] *** killsushi has joined #archiveteam-bs
[22:32] *** godane has quit IRC (Ping timeout: 268 seconds)
[22:48] *** godane has joined #archiveteam-bs
[22:56] *** jut has quit IRC (Ping timeout: 252 seconds)
[22:57] *** enowaldo has joined #archiveteam-bs
[23:06] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[23:08] *** wp494 has joined #archiveteam-bs
[23:11] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[23:27] *** BlueMax has joined #archiveteam-bs
[23:40] *** Oddly has quit IRC (Read error: Operation timed out)