[00:01] *** enowaldo has joined #archiveteam-bs
[00:10] *** icedice has joined #archiveteam-bs
[00:12] *** enowaldo has quit IRC (Read error: Operation timed out)
[00:16] *** DogsRNice has quit IRC (Quit: Leaving)
[00:26] *** tomaspark has quit IRC (Read error: Connection reset by peer)
[01:09] SketchCow: so this may be interesting to you
[01:09] japanese manuals : http://gizport.jp/manual/common/6478
[01:10] http://outofprintarchive.com godane
[01:15] i think i was grabbing those but cause of the remastering i stopped grabbing those
[01:15] Remastering?
[01:15] *** icedice has quit IRC (Ping timeout: 506 seconds)
[01:16] *** Oddly2 has joined #archiveteam-bs
[01:16] there doing rescans of magazine they scanned
[01:17] *** tomaspark has joined #archiveteam-bs
[01:32] *** Oddly2 has quit IRC (Ping timeout: 360 seconds)
[01:51] so metadata can be grabbed from those manual pages
[01:52] right now its manual brute force but i can change that with this : http://gizport.jp/manual/1/?id=80380
[01:53] even thought the manual/1/ should be manual/1797514/ i can get the right metadata with just manual/1/
[01:53] no need to get the product number right witch is good
[01:54] *which
[02:03] *** enowaldo has joined #archiveteam-bs
[02:07] *** enowaldo has quit IRC (Read error: Operation timed out)
[02:08] *** Zerote has quit IRC (Ping timeout: 600 seconds)
[03:15] *** qw3rty117 has joined #archiveteam-bs
[03:21] *** qw3rty116 has quit IRC (Read error: Operation timed out)
[03:23] *** enowaldo has joined #archiveteam-bs
[03:27] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[03:49] *** cfarquhar has quit IRC (The Lounge - https://thelounge.chat)
[03:50] *** cfarquhar has joined #archiveteam-bs
[03:52] *** odemgi_ has joined #archiveteam-bs
[03:55] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[05:06] *** asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
[05:06] *** markedL has quit IRC (The Lounge - https://thelounge.chat)
[05:06] *** markedL has joined #archiveteam-bs
[05:06] *** asdf0101 has joined #archiveteam-bs
[05:12] *** wp494 has quit IRC (Read error: Operation timed out)
[05:12] *** wp494 has joined #archiveteam-bs
[05:12] *** enowaldo has joined #archiveteam-bs
[05:20] *** enowaldo has quit IRC (Read error: Operation timed out)
[05:22] *** enowaldo has joined #archiveteam-bs
[05:38] *** enowaldo has quit IRC (Read error: Operation timed out)
[05:48] *** killsushi has quit IRC (Quit: Leaving)
[05:54] *** killsushi has joined #archiveteam-bs
[06:40] *** Zerote has joined #archiveteam-bs
[07:40] *** benjins has quit IRC (Read error: Connection reset by peer)
[08:13] *** enowaldo has joined #archiveteam-bs
[08:13] *** Zerote has quit IRC (Read error: Operation timed out)
[08:16] *** enowaldo has quit IRC (Read error: Connection reset by peer)
[08:19] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[08:24] *** Zerote has joined #archiveteam-bs
[08:28] *** killsushi has quit IRC (Quit: Leaving)
[09:08] *** Gfy has quit IRC (Ping timeout: 265 seconds)
[09:11] *** BlueMax has quit IRC (Quit: Leaving)
[09:47] *** eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in)
[09:54] *** eientei95 has joined #archiveteam-bs
[11:12] *** VerifiedJ has joined #archiveteam-bs
[11:18] *** Despatche has joined #archiveteam-bs
[11:23] *** Oddly has joined #archiveteam-bs
[11:50] *** enowaldo has joined #archiveteam-bs
[11:51] ok i found a better way to grab metadata from these pdfs
[11:52] these pdfs have title and author metadata in them so i grab the info from that
[11:53] it may not be the best all the time but better then nothing
[11:53] also these are going be called japanese manual $number : $title1 or something
[11:55] *** enowaldo_ has joined #archiveteam-bs
[11:57] *** enowaldo has quit IRC (Read error: Operation timed out)
[12:11] Flashfire: https://transfer.notkiska.pw/NE9ZZ/outofprintarchive.txt There's the file links to all of outofprintarchive's scans
[12:56] *** enowaldo_ has quit IRC (Ping timeout: 252 seconds)
[13:07] SketchCow: i'm starting to upload some of the japanese manuals : https://archive.org/details/japanese-manual-2549
[13:07] note that not all are japanese language cause i have noticed some english manuals on the website
[13:08] but most are japanese
[13:10] *** wyatt8740 has joined #archiveteam-bs
[13:26] so here where all the manuals are going : https://archive.org/search.php?query=subject%3A%22japanese+manuals%22
[13:27] there in godaneinbox for now but i put a keyword so people can see what i have added
[13:54] *** Gfy has joined #archiveteam-bs
[14:12] *** odemgi_ has quit IRC (Read error: Connection reset by peer)
[14:12] *** odemgi_ has joined #archiveteam-bs
[14:18] *** enowaldo has joined #archiveteam-bs
[14:23] *** tomaspark has quit IRC (Read error: Operation timed out)
[14:24] *** tomaspark has joined #archiveteam-bs
[14:45] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[15:00] *** marked1 has quit IRC (Read error: Operation timed out)
[15:01] *** enowaldo has quit IRC (Read error: Operation timed out)
[15:08] *** marked1 has joined #archiveteam-bs
[15:12] *** Zerote has quit IRC (Ping timeout: 600 seconds)
[15:38] *** Zerote has joined #archiveteam-bs
[16:28] JAA: I see you noted the issue of infinite loops when archiving twitter with snscrape recently (https://github.com/JustAnotherArchivist/snscrape/issues/40)
[16:29] any known workarounds? (I have a list of ~450 EU election-related twitter accounts, and am now only realising how much I depend on snscrape!)
[16:29] Yup, and fixed it. I guess I should release a new version though. Maybe I'll do that later today.
[16:29] You can install snscrape from git if you're in a hurry.
[16:31] Is that the "pip3 install git+...." method of installation?
[16:31] Yep
[16:31] Thanks, will try it
[16:31] You might have to uninstall it first, not sure how pip handles that exactly.
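The install-from-git route discussed at [16:29] to [16:31] could look roughly like the sketch below. It assumes the repository URL implied by the issue link in the log and pip's `git+https://` install syntax. The helper name `pip_install_command` is hypothetical, and the command is only built and printed, not run.

```python
import sys

# Assumption: repository URL derived from the issue link quoted in the log.
# The "git+" prefix is pip's syntax for installing straight from a git repo.
SNSCRAPE_GIT = "git+https://github.com/JustAnotherArchivist/snscrape.git"


def pip_install_command(user_install=False):
    """Build (but do not run) a pip command that installs snscrape from git."""
    # Calling pip through the current interpreter avoids pip/pip3 ambiguity.
    cmd = [sys.executable, "-m", "pip", "install"]
    if user_install:
        # Per-user install instead of a system-wide one.
        cmd.append("--user")
    cmd.append(SNSCRAPE_GIT)
    return cmd


if __name__ == "__main__":
    print(" ".join(pip_install_command(user_install=True)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would perform the install. Whether an older copy has to be uninstalled first is left open here, as it is in the log.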
[16:32] (And --user if you don't want a system-wide install and aren't using pyenv.)
[16:32] Also, I can't mention enough what a great tool snscrape is - fantastic work!
[16:32] Thanks, glad it's proving useful. :-)
[16:32] It can now also extract Twitter outlinks, by the way.
[16:33] At risk of causing wrath as I haven't yet looked at the docs, how do I do that?
[16:33] On another note, please add those Twitter accounts to the relevant wiki pages under https://archiveteam.org/index.php?title=ArchiveBot/2019_European_Union_parliamentary_elections
[16:33] Haha, cute, thinking there are docs.
[16:34] ;-)
[16:34] Check out https://github.com/JustAnotherArchivist/little-things/blob/master/snscrape-twitter-user
[16:36] Ah yes - will add the accounts to the wiki ASAP
[16:37] On another note, if I had a list of ~170 candidate sites (ie: personal campaign sites for a specific candidate) and wanted to do a full (recursive) crawl with archivebot, what's the best way to feed in all 170-odd sites?
[16:37] Or is that generally an overload and I should archive (and generate warcs) locally, and upload to IA after (although they won't get in wayback that way)
[16:48] Oh, awesome!
[16:49] Well, we can do !a < (undocumented, ops-only, do not use unless you know what you're doing), but then that job will get huge and might not finish in time. I don't know how much we can handle through AB in time. One way to find out.
[16:50] Candidates should probably get a separate page per country.
[16:53] Wait no, that can go on the "election" page for each country.
[16:55] Seeing as I don't have ops (or even voice right now) in archivebot, I've stuck the links (164 total) in pastebin for you/someone with ops to do: https://pastebin.com/raw/x3FCE89w
[17:02] I'll start feeding them through ArchiveBot and see how quickly that goes. All of those are UK candidates I presume?
[17:05] *** HashbangI has quit IRC (Remote host closed the connection)
[17:06] Yes, all UK
[17:08] If there is an archivebot regex that can deal with wordpress-style calendars (where each day is it's own page, even if there are no events for that day, streatching back for years and years) then I'd recommend using that - those calendars were a real problem when trying to archive candidate websites in the US midterms
[17:08] *deal with => ignore
[17:21] *** HashbangI has joined #archiveteam-bs
[18:27] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[18:47] *** JH88 has joined #archiveteam-bs
[19:15] *** DogsRNice has joined #archiveteam-bs
[19:30] *** enowaldo has joined #archiveteam-bs
[19:50] Hey, is there a way to archive things that link to Dropbox? The case I just saw was here: https://languagelog.ldc.upenn.edu/nll/?p=42639
[19:50] *** zhongfu has quit IRC (Ping timeout: 615 seconds)
[19:52] eythian: it's excluded from WBM
[19:52] but archivebot probably still will get it
[20:01] Depends on how you queue it.
[20:01] I don't think I can drive archivebot in a way that'll fetch it
[20:01] If you !ao a Dropbox ?dl=1 URL, it won't grab the file.
[20:02] If you !a a site that links to Dropbox ?dl=1 URLs, it should grab the files.
[20:03] This links to a bitly link that links to a dl=0 URL
[20:03] Should be a law... :)
[20:28] *** benjins has joined #archiveteam-bs
[21:02] *** jut has quit IRC (Ping timeout: 252 seconds)
[21:05] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[21:15] *** VerifiedJ has quit IRC (Quit: Leaving)
[21:29] *** jut has joined #archiveteam-bs
[21:42] *** BartoCH has joined #archiveteam-bs
[21:42] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[21:57] *** killsushi has joined #archiveteam-bs
[22:32] *** godane has quit IRC (Ping timeout: 268 seconds)
[22:48] *** godane has joined #archiveteam-bs
[22:56] *** jut has quit IRC (Ping timeout: 252 seconds)
[22:57] *** enowaldo has joined #archiveteam-bs
[23:06] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[23:08] *** wp494 has joined #archiveteam-bs
[23:11] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[23:27] *** BlueMax has joined #archiveteam-bs
[23:40] *** Oddly has quit IRC (Read error: Operation timed out)
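The [17:08] request for a regex that ignores WordPress-style per-day calendar pages could be approached along these lines. This is a minimal sketch: the URL shapes it targets (a `calendar`, `event` or `events` path segment followed by a date) are an assumption about how such calendars are commonly laid out, not something stated in the log, and `CALENDAR_IGNORE` and `should_ignore` are hypothetical names. Real sites would need their patterns checked individually.

```python
import re

# Hypothetical ignore pattern. Assumes calendar pages look like
#   /events/2014-03-17/   or   /calendar/2016/11/
# i.e. a calendar-like path segment followed directly by a date.
CALENDAR_IGNORE = re.compile(
    r"^https?://[^/]+/(?:[^/?#]+/)*(?:calendar|events?)/"
    r"(?:\d{4}-\d{2}(?:-\d{2})?|\d{4}/\d{2}(?:/\d{2})?)/?(?:[?#].*)?$"
)


def should_ignore(url):
    """Return True if the URL looks like a per-day or per-month calendar page."""
    return CALENDAR_IGNORE.match(url) is not None


if __name__ == "__main__":
    for url in (
        "https://example.org/events/2014-03-17/",
        "https://example.org/events/town-hall-meeting/",
    ):
        print(url, should_ignore(url))
```

The pattern deliberately requires the date to follow the calendar segment, so ordinary dated blog posts and named event pages are left alone.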
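On the Dropbox question from [19:50] to [20:03]: the log distinguishes `?dl=1` links from the `dl=0` link behind the bitly redirect. A minimal sketch of rewriting a share link to its `dl=1` form is below. The helper name `force_dropbox_download` and the example URL are hypothetical, resolving the bitly redirect is not covered, and whether ArchiveBot then fetches the file still depends on how the job is queued, as noted in the log.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url):
    """Return the URL with dl=1 set if it points at dropbox.com, else unchanged."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "dropbox.com" and not host.endswith(".dropbox.com"):
        return url
    # Drop any existing dl parameter, keep the others, then append dl=1.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical share link, for illustration only.
    print(force_dropbox_download("https://www.dropbox.com/s/abc123/file.pdf?dl=0"))
```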