#archiveteam-bs 2019-05-19,Sun


Time Nickname Message
00:01 🔗 enowaldo has joined #archiveteam-bs
00:10 🔗 icedice has joined #archiveteam-bs
00:12 🔗 enowaldo has quit IRC (Read error: Operation timed out)
00:16 🔗 DogsRNice has quit IRC (Quit: Leaving)
00:26 🔗 tomaspark has quit IRC (Read error: Connection reset by peer)
01:09 🔗 godane SketchCow: so this may be interesting to you
01:09 🔗 godane japanese manuals : http://gizport.jp/manual/common/6478
01:10 🔗 Flashfire http://outofprintarchive.com godane
01:15 🔗 godane i think i was grabbing those but cause of the remastering i stopped grabbing those
01:15 🔗 Flashfire Remastering?
01:15 🔗 icedice has quit IRC (Ping timeout: 506 seconds)
01:16 🔗 Oddly2 has joined #archiveteam-bs
01:16 🔗 godane they're doing rescans of magazines they scanned
01:17 🔗 tomaspark has joined #archiveteam-bs
01:32 🔗 Oddly2 has quit IRC (Ping timeout: 360 seconds)
01:51 🔗 godane so metadata can be grabbed from those manual pages
01:52 🔗 godane right now it's manual brute force but i can change that with this : http://gizport.jp/manual/1/?id=80380
01:53 🔗 godane even though the manual/1/ should be manual/1797514/ i can get the right metadata with just manual/1/
01:53 🔗 godane no need to get the product number right witch is good
01:54 🔗 godane *which
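A minimal sketch of the URL trick godane describes, assuming the page only looks at the `id` query parameter and ignores the product number in the path (that is his observation above, not something verified here). The function name `metadata_url` is hypothetical.

```python
from urllib.parse import urlencode

# "1" is a placeholder product number; per the log, the site appears to
# return the right metadata for an id even when this segment is wrong.
BASE = "http://gizport.jp/manual/1/"


def metadata_url(manual_id: int) -> str:
    """Build the metadata page URL for one manual id (hypothetical helper)."""
    return f"{BASE}?{urlencode({'id': manual_id})}"


if __name__ == "__main__":
    print(metadata_url(80380))
```

Iterating over a range of ids with this helper would replace the manual brute force, though which ids actually exist is not stated in the log.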
02:03 🔗 enowaldo has joined #archiveteam-bs
02:07 🔗 enowaldo has quit IRC (Read error: Operation timed out)
02:08 🔗 Zerote has quit IRC (Ping timeout: 600 seconds)
03:15 🔗 qw3rty117 has joined #archiveteam-bs
03:21 🔗 qw3rty116 has quit IRC (Read error: Operation timed out)
03:23 🔗 enowaldo has joined #archiveteam-bs
03:27 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
03:49 🔗 cfarquhar has quit IRC (The Lounge - https://thelounge.chat)
03:50 🔗 cfarquhar has joined #archiveteam-bs
03:52 🔗 odemgi_ has joined #archiveteam-bs
03:55 🔗 odemgi has quit IRC (Ping timeout: 252 seconds)
05:06 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
05:06 🔗 markedL has quit IRC (The Lounge - https://thelounge.chat)
05:06 🔗 markedL has joined #archiveteam-bs
05:06 🔗 asdf0101 has joined #archiveteam-bs
05:12 🔗 wp494 has quit IRC (Read error: Operation timed out)
05:12 🔗 wp494 has joined #archiveteam-bs
05:12 🔗 enowaldo has joined #archiveteam-bs
05:20 🔗 enowaldo has quit IRC (Read error: Operation timed out)
05:22 🔗 enowaldo has joined #archiveteam-bs
05:38 🔗 enowaldo has quit IRC (Read error: Operation timed out)
05:48 🔗 killsushi has quit IRC (Quit: Leaving)
05:54 🔗 killsushi has joined #archiveteam-bs
06:40 🔗 Zerote has joined #archiveteam-bs
07:40 🔗 benjins has quit IRC (Read error: Connection reset by peer)
08:13 🔗 enowaldo has joined #archiveteam-bs
08:13 🔗 Zerote has quit IRC (Read error: Operation timed out)
08:16 🔗 enowaldo has quit IRC (Read error: Connection reset by peer)
08:19 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
08:24 🔗 Zerote has joined #archiveteam-bs
08:28 🔗 killsushi has quit IRC (Quit: Leaving)
09:08 🔗 Gfy has quit IRC (Ping timeout: 265 seconds)
09:11 🔗 BlueMax has quit IRC (Quit: Leaving)
09:47 🔗 eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in)
09:54 🔗 eientei95 has joined #archiveteam-bs
11:12 🔗 VerifiedJ has joined #archiveteam-bs
11:18 🔗 Despatche has joined #archiveteam-bs
11:23 🔗 Oddly has joined #archiveteam-bs
11:50 🔗 enowaldo has joined #archiveteam-bs
11:51 🔗 godane ok i found a better way to grab metadata from these pdfs
11:52 🔗 godane these pdfs have title and author metadata in them so i grab the info from that
11:53 🔗 godane it may not be the best all the time but better than nothing
11:53 🔗 godane also these are going to be called japanese manual $number : $title1 or something
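A rough sketch of the naming scheme mentioned above, assuming the title string has already been read from the PDF's embedded metadata by some PDF library (that step is not shown, and godane's actual tooling is not given in the log). The helper `item_title` and the `"untitled"` fallback are hypothetical.

```python
def item_title(number, pdf_title, fallback="untitled"):
    """Format an item name as 'japanese manual $number : $title'.

    pdf_title is whatever the PDF's title field contained; it may be
    None, empty, or padded with stray whitespace.
    """
    # Collapse runs of whitespace and trim the ends.
    title = " ".join((pdf_title or "").split())
    return f"japanese manual {number} : {title or fallback}"


if __name__ == "__main__":
    print(item_title(2549, "  Some   Manual "))
    print(item_title(7, None))
```

The fallback covers the "may not be the best all the time" case, where a PDF carries no usable title.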
11:55 🔗 enowaldo_ has joined #archiveteam-bs
11:57 🔗 enowaldo has quit IRC (Read error: Operation timed out)
12:11 🔗 eientei95 Flashfire: https://transfer.notkiska.pw/NE9ZZ/outofprintarchive.txt There's the file links to all of outofprintarchive's scans
12:56 🔗 enowaldo_ has quit IRC (Ping timeout: 252 seconds)
13:07 🔗 godane SketchCow: i'm starting to upload some of the japanese manuals : https://archive.org/details/japanese-manual-2549
13:07 🔗 godane note that not all are japanese language cause i have noticed some english manuals on the website
13:08 🔗 godane but most are japanese
13:10 🔗 wyatt8740 has joined #archiveteam-bs
13:26 🔗 godane so here's where all the manuals are going : https://archive.org/search.php?query=subject%3A%22japanese+manuals%22
13:27 🔗 godane they're in godaneinbox for now but i put a keyword so people can see what i have added
13:54 🔗 Gfy has joined #archiveteam-bs
14:12 🔗 odemgi_ has quit IRC (Read error: Connection reset by peer)
14:12 🔗 odemgi_ has joined #archiveteam-bs
14:18 🔗 enowaldo has joined #archiveteam-bs
14:23 🔗 tomaspark has quit IRC (Read error: Operation timed out)
14:24 🔗 tomaspark has joined #archiveteam-bs
14:45 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
15:00 🔗 marked1 has quit IRC (Read error: Operation timed out)
15:01 🔗 enowaldo has quit IRC (Read error: Operation timed out)
15:08 🔗 marked1 has joined #archiveteam-bs
15:12 🔗 Zerote has quit IRC (Ping timeout: 600 seconds)
15:38 🔗 Zerote has joined #archiveteam-bs
16:28 🔗 betamax JAA: I see you noted the issue of infinite loops when archiving twitter with snscrape recently (https://github.com/JustAnotherArchivist/snscrape/issues/40)
16:29 🔗 betamax any known workarounds? (I have a list of ~450 EU election-related twitter accounts, and am now only realising how much I depend on snscrape!)
16:29 🔗 JAA Yup, and fixed it. I guess I should release a new version though. Maybe I'll do that later today.
16:29 🔗 JAA You can install snscrape from git if you're in a hurry.
16:31 🔗 betamax Is that the "pip3 install git+...." method of installation?
16:31 🔗 JAA Yep
16:31 🔗 betamax Thanks, will try it
16:31 🔗 JAA You might have to uninstall it first, not sure how pip handles that exactly.
16:32 🔗 JAA (And --user if you don't want a system-wide install and aren't using pyenv.)
16:32 🔗 betamax Also, I can't mention enough what a great tool snscrape is - fantastic work!
16:32 🔗 JAA Thanks, glad it's proving useful. :-)
16:32 🔗 JAA It can now also extract Twitter outlinks, by the way.
16:33 🔗 betamax At risk of causing wrath as I haven't yet looked at the docs, how do I do that?
16:33 🔗 JAA On another note, please add those Twitter accounts to the relevant wiki pages under https://archiveteam.org/index.php?title=ArchiveBot/2019_European_Union_parliamentary_elections
16:33 🔗 JAA Haha, cute, thinking there are docs.
16:34 🔗 JAA ;-)
16:34 🔗 JAA Check out https://github.com/JustAnotherArchivist/little-things/blob/master/snscrape-twitter-user
16:36 🔗 betamax Ah yes - will add the accounts to the wiki ASAP
16:37 🔗 betamax On another note, if I had a list of ~170 candidate sites (ie: personal campaign sites for a specific candidate) and wanted to do a full (recursive) crawl with archivebot, what's the best way to feed in all 170-odd sites?
16:37 🔗 betamax Or is that generally an overload and I should archive (and generate warcs) locally, and upload to IA after (although they won't get in wayback that way)
16:48 🔗 JAA Oh, awesome!
16:49 🔗 JAA Well, we can do !a < (undocumented, ops-only, do not use unless you know what you're doing), but then that job will get huge and might not finish in time. I don't know how much we can handle through AB in time. One way to find out.
16:50 🔗 JAA Candidates should probably get a separate page per country.
16:53 🔗 JAA Wait no, that can go on the "election" page for each country.
16:55 🔗 betamax Seeing as I don't have ops (or even voice right now) in archivebot, I've stuck the links (164 total) in pastebin for you/someone with ops to do: https://pastebin.com/raw/x3FCE89w
17:02 🔗 JAA I'll start feeding them through ArchiveBot and see how quickly that goes. All of those are UK candidates I presume?
17:05 🔗 HashbangI has quit IRC (Remote host closed the connection)
17:06 🔗 betamax Yes, all UK
17:08 🔗 betamax If there is an archivebot regex that can deal with wordpress-style calendars (where each day is its own page, even if there are no events for that day, stretching back for years and years) then I'd recommend using that - those calendars were a real problem when trying to archive candidate websites in the US midterms
17:08 🔗 betamax *deal with => ignore
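A sketch of the kind of regex betamax is asking about, not an actual ArchiveBot ignore set. It assumes the calendar pages live under a path such as `/events/2014-03-07/` or `/calendar/2014/03/`; real sites may use different URL layouts, so the pattern and the example URLs are illustrative only.

```python
import re

# Hypothetical pattern: an "event(s)" or "calendar" path segment followed
# by a year-month or year-month-day segment, and nothing else in the path.
CALENDAR_RE = re.compile(
    r"/(?:events?|calendar)/\d{4}[-/]\d{1,2}(?:[-/]\d{1,2})?/?(?:[?#]|$)"
)


def is_calendar_page(url: str) -> bool:
    """Return True if the URL looks like a per-day or per-month calendar page."""
    return CALENDAR_RE.search(url) is not None


if __name__ == "__main__":
    for u in (
        "https://example.org/events/2014-03-07/",
        "https://example.org/calendar/2014/03/",
        "https://example.org/events/annual-dinner/",
        "https://example.org/2014/03/07/my-post/",
    ):
        print(u, is_calendar_page(u))
```

Requiring the `events`/`calendar` prefix keeps ordinary date-based blog permalinks from being ignored along with the empty calendar days.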
17:21 🔗 HashbangI has joined #archiveteam-bs
18:27 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
18:47 🔗 JH88 has joined #archiveteam-bs
19:15 🔗 DogsRNice has joined #archiveteam-bs
19:30 🔗 enowaldo has joined #archiveteam-bs
19:50 🔗 eythian Hey, is there a way to archive things that link to Dropbox? The case I just saw was here: https://languagelog.ldc.upenn.edu/nll/?p=42639
19:50 🔗 zhongfu has quit IRC (Ping timeout: 615 seconds)
19:52 🔗 ealgase eythian: it's excluded from WBM
19:52 🔗 ealgase but archivebot probably still will get it
20:01 🔗 JAA Depends on how you queue it.
20:01 🔗 eythian I don't think I can drive archivebot in a way that'll fetch it
20:01 🔗 JAA If you !ao a Dropbox ?dl=1 URL, it won't grab the file.
20:02 🔗 JAA If you !a a site that links to Dropbox ?dl=1 URLs, it should grab the files.
20:03 🔗 eythian This links to a bitly link that links to a dl=0 URL
20:03 🔗 eythian Should be a law... :)
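A small sketch of turning a Dropbox `dl=0` share link into the `dl=1` form JAA mentions. It only rewrites the query string: it does not resolve the bitly redirect, and whether ArchiveBot then grabs the file still depends on how the job is queued, as described above. The function name `force_download` and the example link are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_download(url: str) -> str:
    """Return the URL with its dl query parameter set to 1."""
    parts = urlsplit(url)
    # Keep every other parameter, drop any existing dl value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(force_download("https://www.dropbox.com/s/abc/file.pdf?dl=0"))
```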
20:28 🔗 benjins has joined #archiveteam-bs
21:02 🔗 jut has quit IRC (Ping timeout: 252 seconds)
21:05 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
21:15 🔗 VerifiedJ has quit IRC (Quit: Leaving)
21:29 🔗 jut has joined #archiveteam-bs
21:42 🔗 BartoCH has joined #archiveteam-bs
21:42 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
21:57 🔗 killsushi has joined #archiveteam-bs
22:32 🔗 godane has quit IRC (Ping timeout: 268 seconds)
22:48 🔗 godane has joined #archiveteam-bs
22:56 🔗 jut has quit IRC (Ping timeout: 252 seconds)
22:57 🔗 enowaldo has joined #archiveteam-bs
23:06 🔗 wp494 has quit IRC (Ping timeout: 268 seconds)
23:08 🔗 wp494 has joined #archiveteam-bs
23:11 🔗 enowaldo has quit IRC (Ping timeout: 492 seconds)
23:27 🔗 BlueMax has joined #archiveteam-bs
23:40 🔗 Oddly has quit IRC (Read error: Operation timed out)
