#archiveteam-bs 2019-05-19,Sun

Time	Nickname	Message
00:01 ^🔗		enowaldo has joined #archiveteam-bs
00:10 ^🔗		icedice has joined #archiveteam-bs
00:12 ^🔗		enowaldo has quit IRC (Read error: Operation timed out)
00:16 ^🔗		DogsRNice has quit IRC (Quit: Leaving)
00:26 ^🔗		tomaspark has quit IRC (Read error: Connection reset by peer)
01:09 ^🔗	godane	SketchCow: so this may be interesting to you
01:09 ^🔗	godane	japanese manuals : http://gizport.jp/manual/common/6478
01:10 ^🔗	Flashfire	http://outofprintarchive.com godane
01:15 ^🔗	godane	i think i was grabbing those but cause of the remastering i stopped grabbing those
01:15 ^🔗	Flashfire	Remastering?
01:15 ^🔗		icedice has quit IRC (Ping timeout: 506 seconds)
01:16 ^🔗		Oddly2 has joined #archiveteam-bs
01:16 ^🔗	godane	they're doing rescans of magazines they scanned
01:17 ^🔗		tomaspark has joined #archiveteam-bs
01:32 ^🔗		Oddly2 has quit IRC (Ping timeout: 360 seconds)
01:51 ^🔗	godane	so metadata can be grabbed from those manual pages
01:52 ^🔗	godane	right now it's manual brute force but i can change that with this : http://gizport.jp/manual/1/?id=80380
01:53 ^🔗	godane	even though the manual/1/ should be manual/1797514/ i can get the right metadata with just manual/1/
01:53 ^🔗	godane	no need to get the product number right witch is good
01:54 ^🔗	godane	*which
02:03 ^🔗		enowaldo has joined #archiveteam-bs
02:07 ^🔗		enowaldo has quit IRC (Read error: Operation timed out)
02:08 ^🔗		Zerote has quit IRC (Ping timeout: 600 seconds)
03:15 ^🔗		qw3rty117 has joined #archiveteam-bs
03:21 ^🔗		qw3rty116 has quit IRC (Read error: Operation timed out)
03:23 ^🔗		enowaldo has joined #archiveteam-bs
03:27 ^🔗		enowaldo has quit IRC (Ping timeout: 252 seconds)
03:49 ^🔗		cfarquhar has quit IRC (The Lounge - https://thelounge.chat)
03:50 ^🔗		cfarquhar has joined #archiveteam-bs
03:52 ^🔗		odemgi_ has joined #archiveteam-bs
03:55 ^🔗		odemgi has quit IRC (Ping timeout: 252 seconds)
05:06 ^🔗		asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
05:06 ^🔗		markedL has quit IRC (The Lounge - https://thelounge.chat)
05:06 ^🔗		markedL has joined #archiveteam-bs
05:06 ^🔗		asdf0101 has joined #archiveteam-bs
05:12 ^🔗		wp494 has quit IRC (Read error: Operation timed out)
05:12 ^🔗		wp494 has joined #archiveteam-bs
05:12 ^🔗		enowaldo has joined #archiveteam-bs
05:20 ^🔗		enowaldo has quit IRC (Read error: Operation timed out)
05:22 ^🔗		enowaldo has joined #archiveteam-bs
05:38 ^🔗		enowaldo has quit IRC (Read error: Operation timed out)
05:48 ^🔗		killsushi has quit IRC (Quit: Leaving)
05:54 ^🔗		killsushi has joined #archiveteam-bs
06:40 ^🔗		Zerote has joined #archiveteam-bs
07:40 ^🔗		benjins has quit IRC (Read error: Connection reset by peer)
08:13 ^🔗		enowaldo has joined #archiveteam-bs
08:13 ^🔗		Zerote has quit IRC (Read error: Operation timed out)
08:16 ^🔗		enowaldo has quit IRC (Read error: Connection reset by peer)
08:19 ^🔗		Despatche has quit IRC (Quit: Read error: Connection reset by deer)
08:24 ^🔗		Zerote has joined #archiveteam-bs
08:28 ^🔗		killsushi has quit IRC (Quit: Leaving)
09:08 ^🔗		Gfy has quit IRC (Ping timeout: 265 seconds)
09:11 ^🔗		BlueMax has quit IRC (Quit: Leaving)
09:47 ^🔗		eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in)
09:54 ^🔗		eientei95 has joined #archiveteam-bs
11:12 ^🔗		VerifiedJ has joined #archiveteam-bs
11:18 ^🔗		Despatche has joined #archiveteam-bs
11:23 ^🔗		Oddly has joined #archiveteam-bs
11:50 ^🔗		enowaldo has joined #archiveteam-bs
11:51 ^🔗	godane	ok i found a better way to grab metadata from these pdfs
11:52 ^🔗	godane	these pdfs have title and author metadata in them so i grab the info from that
11:53 ^🔗	godane	it may not be the best all the time but better than nothing
11:53 ^🔗	godane	also these are going to be called japanese manual $number : $title1 or something
11:55 ^🔗		enowaldo_ has joined #archiveteam-bs
11:57 ^🔗		enowaldo has quit IRC (Read error: Operation timed out)
12:11 ^🔗	eientei95	Flashfire: https://transfer.notkiska.pw/NE9ZZ/outofprintarchive.txt There's the file links to all of outofprintarchive's scans
12:56 ^🔗		enowaldo_ has quit IRC (Ping timeout: 252 seconds)
13:07 ^🔗	godane	SketchCow: i'm starting to upload some of the japanese manuals : https://archive.org/details/japanese-manual-2549
13:07 ^🔗	godane	note that not all are japanese language cause i have noticed some english manuals on the website
13:08 ^🔗	godane	but most are japanese
13:10 ^🔗		wyatt8740 has joined #archiveteam-bs
13:26 ^🔗	godane	so here's where all the manuals are going : https://archive.org/search.php?query=subject%3A%22japanese+manuals%22
13:27 ^🔗	godane	they're in godaneinbox for now but i put a keyword so people can see what i have added
13:54 ^🔗		Gfy has joined #archiveteam-bs
14:12 ^🔗		odemgi_ has quit IRC (Read error: Connection reset by peer)
14:12 ^🔗		odemgi_ has joined #archiveteam-bs
14:18 ^🔗		enowaldo has joined #archiveteam-bs
14:23 ^🔗		tomaspark has quit IRC (Read error: Operation timed out)
14:24 ^🔗		tomaspark has joined #archiveteam-bs
14:45 ^🔗		wyatt8740 has quit IRC (Read error: Operation timed out)
15:00 ^🔗		marked1 has quit IRC (Read error: Operation timed out)
15:01 ^🔗		enowaldo has quit IRC (Read error: Operation timed out)
15:08 ^🔗		marked1 has joined #archiveteam-bs
15:12 ^🔗		Zerote has quit IRC (Ping timeout: 600 seconds)
15:38 ^🔗		Zerote has joined #archiveteam-bs
16:28 ^🔗	betamax	JAA: I see you noted the issue of infinite loops when archiving twitter with snscrape recently (https://github.com/JustAnotherArchivist/snscrape/issues/40)
16:29 ^🔗	betamax	any known workarounds? (I have a list of ~450 EU election-related twitter accounts, and am now only realising how much I depend on snscrape!)
16:29 ^🔗	JAA	Yup, and fixed it. I guess I should release a new version though. Maybe I'll do that later today.
16:29 ^🔗	JAA	You can install snscrape from git if you're in a hurry.
16:31 ^🔗	betamax	Is that the "pip3 install git+...." method of installation?
16:31 ^🔗	JAA	Yep
16:31 ^🔗	betamax	Thanks, will try it
16:31 ^🔗	JAA	You might have to uninstall it first, not sure how pip handles that exactly.
16:32 ^🔗	JAA	(And --user if you don't want a system-wide install and aren't using pyenv.)
16:32 ^🔗	betamax	Also, I can't mention enough what a great tool snscrape is - fantastic work!
16:32 ^🔗	JAA	Thanks, glad it's proving useful. :-)
16:32 ^🔗	JAA	It can now also extract Twitter outlinks, by the way.
16:33 ^🔗	betamax	At risk of causing wrath as I haven't yet looked at the docs, how do I do that?
16:33 ^🔗	JAA	On another note, please add those Twitter accounts to the relevant wiki pages under https://archiveteam.org/index.php?title=ArchiveBot/2019_European_Union_parliamentary_elections
16:33 ^🔗	JAA	Haha, cute, thinking there are docs.
16:34 ^🔗	JAA	;-)
16:34 ^🔗	JAA	Check out https://github.com/JustAnotherArchivist/little-things/blob/master/snscrape-twitter-user
16:36 ^🔗	betamax	Ah yes - will add the accounts to the wiki ASAP
16:37 ^🔗	betamax	On another note, if I had a list of ~170 candidate sites (ie: personal campaign sites for a specific candidate) and wanted to do a full (recursive) crawl with archivebot, what's the best way to feed in all 170-odd sites?
16:37 ^🔗	betamax	Or is that generally an overload and I should archive (and generate warcs) locally, and upload to IA after (although they won't get in wayback that way)
16:48 ^🔗	JAA	Oh, awesome!
16:49 ^🔗	JAA	Well, we can do !a < (undocumented, ops-only, do not use unless you know what you're doing), but then that job will get huge and might not finish in time. I don't know how much we can handle through AB in time. One way to find out.
16:50 ^🔗	JAA	Candidates should probably get a separate page per country.
16:53 ^🔗	JAA	Wait no, that can go on the "election" page for each country.
16:55 ^🔗	betamax	Seeing as I don't have ops (or even voice right now) in archivebot, I've stuck the links (164 total) in pastebin for you/someone with ops to do: https://pastebin.com/raw/x3FCE89w
17:02 ^🔗	JAA	I'll start feeding them through ArchiveBot and see how quickly that goes. All of those are UK candidates I presume?
17:05 ^🔗		HashbangI has quit IRC (Remote host closed the connection)
17:06 ^🔗	betamax	Yes, all UK
17:08 ^🔗	betamax	If there is an archivebot regex that can deal with wordpress-style calendars (where each day is its own page, even if there are no events for that day, stretching back for years and years) then I'd recommend using that - those calendars were a real problem when trying to archive candidate websites in the US midterms
17:08 ^🔗	betamax	*deal with => ignore
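
A rough sketch of the kind of ignore pattern betamax is asking about, written as a plain Python regex. The URL layouts it matches (`/events/YYYY-MM-DD/`, `/calendar/YYYY/MM/DD/`) are assumptions about how such calendar plugins lay out their per-day pages, not something stated in the channel, and `should_ignore` is a hypothetical helper, not part of ArchiveBot.

```python
import re

# Assumed URL shapes for per-day calendar pages; real sites may differ,
# so check a few sample URLs from the target site before relying on this.
CALENDAR_DAY = re.compile(
    r"^https?://[^/]+/(?:events?|calendar)/\d{4}[-/]\d{2}[-/]\d{2}/?(?:\?.*)?$"
)


def should_ignore(url: str) -> bool:
    """Return True if the URL looks like a single-day calendar page."""
    return CALENDAR_DAY.search(url) is not None


if __name__ == "__main__":
    for candidate in (
        "https://example.org/events/2019-05-19/",
        "https://example.org/calendar/2017/01/02",
        "https://example.org/about/",
    ):
        print(candidate, should_ignore(candidate))
```

The pattern is anchored on a full date so that the calendar's landing page (for example `/events/`) is still crawled and only the endless per-day pages are skipped.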
17:21 ^🔗		HashbangI has joined #archiveteam-bs
18:27 ^🔗		fuzzy8021 has quit IRC (Read error: Operation timed out)
18:47 ^🔗		JH88 has joined #archiveteam-bs
19:15 ^🔗		DogsRNice has joined #archiveteam-bs
19:30 ^🔗		enowaldo has joined #archiveteam-bs
19:50 ^🔗	eythian	Hey, is there a way to archive things that link to Dropbox? The case I just saw was here: https://languagelog.ldc.upenn.edu/nll/?p=42639
19:50 ^🔗		zhongfu has quit IRC (Ping timeout: 615 seconds)
19:52 ^🔗	ealgase	eythian: it's excluded from WBM
19:52 ^🔗	ealgase	but archivebot probably still will get it
20:01 ^🔗	JAA	Depends on how you queue it.
20:01 ^🔗	eythian	I don't think I can drive archivebot in a way that'll fetch it
20:01 ^🔗	JAA	If you !ao a Dropbox ?dl=1 URL, it won't grab the file.
20:02 ^🔗	JAA	If you !a a site that links to Dropbox ?dl=1 URLs, it should grab the files.
20:03 ^🔗	eythian	This links to a bitly link that links to a dl=0 URL
20:03 ^🔗	eythian	Should be a law... :)
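
For the dl=0 case eythian describes, one possible workaround is to rewrite the share link to its ?dl=1 form before queueing it, since JAA notes above that the ?dl=1 URLs are the ones that yield the file. A minimal sketch using only the standard library; `force_download` is a hypothetical helper name, the example URL is made up, and resolving the bitly redirect first is left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_download(url: str) -> str:
    """Return the URL with its dl query parameter set to 1."""
    parts = urlsplit(url)
    # Drop any existing dl parameter, keep everything else in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Made-up share link, for illustration only.
    print(force_download("https://www.dropbox.com/s/abc123/file.pdf?dl=0"))
```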
20:28 ^🔗		benjins has joined #archiveteam-bs
21:02 ^🔗		jut has quit IRC (Ping timeout: 252 seconds)
21:05 ^🔗		BartoCH has quit IRC (Ping timeout: 615 seconds)
21:15 ^🔗		VerifiedJ has quit IRC (Quit: Leaving)
21:29 ^🔗		jut has joined #archiveteam-bs
21:42 ^🔗		BartoCH has joined #archiveteam-bs
21:42 ^🔗		enowaldo has quit IRC (Ping timeout: 252 seconds)
21:57 ^🔗		killsushi has joined #archiveteam-bs
22:32 ^🔗		godane has quit IRC (Ping timeout: 268 seconds)
22:48 ^🔗		godane has joined #archiveteam-bs
22:56 ^🔗		jut has quit IRC (Ping timeout: 252 seconds)
22:57 ^🔗		enowaldo has joined #archiveteam-bs
23:06 ^🔗		wp494 has quit IRC (Ping timeout: 268 seconds)
23:08 ^🔗		wp494 has joined #archiveteam-bs
23:11 ^🔗		enowaldo has quit IRC (Ping timeout: 492 seconds)
23:27 ^🔗		BlueMax has joined #archiveteam-bs
23:40 ^🔗		Oddly has quit IRC (Read error: Operation timed out)