Time |
Nickname |
Message |
00:01
🔗
|
|
enowaldo has joined #archiveteam-bs |
00:10
🔗
|
|
icedice has joined #archiveteam-bs |
00:12
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
00:16
🔗
|
|
DogsRNice has quit IRC (Quit: Leaving) |
00:26
🔗
|
|
tomaspark has quit IRC (Read error: Connection reset by peer) |
01:09
🔗
|
godane |
SketchCow: so this may be interesting to you |
01:09
🔗
|
godane |
japanese manuals : http://gizport.jp/manual/common/6478 |
01:10
🔗
|
Flashfire |
http://outofprintarchive.com godane |
01:15
🔗
|
godane |
i think i was grabbing those but because of the remastering i stopped |
01:15
🔗
|
Flashfire |
Remastering? |
01:15
🔗
|
|
icedice has quit IRC (Ping timeout: 506 seconds) |
01:16
🔗
|
|
Oddly2 has joined #archiveteam-bs |
01:16
🔗
|
godane |
they're doing rescans of magazines they scanned |
01:17
🔗
|
|
tomaspark has joined #archiveteam-bs |
01:32
🔗
|
|
Oddly2 has quit IRC (Ping timeout: 360 seconds) |
01:51
🔗
|
godane |
so metadata can be grabbed from those manual pages |
01:52
🔗
|
godane |
right now it's manual brute force but i can change that with this : http://gizport.jp/manual/1/?id=80380 |
01:53
🔗
|
godane |
even though the manual/1/ should be manual/1797514/ i can get the right metadata with just manual/1/ |
01:53
🔗
|
godane |
no need to get the product number right witch is good |
01:54
🔗
|
godane |
*which |
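A minimal sketch of the URL trick described above, assuming only what the log shows: the product-number path segment can be left as the placeholder `1`, and the `id` query parameter selects the manual. The function name `manual_metadata_url` is hypothetical and not from godane's actual script.

```python
from urllib.parse import urlencode

BASE = "http://gizport.jp/manual"  # base path taken from the URLs in the log


def manual_metadata_url(manual_id, product_number=1):
    """Build a manual page URL for the given id.

    product_number defaults to the placeholder 1, since the log reports
    that the correct product number is not needed to get the metadata.
    """
    return f"{BASE}/{product_number}/?{urlencode({'id': manual_id})}"


print(manual_metadata_url(80380))
```

With the id from the log, this reproduces the URL godane posted.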
02:03
🔗
|
|
enowaldo has joined #archiveteam-bs |
02:07
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
02:08
🔗
|
|
Zerote has quit IRC (Ping timeout: 600 seconds) |
03:15
🔗
|
|
qw3rty117 has joined #archiveteam-bs |
03:21
🔗
|
|
qw3rty116 has quit IRC (Read error: Operation timed out) |
03:23
🔗
|
|
enowaldo has joined #archiveteam-bs |
03:27
🔗
|
|
enowaldo has quit IRC (Ping timeout: 252 seconds) |
03:49
🔗
|
|
cfarquhar has quit IRC (The Lounge - https://thelounge.chat) |
03:50
🔗
|
|
cfarquhar has joined #archiveteam-bs |
03:52
🔗
|
|
odemgi_ has joined #archiveteam-bs |
03:55
🔗
|
|
odemgi has quit IRC (Ping timeout: 252 seconds) |
05:06
🔗
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
05:06
🔗
|
|
markedL has quit IRC (The Lounge - https://thelounge.chat) |
05:06
🔗
|
|
markedL has joined #archiveteam-bs |
05:06
🔗
|
|
asdf0101 has joined #archiveteam-bs |
05:12
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
05:12
🔗
|
|
wp494 has joined #archiveteam-bs |
05:12
🔗
|
|
enowaldo has joined #archiveteam-bs |
05:20
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
05:22
🔗
|
|
enowaldo has joined #archiveteam-bs |
05:38
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
05:48
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
05:54
🔗
|
|
killsushi has joined #archiveteam-bs |
06:40
🔗
|
|
Zerote has joined #archiveteam-bs |
07:40
🔗
|
|
benjins has quit IRC (Read error: Connection reset by peer) |
08:13
🔗
|
|
enowaldo has joined #archiveteam-bs |
08:13
🔗
|
|
Zerote has quit IRC (Read error: Operation timed out) |
08:16
🔗
|
|
enowaldo has quit IRC (Read error: Connection reset by peer) |
08:19
🔗
|
|
Despatche has quit IRC (Quit: Read error: Connection reset by deer) |
08:24
🔗
|
|
Zerote has joined #archiveteam-bs |
08:28
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
09:08
🔗
|
|
Gfy has quit IRC (Ping timeout: 265 seconds) |
09:11
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
09:47
🔗
|
|
eientei95 has quit IRC (Quit: ZNC 1.7.0+deb0+bionic1 - https://znc.in) |
09:54
🔗
|
|
eientei95 has joined #archiveteam-bs |
11:12
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
11:18
🔗
|
|
Despatche has joined #archiveteam-bs |
11:23
🔗
|
|
Oddly has joined #archiveteam-bs |
11:50
🔗
|
|
enowaldo has joined #archiveteam-bs |
11:51
🔗
|
godane |
ok i found a better way to grab metadata from these pdfs |
11:52
🔗
|
godane |
these pdfs have title and author metadata in them so i grab the info from that |
11:53
🔗
|
godane |
it may not be the best all the time but better than nothing |
11:53
🔗
|
godane |
also these are going to be called japanese manual $number : $title1 or something |
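A rough sketch of the naming scheme mentioned above, assuming the title string has already been read from the PDF's embedded metadata by some other tool. The function name `item_title`, the fallback for PDFs without a title, and the sample values are all hypothetical; only the `japanese manual $number : $title` shape comes from the log.

```python
def item_title(number, pdf_title=None):
    """Format an item title as 'japanese manual <number> : <title>'.

    Falls back to the number alone when the PDF carries no usable title
    (an assumed behaviour, not something stated in the log).
    """
    title = (pdf_title or "").strip()
    if not title:
        return f"japanese manual {number}"
    return f"japanese manual {number} : {title}"


print(item_title(1234, "  Example Camera Manual  "))
print(item_title(1234))
```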
11:55
🔗
|
|
enowaldo_ has joined #archiveteam-bs |
11:57
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
12:11
🔗
|
eientei95 |
Flashfire: https://transfer.notkiska.pw/NE9ZZ/outofprintarchive.txt There's the file links to all of outofprintarchive's scans |
12:56
🔗
|
|
enowaldo_ has quit IRC (Ping timeout: 252 seconds) |
13:07
🔗
|
godane |
SketchCow: i'm starting to upload some of the japanese manuals : https://archive.org/details/japanese-manual-2549 |
13:07
🔗
|
godane |
note that not all are japanese language cause i have noticed some english manuals on the website |
13:08
🔗
|
godane |
but most are japanese |
13:10
🔗
|
|
wyatt8740 has joined #archiveteam-bs |
13:26
🔗
|
godane |
so here's where all the manuals are going : https://archive.org/search.php?query=subject%3A%22japanese+manuals%22 |
13:27
🔗
|
godane |
they're in godaneinbox for now but i put a keyword so people can see what i have added |
13:54
🔗
|
|
Gfy has joined #archiveteam-bs |
14:12
🔗
|
|
odemgi_ has quit IRC (Read error: Connection reset by peer) |
14:12
🔗
|
|
odemgi_ has joined #archiveteam-bs |
14:18
🔗
|
|
enowaldo has joined #archiveteam-bs |
14:23
🔗
|
|
tomaspark has quit IRC (Read error: Operation timed out) |
14:24
🔗
|
|
tomaspark has joined #archiveteam-bs |
14:45
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
15:00
🔗
|
|
marked1 has quit IRC (Read error: Operation timed out) |
15:01
🔗
|
|
enowaldo has quit IRC (Read error: Operation timed out) |
15:08
🔗
|
|
marked1 has joined #archiveteam-bs |
15:12
🔗
|
|
Zerote has quit IRC (Ping timeout: 600 seconds) |
15:38
🔗
|
|
Zerote has joined #archiveteam-bs |
16:28
🔗
|
betamax |
JAA: I see you noted the issue of infinite loops when archiving twitter with snscrape recently (https://github.com/JustAnotherArchivist/snscrape/issues/40) |
16:29
🔗
|
betamax |
any known workarounds? (I have a list of ~450 EU election-related twitter accounts, and am now only realising how much I depend on snscrape!) |
16:29
🔗
|
JAA |
Yup, and fixed it. I guess I should release a new version though. Maybe I'll do that later today. |
16:29
🔗
|
JAA |
You can install snscrape from git if you're in a hurry. |
16:31
🔗
|
betamax |
Is that the "pip3 install git+...." method of installation? |
16:31
🔗
|
JAA |
Yep |
16:31
🔗
|
betamax |
Thanks, will try it |
16:31
🔗
|
JAA |
You might have to uninstall it first, not sure how pip handles that exactly. |
16:32
🔗
|
JAA |
(And --user if you don't want a system-wide install and aren't using pyenv.) |
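A small sketch of the install command discussed above, written as a Python helper that only assembles and prints the command. It assumes the repository URL implied by the issue link earlier in the log; the helper name `pip_git_install_args` is hypothetical. Whether an existing install needs to be uninstalled first is left open, as in the conversation.

```python
import shlex

# Repository URL inferred from the issue link in the log (assumption).
SNSCRAPE_REPO = "https://github.com/JustAnotherArchivist/snscrape.git"


def pip_git_install_args(repo_url, user_install=True):
    """Return the argument list for installing a package straight from git."""
    args = ["pip3", "install"]
    if user_install:
        # Per-user install instead of a system-wide one.
        args.append("--user")
    args.append("git+" + repo_url)
    return args


print(shlex.join(pip_git_install_args(SNSCRAPE_REPO)))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run` if running it from Python is preferred.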
16:32
🔗
|
betamax |
Also, I can't mention enough what a great tool snscrape is - fantastic work! |
16:32
🔗
|
JAA |
Thanks, glad it's proving useful. :-) |
16:32
🔗
|
JAA |
It can now also extract Twitter outlinks, by the way. |
16:33
🔗
|
betamax |
At risk of causing wrath as I haven't yet looked at the docs, how do I do that? |
16:33
🔗
|
JAA |
On another note, please add those Twitter accounts to the relevant wiki pages under https://archiveteam.org/index.php?title=ArchiveBot/2019_European_Union_parliamentary_elections |
16:33
🔗
|
JAA |
Haha, cute, thinking there are docs. |
16:34
🔗
|
JAA |
;-) |
16:34
🔗
|
JAA |
Check out https://github.com/JustAnotherArchivist/little-things/blob/master/snscrape-twitter-user |
16:36
🔗
|
betamax |
Ah yes - will add the accounts to the wiki ASAP |
16:37
🔗
|
betamax |
On another note, if I had a list of ~170 candidate sites (ie: personal campaign sites for a specific candidate) and wanted to do a full (recursive) crawl with archivebot, what's the best way to feed in all 170-odd sites? |
16:37
🔗
|
betamax |
Or is that generally an overload and I should archive (and generate warcs) locally, and upload to IA after (although they won't get in wayback that way) |
16:48
🔗
|
JAA |
Oh, awesome! |
16:49
🔗
|
JAA |
Well, we can do !a < (undocumented, ops-only, do not use unless you know what you're doing), but then that job will get huge and might not finish in time. I don't know how much we can handle through AB in time. One way to find out. |
16:50
🔗
|
JAA |
Candidates should probably get a separate page per country. |
16:53
🔗
|
JAA |
Wait no, that can go on the "election" page for each country. |
16:55
🔗
|
betamax |
Seeing as I don't have ops (or even voice right now) in archivebot, I've stuck the links (164 total) in pastebin for you/someone with ops to do: https://pastebin.com/raw/x3FCE89w |
17:02
🔗
|
JAA |
I'll start feeding them through ArchiveBot and see how quickly that goes. All of those are UK candidates I presume? |
17:05
🔗
|
|
HashbangI has quit IRC (Remote host closed the connection) |
17:06
🔗
|
betamax |
Yes, all UK |
17:08
🔗
|
betamax |
If there is an archivebot regex that can deal with wordpress-style calendars (where each day is its own page, even if there are no events for that day, stretching back for years and years) then I'd recommend using that - those calendars were a real problem when trying to archive candidate websites in the US midterms |
17:08
🔗
|
betamax |
*deal with => ignore |
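To illustrate the kind of ignore pattern being asked about, here is a hedged sketch of a regex that matches per-day or per-month calendar pages. The URL shape (`/calendar/YYYY-MM-DD/` or `/events/YYYY-MM/`) and the `candidate.example` host are assumptions for illustration; real calendar plugins differ, and the pattern would have to be adapted to each site before being used as an ignore.

```python
import re

# Hypothetical calendar URL shape: /calendar/2019-05-23/ or /events/2014-01/
CALENDAR_PAGE = re.compile(
    r"^https?://[^/]+/(?:calendar|events)/\d{4}-\d{2}(?:-\d{2})?/?$"
)


def is_calendar_page(url):
    """Return True if the URL looks like an auto-generated calendar page."""
    return CALENDAR_PAGE.match(url) is not None


for url in [
    "https://candidate.example/events/2019-05-23/",
    "https://candidate.example/calendar/2014-01/",
    "https://candidate.example/about/",
]:
    print(url, is_calendar_page(url))
```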
17:21
🔗
|
|
HashbangI has joined #archiveteam-bs |
18:27
🔗
|
|
fuzzy8021 has quit IRC (Read error: Operation timed out) |
18:47
🔗
|
|
JH88 has joined #archiveteam-bs |
19:15
🔗
|
|
DogsRNice has joined #archiveteam-bs |
19:30
🔗
|
|
enowaldo has joined #archiveteam-bs |
19:50
🔗
|
eythian |
Hey, is there a way to archive things that link to Dropbox? The case I just saw was here: https://languagelog.ldc.upenn.edu/nll/?p=42639 |
19:50
🔗
|
|
zhongfu has quit IRC (Ping timeout: 615 seconds) |
19:52
🔗
|
ealgase |
eythian: it's excluded from WBM |
19:52
🔗
|
ealgase |
but archivebot probably still will get it |
20:01
🔗
|
JAA |
Depends on how you queue it. |
20:01
🔗
|
eythian |
I don't think I can drive archivebot in a way that'll fetch it |
20:01
🔗
|
JAA |
If you !ao a Dropbox ?dl=1 URL, it won't grab the file. |
20:02
🔗
|
JAA |
If you !a a site that links to Dropbox ?dl=1 URLs, it should grab the files. |
20:03
🔗
|
eythian |
This links to a bitly link that links to a dl=0 URL |
20:03
🔗
|
eythian |
Should be a law... :) |
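A minimal sketch of the `dl=0` versus `dl=1` distinction above: it rewrites a Dropbox share URL so the `dl` parameter is `1`, the form JAA says gets the file grabbed when linked from a crawled page. The function name and the sample URL are hypothetical, and resolving the bitly redirect first is a separate step not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url):
    """Return the URL with its dl query parameter set to 1."""
    parts = urlsplit(url)
    # Drop any existing dl parameter, keep everything else untouched.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(force_dropbox_download("https://www.dropbox.com/s/abc123/example.pdf?dl=0"))
```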
20:28
🔗
|
|
benjins has joined #archiveteam-bs |
21:02
🔗
|
|
jut has quit IRC (Ping timeout: 252 seconds) |
21:05
🔗
|
|
BartoCH has quit IRC (Ping timeout: 615 seconds) |
21:15
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
21:29
🔗
|
|
jut has joined #archiveteam-bs |
21:42
🔗
|
|
BartoCH has joined #archiveteam-bs |
21:42
🔗
|
|
enowaldo has quit IRC (Ping timeout: 252 seconds) |
21:57
🔗
|
|
killsushi has joined #archiveteam-bs |
22:32
🔗
|
|
godane has quit IRC (Ping timeout: 268 seconds) |
22:48
🔗
|
|
godane has joined #archiveteam-bs |
22:56
🔗
|
|
jut has quit IRC (Ping timeout: 252 seconds) |
22:57
🔗
|
|
enowaldo has joined #archiveteam-bs |
23:06
🔗
|
|
wp494 has quit IRC (Ping timeout: 268 seconds) |
23:08
🔗
|
|
wp494 has joined #archiveteam-bs |
23:11
🔗
|
|
enowaldo has quit IRC (Ping timeout: 492 seconds) |
23:27
🔗
|
|
BlueMax has joined #archiveteam-bs |
23:40
🔗
|
|
Oddly has quit IRC (Read error: Operation timed out) |