Time |
Nickname |
Message |
00:14
🔗
|
|
bRick5773 has quit IRC (Quit: Leaving.) |
00:18
🔗
|
|
BlueMaxim has joined #archiveteam |
01:43
🔗
|
|
LastNinja has quit IRC (Ping timeout: 260 seconds) |
01:47
🔗
|
|
ZexaronS- has joined #archiveteam |
01:47
🔗
|
|
ZexaronS has quit IRC (Read error: Connection reset by peer) |
01:53
🔗
|
|
Pixi has quit IRC (Ping timeout: 255 seconds) |
01:55
🔗
|
|
Pixi has joined #archiveteam |
01:56
🔗
|
|
db48x has joined #archiveteam |
02:01
🔗
|
|
Svekla has joined #archiveteam |
02:01
🔗
|
|
Burak has quit IRC (Read error: Connection reset by peer) |
02:33
🔗
|
|
Stilett0 is now known as Stiletto |
02:46
🔗
|
|
Odd0002 has quit IRC (Quit: ZNC - http://znc.in) |
02:53
🔗
|
|
Odd0002 has joined #archiveteam |
04:03
🔗
|
|
qw3rty112 has joined #archiveteam |
04:07
🔗
|
|
qw3rty111 has quit IRC (Read error: Operation timed out) |
04:14
🔗
|
|
pizzaiolo has quit IRC (pizzaiolo) |
05:01
🔗
|
|
qw3rty113 has joined #archiveteam |
05:05
🔗
|
|
qw3rty112 has quit IRC (Read error: Operation timed out) |
06:43
🔗
|
|
RichardG_ has quit IRC (Ping timeout: 255 seconds) |
07:08
🔗
|
|
kimmer12 has joined #archiveteam |
07:14
🔗
|
|
kimmer1 has quit IRC (Read error: Operation timed out) |
07:23
🔗
|
|
kimmer1 has joined #archiveteam |
07:28
🔗
|
|
kimmer13 has joined #archiveteam |
07:30
🔗
|
|
kimmer12 has quit IRC (Ping timeout: 633 seconds) |
07:34
🔗
|
|
kimmer1 has quit IRC (Read error: Operation timed out) |
07:38
🔗
|
|
kimmer1 has joined #archiveteam |
07:44
🔗
|
|
kimmer13 has quit IRC (Ping timeout: 633 seconds) |
08:37
🔗
|
|
ZexaronS- has quit IRC (Read error: Connection reset by peer) |
08:38
🔗
|
|
ZexaronS- has joined #archiveteam |
09:35
🔗
|
|
schbirid has joined #archiveteam |
11:12
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
11:31
🔗
|
|
Uzerus_ has quit IRC (Quit: Page closed) |
11:39
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:12
🔗
|
|
pizzaiolo has joined #archiveteam |
12:17
🔗
|
|
jschwart has joined #archiveteam |
12:30
🔗
|
|
bRick5772 has joined #archiveteam |
12:33
🔗
|
|
odemg has quit IRC (Quit: Leaving) |
12:36
🔗
|
|
Ctrl has joined #archiveteam |
12:41
🔗
|
|
kimmer1 has quit IRC (Remote host closed the connection) |
12:42
🔗
|
|
kimmer1 has joined #archiveteam |
12:46
🔗
|
|
icedice has joined #archiveteam |
12:50
🔗
|
|
ZexaronS- has quit IRC (Quit: Leaving) |
13:12
🔗
|
|
odemg has joined #archiveteam |
13:31
🔗
|
|
LastNinja has joined #archiveteam |
14:04
🔗
|
|
Spaghetto has joined #archiveteam |
14:07
🔗
|
|
Spaghetto has quit IRC (Client Quit) |
14:31
🔗
|
|
RichardG has joined #archiveteam |
14:39
🔗
|
|
icedice2 has joined #archiveteam |
14:46
🔗
|
|
icedice has quit IRC (Ping timeout: 506 seconds) |
15:00
🔗
|
Uzerus |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
15:20
🔗
|
|
icedice2 has quit IRC (Quit: Leaving) |
15:40
🔗
|
|
kimmer12 has joined #archiveteam |
15:47
🔗
|
|
kimmer1 has quit IRC (Ping timeout: 632 seconds) |
16:08
🔗
|
|
seatsea has joined #archiveteam |
17:15
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
17:31
🔗
|
Vito` |
hey |
17:32
🔗
|
Vito` |
so Lytro unceremoniously shut down pictures.lytro.com, their hosting for their lightfield images, also breaking all old embeds |
17:32
🔗
|
Vito` |
https://www.theverge.com/2017/12/6/16742314/lytro-focus-photos-support-cameras-illum |
17:33
🔗
|
LastNinja |
reading now |
17:33
🔗
|
LastNinja |
thanks |
17:33
🔗
|
Vito` |
IA captured a lot of embeds and some of the galleries, but they're all broken, because the viewer JS reads in additional URLs from a JSON manifest, which IA doesn't know how to parse |
17:34
🔗
|
Vito` |
the viewer JS as captured by IA seems to work if all the files referenced in the JSON are present |
17:35
🔗
|
Vito` |
so I downloaded all the pictures.lytro.com and lfe-cdn.lytro.com from IA, pulled out all the hosted UUIDs, downloaded all the JSON files for each captured embed or gallery, and generated all the CDN URLs for all the component images |
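The "pulled out all the hosted UUIDs" step could be sketched roughly as below. This is a minimal sketch, not the commands Vito` actually ran (those are on the wiki page): it assumes the IDs follow the standard 8-4-4-4-12 hexadecimal UUID layout, and `extract_uuids` is a hypothetical helper name.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout. Assumption: the hosted IDs
# in the captured URLs use this form; the log does not confirm it.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_uuids(urls):
    """Return the unique UUIDs found in an iterable of URLs, in first-seen order."""
    seen = {}  # dict keeps insertion order, so duplicates are dropped stably
    for url in urls:
        for match in UUID_RE.findall(url):
            seen.setdefault(match.lower(), None)
    return list(seen)
```

Lower-casing before de-duplicating means the same ID captured with different letter case is only counted once.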
17:35
🔗
|
Vito` |
I have ~2.5GB of files off the CDN that IA doesn't have from just that, but the final list of JSON-referenced files is 1.2M URLs |
17:36
🔗
|
Vito` |
does AT want my WARCs and this list of URLs? I feel like it could be captured faster without me doing it myself with wget |
17:38
🔗
|
Vito` |
I can make a wiki page with the details if someone wants to check my work |
17:40
🔗
|
|
yy has joined #archiveteam |
17:40
🔗
|
|
yy has quit IRC (Client Quit) |
17:48
🔗
|
Somebody2 |
Vito`: yes, please upload them. I'm not sure if anyone will speak up to *use* them right now, but better to have it in any case. |
17:51
🔗
|
Uzerus |
ewww... |
17:51
🔗
|
Uzerus |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
17:52
🔗
|
Uzerus |
wiki... |
17:53
🔗
|
Vito` |
Somebody2: does that mean I should start on that 1.2M URL list myself then? |
17:53
🔗
|
Vito` |
I didn't know if there was like a warrior queue or something |
17:55
🔗
|
Uzerus |
it can be downloaded by archivebot, just list all urls in one file, one per line |
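A list in the "one URL per line" shape Uzerus describes could be produced with something like the following. It is only a sketch: `write_url_list` and the output path are hypothetical, and dropping blank lines and duplicates is an assumption about what is convenient, not a stated archivebot requirement.

```python
def write_url_list(urls, path):
    """Write unique, non-empty URLs to `path`, one per line; return the count."""
    unique = []
    seen = set()
    for url in urls:
        url = url.strip()
        # Skip blanks and anything already written, keeping first-seen order.
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    # newline="\n" keeps Unix line endings regardless of platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for url in unique:
            handle.write(url + "\n")
    return len(unique)
```

The resulting file can then be uploaded anywhere publicly reachable and linked in the channel.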
17:57
🔗
|
Vito` |
Thanks |
17:57
🔗
|
Uzerus |
upload that file somewhere and give us link to that |
17:57
🔗
|
Somebody2 |
Vito`: yes, that's probably a good idea |
18:18
🔗
|
|
schbirid has joined #archiveteam |
18:33
🔗
|
|
nertzy has joined #archiveteam |
18:36
🔗
|
|
du_ has joined #archiveteam |
18:46
🔗
|
Somebody2 |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD (it's changed since I was last told) |
18:50
🔗
|
SketchCow |
-------------------------------------------------------- |
18:50
🔗
|
SketchCow |
FOS WAS DOWN DUE TO A POWER OUTAGE, NOW DESPERATELY TRYING TO CLAW BACK |
18:50
🔗
|
SketchCow |
-------------------------------------------------------- |
19:10
🔗
|
Vito` |
okay I think I did this right |
19:10
🔗
|
Vito` |
http://www.archiveteam.org/index.php?title=Lytro has details on what happened and what I did |
19:10
🔗
|
Vito` |
https://archive.org/details/lytro-hosted-partial-missing-files has my WARCs plus the file with the 1.2M other URLs I have not yet downloaded |
19:11
🔗
|
Vito` |
it's tagged "archiveteam" so it can be moved/ingested later |
19:12
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 248 seconds) |
19:12
🔗
|
|
Mateon1 has joined #archiveteam |
19:28
🔗
|
|
nwithan8 has joined #archiveteam |
19:32
🔗
|
Uzerus |
|
19:34
🔗
|
|
kimmer12 has quit IRC (Quit: Yaaic - Yet another Android IRC client - http://www.yaaic.org) |
19:34
🔗
|
|
kimmer1 has joined #archiveteam |
19:34
🔗
|
|
Stilett0 has joined #archiveteam |
19:35
🔗
|
arkiver |
Vito`: how did you create these WARCs? |
19:35
🔗
|
Vito` |
arkiver: wget. command lines are in the wiki page |
19:37
🔗
|
arkiver |
it's that in the records, WARC-Target-URI has a value like <URL> |
19:37
🔗
|
arkiver |
which should not have the < and > |
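The difference arkiver points out is only the enclosing `<` and `>` around the header value. A minimal sketch of normalising a single value follows; `normalize_target_uri` is a hypothetical helper, and it works on one header string, not on a whole WARC file (rewriting real records would also have to deal with per-record compression).

```python
def normalize_target_uri(value):
    """Strip one pair of enclosing angle brackets from a WARC-Target-URI value."""
    value = value.strip()
    # Only unwrap when both brackets are present, so ordinary URLs pass through.
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    return value
```

Values without brackets are returned unchanged, so the function can be applied to records from either kind of wget build.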
19:38
🔗
|
arkiver |
I'd say the WARCs are invalid |
19:38
🔗
|
arkiver |
which wget version did you use |
19:39
🔗
|
Vito` |
GNU Wget 1.19.2 built on darwin15.6.0. |
19:39
🔗
|
Vito` |
-cares +digest -gpgme +https +ipv6 -iri +large-file -metalink -nls |
19:39
🔗
|
Vito` |
+ntlm +opie -psl +ssl/openssl |
19:41
🔗
|
Vito` |
https://github.com/iipc/warc-specifications/issues/23 |
19:41
🔗
|
Vito` |
looks like the WARC standard originally had it as <URL> but no-one implemented that so it was removed |
19:42
🔗
|
Vito` |
I guess no-one except wget? |
19:43
🔗
|
arkiver |
versions of wget after the version we use yes |
19:43
🔗
|
arkiver |
wget-lua doesn't have the problem |
19:44
🔗
|
|
nwithan8 has quit IRC (Quit: Page closed) |
19:47
🔗
|
Vito` |
I'll pull down the 1.2M URL list on a wget 1.16 machine, which doesn't put angle brackets around the WARC-Target-URI |
19:47
🔗
|
Vito` |
is it actually a problem for the other WARCs? Do I need to redownload them? |
19:48
🔗
|
arkiver |
we could in theory fix the WARCs |
19:48
🔗
|
arkiver |
but if the URLs are still available it might be better to just archive them again |
19:48
🔗
|
arkiver |
and yes, it's a problem with the WARCs |
19:48
🔗
|
arkiver |
the cdx now takes '<https' as the domain |
19:52
🔗
|
Vito` |
the cdx files I uploaded don't have angle brackets |
19:53
🔗
|
arkiver |
those cdx files are not used |
19:54
🔗
|
arkiver |
IA always derives its own CDX files |
19:54
🔗
|
arkiver |
feel free to upload CDX files, but they're not used by IA |
19:56
🔗
|
Vito` |
ah |
19:58
🔗
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
20:28
🔗
|
Vito` |
looks like wget 1.18 and later might generate WARC-Target-URI with brackets: https://savannah.gnu.org/bugs/?47281 |
20:28
🔗
|
Vito` |
1.18 was released in June 2016 |
21:01
🔗
|
|
BlueMaxim has joined #archiveteam |
21:22
🔗
|
Somebody2 |
JAA: thanks, looking |
21:38
🔗
|
godane |
SketchCow: i got lucky to not be uploading anything to that when that happened |
21:52
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:02
🔗
|
Somebody2 |
JAA: found it: https://archive.org/details/mininova_20170323_wpull |
22:04
🔗
|
JAA |
Somebody2: Wrong channel ;-) But yes, that's the one. |
22:06
🔗
|
godane |
so i got 6 new tapes from savers today |
22:07
🔗
|
godane |
one is called Bellydance Fitness for Beginners |
22:08
🔗
|
godane |
i also got YogaKids vhs tape |
22:09
🔗
|
godane |
1 Quack Pack tape, 1 Timon & Pumbaa tape, 1 Alvin and the chipmunks tape, and 1 felix cat tape |
22:29
🔗
|
|
MMovie has quit IRC (Read error: Connection reset by peer) |
22:29
🔗
|
|
MMovie has joined #archiveteam |
22:44
🔗
|
|
jschwart has quit IRC (Quit: Konversation terminated!) |
22:52
🔗
|
|
pizzaiolo has quit IRC (pizzaiolo) |
22:54
🔗
|
|
RichardG_ has joined #archiveteam |
22:54
🔗
|
|
ndiddy_ has joined #archiveteam |
22:55
🔗
|
|
K4k_ has joined #archiveteam |
22:57
🔗
|
|
ppsym has joined #archiveteam |
23:02
🔗
|
|
tuluu_ has joined #archiveteam |
23:05
🔗
|
|
RichardG has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
Ctrl has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
Nemo_bis has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
MrDignity has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
ndiddy has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
espes__ has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
tuluu has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
PurpleSym has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
K4k has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
Rai-chan has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
i0npulse has quit IRC (se.hub irc.underworld.no) |
23:05
🔗
|
|
medowar has quit IRC (se.hub irc.underworld.no) |
23:08
🔗
|
|
espes___ has joined #archiveteam |
23:20
🔗
|
|
ppsym is now known as PurpleSym |
23:45
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
23:46
🔗
|
|
BlueMaxim has joined #archiveteam |
23:58
🔗
|
|
MrDignity has joined #archiveteam |