#archiveteam 2017-12-17,Sun


Who What When
***bRick5773 has quit IRC (Quit: Leaving.)
BlueMaxim has joined #archiveteam
[00:14]
.................. (idle for 1h25mn)
LastNinja has quit IRC (Ping timeout: 260 seconds)
ZexaronS- has joined #archiveteam
ZexaronS has quit IRC (Read error: Connection reset by peer)
[01:43]
Pixi has quit IRC (Ping timeout: 255 seconds)
Pixi has joined #archiveteam
db48x has joined #archiveteam
[01:53]
Svekla has joined #archiveteam
Burak has quit IRC (Read error: Connection reset by peer)
[02:01]
....... (idle for 32mn)
Stilett0 is now known as Stiletto [02:33]
Odd0002 has quit IRC (Quit: ZNC - http://znc.in) [02:46]
Odd0002 has joined #archiveteam [02:53]
............... (idle for 1h10mn)
qw3rty112 has joined #archiveteam
qw3rty111 has quit IRC (Read error: Operation timed out)
[04:03]
pizzaiolo has quit IRC (pizzaiolo) [04:14]
.......... (idle for 47mn)
qw3rty113 has joined #archiveteam
qw3rty112 has quit IRC (Read error: Operation timed out)
[05:01]
.................... (idle for 1h38mn)
RichardG_ has quit IRC (Ping timeout: 255 seconds) [06:43]
...... (idle for 25mn)
kimmer12 has joined #archiveteam [07:08]
kimmer1 has quit IRC (Read error: Operation timed out) [07:14]
kimmer1 has joined #archiveteam [07:23]
kimmer13 has joined #archiveteam
kimmer12 has quit IRC (Ping timeout: 633 seconds)
kimmer1 has quit IRC (Read error: Operation timed out)
kimmer1 has joined #archiveteam
[07:28]
kimmer13 has quit IRC (Ping timeout: 633 seconds) [07:44]
........... (idle for 53mn)
ZexaronS- has quit IRC (Read error: Connection reset by peer)
ZexaronS- has joined #archiveteam
[08:37]
............ (idle for 57mn)
schbirid has joined #archiveteam [09:35]
.................... (idle for 1h37mn)
schbirid has quit IRC (Quit: Leaving) [11:12]
.... (idle for 19mn)
Uzerus_ has quit IRC (Quit: Page closed) [11:31]
BlueMaxim has quit IRC (Quit: Leaving) [11:39]
....... (idle for 33mn)
pizzaiolo has joined #archiveteam [12:12]
jschwart has joined #archiveteam [12:17]
bRick5772 has joined #archiveteam
odemg has quit IRC (Quit: Leaving)
Ctrl has joined #archiveteam
[12:30]
kimmer1 has quit IRC (Remote host closed the connection)
kimmer1 has joined #archiveteam
icedice has joined #archiveteam
ZexaronS- has quit IRC (Quit: Leaving)
[12:41]
..... (idle for 22mn)
odemg has joined #archiveteam [13:12]
.... (idle for 19mn)
LastNinja has joined #archiveteam [13:31]
....... (idle for 33mn)
Spaghetto has joined #archiveteam
Spaghetto has quit IRC (Client Quit)
[14:04]
..... (idle for 24mn)
RichardG has joined #archiveteam [14:31]
icedice2 has joined #archiveteam [14:39]
icedice has quit IRC (Ping timeout: 506 seconds) [14:46]
UzerusWHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [15:00]
..... (idle for 20mn)
***icedice2 has quit IRC (Quit: Leaving) [15:20]
..... (idle for 20mn)
kimmer12 has joined #archiveteam [15:40]
kimmer1 has quit IRC (Ping timeout: 632 seconds) [15:47]
..... (idle for 21mn)
seatsea has joined #archiveteam [16:08]
.............. (idle for 1h7mn)
Stiletto has quit IRC (Read error: Operation timed out) [17:15]
.... (idle for 16mn)
Vito`hey
so Lytro unceremoniously shut down pictures.lytro.com, their hosting for their lightfield images, also breaking all old embeds
https://www.theverge.com/2017/12/6/16742314/lytro-focus-photos-support-cameras-illum
[17:31]
LastNinjareading now
thanks
[17:33]
Vito`IA captured a lot of embeds and some of the galleries, but they're all broken, because the viewer JS reads in additional URLs from a JSON manifest, which IA doesn't know how to parse
the viewer JS as captured by IA seems to work if all the files referenced in the JSON are present
so I downloaded all the pictures.lytro.com and lfe-cdn.lytro.com from IA, pulled out all the hosted UUIDs, downloaded all the JSON files for each captured embed or gallery, and generated all the CDN URLs for all the component images
I have ~2.5GB of files off the CDN that IA doesn't have from just that, but the final list of JSON-referenced files is 1.2M URLs
does AT want my WARCs and this list of URLs? I feel like it could be captured faster without me doing it myself with wget
I can make a wiki page with the details if someone wants to check my work
[17:33]
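The manifest-walking step Vito` describes can be sketched roughly as below. This is only an illustration, not the commands actually used (those are on the wiki page). The layout of the Lytro JSON is not shown in the log, so the sketch assumes nothing about key names and simply collects every string value that looks like an absolute URL. The `sample` manifest and the `cdn.example` host are hypothetical.

```python
import json


def collect_urls(node, found=None):
    """Recursively gather every http(s) string value from parsed JSON."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            collect_urls(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_urls(item, found)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        found.add(node)
    return found


# Hypothetical manifest; the real key names and nesting are assumptions.
sample = json.loads(
    '{"title": "demo", "frames": ['
    '{"image": "https://cdn.example/a.jpg"}, '
    '{"image": "https://cdn.example/b.jpg"}]}'
)
urls = sorted(collect_urls(sample))
```

If the real manifests hold relative paths rather than full URLs, the string check would need to join them to the CDN base instead.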
***yy has joined #archiveteam
yy has quit IRC (Client Quit)
[17:40]
Somebody2Vito`: yes, please upload them. I'm not sure if anyone will speak up to *use* them right now, but better to have it in any case. [17:48]
Uzerusewww...
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
wiki...
[17:51]
Vito`Somebody2: does that mean I should start on that 1.2M URL list myself then?
I didn't know if there was like a warrior queue or something
[17:53]
Uzerusit can be downloaded by archivebot, just list all urls in one file, one per line [17:55]
Vito`Thanks [17:57]
Uzerusupload that file somewhere and give us link to that [17:57]
Somebody2Vito`: yes, that's probably a good idea [17:57]
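A minimal sketch of preparing the "one URL per line" file Uzerus asks for, assuming the URLs are already in memory as strings. The helper name `format_url_list` and the sample URLs are hypothetical; the only details taken from the log are the plain-text, one-per-line layout.

```python
def format_url_list(urls):
    """Return the URLs one per line, dropping blanks and duplicates, keeping order."""
    cleaned = (u.strip() for u in urls)
    # dict.fromkeys keeps the first occurrence of each URL in insertion order.
    unique = dict.fromkeys(u for u in cleaned if u)
    return "".join(u + "\n" for u in unique)


# Hypothetical input standing in for the JSON-referenced CDN URLs.
listing = format_url_list([
    "https://cdn.example/a.jpg",
    "https://cdn.example/b.jpg ",
    "https://cdn.example/a.jpg",
    "",
])
```

The resulting text can be written to a file and uploaded somewhere reachable, as suggested above.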
..... (idle for 21mn)
***schbirid has joined #archiveteam [18:18]
.... (idle for 15mn)
nertzy has joined #archiveteam
du_ has joined #archiveteam
[18:33]
Somebody2WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD (it's changed since I was last told) [18:46]
SketchCow--------------------------------------------------------
FOS WAS DOWN DUE TO A POWER OUTAGE, NOW DESPERATELY TRYING TO CLAW BACK
--------------------------------------------------------
[18:50]
..... (idle for 20mn)
Vito`okay I think I did this right
http://www.archiveteam.org/index.php?title=Lytro has details on what happened and what I did
https://archive.org/details/lytro-hosted-partial-missing-files has my WARCs plus the file with the 1.2M other URLs I have not yet downloaded
it's tagged "archiveteam" so it can be moved/ingested later
[19:10]
***Mateon1 has quit IRC (Ping timeout: 248 seconds)
Mateon1 has joined #archiveteam
[19:12]
.... (idle for 16mn)
nwithan8 has joined #archiveteam [19:28]
Uzerus [19:32]
***kimmer12 has quit IRC (Quit: Yaaic - Yet another Android IRC client - http://www.yaaic.org)
kimmer1 has joined #archiveteam
Stilett0 has joined #archiveteam
[19:34]
arkiverVito`: how did you create these WARCs? [19:35]
Vito`arkiver: wget. command lines are in the wiki page [19:35]
arkiverit's that in the records, WARC-Target-URI has a value like <URL>
which should not have the < and >
I'd say the WARCs are invalid
which wget version did you use
[19:37]
Vito`GNU Wget 1.19.2 built on darwin15.6.0.
-cares +digest -gpgme +https +ipv6 -iri +large-file -metalink -nls
+ntlm +opie -psl +ssl/openssl
https://github.com/iipc/warc-specifications/issues/23
looks like the WARC standard originally had it as <URL> but no-one implemented that so it was removed
I guess no-one except wget?
[19:39]
arkiverversions of wget after the version we use, yes
wget-lua doesn't have the problem
[19:43]
***nwithan8 has quit IRC (Quit: Page closed) [19:44]
Vito`I'll pull down the 1.2M URL list on a wget 1.16 machine, which doesn't put angle brackets around the WARC-Target-URI
is it actually a problem for the other WARCs? Do I need to redownload them?
[19:47]
arkiverwe could in theory fix the WARCs
but if the URLs are still available it might be better to just archive them again
and yes, it's a problem with the WARCs
the cdx now takes '<https' as the domain
[19:48]
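The "in theory" fix arkiver mentions could look something like the sketch below, which strips the angle brackets from a `WARC-Target-URI` line in one record's header block. It is an assumption-laden illustration, not a tested repair tool: it works on an already-extracted, uncompressed header block, and the function name `strip_target_uri_brackets` is hypothetical. The record's Content-Length should be unaffected because it counts the record block rather than the header, but per-record gzip members would still have to be recompressed and any index rebuilt afterwards.

```python
import re

# Matches a WARC-Target-URI header line whose value is wrapped in < and >.
_TARGET_URI = re.compile(
    rb"^(WARC-Target-URI:[ \t]*)<(.*)>[ \t]*(\r?\n)",
    re.MULTILINE | re.IGNORECASE,
)


def strip_target_uri_brackets(header_block: bytes) -> bytes:
    """Return the header block with <...> removed around WARC-Target-URI."""
    return _TARGET_URI.sub(rb"\1\2\3", header_block)


# Hypothetical header block in the style produced by the affected wget builds.
broken = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: <https://example.com/a>\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
fixed = strip_target_uri_brackets(broken)
```

As noted in the channel, re-archiving while the URLs are still live is likely the safer route.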
Vito`the cdx files I uploaded don't have angle brackets [19:52]
arkiverthose cdx files are not used
IA always derives its own CDX files
feel free to upload CDX files, but they're not used by IA
[19:53]
Vito`ah [19:56]
***bRick5772 has quit IRC (Quit: Leaving.) [19:58]
....... (idle for 30mn)
Vito`looks like wget 1.18 and later might generate WARC-Target-URI with brackets: https://savannah.gnu.org/bugs/?47281
1.18 was released in June 2016
[20:28]
....... (idle for 33mn)
***BlueMaxim has joined #archiveteam [21:01]
..... (idle for 21mn)
Somebody2JAA: thanks, looking [21:22]
.... (idle for 16mn)
godaneSketchCow: i got lucky to not be uploading anything to that when that happened [21:38]
***schbirid has quit IRC (Quit: Leaving) [21:52]
Somebody2JAA: found it: https://archive.org/details/mininova_20170323_wpull [22:02]
JAASomebody2: Wrong channel ;-) But yes, that's the one. [22:04]
godaneso i got 6 new tapes from savers today
one is called Bellydance Fitness for Beginners
i also got YogaKids vhs tape
1 Quack Pack tape, 1 Timon & Pumbaa tape, 1 Alvin and the chipmunks tape, and 1 felix cat tape
[22:06]
..... (idle for 20mn)
***MMovie has quit IRC (Read error: Connection reset by peer)
MMovie has joined #archiveteam
[22:29]
.... (idle for 15mn)
jschwart has quit IRC (Quit: Konversation terminated!) [22:44]
pizzaiolo has quit IRC (pizzaiolo)
RichardG_ has joined #archiveteam
ndiddy_ has joined #archiveteam
K4k_ has joined #archiveteam
ppsym has joined #archiveteam
[22:52]
tuluu_ has joined #archiveteam
RichardG has quit IRC (se.hub irc.underworld.no)
Ctrl has quit IRC (se.hub irc.underworld.no)
Nemo_bis has quit IRC (se.hub irc.underworld.no)
MrDignity has quit IRC (se.hub irc.underworld.no)
ndiddy has quit IRC (se.hub irc.underworld.no)
espes__ has quit IRC (se.hub irc.underworld.no)
tuluu has quit IRC (se.hub irc.underworld.no)
PurpleSym has quit IRC (se.hub irc.underworld.no)
K4k has quit IRC (se.hub irc.underworld.no)
Rai-chan has quit IRC (se.hub irc.underworld.no)
i0npulse has quit IRC (se.hub irc.underworld.no)
medowar has quit IRC (se.hub irc.underworld.no)
espes___ has joined #archiveteam
[23:02]
ppsym is now known as PurpleSym [23:20]
...... (idle for 25mn)
BlueMaxim has quit IRC (Read error: Connection reset by peer)
BlueMaxim has joined #archiveteam
[23:45]
MrDignity has joined #archiveteam [23:58]
