#archiveteam 2017-12-17,Sun


Time Nickname Message
00:14 🔗 bRick5773 has quit IRC (Quit: Leaving.)
00:18 🔗 BlueMaxim has joined #archiveteam
01:43 🔗 LastNinja has quit IRC (Ping timeout: 260 seconds)
01:47 🔗 ZexaronS- has joined #archiveteam
01:47 🔗 ZexaronS has quit IRC (Read error: Connection reset by peer)
01:53 🔗 Pixi has quit IRC (Ping timeout: 255 seconds)
01:55 🔗 Pixi has joined #archiveteam
01:56 🔗 db48x has joined #archiveteam
02:01 🔗 Svekla has joined #archiveteam
02:01 🔗 Burak has quit IRC (Read error: Connection reset by peer)
02:33 🔗 Stilett0 is now known as Stiletto
02:46 🔗 Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
02:53 🔗 Odd0002 has joined #archiveteam
04:03 🔗 qw3rty112 has joined #archiveteam
04:07 🔗 qw3rty111 has quit IRC (Read error: Operation timed out)
04:14 🔗 pizzaiolo has quit IRC (pizzaiolo)
05:01 🔗 qw3rty113 has joined #archiveteam
05:05 🔗 qw3rty112 has quit IRC (Read error: Operation timed out)
06:43 🔗 RichardG_ has quit IRC (Ping timeout: 255 seconds)
07:08 🔗 kimmer12 has joined #archiveteam
07:14 🔗 kimmer1 has quit IRC (Read error: Operation timed out)
07:23 🔗 kimmer1 has joined #archiveteam
07:28 🔗 kimmer13 has joined #archiveteam
07:30 🔗 kimmer12 has quit IRC (Ping timeout: 633 seconds)
07:34 🔗 kimmer1 has quit IRC (Read error: Operation timed out)
07:38 🔗 kimmer1 has joined #archiveteam
07:44 🔗 kimmer13 has quit IRC (Ping timeout: 633 seconds)
08:37 🔗 ZexaronS- has quit IRC (Read error: Connection reset by peer)
08:38 🔗 ZexaronS- has joined #archiveteam
09:35 🔗 schbirid has joined #archiveteam
11:12 🔗 schbirid has quit IRC (Quit: Leaving)
11:31 🔗 Uzerus_ has quit IRC (Quit: Page closed)
11:39 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:12 🔗 pizzaiolo has joined #archiveteam
12:17 🔗 jschwart has joined #archiveteam
12:30 🔗 bRick5772 has joined #archiveteam
12:33 🔗 odemg has quit IRC (Quit: Leaving)
12:36 🔗 Ctrl has joined #archiveteam
12:41 🔗 kimmer1 has quit IRC (Remote host closed the connection)
12:42 🔗 kimmer1 has joined #archiveteam
12:46 🔗 icedice has joined #archiveteam
12:50 🔗 ZexaronS- has quit IRC (Quit: Leaving)
13:12 🔗 odemg has joined #archiveteam
13:31 🔗 LastNinja has joined #archiveteam
14:04 🔗 Spaghetto has joined #archiveteam
14:07 🔗 Spaghetto has quit IRC (Client Quit)
14:31 🔗 RichardG has joined #archiveteam
14:39 🔗 icedice2 has joined #archiveteam
14:46 🔗 icedice has quit IRC (Ping timeout: 506 seconds)
15:00 🔗 Uzerus WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
15:20 🔗 icedice2 has quit IRC (Quit: Leaving)
15:40 🔗 kimmer12 has joined #archiveteam
15:47 🔗 kimmer1 has quit IRC (Ping timeout: 632 seconds)
16:08 🔗 seatsea has joined #archiveteam
17:15 🔗 Stiletto has quit IRC (Read error: Operation timed out)
17:31 🔗 Vito` hey
17:32 🔗 Vito` so Lytro unceremoniously shut down pictures.lytro.com, their hosting for their lightfield images, also breaking all old embeds
17:32 🔗 Vito` https://www.theverge.com/2017/12/6/16742314/lytro-focus-photos-support-cameras-illum
17:33 🔗 LastNinja reading now
17:33 🔗 LastNinja thanks
17:33 🔗 Vito` IA captured a lot of embeds and some of the galleries, but they're all broken, because the viewer JS reads in additional URLs from a JSON manifest, which IA doesn't know how to parse
17:34 🔗 Vito` the viewer JS as captured by IA seems to work if all the files referenced in the JSON are present
17:35 🔗 Vito` so I downloaded all the pictures.lytro.com and lfe-cdn.lytro.com from IA, pulled out all the hosted UUIDs, downloaded all the JSON files for each captured embed or gallery, and generated all the CDN URLs for all the component images
17:35 🔗 Vito` I have ~2.5GB of files off the CDN that IA doesn't have from just that, but the final list of JSON-referenced files is 1.2M URLs
17:36 🔗 Vito` does AT want my WARCs and this list of URLs? I feel like it could be captured faster without me doing it myself with wget
17:38 🔗 Vito` I can make a wiki page with the details if someone wants to check my work
17:40 🔗 yy has joined #archiveteam
17:40 🔗 yy has quit IRC (Client Quit)
17:48 🔗 Somebody2 Vito`: yes, please upload them. I'm not sure if anyone will speak up to *use* them right now, but better to have it in any case.
17:51 🔗 Uzerus ewww...
17:51 🔗 Uzerus WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
17:52 🔗 Uzerus wiki...
17:53 🔗 Vito` Somebody2: does that mean I should start on that 1.2M URL list myself then?
17:53 🔗 Vito` I didn't know if there was like a warrior queue or something
17:55 🔗 Uzerus it can be downloaded by archivebot, just list all urls in one file, one per line
17:57 🔗 Vito` Thanks
17:57 🔗 Uzerus upload that file somewhere and give us a link to it
17:57 🔗 Somebody2 Vito`: yes, that's probably a good idea
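The "one URL per line" list suggested above could be produced with something like the following minimal sketch. The function name and the idea of de-duplicating while preserving order are assumptions added for illustration; the log only says the URLs should be listed in one file, one per line.

```python
def write_url_list(urls, path):
    """Write unique, non-empty URLs to `path`, one per line.

    Hypothetical helper: keeps the first occurrence of each URL and
    preserves the original order. Returns the number of URLs written.
    """
    seen = set()
    ordered = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            ordered.append(url)
    with open(path, "w", encoding="utf-8") as fh:
        for url in ordered:
            fh.write(url + "\n")
    return len(ordered)
```

The resulting file can then be uploaded anywhere reachable over HTTP and the link shared in the channel.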
18:18 🔗 schbirid has joined #archiveteam
18:33 🔗 nertzy has joined #archiveteam
18:36 🔗 du_ has joined #archiveteam
18:46 🔗 Somebody2 WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD (it's changed since I was last told)
18:50 🔗 SketchCow --------------------------------------------------------
18:50 🔗 SketchCow FOS WAS DOWN DUE TO A POWER OUTAGE, NOW DESPERATELY TRYING TO CLAW BACK
18:50 🔗 SketchCow --------------------------------------------------------
19:10 🔗 Vito` okay I think I did this right
19:10 🔗 Vito` http://www.archiveteam.org/index.php?title=Lytro has details on what happened and what I did
19:10 🔗 Vito` https://archive.org/details/lytro-hosted-partial-missing-files has my WARCs plus the file with the 1.2M other URLs I have not yet downloaded
19:11 🔗 Vito` it's tagged "archiveteam" so it can be moved/ingested later
19:12 🔗 Mateon1 has quit IRC (Ping timeout: 248 seconds)
19:12 🔗 Mateon1 has joined #archiveteam
19:28 🔗 nwithan8 has joined #archiveteam
19:32 🔗 Uzerus 
19:34 🔗 kimmer12 has quit IRC (Quit: Yaaic - Yet another Android IRC client - http://www.yaaic.org)
19:34 🔗 kimmer1 has joined #archiveteam
19:34 🔗 Stilett0 has joined #archiveteam
19:35 🔗 arkiver Vito`: how did you create these WARCs?
19:35 🔗 Vito` arkiver: wget. command lines are in the wiki page
19:37 🔗 arkiver the issue is that in the records, WARC-Target-URI has a value like <URL>
19:37 🔗 arkiver which should not have the < and >
19:38 🔗 arkiver I'd say the WARCs are invalid
19:38 🔗 arkiver which wget version did you use
19:39 🔗 Vito` GNU Wget 1.19.2 built on darwin15.6.0.
19:39 🔗 Vito` -cares +digest -gpgme +https +ipv6 -iri +large-file -metalink -nls
19:39 🔗 Vito` +ntlm +opie -psl +ssl/openssl
19:41 🔗 Vito` https://github.com/iipc/warc-specifications/issues/23
19:41 🔗 Vito` looks like the WARC standard originally had it as <URL> but no-one implemented that so it was removed
19:42 🔗 Vito` I guess no-one except wget?
19:43 🔗 arkiver versions of wget after the version we use yes
19:43 🔗 arkiver wget-lua doesn't have the problem
19:44 🔗 nwithan8 has quit IRC (Quit: Page closed)
19:47 🔗 Vito` I'll pull down the 1.2M URL list on a wget 1.16 machine, which doesn't put angle brackets around the WARC-Target-URI
19:47 🔗 Vito` is it actually a problem for the other WARCs? Do I need to redownload them?
19:48 🔗 arkiver we could in theory fix the WARCs
19:48 🔗 arkiver but if the URLs are still available it might be better to just archive them again
19:48 🔗 arkiver and yes, it's a problem with the WARCs
19:48 🔗 arkiver the cdx now takes '<https' as the domain
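As a rough illustration of the "in theory fix the WARCs" option, the sketch below strips the surrounding angle brackets from a single WARC-Target-URI header line. The function name is hypothetical, and this only handles one header line: rewriting a real WARC would also mean re-serialising each record (and recompressing per-record gzip members), which is not shown here.

```python
PREFIX = b"warc-target-uri:"


def fix_target_uri_line(line: bytes) -> bytes:
    """Return `line` with <...> removed from a WARC-Target-URI value.

    Lines that are not WARC-Target-URI headers are returned unchanged.
    The original line ending (e.g. CRLF) is preserved.
    """
    if not line.lower().startswith(PREFIX):
        return line
    body = line.rstrip(b"\r\n")
    eol = line[len(body):]
    # Split on the first colon only; the URL itself contains colons.
    name, _, value = body.partition(b":")
    value = value.strip()
    if value.startswith(b"<") and value.endswith(b">"):
        value = value[1:-1]
    return name + b": " + value + eol
```

Since the URLs were still online at the time, re-archiving them with a tool that writes the header without brackets is the simpler route, as suggested above.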
19:52 🔗 Vito` the cdx files I uploaded don't have angle brackets
19:53 🔗 arkiver those cdx files are not used
19:54 🔗 arkiver IA always derives its own CDX files
19:54 🔗 arkiver feel free to upload CDX files, but they're not used by IA
19:56 🔗 Vito` ah
19:58 🔗 bRick5772 has quit IRC (Quit: Leaving.)
20:28 🔗 Vito` looks like wget 1.18 and later might generate WARC-Target-URI with brackets: https://savannah.gnu.org/bugs/?47281
20:28 🔗 Vito` 1.18 was released in June 2016
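A quick way to check whether a given wget build falls in the possibly affected range is to parse its version banner. This is a sketch only: the function names are made up for illustration, and the 1.18 threshold is taken from the "might" in the message above, not from a confirmed changelog entry.

```python
import re


def wget_version(banner):
    """Extract the version tuple from a `wget --version` banner line."""
    match = re.search(r"GNU Wget (\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def may_bracket_target_uri(banner):
    """True if the banner reports wget 1.18 or later (assumed threshold)."""
    version = wget_version(banner)
    return version is not None and version >= (1, 18)
```

For example, the banner `GNU Wget 1.19.2 built on darwin15.6.0.` quoted earlier parses to `(1, 19, 2)`, which is in the suspect range, while a 1.16 build is not.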
21:01 🔗 BlueMaxim has joined #archiveteam
21:22 🔗 Somebody2 JAA: thanks, looking
21:38 🔗 godane SketchCow: i got lucky to not be uploading anything to that when that happened
21:52 🔗 schbirid has quit IRC (Quit: Leaving)
22:02 🔗 Somebody2 JAA: found it: https://archive.org/details/mininova_20170323_wpull
22:04 🔗 JAA Somebody2: Wrong channel ;-) But yes, that's the one.
22:06 🔗 godane so i got 6 new tapes from savers today
22:07 🔗 godane one is called Bellydance Fitness for Beginners
22:08 🔗 godane i also got YogaKids vhs tape
22:09 🔗 godane 1 Quack Pack tape, 1 Timon & Pumbaa tape, 1 Alvin and the chipmunks tape, and 1 felix cat tape
22:29 🔗 MMovie has quit IRC (Read error: Connection reset by peer)
22:29 🔗 MMovie has joined #archiveteam
22:44 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
22:52 🔗 pizzaiolo has quit IRC (pizzaiolo)
22:54 🔗 RichardG_ has joined #archiveteam
22:54 🔗 ndiddy_ has joined #archiveteam
22:55 🔗 K4k_ has joined #archiveteam
22:57 🔗 ppsym has joined #archiveteam
23:02 🔗 tuluu_ has joined #archiveteam
23:05 🔗 RichardG has quit IRC (se.hub irc.underworld.no)
23:05 🔗 Ctrl has quit IRC (se.hub irc.underworld.no)
23:05 🔗 Nemo_bis has quit IRC (se.hub irc.underworld.no)
23:05 🔗 MrDignity has quit IRC (se.hub irc.underworld.no)
23:05 🔗 ndiddy has quit IRC (se.hub irc.underworld.no)
23:05 🔗 espes__ has quit IRC (se.hub irc.underworld.no)
23:05 🔗 tuluu has quit IRC (se.hub irc.underworld.no)
23:05 🔗 PurpleSym has quit IRC (se.hub irc.underworld.no)
23:05 🔗 K4k has quit IRC (se.hub irc.underworld.no)
23:05 🔗 Rai-chan has quit IRC (se.hub irc.underworld.no)
23:05 🔗 i0npulse has quit IRC (se.hub irc.underworld.no)
23:05 🔗 medowar has quit IRC (se.hub irc.underworld.no)
23:08 🔗 espes___ has joined #archiveteam
23:20 🔗 ppsym is now known as PurpleSym
23:45 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
23:46 🔗 BlueMaxim has joined #archiveteam
23:58 🔗 MrDignity has joined #archiveteam
