[00:24] *** jshoard has quit IRC (Quit: Leaving)
[01:00] *** Arcorann has joined #archiveteam-bs
[01:01] *** Arcorann has quit IRC (Remote host closed the connection)
[01:01] *** Arcorann has joined #archiveteam-bs
[01:38] *** scorche has quit IRC (Read error: Operation timed out)
[01:48] *** scorche has joined #archiveteam-bs
[01:53] *** Mateon1 has quit IRC (Ping timeout: 272 seconds)
[01:53] *** Ctrl has quit IRC (Read error: Operation timed out)
[01:54] *** Mateon1 has joined #archiveteam-bs
[01:55] *** brayden has quit IRC (Ping timeout: 272 seconds)
[01:55] *** Laverne has quit IRC (Ping timeout: 272 seconds)
[02:29] *** Ctrl has joined #archiveteam-bs
[02:38] *** brayden has joined #archiveteam-bs
[02:38] *** Laverne has joined #archiveteam-bs
[02:55] *** asdf01011 has quit IRC (Remote host closed the connection)
[03:12] *** qw3rty_ has joined #archiveteam-bs
[03:13] *** asdf01011 has joined #archiveteam-bs
[03:15] *** qw3rty__ has quit IRC (Ping timeout: 265 seconds)
[03:23] <SketchCow> My 50th birthday party tomorrow: http://50.textfiles.com   
[05:12] *** benjinsmi has quit IRC (Read error: Connection reset by peer)
[06:32] *** larryv has quit IRC (Quit: larryv)
[06:40] *** endrift has quit IRC (Read error: Operation timed out)
[06:48] *** endrift has joined #archiveteam-bs
[08:10] *** jshoard has joined #archiveteam-bs
[08:48] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:04] *** scorche has quit IRC (Read error: Operation timed out)
[11:35] *** benjins has joined #archiveteam-bs
[11:46] *** scorche has joined #archiveteam-bs
[12:11] *** VADemon has joined #archiveteam-bs
[13:19] *** VADemon has quit IRC (Read error: Connection reset by peer)
[14:02] *** dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[14:03] *** dashcloud has joined #archiveteam-bs
[14:21] *** nepeat has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
[14:23] *** nepeat has joined #archiveteam-bs
[14:36] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[14:45] *** Lord_Nigh has joined #archiveteam-bs
[14:55] *** Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
[14:57] *** Lord_Nigh has joined #archiveteam-bs
[14:58] *** Lord_Nigh has quit IRC (Remote host closed the connection)
[15:24] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[16:07] *** VADemon has joined #archiveteam-bs
[16:35] *** godane has quit IRC (Read error: Connection reset by peer)
[16:51] *** godane has joined #archiveteam-bs
[17:50] *** DigiDigi has quit IRC (Remote host closed the connection)
[18:16] *** DigiDigi has joined #archiveteam-bs
[19:06] *** Laverne has quit IRC (Ping timeout: 272 seconds)
[19:07] *** brayden has quit IRC (Ping timeout: 272 seconds)
[19:13] <jodizzle> cm: So what kind of solution are you referring to?
[19:14] <cm> well my naive approach is to wget all the enclosure files (mp3s)
[19:14] <cm> for each item that i archive, i replace the enclosure url in the rss feed with a link to my own copy
[19:14] <cm> but this is not portable
[19:15] <jodizzle> Oh I see, so you're saying that you're creating your own podcast feed?
[19:15] <cm> yeah
[19:15] <jodizzle> Based on the original feed, but where items reference what you've already downloaded
[19:16] <cm> yeah
[19:16] <jodizzle> That's interesting
[19:16] <cm> but creating the archive feed could be a separate step if you have a well-defined archive format
[19:17] <jodizzle> What do you mean by "archive format", in this case?
[19:17] <cm> simply wgetting the rss and the enclosures doesn't work, because wget doesn't store the name of the url that was wgotten
[19:18] <cm> maybe warc would be enough?
[19:18] <jodizzle> What would you be putting in the WARCs, to be clear?
[19:19] <cm> the mp3 files and rss i guess
[19:19] <cm> warc remembers the url of the fetched file
[19:21] <jodizzle> Yeah, I guess one approach would be to save all the podcasts downloaded into WARCs, so you'd get data + metadata for each.  You could also save the rss feed at the time of the grab.  (Basically what you said.)
[19:21] <cm> i haven't thought through what could be bundled in a single warc file, and what would have to be separate
[19:21] <jodizzle> Then, you can generate a CDX of the WARCs that acts as an index for which URLs you've gotten
[19:22] <cm> cdx?
[19:23] <jodizzle> https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/
[19:25] <jodizzle> Also, typically a WARC is a single thing, in this case, either one request or response.  But you can bundle WARCs together into gzipped .warc.gz files.
[19:26] <jodizzle> That's how they're usually stored in bulk.
[19:26] <jodizzle> (Here are examples of different records, for reference: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#annex-b-informative-examples-of-warc-records)
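To make the record layout linked above concrete, here is a minimal sketch of a single WARC "response" record assembled by hand with the Python standard library. The function name build_warc_response and the example.com URL are hypothetical, chosen only for illustration; a real grab would more likely use wget's WARC output or a dedicated WARC library. The point is just that the record stores the fetched URL (WARC-Target-URI) next to the raw HTTP bytes.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response(target_uri: str, http_bytes: bytes) -> bytes:
    """Wrap raw HTTP response bytes (status line, headers, body) in one WARC record.

    Hypothetical helper for illustration; not part of any library.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",  # the URL that plain wget output loses
        "Content-Type: application/http;msgtype=response",
        f"Content-Length: {len(http_bytes)}",  # length of the block that follows
    ]
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # A record is: header lines, blank line, block, then two CRLFs.
    return head + http_bytes + b"\r\n\r\n"


# Made-up HTTP response standing in for a downloaded enclosure.
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: audio/mpeg\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"\x00\x01\x02\x03"
)
record = build_warc_response("http://example.com/episode1.mp3", http_response)
```

Several such records concatenated, usually gzipped, make up a .warc.gz file, and a CDX index is essentially a lookup table from each WARC-Target-URI to where its record sits in that file.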
[19:27] <cm> so i guess the cleanest and most consistent way would be to treat the podcast rss as a web page, and crawl it like you would a website
[19:27] <cm> then you get an archive of the rss and the content files
[19:27] <jodizzle> But this is all archival details.  I am a little confused by what you mean by "portable".  What would a "portable feed archive" be?
[19:28] <cm> the current format of my feed archives is a directory with feed.xml, and subdirectories written by wget containing the content
[19:29] <cm> the whole directory is served by a webserver, and the feed.xml has links to my copies of the content
[19:30] <cm> one way to make it portable would be to use relative links to the content, but i dont think that is possible in RSS
[19:32] <cm> so i thought about putting a custom keyword in place of the webroot for each archived piece of content, then to make a usable RSS feed you would replace that keyword with whatever the webroot happens to be
[19:33] <jodizzle> Oh, so you mean it's not currently portable because if the IP address/domain changes, the links in feed.xml will break?
[19:33] <cm> yeah
[19:34] <jodizzle> Okay, yeah.  If you can't use relative links in RSS (is that definitely true?), then I don't think there's any way around something like what you described.
[19:34] <jodizzle> You could have feed.xml created periodically by a cronjob or similar via a script that references a "webroot" setting
[19:35] <jodizzle> Or something like that.
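The keyword idea discussed above can be sketched in a few lines. This is only an illustration under assumptions: the placeholder string {{WEBROOT}}, the function name render_feed, and the archive.example.org address are all made up here, and the stored feed is assumed to already contain the placeholder in each enclosure URL.

```python
# Hypothetical placeholder keyword stored in the archived feed in place of the webroot.
PLACEHOLDER = "{{WEBROOT}}"


def render_feed(template_xml: str, webroot: str) -> str:
    """Return a usable feed by substituting the current webroot for the placeholder."""
    # Strip a trailing slash so "{{WEBROOT}}/media/..." does not become "//media/...".
    return template_xml.replace(PLACEHOLDER, webroot.rstrip("/"))


# Made-up archived feed with a placeholder in the enclosure URL.
template = (
    '<rss version="2.0"><channel><title>Archive</title>'
    "<item><title>Episode 1</title>"
    '<enclosure url="{{WEBROOT}}/media/episode1.mp3" length="4" type="audio/mpeg"/>'
    "</item></channel></rss>"
)

feed = render_feed(template, "https://archive.example.org/podcast/")
```

A cronjob or a small request handler could call something like render_feed with whatever the configured "webroot" setting is, so the archived directory itself stays independent of the host it is served from.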
[19:36] *** VADemon has quit IRC (left4dead)
[19:38] <cm> i couldn't find anything definitive saying RSS does not support relative links
[19:38] <cm> but there are at least a significant number of readers that dont support it
[19:39] <cm> and it makes sense that readers would not store the prefix of the URL used to fetch the rss feed, which would be necessary to determine the full url for a relative link
[19:41] <jodizzle> Hm, I don't know.  It seems like if feed readers have to fetch feeds from web domains, they could keep track of those domains and use them to resolve relative links?
[19:41] <cm> yeah true
[19:41] <jodizzle> But if they don't do that, they don't do that.  Maybe there's some more complexity to it that I'm not thinking of.
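For reference, resolving a relative enclosure link against the URL the feed was fetched from is a small operation. The sketch below uses urljoin from the Python standard library; the function name absolutize_enclosures and the example URLs are hypothetical, and nothing here says any particular feed reader works this way.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def absolutize_enclosures(feed_xml: str, feed_url: str) -> str:
    """Rewrite each enclosure url to an absolute URL, using the feed's own URL as the base."""
    root = ET.fromstring(feed_xml)
    for enclosure in root.iter("enclosure"):
        relative = enclosure.get("url", "")
        # urljoin leaves already-absolute URLs untouched.
        enclosure.set("url", urljoin(feed_url, relative))
    return ET.tostring(root, encoding="unicode")


# Made-up feed with one relative and one absolute enclosure link.
relative_feed = (
    '<rss version="2.0"><channel><title>Archive</title>'
    '<item><enclosure url="media/episode1.mp3" length="4" type="audio/mpeg"/></item>'
    '<item><enclosure url="http://other.example.com/e2.mp3" length="4" type="audio/mpeg"/></item>'
    "</channel></rss>"
)

resolved = absolutize_enclosures(
    relative_feed, "https://archive.example.org/podcast/feed.xml"
)
```

The same function could also run on the server side just before the feed is sent, which sidesteps the question of what readers do with relative links.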
[19:42] <cm> maybe i'll do a test
[19:42] *** obskyr has quit IRC (Read error: Operation timed out)
[19:43] *** sHATNER has quit IRC (Read error: Operation timed out)
[19:43] *** omglolba- has quit IRC (Read error: Operation timed out)
[19:46] *** omglolbah has joined #archiveteam-bs
[19:46] *** closure has quit IRC (Read error: Operation timed out)
[19:48] *** obskyr has joined #archiveteam-bs
[19:50] *** closure has joined #archiveteam-bs
[19:50] *** Maylay has quit IRC (Read error: Operation timed out)
[19:50] <cm> yeah my default podcasts app rejects items with relative enclosure links
[19:51] <cm> i.e. doesn't display them
[19:52] <nico_32> SketchCow: have a nice party & birthday!
[19:52] <jodizzle> cm: Unfortunate.
[19:53] <cm> now i guess warc files are basically an annotated transcript of an http response
[19:54] <cm> does warc have any way to refer to a standalone file?
[19:54] <cm> i.e. "the next thing the server sent was the contents of this file"
[19:54] <cm> with a pointer to a html or mp3 file on disk
[19:56] *** Maylay has joined #archiveteam-bs
[19:58] <jodizzle> I think you could manufacture something like that, but typically you would just store the bytes in the WARC.
[19:59] <jodizzle> So a response warc would contain the response bytes and all the headers and metadata of that response
[20:01] <cm> yeah
[20:01] <cm> then to view the file you need a server side script to strip out the metadata
[20:03] <jodizzle> In a sense, yes.  Though typically you'd be reading it with a library or using some toolkit that's built for reading WARC data.
[20:05] <jodizzle> But yes, I think I see what you mean.  I think the links in the feed.xml you generate would basically have to route to some webserver endpoint which does the work necessary to read from the WARC.
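As a rough sketch of "the work necessary to read from the WARC", the code below strips the WARC headers and the HTTP headers from one uncompressed response record and returns the payload bytes. It is stdlib-only and deliberately naive: extract_payload is a hypothetical name, the sample record is made up, and gzip, chunked transfer encoding, and multi-record files are not handled. A real endpoint would more likely use a WARC library plus a CDX lookup.

```python
from typing import Tuple


def extract_payload(record: bytes) -> Tuple[str, bytes]:
    """Return (target URI, HTTP body) from a single uncompressed WARC response record."""
    warc_head, _, rest = record.partition(b"\r\n\r\n")
    fields = {}
    # Skip the first line ("WARC/1.1") and parse the "Name: value" header lines.
    for line in warc_head.decode("utf-8").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of the block belong to this record.
    block = rest[: int(fields["content-length"])]
    # The block is a raw HTTP response: drop its status line and headers.
    _http_head, _, body = block.partition(b"\r\n\r\n")
    return fields["warc-target-uri"], body


# Made-up record for illustration.
http_message = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: audio/mpeg\r\n"
    b"\r\n"
    b"\x00\x01\x02\x03"
)
sample_record = (
    b"WARC/1.1\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/episode1.mp3\r\n"
    b"Content-Length: " + str(len(http_message)).encode("ascii") + b"\r\n"
    b"\r\n" + http_message + b"\r\n\r\n"
)

uri, payload = extract_payload(sample_record)
```

The links in the generated feed.xml would then point at an endpoint that finds the right record for the requested URL and returns payload with a suitable Content-Type.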
[20:07] <cm> couldn't i use the warc server for the rss as well
[20:10] <jodizzle> What do you mean?
[20:10] *** sHATNER has joined #archiveteam-bs
[20:10] *** brayden has joined #archiveteam-bs
[20:10] <cm> pywb for example
[20:10] <cm> if im using it to browse a website, it will rewrite links to point to archived copies, right?
[20:11] *** Laverne has joined #archiveteam-bs
[20:13] <cm> so couldnt i let pywb rewrite the links in the rss feed?  or does it not do that
[20:13] <jodizzle> Does pywb have a browsing feature like that?  I've only ever used it in the context of a local WBM to view WARCs I've generated separately.
[20:13] <jodizzle> If it does though, that sounds pretty cool
[20:14] <cm> idk actually
[20:14] <cm> i assumed it works like web.archive.org
[20:14] <cm> but come to think of it, web.archive.org doesn't do that for rss, only html
[20:15] <jodizzle> And in terms of how this would play with an RSS feed reader, I don't know.
[20:31] *** dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[20:35] *** scorche has quit IRC (Read error: Operation timed out)
[20:48] *** dashcloud has joined #archiveteam-bs
[21:01] *** icedice has joined #archiveteam-bs
[21:05] *** icedice has quit IRC (Client Quit)
[21:17] *** scorche has joined #archiveteam-bs
[21:37] *** scorche has quit IRC (Read error: Operation timed out)
[21:57] *** lennier1 has joined #archiveteam-bs
[22:01] *** paul2520 has quit IRC (Read error: Operation timed out)
[22:01] *** Jake has quit IRC (Read error: Operation timed out)
[22:01] *** endrift has quit IRC (Read error: Operation timed out)
[22:02] *** britmob has quit IRC (Read error: Operation timed out)
[22:02] *** paul2520 has joined #archiveteam-bs
[22:02] *** Jake has joined #archiveteam-bs
[22:02] *** endrift has joined #archiveteam-bs
[22:02] *** britmob has joined #archiveteam-bs
[22:02] *** systwi_ has joined #archiveteam-bs
[22:03] *** Meli has joined #archiveteam-bs
[22:03] *** Hecatz- has joined #archiveteam-bs
[22:04] *** asdf01011 has quit IRC (Read error: Operation timed out)
[22:04] *** scorche has joined #archiveteam-bs
[22:05] *** voltagex_ has joined #archiveteam-bs
[22:05] *** nico_32_ has joined #archiveteam-bs
[22:05] *** Coderjo has joined #archiveteam-bs
[22:05] *** systwi has quit IRC (Read error: Operation timed out)
[22:06] *** colona_ has joined #archiveteam-bs
[22:07] *** systwi_ is now known as systwi
[22:07] *** AlsoJAA_ has joined #archiveteam-bs
[22:07] *** JAA sets mode: +o AlsoJAA_
[22:08] *** N4Y_ has joined #archiveteam-bs
[22:09] *** nightpoo- has joined #archiveteam-bs
[22:09] *** second_ has joined #archiveteam-bs
[22:09] *** actually_ has joined #archiveteam-bs
[22:12] *** obskyr has quit IRC (Ping timeout: 745 seconds)
[22:12] *** nepeat has quit IRC (Ping timeout: 745 seconds)
[22:12] *** apache2_ has quit IRC (Ping timeout: 745 seconds)
[22:12] *** Meli-sama has quit IRC (Ping timeout: 745 seconds)
[22:12] *** nightpool has quit IRC (Ping timeout: 745 seconds)
[22:12] *** PotcFdk has quit IRC (Ping timeout: 745 seconds)
[22:12] *** voltagex has quit IRC (Ping timeout: 745 seconds)
[22:12] *** mr_archiv has quit IRC (Ping timeout: 745 seconds)
[22:12] *** AlsoJAA has quit IRC (Ping timeout: 745 seconds)
[22:12] *** N4Y has quit IRC (Ping timeout: 745 seconds)
[22:12] *** nico_32 has quit IRC (Ping timeout: 745 seconds)
[22:12] *** N4Y_ is now known as N4Y
[22:12] *** atg has quit IRC (Ping timeout: 745 seconds)
[22:12] *** Mateon1 has quit IRC (Ping timeout: 745 seconds)
[22:12] *** second has quit IRC (Ping timeout: 745 seconds)
[22:12] *** DFJustin has quit IRC (Ping timeout: 745 seconds)
[22:12] *** Flashfire has quit IRC (Ping timeout: 745 seconds)
[22:12] *** acridAxid has quit IRC (Ping timeout: 745 seconds)
[22:12] *** zhongfu has quit IRC (Ping timeout: 745 seconds)
[22:12] *** igloo25 has quit IRC (Ping timeout: 745 seconds)
[22:12] *** Coderjo_ has quit IRC (Ping timeout: 745 seconds)
[22:12] *** step has quit IRC (Ping timeout: 745 seconds)
[22:13] *** Hecatz has quit IRC (Ping timeout: 745 seconds)
[22:13] *** Hecatz- is now known as Hecatz
[22:13] *** colona has quit IRC (Ping timeout: 745 seconds)
[22:14] *** Mateon1 has joined #archiveteam-bs
[22:16] *** mr_archiv has joined #archiveteam-bs
[22:52] *** BlueMax has joined #archiveteam-bs
[23:35] *** Lord_Nigh has joined #archiveteam-bs
[23:38] *** Lord_Nigh has quit IRC (Client Quit)
[23:42] *** jshoard has quit IRC (Quit: Leaving)
[23:44] *** Lord_Nigh has joined #archiveteam-bs
[23:52] *** Arcorann has joined #archiveteam-bs