| Time |
Nickname |
Message |
|
00:24
🔗
|
|
jshoard has quit IRC (Quit: Leaving) |
|
01:00
🔗
|
|
Arcorann has joined #archiveteam-bs |
|
01:01
🔗
|
|
Arcorann has quit IRC (Remote host closed the connection) |
|
01:01
🔗
|
|
Arcorann has joined #archiveteam-bs |
|
01:38
🔗
|
|
scorche has quit IRC (Read error: Operation timed out) |
|
01:48
🔗
|
|
scorche has joined #archiveteam-bs |
|
01:53
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 272 seconds) |
|
01:53
🔗
|
|
Ctrl has quit IRC (Read error: Operation timed out) |
|
01:54
🔗
|
|
Mateon1 has joined #archiveteam-bs |
|
01:55
🔗
|
|
brayden has quit IRC (Ping timeout: 272 seconds) |
|
01:55
🔗
|
|
Laverne has quit IRC (Ping timeout: 272 seconds) |
|
02:29
🔗
|
|
Ctrl has joined #archiveteam-bs |
|
02:38
🔗
|
|
brayden has joined #archiveteam-bs |
|
02:38
🔗
|
|
Laverne has joined #archiveteam-bs |
|
02:55
🔗
|
|
asdf01011 has quit IRC (Remote host closed the connection) |
|
03:12
🔗
|
|
qw3rty_ has joined #archiveteam-bs |
|
03:13
🔗
|
|
asdf01011 has joined #archiveteam-bs |
|
03:15
🔗
|
|
qw3rty__ has quit IRC (Ping timeout: 265 seconds) |
|
03:23
🔗
|
SketchCow |
My 50th birthday party tomorrow: http://50.textfiles.com |
|
05:12
🔗
|
|
benjinsmi has quit IRC (Read error: Connection reset by peer) |
|
06:32
🔗
|
|
larryv has quit IRC (Quit: larryv) |
|
06:40
🔗
|
|
endrift has quit IRC (Read error: Operation timed out) |
|
06:48
🔗
|
|
endrift has joined #archiveteam-bs |
|
08:10
🔗
|
|
jshoard has joined #archiveteam-bs |
|
08:48
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
|
11:04
🔗
|
|
scorche has quit IRC (Read error: Operation timed out) |
|
11:35
🔗
|
|
benjins has joined #archiveteam-bs |
|
11:46
🔗
|
|
scorche has joined #archiveteam-bs |
|
12:11
🔗
|
|
VADemon has joined #archiveteam-bs |
|
13:19
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
|
14:02
🔗
|
|
dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
|
14:03
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
14:21
🔗
|
|
nepeat has quit IRC (Quit: ZNC 1.7.5 - https://znc.in) |
|
14:23
🔗
|
|
nepeat has joined #archiveteam-bs |
|
14:36
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
|
14:45
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
|
14:55
🔗
|
|
Lord_Nigh has quit IRC (Ping timeout: 272 seconds) |
|
14:57
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
|
14:58
🔗
|
|
Lord_Nigh has quit IRC (Remote host closed the connection) |
|
15:24
🔗
|
|
Arcorann has quit IRC (Read error: Connection reset by peer) |
|
16:07
🔗
|
|
VADemon has joined #archiveteam-bs |
|
16:35
🔗
|
|
godane has quit IRC (Read error: Connection reset by peer) |
|
16:51
🔗
|
|
godane has joined #archiveteam-bs |
|
17:50
🔗
|
|
DigiDigi has quit IRC (Remote host closed the connection) |
|
18:16
🔗
|
|
DigiDigi has joined #archiveteam-bs |
|
19:06
🔗
|
|
Laverne has quit IRC (Ping timeout: 272 seconds) |
|
19:07
🔗
|
|
brayden has quit IRC (Ping timeout: 272 seconds) |
|
19:13
🔗
|
jodizzle |
cm: So what kind of solution are you referring to? |
|
19:14
🔗
|
cm |
well my naive approach is to wget all the enclosure files (mp3s) |
|
19:14
🔗
|
cm |
for each item that i archive, i replace the enclosure url in the rss feed with a link to my own copy |
|
19:14
🔗
|
cm |
but this is not portable |
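The enclosure rewrite cm describes could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not cm's actual script; the function name `rewrite_enclosures`, the sample feed, and the `*.example` URLs are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def rewrite_enclosures(feed_xml, local_copies):
    """Return the feed with each archived enclosure URL replaced by its local copy.

    local_copies maps original enclosure URLs to the URLs of archived copies
    (hypothetical mapping; how it is built is up to the archiver).
    """
    root = ET.fromstring(feed_xml)
    for enclosure in root.iter("enclosure"):
        original = enclosure.get("url")
        if original in local_copies:
            enclosure.set("url", local_copies[original])
    return ET.tostring(root, encoding="unicode")


# Made-up feed with a single item, for illustration only.
FEED = """<rss version="2.0"><channel><title>Example Show</title>
<item><title>Ep 1</title>
<enclosure url="http://original.example/ep1.mp3" length="123" type="audio/mpeg"/>
</item></channel></rss>"""

rewritten = rewrite_enclosures(
    FEED,
    {"http://original.example/ep1.mp3": "http://my-archive.example/show/ep1.mp3"},
)
print(rewritten)
```

Because the rewritten URL is absolute, the feed only works while the archive stays at that address, which is the portability problem raised here.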
|
19:15
🔗
|
jodizzle |
Oh I see, so you're saying that you're creating your own podcast feed? |
|
19:15
🔗
|
cm |
yeah |
|
19:15
🔗
|
jodizzle |
Based on the original feed, but where items reference what you've already downloaded |
|
19:16
🔗
|
cm |
yeah |
|
19:16
🔗
|
jodizzle |
That's interesting |
|
19:16
🔗
|
cm |
but creating the archive feed could be a separate step if you have a well-defined archive format |
|
19:17
🔗
|
jodizzle |
What do you mean by "archive format", in this case? |
|
19:17
🔗
|
cm |
simply wgetting the rss and the enclosures doesn't work, because wget doesn't store the name of the url that was wgotten |
|
19:18
🔗
|
cm |
maybe warc would be enough? |
|
19:18
🔗
|
jodizzle |
What would you be putting in the WARCs, to be clear? |
|
19:19
🔗
|
cm |
the mp3 files and rss i guess |
|
19:19
🔗
|
cm |
warc remembers the url of the fetched file |
|
19:21
🔗
|
jodizzle |
Yeah, I guess one approach would be to save all the podcasts downloaded into WARCs, so you'd get data + metadata for each. You could also save the rss feed at the time of the grab. (Basically what you said.) |
|
19:21
🔗
|
cm |
i haven't thought through what could be bundled in a single warc file, and what would have to be separate |
|
19:21
🔗
|
jodizzle |
Then, you can generate a CDX of the WARCs that acts as an index for which URLs you've gotten |
|
19:22
🔗
|
cm |
cdx? |
|
19:23
🔗
|
jodizzle |
https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/ |
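To make the idea of an index over WARC records concrete, here is a rough standard-library sketch. It is not a real CDX writer: actual CDX files carry more fields (timestamp, MIME type, status, digest and so on), and for .warc.gz files the offsets refer to the compressed file. The helpers `make_record` and `index_records` are hypothetical, handle only uncompressed data, and exist purely for illustration.

```python
import io
import uuid


def make_record(target_uri, payload):
    """Build one simplified, uncompressed 'resource' record (illustrative only)."""
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "WARC-Date: 2020-01-01T00:00:00Z",  # fixed placeholder date
        "WARC-Target-URI: %s" % target_uri,
        "Content-Length: %d" % len(payload),
    ]
    head = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # Each record is followed by two CRLFs.
    return head + payload + b"\r\n\r\n"


def index_records(stream):
    """Yield (target_uri, offset, content_length) for each record in the stream."""
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break  # end of file
        if not line.strip():
            continue  # blank separator line between records
        # 'line' is the version line; read named headers until the empty line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        stream.seek(length, io.SEEK_CUR)  # skip over the record body
        yield headers.get("warc-target-uri"), offset, length


first = make_record("http://podcast.example/feed.xml", b"<rss/>")
second = make_record("http://podcast.example/ep1.mp3", b"ID3\r\n\x00data")
entries = list(index_records(io.BytesIO(first + second)))
for uri, offset, length in entries:
    print(uri, offset, length)
```

The point of such an index is that a lookup by URL gives the byte offset of the record, so a reader can seek straight to it instead of scanning the whole file.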
|
19:25
🔗
|
jodizzle |
Also, typically a WARC record is a single thing, in this case, either one request or response. But you can bundle records together into gzipped .warc.gz files. |
|
19:26
🔗
|
jodizzle |
That's how they're usually stored in bulk. |
|
19:26
🔗
|
jodizzle |
(Here are examples of different records, for reference: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#annex-b-informative-examples-of-warc-records) |
|
19:27
🔗
|
cm |
so i guess the cleanest and most consistent way would be to treat the podcast rss as a web page, and crawl it like you would a website |
|
19:27
🔗
|
cm |
then you get an archive of the rss and the content files |
|
19:27
🔗
|
jodizzle |
But this is all archival details. I am a little confused by what you mean by "portable". What would a "portable feed archive" be? |
|
19:28
🔗
|
cm |
the current format of my feed archives is a directory with feed.xml, and subdirectories written by wget containing the content |
|
19:29
🔗
|
cm |
the whole directory is served by a webserver, and the feed.xml has links to my copies of the content |
|
19:30
🔗
|
cm |
one way to make it portable would be to use relative links to the content, but i dont think that is possible in RSS |
|
19:32
🔗
|
cm |
so i thought about putting a custom keyword in place of the webroot for each archived piece of content, then to make a usable RSS feed you would replace that keyword with whatever the webroot happens to be |
|
19:33
🔗
|
jodizzle |
Oh, so you mean it's not currently portable because if the IP address/domain changes, the links in feed.xml will break? |
|
19:33
🔗
|
cm |
yeah |
|
19:34
🔗
|
jodizzle |
Okay, yeah. If you can't use relative links in RSS (is that definitely true?), then I don't think there's any way around something like what you described. |
|
19:34
🔗
|
jodizzle |
You could have feed.xml created periodically by a cronjob or similar via a script that references a "webroot" setting |
|
19:35
🔗
|
jodizzle |
Or something like that. |
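The placeholder approach discussed above might look something like this. It is only a sketch under assumptions: the keyword `{{WEBROOT}}`, the function `render_feed`, and the example template and URLs are invented here, and a cron job would simply call the function and write the result to feed.xml.

```python
from xml.sax.saxutils import escape

# Hypothetical keyword stored in the archived feed in place of the webroot.
PLACEHOLDER = "{{WEBROOT}}"


def render_feed(template, webroot):
    """Substitute the configured webroot for the placeholder keyword."""
    # Escape characters that are not allowed inside a double-quoted XML attribute.
    safe_root = escape(webroot.rstrip("/"), {'"': "&quot;"})
    return template.replace(PLACEHOLDER, safe_root)


# Made-up archived feed that refers to content relative to the placeholder.
TEMPLATE = """<rss version="2.0"><channel><title>Example Show</title>
<item><title>Ep 1</title>
<enclosure url="{{WEBROOT}}/show/ep1.mp3" length="123" type="audio/mpeg"/>
</item></channel></rss>"""

feed_xml = render_feed(TEMPLATE, "https://archive.example/podcasts/")
print(feed_xml)
```

Moving the archive then only requires changing the webroot setting and regenerating the feed; the stored template itself never changes.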
|
19:36
🔗
|
|
VADemon has quit IRC (left4dead) |
|
19:38
🔗
|
cm |
i couldn't find anything definitive saying RSS does not support relative links |
|
19:38
🔗
|
cm |
but there are at least a significant number of readers that dont support it |
|
19:39
🔗
|
cm |
and it makes sense that readers would not store the prefix of the URL used to fetch the rss feed, which would be necessary to determine the full url for a relative link |
|
19:41
🔗
|
jodizzle |
Hm, I don't know. It seems like if feed readers have to fetch feeds from web domains, they could keep track of those domains and use them to resolve relative links? |
|
19:41
🔗
|
cm |
yeah true |
|
19:41
🔗
|
jodizzle |
But if they don't do that, they don't do that. Maybe there's some more complexity to it that I'm not thinking of. |
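For reference, resolving a relative enclosure link against the URL the feed was fetched from is a standard operation; a reader that kept the feed URL could do it as in the sketch below. The feed URL and paths are made-up examples, and whether any given reader actually does this is exactly the open question in the discussion.

```python
from urllib.parse import urljoin

# Hypothetical URL the reader used to fetch the feed.
FEED_URL = "https://archive.example/podcasts/show/feed.xml"


def resolve_enclosure(feed_url, enclosure_url):
    """Resolve a possibly relative enclosure URL against the feed's own URL."""
    return urljoin(feed_url, enclosure_url)


relative = resolve_enclosure(FEED_URL, "episodes/ep1.mp3")
rooted = resolve_enclosure(FEED_URL, "/media/ep1.mp3")
absolute = resolve_enclosure(FEED_URL, "http://original.example/ep1.mp3")
print(relative)
print(rooted)
print(absolute)
```

Absolute URLs pass through unchanged, so a reader resolving links this way would still handle ordinary feeds.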
|
19:42
🔗
|
cm |
maybe i'll do a test |
|
19:42
🔗
|
|
obskyr has quit IRC (Read error: Operation timed out) |
|
19:43
🔗
|
|
sHATNER has quit IRC (Read error: Operation timed out) |
|
19:43
🔗
|
|
omglolba- has quit IRC (Read error: Operation timed out) |
|
19:46
🔗
|
|
omglolbah has joined #archiveteam-bs |
|
19:46
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
|
19:48
🔗
|
|
obskyr has joined #archiveteam-bs |
|
19:50
🔗
|
|
closure has joined #archiveteam-bs |
|
19:50
🔗
|
|
Maylay has quit IRC (Read error: Operation timed out) |
|
19:50
🔗
|
cm |
yeah my default podcasts app rejects items with relative enclosure links |
|
19:51
🔗
|
cm |
i.e. doesn't display them |
|
19:52
🔗
|
nico_32 |
SketchCow: have a nice party & birthday! |
|
19:52
🔗
|
jodizzle |
cm: Unfortunate. |
|
19:53
🔗
|
cm |
now i guess warc files are basically an annotated transcript of an http response |
|
19:54
🔗
|
cm |
does warc have any way to refer to a standalone file? |
|
19:54
🔗
|
cm |
i.e. "the next thing the server sent was the contents of this file" |
|
19:54
🔗
|
cm |
with a pointer to a html or mp3 file on disk |
|
19:56
🔗
|
|
Maylay has joined #archiveteam-bs |
|
19:58
🔗
|
jodizzle |
I think you could manufacture something like that, but typically you would just store the bytes in the WARC. |
|
19:59
🔗
|
jodizzle |
So a response warc would contain the response bytes and all the headers and metadata of that response |
|
20:01
🔗
|
cm |
yeah |
|
20:01
🔗
|
cm |
then to view the file you need a server side script to strip out the metadata |
|
20:03
🔗
|
jodizzle |
In a sense, yes. Though typically you'd be using a library or some toolkit that's built for reading WARC data. |
|
20:05
🔗
|
jodizzle |
But yes, I think I see what you mean. I think the links in the feed.xml you generate would basically have to route to some webserver endpoint which does the work necessary to read from the WARC. |
|
20:07
🔗
|
cm |
couldn't i use the warc server for the rss as well |
|
20:10
🔗
|
jodizzle |
What do you mean? |
|
20:10
🔗
|
|
sHATNER has joined #archiveteam-bs |
|
20:10
🔗
|
|
brayden has joined #archiveteam-bs |
|
20:10
🔗
|
cm |
pywb for example |
|
20:10
🔗
|
cm |
if im using it to browse a website, it will rewrite links to point to archived copies, right? |
|
20:11
🔗
|
|
Laverne has joined #archiveteam-bs |
|
20:13
🔗
|
cm |
so couldnt i let pywb rewrite the links in the rss feed? or does it not do that |
|
20:13
🔗
|
jodizzle |
Does pywb have a browsing feature like that? I've only ever used it in the context of a local WBM to view WARCs I've generated separately. |
|
20:13
🔗
|
jodizzle |
If it does though, that sounds pretty cool |
|
20:14
🔗
|
cm |
idk actually |
|
20:14
🔗
|
cm |
i assumed it works like web.archive.org |
|
20:14
🔗
|
cm |
but come to think of it, web.archive.org doesn't do that for rss, only html |
|
20:15
🔗
|
jodizzle |
And in terms of how this would play with an RSS feed reader, I don't know. |
|
20:31
🔗
|
|
dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
|
20:35
🔗
|
|
scorche has quit IRC (Read error: Operation timed out) |
|
20:48
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
21:01
🔗
|
|
icedice has joined #archiveteam-bs |
|
21:05
🔗
|
|
icedice has quit IRC (Client Quit) |
|
21:17
🔗
|
|
scorche has joined #archiveteam-bs |
|
21:37
🔗
|
|
scorche has quit IRC (Read error: Operation timed out) |
|
21:57
🔗
|
|
lennier1 has joined #archiveteam-bs |
|
22:01
🔗
|
|
paul2520 has quit IRC (Read error: Operation timed out) |
|
22:01
🔗
|
|
Jake has quit IRC (Read error: Operation timed out) |
|
22:01
🔗
|
|
endrift has quit IRC (Read error: Operation timed out) |
|
22:02
🔗
|
|
britmob has quit IRC (Read error: Operation timed out) |
|
22:02
🔗
|
|
paul2520 has joined #archiveteam-bs |
|
22:02
🔗
|
|
Jake has joined #archiveteam-bs |
|
22:02
🔗
|
|
endrift has joined #archiveteam-bs |
|
22:02
🔗
|
|
britmob has joined #archiveteam-bs |
|
22:02
🔗
|
|
systwi_ has joined #archiveteam-bs |
|
22:03
🔗
|
|
Meli has joined #archiveteam-bs |
|
22:03
🔗
|
|
Hecatz- has joined #archiveteam-bs |
|
22:04
🔗
|
|
asdf01011 has quit IRC (Read error: Operation timed out) |
|
22:04
🔗
|
|
scorche has joined #archiveteam-bs |
|
22:05
🔗
|
|
voltagex_ has joined #archiveteam-bs |
|
22:05
🔗
|
|
nico_32_ has joined #archiveteam-bs |
|
22:05
🔗
|
|
Coderjo has joined #archiveteam-bs |
|
22:05
🔗
|
|
systwi has quit IRC (Read error: Operation timed out) |
|
22:06
🔗
|
|
colona_ has joined #archiveteam-bs |
|
22:07
🔗
|
|
systwi_ is now known as systwi |
|
22:07
🔗
|
|
AlsoJAA_ has joined #archiveteam-bs |
|
22:07
🔗
|
|
JAA sets mode: +o AlsoJAA_ |
|
22:08
🔗
|
|
N4Y_ has joined #archiveteam-bs |
|
22:09
🔗
|
|
nightpoo- has joined #archiveteam-bs |
|
22:09
🔗
|
|
second_ has joined #archiveteam-bs |
|
22:09
🔗
|
|
actually_ has joined #archiveteam-bs |
|
22:12
🔗
|
|
obskyr has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
nepeat has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
apache2_ has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
Meli-sama has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
nightpool has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
PotcFdk has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
voltagex has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
mr_archiv has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
AlsoJAA has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
N4Y has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
nico_32 has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
N4Y_ is now known as N4Y |
|
22:12
🔗
|
|
atg has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
second has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
DFJustin has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
Flashfire has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
acridAxid has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
zhongfu has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
igloo25 has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
Coderjo_ has quit IRC (Ping timeout: 745 seconds) |
|
22:12
🔗
|
|
step has quit IRC (Ping timeout: 745 seconds) |
|
22:13
🔗
|
|
Hecatz has quit IRC (Ping timeout: 745 seconds) |
|
22:13
🔗
|
|
Hecatz- is now known as Hecatz |
|
22:13
🔗
|
|
colona has quit IRC (Ping timeout: 745 seconds) |
|
22:14
🔗
|
|
Mateon1 has joined #archiveteam-bs |
|
22:16
🔗
|
|
mr_archiv has joined #archiveteam-bs |
|
22:52
🔗
|
|
BlueMax has joined #archiveteam-bs |
|
23:35
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
|
23:38
🔗
|
|
Lord_Nigh has quit IRC (Client Quit) |
|
23:42
🔗
|
|
jshoard has quit IRC (Quit: Leaving) |
|
23:44
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
|
23:52
🔗
|
|
Arcorann has joined #archiveteam-bs |