[06:44] so some good news in some weird way with the glenn beck stuff
[06:44] turns out i can start getting the hd version of the show now
[06:45] the show is a single show with just glenn beck on it now
[06:47] this may only sort of be a storage problem, in that these 1-hour shows will be 1 GB each
[07:15] the other good news is there is real metadata now to go with the shows
[07:20] the other fun part is that i can't downgrade the hd to what it used to be
[07:26] godane: you were the one I grabbed the NHK podcasts for, right?
[07:36] yes
[07:36] godane: what do you want me to do with them? is it okay if I just give you an rsync endpoint to grab them from?
[07:36] ok
[07:37] (there might be duplicates, but the file modification date indicates whether they are unique... it's set to the actual recording date)
[07:37] alright
[07:37] godane: will do so later today and PM you the details
[07:37] you can do it now if you want
[13:51] ok, warrior fired up for these grabs
[13:51] I'll run as long as I can at home
[13:53] any way to grab asp files when you get errors like this: http://computerpoweruser.com/articles/archive/create/cre6197/cre6197.asp
[13:53] i want to do a full backup of all the old computerpoweruser articles if it's possible
[13:55] they've screwed up their server side includes godane so I don't know of any way.. :/
[14:09] Sigh, all my threads have "yahooed!" :(
[14:13] someone is telling me you can use GETS + grabbing a directory listing and grabbing individual files godane - no idea if it'll work
[14:16] how do i do that?
[14:22] Smiley: ?
[14:23] I don't know godane, it's just something someone said, and they won't give me more details.
[14:23] so it's likely a dead end
[14:29] Smiley: is the guy you're talking to on an irc channel?
[14:30] i need to know what the hell it is
[14:38] godane: does that asp file link to a working article or is it just broken?
[14:38] it's just broken
[14:39] but i can get to a working article that is newer
[14:39] about 2004ish
[14:39] example: http://www.computerpoweruser.com/articles/archive/c0504/50c04/50c04.asp
[15:14] So I downloaded a 10GB tar.bz2 with wget-warc, as well as some smaller stuff, and megawarced it all together. But the resulting .warc.gz file is only 7.6GB. What's up with that?
[15:22] perhaps the warc rearranged things - say, alphabetically - so it compressed better?
[17:44] Smiley: anything about the 'GETS' yet?
[18:12] non./
[20:24] kyan: the tar.bz2 likely compressed a bit more when run through gzip
[20:26] a warc.gz file is not one single gzip stream. each record is a separate stream.
[20:26] Coderjoe, actually I decompressed it; the raw WARC was only 7 or 8 GB, so I think something got left out
[20:26] hmm
[20:27] was the tar.bz2 file just in the same directory as the warc files? what did it contain?
[20:29] megawarc creates three files:
[20:29] FILE.warc.gz is the concatenated .warc.gz
[20:29] FILE.tar contains any non-warc files from the .tar
[20:29] FILE.json.gz contains metadata
[20:31] Coderjoe, I downloaded the tar.bz2 with wget-warc, with the automatic-delete option enabled (so only the warc was kept). Then that warc (and all the others) got fed to megawarc, and then deleted. I'll see if I can track down where they ended up (the files have all been sent out to the Internet Archive now).
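A minimal sketch related to the size mismatch above: since a .warc.gz is a series of concatenated gzip members (one per record) rather than a single stream, Python's standard gzip module reads straight across member boundaries, so the total uncompressed size can be compared against the warcs that were fed to megawarc. The script name and filename argument are placeholders, not the actual item.

```python
import gzip
import sys

# Sketch: report the total uncompressed size of a (possibly multi-member) .warc.gz
# so it can be compared with the sizes of the files that went into megawarc.
# Usage: python check_warc_size.py FILE.warc.gz
total = 0
with gzip.open(sys.argv[1], "rb") as f:  # gzip reads concatenated members transparently
    while True:
        chunk = f.read(1 << 20)
        if not chunk:
            break
        total += len(chunk)
print("uncompressed bytes:", total)
```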
[20:32] Coderjoe,
[20:32] https://archive.org/details/AMJ_BarrelData_6561_88c1c7eb-067b-4547-949c-e3111a189bab.2013-12-16-19-44-26-244140-_E
[20:32] The log output went to the xz file…
[20:33] it's from an automated system I've been working on; I was keeping an eye on this item especially because it's the first really big thing the program has worked with
[20:39] Coderjoe, looking at the json.gz it looks like the file megawarc got was only 7 GB. So, I conclude that Wget munged it :P
[20:48] or megawarc did
[20:51] if megawarc munged it, that's worrying
[20:51] if wget munged it, that's also worrying
[20:53] Coderjoe, I don't think megawarc did, since it doesn't (I don't think?) touch the original warc
[20:56] I don't know though, I'm not too familiar with the software involved
[21:01] it's pretty common for the connection to drop at some point in a 10 GB download, that would be my guess
[21:01] DFJustin, usually wget resumes automatically, I think? The download log seemed to indicate it got all the way to 100% complete
[21:02] (although I did some downloads of big MPEG transport stream files with wget-warc (10 to 20 GB) and they seemed to come through fine, so I'm not sure what's different)
[21:03] depends on the parameters
[21:03] and that was with a lot of dropped connections, since it was to a server in greece (I think) on a flaky connection
[21:03] Ok, I guess I'll look at those items then, and compare the settings to what my script is doing
[21:04] and on how the server on the other end behaves
[21:04] see what I changed
[21:04] I see… I'll do some investigating :)
[21:04] thanks!
[21:11] hmm
[21:11] has jason scott been online?
[21:11] haven't seen him for some time...
[21:13] he was on this morning
[21:16] ah well
[21:16] I sent SketchCow some emails a while ago and I didn't get a reply yet, so I thought maybe he was on holiday or something
[21:17] he's been flying back to new york I think
[21:18] https://twitter.com/textfiles/status/413037877791326208/photo/1
[21:22] ah, so that's why
[21:22] thank you, guess I just have to wait some time... :)
[21:45] flying or sliding, it's all the same
[23:26] His terrible crack addiction has overtaken his responsibilities.
[23:41] this might be interesting to some folks here, on the use of PhantomJS for a massive site: http://sorcery.smugmug.com/2013/12/17/using-phantomjs-at-scale/
[23:43] i wish I understood more about the massive node.js implementation we have at work
[23:45] SketchCow: ah, see, there's your mistake; try being addicted to /incredible/ crack instead.
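A hedged sketch for the json.gz check mentioned at 20:39: it peeks inside a megawarc FILE.json.gz to see what metadata (including sizes) was recorded for each packed file. The exact layout of megawarc's metadata is an assumption here; the code just dumps whatever JSON it finds, whether the file is a single document or line-delimited.

```python
import gzip
import json
import sys

# Sketch: dump the metadata entries in a megawarc FILE.json.gz.
# Usage: python peek_megawarc_json.py FILE.json.gz   (filename is a placeholder)
with gzip.open(sys.argv[1], "rt") as f:
    data = f.read()

try:
    entries = [json.loads(data)]  # single JSON document
except ValueError:
    # fall back to line-delimited JSON, one entry per packed file
    entries = [json.loads(line) for line in data.splitlines() if line.strip()]

for entry in entries:
    print(json.dumps(entry, indent=2)[:400])  # peek at the start of each entry
```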