Time |
Nickname |
Message |
06:44
π
|
godane |
so there's some good news, in a weird way, with the glenn beck stuff |
06:44
π
|
godane |
turns out i can start getting the hd version of the show now |
06:45
π
|
godane |
the show is a single show with just glenn beck on it now |
06:47
π
|
godane |
this may only be sort of a storage problem, in that these 1 hour shows will be 1 GB each |
07:15
π
|
godane |
the other good news is there is real metadata now to go with the shows |
07:20
π
|
godane |
the other fun is that i can't downgrade the hd to what it used to be |
07:26
π
|
joepie91 |
godane: you were the one who I grabbed the NHK podcasts for, right? |
07:36
π
|
godane |
yes |
07:36
π
|
joepie91 |
godane: what do you want me to do with them? is it okay if I just give you an rsync endpoint to grab them from? |
07:36
π
|
godane |
ok |
07:37
π
|
joepie91 |
(there might be duplicates, but the file modification date indicates whether they are unique... it's set to the actual recording date) |
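A minimal sketch of the duplicate check joepie91 describes, assuming that two files with the same modification time are the same recording. The function name `unique_by_mtime` and the directory layout are hypothetical, not part of any tool mentioned in the channel.

```python
from pathlib import Path


def unique_by_mtime(directory):
    """Return one path per distinct modification time, in sorted name order.

    Assumption: the mtime was set to the recording date, so files that
    share an mtime are treated as duplicates of each other.
    """
    seen = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            mtime = int(path.stat().st_mtime)
            # Keep only the first file seen for each timestamp.
            seen.setdefault(mtime, path)
    return list(seen.values())
```

When pulling the files with rsync, the modification times only survive the transfer if they are preserved (for example with `-t` or `-a`).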
07:37
π
|
joepie91 |
alright |
07:37
π
|
joepie91 |
godane: will do so later today and PM you the details |
07:37
π
|
godane |
you can do it now if you want |
13:51
π
|
Smiley |
ok warrior fired up for these grabs |
13:51
π
|
Smiley |
I'll run as long as I can at home |
13:53
π
|
godane |
any way to grab asp files when you get errors like this: http://computerpoweruser.com/articles/archive/create/cre6197/cre6197.asp |
13:53
π
|
godane |
i want to do a full backup of all the old computerpoweruser articles if it's possible |
13:55
π
|
Smiley |
they've screwed up their server side includes godane so I don't know of any way.. :/ |
14:09
π
|
Smiley |
Sigh, all my threads have "yahooed!" :( |
14:13
π
|
Smiley |
someone is telling me you can use GETS + grabbing a directory listing and grabbing individual files godane - no idea if it'll work |
14:16
π
|
godane |
how do i do that? |
14:22
π
|
godane |
Smiley: ? |
14:23
π
|
Smiley |
I don't know godane, it's just something someone said, and they won't give me more details. |
14:23
π
|
Smiley |
so it's likely a deadend |
14:29
π
|
godane |
Smiley: is the guy you're talking to on an irc channel? |
14:30
π
|
godane |
i need to know what the hell it is |
14:38
π
|
m1das |
godane: does that asp file link to a working article or is it just broken? |
14:38
π
|
godane |
it's just broken |
14:39
π
|
godane |
but i can get to a working article that is newer |
14:39
π
|
godane |
about 2004ish |
14:39
π
|
godane |
example: http://www.computerpoweruser.com/articles/archive/c0504/50c04/50c04.asp |
15:14
π
|
kyan |
So I downloaded a 10GB tar.bz2 with wget-warc, as well as some smaller stuff, and megawarced it all together. But, the resulting .warc.gz file is only 7.6GB. What's up with that? |
15:22
π
|
sep332 |
perhaps the warc rearranged things - say, alphabetically - so it compressed better? |
17:44
π
|
godane |
Smiley: anything about the 'GETS' yet? |
18:12
π
|
Smiley |
none. |
20:24
π
|
Coderjoe |
kyan: the tar.bz2 likely compressed a bit more when run through gzip |
20:26
π
|
Coderjoe |
a warc.gz file is not one single gzip stream. each record is a separate stream. |
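Coderjoe's point about separate streams can be illustrated with a short sketch. It assumes nothing about megawarc itself: it only shows that a file made of several gzip members concatenated back to back can be decompressed either as a whole or one member at a time. The helper name `split_gzip_members` and the sample record bytes are made up for illustration.

```python
import gzip
import zlib


def split_gzip_members(blob: bytes) -> list[bytes]:
    """Decompress each gzip member in blob separately and return them in order."""
    members = []
    while blob:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        members.append(d.decompress(blob) + d.flush())
        # Whatever follows the end of this member is the next member.
        blob = d.unused_data
    return members


# Two stand-in "records", each compressed as its own gzip member.
records = [b"WARC/1.0 record one\r\n", b"WARC/1.0 record two\r\n"]
blob = b"".join(gzip.compress(r) for r in records)

members = split_gzip_members(blob)
whole = gzip.decompress(blob)  # reads all members and concatenates them
```

Because every record is its own member, a reader that knows a record's byte offset can start decompressing there without touching the rest of the file.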
20:26
π
|
kyan |
Coderjoe, Actually I decompressed it, the raw WARC was only 7 or 8 gb so I think something got left out |
20:26
π
|
Coderjoe |
hmm |
20:27
π
|
Coderjoe |
was the tar.bz2 file just in the same directory as the warc files? what did it contain? |
20:29
π
|
Coderjoe |
megawarc creates three files: |
20:29
π
|
Coderjoe |
FILE.warc.gz is the concatenated .warc.gz; FILE.tar contains any non-warc files from the .tar; FILE.json.gz contains metadata |
20:31
π
|
kyan |
Coderjoe, I downloaded the tar.bz2 with wget-warc, with the automatic-delete option enabled (so only the warc was kept). Then that warc (and all the others) got fed to megawarc, and then deleted. I'll see if I can track down where they ended up (the files have all been sent out to the Internet Archive now). |
20:32
π
|
kyan |
Coderjoe, |
20:32
π
|
kyan |
https://archive.org/details/AMJ_BarrelData_6561_88c1c7eb-067b-4547-949c-e3111a189bab.2013-12-16-19-44-26-244140-_E |
20:32
π
|
kyan |
The log output went to the xz file… |
20:33
π
|
kyan |
it's from an automated system I've been working on; I was keeping an eye on this item especially because it's the first really big thing the program has worked with |
20:39
π
|
kyan |
Coderjoe, looking at the json.gz it looks like the file megawarc got was only 7gb. So, I conclude that Wget munged it :P |
20:48
π
|
Coderjoe |
or megawarc did |
20:51
π
|
xmc |
if megawarc munged it, that's worrying |
20:51
π
|
xmc |
if wget munged it, that's also worrying |
20:53
π
|
kyan |
Coderjoe, I don't think megawarc did since it doesn't (I don't think?) touch the original warc |
20:56
π
|
kyan |
I don't know though, I'm not too familiar with the software involved |
21:01
π
|
DFJustin |
it's pretty common for the connection to drop at some point in a 10gb download, that would be my guess |
21:01
π
|
kyan |
DFJustin, usually wget resumes automatically I think? The download log seemed to indicate it got all the way to 100% complete |
21:02
π
|
kyan |
(although I did some downloads of big MPEG transport stream files with wget-warc (10 to 20 gb) and they seemed to come through fine, so I'm not sure what's different) |
21:03
π
|
DFJustin |
depends on the parameters |
21:03
π
|
kyan |
and that was with a lot of dropped connections since it was to a server in greece (I think) on a flaky connection |
21:03
π
|
kyan |
Ok I guess I'll look at those items then, and compare the settings to what my script is doing |
21:04
π
|
DFJustin |
and on how the server on the other end behaves |
21:04
π
|
kyan |
see what I changed |
21:04
π
|
kyan |
I see… I'll do some investigating :) |
21:04
π
|
kyan |
thanks! |
21:11
π
|
arkiver |
hmm |
21:11
π
|
arkiver |
has jason scott been online? |
21:11
π
|
arkiver |
haven't seen him for some time... |
21:13
π
|
DFJustin |
he was on this morning |
21:16
π
|
arkiver |
ah well |
21:16
π
|
arkiver |
I sent SketchCow some emails a while ago and I didn't get a reply yet, so I thought he maybe was on holiday or something |
21:17
π
|
DFJustin |
he's been flying back to new york I think |
21:18
π
|
DFJustin |
https://twitter.com/textfiles/status/413037877791326208/photo/1 |
21:22
π
|
arkiver |
ah, so that's why |
21:22
π
|
arkiver |
thank you, guess I just have to wait some time... :) |
21:45
π
|
m1das |
flying or sliding, it's all the same |
23:26
π
|
SketchCow |
His terrible crack addiction has overtaken his responsibilities. |
23:41
π
|
dashcloud |
this might be interesting to some folks here, on the use of PhantomJS for a massive site: http://sorcery.smugmug.com/2013/12/17/using-phantomjs-at-scale/ |
23:43
π
|
BiggieJ |
i wish I understood more about the massive node.js implementation we have at work |
23:45
π
|
Baljem |
SketchCow: ah, see, there's your mistake; try being addicted to /incredible/ crack instead. |