| Time |
Nickname |
Message |
|
00:13
🔗
|
Coderjoe |
trying to find some video you saw over a year ago on youtube is like ... |
|
00:13
🔗
|
Coderjoe |
trying to find a book in the remains of a library explosion? |
|
00:13
🔗
|
Coderjoe |
(a specific book, of course) |
|
00:15
🔗
|
yipdw |
trying to find some video you saw over a year ago on YouTube is like THIS VIDEO HAS BEEN TAKEN DOWN DUE TO A COPYRIGHT CLAIM FROM UNITED MEDIA CONGLOMERATE |
|
00:18
🔗
|
Coderjoe |
well, except the video I am trying to find was entirely original content |
|
00:19
🔗
|
Ymgve |
doesn't stop someone from DMCAing it on a hunch |
|
02:48
🔗
|
underscor |
Coderjoe: Hahaha, yeah |
|
02:49
🔗
|
underscor |
Well, with something on the order of 36 hours of content uploaded a minute |
|
02:49
🔗
|
chronomex |
about 25 hours of content dmca'd a minute |
|
02:49
🔗
|
underscor |
hahaa |
|
02:50
🔗
|
yipdw |
tef: the decompression feature you added to warc2warc seems to work fine |
|
02:50
🔗
|
yipdw |
tef: I'm uploading a warc2warc'd warc to a Wayback instance to check the result |
|
02:50
🔗
|
Paradoks |
If only we could watch all the new Youtube content Back to the Future II style. |
|
02:51
🔗
|
yipdw |
damnit |
|
02:51
🔗
|
yipdw |
I tripped over my DNS server |
|
02:51
🔗
|
chronomex |
lol |
|
02:52
🔗
|
yipdw |
or more specifically its serial cable |
|
03:28
🔗
|
yipdw |
oh, hmm |
|
03:28
🔗
|
yipdw |
tef: I think warc2warc may be losing data |
|
03:28
🔗
|
dashcloud |
anyone used this filesystem before? http://www.lessfs.com/ it describes itself as: A high performance inline data deduplicating filesystem for Linux. |
|
03:29
🔗
|
yipdw |
tef: is there a place that I can send you examples? or would you prefer me to upload example WARCs somewhere? |
|
03:30
🔗
|
tef |
thomas.figg@hanzoarchives.com |
|
03:30
🔗
|
yipdw |
ah, ok |
|
03:30
🔗
|
tef |
it *shouldn't be losing data* obviously |
|
03:30
🔗
|
yipdw |
right |
|
03:30
🔗
|
tef |
I will look at it tomorrow morning, ish. Cos it's almost 4 am |
|
03:30
🔗
|
yipdw |
I noticed, however, that one WARC record was truncated to zero bytes |
|
03:31
🔗
|
yipdw |
sure, no problem |
|
03:31
🔗
|
tef |
hmm, that would suggest something failed to decode |
|
03:31
🔗
|
yipdw |
yeah, possibly. there's not much more that can be done until I send you examples though :) |
|
03:31
🔗
|
yipdw |
so yeah, I'll write that up etc |
|
03:37
🔗
|
tef |
obv if it is easier to upload them and link them, I am all for that too |
|
03:38
🔗
|
tef |
I'm quite happy to work from 'this warc does not work as expected' |
|
03:39
🔗
|
yipdw |
sure |
|
03:39
🔗
|
yipdw |
actually, would you prefer that I file an issue in Bitbucket? |
|
03:39
🔗
|
yipdw |
none of this is really private |
|
03:41
🔗
|
tef |
i'm just happy with test data :3 |
|
03:41
🔗
|
tef |
it's more than I get a lot of the time... |
|
04:01
🔗
|
tef |
yipdw: that's perfect, thanks. I will get back to you tomorrow |
|
06:40
🔗
|
savetz |
mobileme-grab quit on me with "error downloading 'sboyack'" - do I need to do anything special to make sure that gets grabbed again? |
|
06:40
🔗
|
savetz |
the reason is apparently because my hard drive filled up |
|
07:21
🔗
|
yipdw |
savetz: you can either run dld-single.sh to explicitly redownload it, or just let it linger in the queue -- it'll be requeued at some point |
|
12:03
🔗
|
Ymgve |
SketchCow: seen this? http://www.metafilter.com/111701/Putting-kickstarter-out-of-business |
|
12:03
🔗
|
Ymgve |
Might be of interest since you've used Kickstarter |
|
12:08
🔗
|
ersi |
So.. they should be put out of business because they make money? |
|
12:08
🔗
|
ersi |
They offer an umbrella, a brand and a collected spot with people that like to fling their wallets open |
|
12:09
🔗
|
Ymgve |
5% sounds a bit much, though |
|
12:10
🔗
|
ersi |
I'm not able to judge that, but I believe they take a lot of hassle out of asking people for money to do a project |
|
12:11
🔗
|
Ymgve |
true |
|
12:12
🔗
|
ersi |
It sounds like ramblings from an outsider that hasn't tried A) raising money B) tried out kickstarter |
|
14:48
🔗
|
tef |
yipdw: found the bug |
|
14:48
🔗
|
tef |
well technically it is in warc-wget :/ it is producing warc records, with http in them, that *claim* to be chunked, and aren't |
|
14:51
🔗
|
tef |
I can put a workaround in so that broken http messages are left untouched. I'm not sure what the right thing to do is when you get a Transfer-Encoding: chunked with no chunks |
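The "claims to be chunked, and isn't" case can be checked mechanically. Below is a minimal sketch, not code from warc2warc: the function name `decode_chunked` is made up for illustration, and it assumes the HTTP body has already been separated from the headers. It returns the de-chunked payload when the body really is chunk-framed and `None` when it is not, which is the signal a tool could use to leave the message untouched.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(body: bytes):
    """Return the de-chunked payload, or None if body is not chunk-framed.

    Hypothetical helper for illustration; trailers after the final
    zero-size chunk are ignored.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            return None  # no chunk-size line where one is expected
        # A chunk-size line is hex digits, optionally followed by ";extensions".
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            return None  # first line is not a chunk size, so not chunked
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)  # terminating chunk reached
        chunk = body[pos:pos + size]
        if len(chunk) < size or body[pos + size:pos + size + 2] != b"\r\n":
            return None  # truncated chunk or missing CRLF after chunk data
        out += chunk
        pos += size + 2
```

A body that merely happens to start with a hex-looking line would fool this check, so treat it as a heuristic rather than proof.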
|
14:56
🔗
|
tef |
fwiw - if you put the chunks back in, -D will clean them |
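"Putting the chunks back in" amounts to re-framing the already-decoded body as chunked transfer encoding so that it matches its header again. A minimal sketch, assuming the payload is held in memory; `encode_chunked` and its `chunk_size` parameter are illustrative names, not part of warc-tools or wget.

```python
def encode_chunked(payload: bytes, chunk_size: int = 4096) -> bytes:
    """Frame payload as HTTP/1.1 chunked transfer encoding (no trailers)."""
    parts = []
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        parts.append(b"%x\r\n" % len(chunk))  # chunk size in hex
        parts.append(chunk)
        parts.append(b"\r\n")
    parts.append(b"0\r\n\r\n")  # zero-size chunk terminates the body
    return b"".join(parts)
```

For example, `encode_chunked(b"Wikipedia", 4)` yields three data chunks of sizes 4, 4 and 1, followed by the terminating zero-size chunk.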
|
17:13
🔗
|
savetz |
the MobileMe grabber has been crashing on me in the last 24 hours. is apple being weird? last time= ERROR (3) |
|
17:28
🔗
|
yipdw |
tef: got your message; that's some weird wget-warc behavior |
|
17:29
🔗
|
yipdw |
I'll check what's actually coming back from the server |
|
17:32
🔗
|
Coderjoe_ |
yipdw: I think I know why wget-warc is doing it. I think it snags the result after wget has handled transfer-encoding, rather than the raw network stream. |
|
17:33
🔗
|
Coderjoe_ |
ISTR it hooks a temporary file or something into the output code |
|
17:34
🔗
|
yipdw |
oh, so the response really is chunked but the warc code isn't able to write the chunks? |
|
17:34
🔗
|
yipdw |
hm |
|
17:34
🔗
|
yipdw |
one sec, I'm gonna see what curl does for some of these URIs |
|
17:34
🔗
|
yipdw |
I haven't actually examined what comes back from these servers too closely |
|
17:36
🔗
|
Coderjoe |
I'm fairly sure that is correct |
|
17:37
🔗
|
yipdw |
well, yeah, the response from ff.net is definitely marked as chunked |
|
17:38
🔗
|
yipdw |
hmm |
|
17:39
🔗
|
yipdw |
I wonder if there's a way to get precisely what wget gets |
|
17:40
🔗
|
Coderjoe |
yeah. move the hook into the network handling code, before the transfer-encoding handler (or perhaps IN the transfer-encoding handler?) |
|
17:41
🔗
|
yipdw |
from the way the WARC code is structured, that doesn't look like it's a trivial job |
|
17:41
🔗
|
yipdw |
or rather it looks like it was designed to sit at arm's length from the rest of wget |
|
17:42
🔗
|
yipdw |
but I dunno, I've only skimmed over said code |
|
17:42
🔗
|
yipdw |
alard would know best |
|
17:46
🔗
|
yipdw |
damn |
|
17:46
🔗
|
yipdw |
just been reading the gethttp function in wget's http.c |
|
17:47
🔗
|
yipdw |
that could really use a state machine formalization |
|
17:47
🔗
|
yipdw |
IMO, anyway, but I've been told that I'm weird |
|
17:56
🔗
|
yipdw |
on a lighter note |
|
17:56
🔗
|
yipdw |
http://tctechcrunch2011.files.wordpress.com/2011/11/screen-shot-2011-11-20-at-8-44-25-pm1.png?w=620 |
|
18:35
🔗
|
tef |
warcs are meant to be *raw* traffic somewhat |
|
18:36
🔗
|
tef |
well really - delete any transfer-encoding headers and add a content-length |
|
18:36
🔗
|
tef |
or put the raw traffic in |
|
18:36
🔗
|
tef |
hmm |
|
18:37
🔗
|
savetz |
my mobile-me downloader is crashing constantly. is it me? ERROR (3)s all the time |
|
18:45
🔗
|
SketchCow |
Ymgve: That guy is an idiot. |
|
18:45
🔗
|
SketchCow |
Kickstarter does SO MUCH MORE for the 5% |
|
18:57
🔗
|
balrog |
hey SketchCow |
|
18:57
🔗
|
balrog |
off the top of your head, what would be useful metadata fields for a floppy archival format? |
|
18:58
🔗
|
tef |
yipdw: I think it would be easier to strip out the headers that don't apply - like transfer-encoding |
|
19:08
🔗
|
Coderjoe |
one post: Short version: "I don't like something and don't want to use it, so therefore it should be made to go away." Oh FFS... |
|
19:43
🔗
|
SketchCow |
Oh, god. |
|
19:44
🔗
|
SketchCow |
OK, in short form: There is SO MUCH FUCKING WORK being done on that, the website would REALLY be improved with links to all that. |
|
19:44
🔗
|
SketchCow |
So when I get back from the interview I'm doing, let me find that. |
|
19:44
🔗
|
SketchCow |
I want to do some actual archive team business today, so look forward to that. |
|
19:51
🔗
|
ersi |
Booya! |
|
20:19
🔗
|
yipdw |
tef: I'll run that by alard, he's more familiar with wget's WARC code than me |
|
20:19
🔗
|
yipdw |
if there ends up being an easy way to hook into what's coming over the wire that'd be worth trying too |
|
20:20
🔗
|
tef |
yes |
|
20:23
🔗
|
yipdw |
this'll be fun, though; I know that people in this channel have generated several terabytes of WARCs that may have invalid records :P |
|
20:23
🔗
|
yipdw |
so going forward we'll need some way to fix that -- maybe stripping out Transfer-Encoding would be easiest |
|
20:24
🔗
|
yipdw |
guess I'll poke around the warc-tools source and see how hard it'd be to write a tool like that |
|
20:24
🔗
|
yipdw |
and I guess this would be an excuse to break out the Hadoop or something |
|
20:40
🔗
|
chronomex |
fix-brokenass-shit.sh |
|
20:55
🔗
|
yipdw |
chronomex: nah |
|
20:55
🔗
|
yipdw |
fix-brokenass-shit.rb if I'm writing it, obviously |
|
20:55
🔗
|
yipdw |
or maybe it can be FixBrokenassShit.hs and then everyone will be angry |
|
20:59
🔗
|
underscor |
lol |
|
21:10
🔗
|
chronomex |
Haskell? Excellent. |
|
21:11
🔗
|
chronomex |
you can make more people mad by writing it in Erlang. |
|
21:11
🔗
|
chronomex |
YOU DON'T NEED CONCURRENCY FOR THAT, WTF MAN |
|
21:15
🔗
|
yipdw |
yes |
|
21:15
🔗
|
yipdw |
one Erlang process per WARC |
|
21:16
🔗
|
yipdw |
actually, that would probably scale pretty well |
|
21:30
🔗
|
tef |
yipdw: I can hack warc2warc to do it |
|
21:32
🔗
|
tef |
it's literally putting a content = content.replace("Transfer-Encoding: chunked\r\n","") or similar iirc |
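A slightly more defensive version of that one-liner might look like the sketch below. It is an assumption-laden illustration, not the eventual warc2warc change: `fix_unchunked_response` is a hypothetical name, the input is taken to be one complete HTTP response whose body is already de-chunked (the wget-warc case described earlier), and the header match is made case-insensitive. It also adds the Content-Length that was suggested earlier in the conversation.

```python
import re

# Header lines are matched case-insensitively at the start of a line.
_TE = re.compile(rb"^Transfer-Encoding:[ \t]*chunked[ \t]*\r\n", re.I | re.M)
_CL = re.compile(rb"^Content-Length:[^\r\n]*\r\n", re.I | re.M)


def fix_unchunked_response(message: bytes) -> bytes:
    """Drop a false 'Transfer-Encoding: chunked' and set Content-Length.

    Assumes the body is NOT actually chunk-framed; messages without the
    header, or without a header/body separator, are returned unchanged.
    """
    head, sep, body = message.partition(b"\r\n\r\n")
    if not sep:
        return message  # no blank line, so not a complete HTTP message
    head += b"\r\n"  # restore the CRLF that ended the last header line
    if not _TE.search(head):
        return message
    head = _TE.sub(b"", head)
    head = _CL.sub(b"", head)  # avoid a duplicate Content-Length
    head += b"Content-Length: %d\r\n" % len(body)
    return head + b"\r\n" + body
```

Since this changes the length of the HTTP message, the enclosing WARC record's own Content-Length (and any digests it carries) would need to be recomputed by whatever tool rewrites the record.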
|
21:32
🔗
|
tef |
but i'd rather not a specific wget fix in the trunk, but it is easy to fork |
|
21:33
🔗
|
tef |
I guess I can add it as an option 'sniff: chunking' |
|
21:33
🔗
|
tef |
or -W for --wget-workaround |
|
21:33
🔗
|
tef |
or something |
|
21:35
🔗
|
yipdw |
tef: I was just going to write something for use by people here |
|
21:35
🔗
|
yipdw |
as I think it's got the largest concentration of people who have used wget's WARC writer |
|
21:35
🔗
|
yipdw |
or, really, just get something that SketchCow (or whoever) can run on a large batch of WARCs |
|
21:35
🔗
|
yipdw |
and concurrently fix up wget's WARC writer to do The Right Thing |
|
21:36
🔗
|
yipdw |
but, yeah, a fork works too |
|
21:43
🔗
|
closure |
http://blog.archive.org/2012/01/17/12-hours-dark-internet-archive-vs-censorship/ |
|
21:44
🔗
|
yipdw |
I didn't know IA was blacklisted in China |
|
21:44
🔗
|
yipdw |
though I shouldn't be surprised |
|
21:58
🔗
|
chronomex |
ditto |
|
22:02
🔗
|
yipdw |
http://abcnews.go.com/Technology/wikipedia-blackout-websites-wikipedia-reddit-dark-wednesday-protest/story?id=15373251#.TxXwA2OXRN0 |
|
22:02
🔗
|
yipdw |
what crappy lead graphics |
|
22:03
🔗
|
yipdw |
what sort of two-bit hack graphic artists work for the major news media these days? |
|
22:18
🔗
|
yipdw |
ha, and it's awesome seeing what Rupert Murdoch is tweeting |
|
22:18
🔗
|
yipdw |
it seems that Google, a software company, is out to destroy "software creators" |
|
22:18
🔗
|
tef |
yipdw: i'll just make a wget-warc-clean.py or something |
|
22:18
🔗
|
tef |
i tend to avoid names like unfuck |
|
22:18
🔗
|
tef |
:3 |
|
22:18
🔗
|
yipdw |
that man is like Gene Ray, Cubic, but with money |
|
22:18
🔗
|
yipdw |
tef: heh, good thing |
|
22:25
🔗
|
NovaKing |
http://torrentfreak.com/mpaa-internet-blackout-is-a-pr-stunt-users-are-corporate-pawns-120117/ |
|
22:27
🔗
|
yipdw |
well, yeah, it is a PR stunt |
|
22:27
🔗
|
yipdw |
good to know they can state the obvious |
|
22:27
🔗
|
Coderjoe |
bleh. fix-broken-shit.py |
|
22:27
🔗
|
tef |
work repo |
|
22:28
🔗
|
yipdw |
wait |
|
22:28
🔗
|
yipdw |
"The following is a statement by Senator Chris Dodd, Chairman and CEO of the Motion Picture Association of America, Inc. (MPAA)" |
|
22:28
🔗
|
yipdw |
I didn't know corruption had gone tha far! |
|
22:28
🔗
|
yipdw |
+t |
|
22:29
🔗
|
yipdw |
man, talk about cutting out the middleman |
|
22:29
🔗
|
yipdw |
why buy off legislature when you can just install someone to do your work |
|
22:30
🔗
|
chronomex |
because legislature is cheap |
|
22:30
🔗
|
chronomex |
$5000ish |
|
22:30
🔗
|
Coderjoe |
(note, when I said that, I had just read the ruby/haskell/erlang part of my log) |
|
22:30
🔗
|
yipdw |
chronomex: so are some Indian software houses, but that doesn't mean you'll get what you want |
|
22:31
🔗
|
chronomex |
yipdw: like legislature! |
|
22:31
🔗
|
Coderjoe |
it freaks me out when I call the bank credit card number and get someone in india. |
|
22:33
🔗
|
yipdw |
I think that means you need a new bank |
|
22:34
🔗
|
Coderjoe |
yeah... I have to wonder if that's actually how that card got compromised in the first place |
|
22:36
🔗
|
chronomex |
lol |
|
22:36
🔗
|
Coderjoe |
i've been working on leaving them. |
|
22:36
🔗
|
Coderjoe |
but how do I know if a bank I am looking at has outsourced their calls? |
|
22:37
🔗
|
Coderjoe |
and to where |
|
22:38
🔗
|
Coderjoe |
(this is also one of the largest banks in the world we're talking about) |
|
22:40
🔗
|
closure |
Coderjoe: didn't know you were into haskell |
|
22:42
🔗
|
Coderjoe |
I'm curious where I gave the impression I was |
|
22:42
🔗
|
closure |
"ruby/haskell/erlang" above |
|
22:43
🔗
|
Coderjoe |
closure: yipdw is the one that mention haskell. I was reading my channel scrollback |
|
22:43
🔗
|
yipdw |
Coderjoe eats GADTs for breakfast |
|
22:43
🔗
|
Coderjoe |
s/(mention)/\1ed/ |
|
22:45
🔗
|
closure |
>>= and <*> would be fun breakfast cereal indeed |
|
22:47
🔗
|
yipdw |
cherri (.) s |
|
23:18
🔗
|
SketchCow |
Downloading the Sound of Young America - we'll have that going up today. |
|
23:18
🔗
|
SketchCow |
Awww yeah |