#urlteam 2012-03-31,Sat


Time	Nickname	Message
01:34	chronomex	hahaha, that was very weird to see in /away-log
01:35	chronomex	I'd add examples to the spec
01:35	chronomex	also en_us collation is stupid, why not a naive codepoint collation
20:41	soultcer	SketchCow: It's true. Without chronomex going ahead and just writing a small script and saving shorteners I would probably still sit at my desk and try to design some XML scheme for storing shorturls ;-)
20:43	soultcer	chronomex: I am not sure if addressing encoding is the right thing to do. We could just treat all URLs as binary data and leave it to whoever uses our releases to figure out which shortener used which encoding
20:43	chronomex	bytes traverse the network
20:43	chronomex	we record bytes
20:44	chronomex	I don't see how encodings figure into this
20:44	soultcer	And I have to admit, I don't know a lot about Unicode. It took me a while to figure out the whole collation thing.
20:45	soultcer	tinyarrows.com previously used Unicode characters as shortcodes (they seem to have stopped, probably because it was a dumb idea)
20:45	soultcer	But then again, the http header doesn't have any character encoding
20:47	soultcer	Error pages like http://is.gd/mBAh however do
20:48	chronomex	seriously man if you're being this picky, just write a goddamn warc and be done with it
20:49	soultcer	Sounds like a neat idea, though it would probably eat a lot of storage
20:51	soultcer	Or we stick to the old format and simply ignore those (very, very, very, very) few URLs that contain a newline
21:17	chronomex	storage be damned, ia would appreciate it
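
Editor's note: the "record bytes, sort by codepoint, skip anything with a newline" approach discussed above could look roughly like the sketch below. It is not the project's actual script or release format. The `code|url` line layout, the `|` delimiter, and the names `write_records` and `read_records` are all assumptions made for illustration. Everything is handled as raw bytes and never decoded, and sorting compares byte values directly, which for UTF-8 data gives the same order as a naive codepoint sort and involves no locale.

```python
def write_records(pairs, delimiter=b"|"):
    """Serialize (shortcode, url) byte pairs, one record per line.

    Hypothetical layout: shortcode + delimiter + url + newline.
    Returns (data, skipped), where skipped holds the pairs that a
    line-based format cannot represent.
    """
    kept, skipped = [], []
    for code, url in pairs:
        # A newline in either field, or the delimiter in the shortcode,
        # would make the line ambiguous, so such records are set aside.
        if b"\n" in code or b"\n" in url or delimiter in code:
            skipped.append((code, url))
        else:
            kept.append((code, url))
    # bytes objects compare by raw byte value; no locale or collation
    # table is consulted.
    kept.sort()
    data = b"".join(code + delimiter + url + b"\n" for code, url in kept)
    return data, skipped


def read_records(data, delimiter=b"|"):
    """Parse the layout produced by write_records back into byte pairs."""
    records = []
    for line in data.split(b"\n"):
        if not line:
            continue  # trailing empty piece after the final newline
        # Split on the first delimiter only: URLs may contain it.
        code, url = line.split(delimiter, 1)
        records.append((code, url))
    return records


if __name__ == "__main__":
    sample = [
        (b"b", b"http://example.com/2"),
        (b"B", b"http://example.com/1"),
        ("\u00e9".encode("utf-8"), b"http://example.com/3"),
        (b"x", b"http://example.com/a\nb"),  # contains a newline
    ]
    data, skipped = write_records(sample)
    print(read_records(data))
    print("skipped:", skipped)
```

Records that get skipped here are exactly the ones a WARC would still capture, which is the trade-off the two positions above come down to: a simple line format that drops a handful of odd URLs versus a heavier archive format that loses nothing.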
