#urlteam 2011-12-11,Sun

Time Nickname Message
00:07 🔗 chronomex I've used school's-server, friend's-server, own-server. no complaints about any of them, except school's-server was kind of out of date :P
00:10 🔗 soultcer Hehe my university's "free for all students" server sometimes gets blocked by bit.ly and tinyurl.com
00:10 🔗 soultcer I bet somewhere some user is wondering why he can't visit bit.ly links
00:10 🔗 chronomex that's no good
00:10 🔗 chronomex hahah
00:10 🔗 chronomex oh, I thought you meant as a destination when creating links
00:11 🔗 soultcer For any interesting link there is most probably already a short url in existence at some short url service
00:11 🔗 chronomex my school had 3 shell servers; there were two separate web servers that you ostensibly couldn't ssh into
00:11 🔗 soultcer Well they actually allow ssh to all computers, even the desktop computers in the cs labs
00:12 🔗 chronomex what school is this?
00:12 🔗 soultcer University of Innsbruck, Austria
00:12 🔗 chronomex I went to university of washington
00:12 🔗 chronomex uw has ~40,000 students
00:13 🔗 soultcer Who cares about students, important is the size of their IP range
00:13 🔗 soultcer We have a /16 :D
00:13 🔗 chronomex um, state-run uni, main research institution in the major civil division
00:13 🔗 chronomex uw has 2 /16s and a /17
00:13 🔗 soultcer Oh well, time to move to Washington
00:14 🔗 soultcer On the other hand, free healthcare, free education, ..., hm, I'd rather stay at the U of Innsbruck
00:14 🔗 chronomex :P
00:14 🔗 chronomex if you come here I will take you out for a beer or two
00:14 🔗 soultcer Same if you come to Innsbruck ;-)
00:14 🔗 chronomex excellent.
00:18 🔗 soultcer We need a corporate sponsor for the urlteam beer drinking meetup
00:18 🔗 chronomex how about adf.ly
00:18 🔗 chronomex they have money
00:19 🔗 soultcer And the prize for most ridiculous url shortener goes to ... adf.ly
00:19 🔗 chronomex yes!
00:19 🔗 soultcer Haha I bet it will cost them a fortune when I start crawling them
00:20 🔗 soultcer SELECT COUNT(*) FROM hostname LEFT OUTER JOIN dns_mx ON hostname.id = dns_mx.exchange LEFT OUTER JOIN dns_ns ON hostname.id = dns_ns.target LEFT OUTER JOIN dns_cname ON dns_cname.target = hostname.id LEFT OUTER JOIN dns_dname ON dns_dname.target = hostname.id LEFT OUTER JOIN dns_soa ON hostname.id = dns_soa.mname WHERE dns_mx.exchange IS NULL AND dns_ns.target IS NULL AND dns_cname.target IS NULL AND dns_dname.target IS NULL AND dns_soa.mname IS NULL AND public_suffix_level(hostname) IS NULL;
00:20 🔗 soultcer whoops
00:23 🔗 chronomex hm, interesting query.
00:24 🔗 soultcer It just has a lot of joins, but it's kinda boring actually
00:24 🔗 chronomex I counter with a query of my own
00:24 🔗 soultcer Haha, SQL Battle, GO!
00:25 🔗 chronomex select number, name, directory.format_rule(number) from ( select trim(alles.name) as name, alles.phone as number from directory.alles where region_uid in (select uid from directory.regions where name = 'Paramaribo') and alles.name > 'A' and alles.country = 'SR' order by name asc, number asc limit 40) as names order by name asc, number asc limit 40;
00:25 🔗 soultcer Is that from your telephone book website?
00:26 🔗 chronomex yeah
00:26 🔗 Coderjoe SELECT lat_degdec, lon_degdec, CO.filenumber, CO.registrationnumber, overall_HAMSL, overall_HAG, date_constructed, entity_name, CO.uniquesysident FROM (SELECT rowid from ASR_CO WHERE lat_degdec between :S AND :N AND lon_degdec between :W AND :E) LC, ASR_CO CO, ASR_RA RA, ASR_EN EN WHERE CO.rowid = LC.rowid AND CO.uniquesysident = RA.uniquesysident AND CO.uniquesysident = EN.uniquesysident AND archive_flag_code = 'C' AND entity_type = 'O' ORDER BY overall_HAMSL DESC LIMIT 1000
00:26 🔗 chronomex that query runs about ten thousand times faster than a JOIN on LIKE
00:27 🔗 chronomex (you can JOIN on LIKE, but it turns out to be ass)
00:28 🔗 Coderjoe (from http://wegetsignal.org/asr/ )
00:28 🔗 chronomex nice
00:29 🔗 soultcer Did you google that or do you actually know the wegetsignal source code?
00:29 🔗 Coderjoe I wrote that
00:29 🔗 Coderjoe and there is a link to the source
00:29 🔗 soultcer Ah
00:30 🔗 Coderjoe it actually is rather fast, thanks to creative use of indexes and that subselect
00:31 🔗 Coderjoe with the right indexes, that subselect can work entirely within the index to create a list of rowids that fit the bounding box
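The bounding-box subselect Coderjoe describes can be sketched as follows. This is only an illustration of the query shape, not the wegetsignal code: it uses Python's built-in sqlite3 rather than MySQL, and the `towers` table, its columns, the index name, and the sample rows are all hypothetical. In SQLite an index on (lat, lon) also stores the rowid, so the inner SELECT can in principle be answered from the index alone; whether a given planner actually does that depends on the database and its version.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'towers' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE towers ("
        "id INTEGER PRIMARY KEY, lat REAL, lon REAL, height REAL, name TEXT)"
    )
    # Composite index over the two columns used by the bounding-box filter.
    conn.execute("CREATE INDEX idx_towers_lat_lon ON towers (lat, lon)")
    rows = [
        (47.6, -122.3, 300.0, "A"),
        (47.7, -122.4, 150.0, "B"),
        (40.7, -74.0, 500.0, "C"),   # outside the demo bounding box
        (47.5, -122.2, 450.0, "D"),
    ]
    conn.executemany(
        "INSERT INTO towers (lat, lon, height, name) VALUES (?, ?, ?, ?)", rows
    )
    return conn


def tallest_in_box(conn, south, north, west, east, limit=1000):
    """Return (name, lat, lon, height) rows inside the box, tallest first."""
    sql = """
        SELECT t.name, t.lat, t.lon, t.height
        FROM (SELECT id FROM towers
              WHERE lat BETWEEN :s AND :n
                AND lon BETWEEN :w AND :e) AS box
        JOIN towers AS t ON t.id = box.id
        ORDER BY t.height DESC
        LIMIT :limit
    """
    params = {"s": south, "n": north, "w": west, "e": east, "limit": limit}
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in tallest_in_box(db, 47.0, 48.0, -123.0, -122.0):
        print(row)
```

The inner query only collects row ids that fall inside the box; the outer join then fetches the full rows for just those ids and sorts them by height.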
00:31 🔗 soultcer chronomex: I think your query is broken: http://numbertron.com/whitepage/SR/Paramaribo
00:32 🔗 soultcer Coderjoe: What dbms are you using?
00:32 🔗 chronomex yeah, paramaribo breaks consistently. I don't know why yet.
00:32 🔗 Coderjoe that's with mysql
00:33 🔗 Coderjoe i think there is a link to the mysqldump of the create table
00:33 🔗 Coderjoe yep
00:35 🔗 Coderjoe hmm
00:35 🔗 soultcer chronomex: Where do you get your data from? Did you travel to Suriname and steal a phone book?
00:35 🔗 Coderjoe I haven't updated the dataset in quite some time
00:35 🔗 chronomex soultcer: I scraped their online phone directory.
00:35 🔗 Coderjoe since october 2010
00:36 🔗 soultcer Does the FCC provide an SQL dump of their database?
00:36 🔗 chronomex soultcer: I've got all of Estonia and a few small countries nobody cares about, and I'm currently about 1/3 of the way scraping denmark
00:36 🔗 soultcer chronomex: They make it possible to just scrape it?
00:36 🔗 chronomex soultcer: who?
00:37 🔗 soultcer Denmark, Suriname, ...
00:37 🔗 Coderjoe soultcer: kinda.
00:37 🔗 chronomex oh. usually with smaller countries, the phonebooks are stupidly made such that you can put in %%%% for the name and location, and page through everything.
00:38 🔗 Coderjoe http://wireless.fcc.gov/uls/index.htm?job=downloads
00:38 🔗 chronomex for denmark, estonia, netherlands, etc, they often name/number person webpages sequentially so I can iterate through it more slowly
00:38 🔗 soultcer Coderjoe: I wish every country had as much "Open Data" stuff as the USA
00:38 🔗 chronomex soultcer: it's awesome having everything the government produces be automatically public domain
00:39 🔗 Coderjoe not only do they have a weekly full export, they have daily transaction files
00:39 🔗 Coderjoe so if you write things well, you can just ingest the daily updates to keep up to date
00:39 🔗 chronomex openstreetmap has every-minute diffs so you can have a very-close-to-realtime clone :)
00:40 🔗 soultcer Well, openstreetmap isn't government data
00:40 🔗 chronomex soultcer: one of the best things that we have is the census bureau. they made vector street maps with address ranges of every last place that people live.
00:40 🔗 chronomex as with any huge dataset, it has its warts -- but it's great to have anyway.
00:40 🔗 Coderjoe it is a little harder when you modify the data, like I do with the asr data (those degdec fields are added by me. the asr data has it as deg,min,sec,direction in separate fields)
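A conversion from separate deg/min/sec/direction fields to a single decimal-degrees value, like the degdec columns mentioned above, might look roughly like this. It is a minimal sketch: the function name and the assumption that direction is one of N, S, E, or W are illustrative and not taken from the ASR data or from Coderjoe's code.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    South and west are returned as negative values (assumed convention).
    """
    direction = direction.strip().upper()
    if direction not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected direction: {direction!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if direction in ("S", "W") else value


if __name__ == "__main__":
    print(dms_to_decimal(47, 30, 0, "N"))
    print(dms_to_decimal(122, 15, 0, "W"))
```

Storing the result in its own column is what allows a plain BETWEEN range filter on latitude and longitude.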
00:41 🔗 Coderjoe mmm
00:41 🔗 Coderjoe tiger data
00:41 🔗 soultcer You should try PostgreSQL with the PostGIS extension, it has special data fields for geographical coordinates
00:41 🔗 chronomex Coderjoe: if you switch to postgresql, there's native geospatial data types ... and k-nearest-neighbor searches ...
00:41 🔗 chronomex postgresql is fucking bomb.
00:41 🔗 Coderjoe i hate postgresql
00:41 🔗 soultcer Hm, I wonder if I could scrape the austrian phonebook
00:41 🔗 chronomex why?
00:42 🔗 Coderjoe their permissions stuff is ass
00:42 🔗 chronomex soultcer: if it's scrapable, I'll save you the effort
00:42 🔗 chronomex Coderjoe: hm, okay, haven't done anything needing that.
00:43 🔗 Coderjoe a project at work needed to be able to have separate create table vs update accounts, and be able to handle mysql and postgresql.
00:44 🔗 Coderjoe sure you can just prevent a user from having access to a particular database, but if you want to have different levels of permission within a database it becomes trouble
00:44 🔗 chronomex http://zeppelin.xrtc.net/corp.xrtc.net/zeppelin.corp.xrtc.net/index.html => numbertron machine's munin, for the curious.
00:44 🔗 Coderjoe as well as the whole ownership of objects thing
00:44 🔗 soultcer chronomex: http://www.herold.at/telefonbuch/
00:44 🔗 chronomex I didn't know that, Coderjoe.
00:44 🔗 Coderjoe unless you work around it, only the owner (creator) of the object can drop it
00:44 🔗 soultcer Coderjoe: I think postgresql should be able to handle permissions on table level
00:45 🔗 Coderjoe soultcer: it can, but you have to have the table exist before you can do permissions on it. you can't set a blanket permission on future tables.
00:45 🔗 soultcer Probably not, no
00:46 🔗 Coderjoe and the ability to create tables is blocked or allowed by the ability to update a schema
00:46 🔗 soultcer chronomex: Does Feist v. Rural allow you to also take foreign phone books?
00:46 🔗 Coderjoe schemas are their hackish way of being able to do cross-database queries
00:46 🔗 chronomex soultcer: I don't see why not.
00:46 🔗 soultcer Swee
00:46 🔗 soultcer +t
00:46 🔗 Coderjoe (all the tables have to be in the same "schema" object)
00:46 🔗 chronomex Coderjoe: yeah, the schema thing rankles me, at least it did at first. now I put everything in the same database.
00:47 🔗 Coderjoe see, that doesn't work for everything
00:47 🔗 chronomex indeed.
00:48 🔗 chronomex soultcer: this looks like a pain in the ass.
00:48 🔗 soultcer chronomex: They sell a dvd of their data for about $50
00:48 🔗 chronomex http://www.herold.at/telefonbuch/wien_sieveringer-str_4/FgWMr/apartmenthotel-kaiser-franz-joseph/ <-- FgWMr is probably incremental, but it barfs if you change the details.
00:48 🔗 soultcer But I bet it is encrypted somehow
00:48 🔗 chronomex they're always encrypted or scrambled.
00:49 🔗 soultcer You could run a list of the 1000 most common last names against the db
00:49 🔗 Coderjoe yeah? did you see my complaint about the library of congress on wednesday?
00:49 🔗 chronomex right, and then use that as seeds for other names.
00:49 🔗 chronomex Coderjoe: I think so, yes?
00:49 🔗 soultcer What complaint?
00:49 🔗 Coderjoe "scumbag LoC"
00:50 🔗 soultcer Nope, what was it about?
00:51 🔗 Coderjoe "scumbag LoC: maintains MARC 21 on just about everything. charges thousands of dollars for access."
00:51 🔗 chronomex dude, have you seen MARC-8?
00:51 🔗 Coderjoe no
00:51 🔗 Coderjoe and do you know if MARC-XML is any better than MARC-21?
00:52 🔗 chronomex second strangest unicode encoding I've seen since UTF-EBCDIC
00:52 🔗 chronomex better how?
00:52 🔗 chronomex what kind of better do you want :P
00:53 🔗 Coderjoe better for machine parsing (see http://journal.code4lib.org/articles/3832 for the problems with MARC-21)
00:55 🔗 chronomex hm.
00:55 🔗 chronomex welp, I have to go to work now I suppose.
00:55 🔗 Coderjoe MARC-21 is more of a markup language than a structured data format
00:55 🔗 soultcer Work on a Saturday Evening?
00:55 🔗 Coderjoe I probably should as well
00:55 🔗 chronomex soultcer: I work fri-sat-sun-mon.
00:55 🔗 soultcer Where do you work?
00:56 🔗 chronomex retail hackerspace, http://metrixcreatespace.com/
00:56 🔗 Coderjoe I seem to have broken daily emails from the freenas server, and I have a few other things I can do. I need to get some more hours in as well.
00:57 🔗 soultcer Retail Hackerspaces sound like a nice concept
00:57 🔗 chronomex it is
00:57 🔗 chronomex I don't know of any others though
01:02 🔗 soultcer Well, time to go to bed
01:02 🔗 soultcer Gn8 everyone
01:14 🔗 chronomex g'night
01:18 🔗 bsmith093 is anyone working on knol? it's tiny, but it should still be saved, and I'm pretty sure my normal wget-warc -mcpke with the ff8 useragent won't work on a google site
10:52 🔗 ersi bsmith093: AFAIK yes, but it's not related to urlteam afaik
15:02 🔗 underscor bsmith094: #klol
