[00:07] I've used school's-server, friend's-server, own-server. no complaints about any of them, except school's-server was kind of out of date :P [00:10] Hehe my university's "free for all students" server sometimes gets blocked by bit.ly and tinyurl.com [00:10] I bet somewhere some user is wondering why he can't visit bit.ly links [00:10] that's no good [00:10] hahah [00:10] oh, I thought you meant as a destination when creating links [00:11] For any interesting link there is most probably already a short url in existance at some short url service [00:11] my school had 3 shell servers; there were two separate web servers that you ostensibly couldn't ssh into [00:11] Well they actually allow ssh to all computers, even the desktop computers in the cs labs [00:12] what school is this? [00:12] University of Innsbruck, Austria [00:12] I went to university of washington [00:12] uw has ~40,000 students [00:13] Who cares about students, important is the size of their IP range [00:13] We have a /16 :D [00:13] um, state-run uni, main research institution in the major civil division [00:13] uw has 2 /16s and a /17 [00:13] Oh well, time to move to Washington [00:14] On the other hand, Free healthcare, free education, ..., hm, I rather stay at the U of Innsbruck [00:14] :P [00:14] if you come here I will take you out for a beer or two [00:14] Same if you come to Innsbruck ;-) [00:14] excellent. [00:18] We need a corporate sponsor for the urlteam beer drinking meetup [00:18] how about adf.ly [00:18] they have money [00:19] And the prize for most ridiculous url shortener goes to ... adf.ly [00:19] yes! [00:19] Haha I bet it will cost them a fortune when I start crawling them [00:20] SELECT COUNT(*) FROM hostname LEFT OUTER JOIN dns_mx ON hostname.id = dns_mx.exchange LEFT OUTER JOIN dns_ns ON hostname.id = dns_ns.target LEFT OUTER JOIN dns_cname ON dns_cname.target = hostname.id LEFT OUTER JOIN dns_dname ON dns_dname.target = hostname.id LEFT OUTER JOIN dns_soa ON hostname.id = dns_soa.mname WHERE dns_mx.exchange IS NULL AND dns_ns.target IS NULL AND dns_cname.target IS NULL AND dns_dname.target IS NULL AND dns_s [00:20] oa.mname IS NULL AND public_suffix_level(hostname) IS NULL; [00:20] whoops [00:23] hm, interesting query. [00:24] It just has a lot of joins, but it's kinda boring actually [00:24] I counter with a query of my own [00:24] Haha, SQL Battle, GO! [00:25] select number, name, directory.format_rule(number) from ( select trim(alles.name) as name, alles.phone as number from directory.alles where region_uid in (select uid from directory.regions where name = 'Paramaribo') and alles.name > 'A' and alles.country = 'SR' order by name asc, number asc limit 40) as names order by name asc, number asc limit 40; [00:25] Is that from your telephone book website? [00:26] yeah [00:26] SELECT lat_degdec, lon_degdec, CO.filenumber, CO.registrationnumber, overall_HAMSL, overall_HAG, date_constructed, entity_name, CO.uniquesysident FROM (SELECT rowid from ASR_CO WHERE lat_degdec between :S AND :N AND lon_degdec between :W AND :E) LC, ASR_CO CO, ASR_RA RA, ASR_EN EN WHERE CO.rowid = LC.rowid AND CO.uniquesysident = RA.uniquesysident AND CO.uniquesysident = EN.uniquesysident AND archive_flag_code = 'C' AND en [00:26] tity_type = 'O' ORDER BY overall_HAMSL DESC LIMIT 1000 [00:26] that query runs about ten thousand times faster than a JOIN on LIKE [00:27] (you can JOIN on LIKE, but it turns out to be ass) [00:28] (from http://wegetsignal.org/asr/ ) [00:28] nice [00:29] Did you google that or do you actually know the wegetsignal source code? [00:29] I wrote that [00:29] and there is a link to the source [00:29] Ah [00:30] it actually is rather fast, thanks to creative use of indexes and that subselect [00:31] with the right indexes, that subselect can work entirely within the index to create a list of rowids that fit the bounding box [00:31] chronomex: I think your query is broken: http://numbertron.com/whitepage/SR/Paramaribo [00:32] Coderjoe: What dbms are you using? [00:32] yeah, paramaribo breaks consistently. I don't know why yet. [00:32] that's with mysql [00:33] i think there is a link to the mysqldump of the create table [00:33] yep [00:35] hmm [00:35] chronomex: Where do you get your data from? Did you travel to Surinami and steal a phone book? [00:35] I haven't updated the dataset in quite some time [00:35] soultcer: I scraped their online phone directory. [00:35] since october 2010 [00:36] Does the FCC provide an SQL dump of their database? [00:36] soultcer: I've got all of Estonia and a few small countries nobody cares about, and I'm currently about 1/3 of the way scraping denmark [00:36] chronomex: They make it possible to just scrape it? [00:36] soultcer: who? [00:37] Denmark, Surinami, ... [00:37] soultcer: kinda. [00:37] oh. usually with smaller countries, the phonebooks are stupidly made such that you can put in %%%% for the name and location, and page through everything. [00:38] http://wireless.fcc.gov/uls/index.htm?job=downloads [00:38] for denmark, estonia, netherlands, etc, they often name/number person webpages sequentially so I can iterate through it more slowly [00:38] Coderjoe: I wish every country had as much "Open Data" stuff as the USA [00:38] soultcer: it's awesome having everything the government produces be automatically public domain [00:39] not only do they have a weekly full export, they have daily transaction files [00:39] so if you write things well, you can just ingest the daily updates to keep up to date [00:39] openstreetmap has every-minute diffs so you can have a very-close-to-realtime clone :) [00:40] Well, openstreetmap isn't government data [00:40] soultcer: one of the best things that we have is the census bureau. they made vector street maps with address ranges of every last place that people live. [00:40] as with any huge dataset, it has its warts -- but it's great to have anyway. [00:40] it is a litle harder when you modify the data, like I do with the asr data (those degdec fields are added by me. the asr data has it as deg,min,sec,direction in separate fields) [00:41] mmm [00:41] tiger data [00:41] You should try PostgreSQL with the PostGIS extension, it has special data fields for geographical coordinates [00:41] Coderjoe: if you switch to postgresql, there's native geospatial data types ... and k-nearest-neighbor searches ... [00:41] postgresql is fucking bomb. [00:41] i hate postgresql [00:41] Hm, I wonder if I could scrape the austrian phonebook [00:41] why? [00:42] their permissions stuff is ass [00:42] soultcer: if it's scrapable, I'll save you the effort [00:42] Coderjoe: hm, okay, haven't done anything needing that. [00:43] a project at work needed to be able to have separate create table vs update accounts, and be able to handle mysql and postgresql. [00:44] sure you can just prevent a user from having access to a particular database, but if you want to have different levels of permission within a database it becomes trouble [00:44] http://zeppelin.xrtc.net/corp.xrtc.net/zeppelin.corp.xrtc.net/index.html => numbertron machine's munin, for the curious. [00:44] as well as the wole ownership of objects thing [00:44] chronomex: http://www.herold.at/telefonbuch/ [00:44] I didn't know that, Coderjoe. [00:44] unless you work around it, only the owner (creator) of the object can drop it [00:44] Coderjoe: I think postgresql should be able to handle permissions on table level [00:45] soultcer: it can, but you have to have the table exist before you can do permissions on it. you can't set a blanket permission on future tables. [00:45] Probably not, no [00:46] and the ability to create tables is blocked or allowed by the ability to update a schema [00:46] chronomex: Does Feist v. Rural allow you to also take foreign phone books? [00:46] schemas are their hackish way of being able to do cross-database queries [00:46] soultcer: I don't see why not. [00:46] Swee [00:46] +t [00:46] (all the tables have to be in the same "schema" object) [00:46] Coderjoe: yeah, the schema thing rankles me, at least it did at first. now I put everything in the same database. [00:47] see, that doesn't work for everything [00:47] indeed. [00:48] soultcer: this looks like a pain in the ass. [00:48] chronomex: They sell a dvd of their data for about $50 [00:48] http://www.herold.at/telefonbuch/wien_sieveringer-str_4/FgWMr/apartmenthotel-kaiser-franz-joseph/ <-- FgWMr is probably incremental, but it barfs if you change the details. [00:48] But I bet it is encrypted somehow [00:48] they're always encrypted or scrambled. [00:49] You could run a list of the 1000 most common last names against the db [00:49] yeah? did you see my complaint about the library of congress on wednesday? [00:49] right, and then use that as seeds for other names. [00:49] Coderjoe: I think so, yes? [00:49] What complaint? [00:49] "scumbag LoC" [00:50] Nope, what was it about? [00:51] "scumbag LoC: maintains MARC 21 on just about everything. charges thousands of dollars for access." [00:51] dude, have you seen MARC-8? [00:51] no [00:51] and do you know of MARC-XML is any better than MARC-21? [00:52] second strangest unicode encoding I've seen since UTF-EBCDIC [00:52] better how? [00:52] what kind of better do you want :P [00:53] better for machine parsing (see http://journal.code4lib.org/articles/3832 for the problems with MARC-21) [00:55] hm. [00:55] welp, I have to go to work now I suppose. [00:55] MARC-21 is more of a markup language than a structured data format [00:55] Work on a Saturday Evening? [00:55] I probably shoudl as well [00:55] soultcer: I work fri-sat-sun-mon. [00:55] Where do you work? [00:56] retail hackerspace, http://metrixcreatespace.com/ [00:56] I seem to have broken daily emails from the freenas server, and I have a few other things I can do. I need to get some more hours in as well. [00:57] Retail Hackerspaces sound like a nice concept [00:57] it is [00:57] I don't know of any others though [01:02] Well, time to go to bed [01:02] Gn8 everyone [01:14] g'night [01:18] is anyone working on knol, its tiny, but it should still be saved, and im pretty sure my niormal wget-warc -mcpke with the ff8 useragent, wont work on a google site [10:52] bsmith093: AFAIK yes, but it's not related to urlteam afaik [15:02] bsmith094: #klol