Time |
Nickname |
Message |
00:07
🔗
|
chronomex |
I've used school's-server, friend's-server, own-server. no complaints about any of them, except school's-server was kind of out of date :P |
00:10
🔗
|
soultcer |
Hehe my university's "free for all students" server sometimes gets blocked by bit.ly and tinyurl.com |
00:10
🔗
|
soultcer |
I bet somewhere some user is wondering why he can't visit bit.ly links |
00:10
🔗
|
chronomex |
that's no good |
00:10
🔗
|
chronomex |
hahah |
00:10
🔗
|
chronomex |
oh, I thought you meant as a destination when creating links |
00:11
🔗
|
soultcer |
For any interesting link there is most probably already a short url in existence at some short url service |
00:11
🔗
|
chronomex |
my school had 3 shell servers; there were two separate web servers that you ostensibly couldn't ssh into |
00:11
🔗
|
soultcer |
Well they actually allow ssh to all computers, even the desktop computers in the cs labs |
00:12
🔗
|
chronomex |
what school is this? |
00:12
🔗
|
soultcer |
University of Innsbruck, Austria |
00:12
🔗
|
chronomex |
I went to university of washington |
00:12
🔗
|
chronomex |
uw has ~40,000 students |
00:13
🔗
|
soultcer |
Who cares about students, what's important is the size of their IP range |
00:13
🔗
|
soultcer |
We have a /16 :D |
00:13
🔗
|
chronomex |
um, state-run uni, main research institution in the major civil division |
00:13
🔗
|
chronomex |
uw has 2 /16s and a /17 |
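For readers who want the arithmetic behind the prefix sizes being compared, here is a minimal sketch using Python's standard `ipaddress` module. The log gives only the prefix lengths, so the `10.x` networks below are placeholders chosen for illustration, not the universities' real allocations.

```python
import ipaddress

# Placeholder networks: only the prefix lengths (/16, /17) come from the log.
innsbruck = [ipaddress.ip_network("10.0.0.0/16")]
uw = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
    ipaddress.ip_network("10.3.0.0/17"),
]


def total_addresses(networks):
    """Sum the number of addresses covered by a list of networks."""
    return sum(net.num_addresses for net in networks)


innsbruck_total = total_addresses(innsbruck)  # one /16 = 2**16 addresses
uw_total = total_addresses(uw)                # two /16s plus a /17
print(innsbruck_total, uw_total)
```

A /16 leaves 16 host bits (65,536 addresses) and a /17 leaves 15 (32,768), so two /16s and a /17 come to 2.5 times a single /16.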
00:13
🔗
|
soultcer |
Oh well, time to move to Washington |
00:14
🔗
|
soultcer |
On the other hand, Free healthcare, free education, ..., hm, I'd rather stay at the U of Innsbruck |
00:14
🔗
|
chronomex |
:P |
00:14
🔗
|
chronomex |
if you come here I will take you out for a beer or two |
00:14
🔗
|
soultcer |
Same if you come to Innsbruck ;-) |
00:14
🔗
|
chronomex |
excellent. |
00:18
🔗
|
soultcer |
We need a corporate sponsor for the urlteam beer drinking meetup |
00:18
🔗
|
chronomex |
how about adf.ly |
00:18
🔗
|
chronomex |
they have money |
00:19
🔗
|
soultcer |
And the prize for most ridiculous url shortener goes to ... adf.ly |
00:19
🔗
|
chronomex |
yes! |
00:19
🔗
|
soultcer |
Haha I bet it will cost them a fortune when I start crawling them |
00:20
🔗
|
soultcer |
SELECT COUNT(*) FROM hostname LEFT OUTER JOIN dns_mx ON hostname.id = dns_mx.exchange LEFT OUTER JOIN dns_ns ON hostname.id = dns_ns.target LEFT OUTER JOIN dns_cname ON dns_cname.target = hostname.id LEFT OUTER JOIN dns_dname ON dns_dname.target = hostname.id LEFT OUTER JOIN dns_soa ON hostname.id = dns_soa.mname WHERE dns_mx.exchange IS NULL AND dns_ns.target IS NULL AND dns_cname.target IS NULL AND dns_dname.target IS NULL AND dns_s |
00:20
🔗
|
soultcer |
oa.mname IS NULL AND public_suffix_level(hostname) IS NULL; |
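The query pasted above is an anti-join: each `LEFT OUTER JOIN` is paired with an `IS NULL` test, so only hostnames that no record table points at are counted. Below is a reduced sketch of that pattern, assuming an in-memory SQLite database in place of whatever DBMS soultcer uses, only two of the five record tables, and made-up rows. The `public_suffix_level()` call is left out because its definition is not shown in the log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, assumed schema: only the columns the joins need.
    CREATE TABLE hostname (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dns_mx (exchange INTEGER);
    CREATE TABLE dns_ns (target INTEGER);

    INSERT INTO hostname VALUES (1, 'mail.example');
    INSERT INTO hostname VALUES (2, 'ns.example');
    INSERT INTO hostname VALUES (3, 'orphan.example');
    INSERT INTO dns_mx VALUES (1);  -- mail.example is used as an MX exchange
    INSERT INTO dns_ns VALUES (2);  -- ns.example is used as an NS target
""")

# A hostname survives the WHERE clause only if every outer join found no match,
# i.e. every joined column came back NULL.
unreferenced = conn.execute("""
    SELECT hostname.name
    FROM hostname
    LEFT OUTER JOIN dns_mx ON hostname.id = dns_mx.exchange
    LEFT OUTER JOIN dns_ns ON hostname.id = dns_ns.target
    WHERE dns_mx.exchange IS NULL
      AND dns_ns.target IS NULL
""").fetchall()

print(unreferenced)
```

The same result could be written with `NOT EXISTS` subqueries; which form is faster depends on the planner.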
00:20
🔗
|
soultcer |
whoops |
00:23
🔗
|
chronomex |
hm, interesting query. |
00:24
🔗
|
soultcer |
It just has a lot of joins, but it's kinda boring actually |
00:24
🔗
|
chronomex |
I counter with a query of my own |
00:24
🔗
|
soultcer |
Haha, SQL Battle, GO! |
00:25
🔗
|
chronomex |
select number, name, directory.format_rule(number) from ( select trim(alles.name) as name, alles.phone as number from directory.alles where region_uid in (select uid from directory.regions where name = 'Paramaribo') and alles.name > 'A' and alles.country = 'SR' order by name asc, number asc limit 40) as names order by name asc, number asc limit 40; |
00:25
🔗
|
soultcer |
Is that from your telephone book website? |
00:26
🔗
|
chronomex |
yeah |
00:26
🔗
|
Coderjoe |
SELECT lat_degdec, lon_degdec, CO.filenumber, CO.registrationnumber, overall_HAMSL, overall_HAG, date_constructed, entity_name, CO.uniquesysident FROM (SELECT rowid from ASR_CO WHERE lat_degdec between :S AND :N AND lon_degdec between :W AND :E) LC, ASR_CO CO, ASR_RA RA, ASR_EN EN WHERE CO.rowid = LC.rowid AND CO.uniquesysident = RA.uniquesysident AND CO.uniquesysident = EN.uniquesysident AND archive_flag_code = 'C' AND en |
00:26
🔗
|
Coderjoe |
tity_type = 'O' ORDER BY overall_HAMSL DESC LIMIT 1000 |
00:26
🔗
|
chronomex |
that query runs about ten thousand times faster than a JOIN on LIKE |
00:27
🔗
|
chronomex |
(you can JOIN on LIKE, but it turns out to be ass) |
00:28
🔗
|
Coderjoe |
(from http://wegetsignal.org/asr/ ) |
00:28
🔗
|
chronomex |
nice |
00:29
🔗
|
soultcer |
Did you google that or do you actually know the wegetsignal source code? |
00:29
🔗
|
Coderjoe |
I wrote that |
00:29
🔗
|
Coderjoe |
and there is a link to the source |
00:29
🔗
|
soultcer |
Ah |
00:30
🔗
|
Coderjoe |
it actually is rather fast, thanks to creative use of indexes and that subselect |
00:31
🔗
|
Coderjoe |
with the right indexes, that subselect can work entirely within the index to create a list of rowids that fit the bounding box |
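A rough sketch of the bounding-box subselect Coderjoe describes, assuming SQLite instead of the MySQL setup from the log, a single cut-down `asr_co` table, and invented tower rows. The index name `idx_latlon` and the alias `rid` are illustrative. In SQLite an index entry also carries the rowid, so the inner query should be answerable from the index alone before the outer query touches the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, reduced schema; the real ASR tables have many more columns.
    CREATE TABLE asr_co (
        lat_degdec REAL,
        lon_degdec REAL,
        overall_hamsl REAL,
        filenumber TEXT
    );
    CREATE INDEX idx_latlon ON asr_co (lat_degdec, lon_degdec);
""")

# Invented sample rows, not real ASR data.
towers = [
    (47.6, -122.3, 180.0, "A1"),
    (47.7, -122.4, 250.0, "A2"),
    (40.7, -74.0, 540.0, "A3"),
]
conn.executemany("INSERT INTO asr_co VALUES (?, ?, ?, ?)", towers)

# Bounding box, using the same :S/:N/:W/:E parameter names as the pasted query.
box = {"S": 47.0, "N": 48.0, "W": -123.0, "E": -122.0}

rows = conn.execute("""
    SELECT co.filenumber, co.overall_hamsl
    FROM (SELECT rowid AS rid
          FROM asr_co
          WHERE lat_degdec BETWEEN :S AND :N
            AND lon_degdec BETWEEN :W AND :E) AS lc
    JOIN asr_co AS co ON co.rowid = lc.rid
    ORDER BY co.overall_hamsl DESC
    LIMIT 1000
""", box).fetchall()

print(rows)
```

Whether the planner actually keeps the inner query inside the index depends on the engine and version, so checking the query plan is worthwhile before relying on it.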
00:31
🔗
|
soultcer |
chronomex: I think your query is broken: http://numbertron.com/whitepage/SR/Paramaribo |
00:32
🔗
|
soultcer |
Coderjoe: What dbms are you using? |
00:32
🔗
|
chronomex |
yeah, paramaribo breaks consistently. I don't know why yet. |
00:32
🔗
|
Coderjoe |
that's with mysql |
00:33
🔗
|
Coderjoe |
i think there is a link to the mysqldump of the create table |
00:33
🔗
|
Coderjoe |
yep |
00:35
🔗
|
Coderjoe |
hmm |
00:35
🔗
|
soultcer |
chronomex: Where do you get your data from? Did you travel to Suriname and steal a phone book? |
00:35
🔗
|
Coderjoe |
I haven't updated the dataset in quite some time |
00:35
🔗
|
chronomex |
soultcer: I scraped their online phone directory. |
00:35
🔗
|
Coderjoe |
since october 2010 |
00:36
🔗
|
soultcer |
Does the FCC provide an SQL dump of their database? |
00:36
🔗
|
chronomex |
soultcer: I've got all of Estonia and a few small countries nobody cares about, and I'm currently about 1/3 of the way scraping denmark |
00:36
🔗
|
soultcer |
chronomex: They make it possible to just scrape it? |
00:36
🔗
|
chronomex |
soultcer: who? |
00:37
🔗
|
soultcer |
Denmark, Suriname, ... |
00:37
🔗
|
Coderjoe |
soultcer: kinda. |
00:37
🔗
|
chronomex |
oh. usually with smaller countries, the phonebooks are stupidly made such that you can put in %%%% for the name and location, and page through everything. |
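To illustrate why a `%%%%` search term returns everything, here is a toy model of a directory backend that passes the user's input straight into a SQL `LIKE` pattern. The table, the sample rows, and the `search_page` and `dump_all` helpers are all hypothetical; nothing in the log says how any real phonebook site is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO directory VALUES (?, ?)",
    [("Abel", "100"), ("Baker", "101"), ("Chan", "102"),
     ("Diaz", "103"), ("Evans", "104")],
)


def search_page(name_pattern, page, page_size=2):
    """Hypothetical search endpoint: the input is used unescaped as a LIKE pattern."""
    return conn.execute(
        "SELECT name, phone FROM directory "
        "WHERE name LIKE ? ORDER BY name LIMIT ? OFFSET ?",
        (name_pattern, page_size, page * page_size),
    ).fetchall()


def dump_all(page_size=2):
    """Page through every result of the all-wildcard pattern until a page is empty."""
    results, page = [], 0
    while True:
        rows = search_page("%%%%", page, page_size)
        if not rows:
            break
        results.extend(rows)
        page += 1
    return results


print(dump_all())
```

Each `%` matches any run of characters, including none, so four of them in a row behave like a single `%` and match every name. A backend that escaped wildcard characters in user input would not behave this way.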
00:38
🔗
|
Coderjoe |
http://wireless.fcc.gov/uls/index.htm?job=downloads |
00:38
🔗
|
chronomex |
for denmark, estonia, netherlands, etc, they often name/number person webpages sequentially so I can iterate through it more slowly |
00:38
🔗
|
soultcer |
Coderjoe: I wish every country had as much "Open Data" stuff as the USA |
00:38
🔗
|
chronomex |
soultcer: it's awesome having everything the government produces be automatically public domain |
00:39
🔗
|
Coderjoe |
not only do they have a weekly full export, they have daily transaction files |
00:39
🔗
|
Coderjoe |
so if you write things well, you can just ingest the daily updates to keep up to date |
00:39
🔗
|
chronomex |
openstreetmap has every-minute diffs so you can have a very-close-to-realtime clone :) |
00:40
🔗
|
soultcer |
Well, openstreetmap isn't government data |
00:40
🔗
|
chronomex |
soultcer: one of the best things that we have is the census bureau. they made vector street maps with address ranges of every last place that people live. |
00:40
🔗
|
chronomex |
as with any huge dataset, it has its warts -- but it's great to have anyway. |
00:40
🔗
|
Coderjoe |
it is a little harder when you modify the data, like I do with the asr data (those degdec fields are added by me. the asr data has it as deg,min,sec,direction in separate fields) |
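The conversion from separate deg, min, sec, direction fields to a decimal-degree value can be sketched as follows. This is an assumed implementation, not Coderjoe's code: the function name `dms_to_decimal` is made up here, and it assumes the direction field is a single letter N, S, E, or W.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a compass letter to decimal degrees.

    South and west are returned as negative values, the usual convention
    for signed latitude and longitude.
    """
    letter = direction.strip().upper()
    if letter not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected direction: {direction!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if letter in ("S", "W") else value


lat = dms_to_decimal(47, 30, 0, "N")
lon = dms_to_decimal(122, 15, 0, "W")
print(lat, lon)
```

Storing the result in its own column is what makes a plain `BETWEEN` range test on latitude and longitude possible.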
00:41
🔗
|
Coderjoe |
mmm |
00:41
🔗
|
Coderjoe |
tiger data |
00:41
🔗
|
soultcer |
You should try PostgreSQL with the PostGIS extension, it has special data fields for geographical coordinates |
00:41
🔗
|
chronomex |
Coderjoe: if you switch to postgresql, there's native geospatial data types ... and k-nearest-neighbor searches ... |
00:41
🔗
|
chronomex |
postgresql is fucking bomb. |
00:41
🔗
|
Coderjoe |
i hate postgresql |
00:41
🔗
|
soultcer |
Hm, I wonder if I could scrape the austrian phonebook |
00:41
🔗
|
chronomex |
why? |
00:42
🔗
|
Coderjoe |
their permissions stuff is ass |
00:42
🔗
|
chronomex |
soultcer: if it's scrapable, I'll save you the effort |
00:42
🔗
|
chronomex |
Coderjoe: hm, okay, haven't done anything needing that. |
00:43
🔗
|
Coderjoe |
a project at work needed to be able to have separate create table vs update accounts, and be able to handle mysql and postgresql. |
00:44
🔗
|
Coderjoe |
sure you can just prevent a user from having access to a particular database, but if you want to have different levels of permission within a database it becomes trouble |
00:44
🔗
|
chronomex |
http://zeppelin.xrtc.net/corp.xrtc.net/zeppelin.corp.xrtc.net/index.html => numbertron machine's munin, for the curious. |
00:44
🔗
|
Coderjoe |
as well as the whole ownership of objects thing |
00:44
🔗
|
soultcer |
chronomex: http://www.herold.at/telefonbuch/ |
00:44
🔗
|
chronomex |
I didn't know that, Coderjoe. |
00:44
🔗
|
Coderjoe |
unless you work around it, only the owner (creator) of the object can drop it |
00:44
🔗
|
soultcer |
Coderjoe: I think postgresql should be able to handle permissions on table level |
00:45
🔗
|
Coderjoe |
soultcer: it can, but you have to have the table exist before you can do permissions on it. you can't set a blanket permission on future tables. |
00:45
🔗
|
soultcer |
Probably not, no |
00:46
🔗
|
Coderjoe |
and the ability to create tables is blocked or allowed by the ability to update a schema |
00:46
🔗
|
soultcer |
chronomex: Does Feist v. Rural allow you to also take foreign phone books? |
00:46
🔗
|
Coderjoe |
schemas are their hackish way of being able to do cross-database queries |
00:46
🔗
|
chronomex |
soultcer: I don't see why not. |
00:46
🔗
|
soultcer |
Swee |
00:46
🔗
|
soultcer |
+t |
00:46
🔗
|
Coderjoe |
(all the tables have to be in the same "schema" object) |
00:46
🔗
|
chronomex |
Coderjoe: yeah, the schema thing rankles me, at least it did at first. now I put everything in the same database. |
00:47
🔗
|
Coderjoe |
see, that doesn't work for everything |
00:47
🔗
|
chronomex |
indeed. |
00:48
🔗
|
chronomex |
soultcer: this looks like a pain in the ass. |
00:48
🔗
|
soultcer |
chronomex: They sell a dvd of their data for about $50 |
00:48
🔗
|
chronomex |
http://www.herold.at/telefonbuch/wien_sieveringer-str_4/FgWMr/apartmenthotel-kaiser-franz-joseph/ <-- FgWMr is probably incremental, but it barfs if you change the details. |
00:48
🔗
|
soultcer |
But I bet it is encrypted somehow |
00:48
🔗
|
chronomex |
they're always encrypted or scrambled. |
00:49
🔗
|
soultcer |
You could run a list of the 1000 most common last names against the db |
00:49
🔗
|
Coderjoe |
yeah? did you see my complaint about the library of congress on wednesday? |
00:49
🔗
|
chronomex |
right, and then use that as seeds for other names. |
00:49
🔗
|
chronomex |
Coderjoe: I think so, yes? |
00:49
🔗
|
soultcer |
What complaint? |
00:49
🔗
|
Coderjoe |
"scumbag LoC" |
00:50
🔗
|
soultcer |
Nope, what was it about? |
00:51
🔗
|
Coderjoe |
"scumbag LoC: maintains MARC 21 on just about everything. charges thousands of dollars for access." |
00:51
🔗
|
chronomex |
dude, have you seen MARC-8? |
00:51
🔗
|
Coderjoe |
no |
00:51
🔗
|
Coderjoe |
and do you know if MARC-XML is any better than MARC-21? |
00:52
🔗
|
chronomex |
second strangest unicode encoding I've seen since UTF-EBCDIC |
00:52
🔗
|
chronomex |
better how? |
00:52
🔗
|
chronomex |
what kind of better do you want :P |
00:53
🔗
|
Coderjoe |
better for machine parsing (see http://journal.code4lib.org/articles/3832 for the problems with MARC-21) |
00:55
🔗
|
chronomex |
hm. |
00:55
🔗
|
chronomex |
welp, I have to go to work now I suppose. |
00:55
🔗
|
Coderjoe |
MARC-21 is more of a markup language than a structured data format |
00:55
🔗
|
soultcer |
Work on a Saturday Evening? |
00:55
🔗
|
Coderjoe |
I probably should as well |
00:55
🔗
|
chronomex |
soultcer: I work fri-sat-sun-mon. |
00:55
🔗
|
soultcer |
Where do you work? |
00:56
🔗
|
chronomex |
retail hackerspace, http://metrixcreatespace.com/ |
00:56
🔗
|
Coderjoe |
I seem to have broken daily emails from the freenas server, and I have a few other things I can do. I need to get some more hours in as well. |
00:57
🔗
|
soultcer |
Retail Hackerspaces sound like a nice concept |
00:57
🔗
|
chronomex |
it is |
00:57
🔗
|
chronomex |
I don't know of any others though |
01:02
🔗
|
soultcer |
Well, time to go to bed |
01:02
🔗
|
soultcer |
Gn8 everyone |
01:14
🔗
|
chronomex |
g'night |
01:18
🔗
|
bsmith093 |
is anyone working on knol? it's tiny, but it should still be saved, and I'm pretty sure my normal wget-warc -mcpke with the ff8 useragent won't work on a google site |
10:52
🔗
|
ersi |
bsmith093: AFAIK yes, but it's not related to urlteam afaik |
15:02
🔗
|
underscor |
bsmith094: #klol |