Sunday, 6 September 2009

KML / KMZ data import into MySQL for use with GIS functions

Free open source data is great but does not always come in the right format.
For instance, geographical data can be created in Google Earth, published on the web as an overlay or set of POIs. Usually the publication format is KML or KMZ.
  • If it is KMZ (i.e. binary), it makes the data difficult to use with most (scripting) languages but... Google Earth comes to the rescue, you can make an export in KML (i.e. XML like format) and then we can have some fun.
  • KML is basically an XML like syntax that obeys a distinct XSD but does not natively play nice with MySQL GIS / geospatial data types, like points or polygons. However, with some coding love, this can be quite easily sorted thus we can get our data imported from KML to MySQL.
In this post, I will show how to import polygons defined in a KML file into MySQL, using a nice elegant language called Ruby.

For demonstration purposes, the KML polygons will be inserted into a simple table storing a string and a matching polygon definition. Here is the table definition:
CREATE TABLE `tbl_demo_polygons` (
                       `polygon_id` INT(10) NOT NULL AUTO_INCREMENT,
                       `name` TEXT NOT NULL,
                       `polygon` POLYGON NOT NULL,
                       PRIMARY KEY  (`polygon_id`),
                       SPATIAL KEY `poly_spx` (`polygon`)
                     ) ENGINE=MYISAM DEFAULT CHARSET=latin1

The strategy I adopted for this demo is to use a ruby script to generate some SQL INSERT statements which you may then use whichever way suits you.

Without further delay, here is the script:
require "rexml/document"

# we open the file as an XML document
doc ="datasource.kml")

# we get all placemark elements as an array.
all_pms = doc.elements.to_a("//Placemark")

# we create our SQL file will all the insert statements
f = "dump.sql", "w+"

# we iterate over each Placemark element
all_pms.each do |pm|
 # in my demo, the name of the polygon come from the child element name
 pc_name = pm.elements['name'].text

 # the polygon points come in 1 string by the format is not really MySQL
 raw_polygon = pm.elements['MultiGeometry/Polygon/outerBoundaryIs/LinearRing/coordinates'].text

 # we need to remove the 3rd coord of each point as we don't need a z-axis in our demo
 # we do a bit of formatting to please Mr MySQL
 pc_polygon =  raw_polygon.strip.gsub(',0','#').gsub(',',' ').gsub('#',',').strip[0..-2].split(',').collect!{|p| p.split.reverse!.join(' ')}.join(', ')
 # we concat all this nicely in 1 executable INSERT statement
 f << "INSERT INTO tbl_demo_polygons (name, polygon) VALUES ('#{pc_name}',PolygonFromText('Polygon((#{pc_polygon}))'));\n"

# we close our stream

# happy days!

This should normally yield a SQL file that move all these polygons into proper MySQL POLYGON geospatial type.

I hope you find this useful and somewhat elegant, thanks to Ruby REXML lib :-)