GeoServer : H2 Spatial Index
This page last changed on May 12, 2007 by cholmes.
H2 is currently the leading candidate for improving an embedded Java Database to be a full, simple features for sql compliant and spatially indexed database. This would be a huge win, as we would be able to ship GeoServer with it, so no one would be forced to set up any kind of backend database, they could just get started collaboratively editing. It would also make shapefiles more secure, since we could transparently transfer any uploaded shapefiles in to H2, for real, secure transactions. Most of the work on spatializing H2 is done, the remaining step is an efficient spatial index. This document will show the work that's been done on H2, and point towards how we might go about implementing a spatial index.
H2 has emerged as having the most potential for a java embedded database for a few reasons. First, it's fast. HSQL is quite fast, but H2 improves on it. Second, ACID level compliance for transactions, which HSQL lacks, but Derby has (though it's a bit slower). And third, pluggable indexes. This is a major flaw in Derby from our perspective, since spatial data is huge and we need a special spatial index to be able to efficiently use it. If we can't plug in new indexes then we have to get waist deep in the code to get our change in, which then wouldn't be easy to add on and likely wouldn't get in to their main source tree.
So the first part in getting H2 to be a spatial database is done. SpatialDBBox has done this for us. It uses JTS to handle all the hard spatial operations. The spatial db in a box pages don't have h2 bindings, but several have been done.
The easiest to get is:
You can download it and try it out. The one problem is that the license on this is a bit too restrictive for our uses. But it just uses our Spatial DB in a box stuff, so it's easy enough to replicate. It can be used for testing and experimentation until we get another nicer download that we release under an LGPL license.
There's a bit more info on a page on google group
Justin has made a GeoTools datastore that uses H2 with Spatial operations. There for he must have got it working. We need to ask him for more details. The SVN for it is at:
It looks like some of the H2 sql commands are at: http://svn.geotools.org/geotools/trunk/gt/modules/unsupported/h2/src/test/resources/org/geotools/data/h2/h2.sql
With a proper H2 Spatial you should be able to insert geometries and do operations on them:
GeomFromText is a standard Simple Features for SQL operation. That spec (version 1.1) can be found on the OGC site. See especially the section on 'Well Known Text' 3.2.5 (esp examples) the string to put in GeomFromText to create a spatial value).
A select on myFirstGeoTable should then give you a geometry (though it may just print a bunch of numbers, you may be able to get it to print numbers if you call 'toText' (though that might not work, this stuff obviously needs more documentation).
You should also be able to use the spatial functions, like
If you've got your geo table, then you should be able to do things like
So the final step is to add the spatial index. The actual implementation probably won't be all that hard, as there are a few good algorithms out there. But it will involve figuring out how to plug in a new index, and how to call it. The initial implementation should just be able to build and use a spatial index. Future advances can use it when appropriate with different spatial operations.
The spatial index wikipedia article links to several articles on different types of spatial index. It's probably worth reading up on them to get the general theory and all.
In GeoTools Indexed Shapefile implementation there are examples of varying levels of documentation, of QuadTree, which seems to be ported from Mapserver, and RTree (which may have something to do with MySQL?). These may be able to be used directly with H2, or at least borrow a decent portion of the code. I'm not sure though.
MySQL supposedly has an R-Tree implementation, not sure where the source is though.
So the key will be to figure out how to plug-in a new spatial index, and have it use the geometry types that spatial db in a box provides. The best note on this seems to be http://h2database.com/ipowerb/index.php?showtopic=212. Unfortunately the answer seems to be 'use the source', but I imagine it shouldn't be all that hard to figure out, and he points which classes to check out. Could be interesting to check out the multi-dimensional support that he talks about, but the other guy seemed to conclude it wouldn't be that useful.
So the goal is to get a spatial index working in H2. The way to use it will likely be a new operation that selects the bounding box, but uses the spatial index by default instead of iterating through all. Note sure if you can use the same type of thing in H2, but it'd be nice to have that sort of compatibility for users. See the postgis docs about how they use the spatial index. So just implement a bounding box operator that uses the spatial index, and then it can be leveraged by the other specific operations. This should likely be a part of the H2 Spatial package, with a nice alias to go with all the other functions (perhaps using && will work there as an alias)
If need be I can try to write some data tables that can be used for testing. But they just need to be large tables with spatial data all over the map. Then one just needs to run the normal operations, like 'NOT disjoint' (which is a bounding box call) and the new bounding box operation that utilizes the spatial index. The latter should be much, much faster on large tables, like orders of magnitude difference.
So this is all I've got for now. That's why it's in the RnD section. I do think there is likely some good java code from the GeoTools indexed shapefile. It works quite well with shapefiles, but it depends on an external file, so would likely have to be reworked to storing that index in H2, however it does it.
After there's a working spatial index we should modify the GeoTools H2 Datastore code to make use of it in all its spatial operations (or perhaps we can do aliases to the normal spatial operations that make them always use the bounding box spatial operation.
Some of these may be useful, things I discovered while writing this up that weren't directly relevant.
http://h2database.com/ipowerb/index.php?showtopic=341&hl=spatial (mentions multi-dimensional indexes) http://h2database.com/ipowerb/index.php?showtopic=299&hl=spatial (limiting creation of lots of extra files from big clobs) http://h2database.com/ipowerb/index.php?showtopic=243&hl=spatial (information about well known binary)
|Document generated by Confluence on May 14, 2014 23:00|