This page last changed on Mar 12, 2005 by jive.

Introduction

It is often useful to look at your spatial data in terms of a spatial database, even if your data is in a flat file (like in the case of a shapefile).

A basic SQL expression looks like this:

SELECT ...         
FROM ...          (ie. FeatureSource from a DataStore)
WHERE ...         (ie. filter spec)
GROUP BY ...      
ORDER BY ...
Get the complete picture

This document presents the high level idea - it links to the above documents where technical details are explored.

The OGC Filter spec is basically equivelent to the SQL WHERE clause - it reduces the number of rows extracted from the table in the FROM clause.

The FROM clause specifies the database table - or tables - where the actual data is gather information from. This is very much like a Geotools DataStore (FeatureSource).

The ORDER BY clause is talked about in the OGC Catalog 2.0 spec. I'm not going to talk about it here, but I have a few comments in the GROUP BY section.

I am going to talk about the SELECT clause and the GROUP BY clause.

By creating simple "virtual" datastore (or feature sources) that "wrap" around an already existing data source the functionality of the SELECT and GROUP BY clauses (used in almost all SQL statements) will be available to all Geoserver users.

I outline a plan that should be simple to implement, reasonably efficient, and require no modification of existing code (except, perhaps, to improve the function/expression abilities of the Filter).

The SELECT clause

Example of an SQL SELECT statement:

SELECT buffer(geom, river_width) as geom, 
       (num_fed_studies + num_state_studies) as nStudies,
       length(geom) as river_len
       width as width
FROM myRiverTable;

The SELECT clause tranforms an input row (taken from the FROM clause and filtered by the WHERE clause) from one form to another by calling functions. The number of rows returned are not affected.

geom|num_fed_studies|num_state_studies|width             geom|nStudies|river_len|width
--------------------------------------------             -----------------------------
LINE|       2       |        1        |  30              POLY|   3    |   600   | 30
LINE|       0       |        2        |  28              POLY|   2    |   500   | 28
LINE|       1       |        0        |  31              POLY|   1    |   400   | 31

The implementation would be a simple Datastore that wraps another datastore. When requests for features are asked from this new datastore, the Query is rewritten and sent back to the "original" datastore. The features generated from the original datastore are then tranformed into the appropriate features for the "derived" datastore.

The GROUP BY/ORDER BY clause and Aggregates

Aggregates are functions that work on a set of input rows and return a single value. The most common SQL aggregate is "count()" which counts the number of rows in the set. Aggregates are often associated with a GROUP BY clause which makes a set for each unique value of the column mentioned in the GROUP BY clause. A missing GROUP BY clause means to make a single group containing all the rows. The postgresql documentation has more details and examples.

SELECT city_name, count(*)
FROM companies;

city_name |  count
------------------
vancouver |  25        implies "SELECT count(*) FROM companies WHERE city_name = 'vancouver'" returns 25
new york  |  78        implies "SELECT count(*) FROM companies WHERE city_name = 'new york'"  returns 78
san diego |  67        implies "SELECT count(*) FROM companies WHERE city_name = 'san deigo'" returns 67

In a geometric context, the main use for aggregates is to take a set of geometries and union them together into a single (larger) geometry.

SELECT union(river_polygon)  -- union takes a group of geometries and unions them all together
FROM myRiverTable;

The Aggreggated derivived datastore functions in a way very similiar to the SELECT clause (disscussed above) - a "virtual" datastore wraps another datastore. The main difference is two fold - the data is first grouped into sets and functions reduce a set of data into a single value.

Permanent Filters

Another type of virtual datastore would be one with a perminent filter. This would be implemented in the same way as the select clause type virtual datastore - and only require query rewriting. The new query will just add an "<and>" tag with the original filter and the Permanent filter.

Chaining derived datastores together

Severeal of these "virtual" datastores can be chain together to form even more complex datasets. For example, the two above examples can be chained together. The result is three datastore - the original data (linework), the buffered data (polygons), and the unioned data (single polygon).

The FROM clause

The above techniques can be extended to produce a "join" behaviour by creating a "derived" FeatureSources based on multiple input FeatureSources. The actual join would be handled inside the virtual datastore.

This would be a significant amount of work since there are many different join strategies.

Comments

Please see here for comments on this.


select.gif (image/gif)
select.gif (image/gif)
group_by.gif (image/gif)
group_by.gif (image/gif)
chain.gif (image/gif)

These concepts are partially implemented in the complex data store developed to support "Community Schema"

Posted by rob_cto_sco at Sep 25, 2006 07:37
Document generated by Confluence on May 14, 2014 23:00