This page last changed on Jul 24, 2008 by

Issue: Regionating a 300000 feature data set takes almost 1 hour (aux_vulgaris_reynaudii on obis data set)

The basic regionating implementation as of July 3 2008 is pretty slow; in particular it tries to assign a tile to each feature in a single pass, and has memory issues as a result.  The workaround for this is to do a full sweep of the database for each tile, until the remaining feature count within a bbox is small enough to do the rest in memory.  One possible solution is to only assign the bare minimum number of features required for any given feature (this means the tile requested, plus all 'ancestors' up to the featureset's bounding box).  A pseudo code description of the proposed algorithm follows:

findFeatures(Tile xyz):
   Set s = getFromDB(xyz)
   if empty(s):
       Set parentS =
           ? emptySet()
           : findFeatures(findParent(xyz))
       if size(parentS) \!= maxPerTile:
           # since features 'spill over' from the parent tile, a non-full parent implies an empty child
           return s
       for Feature f in selectAllFeaturesFromDB(bbox = findBBox(xyz), sortby=importance):
           if !DBContains(f):
               add(s, f)
               if (size(s) = maxPerTile:
       storeToDB(s, xyz)
   return s


  • sortby=importance.  What do we do when the database can't provide the features to us in sorted order?  I suppose we'd need to use a PriorityQueue instead of a normal Set  and do a pass over the database each time.  Andrea suggested doing something clever with H2 (ie, have the CachedHierarchyRegionatingStrategy expect an object that can extract sort keys from features, then just put the keys into a temporary table and do sorted queries on that.  We can fake bbox queries by storing maxx, maxy, minx, miny in the database (and indexing on all four) and then the query will look like this:
    for bbox = [x1 y1 x2 y2] we would have 'SELECT feature FROM auxiliary WHERE maxX > x1 AND maxY > y1 AND minX < x2 AND minY < y2 ORDER BY importance;'
  • the pseudocode doesn't make it clear which operations are done in the database and which are in memory.  I've updated the pseudocode to hopefully be more clear; a Set is declared for sets stored in memory, and all requests to the database are done in methods whose names contain 'DB'.
  • What about features that span multiple tiles at the zoomlevel to which they're about to be assigned?  Should we push them up?  So far we've just assigned a tile based on the centroid, which seems to be working pretty well.  Bubbling them up the tile hierarchy seems more technically sound, but then you're not respecting the sort order...  A third option is to break it up into several geometries at the given zoomlevel, but complications arise there as well: if a feature is pushed down sufficiently deep then it will just fill the screen when it finally does show up, what about multiple descriptions, etc.  For now the centroid technique seems  like a good compromise.

Raster Overlay to Indicate Further Content Available

The problem here is that when we just hide the features that haven't been displayed thus far it's not obvious when you reach the end of the data (or to the uninitiated, that there is more data than shows up at the current zoom level).  One way to address this is to provide a raster overlay with all the data so that all the features are visible at all times.  We could play around with this; a couple of approaches have been proposed.

  • Chris suggested we show the raster version at first and push all the vector features down to the same zoom level.  That way, you always see all of the data, and if you zoom in enough you can see the clickable placemarks.
  • David suggested we always show both the raster and the placemarks, placemarks being pushed into a hierarchy as currently implemented.  This way you can always see everything and always have something clickable.  A couple of concerns about this:
    • We need to ensure that the placemarks show up on top of the raster data.
    • We need to have different rules for the vector vs. raster level of detail (raster goes away after a while, vector stays forever)
    • Probably to do this "right" we need to avoid rendering the features that are already showing up as vector data so they don't have pixelated halos.  Ie, for any tile we only render the features that appear in child tiles as raster graphics.  (This isn't very well supported by the way we're doing things; but it could be managed in a lot of cases I think.) 
    • Another option would be to have the vector graphics go away as well, and then just NOT the filter used for the vector data to produce the raster.  background. Easy to do, but a little weird in that you could zoom in too far to click on a feature.

Document generated by Confluence on May 14, 2014 23:00