This page last changed on Aug 29, 2012 by jdeolive.

Overview

Improved vertical scalability of Catalog resources (i.e. being able to efficiently manage hundreds of thousands of layers, styles, etc.).

Proposed By

Gabriel Roldán

Assigned to Release

GeoServer 2.3.x master branch.

State

Under Discussion, In Progress, Completed, Rejected, Deferred

Motivation

With the arrival of Virtual Services, Workspace Local Services, and Workspace Local Settings, GeoServer becomes better suited to multitenancy, and hence supporting a large number of configuration resources becomes even more important.

Prior art in this regard includes the DBConfig module, which allows the storage of configuration objects to be externalized to an RDBMS using Hibernate O/R mapping, thereby enabling the Catalog to scale up to an unbounded number of workspaces, stores, layers, etc.

Regardless of the Catalog backend's ability to scale, GeoServer itself does not scale gracefully as the number of configuration objects in the catalog grows because, given the way the current Catalog API is designed, client code assumes that full scans and defensive copies of lists of catalog resources are cheap in both processing time and memory consumption.

This proposal aims to provide a means to solve this problem that allows API changes to be adopted progressively throughout the code base, wherever the benefits are clear and measurable.

Scope

In Scope

Given a relatively large number of Catalog configuration objects:

  • Identify some exemplary use cases that result in scalability/performance bottlenecks throughout the GeoServer code base;
  • Identify the requirements and main QA goals needed to satisfactorily solve the problems described in the use cases;
  • Design Catalog API enhancements that fulfill the requirements;
  • Validate the API design by providing more than one concrete backend implementation, and by upgrading the Catalog client code from the exemplary use cases to the new API;
  • Provide general guidelines on how and when to progressively adopt the new API methods.

Not in Scope

  • It is not in this proposal's scope to allow applications outside GeoServer to directly edit the configuration objects stored by the backend (RDBMS or other). CatalogFacade and GeoServerFacade implementations are free to use whatever storage format and mechanisms they see fit. That said, this proposal does not forbid Catalog/Config backend implementations from allowing applications outside GeoServer to directly edit the configuration objects.

Use Case Drivers

The following use cases, identified in the GeoServer code base, should cover most situations where the main scalability and/or performance bottleneck lies in the Catalog's client code rather than in the Catalog's ability to serve large numbers of configuration objects.

  1. Secure Catalog Decorator: a full scan of Catalog resources is performed on each get*() call that returns a List, and a separate list of the objects accessible to the current user is built, even if the Catalog returns an immutable list, affecting both memory consumption and processing time.
  2. Wicket User Interface: the home page fetches the whole list of workspaces, stores, and layers only to get their sizes. Catalog resource list pages (e.g. LayerPage, StorePage, etc.) fetch full lists in order to a) return the iterator for the current page of data, b) obtain the full list of objects, c) obtain the filtered list of objects, d) obtain the total number of objects, and e) obtain the filtered number of objects.
  3. WMS GetCapabilities: generating a WMS Capabilities document implies fetching the full list of layers multiple times, in order to a) filter the layer list based on the request's NAMESPACE parameter, b) calculate the layer list's aggregated bounds, c) figure out a CRS common to all the layers, and d) build an in-memory layer tree in order to nest layers based on the LayerInfo's WMS "path" attribute.

Check the GSIP 69 - Use Cases page for further detail.

Requirements

In light of the above use cases, the Catalog API change proposal shall meet the following high-level requirements and QA goals:

  1. Filtering: Shall allow for filtering of catalog objects through arbitrary query criteria;
  2. Streaming: Shall allow for a streamed approach to catalog objects retrieval;
  3. Paging: Shall allow for paged queries. Catalog backends shall provide a consistent "natural order" of resources, which need not be based on id or any other prescribed property;
  4. Leverage query engines: Shall allow moving any in-process filtering criteria back to the backend, allowing for optimization in the common cases;
  5. Query generality: in-process filtering shall work out of the box for the general case;
  6. Compactness: API changes should be additive and minimal;
  7. Usability: Ease of use and compactness are highly desired;
  8. Incremental adoption: Shall allow for progressive/iterative adoption;
  9. Leverage sub-system cohesion: Shall introduce no external dependencies at the API level.

Proposed Catalog API extensions

The following is a summary of the API proposal. Check the GSIP 69 - API Proposal page for further detail.

In essence, the proposal boils down to adding two core methods to the Catalog interface: one to obtain the count and another to obtain a stream of Catalog configuration objects, both accepting filtering criteria expressed as an OGC Filter predicate, along with support for paging and sorting.
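The following minimal in-memory sketch illustrates the intended semantics of these two methods; it is not GeoServer code. It assumes java.util.function.Predicate and Comparator as stand-ins for the GeoTools Filter and SortBy types, returns a plain Iterator rather than a CloseableIterator for brevity, and uses the hypothetical classes Layer and InMemoryCatalogSketch. It also relies on Java 8 streams, which may be newer than what the GeoServer code base targeted at the time. count evaluates the predicate without materializing a list, while list filters, then optionally sorts, then applies offset/count paging over a stable natural order (here, insertion order).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical stand-in for a catalog object such as LayerInfo.
final class Layer {
    final String name;
    final boolean enabled;

    Layer(String name, boolean enabled) {
        this.name = name;
        this.enabled = enabled;
    }
}

// Hypothetical in-memory backend illustrating the count/list contract.
final class InMemoryCatalogSketch {
    // Insertion order plays the role of the backend's consistent "natural order".
    private final List<Layer> layers = new ArrayList<>();

    void add(Layer layer) {
        layers.add(layer);
    }

    // Counts matching objects without building an intermediate list.
    int count(Predicate<Layer> filter) {
        return (int) layers.stream().filter(filter).count();
    }

    // Filters, then optionally sorts, then applies offset/count paging.
    // A null offset, count or sortOrder means "not specified", like @Nullable in the proposal.
    Iterator<Layer> list(Predicate<Layer> filter, Integer offset, Integer count,
            Comparator<Layer> sortOrder) {
        Stream<Layer> results = layers.stream().filter(filter);
        if (sortOrder != null) {
            results = results.sorted(sortOrder);
        }
        if (offset != null) {
            results = results.skip(offset);
        }
        if (count != null) {
            results = results.limit(count);
        }
        // Elements are pulled lazily as the caller iterates
        // (sorting still has to see every matching element first).
        return results.iterator();
    }
}
```

Because paging is applied after filtering and sorting over a consistent order, consecutive pages do not overlap as long as the catalog is not modified in between, which is what the Paging requirement relies on.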

  1. Iterable or Iterator: Use java.util.Iterator instead of java.lang.Iterable as the query return type. It is better suited to represent a single stream of contents and helps keep the API changes to a minimum.
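  2. CloseableIterator: an Iterator that is also java.io.Closeable, so that clients can tell the backend when they are done and any resources it holds (e.g. an open database cursor) can be released. Note that close() is redeclared without the checked IOException.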
    interface CloseableIterator<T> extends Iterator<T>, Closeable {
        @Override
        public void close();
    }
    
  3. Full Text Search: Use a special property named AnyText to indicate that a query predicate is to be evaluated against all text properties of a given object type. The concept and approach are taken from the OGC CSW specification.
  4. OGC Filter: leverage the well known GeoTools OGC Filter constructs as GeoServer catalog subsystem's query model.
  5. Predicates utility factory methods: a utility class of static factory methods that build well-known types of Filter instances, which Catalog backends can easily identify and translate to their native query language, and which provide a convenient way of creating the most common filters through static imports.
    package org.geoserver.catalog;
    public class Predicates {
    
        public static Filter acceptAll() {...}
    
        public static Filter acceptNone() {...}
    
        public static Filter equal(String property, Object expected) {...}
    
        public static Filter contains(String property, String subsequence) {...}
    
        public static Filter and(Filter... operands) {...}
    
        public static Filter or(Filter... operands) {...}
    
        public static Filter fullTextSearch(final String subsequence) {...}
    
        public static Filter isNull(String propertyName) {...}
    
        public static SortBy sortBy(String propertyName, boolean ascending) {...}
    }
    
  6. Catalog Extensions: augment the Catalog interface with the following four methods, two of which are pure convenience methods, to enable counting and streaming the Catalog objects that match a given query predicate, and to enable paged queries.
    public interface Catalog {
      
        .... previous methods ...
    
        public <T extends CatalogInfo> int count(
                    Class<T> of, Filter filter);
    
        /** Convenience method to retrieve a single object, provided the filter is known to
            return at most one object **/
        public <T extends CatalogInfo> T get(
                    Class<T> type, Filter filter)
                    throws IllegalArgumentException;
    
        public <T extends CatalogInfo> CloseableIterator<T> list(
                     Class<T> of, Filter filter);
    
        public <T extends CatalogInfo> CloseableIterator<T> list(
                     Class<T> of, 
                     Filter filter, 
                     @Nullable Integer offset, 
                     @Nullable Integer count,
                     @Nullable SortBy sortOrder);
    
    }
    
  7. CatalogFacade Extensions: augment the CatalogFacade interface with three methods that cover the main use cases in a generic way: a) compute the number of objects that satisfy a query criteria, b) check whether the backend can sort by a given property, and c) get a filtered and possibly paged and sorted stream of objects, from which a single object can also be obtained.
    public interface CatalogFacade {
        ....
        
        public <T extends CatalogInfo> int count(Class<T> of, Filter filter);
        
        public boolean canSort(Class<? extends CatalogInfo> type, String propertyName);
    
        public <T extends CatalogInfo> CloseableIterator<T> list(final Class<T> of,
                final Filter filter, @Nullable Integer offset, @Nullable Integer count,
                @Nullable SortBy sortOrder);
    }
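The Query generality requirement implies that a backend can always fall back to evaluating predicates in process. The sketch below is a hypothetical, simplified analogue of Predicates.equal and Predicates.fullTextSearch built on java.util.function.Predicate rather than GeoTools, and is not the GeoServer implementation. Dotted property names such as resource.namespace.prefix are resolved by calling getXxx()/isXxx() methods via reflection, and the AnyText-style search checks, case-insensitively, every public no-argument String getter of the object itself. The bean classes and the exact matching rules are assumptions made for illustration.

```java
import java.lang.reflect.Method;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical beans mimicking LayerInfo -> ResourceInfo -> NamespaceInfo.
class NamespaceBean {
    private final String prefix;
    NamespaceBean(String prefix) { this.prefix = prefix; }
    public String getPrefix() { return prefix; }
}

class ResourceBean {
    private final String title;
    private final NamespaceBean namespace;
    ResourceBean(String title, NamespaceBean namespace) { this.title = title; this.namespace = namespace; }
    public String getTitle() { return title; }
    public NamespaceBean getNamespace() { return namespace; }
}

class LayerBean {
    private final String name;
    private final boolean enabled;
    private final ResourceBean resource;
    LayerBean(String name, boolean enabled, ResourceBean resource) {
        this.name = name;
        this.enabled = enabled;
        this.resource = resource;
    }
    public String getName() { return name; }
    public boolean isEnabled() { return enabled; }
    public ResourceBean getResource() { return resource; }
}

final class InProcessPredicates {
    private InProcessPredicates() {}

    // Resolves a dotted path such as "resource.namespace.prefix" one getter at a time;
    // yields null as soon as an intermediate value is null.
    static Object resolve(Object target, String propertyPath) {
        Object current = target;
        for (String step : propertyPath.split("\\.")) {
            if (current == null) {
                return null;
            }
            current = invokeGetter(current, step);
        }
        return current;
    }

    private static Object invokeGetter(Object bean, String property) {
        String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
        for (String prefix : new String[] {"get", "is"}) {
            try {
                Method getter = bean.getClass().getMethod(prefix + suffix);
                return getter.invoke(bean);
            } catch (NoSuchMethodException e) {
                // not found with this prefix; try the next one
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("No property '" + property + "' on " + bean.getClass());
    }

    // Analogue of Predicates.equal(property, expected).
    static <T> Predicate<T> equal(String property, Object expected) {
        return obj -> Objects.equals(resolve(obj, property), expected);
    }

    // Analogue of Predicates.fullTextSearch(subsequence): case-insensitive match against every
    // public no-arg String getter of the object itself (nested objects are not searched).
    static <T> Predicate<T> fullTextSearch(String subsequence) {
        String needle = subsequence.toLowerCase();
        return obj -> {
            for (Method m : obj.getClass().getMethods()) {
                if (m.getName().startsWith("get") && m.getParameterCount() == 0
                        && m.getReturnType() == String.class) {
                    try {
                        Object value = m.invoke(obj);
                        if (value != null && ((String) value).toLowerCase().contains(needle)) {
                            return true;
                        }
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
            return false;
        };
    }
}
```

A real backend would instead recognize these well-known predicate shapes and translate them to its native query language, keeping this reflective evaluation only as the generic fallback.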
    

API Validation

This section presents two ways of validating the Catalog API extensions in this proposal. First, we migrate the code from the use cases to the new API to verify its usability and correctness. Then we provide a couple of Catalog backend implementations to verify that the API can be implemented and is effective.

Migration of identified sample offending code

In this section we update the Catalog client code identified as exemplary performance/scalability offenders in the Use Case Drivers section to the new API, in order to validate the API's usability.

Please note that the use of Guava utility classes here is an incidental implementation detail. No such requirement exists at the API level.

  1. Solving Use Case 1: Lower the memory footprint and CPU utilization of SecureCatalogImpl
    1. Implement new methods: Implement the new methods in SecureCatalogImpl so that filtering out the catalog objects the current user cannot access is pushed down to the CatalogFacade. This avoids building an in-memory list of objects twice, and wrapping every object in a secure decorator only to throw away the inaccessible ones.
      Short version:
      import static org.geoserver.catalog.Predicates.*;
      
      class SecureCatalogImpl implements Catalog {
        ....
          @Override
          public <T extends CatalogInfo> int count(Class<T> of, Filter filter) {
              Filter securityFilter = securityFilter(of, filter);
              final int count = delegate.count(of, securityFilter);
              return count;
          }
          
          @Override
          public <T extends CatalogInfo> T get(Class<T> type, Filter filter)
                  throws IllegalArgumentException {
              Filter securityFilter = securityFilter(type, filter);
              T result = delegate.get(type, securityFilter);
              return result;
          }
      
          @Override
          public <T extends CatalogInfo> CloseableIterator<T> list(Class<T> of, Filter filter) {
              return list(of, filter, null, null, null);
          }
      
          @Override
          public <T extends CatalogInfo> CloseableIterator<T> list(Class<T> of, Filter filter,
                  Integer offset, Integer count, SortBy sortBy) {
      
              Filter securityFilter = securityFilter(of, filter);
      
              CloseableIterator<T> filtered;
              filtered = delegate.list(of, securityFilter, offset, count, sortBy);
      
              // create secured decorators on-demand
              final Function<T, T> securityWrapper = securityWrapper(of);
              final CloseableIterator<T> filteredWrapped;
              filteredWrapped = CloseableIteratorAdapter.transform(filtered, securityWrapper);
      
              return filteredWrapped;
          }
      
          /**
           * @return a Function that applies a security wrapper over the catalog
           *         object given to it as input
           */
          private <T extends CatalogInfo> Function<T, T> securityWrapper(final Class<T> forClass) {
            ...
          }
      
          /**
           * Returns a predicate that checks whether the current user has access to a given object of type
           * {@code infoType}.
           */
          private <T extends CatalogInfo> Filter securityFilter(final Class<T> infoType,
                  final Filter filter) {
              ...
          }
      }
      
    2. Leverage new API: Use the new API in SecureCatalogImpl's existing code so that the current bulk query methods avoid creating a java.util.List twice and filtering the current user's accessible objects in process.
      Short version:
      import static org.geoserver.catalog.Predicates.*;
      class SecureCatalogImpl implements Catalog {
          ...
          //BEFORE
          public List<LayerInfo> getLayers() {
              return filterLayers(user(), delegate.getLayers());
          }
          //AFTER
          public List<LayerInfo> getLayers() {
              return filterLayers(acceptAll());
          }
          //BEFORE
          public List<LayerInfo> getLayers(ResourceInfo resource) {
              return filterLayers(user(), delegate.getLayers(unwrap(resource)));
          }
          //AFTER
          public List<LayerInfo> getLayers(ResourceInfo resource) {
              return filterLayers(equal("resource.id", resource.getId()));
          }
          //BEFORE
          public List<LayerInfo> getLayers(StyleInfo style) {
              return filterLayers(user(), delegate.getLayers(style));
          }
          //AFTER
          public List<LayerInfo> getLayers(StyleInfo style) {
              Filter filter = or(
                                   equal("defaultStyle.id", style.getId()),
                                   equal("styles.id", style.getId()));
              return filterLayers(filter);
          }
          //BEFORE
          protected List<LayerInfo> filterLayers(Authentication user, 
                                                 List<LayerInfo> layers) {
              List<LayerInfo> result = new ArrayList<LayerInfo>();
              for (LayerInfo original : layers) {
                  LayerInfo secured = checkAccess(user, original);
                  if (secured != null)
                      result.add(secured);
              }
              return result;
          }
          //AFTER
          private List<LayerInfo> filterLayers(final Filter filter) {
              CloseableIterator<LayerInfo> iterator;
              iterator = list(LayerInfo.class, filter, null, null, null);
              try {
                  return ImmutableList.copyOf(iterator);
              } finally {
                  iterator.close();
              }
          }
          ...
      }
      
  2. Solving Use Case 2: Leverage Catalog filtering, sorting and paging on LayerPage

    BEFORE

    public class LayerProvider extends GeoServerDataProvider<LayerInfo> {
        @Override
        protected List<LayerInfo> getItems() {
            return getCatalog().getLayers();
        }   
    }
    

    AFTER

    public class LayerProvider extends GeoServerDataProvider<LayerInfo> {
    
        @Override
        protected List<LayerInfo> getItems() {
            throw new UnsupportedOperationException(
                    "This method should not be called! "
                  + "We use the catalog streaming API");
        }
    
        @Override
        public int size() {
            return getCatalog().count(LayerInfo.class, getFilter());
        }
    
        @Override
        public int fullSize() {
            return getCatalog().count(LayerInfo.class, acceptAll());
        }
    
        @Override
        public Iterator<LayerInfo> iterator(
                              final int first, final int count) {
            Iterator<LayerInfo> iterator = filteredItems(first, count);
            if (iterator instanceof CloseableIterator) {
                // we don't know how to force Wicket to close the iterator, so return
                // a copy; this shouldn't add much overhead since we're paging
                try {
                    return Lists.newArrayList(iterator).iterator();
                } finally {
                    CloseableIteratorAdapter.close(iterator);
                }
            } else {
                return iterator;
            }
        }
    
        /**
         * Returns the requested page of layer objects after applying 
         * any keyword filtering set on the page
         */
        private Iterator<LayerInfo> filteredItems(
                                 Integer first, Integer count) {
            ...
        }
    
        private Filter getFilter() {
            ...
        }
    }
    
  3. Solving Use Case 3: Leverage Catalog filtering and sorting on WMS GetCapabilities generation.

    BEFORE

            ...
            private void handleLayers() {
                start("Layer");
    
                final List<LayerInfo> layers;
    
                // filter the layers if a namespace filter has been set
                if (request.getNamespace() != null) {
                    final List<LayerInfo> allLayers = wmsConfig.getLayers();
                    layers = new ArrayList<LayerInfo>();
                    String namespace = wmsConfig.getNamespaceByPrefix(
                                                     request.getNamespace());
                    for (LayerInfo layer : allLayers) {
                        Name name = layer.getResource().getQualifiedName();
                        if (name.getNamespaceURI().equals(namespace)) {
                            layers.add(layer);
                        }
                    }
                } else {
                    layers = wmsConfig.getLayers();
                }
                ...
                handleRootBbox(layers);
                ...
                // now encode each layer individually
                LayerTree featuresLayerTree = new LayerTree(layers);
                handleLayerTree(featuresLayerTree);
                ...
                List<LayerGroupInfo> layerGroups = wmsConfig.getLayerGroups();
                handleLayerGroups(layerGroups.iterator());
                ...
                end("Layer");
            }
            private void handleLayerTree(final LayerTree layerTree) {
                ...
            }
        }
    

    AFTER

            ...
            private void handleLayers() {
                start("Layer");
    
                //ask for enabled and advertised to start with
                Filter filter;
                {
                    Filter enabled = equal("enabled", Boolean.TRUE);
                    Filter advertised = equal("advertised", Boolean.TRUE);
                    filter = Predicates.and(enabled, advertised);
                }
                
                // filter the layers if a namespace filter has been set
                if (request.getNamespace() != null) {
                    //build a query predicate for the namespace prefix
                    final String nsPrefix = request.getNamespace();
                    final String nsProp = "resource.namespace.prefix";
                    Filter equals = equal(nsProp, nsPrefix);
                    filter = Predicates.and(filter, equals);
                }
                ...
                final Catalog catalog = wmsConfig.getCatalog();
                CloseableIterator<LayerInfo> layers;
                SortBy sortOrder = Predicates.sortBy("name", true);
                
                layers = catalog.list(LayerInfo.class, filter, null, null, sortOrder);
                try{
                    handleRootBbox(layers);
                }finally{
                    layers.close();
                }
                ...
                // now encode each layer individually
                layers = catalog.list(LayerInfo.class, filter);
                try {
                    handleLayerTree(layers);
                } finally {
                    layers.close();
                }
    
                final Filter lgFilter = acceptAll();
                CloseableIterator<LayerGroupInfo> layerGroups = catalog.list(LayerGroupInfo.class, lgFilter);
                try {
                    handleLayerGroups(layerGroups);
                }finally{
                    layerGroups.close();
                }
    
                end("Layer");
            }
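Both the SecureCatalogImpl and LayerProvider migrations above rely on CloseableIteratorAdapter.transform and CloseableIteratorAdapter.close, whose implementations are not shown on this page. The following hypothetical sketch (class CloseableIterators, using Java 8's java.util.function.Function in place of Guava's Function) shows the two properties the migrated code depends on: the wrapping function is applied lazily, one element per next() call, so no intermediate list is built; and closing the transformed iterator closes the underlying one, so backend resources are released.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.function.Function;

// Mirrors the CloseableIterator interface proposed on this page.
interface CloseableIterator<T> extends Iterator<T>, Closeable {
    @Override
    void close();
}

// Hypothetical helpers in the spirit of CloseableIteratorAdapter; not GeoServer's implementation.
final class CloseableIterators {
    private CloseableIterators() {}

    // Adapts a plain Iterator, running onClose when close() is called.
    static <T> CloseableIterator<T> fromIterator(final Iterator<T> delegate, final Runnable onClose) {
        return new CloseableIterator<T>() {
            @Override public boolean hasNext() { return delegate.hasNext(); }
            @Override public T next() { return delegate.next(); }
            @Override public void close() { onClose.run(); }
        };
    }

    // Applies fn lazily, one element per next() call; closing the result closes the source.
    static <F, T> CloseableIterator<T> transform(final CloseableIterator<F> source,
            final Function<? super F, ? extends T> fn) {
        return new CloseableIterator<T>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public T next() { return fn.apply(source.next()); }
            @Override public void remove() { source.remove(); }
            @Override public void close() { source.close(); }
        };
    }

    // Closes the iterator if it is Closeable; plain iterators are left untouched.
    static void close(Iterator<?> iterator) {
        if (iterator instanceof Closeable) {
            try {
                ((Closeable) iterator).close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```

In the SecureCatalogImpl case, fn would be the security wrapper, so only the objects the client actually consumes get decorated.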
    

Multiple Back-End Implementations

In addition to the default Catalog implementation, a JDBC-based catalog and configuration storage has been developed.

The current prototype for the JDBC backend is located at this GitHub branch.
The jdbcconfig community module is based on the spring-jdbc framework and uses an RDBMS (either H2 or PostgreSQL at the time of writing) as a key/value store, with extra indices for the Catalog objects' "searchable" properties.
The key in this single-table store is the object identifier, and the value is its XStream representation, leveraging exactly the same serialization mechanism GeoServer uses for on-disk catalog persistence.
This minimizes maintenance costs as the Catalog and configuration object model evolves, since only the XStream persistence code has to be maintained for both the on-disk and database backends.
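As an illustration of the storage layout just described, here is a hypothetical in-memory analogue in Java; it is not the jdbcconfig schema or code. A map from id to a serialized string stands in for the single key/value table holding the XStream representation, and a map from (property, value) to ids stands in for the extra indices on searchable properties. It shows that an equality predicate on an indexed property, such as counting layers by workspace name, can be answered from the index alone, without decoding every stored object. Updates and deletes are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory analogue of a single-table key/value store with property indices.
final class KeyValueCatalogStore {
    // id -> serialized object (stands in for the XStream-encoded value column)
    private final Map<String, String> blobs = new LinkedHashMap<>();
    // property name -> property value -> ids of objects having that value
    private final Map<String, Map<Object, Set<String>>> index = new HashMap<>();

    // Stores a new object and indexes its searchable properties.
    void put(String id, String serialized, Map<String, Object> searchable) {
        blobs.put(id, serialized);
        for (Map.Entry<String, Object> e : searchable.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new LinkedHashSet<>())
                 .add(id);
        }
    }

    // Answers "property = value" from the index alone; no stored value is read.
    int count(String property, Object value) {
        return idsWhere(property, value).size();
    }

    // Returns the serialized objects matching "property = value", in insertion order.
    List<String> list(String property, Object value) {
        List<String> result = new ArrayList<>();
        for (String id : idsWhere(property, value)) {
            result.add(blobs.get(id));
        }
        return result;
    }

    private Set<String> idsWhere(String property, Object value) {
        Map<Object, Set<String>> byValue = index.get(property);
        if (byValue == null) {
            return Collections.emptySet();
        }
        Set<String> ids = byValue.get(value);
        return ids == null ? Collections.<String>emptySet() : ids;
    }
}
```

Properties that are not indexed (here, anything not passed in the searchable map) would need the in-process fallback instead.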

API Adoption Guidelines

  1. If you need to get a count of Catalog objects, use the count method instead of getXXX().size():
     int allLayers = catalog.count(LayerInfo.class, Predicates.acceptAll());
     int workspaceLayers = catalog.count(LayerInfo.class, Predicates.equal("resource.store.workspace.id", workspaceId));
    
  2. If only a subset of objects is needed, consider using a Filter instead of in-process filtering:
     //BAD:
     for(LayerInfo layer : catalog.getLayers()){
      if("topp".equals(layer.getResource().getStore().getWorkspace().getName())){
        //do something with layer
      }
     }
     //GOOD:
     Filter filter = Predicates.equal("resource.store.workspace.name", "topp");
     Iterator<LayerInfo> layers = catalog.list(LayerInfo.class, filter);
     try{
       LayerInfo layer;
       while(layers.hasNext()){
         layer = layers.next();
         // do something with layer
       }
     }finally{
       CloseableIteratorAdapter.close(layers);
     }
    
  3. Push sorting to the backend:
     //BAD:
     List<StyleInfo> styles = new ArrayList<StyleInfo>(catalog.getStyles());
     Comparator<StyleInfo> comparator = new Comparator<StyleInfo>(){
      @Override
      public int compare(StyleInfo s1, StyleInfo s2){
        return s1.getName().compareTo(s2.getName());
      }
     };
     Collections.sort(styles, comparator);
     
     //GOOD:
     boolean ascending = true;
     SortBy sortOrder = Predicates.sortBy("name", ascending);
     CloseableIterator<StyleInfo> styles = catalog.list(StyleInfo.class, Predicates.acceptAll(), null, null, sortOrder);
     try{
       //use the styles, already sorted by name
     }finally{
       styles.close();
     }
    
  4. Use catalog backend's paging, even if what you really want is a List and not an Iterator:
     int startIndex = 50;
     int pageSize = 25;
     //BAD:
     List<LayerInfo> layers = catalog.getLayers();
     List<LayerInfo> page = layers.subList(startIndex, startIndex + pageSize);
    
     //GOOD:
     Iterator<LayerInfo> pageIterator = catalog.list(LayerInfo.class, acceptAll(), startIndex, pageSize, null);
     List<LayerInfo> page;
     try{
      page = com.google.common.collect.Lists.newArrayList(pageIterator);
     }finally{
       CloseableIteratorAdapter.close(pageIterator);
     }
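Since CloseableIterator extends java.io.Closeable, which extends AutoCloseable from Java 7 on, code built with Java 7 or later could replace the explicit try/finally blocks above with try-with-resources. Because close() is redeclared without IOException, no catch clause is needed. The sketch below is self-contained: it redeclares the proposed interface and uses a hypothetical ListBackedIterator and PageCopy in place of the iterator returned by catalog.list(...) and the Guava copy.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// The interface as proposed on this page.
interface CloseableIterator<T> extends Iterator<T>, Closeable {
    @Override
    void close(); // no checked IOException
}

// Hypothetical stand-in for the iterator returned by catalog.list(...).
final class ListBackedIterator<T> implements CloseableIterator<T> {
    private final Iterator<T> delegate;
    boolean closed;

    ListBackedIterator(List<T> items) {
        this.delegate = items.iterator();
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public T next() { return delegate.next(); }
    @Override public void close() { closed = true; }
}

final class PageCopy {
    // Same effect as guideline 4: copy a page into a List; the iterator is closed automatically,
    // even if iteration throws.
    static <T> List<T> toList(CloseableIterator<T> page) {
        try (CloseableIterator<T> it = page) {
            List<T> result = new ArrayList<>();
            while (it.hasNext()) {
                result.add(it.next());
            }
            return result;
        }
    }
}
```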
    

Feedback

This section should contain feedback provided by PSC members who may have a problem with the proposal.

Backwards Compatibility

Backwards compatibility is preserved since the API changes are additive only. All existing code using the current API will keep working untouched.

Voting

Andrea Aime: +1
Alessio Fabiani:
Ben Caradoc-Davies: +1
Gabriel Roldán: +1
Justin Deoliveira: +1
Jody Garnett: +1
Simone Giannecchini: +1

Links

Document generated by Confluence on May 14, 2014 23:00