This page last changed on Jan 03, 2007 by aaime.

Current configuration system issues

The current configuration system is based on a set of XML files and model classes, and has proven to be lacking from both the ease of use and the scalability points of view.

To make a slight modification to one of the configuration classes, say adding an attribute, one has to:

  • change the actual model class;
  • change the DTO class (which is basically a clone of the model class, but has no behaviour);
  • change the xml reader and writer to handle the new attribute.

To make a bigger modification, a new set of classes (model, DTO, and xml reader/writers) has to be written.

Hand modification of the set of configuration files is error prone, since there is no xml ID mechanism in place that would allow a schema/dtd capable editor to prevent inconsistencies from creeping in.
The current data directory layout also mixes data and configuration, which are really separate concerns.

Finally, it makes it hard to cluster GeoServer, since each node has to refer to the same configuration.
Whilst it's possible to store the configuration on a shared folder and force a refresh on each GeoServer instance by using scripting, it's a less than ideal solution.

Requirements

  1. Configuration should be modular: each module should provide its own extra configuration bits and possibly reference configuration elements from the modules it depends on.
  2. Configuration storage should ensure consistency of the assembled configuration information (from both a structural and a semantic point of view).
  3. Most of the configuration system should be storage agnostic, allowing different storages to be implemented (pure file, database, remote storage service, whatever...)
  4. File storage should be provided out of the box, at least as a configuration import/export tool.
  5. File storage should be easy to check for correctness (think XML schema for example).
  6. Configuration extension/modification should be really quick, in order to allow easy experimentation with new modules or new features (in other words, configuration handling should not be so annoying as to discourage developers from adding new features).
  7. Configuration should be cluster friendly, that is, it must be easy to set up a cluster of GeoServer instances sharing the same configuration.
  8. A nice addition would be to have the configuration remotely modifiable with RPC calls.

In memory representation

There is broad agreement that we want configuration to be handled as a network of java beans.
Given that different modules contribute different sets of beans and add relations among the beans, changes to the configuration may make it inconsistent.
Each module should provide its own validation procedures to ensure that each bean is valid by itself, and that correct relationships are maintained too. Configuration errors should pop up to the log and UI levels so that users can be informed of the issues and act to solve them.
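
To make this more concrete, here is a minimal sketch of what a per-module validation hook could look like; the ModuleConfigValidator and WfsConfig names are made up for illustration and are not existing GeoServer classes:

    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical per-module validation hook (illustrative name, not GeoServer API). */
    public interface ModuleConfigValidator {
        /** Returns a list of error messages; an empty list means the module's beans are consistent. */
        List validate();
    }

    /** Made up configuration bean for the WFS module. */
    class WfsConfig {
        private int maxFeatures = 1000;

        public int getMaxFeatures() { return maxFeatures; }
        public void setMaxFeatures(int maxFeatures) { this.maxFeatures = maxFeatures; }
    }

    /** The WFS module validates its own bean; errors would be reported to the log and the UI. */
    class WfsConfigValidator implements ModuleConfigValidator {
        private WfsConfig wfs;

        public WfsConfigValidator(WfsConfig wfs) {
            this.wfs = wfs;
        }

        public List validate() {
            List errors = new ArrayList();
            if (wfs.getMaxFeatures() < 0) {
                errors.add("wfs.maxFeatures must not be negative");
            }
            return errors;
        }
    }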

Possible storage solutions

Easy file storage could be achieved using plain beans and XStream or JSON-lib. That would give us easy to deal with, single file configuration storage, but unfortunately would not support an xml schema/dtd for the configuration file.
It's also possible to have a separate configuration file for each module instead, such as geoserver.conf, wfs.conf, wms.conf.
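
As a rough sketch of the XStream option (the GlobalConfig bean, the alias and the geoserver.conf file name are made up for illustration):

    import java.io.FileReader;
    import java.io.FileWriter;

    import com.thoughtworks.xstream.XStream;

    /** Made up configuration bean, persisted to geoserver.conf. */
    public class GlobalConfig {
        private boolean verbose = false;
        private String charset = "UTF-8";

        public boolean isVerbose() { return verbose; }
        public void setVerbose(boolean verbose) { this.verbose = verbose; }
        public String getCharset() { return charset; }
        public void setCharset(String charset) { this.charset = charset; }

        public static void main(String[] args) throws Exception {
            XStream xstream = new XStream();
            // use a friendly element name instead of the fully qualified class name
            xstream.alias("geoserver", GlobalConfig.class);

            // write the bean out...
            FileWriter out = new FileWriter("geoserver.conf");
            xstream.toXML(new GlobalConfig(), out);
            out.close();

            // ...and read it back
            FileReader in = new FileReader("geoserver.conf");
            GlobalConfig loaded = (GlobalConfig) xstream.fromXML(in);
            in.close();
            System.out.println(loaded.getCharset());
        }
    }

The output is readable and easy to edit by hand, but as noted above no schema/dtd validation comes for free with this approach.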

Using file storage as the main configuration storage could make clustering hard, though we could expose a configuration service that allows a remote GeoServer to be configured against another running instance instead of using the local configuration files.

An alternative that would allow for easy clustering is database based configuration, that is, store everything in a central database and query the db each time there's a need for configuration information.
This would also allow for good data consistency through the use of foreign keys.
An O/R mapper such as hibernate could be used to map configuration beans onto the db, and generate the db schema as well.
Since we're using JDK 1.4, we cannot use Hibernate annotations, but will have to rely on XML mapping files instead.
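
A sketch of how the Hibernate route might be bootstrapped; the ConfigDatabase class and the mapping file names are hypothetical, and with JDK 1.4 the bean/table mappings would live in hbm.xml files:

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;
    import org.hibernate.tool.hbm2ddl.SchemaExport;

    /** Hypothetical bootstrap for a database backed configuration store. */
    public class ConfigDatabase {
        public static void main(String[] args) {
            Configuration cfg = new Configuration();
            // one hbm.xml mapping file per configuration bean (names are illustrative)
            cfg.addResource("FeatureTypeInfo.hbm.xml");
            cfg.addResource("WmsConfig.hbm.xml");
            // connection settings would come from hibernate.cfg.xml / hibernate.properties
            cfg.configure();

            // let Hibernate generate the database schema, foreign keys included
            new SchemaExport(cfg).create(true, true);

            SessionFactory sessionFactory = cfg.buildSessionFactory();
            // configuration beans are then loaded and saved through Hibernate sessions
        }
    }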

Possible remoting solutions

Configuration service remoting would allow other apps to change GeoServer's configuration, as well as provide a way to cluster GeoServer by having a "master" copy of it serve the configuration to the other nodes.

The same interfaces used for storage independent configuration access could be remoted to provide similar services.

We may consider the following transport technologies:

  • JSON based
  • XML-RPC
  • Spring remoting

The latter is of particular interest, because it allows multiple protocols to be attached on top of plain java service beans, though neither JSON nor XML-RPC are supported at the moment.
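
For illustration, this is roughly what the client side of a Spring remoting setup could look like over Spring's HTTP invoker transport; the ConfigService interface and the URL are made up:

    import org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean;

    /** Hypothetical remote configuration interface; objects passed over the wire must be Serializable. */
    interface ConfigService {
        Object getServiceConfig(String serviceId);
        void saveServiceConfig(String serviceId, Object config);
    }

    public class RemoteConfigClient {
        public static void main(String[] args) {
            // point the proxy at the "master" GeoServer instance exposing the service
            HttpInvokerProxyFactoryBean factory = new HttpInvokerProxyFactoryBean();
            factory.setServiceUrl("http://master:8080/geoserver/remoting/config");
            factory.setServiceInterface(ConfigService.class);
            factory.afterPropertiesSet();

            ConfigService config = (ConfigService) factory.getObject();
            Object wfs = config.getServiceConfig("wfs");
            // ...modify the returned configuration and push it back
            config.saveServiceConfig("wfs", wfs);
        }
    }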

I have thought about this design myself. I would like to see all configuration "java bean driven" and never ever have to parse or encode xml directly. And yes, XStream looks promising. Another idea I had ties into our current spring system.

In spring all beans have "id's" which are unique in the application. On the ows branch, these bean id's are used to load files of the form <id>.xml located in the geoserver data directory, under a directory called "services". So for each bean there is a separate file. I like this approach for a couple of reasons:

  1. Having a single file for all config works against lazy loading
  2. People could drop bean configurations into the "services" directory separately, say for a particular wfs configuration
  3. Gives plugins natural access to configuration, instead of having to hack into a section of a single file

So with this system, the old services.xml becomes geoserver.xml, wfs.xml, wms.xml, etc...
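
A minimal sketch of the per-id loading described above, with XStream doing the actual serialization; the class name and details are illustrative rather than the actual ows branch code:

    import java.io.File;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    import com.thoughtworks.xstream.XStream;

    /**
     * Loads one configuration object per spring bean id from
     * <data dir>/services/<id>.xml. Purely illustrative.
     */
    public class ServicesDirectoryLoader {
        public Map loadAll(File dataDirectory) throws Exception {
            File servicesDir = new File(dataDirectory, "services");
            XStream xstream = new XStream();
            Map configs = new HashMap();

            File[] files = servicesDir.listFiles();
            if (files == null) {
                return configs;
            }
            for (int i = 0; i < files.length; i++) {
                String name = files[i].getName();
                if (!name.endsWith(".xml")) {
                    continue;
                }
                // the bean id is the file name without the .xml extension
                String beanId = name.substring(0, name.length() - 4);
                FileReader reader = new FileReader(files[i]);
                configs.put(beanId, xstream.fromXML(reader));
                reader.close();
            }
            return configs;
        }
    }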

Posted by jdeolive at Dec 25, 2006 11:42

I just did a bit of research on DTO's, as I think the concept is quite valuable - being able to construct objects and send them over a wire - but our implementation seems to require a ton of overhead for no real gain. I came across this article: http://java.sun.com/blueprints/corej2eepatterns/Patterns/TransferObject.html
It basically says that often DTO's don't have getter and setter methods - they are just public variables. This simplifies things greatly: we wouldn't have to add get/set methods for everything, just put in a new field.
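
For example, a field-only transfer object in that style could be as simple as this (FeatureTypeInfoDTO here is a deliberately trimmed down illustration, not the current GeoServer class):

    import java.io.Serializable;

    /** Transfer object with public fields and no accessors, Core J2EE Transfer Object style. */
    public class FeatureTypeInfoDTO implements Serializable {
        public String name;
        public String srs;
        public String title;
        public String[] keywords;

        // adding a new configuration attribute is just one more public field,
        // with no getter/setter boilerplate to keep in sync
    }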

Then the other piece is the way to actually pass this over the wire. That's one of the things lacking in our current architecture - struts I think gets some special way in - but I think it should be a way in that complete remote services can use as well (like a linked GeoNetwork should be able to construct the appropriate FeatureTypeInfo object and pass it in).

Also, though it may prove not worth the effort, it'd be nice if we had the option to load/persist config to a single file or to files for each service. A single file has advantages for letting people edit it by hand and pass it around; multiple files have advantages for plug-ins.

Justin - in your scheme does wms.xml also contain the wms specific information per featureType? Like for each featureType in a WMS you also want to define a 'style', but you don't need that if you're only using the WFS service. So where do we put that info?

Posted by cholmes at Dec 25, 2006 12:10

It would probably be a very valuable exercise to run our current Global objects through XStream and see what the resulting XML looks like. Also, one other desire people have had in the past is to have a known XMLSchema for the config objects. This can aid greatly in remote configuration if we're allowing XML representations to be sent over the wire (currently the only way for a remote client to set the config is to send xml over the wire and to force a reload). But even if we don't have an XMLSchema, good examples may just work fine.

Might also be interesting to look into JSON representations of config, making it super easy for javascript clients to set config options. Perhaps http://oss.metaparadigm.com/jsonrpc/ might help. Could also look into XML-RPC: http://xmlrpc.sourceforge.net/ seems like a nice library, and it also seems to have some similarity to XStream in serializing XML from objects - but this would have a win in that we'd get XML-RPC for free, and it seems to also do some JSON serialization as well.
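
As a quick sketch of the JSON side, json-lib (mentioned earlier on this page) can build a JSON view of a plain config bean with a one liner; the WmsConfig bean is made up, and JSONObject.fromObject is the json-lib call I'd expect to use, though the exact API may vary between versions:

    import net.sf.json.JSONObject;

    /** Made up config bean for illustration. */
    public class WmsConfig {
        private String title = "My WMS";
        private boolean enabled = true;

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public boolean isEnabled() { return enabled; }
        public void setEnabled(boolean enabled) { this.enabled = enabled; }

        public static void main(String[] args) {
            // json-lib builds the JSON object straight from the bean's getters
            JSONObject json = JSONObject.fromObject(new WmsConfig());
            // prints something along the lines of {"enabled":true,"title":"My WMS"}
            System.out.println(json.toString());
        }
    }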

Posted by cholmes at Dec 25, 2006 12:38

OK - some other perspectives, which IMHO would lead to a reappraisal of the requirements along the lines of Chris's comments.

In general, deployment of _interoperable_ services requires conformance to a particular _profile_ of the underlying base standard. For example, a bunch of counties might deploy schools data or transport condition updates.

 Motherhood statements, but with significant implications:

Each content offering (Feature Type, Layer etc) will need to conform to at least two profiles:  the jurisdictional/enterprise view (all our services have the following metadata etc) and the content related standards (you must be able to perform these queries against this data standard).

A server delivering more than one content offering thus needs to import configuration from multiple external sources (this is even easier than writing one enormous complicated file).

Thus, R1 seems to be not so much a requirement as an implementation strategy, and not one that fits the real set of requirements.

Now, the ability to maintain ID refs and provide meaningful reports on errors, regardless of single or multiple files seems to be a common concern.

I'd suggest replacing R1 with:

1a) Add ability to check for logical inconsistencies between configuration components

1b) Propagate errors and warnings to system manager (via UI, URL accessible config log or email in the future) in such a way that the underlying cause can be readily identified

1c) Allow importation of external XML resources (e.g. schemas, but also feature type definitions, service profiles) into a local cache, with ability to refresh from external source, saving old versions.

1d) Allow plugin services, including configuration and metadata

1e)  

The way forward would be to revisit this issue with a "separation of concerns" perspective. IMHO, we haven't really got together all the usage patterns, so we're reacting to the problems in the current system rather than designing for the future. If so, then it's better to recognise it's a pure band-aid and avoid overcapitalising.

It may be possible to provide a navigator/editor that traverses the links between plugged-in object configurations (i.e. a single virtual XML file that doesn't need to be manually constructed from many fragments). I wouldn't like to see such an investment without more analysis of the actual requirements. I'd go back to basics and describe the Use Cases for configuration changes. The Use Case described is a very specialised one - a developer adding a piece of functionality.

For every developer, there will be many configurators, and for every deployment, many end-users. I believe if we had sharable configurations, the orders-of-magnitude would look like this:

developer -> data type configurator -> deployer -> user

And the config system is pretty bad for the deployer IMHO. This is far more serious than the bit of pain of a developer very occasionally needing to propagate a change.

Frankly, I'd focus on 1d) and 1b) at the moment - 1d) will tell us more about the real requirements and 1b) will tell us more about the modular internal architecture we may need.

Posted by rob_cto_sco at Dec 26, 2006 17:00

Um, hello geotools catalog interfaces?

Posted by jive at Jan 02, 2007 13:01