This page last changed on Aug 04, 2010 by jdeolive.
This is a temporary space to start mocking up the design of a GeoServer monitoring extension.
The purpose of a monitoring extension for GeoServer is to gather data about every request made to GeoServer, and to provide both a real time and historical view of that data.
For every request made to GeoServer, information is gathered and stored. This data can be grouped into the following categories.
Information about the request itself.
- The request URI path.
- The request URI query string.
- The HTTP method: GET, POST, etc...
- The content sent in a PUT or POST request.
- The timestamp of the start of the request.
- The timestamp of the termination of the request.
- The IP address of the server handling the request.
- The status of the request: WAITING, RUNNING, CANCELLED, ERROR, COMPLETED.
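The status values above can be sketched as a small Java enum. The status names come from the list; the allowed transitions between them are an assumption of this sketch, not something the design above specifies.

```java
// Sketch of the request status life cycle. Status names are taken from the
// list above; the transition rules are an assumption for illustration.
enum RequestStatus {
    WAITING, RUNNING, CANCELLED, ERROR, COMPLETED;

    /** Whether a request in this state may move to the given state. */
    boolean canTransitionTo(RequestStatus next) {
        switch (this) {
            case WAITING:
                return next == RUNNING || next == CANCELLED;
            case RUNNING:
                return next == CANCELLED || next == ERROR || next == COMPLETED;
            default:
                return false; // CANCELLED, ERROR, COMPLETED are terminal
        }
    }
}
```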
For particular types of requests (such as OWS requests) additional information applies:
- The OWS service handling the request: WMS, WFS, WCS, etc...
- The OWS operation being requested: GetMap, GetFeature, etc...
- The OWS version being requested.
- The names of any layers (feature types, coverages, etc...) specified in the request.
- The spatial extent specified by the request.
- The spatial projection of the request.
Information about the origin of a request.
- The IP address the request originated from.
- The hostname (via reverse DNS lookup) the request originated from.
- The geographic location (country, state, city, etc...) the request originated from (via GeoIP).
- The identity of the user (if available) specified in the request.
Information about the response to the request.
- The size (in bytes) of the content returned in response to the request.
- The MIME type of the response content.
- The timestamp at which the response to the request started.
Information about any errors that occurred during the request.
- The primary error message of an error that occurred during request processing.
- Any specific code for the error (e.g., an OWS exception code).
- The stack trace of the error.
From the above request data a number of statistics can be derived:
- currently active requests
- min/max/avg request/response time
- the most frequently accessed layer
- the most recently accessed layer
- number of requests in a particular time period
And many more. Part of the monitoring extension will be a reporting system that makes this data and any generated statistics available to the user for viewing. There are numerous possibilities for how to expose such reports.
An interface integrated into the existing web admin tool makes sense. Such an interface could range from simply providing a tabular view (CSV) of the data to generating and providing charts and graphs.
Providing an HTTP interface allows for the retrieval of reports via third party tools and facilitates the integration of monitoring into other external applications.
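The derived statistics listed above are straightforward aggregations over the stored request records. A minimal Java sketch, assuming request durations (in milliseconds) and per-request layer names as stand-ins for the real stored attributes:

```java
import java.util.*;

// Sketch of deriving statistics from stored request data. The inputs
// (durations in milliseconds, layer names) are assumptions standing in
// for whatever the real request store exposes.
class RequestStats {

    /** Minimum request time over a set of requests. */
    static long minTime(List<Long> durationsMs) {
        return Collections.min(durationsMs);
    }

    /** Maximum request time over a set of requests. */
    static long maxTime(List<Long> durationsMs) {
        return Collections.max(durationsMs);
    }

    /** Average request time over a set of requests. */
    static double avgTime(List<Long> durationsMs) {
        long sum = 0;
        for (long d : durationsMs) sum += d;
        return (double) sum / durationsMs.size();
    }

    /** The most frequently accessed layer among the given requests. */
    static String mostFrequentLayer(List<String> layers) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String layer : layers) {
            int c = counts.merge(layer, 1, Integer::sum);
            if (best == null || c > counts.get(best)) best = layer;
        }
        return best;
    }
}
```

In practice these aggregations would more likely be pushed down to the database as SQL (MIN, MAX, AVG, GROUP BY) rather than computed in memory.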
Request Data Storage
Naturally this information must be stored in some form of database. The UML model for a request is relatively simple.
This domain model can be mapped to a relational database schema quite easily, with varying degrees of normalization. The simplest form is a single table containing all attributes in a flat structure:
CREATE TABLE request (
    id INT PRIMARY KEY,
    path VARCHAR(1024),
    query_string VARCHAR(1024),
    http_method VARCHAR(8),
    start_time TIMESTAMP,
    end_time TIMESTAMP,
    status VARCHAR(16)
    -- ... one column per remaining request attribute
);
The actual relational model will likely be determined by analyzing tradeoffs between INSERT, UPDATE, etc... performance and reporting requirements.
Monitoring Control Flow
The life cycle of a GeoServer request follows more or less the following flow:
The monitoring flow is aspect-oriented in nature, with monitoring tasks injected at various points in the above flow:
There are a number of mechanisms available in GeoServer that can be used to latch onto the various states of a request life cycle. These include servlet filters, Spring handler interceptors, OWS dispatcher callbacks, proxies, aspects, etc... It is likely that a number of these will be employed in order to gather all the necessary information.
One such possible approach could involve the following:
- A servlet filter at the top level that does a few things:
- gathers all the original request information, such as path, query string, origin IP, etc...
- wraps the HttpRequest and HttpResponse objects to track information further down the chain
- intercepts exceptions to track errors
- calculates post-process request information that is expensive to compute, such as the reverse DNS lookup for the hostname
- An OWS callback is registered to gather OWS specific information such as the OWS service, operation, and version. This callback would also be used to track OWS exceptions that will not be thrown back to the top level filter
- The HttpRequest wrapper/proxy created by the top level filter registers a custom input stream that intercepts data read from the body of the request. The amount of data intercepted will be capped to some fixed buffer size so as not to exhaust memory.
- The HttpResponse wrapper/proxy created by the top level filter registers a custom output stream that will be used to both track the content type of the response and count the number of bytes returned in the response.
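The byte-counting output stream from the last point can be sketched in plain Java. This is a sketch only: it wraps generic java.io streams, whereas the real wrapper would wrap the servlet response output stream; the class name and helper are assumptions.

```java
import java.io.*;

// Sketch of the byte-counting stream an HttpResponse wrapper could register.
// Plain java.io streams are used for illustration; in GeoServer the wrapped
// stream would be the servlet response output stream.
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    /** Number of response body bytes written so far. */
    long getCount() {
        return count;
    }

    /** Demonstration helper: count the bytes in a single write. */
    static long countBytes(byte[] data) {
        try (CountingOutputStream cos =
                 new CountingOutputStream(new ByteArrayOutputStream())) {
            cos.write(data);
            return cos.getCount();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The same idea applies on the input side, with the addition of a fixed-size buffer cap on how much request body data is retained.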
Different methods could be implemented depending on the requirements of the user.
Live Monitoring Only
This method involves only tracking live requests, and perhaps a short history of recent requests. For users who only care about what the server is doing now, this method applies.
- No external database requirement
- Low overhead; a synchronized memory-based queue could be used to maintain request data
- No or limited request history
- Does not support a multi-server setup, since all data is stored locally in memory
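The synchronized memory-based queue mentioned above could look something like the following. This is a minimal sketch: the class and method names are assumptions, and a plain string stands in for the full request record.

```java
import java.util.*;

// Sketch of a synchronized, memory-based queue of recent requests: only the
// most recent N entries are retained, so memory use stays bounded and no
// external database is needed.
class RecentRequestLog {
    private final int capacity;
    private final Deque<String> requests = new ArrayDeque<>();

    RecentRequestLog(int capacity) {
        this.capacity = capacity;
    }

    /** Record a request, discarding the oldest one if the log is full. */
    synchronized void add(String requestSummary) {
        if (requests.size() == capacity) {
            requests.removeFirst();
        }
        requests.addLast(requestSummary);
    }

    /** A copy of the current contents, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(requests);
    }
}
```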
Historical Monitoring Only
Only information about completed requests is available; no live information is maintained. For users who only care about analyzing the request patterns of a server over a particular time span, this method applies.
- Maintains all request history
- Simple database transaction model; only a single commit to the database per request
- No real-time requirement; database transactions could be handled by a low-priority thread
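The single-commit, low-priority-thread model above can be sketched as a producer/consumer pair. This is an illustration under stated assumptions: the database write is stubbed with an in-memory list, where in practice it would be a single INSERT plus commit, and all names are hypothetical.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the single-commit-per-request model: completed requests are
// queued and a low-priority background thread persists them one at a time.
// The database write is stubbed with an in-memory list for illustration.
class RequestHistoryWriter {
    private final BlockingQueue<String> completed = new LinkedBlockingQueue<>();
    final List<String> persisted = Collections.synchronizedList(new ArrayList<>());

    RequestHistoryWriter() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    // stand-in for one INSERT + commit per completed request
                    persisted.add(completed.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setPriority(Thread.MIN_PRIORITY);
        writer.setDaemon(true);
        writer.start();
    }

    /** Called once, when a request finishes. */
    void requestCompleted(String requestSummary) {
        completed.add(requestSummary);
    }

    /** Wait up to timeoutMs for at least n requests to be persisted. */
    boolean awaitPersisted(int n, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (persisted.size() < n && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return persisted.size() >= n;
    }
}
```

Because the request thread only enqueues a record and never touches the database, request latency is unaffected by how slowly the writer thread drains the queue.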
Live and Historical Monitoring
This method involves maintaining and persisting information in real time over the life cycle of a request. For users who want to know what the server is doing now, but who also want to track information over the long term, this method applies.
- Best of both worlds
- Capable of handling a multi-server environment, as request info is persisted in real time to an external database
- Most complex database model, involves multiple commits during the life of a request
- Highest overhead