Expanding Usefulness of Geospatial Data Standards

Background: While many organizations see the clear advantages of establishing and using geospatial data standards, obtaining compliance from constituents continues to be a frustrating and costly problem. Users are frequently reluctant to adopt a new standard, whether because of the potential cost of modifying applications that use an existing, custom schema or simply because of a ‘not invented here’ mentality. This was true within the Department of Defense (DoD) Installation Geospatial Information and Services (IGI&S) community, even though a standard had existed for more than 10 years. As part of the 2006 Spatial Data Standards for Facilities, Infrastructure, and Environment (SDSFIE) initiative, Zekiah developed a data standardization concept intended to provide flexibility in schema naming conventions and organization without compromising ease of data sharing, conversion, and information understanding.

The concept is based on an underlying database that contains information regarding geospatial structures, naming conventions, and certain GIS format restrictions and capabilities. This database equates to a “Platform Independent Model” (PIM) that facilitates geospatial database generation, translation, and validation, and ultimately ties to the standard’s “Logical Model” for configuration management and control. Within the PIM, each Element of the geospatial dataset is recorded, along with its properties. Elements include Feature Types (the geospatial tables), Attributes/Fields, Constraints, Enumerations/Domain Values, and Associations/Relationships. In addition, the PIM includes Configurations, which are simply arbitrary collections of PIM elements, and Versions, which permit tracking the history of the model and facilitate translation/conversion from one version to the next.

The PIM: The secret to how the PIM can provide this flexibility is the way these elements are organized and ultimately identified. Each element is identified internally by a unique identifier (a GUID) which identifies the element and its definition. Names are not directly associated with the element, but are linked indirectly. So a feature type (the fundamental geospatial element) is identified by a featureID, which is a GUID. All other elements are similarly identified. Within the PIM, each element is organized by the version (an organization-defined number), the element GUID, and a user number. Feature types have defined geometries, definitions, logical names, containers (Feature Datasets in the Esri format), a number of internal links that capture any hierarchy in the Logical Model, and internal PIM metadata. Attributes have data types, lengths, default values, etc. While each element has a default physical name, alternate names are permissible for alternate users for all elements. The figure (right) shows the general organization of the PIM elements.
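The keying scheme described above can be sketched in a few lines of Python. This is an illustration only, not the actual PIM schema: the class and field names (`Element`, `Pim`, `names`), the example definition, and the convention that user number 0 holds the default physical name are all assumptions made for the sketch.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A PIM element; the GUID identifies the element and its definition."""
    element_id: uuid.UUID
    kind: str          # e.g. "FeatureType", "Attribute", "Enumeration"
    definition: str


@dataclass
class Pim:
    # element GUID -> Element
    elements: dict = field(default_factory=dict)
    # (version, element GUID, user number) -> physical name
    names: dict = field(default_factory=dict)

    def add_element(self, kind, definition):
        element = Element(uuid.uuid4(), kind, definition)
        self.elements[element.element_id] = element
        return element

    def set_name(self, version, element_id, user, name):
        # Names hang off the (version, GUID, user) key, never off the element itself.
        self.names[(version, element_id, user)] = name


pim = Pim()
road = pim.add_element("FeatureType", "A route for vehicle travel")  # hypothetical definition
pim.set_name(version=3, element_id=road.element_id, user=0, name="Road")    # assumed default name
pim.set_name(version=3, element_id=road.element_id, user=7, name="Street")  # alternate name for user 7
```

Because the name is stored beside the element rather than on it, one definition can carry any number of names without being duplicated.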

Capability: The capability within the PIM is created through applications which perform a variety of functions. To permit ease of additional application development, an API exists which provides the interface to the database (both read and write) and converts the PIM elements into a simplified object model. The basis of the object model is the pimVersion, which contains multiple pimConfiguration objects, each of which contains one or more pimFeature objects, etc. Software exists which generates schemas in Esri Geodatabase (SDE, file system, and personal), Oracle Spatial, Bentley XML, GML, and Autodesk FDO formats. At the same time, software exists which can validate the compliance of a schema in a variety of formats.
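To make the nesting concrete, here is a minimal Python sketch of such an object model. The class names echo the pimVersion, pimConfiguration, and pimFeature objects mentioned above, but the fields, the `PimAttribute` class, the `iter_fields` helper, and the sample data are assumptions for illustration and do not describe the real API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PimAttribute:
    name: str
    data_type: str
    length: Optional[int] = None


@dataclass
class PimFeature:
    name: str
    geometry: str
    attributes: List[PimAttribute] = field(default_factory=list)


@dataclass
class PimConfiguration:
    name: str
    features: List[PimFeature] = field(default_factory=list)


@dataclass
class PimVersion:
    number: str
    configurations: List[PimConfiguration] = field(default_factory=list)

    def iter_fields(self):
        """Yield (configuration, feature, attribute) names for every attribute."""
        for config in self.configurations:
            for feature in config.features:
                for attribute in feature.attributes:
                    yield config.name, feature.name, attribute.name


# Hypothetical sample data
road = PimFeature("Road", "Line", [
    PimAttribute("roadName", "text", 50),
    PimAttribute("surfaceType", "text", 20),
])
version = PimVersion("3.0", [PimConfiguration("Example", [road])])

fields = list(version.iter_fields())
```

A schema generator or validator for any one format would walk this same tree and emit or check the corresponding tables and fields.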

But this kind of capability, while it does save time, is not the reason for the success of the PIM concept. The real capability is the automated generation of extract, transform, load (ETL) conversions along several dimensions: between versions, between users, and between configurations.

Remember that each element within the PIM has a unique identifier that is NOT the name. However, since the unique identifier is associated with the definition, name variations are coded using the same unique identifier. So the combination version → elementID → user is the key. Given a name, a version, a configuration, and a user, it is possible to determine the elementID for that element. Given a different version, configuration, and/or user, it is possible to convert that elementID to the applicable name. Done at every level of the PIM, complete conversions can be generated from version A to version B, user A to user B, and/or configuration A to configuration B.
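The two-step lookup can be sketched as a pair of indexes: one from name to elementID, and one back from elementID to name. The Python below is a simplified illustration under stated assumptions. The function names, the flat row layout, and the sample names are hypothetical, and a real implementation would also scope attribute names by their feature type.

```python
import uuid


def build_indexes(rows):
    """rows: iterable of (version, configuration, user, element_id, name)."""
    by_name, by_id = {}, {}
    for version, config, user, element_id, name in rows:
        by_name[(version, config, user, name)] = element_id
        by_id[(version, config, user, element_id)] = name
    return by_name, by_id


def translate(name, source, target, by_name, by_id):
    """Step 1: name -> elementID in the source context.
    Step 2: elementID -> name in the target context."""
    element_id = by_name[source + (name,)]
    return by_id[target + (element_id,)]


def conversion_map(source, target, by_name, by_id):
    """Map every source name to its target name.
    Elements with no name in the target context are left out."""
    mapping = {}
    for key, element_id in by_name.items():
        if key[:3] == source:
            target_name = by_id.get(target + (element_id,))
            if target_name is not None:
                mapping[key[3]] = target_name
    return mapping


# Hypothetical elements and names
FEATURE, ATTRIBUTE = uuid.uuid4(), uuid.uuid4()
rows = [
    ("A", "default", "u1", FEATURE, "Building"),
    ("A", "default", "u1", ATTRIBUTE, "bldg_ht"),
    ("B", "default", "u1", FEATURE, "Structure"),
    ("B", "default", "u1", ATTRIBUTE, "height"),
]
by_name, by_id = build_indexes(rows)

version_a = ("A", "default", "u1")
version_b = ("B", "default", "u1")
mapping = conversion_map(version_a, version_b, by_name, by_id)
```

The same functions cover user-to-user and configuration-to-configuration conversions, since only the context tuples change.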

As an example, within a single version, Bob (user A) calls a feature type a ‘Road’. We want to generate a translation from Bob to Bill (user B), but Bill calls the same feature type (the one with the same definition) a ‘Street’. Using Bob’s information and the version, we obtain the featureID for the definition. Using Bill’s information, we find that the featureID becomes ‘Street’. In a similar manner, any element can be converted provided we know the application metadata for both Bob and Bill. In the same way, we can convert a single user’s schema from one version to another. In the case of the SDSFIE, we created conversions from Releases 2.4, 2.5, 2.6, and 2.61 of the SDSFIE to the new Release 3.0 using exactly this approach.
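The Bob and Bill example works out as follows in a small Python sketch. The lookup structure and function names are assumptions for illustration, and the older name `road_centerline` is invented for the sketch; it is not taken from any SDSFIE release.

```python
import uuid

ROAD_ID = uuid.uuid4()  # one featureID shared by every name for this definition

# (version, user) -> {physical name: featureID}; contents are illustrative
names = {
    ("3.0", "Bob"): {"Road": ROAD_ID},
    ("3.0", "Bill"): {"Street": ROAD_ID},
    ("2.6", "Bob"): {"road_centerline": ROAD_ID},  # hypothetical older name
}


def feature_id(version, user, name):
    """Resolve a user's name to the featureID behind it."""
    return names[(version, user)][name]


def name_for(version, user, fid):
    """Find what a given user calls a featureID in a given version."""
    for name, candidate in names[(version, user)].items():
        if candidate == fid:
            return name
    raise KeyError(f"{user} has no name for {fid} in version {version}")


def convert(name, source, target):
    """source and target are (version, user) pairs."""
    return name_for(*target, feature_id(*source, name))


bob_to_bill = convert("Road", ("3.0", "Bob"), ("3.0", "Bill"))
old_to_new = convert("road_centerline", ("2.6", "Bob"), ("3.0", "Bob"))
```

Here `bob_to_bill` is the user-to-user case and `old_to_new` is the same user moving between versions; neither lookup ever compares the names themselves.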

Limitations: This is not to say that the PIM allows a free-for-all. It is necessary that the appropriate schema is coded in the PIM and tied to the model. Maintenance of any standard requires a certain amount of rigor and discipline. While not a trivial exercise, it can be done relatively easily provided the definitions associated with the user’s schema are known and the element is identified within the PIM. Elements which are not currently encoded in the PIM are added as new elements with their applicable definitions, and the PIM has now ‘learned’ new dataset elements.
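The registration step can be pictured with the short Python sketch below. It is a simplification under loud assumptions: the dictionary layout and the function `register_schema` are hypothetical, and matching definitions by exact string equality stands in for what is, in practice, a judgment made by someone who knows both the schema and the standard.

```python
import uuid


def register_schema(pim, version, user, schema):
    """Tie a user's schema to the PIM.

    schema: {physical name: definition}
    Returns the names whose definitions were new to the PIM.
    """
    learned = []
    for name, definition in schema.items():
        element_id = pim["by_definition"].get(definition)
        if element_id is None:
            # Definition not yet encoded: add it as a new element.
            element_id = uuid.uuid4()
            pim["by_definition"][definition] = element_id
            learned.append(name)
        # Either way, record this user's name against the element.
        pim["names"][(version, element_id, user)] = name
    return learned


pim = {"by_definition": {}, "names": {}}

# Hypothetical schemas and definitions
first = register_schema(pim, "3.0", "Bob", {"Road": "A route for vehicle travel"})
second = register_schema(pim, "3.0", "Bill", {
    "Street": "A route for vehicle travel",   # known definition: alternate name only
    "Culvert": "A drain passing under a road",  # unknown definition: new element
})
```

After the second call the PIM holds two elements and three names, with ‘Road’ and ‘Street’ pointing at the same element.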

Another limitation is the extent to which GIS formats are changing. Zekiah is currently establishing PIM capability within the PostGIS/PostgreSQL environment, and is looking at other expansions of capability. Some format manipulations do not currently exist. But the logic behind how the manipulations are done is essentially the same regardless of the GIS format or technology involved.

Conclusion: This approach has been implemented and proven in Zekiah’s long-running support of SDSFIE as well as other Federal data standards. It can also be applied at the state and regional levels to provide configuration management of multi-party geospatial data standards while preserving flexibility with regard to local implementations and infrastructures.

This post was written by:

Barry Schimpf

Vice President

For more information on this post, SDSFIE, Zekiah’s geospatial standards support, or our data configuration management capabilities, please e-mail us at contact@zekiah.com