Karen K. Kemp

Easing environmental models into GIS

Part of the difficulty in integrating environmental models and GIS is a frequent mismatch between the digital data models used in each. Similarly, there is often a conceptual mismatch between the analog data models used by environmental scientists when they collect real world data and the digital data models offered by GIS. This paper argues that it is necessary to develop a clear understanding of the concept and role of the data model, and to recognize that there is a broad spectrum of different kinds of data models lying between the real world and the binary representations of it, including analog, conceptual and physical variations. Each transformation along this spectrum either removes information about the phenomena being represented, or adds information to the representation. While the database literature gives us much background about the part of the spectrum comprising conceptual to physical data models, little research has been done on analog to logical transformations, an area of critical importance if GIS is to become a successful tool for environmental scientists.


Introduction

While the mathematical modeling of physical processes has been a focus of environmental scientists for at least a half century, implementation of these models has lagged behind their conception. Solutions have, of course, been found. For example, methods for discretizing continuous equations are now well developed and finite differencing forms an integral part of many environmental modeling toolkits. The increasingly sophisticated tools for spatial modeling and analysis provided by today's GIS are now leading to a new revolution in environmental modeling, one which encourages scientists to incorporate spatial processes and relationships in their models. However, the driving force for the design of most widely used GIS packages has not been environmental science. As a result, translation into GIS of the unique spatial concepts and models which have evolved independently of GIS in the various environmental sciences is not always obvious, nor always free of misapplication.

The relationship between the real world and how it is represented in GIS has been the subject of a number of important papers at these conferences and elsewhere (cf. Burrough and Frank 1995, Couclelis 1992, Csillag 1996, Goodchild 1992 and 1993, Kemp 1996, Nyerges 1991 and Peuquet 1990). In 1993, this author concentrated on how physical fields, which are a fundamental concept in much of environmental science, are discretized for representation within the digital computer (Kemp 1993). We now need to attend to the full spectrum of data models used by scientists and to the relationships between these and their digital representations. This becomes increasingly important as we move toward interoperability, which by definition implies that there is a finite number of generic data structures appropriate for all realities represented. We need to understand how reality relates to such generic structures and what is missing in terms of models, metadata and encapsulated procedures.

This paper reviews definitions of data models and data modeling, discusses their significance in integrating GIS and environmental models, summarizes several efforts focusing on the data model issue and suggests needed research directions.

The many definitions of "data model"

The term "data model" was coined in computer science. The original definition may be attributed to Date who defined it as "a set of defined entities and the relationships between them" (Date 1975). There is little ambiguity in this definition and those working in the field of database management clearly use the term with confidence that the meaning is broadly understood. However, within this single field, it has been necessary to identify several classes of data models as one moves from reality to the digital world (Figure 1).

Conceptual data models refer to entities in an enterprise and their relationships. Relational, network and hierarchical data models are of the logical data model class. Physical data models refer to the digital structures used to organize and store the data within the computer.

However, although GIS textbooks frequently make reference to these standard DBMS terms, the term data model is used in GIS in a number of different ways, which causes confusion. For example, data model is commonly used in GIS in the following contexts:

While all of these definitions are clearly related and similar, they are by no means synonymous. Each one addresses a similar issue, but from a different perspective. Without common agreement on what we mean, in GIS, when we say "data model", we cannot truly understand the fundamental importance of the concept.

Why do we need a better understanding of what we mean by "data model"?

Evolution in the understanding of these important issues is reflected in the hot topics arising from the three Environmental Modeling with GIS conferences hosted by the NCGIA in 1991 (Goodchild et al. 1993), 1993 (Goodchild et al. 1996) and 1996 (these proceedings). Attention at the first conference was focused on the sophisticated uses being made of GIS in various environmental modeling disciplines. The second conference highlighted papers dealing with integration of data, individual GI systems and computer models of the environment. The third conference stressed interoperability. Thus a concern with the integration of specific computing environments has been overtaken by consideration of the development of overarching theory and implementations through which everyone and every system is able to communicate with each other. The need for a common language is clear.

But why are the issues raised in discussions of data models important? While, as suggested above, there are many perspectives to the issue, all of the papers addressing the topic of data models seek to provide some further understanding of how we represent the world in a computer and/or database. Indeed, much progress is already being made in this direction. In (Kemp 1993), we outlined the need for a formalization of the relationship between the concept of a real world "field" and its representation in one of six different spatial data models (here referring to grids, polygons, TINs, contour lines, pointgrids and irregular points). Each of these representations imposes different interpretations of the continuous nature of the field and implies different techniques for interpolating values at points between those few for which data is stored in the digital database. We argued that it is necessary to retain certain information about the relationship between the reality being represented and the model used to store it. Some of this information is implicit in the data model chosen, some of it must be explicitly stated (e.g. as encapsulated operations).
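The point that each representation implies its own interpolation rule can be illustrated with a minimal sketch. It is not taken from Kemp (1993): the one-dimensional transect, the sample values and the function names are all assumptions made for illustration. The same stored numbers are read once as cell values (constant within each cell) and once as a point grid (exact at the points, linear between them).

```python
from bisect import bisect_right

# Hypothetical 1-D transect: elevation (z) sampled at a few positions (x).
xs = [0.0, 10.0, 20.0, 30.0]
zs = [100.0, 120.0, 110.0, 130.0]


def value_as_cell_grid(x):
    """Cell view: each sample stands for the whole cell centred on it."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return zs[nearest]


def value_as_point_grid(x):
    """Point view: samples are exact at points; interpolate linearly between."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled transect")
    i = min(bisect_right(xs, x), len(xs) - 1)  # index of the right-hand sample
    x0, x1 = xs[i - 1], xs[i]
    t = (x - x0) / (x1 - x0)
    return zs[i - 1] + t * (zs[i] - zs[i - 1])


cell_estimate = value_as_cell_grid(4.0)
point_estimate = value_as_point_grid(4.0)
```

At the unsampled position 4.0 the two readings disagree, even though the stored data are identical. This is the kind of information that has to travel with the data, either implicitly through the choice of data model or explicitly as an encapsulated operation.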

In a guest editorial in International Journal of Geographical Information Systems, Burrough and Frank have mused upon the importance of understanding "the philosophical and experiential foundations of human perception of geographical phenomena and their abstraction and coding in geographical information systems" (Burrough and Frank 1995, p. 101). They consider how "geographic data models" reflect how people view the world. While they do not specifically define what they mean by their data model term, the discussion incorporates a consideration of spatial data paradigms and the variable aspects of representations affected by differences arising from different user communities and cultures. They conclude that:

the question arises of how one can sensibly integrate different kinds of spatial data if each has been observed, recorded, modelled and stored according to its own particular set of paradigms.... The main conclusion must be that methods of handling spatial information must be linked to the paradigms of the users' disciplines and that inter-disciplinary research to determine more accommodating paradigms than the object-field models is essential. (Burrough and Frank 1995, p. 114)

What does a data model do?

If we are to uncover what we are trying to get at by using the term data model, it is useful to express what a data model is intended to do. Writing in the database management literature, Brodie suggests that:

a clear goal for a data model is that it be expressive. Using the data model, one should be able to represent any static or dynamic property of interest to the desired degree of precision in order to capture the intended meaning (Brodie et al. 1984, p. 41).

Similarly, Goodchild and others have recently written that "in essence, a data model captures the choices made by scientists and others in creating digital representations of phenomena, and thus constrains later analysis, modeling and interpretation" (Goodchild et al. 1995, p. 10).

Data modeling is the process by which entities in the real world are discretized. While sampling the real world requires abstraction so that the natural complexity can be reduced to simple data, data models allow the addition of information to raw data. For example, the TIN model allows a network of points to be joined in such a way that sloped surfaces are represented. Thus, complexity is returned to simple data.
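The TIN example can be made concrete with a short sketch, assuming the usual reading of a TIN facet as a plane through its three vertices. The function name and the sample triangle are hypothetical. Three isolated (x, y, z) samples carry no slope, but joining them as a facet lets a value be derived anywhere inside it by barycentric (linear) interpolation.

```python
def tin_facet_value(triangle, point):
    """Interpolate z at point=(x, y) on the plane through three (x, y, z) vertices."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    x, y = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Barycentric weights of the point with respect to the three vertices.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical facet: three surveyed points with elevations 10, 20 and 30.
facet = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
midpoint_elevation = tin_facet_value(facet, (5.0, 0.0))
```

The interpolated value is not in the raw data. It is information added by the data model's assumption that the surface is planar between the samples.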

Therefore we suggest that the term data model be understood within the GIS context to exist across the full spectrum between the real world and its binary representation. Thus data models may include reference to any of the following:

Moving from the real world through various data models to model output requires transformations in both information structure and information content. These transformations from the real world to binary representations of it include:

Transformations take place as real world data is collected, recorded, manipulated and eventually stored in digital databases. Along this transformational path, the people who are manipulating the data transition from environmental scientist to computer analyst to "naive" user. How much real world knowledge can be passed from each of these individuals to the next through the data itself? As the data model and, possibly, associated metadata and functionality or procedures are the media, it is critical that these incorporate all the relevant information.

As described earlier, the transformation from the real world through data models to binary representations involves some loss and, sometimes, recovery of information. There are a number of dimensions in the information which may be captured or lost in the data modeling process (after Burrough and Frank 1995):

Analog data models versus digital data models

Analog data models define fundamental primitives which conceptually discretize the infinite complexity of reality. However, unlike digital data models, analog data models can be continuous and they may or may not include the same primitives used in the data models to represent the same phenomena digitally. Data collected in analog field data models include:

The point is that while the data may be discrete, the analog model relates these discrete values to continuous mental models of the real world.

Interoperability through data models

Data models provide entities and relationships at various levels of definition and discreteness. Interoperability requires the identification of what information is provided by the specific data models used in each computing component, a method for transferring that information with as little noise as possible plus, ideally, some measure of the amount of information lost in any processing stage. Interoperability through generic data models is described by papers presented in this session. Vckovski and Bucher lay out a specification through which information about data and data models can be included in the Open Geodata Interoperability Specification (OGIS). Albrecht supports interoperability by identifying a generic set of functions which operate on standardized data models.
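What "some measure of the amount of information lost" might look like is left open here. As one possible illustration only, the sketch below coarsens a series, restores it to the original resolution, and reports the root-mean-square difference of the round trip. The choice of RMSE, the function names and the sample values are assumptions, not a method proposed in the papers cited.

```python
from math import sqrt


def aggregate(values, factor):
    """Coarsen a 1-D series by averaging consecutive blocks of `factor` values."""
    if len(values) % factor:
        raise ValueError("series length must be a multiple of factor")
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


def expand(values, factor):
    """Return to the original resolution by repeating each coarse value."""
    return [v for v in values for _ in range(factor)]


def rmse(a, b):
    """Root-mean-square difference between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


fine = [1.0, 3.0, 2.0, 6.0]       # hypothetical fine-resolution samples
coarse = aggregate(fine, 2)       # transformation to a coarser data model
round_trip = expand(coarse, 2)    # transformation back to the fine model
loss = rmse(fine, round_trip)     # one candidate measure of what was lost
```

A measure of this kind could be recorded as metadata at each processing stage, so that a later user can see how far the stored representation has drifted from the data as collected.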

In another major project addressing related issues, Smith and others are attempting to implement a computational modeling system (CMS) "which is intended to provide scientific investigators with a unified computational environment and easy access to a broad range of modeling tools" (Smith et al. 1995, p. 127). It is an impressive effort and sets in place many fundamental programming concepts necessary for interoperating modeling environments. In particular, it outlines a process by which symbolic representations of phenomena or concepts can be constructed from fundamental primitives and provides a means for relating these different concepts through transformations. The system allows the existence of both abstract representations, which may simply identify instances of a particular concept by name, and multiple concrete representations of these which are their various digital representations. Existing in an object oriented programming environment, concrete representations are built up inductively from primitives and previously defined super- or parent-representations.
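The distinction between an abstract representation and its concrete representations can be sketched in a few lines. This is an illustration of the idea only and does not reproduce the design or interface of the CMS described by Smith et al. The class, its attributes and the sample transformation are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Concept:
    """Abstract representation: a concept identified by name alone."""
    name: str
    # Concrete (digital) representations, keyed by an illustrative kind label.
    concrete: Dict[str, Any] = field(default_factory=dict)
    # Transformations between kinds, keyed by (source kind, target kind).
    transforms: Dict[Tuple[str, str], Callable[[Any], Any]] = field(default_factory=dict)

    def get(self, kind: str) -> Any:
        """Return the requested concrete representation, deriving it if needed."""
        if kind in self.concrete:
            return self.concrete[kind]
        for (src, dst), fn in self.transforms.items():
            if dst == kind and src in self.concrete:
                return fn(self.concrete[src])
        raise KeyError(f"no route to a '{kind}' representation of {self.name}")


def points_to_profile(points):
    """Hypothetical transformation: (x, z) samples to z values ordered by x."""
    return [z for _, z in sorted(points)]


elevation = Concept("elevation")
elevation.concrete["points"] = [(20.0, 110.0), (0.0, 100.0), (10.0, 120.0)]
elevation.transforms[("points", "profile")] = points_to_profile
profile = elevation.get("profile")
```

A concept with a name but no concrete representation and no applicable transformation simply raises an error when asked for data. Nothing in such a structure helps the scientist decide which primitives and transformations are appropriate in the first place.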

While this CMS promises to provide an extremely powerful and flexible tool for environmental scientists conducting a modeling project, it does not adequately address the conceptual end of the data modeling process. No support is provided to assist the environmental scientist in formulating the abstract representations of the phenomena and concepts being studied. The system requires representations to be built up from primitives, but how does one go about identifying the concepts and their transformations and relating them to appropriate primitives? What is still missing is a mechanism for ensuring the appropriate information is passed from the real world to the data models.

Data models and GIS research

Given all of the above, it should be clear that data models provide fundamental areas of investigation in many GIS research environments. For example, within the NCGIA's research agenda, we can find the following very different considerations of the data model problem:

More specifically, the research agenda on data models from the first I-15 Specialist Meeting includes:

Conclusion

Much progress has been made in recognizing and structuring the information content of digital data models used in GIS. However, much more effort is needed in understanding analog spatial data models (i.e. the models used by environmental scientists) and their relationship to existing and future digital spatial data models.

Acknowledgements

Research at the NCGIA is supported by a grant from the National Science Foundation (SBR 88-10917).

References

Brodie, M. L., J. Mylopoulos and J. W. Schmidt (1984). On conceptual modelling: perspectives from artificial intelligence, databases, and programming languages. New York, Springer-Verlag.

Burrough, P. A. and A. U. Frank (1995). Concepts and paradigms in spatial information: are current geographical information systems truly generic? International Journal of Geographical Information Systems 9(2): 101-116.

Couclelis, H. (1992). People manipulate objects (but cultivate fields): beyond the raster-vector debate in GIS. Theories and Methods of Spatio-Temporal Reasoning in Geographic Space. A. U. Frank, I. Campari and U. Formentini, Springer-Verlag. 639: 65-77.

Csillag, F. (1996). Variations on hierarchies: Toward linking and integrating structures. GIS and Environmental Modeling: Progress and Research Issues. M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, S. Glendinning, eds. Fort Collins, CO, GIS World Books: 433-437.

Date, C. J. (1975). An Introduction to Database Systems. Reading, MA, Addison-Wesley.

Goodchild, M. F. (1992). Geographical data modeling. Computers and Geosciences 18(4): 401-408.

Goodchild, M. F. (1993). Data models and data quality: Problems and prospects. Environmental Modeling with GIS. M. F. Goodchild, B. O. Parks and L. T. Steyaert. New York, Oxford University Press: 94-103.

Goodchild, M. F., B. O. Parks and L. T. Steyaert. (1993). Environmental Modeling with GIS. New York, Oxford University Press.

Goodchild, M. F., J. E. Estes, K. Beard, T. Foresman, J. Robinson (1995). Research Initiative 15: Multiple Roles for GIS in US Global Change Research. Report of the First Specialist Meeting. Santa Barbara, CA, National Center for Geographic Information and Analysis, University of California.

Goodchild, M. F., L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, S. Glendinning, eds. (1996). GIS and Environmental Modeling: Progress and Research Issues. Fort Collins, CO, GIS World Books.

Kemp, K. K. (1993). Environmental Modeling with GIS: A strategy for dealing with spatial continuity, National Center for Geographic Information and Analysis, Department of Geography, University of California, Santa Barbara.

Kemp, K. K. (1996). Managing spatial continuity for integrating environmental models with GIS. GIS and Environmental Modeling: Progress and Research Issues. M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, S. Glendinning, eds. Fort Collins, CO, GIS World Books: 339-343.

Nyerges, T. L. (1991). Geographic information abstractions: conceptual clarity for geographic modeling. Environment and Planning A 23: 1483-1499.

Peuquet, D. J. (1990). A conceptual framework and comparison of spatial data models. Introductory Readings in Geographic Information Systems. D. J. Peuquet and D. F. Marble. London and Bristol, PA, Taylor & Francis: 250-285.

Smith, T. R., J. Su, A. El Abbadi, D. Agrawal, G. Alonso, A. Saran (1995). Computational Modeling Systems. Information Systems 20(2): 127-153.


Karen K. Kemp
Assistant Director, National Center for Geographic Information and Analysis
University of California
Santa Barbara, California 93106-4060
Telephone: (805) 893-7094
FAX: (805) 893-8617