GEOCOMPUTATION TECHNIQUES FOR SPATIAL ANALYSIS: IS IT THE CASE FOR HEALTH DATA SETS?

GILBERTO CÂMARA
ANTÔNIO MIGUEL VIEIRA MONTEIRO

Image Processing Division, National Institute for Space Research, Brazil

Abstract

Geocomputation is an emerging field of research which advocates the use of computationally intensive techniques such as neural networks, heuristic search and cellular automata for spatial data analysis. Since an increasing quantity of health-related data is collected in a geographical frame of reference, geocomputational methods show increasing potential for health data analysis. This paper presents a brief survey of the geocomputational field, including some typical applications and references for further reading.

1. Introduction

In recent years, the use of computer-based techniques for spatial data analysis has grown into an important scientific area, combining techniques from geographical information systems with emerging areas such as neurocomputing, heuristic search and cellular automata. In order to distinguish this new interdisciplinary area from the simple extension of statistical techniques to spatial data, some authors (Openshaw and Abrahart, 1996) have coined the term "geocomputation" to describe the use of computer-intensive methods for knowledge discovery in physical and human geography, especially those that employ non-conventional data clustering and analysis techniques. Lately, the term has been applied in a broader sense, to include spatial data analysis, dynamic modelling, visualisation and space-time dynamics (Longley, 1998).

This paper is a brief survey of geocomputational techniques. It should not be considered a comprehensive survey, but it attempts to convey a general idea of the concepts and motivation behind the brand name "geocomputation". Our prime motivation in this paper is to draw the attention of the public health community to the new analytical possibilities offered by geocomputational techniques. We hope this discussion serves to widen their perceptions about new possibilities in the spatial analysis of health data.

2. Motivations for Research on Geocomputation

Simply defined, geocomputation "is the process of applying computing technology to geographical problems". As Openshaw (1996) points out, "many end-users merely want answers to fairly abstract questions such as 'Are there any patterns, where are they, and what do they look like?'". This definition, although generic, points to a number of motivating factors: the emergence of computerised data-rich environments, affordable computational power, and spatial data analysis and mining techniques.

The first motivation (data-rich environments) has come about through the massive collection of socio-economic, environmental and health information, which is increasingly organised in computerised databases with geographical references such as census tracts or postcodes. Even in Brazil, a country with a limited tradition of public availability of geographical data, the 2000 Census is being described as the first such initiative where all data collection will be automated and georeferenced.

Figure 1 – Two steps in the GAM algorithm: left, initial step with smaller circles; right, later step with bigger circles.
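The figure above and the application in Section 3.2 rely on GAM's basic scanning idea: circles of increasing radius are placed on a regular grid over the study area, and only circles containing significantly more cases than expected from the population at risk are retained. The following is a minimal sketch of that idea, assuming hypothetical arrays of case and population point coordinates and a simple one-sided Poisson test; it illustrates the general principle only and is not Openshaw's GAM implementation.

```python
# Minimal GAM-style scan (illustration only, not Openshaw's implementation).
# Assumes hypothetical arrays of case and population-at-risk point locations.
import numpy as np
from scipy.stats import poisson

def gam_scan(case_xy, pop_xy, radii, grid_step, alpha=0.002):
    """Return circles (x, y, r) holding significantly more cases than expected."""
    case_xy, pop_xy = np.asarray(case_xy), np.asarray(pop_xy)
    overall_rate = len(case_xy) / len(pop_xy)          # global cases per person at risk
    xmin, ymin = pop_xy.min(axis=0)
    xmax, ymax = pop_xy.max(axis=0)
    significant = []
    for r in radii:                                    # circles of increasing radius
        xs = np.arange(xmin, xmax + grid_step, grid_step)
        ys = np.arange(ymin, ymax + grid_step, grid_step)
        for x in xs:
            for y in ys:
                centre = np.array([x, y])
                pop_in = np.sum(np.linalg.norm(pop_xy - centre, axis=1) <= r)
                if pop_in == 0:
                    continue
                cases_in = np.sum(np.linalg.norm(case_xy - centre, axis=1) <= r)
                expected = overall_rate * pop_in
                # one-sided Poisson test: excess of observed cases over expectation
                p = poisson.sf(cases_in - 1, expected)
                if p < alpha:
                    significant.append((x, y, r))
    return significant

# Hypothetical usage with synthetic random points:
rng = np.random.default_rng(0)
pop = rng.uniform(0, 10, size=(2000, 2))
cases = rng.uniform(0, 10, size=(100, 2))
circles = gam_scan(cases, pop, radii=[0.5, 1.0, 2.0], grid_step=1.0)
print(len(circles), "significant circles")
```

In a real application the retained circles would then be drawn on a map, as in Figure 1, so that overlapping significant circles visually mark candidate clusters.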
3.2 A GAM Application for Infant Mortality in Rio

In Openshaw (1998) a strong case is made for the performance of the GAM algorithm in locating clusters of disease, including a comparison with other cluster-finding techniques. To better assess and understand the potential and limitations of GAM, the authors ran an example using data from the study "Spatial Analysis of Live-Born Profile and Socio-economic Conditions in Rio de Janeiro", conducted by D'Orsi and Carvalho (1997). This study assessed the spatial birth and socio-economic patterns in Rio de Janeiro city districts, aiming to identify the major groups at risk of infant morbidity and mortality and to select the primary areas for preventive programs.

In order to apply the GAM algorithm, the values had to be converted from areal-related patterns to point variables. The authors selected some of the basic attributes used by D'Orsi and Carvalho and converted each areal unit (corresponding to a city district) to a point location, which received the value of the areal unit it represented, as illustrated by Figure 2. We ran the GAM algorithm on the values of the live-born quality index for all neighbourhoods of Rio. As a result, GAM found three clusters of high values of this index, located approximately in the "Botafogo", "Barra da Tijuca" and "Ilha do Governador" regions (Figure 3). As a basis for comparison, the traditional choropleth map is shown in Figure 4, where the areal-based values are grouped by quintiles. As expected, the results concentrated on what the algorithm perceives as 'extreme' events, disregarding cases which are not 'significant' enough.

It should be noted that we have used the Rio birth patterns merely as an example to assess the behaviour of the GAM technique. We hope to motivate health researchers to apply the GAM technique to problems closer to its original intended use, such as sets of epidemiological occurrences.

Figure 2 – Location of Rio de Janeiro city neighbourhoods (source: D'Orsi and Carvalho, 1997).
Figure 3 – Clusters of high values of the APGAR index found by GAM.
Figure 4 – Grouping of values of the APGAR index by quintiles (44.1-63.4, 66.4-69.5, 69.5-74.4, 74.4-77.4, 77.4-83.3, excluded) (source: D'Orsi and Carvalho, 1997).

4. Focus 2 – Exploratory Spatial Data Analysis

4.1 Local Spatial Statistics

Statistical data analysis is currently the most consistent and established set of tools for analysing spatial data sets. Nevertheless, the application of statistical techniques to spatial data faces an important challenge, as expressed in Tobler's (1979) First Law of Geography: "everything is related to everything else, but near things are more related than distant things". The quantitative expression of this principle is the effect of spatial dependence: the observed values will be spatially clustered, and the samples will not be independent. This phenomenon, also termed spatial autocorrelation, has long been recognised as an intrinsic feature of spatial data, and measures such as the Moran coefficient and the semi-variogram plot have been used to assess the global association of the data set (Bailey and Gattrel, 1995).

The Moran scatterplot map is a tool for visualising the relationship between the observed values Z and the local mean values WZ, where Z is the vector of attribute values (expressed as deviations from the mean) and WZ is the vector of local mean values (the spatial lag), computed using the spatial weights matrix W.
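To make the notation concrete, below is a minimal sketch of how the spatial lag WZ might be computed from a row-standardised weights matrix W; the neighbourhood list and attribute values are hypothetical and are not taken from the Rio or São Paulo data sets.

```python
# Minimal sketch: spatial lag WZ from a row-standardised weights matrix W.
# The neighbourhood list and attribute values below are hypothetical.
import numpy as np

values = np.array([63.0, 70.5, 74.0, 68.2, 80.1])        # one attribute per areal unit
neighbours = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}

n = len(values)
W = np.zeros((n, n))
for i, js in neighbours.items():
    W[i, js] = 1.0 / len(js)                             # row standardisation

Z = values - values.mean()                               # deviations from the mean
WZ = W @ Z                                               # local means (spatial lag)
print(np.column_stack([Z, WZ]))                          # points of the Moran scatterplot
```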
The association between Z and WZ can be explored to indicate the different spatial regimes associated with the data and displayed in graphical form, as indicated by Figure 6 (left side). The Moran scatterplot map divides spatial variability into four quadrants:

• Q1 (positive values, positive local means) and Q2 (negative values, negative local means): indicate areas of positive spatial association.
• Q3 (positive values, negative local means) and Q4 (negative values, positive local means): indicate areas of negative spatial association.

Since the social exclusion index (Iex) variable exhibits global positive spatial autocorrelation (Moran I = 0.65, significance = 99%), areas in quadrants Q3 and Q4 are interpreted as regions that do not follow the same global process of spatial dependence; these points indicate transitional regions between two different spatial regimes.

The local Moran index Ii is computed by multiplying the local normalised value zi by the local mean (Anselin, 1995):

I_i = \frac{z_i \sum_{j} w_{ij} z_j}{\sum_{i=1}^{n} z_i^2}

In order to establish a significance test for the local Moran index, Anselin (1995) proposes a pseudo-distribution obtained by simulation, permuting the attribute values among the areas. Based on this pseudo-distribution, traditional statistical tests are used to indicate local index values with significance of 95% (1.96σ), 99% (2.54σ) and 99.9% (3.20σ). The 'significant' indexes are then mapped and posited as 'hot spots' of local non-stationarity.

The local Moran index significance map indicated three 'hot spots', two of them related to low values of inclusion (located to the south and east of the city) and one related to high values of inclusion (located in the centre of the city). These patterns correspond to the extreme regions of poverty and wealth in the city, and they were chosen as "seeds" in the zoning procedure. The remaining regions were defined interactively, taking into account the Moran scatterplot map, which clearly indicates a number of transition regions between the Q1 and Q2 locations (the so-called "high-high" and "low-low" areas), some of which are indicated by the green ellipses. These regions were grouped into separate zones. The work proceeded interactively until a final zoning proposal was produced, which can be compared with the current administrative regions (Figure 7).

In order to assess the resulting map, a regression analysis was performed. This regression analyses the correlation between the percentage of houses with proper sewage facilities (as independent variable) and the percentage of people over 70 years of age (as dependent variable). The rationale behind this choice was that social deprivation is a serious impediment to healthy living, as measured by the percentage of old-aged population. Three OLS (ordinary least squares) regression analyses were performed: the first taking all districts of the city together; the second using the current administrative division as separate spatial regimes; and the third using the proposed new zoning as spatial regimes. The results are summarised in Table 1.

TABLE 1 – CORRELATION COEFFICIENTS FOR (OLD AGE, SEWAGE) REGRESSION IN SÃO PAULO

Situation             Number of spatial regimes   R² (correlation coefficient)
All city districts    1                           0.35
Current zoning        11                          0.72
Proposed zoning       13                          0.83

These results are a positive indication of the possible use of local spatial statistics as a basis for zoning procedures and show how indicators such as the social exclusion index of Sposati (1996) can be used to support urban planning.
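The local Moran index and its permutation-based pseudo-significance test described above can be sketched as follows. This is a simplified illustration with synthetic data and a randomly generated weights matrix, not Anselin's reference implementation or the software used by the authors.

```python
# Minimal sketch: local Moran I_i with a permutation-based pseudo-significance
# test, following the formula above. W and z are synthetic/hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 50

# hypothetical row-standardised weights from a random symmetric adjacency
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
W = A / np.maximum(A.sum(axis=1, keepdims=True), 1)

x = rng.normal(size=n)
z = x - x.mean()                              # deviations from the mean
local_I = z * (W @ z) / np.sum(z**2)          # I_i = z_i * sum_j w_ij z_j / sum_i z_i^2

# Conditional permutation: shuffle the other values among the remaining areas
# and build a pseudo-distribution of I_i for each area.
n_perm = 999
p_sim = np.empty(n)
for i in range(n):
    others = np.delete(z, i)
    perm_I = np.empty(n_perm)
    for k in range(n_perm):
        z_perm = z.copy()
        z_perm[np.arange(n) != i] = rng.permutation(others)
        perm_I[k] = z[i] * (W[i] @ z_perm) / np.sum(z**2)
    # one-sided pseudo p-value: share of permuted indexes at least as large
    p_sim[i] = (np.sum(perm_I >= local_I[i]) + 1) / (n_perm + 1)

hot_spots = np.where(p_sim < 0.05)[0]         # candidate 'hot spots' at the 95% level
print(hot_spots)
```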
5. Focus 3 – Neural Networks and Geographical Analysis

5.1 Introduction

An Artificial Neural Network (ANN) is a computing paradigm inspired by the way the brain processes information. The key element of this paradigm is a processing system composed of a large number of highly interconnected elements (neurons) working in unison to solve specific problems. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process (Gopal, 1998).

In health analysis, a researcher may be interested in assessing the risks associated with a disease (such as malaria) based on a combination of different conditions (land use and land cover, climatology, hydrological information, and distance to main roads and cities). These conditions can be expressed as maps, which are integrated into a common geographical database by means of GIS technology. Once the data have been organised in a common geographical reference, the researcher needs to determine a procedure to combine these data sets. For example, a health researcher may posit the following inference: "calculate a malaria risk map based on disease incidence, climate, distance to cities and land cover, where the conditions are such that a region is deemed 'high risk for malaria' if it rains more than 1000 mm/year, the land cover is 'inundated forest' and the region is located less than 50 km from a city".

The main problem with such map inference procedures is their ad hoc, arbitrary nature: the researcher formulates hypotheses from previous knowledge and applies them to the data set. The process relies on inductive knowledge of reality. Additionally, when the input maps have many different conditions, the definition of combination rules for deriving the output may turn out to be difficult. For example, if each input map has 8 different conditions (e.g., land cover classes) and five maps are to be combined, then 8⁵ (= 32,768) different situations have to be taken into account.

There are two main alternative approaches to this problem. One is to use fuzzy logic to combine the maps (Câmara et al., 2000). In this case, all input data are transformed into fuzzy sets (on a [0,1] scale) and a fuzzy inference procedure may be used. Alternatively, the use of neural network techniques aims at capturing the researcher's experience without the need for an explicit definition of the inference procedure. The application of neural networks to map integration can be done using the following steps (see the code sketch below):

1. Create a georeferenced database with the input (conditional) maps.
2. Select well-known regions as training areas and, for these areas, indicate the desired output response (such as health risk).
3. Use these training areas as inputs to a neural-network learning procedure.
4. Using the trained network, apply the inference procedure to the entire study region.
5. Evaluate the result and redo the training procedure, if necessary.

This idea was applied by Medeiros (1999)¹ in his study of the integration of natural resources data as a basis for economic-ecological zoning in the Amazon region. Medeiros used five data sets as input: vegetation, geology, geomorphology, soils and remote sensing images. The intended output was a map of environmental vulnerability. The learning procedure is illustrated in Figure 8, where the diagram shows the five inputs and the output.
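As a concrete illustration of the five-step map-integration procedure listed above, the sketch below trains a small multi-layer perceptron on stacked raster layers using scikit-learn. The layer names, the toy labelling rule for the training cells and the two risk classes are hypothetical; this is not Medeiros's original data or network design.

```python
# Minimal sketch of neural-network map integration (steps 1-5 above).
# The raster layers, training mask and class labels are synthetic/hypothetical.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
rows, cols = 100, 100

# Step 1: georeferenced input layers stacked into one feature vector per cell.
layers = {
    "rainfall":     rng.uniform(500, 3000, (rows, cols)),
    "land_cover":   rng.integers(0, 8, (rows, cols)).astype(float),
    "dist_to_city": rng.uniform(0, 200, (rows, cols)),
}
X = np.stack([layer.ravel() for layer in layers.values()], axis=1)

# Step 2: training areas with a known output response (0 = low, 1 = high risk).
train_idx = rng.choice(rows * cols, size=500, replace=False)
y_train = (X[train_idx, 0] > 1000) & (X[train_idx, 2] < 50)   # toy labelling rule

# Step 3: train the network on the labelled cells.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X[train_idx], y_train.astype(int))

# Step 4: apply the trained network to every cell of the study region.
risk_map = net.predict(X).reshape(rows, cols)

# Step 5: evaluate (here, simply inspect the proportion classified as high risk).
print("high-risk fraction:", risk_map.mean())
```

The point of the exercise is that the combination rules are learned from the labelled training cells rather than being written out explicitly by the researcher.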
Medeiros (1999) compared the result obtained by the neural network with a subjective operator interpretation and found a very strong spatial coherence between the two maps, the neural-network-produced map being more restrictive in its results than the subjective one. He concluded that it is feasible to apply neural networks as inference machines for the integration of geographical data.

¹ This work is part of a doctoral thesis, co-advised by one of the authors (Antônio Miguel Monteiro).

Figure 8 – Neural Network Training Procedure.

6. Focus 4 – Dynamical Modelling

The computer representation of geographical space in current GIS technology is essentially static. Therefore, one important research focus in geocomputation is aimed at producing models that combine the structural elements of space (geographical objects) with the processes that modify such space (human actions as they operate in time). Such models would liberate us from static visions of space (as centuries of map-making have conditioned us) and emphasise the dynamic components as an essential part of geographical space. This motivation has led to the use of cellular automata as a technique for the simulation of urban and regional growth. Cellular automata (CA) are very simple dynamic spatial systems in which the state of each cell in an array depends on the previous state of the cells in its neighbourhood, according to a set of transition rules.

Therefore, the authors propose a tentative definition: "Geocomputation is the use of a set of effective computing procedures to solve geographical problems, whose results are dependent on the basic assumptions of each technique and therefore are not strictly comparable." In this view, geocomputation emphasises the fact that the structure and data-dependency inherent in spatial data can be used as part of knowledge-discovery approaches, and that the choices involved concern theory as well as data. This view does not neglect the importance of model-based approaches, such as the Bayesian techniques based on Monte Carlo simulation for the derivation of distribution parameters of spatial data. In fact, in this broader perspective, the use of Bayesian techniques that rely on computationally intensive simulations can be considered a legitimate part of the geocomputational field of research.

In conclusion, what can public health researchers expect from geocomputation? When used with discretion, and always bearing in mind the conceptual basis of each approach, techniques such as GAM, local spatial statistics, neural networks and cellular automata can be powerful aids to the spatial data analyst trying to discover patterns in space and relations between its components. We hope this article serves as inspiration to health researchers, and we hope to have extended their notions about what is possible in spatial data analysis.

Further Reading

For readers interested in more information on geocomputation, we provide a set of references organised by topic. We suggest prospective readers start with Longley (1998) and then proceed to their specific area of interest.

Reference Works on Geocomputation

ABRAHART, B. et al., 2000. GeoComputation Conference Series Home Page. <http://www.ashville.demon.co.uk/geocomp/index.htm>

LONGLEY, P. (ed.), 1998. Geocomputation: A Primer. New York, John Wiley and Sons.

OPENSHAW, S. and ABRAHART, R. J., 1996. Geocomputation. Proc. 1st International Conference on GeoComputation (ed. Abrahart, R. J.), University of Leeds, UK, pp. 665-666.

OPENSHAW, S. and ABRAHART, R. J., 2000. Geocomputation. London, Taylor and Francis.

OPENSHAW, S. and OPENSHAW, C., 1997. Artificial Intelligence in Geography. New York, John Wiley.
Reference Works on Spatial Data Analysis

BAILEY, T. and GATTRELL, A., 1995. Spatial Data Analysis by Example. London, Longman.

CARVALHO, M. S., 1997. Aplicação de Métodos de Análise Espacial na Caracterização de Áreas de Risco à Saúde [Application of Spatial Analysis Methods to the Characterisation of Health Risk Areas]. PhD Thesis in Biomedical Engineering, COPPE/UFRJ. <www.procc.fiocruz.br/~marilia>

Papers on GAM

OPENSHAW, S., 1998. Building automated Geographical Analysis and Exploration Machines. In: Geocomputation: A Primer (Longley, P. A., Brooks, S. M., McDonnell, R. and Macmillan, B., eds), pp. 95-115. Chichester, John Wiley.

TURTON, I., 1998. The Geographical Analysis Machine. Univ. of Leeds, Centre for Computational Geography. <www.ccg.leeds.ac.uk/smart/gam/gam.html>

Papers on Exploratory Spatial Data Analysis

ANSELIN, L., 1995. Local indicators of spatial association - LISA. Geographical Analysis, 27:91-115.

ANSELIN, L., 1996. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Spatial Analytical Perspectives on GIS (Fisher, M., Scholten, H. J. and Unwin, D., eds), pp. 111-126. London, Taylor & Francis.

GETIS, A. and ORD, J. K., 1996. Local spatial statistics: an overview. In: Spatial Analysis: Modelling in a GIS Environment (Longley, P. and Batty, M., eds), pp. 261-277. New York, John Wiley.

OPENSHAW, S., 1997. Developing GIS-relevant zone-based spatial analysis methods. In: Spatial Analysis: Modelling in a GIS Environment (Longley, P. and Batty, M., eds), pp. 55-73. New York, John Wiley.

ORD, J. K. and GETIS, A., 1995. Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27:286-306.