Spatial Data Analysis

While classical data analysis assumes that the samples in the data are independent and identically distributed (i.i.d.), spatial data analysis takes into account the spatial autocorrelation present in the data: in most cases, nearby points are more similar than distant points.
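One common way to quantify spatial autocorrelation is Moran's I, which is close to +1 when neighbours have similar values and close to -1 when neighbours are dissimilar. The sketch below is a minimal illustration, assuming a regular grid and rook (4-neighbour) adjacency with equal weights; the function name `morans_i` and both example grids are made up for this example.

```python
def morans_i(grid):
    """Moran's I for a 2-D grid, using rook (4-neighbour) adjacency weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    denom = sum((v - mean) ** 2 for v in values)

    num = 0.0
    weight_sum = 0  # total number of (directed) neighbour links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (len(values) / weight_sum) * (num / denom)


# Hypothetical data: a smooth gradient (neighbours are alike) ...
smooth = [[r + c for c in range(4)] for r in range(4)]
# ... and a checkerboard (every neighbour is different).
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print("gradient:    ", morans_i(smooth))   # positive autocorrelation
print("checkerboard:", morans_i(checker))  # negative autocorrelation
```

A value near zero would indicate no spatial structure, which is the situation the i.i.d. assumption implicitly describes.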

Because of spatial correlation, nonspatial sampling results in over-optimistic predictive models: they can predict the input training data accurately but have marginal performance in terms of extrapolation, i.e., predicting patterns that have not been seen during the training process (A Truly Spatial Random Forests Algorithm for Geoscience Data Analysis and Modelling).
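A frequently used remedy is to split the data by location rather than at random, so that test samples are not immediate neighbours of training samples. The sketch below shows one simple variant, spatial block cross-validation; it is not the method of the paper cited above, and the function name `spatial_block_folds`, the block size, and the random coordinates are all assumptions made for illustration.

```python
import random


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign each (x, y) sample to a fold so that all samples falling in
    the same square block always share a fold."""
    block_of = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    blocks = sorted(set(block_of))
    random.Random(seed).shuffle(blocks)
    # Deal the shuffled blocks out to folds in round-robin order.
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]


# Hypothetical sample locations scattered over a 100 x 100 area.
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
folds = spatial_block_folds(coords, block_size=25, n_folds=4)

for k in range(4):
    test_idx = [i for i, f in enumerate(folds) if f == k]
    train_idx = [i for i, f in enumerate(folds) if f != k]
    print(f"fold {k}: {len(train_idx)} train / {len(test_idx)} test samples")
```

Whole blocks are held out together, so the test score reflects how well a model extrapolates to unseen areas rather than how well it interpolates between neighbouring training points.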

Regular machine learning algorithms treat each location as an isolated point and use only that point's own features to make predictions. This is like looking at individual pixels in an image without considering the bigger picture.

Spectral pixel information

In spatial data analysis, particularly when dealing with images or maps, we often use terms related to both the location and the spectral characteristics of the data points. Here is a breakdown:

  • Pixel-wise Spectral Information:
    • Pixel-wise: This refers to analyzing each individual pixel (the smallest unit) in the data. Imagine a map as an image; each tiny square on the map is a pixel.
    • Spectral Information: This refers to the specific characteristics of the data measured at each pixel. These characteristics can be related to wavelengths of light, chemical composition, or other properties that can be captured by sensors.

For example, in a satellite image, pixel-wise spectral information gives the value recorded in each band (red, green, blue, etc.) at each pixel.

  • Local Spatial-Spectral Information:
    • Local: This refers to analyzing a small neighborhood around a specific pixel, not just the pixel itself. Imagine looking at a few pixels surrounding the one you're interested in, like a small box on the map.
    • Spatial-Spectral: This combines both spatial information (location) and spectral information (characteristics). It considers how the spectral properties of a pixel are related to the spectral properties of its neighbors.
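The difference between the two kinds of information can be sketched on a toy image. The example below is only an illustration: the 3x3 two-band image is made up, the neighbourhood is assumed to be a square window clipped at the image border, and the helper names (`pixel_spectrum`, `local_mean_spectrum`, `spatial_spectral_feature`) are hypothetical.

```python
# A made-up 3x3 image with two spectral bands per pixel.
image = [
    [(1, 10), (2, 20), (3, 30)],
    [(4, 40), (5, 50), (6, 60)],
    [(7, 70), (8, 80), (9, 90)],
]


def pixel_spectrum(image, row, col):
    """Pixel-wise spectral information: the band values of one pixel."""
    return tuple(image[row][col])


def local_mean_spectrum(image, row, col, radius=1):
    """Average each band over the (2*radius+1)^2 window centred on a pixel,
    clipping the window at the image border."""
    rows, cols = len(image), len(image[0])
    n_bands = len(image[0][0])
    totals = [0.0] * n_bands
    count = 0
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            for b in range(n_bands):
                totals[b] += image[r][c][b]
            count += 1
    return tuple(t / count for t in totals)


def spatial_spectral_feature(image, row, col, radius=1):
    """Local spatial-spectral information: the pixel's own spectrum
    followed by the mean spectrum of its neighbourhood."""
    return pixel_spectrum(image, row, col) + local_mean_spectrum(image, row, col, radius)


print(pixel_spectrum(image, 0, 0))            # the corner pixel on its own
print(local_mean_spectrum(image, 0, 0))       # the corner pixel's neighbourhood
print(spatial_spectral_feature(image, 1, 1))  # combined feature for the centre
```

The neighbourhood mean is just one possible summary; any statistic computed over the window (variance, texture measures, and so on) could be appended in the same way.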

Spatial statistics

This field deals with analyzing data that has a spatial component, meaning it's associated with locations. Spatial statistics look for patterns, trends, and relationships between data points based on their location.

Statistical methods fall into two broad kinds, parametric and nonparametric:

  • Parametric methods assume the data follows a specific probability distribution (like a normal distribution). They rely on estimating the parameters of that distribution to understand the data.
  • Nonparametric methods make fewer assumptions about the underlying distribution of the data. They focus on directly analyzing the patterns in the data itself, without needing to fit a specific model.
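The contrast can be illustrated with a small made-up data set that clearly does not follow a normal distribution. In the sketch below, the parametric estimate assumes normality and fits a mean and standard deviation, while the nonparametric estimate simply counts observations; the data values and the interval [3, 7] are arbitrary choices for the example.

```python
from statistics import NormalDist

# Hypothetical measurements forming two separate clusters.
data = [1.0, 1.2, 0.9, 1.1, 0.8, 9.0, 9.2, 8.9, 9.1, 8.8]

# Parametric: assume a normal distribution and estimate its two parameters.
fitted = NormalDist.from_samples(data)
parametric_p = fitted.cdf(7.0) - fitted.cdf(3.0)

# Nonparametric: use the empirical fraction of observations in the interval.
empirical_p = sum(3.0 <= x <= 7.0 for x in data) / len(data)

print(f"fitted normal: mean={fitted.mean:.2f}, stdev={fitted.stdev:.2f}")
print(f"P(3 <= x <= 7), parametric:    {parametric_p:.3f}")
print(f"P(3 <= x <= 7), nonparametric: {empirical_p:.3f}")
```

The fitted normal places substantial probability between the two clusters, where no data were observed, whereas the empirical estimate follows the data directly. The trade-off is that nonparametric estimates generally need more data to be stable.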

The order of a statistic here means the number of points considered together in the analysis:

  • First-order: 1 point, i.e., pixel-wise
  • Second-order: 2 points, i.e., pairs of data points
  • Higher-order: 3 or more points
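The three orders can be sketched on a one-dimensional transect. The example below is an illustration under simple assumptions: the binary values are made up, the second-order statistic shown is the empirical semivariogram (one common choice among several), and the higher-order statistic is a plain frequency count of three-point patterns.

```python
from collections import Counter

# Made-up binary values sampled at equal spacing along a line.
transect = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]

# First order: one point at a time (here, the proportion of 1s).
proportion = sum(transect) / len(transect)


# Second order: pairs of points separated by a given lag.
def semivariogram(values, lag):
    """Half the mean squared difference between points `lag` steps apart."""
    pairs = list(zip(values, values[lag:]))
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


# Higher order: three consecutive points considered jointly.
triplets = Counter(zip(transect, transect[1:], transect[2:]))

print("proportion of 1s:", proportion)
for lag in (1, 2, 3):
    print(f"semivariogram at lag {lag}:", round(semivariogram(transect, lag), 3))
for pattern, count in sorted(triplets.items()):
    print("pattern", pattern, "occurs", count, "times")
```

Two transects can share the same proportion and the same semivariogram yet differ in their three-point pattern counts, which is the kind of structure higher-order statistics are meant to capture.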

By using nonparametric higher-order statistics, researchers can capture more complex spatial patterns that might not be evident with simpler methods. This can be particularly useful in fields like ecology, geology, and urban planning where understanding the spatial relationships between features is crucial.