Spatial modelling

#spatial #models

Spatially autocorrelated training and validation samples inflate performance assessment of convolutional neural networks. Summary:

Convolutional neural networks (CNNs) are powerful tools for remote sensing applications, but the assessment of their performance can be strongly biased by spatial autocorrelation, the phenomenon that nearby observations are more similar than distant ones. When spatial autocorrelation is not accounted for in cross-validation, the evaluation becomes over-optimistic and gives a false impression of the model's generalization ability. Spatial cross-validation addresses this by creating independent training and validation sets through spatial blocking or buffering of observations. The authors demonstrate this in a case study on tree species segmentation, highlighting its importance for accurate assessment of CNN models in remote sensing applications.
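The blocking idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `spatial_block_folds`, the square grid of blocks, and the round-robin fold assignment are all assumptions made for the example. The point it shows is that every observation in a block lands in the same fold, so no block is split between training and validation.

```python
import numpy as np

def spatial_block_folds(xy, block_size, n_folds, seed=0):
    """Assign each observation to a CV fold by spatial block (hypothetical helper).

    xy: (n, 2) array of coordinates; block_size: block edge length in the
    same units. All observations in one block share a fold.
    """
    xy = np.asarray(xy, dtype=float)
    # Integer grid-cell index of each observation.
    cells = np.floor((xy - xy.min(axis=0)) / block_size).astype(int)
    # One integer id per occupied block.
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()
    n_blocks = block_id.max() + 1
    # Shuffle the blocks, then deal them round-robin into folds.
    rng = np.random.default_rng(seed)
    fold_of_block = np.empty(n_blocks, dtype=int)
    fold_of_block[rng.permutation(n_blocks)] = np.arange(n_blocks) % n_folds
    return fold_of_block[block_id], block_id

# Made-up example data: 200 random points in a 100 x 100 area.
rng = np.random.default_rng(42)
xy = rng.uniform(0, 100, size=(200, 2))
folds, blocks = spatial_block_folds(xy, block_size=25, n_folds=4)

for k in range(4):
    val = folds == k
    # No block contributes to both training and validation.
    assert not set(blocks[val]) & set(blocks[~val])
```

Blocking alone still leaves neighbours on either side of a block boundary; buffering, the other option mentioned above, would additionally drop training observations within some distance of the validation set.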

(gpt4): Spatial autocorrelation can also bias the performance assessment of decision tree algorithms in remote sensing applications. Decision trees are tree-like structures that recursively partition the feature space based on decision rules. When spatial autocorrelation is present, nearby observations tend to fall into the same partitions, so a tree can fit location-specific patterns and then perform poorly on unseen, spatially distant data. Spatial cross-validation can be applied to decision trees in the same way as for CNNs. It keeps training and validation sets spatially independent, which exposes this kind of overfitting and gives a more accurate assessment of the model's generalization ability.

The study used RGB orthoimages of 47 forest sites acquired with a multicopter. The orthoimages were created between 2017 and 2019 and cover a variety of conditions, including different illumination, vegetation status, forest structure, and site characteristics. The orthoimages were cropped into non-overlapping tiles and used to train a CNN-based segmentation model that classifies each pixel in a tile into one of the target tree species. The training masks were derived from polygons available for all target species, which were delineated by visual interpretation of the imagery, aided by ground observations. The entire dataset, including orthoimagery, tree-species delineations, and metadata, is openly accessible.
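Cropping into non-overlapping tiles can be sketched as follows. The function name `crop_to_tiles`, the tile size, and the choice to discard incomplete edge tiles are assumptions for illustration; the notes do not state how the study handled image borders. Keeping each tile's pixel offset is useful because later spatial blocking needs to know where each tile came from.

```python
import numpy as np

def crop_to_tiles(image, tile_size):
    """Split an (H, W, C) image into non-overlapping square tiles.

    Edge pixels that do not fill a complete tile are discarded (an assumption).
    Returns the tiles, shape (n_tiles, tile_size, tile_size, C), and the
    (row, col) pixel offset of each tile's top-left corner.
    """
    h, w, c = image.shape
    n_rows, n_cols = h // tile_size, w // tile_size
    trimmed = image[: n_rows * tile_size, : n_cols * tile_size]
    # Split both spatial axes into (tile index, within-tile index).
    tiles = trimmed.reshape(n_rows, tile_size, n_cols, tile_size, c)
    # Bring the two tile indices together, then flatten them row-major.
    tiles = tiles.swapaxes(1, 2).reshape(-1, tile_size, tile_size, c)
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    offsets = np.stack([rows.ravel(), cols.ravel()], axis=1) * tile_size
    return tiles, offsets

# Stand-in for an orthoimage: a small 10 x 7 array with 3 channels.
ortho = np.arange(10 * 7 * 3).reshape(10, 7, 3)
tiles, offsets = crop_to_tiles(ortho, tile_size=3)

# Every tile matches the corresponding window of the source image.
for tile, (r, c) in zip(tiles, offsets):
    assert np.array_equal(tile, ortho[r : r + 3, c : c + 3])
```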

The authors investigated the degree of optimism in tree species classification models trained on spatially autocorrelated training data. They found that optimism occurs across small and large sample sizes and that model regularization via data augmentation can help to reduce it. They evaluated different model setups with random and block cross-validation and found that block cross-validation gives less optimistic performance estimates than random cross-validation.

The authors used a variational autoencoder (VAE) to quantify the spatial autocorrelation between image tiles. The VAE was trained on a dataset of image tiles from the 47 forest sites. The latent representation of each image tile was then used to calculate the correlation between the tile and its neighbors. The authors found that the spatial autocorrelation between image tiles was strong, especially at short distances. They also found that the spatial autocorrelation of image tiles was similar to the spatial autocorrelation of tree species cover.
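Once every tile has a latent vector, its similarity to other tiles can be summarized as a function of distance. The sketch below is one possible way to do that and is not taken from the paper: it assumes Pearson correlation between latent vectors as the similarity measure, averages it within distance bins, and replaces the VAE with simulated latents that are spatially smooth by construction. The name `latent_correlogram` and all parameter values are hypothetical.

```python
import numpy as np

def latent_correlogram(latents, xy, bin_edges):
    """Mean pairwise correlation of latent vectors per distance bin.

    latents: (n, d) latent vectors, one per tile; xy: (n, 2) tile centres;
    bin_edges: increasing distance bin edges. Empty bins yield NaN.
    """
    latents = np.asarray(latents, dtype=float)
    xy = np.asarray(xy, dtype=float)
    # Pearson correlation between every pair of latent vectors (rows).
    corr = np.corrcoef(latents)
    # Euclidean distance between every pair of tile centres.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Keep each unordered pair once and drop self-pairs.
    iu = np.triu_indices(len(xy), k=1)
    corr, dist = corr[iu], dist[iu]
    which = np.digitize(dist, bin_edges) - 1
    means = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            means[b] = corr[in_bin].mean()
    return means

# Simulated stand-in for VAE latents: tiles on a 12 x 12 grid whose
# 32-dimensional latents are white noise smoothed with a Gaussian kernel.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(12), np.arange(12), indexing="ij")
xy = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
kernel = np.exp(-d**2 / (2 * 2.0**2))
latents = kernel @ rng.normal(size=(len(xy), 32))

means = latent_correlogram(latents, xy, bin_edges=[0, 2, 4, 8, 16])
print(means)  # correlation should be highest in the shortest-distance bin
```

With real data, `latents` would come from the trained VAE encoder, and the distance at which the curve flattens would suggest a sensible block or buffer size for spatial cross-validation.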

Here are some of the key points from the text:

Variational autoencoders can be used to quantify the spatial autocorrelation between high-dimensional image-type observations.
The spatial autocorrelation between image tiles is strong, especially at short distances.
The spatial autocorrelation of image tiles is similar to the spatial autocorrelation of tree species cover.