Spatial correlation of errors

Spatial correlation of errors#

Digital elevation models have errors that are spatially correlated due to instrument or processing effects. Here, we rely on a non-stationary spatial statistics framework to estimate and model spatial correlations in elevation error. We use a sum of variogram forms to model this correlation, with stable terrain as an error proxy for moving terrain.

References: Rolstad et al. (2009), Hugonnet et al. (2022).

import geoutils as gu

import xdem

We load a difference of DEMs at Longyearbyen, already coregistered using Nuth and Kääb (2011) as shown in the Nuth and Kääb coregistration example. We also load the glacier outlines here corresponding to moving terrain.

dh = xdem.DEM(xdem.examples.get_path("longyearbyen_ddem"))
glacier_outlines = gu.Vector(xdem.examples.get_path("longyearbyen_glacier_outlines"))

Then, we run the pipeline for inference of elevation heteroscedasticity from stable terrain (Note: we pass a random_state argument to ensure a fixed, reproducible random subsampling in this example). We ask for a fit with a Gaussian model for short range (as it is passed first), and Spherical for long range (as it is passed second):

(
    df_empirical_variogram,
    df_model_params,
    spatial_corr_function,
) = xdem.spatialstats.infer_spatial_correlation_from_stable(
    dvalues=dh, list_models=["Gaussian", "Spherical"], unstable_mask=glacier_outlines, random_state=42
)
/home/docs/checkouts/readthedocs.org/user_builds/xdem/conda/stable/lib/python3.12/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis, out=out, keepdims=keepdims)

The first output corresponds to the dataframe of the empirical variogram, by default estimated using Dowd’s estimator and a circular sampling scheme in SciKit-GStat (following Fig. S13 of Hugonnet et al. (2022)). The lags columns is the upper bound of spatial lag bins (lower bound of first bin being 0), the exp column is the “experimental” variance value of the variogram in that bin, the count the number of pairwise samples, and err_exp the 1-sigma error of the “experimental” variance, if more than one variogram is estimated with the n_variograms parameter.

df_empirical_variogram
exp lags count err_exp
0 1.606022 28.284271 19 NaN
1 4.790244 40.000000 27 NaN
2 7.876436 56.568542 31 NaN
3 3.797114 80.000000 71 NaN
4 5.967543 113.137085 179 NaN
5 5.773913 160.000000 351 NaN
6 5.820014 226.274170 649 NaN
7 5.482665 320.000000 1302 NaN
8 5.874098 452.548340 2668 NaN
9 5.709442 640.000000 5188 NaN
10 5.966839 905.096680 9371 NaN
11 5.778219 1280.000000 16471 NaN
12 6.008722 1810.193360 25214 NaN
13 6.256631 2560.000000 29210 NaN
14 5.988002 3620.386720 27604 NaN
15 6.317496 5120.000000 26869 NaN
16 6.417479 7240.773439 26274 NaN
17 6.547194 10240.000000 26193 NaN
18 6.552516 14481.546879 27512 NaN
19 6.580882 20480.000000 32153 NaN
20 8.371663 28963.093757 23942 NaN


The second output is the dataframe of optimized model parameters (range, sill, and possibly smoothness) for a sum of gaussian and spherical models:

df_model_params
model range psill
0 gaussian 66.369687 5.755225
0 spherical 28963.093757 1.733584


The third output is the spatial correlation function with spatial lags, derived from the variogram:

for spatial_lag in [0, 100, 1000, 10000, 30000]:
    print(
        "Errors are correlated at {:.1f}% for a {:,.0f} m spatial lag".format(
            spatial_corr_function(spatial_lag) * 100, spatial_lag
        )
    )
Errors are correlated at 100.0% for a 0 m spatial lag
Errors are correlated at 23.0% for a 100 m spatial lag
Errors are correlated at 22.0% for a 1,000 m spatial lag
Errors are correlated at 11.6% for a 10,000 m spatial lag
Errors are correlated at 0.0% for a 30,000 m spatial lag

We can plot the empirical variogram and its model on a non-linear X-axis to identify the multi-scale correlations.

xdem.spatialstats.plot_variogram(
    df=df_empirical_variogram,
    list_fit_fun=[xdem.spatialstats.get_variogram_model_func(df_model_params)],
    xlabel="Spatial lag (m)",
    ylabel="Variance of\nelevation differences (m)",
    xscale_range_split=[100, 1000],
)
plot infer spatial correlation

This pipeline will not always work optimally with default parameters: variogram sampling is more robust with a lot of samples but takes long computing times, and the fitting might require multiple tries for forms and possibly bounds and first guesses to help the least-squares optimization. To learn how to tune more parameters and use the subfunctions, see the gallery example: Estimation and modelling of spatial variograms!

Total running time of the script: (0 minutes 9.216 seconds)

Gallery generated by Sphinx-Gallery