Spatial correlation of errors

Spatial correlation of errors#

Digital elevation models have errors that are spatially correlated due to instrument or processing effects. Here, we rely on a non-stationary spatial statistics framework to estimate and model spatial correlations in elevation error. We use a sum of variogram forms to model this correlation, with stable terrain as an error proxy for moving terrain.

Reference: Hugonnet et al. (2022), Figure 5 and Equations 13–16.

import geoutils as gu

import xdem

We load a difference of DEMs at Longyearbyen, already coregistered using Nuth and Kääb (2011) as shown in the Nuth and Kääb coregistration example. We also load the glacier outlines here corresponding to moving terrain.

dh = xdem.DEM(xdem.examples.get_path("longyearbyen_ddem"))
glacier_outlines = gu.Vector(xdem.examples.get_path("longyearbyen_glacier_outlines"))

Then, we run the pipeline for inference of elevation heteroscedasticity from stable terrain (Note: we pass a random_state argument to ensure a fixed, reproducible random subsampling in this example). We ask for a fit with a Gaussian model for short range (as it is passed first), and Spherical for long range (as it is passed second):

(
    df_empirical_variogram,
    df_model_params,
    spatial_corr_function,
) = xdem.spatialstats.infer_spatial_correlation_from_stable(
    dvalues=dh, list_models=["Gaussian", "Spherical"], unstable_mask=glacier_outlines, random_state=42
)
/home/docs/checkouts/readthedocs.org/user_builds/xdem/conda/stable/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis, out=out, keepdims=keepdims)

The first output corresponds to the dataframe of the empirical variogram, by default estimated using Dowd’s estimator and the circular sampling scheme of skgstat.RasterEquidistantMetricSpace (Fig. S13 of Hugonnet et al. (2022)). The lags columns is the upper bound of spatial lag bins (lower bound of first bin being 0), the exp column is the “experimental” variance value of the variogram in that bin, the count the number of pairwise samples, and err_exp the 1-sigma error of the “experimental” variance, if more than one variogram is estimated with the n_variograms parameter.

df_empirical_variogram
exp lags count err_exp
0 1.596629 28.284271 19 NaN
1 4.663377 40.000000 27 NaN
2 7.706409 56.568542 31 NaN
3 3.766628 80.000000 71 NaN
4 5.858630 113.137085 179 NaN
5 5.872112 160.000000 351 NaN
6 5.913194 226.274170 649 NaN
7 5.479238 320.000000 1302 NaN
8 5.951615 452.548340 2668 NaN
9 5.680506 640.000000 5188 NaN
10 5.953464 905.096680 9371 NaN
11 5.758270 1280.000000 16471 NaN
12 6.002661 1810.193360 25214 NaN
13 6.251790 2560.000000 29210 NaN
14 5.977697 3620.386720 27604 NaN
15 6.309217 5120.000000 26869 NaN
16 6.404235 7240.773439 26274 NaN
17 6.531230 10240.000000 26193 NaN
18 6.510347 14481.546879 27512 NaN
19 6.597471 20480.000000 32153 NaN
20 8.337621 28963.093757 23942 NaN


The second output is the dataframe of optimized model parameters (range, sill, and possibly smoothness) for a sum of gaussian and spherical models:

df_model_params
model range psill
0 gaussian 67.406940 5.747333
0 spherical 28963.093757 1.721797


The third output is the spatial correlation function with spatial lags, derived from the variogram:

for spatial_lag in [0, 100, 1000, 10000, 30000]:
    print(
        "Errors are correlated at {:.1f}% for a {:,.0f} m spatial lag".format(
            spatial_corr_function(spatial_lag) * 100, spatial_lag
        )
    )
Errors are correlated at 100.0% for a 0 m spatial lag
Errors are correlated at 22.9% for a 100 m spatial lag
Errors are correlated at 21.9% for a 1,000 m spatial lag
Errors are correlated at 11.6% for a 10,000 m spatial lag
Errors are correlated at 0.0% for a 30,000 m spatial lag

We can plot the empirical variogram and its model on a non-linear X-axis to identify the multi-scale correlations.

xdem.spatialstats.plot_variogram(
    df=df_empirical_variogram,
    list_fit_fun=[xdem.spatialstats.get_variogram_model_func(df_model_params)],
    xlabel="Spatial lag (m)",
    ylabel="Variance of\nelevation differences (m)",
    xscale_range_split=[100, 1000],
)
plot infer spatial correlation

This pipeline will not always work optimally with default parameters: variogram sampling is more robust with a lot of samples but takes long computing times, and the fitting might require multiple tries for forms and possibly bounds and first guesses to help the least-squares optimization. To learn how to tune more parameters and use the subfunctions, see the gallery example: Estimation and modelling of spatial variograms!

Total running time of the script: (0 minutes 9.065 seconds)

Gallery generated by Sphinx-Gallery