xdem.spatialstats.sample_empirical_variogram

xdem.spatialstats.sample_empirical_variogram(values, gsd=None, coords=None, subsample=1000, subsample_method='cdist_equidistant', n_variograms=1, n_jobs=1, random_state=None, **kwargs)

Sample empirical variograms with binning adaptable to multiple ranges and spatial subsampling adapted for raster data. Returns an empirical variogram (empirical variance, upper bound of spatial lag bin, count of pairwise samples).

If values are provided as a Raster subclass, nothing else is required. If values are provided as a 2D array (M,N), a ground sampling distance is sufficient to derive the pairwise distances. If values are provided as a 1D array (N), an array of coordinates (N,2) or (2,N) is expected. If the coordinates do not correspond to points of a grid, a ground sampling distance is needed to correctly get the grid size.
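As a rough illustration of why a ground sampling distance is enough for a 2D array, the sketch below builds pixel coordinates from array indices and a gsd, then computes one pairwise distance. The helper `grid_coords` is hypothetical and not part of xdem; the origin at pixel (0, 0) and the axis orientation are assumptions.

```python
import math


def grid_coords(shape, gsd):
    """Return (x, y) coordinates of every pixel of an (M, N) grid, in row-major order.

    Hypothetical helper (not an xdem function): assumes a regular grid whose
    origin is at pixel (row 0, col 0) and whose spacing is gsd in both directions.
    """
    n_rows, n_cols = shape
    return [(col * gsd, row * gsd) for row in range(n_rows) for col in range(n_cols)]


coords = grid_coords((3, 4), gsd=10.0)

# Distance between pixel (row 0, col 0) and pixel (row 2, col 3)
d = math.dist(coords[0], coords[2 * 4 + 3])
```

With a 1D array of values there is no such index structure, which is why explicit coordinates are expected in that case.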

By default, the subsampling is based on RasterEquidistantMetricSpace implemented in scikit-gstat. This method samples large grid data more effectively by isolating pairs of spatially equidistant ensembles for distributed pairwise comparison. In practice, two subsamples are drawn for pairwise comparison: one from a disk of a certain radius within the grid, and another from rings of larger radii that increase steadily between the pixel size and the extent of the raster. Those disks and rings are sampled several times across the grid using random centers. See Hugonnet et al. (2022), https://doi.org/10.1109/jstars.2022.3188922, for more details, in particular Supplementary Fig. 13 for the subsampling scheme.

The “subsample” argument determines the number of samples for each method so as to yield a number of pairwise comparisons close to that of a pdist calculation, that is N*(N-1)/2 where N is the subsample argument. For the cdist equidistant method, the “runs” (random centers) and “samples” (subsample of a disk/ring) are set automatically to get close to N*(N-1)/2 pairwise samples, with the number of rings “nb_rings” fixed to 10. These can be adjusted more finely by passing the arguments “runs”, “samples” and “nb_rings” as kwargs. Further details can be found in the description of skgstat.MetricSpace.RasterEquidistantMetricSpace or _choose_cdist_equidistant_sampling_parameters.
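The pair-count target can be checked with a few lines of arithmetic. This is only a sketch of the counting, not of how xdem chooses “runs” and “samples”; the function names are hypothetical.

```python
import math


def pdist_pair_count(n):
    """Number of unique pairs within a single ensemble of n points."""
    return math.comb(n, 2)  # same as n * (n - 1) // 2


def cdist_pair_count(n_a, n_b):
    """Number of pairs between two distinct ensembles of n_a and n_b points."""
    return n_a * n_b


# Target number of pairwise comparisons for the default subsample=1000
target = pdist_pair_count(1000)
```

Because a cdist comparison between two ensembles yields the product of their sizes, many small disk/ring comparisons can add up to roughly the same total as one large pdist calculation.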

The spatial subsampling method argument subsample_method can be one of “cdist_equidistant”, “cdist_point”, “pdist_point”, “pdist_disk” and “pdist_ring”. The cdist methods use the MetricSpace classes of scikit-gstat and perform pairwise comparisons between two distinct ensembles, as in scipy.spatial.distance.cdist. For the cdist methods, the variogram is estimated in a single run from the MetricSpace.

The pdist methods subsample the Raster points directly and perform pairwise comparisons within a single ensemble, as in scipy.spatial.distance.pdist. For the pdist methods, an iterative process is required, in which a list of ranges is subsampled independently.
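The difference between the two comparison modes can be sketched with the standard library alone. The functions below mimic the behaviour of SciPy's pdist and cdist on plain lists for illustration; they are not the code used by xdem or scikit-gstat, and the point sets are made up.

```python
import itertools
import math


def pdist(points):
    """Distances between all unique pairs within one ensemble."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def cdist(points_a, points_b):
    """Distances between every point of one ensemble and every point of another."""
    return [[math.dist(p, q) for q in points_b] for p in points_a]


# Two made-up ensembles, e.g. points from a central disk and from a ring
center = [(0.0, 0.0), (1.0, 0.0)]
ring = [(3.0, 4.0), (0.0, 5.0), (6.0, 8.0)]

within = pdist(center + ring)   # all pairs inside a single pooled ensemble
between = cdist(center, ring)   # only pairs linking the two ensembles
```

The cdist result contains only cross-ensemble distances, so the lags it covers are controlled by where the two ensembles are drawn from.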

Variograms are derived independently for several runs and ranges using each pairwise sample, and later aggregated. If the subsampling method selected is “random_point”, the multi-range argument is ignored as range has no effect on this subsampling method.

For pdist methods, keyword arguments are passed to skgstat.Variogram. For cdist methods, keyword arguments are passed to both skgstat.Variogram and skgstat.MetricSpace.

Parameters:
  • values (Union[ndarray[Any, dtype[floating[Any]]], TypeVar(RasterType, bound= Raster)]) – Values of studied variable

  • gsd (float) – Ground sampling distance

  • coords (ndarray[Any, dtype[floating[Any]]]) – Coordinates

  • subsample (int) – Number of samples to randomly draw from the values

  • subsample_method (str) – Spatial subsampling method

  • n_variograms (int) – Number of independent empirical variogram estimations (to estimate empirical variogram spread)

  • n_jobs (int) – Number of processing cores

  • random_state (int | Generator | None) – Random state or seed number to use for calculations (to fix random sampling during testing)

Return type:

DataFrame

Returns:

Empirical variogram (variance, upper bound of lag bin, counts)
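To make the returned triplet concrete, here is a minimal sketch of an empirical variogram computed by brute force over all pairs. It assumes the classic Matheron semivariance estimator; the estimator actually applied depends on the skgstat.Variogram settings, and the function `empirical_variogram` is a hypothetical stand-in that performs no spatial subsampling.

```python
import itertools
import math


def empirical_variogram(coords, values, bin_edges):
    """Matheron semivariance per lag bin (illustrative, no subsampling).

    bin_edges: increasing upper bounds of the lag bins; the first bin starts at 0.
    Returns a list of (semivariance, bin upper bound, pair count) tuples;
    the semivariance is NaN for empty bins.
    """
    sums = [0.0] * len(bin_edges)
    counts = [0] * len(bin_edges)
    for (p, zp), (q, zq) in itertools.combinations(zip(coords, values), 2):
        lag = math.dist(p, q)
        # Assign the pair to the first bin whose upper bound covers the lag;
        # pairs beyond the last bound are ignored.
        for k, upper in enumerate(bin_edges):
            if lag <= upper:
                sums[k] += (zp - zq) ** 2
                counts[k] += 1
                break
    return [
        (s / (2 * c) if c else math.nan, upper, c)
        for s, upper, c in zip(sums, bin_edges, counts)
    ]


# Made-up 1D transect with unit spacing
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
result = empirical_variogram(coords, values, bin_edges=[1.0, 2.0, 3.0])
```

Each tuple mirrors one row of the returned DataFrame as described above: a variance estimate, the upper bound of its lag bin, and the number of pairs that contributed to it.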