wavy.triple_collocation

Functions

`filter_collocation_distance`(data, dist_max, name)	Filters the datasets according to a maximum collocation distance between satellite and in-situ observations.
`filter_values`(data, ref_data[, min, max, return_ref_data])	Filters the values for each data series given as input.
`filter_dynamic_collocation`(data, mod_1, mod_2[, ...])	Filters data when the two given model series differ by more than a given percentage.
`remove_nan`(A, B, C)	Finds the indexes of NaN values in each of the three lists and returns the filtered lists.
`triple_collocation`(data[, metrics, r2, ref, dec])	Runs the triple collocation given a dictionary containing three measurements.
`get_CDF`(data, step[, llim, ulim, data_min, data_max, ...])
`CDF_matching_cal`(old, CDF_old, CDF_new)
`calibration_triplets_cdf_matching`(data, ref, step[, seed])
`calibration_triplets_tc`(data, ref[, r2, return_cal_cst])	Calibrates A and B relative to R using triple collocation calibration constant estimates.
`least_squares_merging`(data[, tc_results, return_var])	Merges the three data series given as input following the least squares merging method.
`get_mean_spectra`(ds, varname, fs, nsample[, ...])	Divides a given time series into samples of a given size, applies a window to each sample and computes the power spectra.
`integrate_r2`(PS_mod, PS_obs, f[, threshold, ...])	Estimates the representativeness error r2 by integrating the difference between the model and observation power spectra.

Module Contents

wavy.triple_collocation.filter_collocation_distance(data, dist_max, name)

Filters the datasets according to a maximum collocation distance between satellite and in-situ observations.

data (dict of wavy objects): wavy objects to filter using the maximum collocation distance. One of the objects must contain the collocation distance.
dist_max (float): maximum collocation distance in km.
name (string): key from the dictionary that refers to the wavy object containing the distance.

returns: data_filtered (dict of wavy objects): dictionary of the wavy objects filtered using the maximum collocation distance given.

wavy.triple_collocation.filter_values(data, ref_data, min=0.0, max=25.0, return_ref_data=False)

Filters the values for each data series given as input.

data (dict of arrays): data to filter.
ref_data (string or array): either a string corresponding to a key in data, or an array. Values for all data are filtered with respect to ref_data.
min (float): minimum value that ref_data should take.
max (float): maximum value that ref_data should take.
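The filtering described above can be pictured with a minimal sketch on a dictionary of NumPy arrays. It assumes inclusive bounds and that ref_data is given as a key of data; `filter_by_reference` is a hypothetical name, and this is not the wavy implementation (return_ref_data is not reproduced).

```python
import numpy as np


def filter_by_reference(data, ref_key, vmin=0.0, vmax=25.0):
    """Hypothetical sketch: keep the indexes where data[ref_key] lies in [vmin, vmax].

    Assumes every series in `data` has the same length and that both
    bounds are inclusive (an assumption, not taken from wavy).
    """
    ref = np.asarray(data[ref_key], dtype=float)
    mask = (ref >= vmin) & (ref <= vmax)
    # Apply the same mask to every series so the collocated values stay aligned
    return {key: np.asarray(values, dtype=float)[mask] for key, values in data.items()}


data = {
    "insitu": [0.5, 3.2, 27.0, -1.0],
    "satellite": [0.6, 3.0, 26.1, 0.2],
    "model": [0.4, 3.5, 25.5, 0.1],
}
filtered = filter_by_reference(data, "insitu")  # keeps indexes 0 and 1 only
```

Masking every series with the same boolean array is what keeps the triplets consistent: removing a value from one series without removing it from the others would break the collocation.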

wavy.triple_collocation.filter_dynamic_collocation(data, mod_1, mod_2, max_rel_diff=0.05)

Filters data when the two given model series differ by more than a given percentage. The dynamic collocation filtering method follows Dodet et al., 2025.

data (dict of lists): data to filter.
mod_1 (string or list): either the key from data for the first model data, or the list of values of the model directly.
mod_2 (string or list): either the key from data for the second model data, or the list of values of the model directly.
max_rel_diff (float): maximum relative difference (abs(mod_1-mod_2)/mod_1) allowed between values from mod_1 and mod_2.

returns: data_filtered (dict of lists): filtered data
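A minimal sketch of the selection criterion, using the relative difference abs(mod_1-mod_2)/mod_1 stated above. The helper name `dynamic_collocation_mask` and the keys of the example dictionary are hypothetical; the sketch assumes mod_1 contains no zeros and treats a difference exactly equal to max_rel_diff as accepted.

```python
import numpy as np


def dynamic_collocation_mask(mod_1, mod_2, max_rel_diff=0.05):
    """Hypothetical sketch: True where abs(mod_1 - mod_2) / mod_1 <= max_rel_diff.

    Assumes mod_1 has no zeros (the division would otherwise give inf or nan).
    """
    mod_1 = np.asarray(mod_1, dtype=float)
    mod_2 = np.asarray(mod_2, dtype=float)
    rel_diff = np.abs(mod_1 - mod_2) / mod_1
    return rel_diff <= max_rel_diff


data = {
    "insitu": np.array([1.9, 3.1, 2.4]),
    "model_a": np.array([2.0, 3.0, 2.0]),   # illustrative first model series
    "model_b": np.array([2.05, 3.3, 2.02]),  # illustrative second model series
}
mask = dynamic_collocation_mask(data["model_a"], data["model_b"])
# Relative differences are 0.025, 0.1 and 0.01, so only the middle point is dropped
filtered = {key: values[mask] for key, values in data.items()}
```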

wavy.triple_collocation.remove_nan(A, B, C): Find indexes of nan values in each of three lists, and returns the filtered lists

wavy.triple_collocation.triple_collocation(data, metrics=['var', 'rmse', 'si', 'rho', 'mean', 'std'], r2=0, ref=None, dec=3)

Runs the triple collocation given a dictionary containing three measurements, returns results in a dictionary.

data: {‘name of measurement’: list of values}
metrics: str “all” or list of the metrics to return, among ‘var’, ‘rmse’, ‘si’, ‘rho’, ‘sensitivity’, ‘snr’, ‘snr_db’, ‘fmse’, ‘mean’, ‘std’.
r2: representativeness error, or cross-correlation error between the first two measurements in data. Default 0.
ref: name of one of the measurements; must correspond to one key of data. Default: first key of data.
dec: number of decimals to round the results to. Default 3.

returns: dict of dict of the metrics for each measurement {‘name of measurement’: {‘metric name’:metric}}
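To illustrate what the ‘var’ metric estimates, here is a sketch of the classical covariance-based triple collocation estimator. It assumes linear error models with zero-mean errors that are mutually uncorrelated and uncorrelated with the truth, i.e. the r2 = 0 case; how wavy applies r2, ref, rounding and the other metrics is not reproduced. `tc_error_variances` is a hypothetical name.

```python
import numpy as np


def tc_error_variances(x, y, z):
    """Sketch of the covariance-based triple collocation error variances.

    Assumes uncorrelated errors (r2 = 0). Each error variance is expressed
    in the scaling of its own series.
    """
    Q = np.cov(np.vstack([x, y, z]))  # 3x3 sample covariance matrix
    var_x = Q[0, 0] - Q[0, 1] * Q[0, 2] / Q[1, 2]
    var_y = Q[1, 1] - Q[0, 1] * Q[1, 2] / Q[0, 2]
    var_z = Q[2, 2] - Q[0, 2] * Q[1, 2] / Q[0, 1]
    return var_x, var_y, var_z


# Synthetic check with known error variances 0.25, 0.09 and 0.16
rng = np.random.default_rng(0)
truth = rng.normal(2.0, 1.0, 200_000)
x = truth + rng.normal(0.0, 0.5, truth.size)
y = 1.2 * truth + 0.5 + rng.normal(0.0, 0.3, truth.size)
z = 0.8 * truth - 0.2 + rng.normal(0.0, 0.4, truth.size)
var_x, var_y, var_z = tc_error_variances(x, y, z)  # close to 0.25, 0.09, 0.16
```

The offsets and scalings of y and z do not bias the estimates, because the truth variance and the scalings cancel in the covariance ratios; only the sampling noise of the covariances remains.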

wavy.triple_collocation.get_CDF(data, step, llim=None, ulim=None, data_min=None, data_max=None, dec=3, no_empty_bins=True)

wavy.triple_collocation.CDF_matching_cal(old, CDF_old, CDF_new)

wavy.triple_collocation.calibration_triplets_cdf_matching(data, ref, step, seed=5)

wavy.triple_collocation.calibration_triplets_tc(data, ref, r2=0, return_cal_cst=False)

Calibrates A and B relative to R using triple collocation calibration constant estimates, following the method of Gruber et al., 2016.

data (dict of lists of floats): dictionary of the data to calibrate.
ref (string): name of the reference data to use for calibration.
r2 (float): representativeness error.
return_cal_cst (bool): if True, returns a dictionary of the calibration constants in addition to the calibrated data.

returns: data_cal (dict of lists of floats): dictionary of the calibrated data series
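A sketch of the covariance-ratio rescaling commonly used for triple collocation calibration, assuming r2 = 0: each non-reference series is centred and multiplied by a constant so that its sensitivity to the truth matches the reference. This is an illustration under those assumptions, not necessarily wavy's exact implementation; `tc_calibrate` is a hypothetical name.

```python
import numpy as np


def tc_calibrate(data, ref):
    """Sketch: rescale the two non-reference series of a triplet to `ref`.

    Calibration constants (r2 assumed 0):
    s_a = cov(r, b) / cov(a, b) and s_b = cov(r, a) / cov(a, b).
    """
    a_key, b_key = [key for key in data if key != ref]
    r, a, b = (np.asarray(data[k], dtype=float) for k in (ref, a_key, b_key))
    Q = np.cov(np.vstack([r, a, b]))  # order: r, a, b
    s_a = Q[0, 2] / Q[1, 2]
    s_b = Q[0, 1] / Q[1, 2]
    calibrated = {
        ref: r,
        a_key: r.mean() + s_a * (a - a.mean()),
        b_key: r.mean() + s_b * (b - b.mean()),
    }
    return calibrated, {a_key: s_a, b_key: s_b}


rng = np.random.default_rng(1)
truth = rng.normal(2.0, 1.0, 200_000)
data = {
    "buoy": truth + rng.normal(0.0, 0.2, truth.size),
    "altimeter": 1.2 * truth + 0.5 + rng.normal(0.0, 0.3, truth.size),
    "model": 0.8 * truth - 0.2 + rng.normal(0.0, 0.4, truth.size),
}
data_cal, cal_cst = tc_calibrate(data, "buoy")  # constants close to 1/1.2 and 1/0.8
```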

wavy.triple_collocation.least_squares_merging(data, tc_results=None, return_var=False, **kwargs)

Merges the three data series given as input following the least squares merging method described in Yilmaz et al., 2012.

data (dict of lists of floats): dictionary of the data to merge.
tc_results (pandas DataFrame): table of the triple collocation results for the given data. Must contain the variance. If None, the triple collocation is performed using the data and kwargs given.
return_var (bool): if True, returns the variance of the error of the merged data in addition to the merged data.

returns:
least_squares_merge (numpy array): series of merged data
least_squares_var (float): variance of the error of the merged data (returned if return_var is True)
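The core of least squares merging with uncorrelated errors can be sketched as an inverse-error-variance weighted mean. The sketch assumes the series are already calibrated to a common reference and that their error variances (for instance the triple collocation ‘var’ results) are known; `ls_merge` is a hypothetical name and wavy's handling of tc_results is not reproduced.

```python
import numpy as np


def ls_merge(series, error_vars):
    """Sketch of least squares merging assuming mutually uncorrelated errors.

    Weights are proportional to the inverse error variances and sum to 1.
    """
    inv = 1.0 / np.asarray(error_vars, dtype=float)
    weights = inv / inv.sum()
    merged = np.sum(weights[:, None] * np.vstack(series), axis=0)
    merged_var = 1.0 / inv.sum()  # error variance of the weighted mean
    return merged, merged_var, weights


series = [np.full(3, 1.0), np.full(3, 2.0), np.full(3, 4.0)]
merged, merged_var, weights = ls_merge(series, [1.0, 1.0, 2.0])
# weights are 0.4, 0.4, 0.2, the merged value is 2.0 and its error variance is 0.4
```

The merged error variance is never larger than the smallest input error variance, which is why the merged product is expected to outperform each individual series when the error assumptions hold.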

wavy.triple_collocation.get_mean_spectra(ds, varname, fs, nsample, median_step=None, mode='average', window='hamming')

Divides a given time series into samples of a given size, applies a window to each sample and calculates the power spectrum of each sample using a fast Fourier transform. Returns the frequencies and either the list of the spectra for each sample or the spectrum averaged over all samples.

ds (xarray dataset): xarray dataset with dimension time.
varname (str): name of the variable for which the spectra are to be computed, present in ds and indexed by time.
fs (float): sampling frequency.
nsample (int): number of points in each sample.
median_step (np.timedelta64): time step to consider between consecutive points of the time series.
mode (str): either ‘average’ to return the mean power spectrum, or ‘list’ to return the list of the power spectra.
window (str): window to apply to the samples before the FFT. See scipy.signal.periodogram for options.

return: df_spectra (pandas DataFrame): dataframe containing the mean power spectrum or the power spectra for each sample, and the corresponding frequencies.
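A minimal sketch of the ‘average’ mode on a plain 1-D NumPy array, using scipy.signal.periodogram on consecutive, non-overlapping samples. It assumes regular sampling with no gaps, so the role of median_step and the xarray/pandas wrapping are left out; `mean_spectrum` is a hypothetical name.

```python
import numpy as np
from scipy.signal import periodogram


def mean_spectrum(x, fs, nsample, window="hamming"):
    """Sketch: average the periodograms of consecutive, non-overlapping samples.

    Assumes a regularly sampled 1-D array without gaps; trailing points that
    do not fill a whole sample are discarded.
    """
    x = np.asarray(x, dtype=float)
    nseg = x.size // nsample
    segments = x[: nseg * nsample].reshape(nseg, nsample)
    # One windowed periodogram per row (sample), computed along the last axis
    f, spectra = periodogram(segments, fs=fs, window=window, axis=-1)
    return f, spectra.mean(axis=0)


fs = 1.0  # one point per second
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 0.1 * t)
f, ps = mean_spectrum(x, fs, nsample=100)  # the peak sits at 0.1 Hz
```

Averaging over samples trades frequency resolution (set by nsample) for a lower variance of the spectral estimate.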

wavy.triple_collocation.integrate_r2(PS_mod, PS_obs, f, threshold=np.inf, threshold_type='inv_freq')

Estimates the representativeness error r2 by integrating the difference between the average power spectra of the model and the observations. Integrates up to a given threshold resolution (inverse of the frequency), or from a given threshold frequency.

PS_mod (numpy array): power spectra for the model.
PS_obs (numpy array): power spectra for the observations.
f (numpy array): frequencies of the power spectra.
threshold (float): either the upper resolution to integrate to, or the minimum frequency to integrate from.
threshold_type (str): indicates whether the threshold corresponds to a minimum frequency (‘freq’) or to a maximum resolution (‘inv_freq’).

return: r2 (float): representativeness error estimate
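A sketch of the integration bounds implied by the parameter descriptions: with ‘inv_freq’ the lower frequency bound is 1/threshold (so the default threshold=np.inf integrates over all frequencies), and with ‘freq’ the threshold is the lower bound itself. The sign convention (observations minus model) and the trapezoidal rule are assumptions; `integrate_r2_sketch` is a hypothetical name.

```python
import numpy as np


def integrate_r2_sketch(ps_mod, ps_obs, f, threshold=np.inf, threshold_type="inv_freq"):
    """Sketch: integrate (ps_obs - ps_mod) over frequencies >= f_min.

    'inv_freq': threshold is a resolution, so f_min = 1 / threshold.
    'freq': threshold is directly the minimum frequency.
    The sign convention (observations minus model) is an assumption.
    """
    if threshold_type == "inv_freq":
        f_min = 1.0 / threshold  # 1 / inf == 0.0, i.e. all frequencies
    elif threshold_type == "freq":
        f_min = threshold
    else:
        raise ValueError("threshold_type must be 'freq' or 'inv_freq'")
    f = np.asarray(f, dtype=float)
    keep = f >= f_min
    diff = np.asarray(ps_obs, dtype=float)[keep] - np.asarray(ps_mod, dtype=float)[keep]
    fk = f[keep]
    # Trapezoidal rule written out explicitly
    return float(np.sum(np.diff(fk) * (diff[1:] + diff[:-1]) / 2.0))


f = np.linspace(0.0, 1.0, 11)
ps_obs = np.ones_like(f)
ps_mod = np.zeros_like(f)
r2_all = integrate_r2_sketch(ps_mod, ps_obs, f)  # whole band, about 1.0
r2_freq = integrate_r2_sketch(ps_mod, ps_obs, f, threshold=0.5, threshold_type="freq")
r2_res = integrate_r2_sketch(ps_mod, ps_obs, f, threshold=2.0)  # same band as f >= 0.5
```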