wavy.triple_collocation
=======================

.. py:module:: wavy.triple_collocation


Functions
---------

.. autoapisummary::

   wavy.triple_collocation.filter_collocation_distance
   wavy.triple_collocation.filter_values
   wavy.triple_collocation.filter_dynamic_collocation
   wavy.triple_collocation.remove_nan
   wavy.triple_collocation.triple_collocation
   wavy.triple_collocation.get_CDF
   wavy.triple_collocation.CDF_matching_cal
   wavy.triple_collocation.calibration_triplets_cdf_matching
   wavy.triple_collocation.calibration_triplets_tc
   wavy.triple_collocation.least_squares_merging
   wavy.triple_collocation.get_mean_spectra
   wavy.triple_collocation.integrate_r2


Module Contents
---------------

.. py:function:: filter_collocation_distance(data, dist_max, name)

   Filters the datasets according to a maximum collocation
   distance between the satellite and in-situ observations.

   data (dict of wavy objects): wavy objects to filter given
                                the maximum collocation distance.
                                One of the objects must contain
                                the collocation distance.
   dist_max (float): Maximum collocation distance in km.
   name (string): key from the dictionary that refers to the
                  wavy object containing the distance

   returns:
   data_filtered (dict of wavy objects): dictionary of the wavy objects 
                                filtered using the maximum collocation 
                                distance given.


.. py:function:: filter_values(data, ref_data, min=0.0, max=25.0, return_ref_data=False)

   Filters the values of each data series given as input.

   data (dict of arrays): data to filter
   ref_data (string or array): Either a string corresponding
                   to a key in data or an array. Values for all
                   data are filtered with respect to the ref_data.
   min (float): minimum value that ref_data should take. 
   max (float): maximum value that ref_data should take.
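
   The range filtering described above can be sketched as follows. This is
   only an illustration, not wavy's implementation: the helper
   ``filter_by_reference`` and the sample keys ``sat`` and ``buoy`` are
   hypothetical, and the inclusive bounds are an assumption.

   .. code-block:: python

      import numpy as np

      def filter_by_reference(data, ref_data, vmin=0.0, vmax=25.0):
          """Keep only the indices where the reference lies in [vmin, vmax]."""
          # ref_data is either a key of data or an array of reference values
          ref = data[ref_data] if isinstance(ref_data, str) else ref_data
          ref = np.asarray(ref, dtype=float)
          # Assumption: bounds are inclusive; NaN compares False and is dropped
          mask = (ref >= vmin) & (ref <= vmax)
          return {name: np.asarray(values)[mask] for name, values in data.items()}

      data = {"sat": np.array([1.0, 2.0, 30.0, 4.0]),
              "buoy": np.array([1.1, 2.2, 3.3, 4.4])}
      filtered = filter_by_reference(data, "sat", vmin=0.0, vmax=25.0)

   Every series is masked with the same boolean array, so the filtered
   series stay aligned index by index.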


.. py:function:: filter_dynamic_collocation(data, mod_1, mod_2, max_rel_diff=0.05)

   Filters out data where the two given model series differ by more than
   a given relative difference. The dynamic collocation filtering method
   is described in Dodet et al., 2025.

   data (dict of lists): data to filter 
   mod_1 (string or list): Either key from data for the first model data
                           or the list of values of the model directly
   mod_2 (string or list): Either key from data for the second model data
                           or the list of values of the model directly
   max_rel_diff (float): Maximum relative difference (abs(mod_1-mod_2)/mod_1)
                         allowed between values from mod_1 and mod_2.

   returns:
   data_filtered (dict of lists): filtered data
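
   A minimal sketch of the relative difference criterion, using the
   formula given for max_rel_diff. The helper ``filter_by_model_agreement``
   and the sample keys are hypothetical, and keeping values whose
   difference equals the threshold is an assumption.

   .. code-block:: python

      import numpy as np

      def filter_by_model_agreement(data, mod_1, mod_2, max_rel_diff=0.05):
          """Keep only the indices where the two model series agree."""
          # Each model is either a key of data or the values themselves
          m1 = np.asarray(data[mod_1] if isinstance(mod_1, str) else mod_1, dtype=float)
          m2 = np.asarray(data[mod_2] if isinstance(mod_2, str) else mod_2, dtype=float)
          # Relative difference as documented: abs(mod_1 - mod_2) / mod_1
          rel_diff = np.abs(m1 - m2) / m1
          keep = rel_diff <= max_rel_diff
          return {name: np.asarray(values)[keep] for name, values in data.items()}

      data = {"obs": [2.0, 3.0, 4.0],
              "mod_a": [2.0, 3.0, 4.0],
              "mod_b": [2.05, 3.6, 4.1]}
      filtered = filter_by_model_agreement(data, "mod_a", "mod_b")

   Here the second point is dropped because the two models differ by 20 %,
   while the other two differ by 2.5 %. The sketch does not guard against
   zero values in mod_1.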


.. py:function:: remove_nan(A, B, C)

   Finds the indices of NaN values in each of the three
   lists and returns the filtered lists.
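
   One way to do this with NumPy is sketched below. The helper
   ``drop_joint_nan`` is hypothetical, and the sketch assumes that an
   index is removed from all three lists as soon as any of them holds a
   NaN there, so that the lists stay aligned.

   .. code-block:: python

      import numpy as np

      def drop_joint_nan(a, b, c):
          """Remove every index at which at least one series is NaN."""
          a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
          valid = ~(np.isnan(a) | np.isnan(b) | np.isnan(c))
          return a[valid], b[valid], c[valid]

      a, b, c = drop_joint_nan([1.0, np.nan, 3.0, 4.0],
                               [1.0, 2.0, np.nan, 4.0],
                               [1.0, 2.0, 3.0, 4.0])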


.. py:function:: triple_collocation(data, metrics=['var', 'rmse', 'si', 'rho', 'mean', 'std'], r2=0, ref=None, dec=3)

   Runs the triple collocation on a dictionary containing
   three measurement series and returns the results
   in a dictionary.

   data: {'name of measurement':list of values}
   metrics: Either the string "all" or a list of the metrics to return,
   among 'var', 'rmse', 'si', 'rho', 'sensitivity', 'snr', 'snr_db',
   'fmse', 'mean', 'std'.
   r2: representativeness error or cross correlation error between the 
   first two measurements in data. Default 0. 
   ref: Name of one of the measurements, must correspond
   to one key of data. Default first key from data. 
   dec: Number of decimals to round the results to. Default 3.

   returns: dict of dict of the metrics for each measurement
   {'name of measurement': {'metric name':metric}}
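
   For orientation, the classical covariance-based estimate of the error
   variances can be sketched as below. This is not wavy's implementation:
   the helper ``tc_error_variances`` and the synthetic data are
   hypothetical, errors are assumed mutually independent, and the r2, ref
   and dec arguments are not modelled.

   .. code-block:: python

      import numpy as np

      def tc_error_variances(data):
          """Covariance-based error variance estimates for three series."""
          names = list(data)
          if len(names) != 3:
              raise ValueError("triple collocation needs exactly three series")
          x, y, z = (np.asarray(data[n], dtype=float) for n in names)
          c = np.cov(np.vstack([x, y, z]))  # 3x3 sample covariance matrix
          var_x = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
          var_y = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
          var_z = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
          return {n: {"var": float(v)}
                  for n, v in zip(names, (var_x, var_y, var_z))}

      # Synthetic triplet: a common truth plus independent noise
      rng = np.random.default_rng(0)
      truth = rng.normal(2.0, 1.0, 200_000)
      data = {"sat": truth + rng.normal(0.0, 0.2, truth.size),
              "buoy": truth + rng.normal(0.0, 0.3, truth.size),
              "model": truth + rng.normal(0.0, 0.4, truth.size)}
      results = tc_error_variances(data)

   With this many samples the estimates should lie close to the noise
   variances used to build the data (0.04, 0.09 and 0.16). The result
   mirrors the nested dictionary layout described above.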


.. py:function:: get_CDF(data, step, llim=None, ulim=None, data_min=None, data_max=None, dec=3, no_empty_bins=True)

.. py:function:: CDF_matching_cal(old, CDF_old, CDF_new)

.. py:function:: calibration_triplets_cdf_matching(data, ref, step, seed=5)

.. py:function:: calibration_triplets_tc(data, ref, r2=0, return_cal_cst=False)

   Calibrates the two non-reference data series relative to the reference
   using triple collocation calibration constant estimates, following the
   method of Gruber et al., 2016.

   data (dict of lists of floats): Dictionary of the data to calibrate.
   ref (string): Name of the reference data to use for calibration.
   r2 (float): Representativeness error
   return_cal_cst (bool): If True, returns a dictionary of the calibration
                   constants in addition to the calibrated data.

   returns:
   data_cal (dict of lists of floats): Dictionary of the calibrated 
                                       data series
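
   The rescaling idea can be sketched with covariance ratios, as below.
   This is an illustration under stated assumptions rather than wavy's
   implementation: the helper ``tc_rescale`` is hypothetical, r2 is taken
   as 0, and matching the mean of the reference is an assumption of this
   sketch.

   .. code-block:: python

      import numpy as np

      def tc_rescale(data, ref):
          """Rescale the two non-reference series to the reference."""
          series = {n: np.asarray(v, dtype=float) for n, v in data.items()}
          a, b = [n for n in series if n != ref]

          def cov(p, q):
              return np.cov(series[p], series[q])[0, 1]

          # Multiplicative constants from covariance ratios
          beta = {a: cov(ref, b) / cov(a, b),
                  b: cov(ref, a) / cov(b, a)}
          calibrated = {ref: series[ref]}
          for n in (a, b):
              anomaly = series[n] - series[n].mean()
              # Assumption: the additive bias is removed by mean matching
              calibrated[n] = beta[n] * anomaly + series[ref].mean()
          return calibrated, beta

      rng = np.random.default_rng(1)
      truth = rng.normal(2.0, 1.0, 100_000)
      data = {"R": truth + rng.normal(0.0, 0.2, truth.size),
              "A": 0.5 + 1.2 * truth + rng.normal(0.0, 0.3, truth.size),
              "B": -0.3 + 0.8 * truth + rng.normal(0.0, 0.25, truth.size)}
      data_cal, beta = tc_rescale(data, ref="R")

   In this synthetic case the constants should come out close to the
   inverse of the gains used to build A and B (1/1.2 and 1/0.8).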


.. py:function:: least_squares_merging(data, tc_results=None, return_var=False, **kwargs)

   Merges the three data series given as input following the least 
   squares merging method described in Yilmaz et al., 2012. 

   data (dict of lists of floats): Dictionary of the data to merge.
   tc_results (pandas DataFrame): table of the results of triple collocation
              for the given data. Must contain the variance. If None, the 
              triple collocation is performed using the data and kwargs given.
   return_var (bool): If True, returns the variance of the error of the merged 
              data in addition to the merged data.

   returns:
   least_squares_merge (numpy array): series of merged data
   least_squares_var (float): variance of error of the merged data    
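
   A minimal sketch of the merging step, assuming mutually independent
   errors so that the least squares weights reduce to inverse error
   variance weights. The helper ``inverse_variance_merge`` and the sample
   values are hypothetical; in practice the error variances would come
   from the triple collocation results.

   .. code-block:: python

      import numpy as np

      def inverse_variance_merge(data, error_var):
          """Weighted mean of the series, weighted by inverse error variance."""
          names = list(data)
          inv = np.array([1.0 / error_var[n] for n in names])
          weights = inv / inv.sum()  # weights sum to one
          stacked = np.vstack([np.asarray(data[n], dtype=float) for n in names])
          merged = weights @ stacked
          merged_var = 1.0 / inv.sum()  # error variance of the merged series
          return merged, merged_var, dict(zip(names, weights))

      data = {"sat": [1.0, 2.0], "buoy": [1.2, 2.2], "model": [0.8, 1.8]}
      error_var = {"sat": 0.04, "buoy": 0.09, "model": 0.16}
      merged, merged_var, weights = inverse_variance_merge(data, error_var)

   The series with the smallest error variance receives the largest
   weight, and the error variance of the merged series is smaller than
   that of any single input.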


.. py:function:: get_mean_spectra(ds, varname, fs, nsample, median_step=None, mode='average', window='hamming')

   Divides a given time series into samples of a given size, applies a
   window to each sample and calculates the power spectrum of each sample
   using a fast Fourier transform. Returns the frequencies and either the
   list of the spectra for each sample or the average spectrum over all
   samples.

   ds (xarray dataset): xarray dataset with dimension time
   varname (str): name of the variable for which the spectra is
                  to be computed, present in ds and indexed by time
   fs (float): sampling frequency
   nsample (int): number of points of each sample
   median_step (np.timedelta64): time to consider between each point of the
                                 time series
   mode (str): either 'average' to return the mean power spectra or
              'list' to return the list of the power spectra
   window (str): window to apply to the samples before applying the
                 FFT. See scipy.signal.periodogram for options.


   returns:
   df_spectra (pandas DataFrame): dataframe containing the mean power spectra
                                 or the power spectra for each sample and the
                                 corresponding frequencies.
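
   The segment averaging described above can be sketched with NumPy
   alone, as below. This is not wavy's implementation: the helper
   ``mean_power_spectrum`` is hypothetical, it works on a plain array
   rather than an xarray dataset, it ignores median_step, and the mean
   removal and density scaling are assumptions of this sketch.

   .. code-block:: python

      import numpy as np

      def mean_power_spectrum(values, fs, nsample):
          """Average one-sided power spectral density over fixed segments."""
          values = np.asarray(values, dtype=float)
          nseg = len(values) // nsample  # incomplete trailing segment is dropped
          segments = values[:nseg * nsample].reshape(nseg, nsample)
          segments = segments - segments.mean(axis=1, keepdims=True)
          window = np.hamming(nsample)
          spectra = np.abs(np.fft.rfft(segments * window, axis=1)) ** 2
          spectra /= fs * np.sum(window ** 2)  # density scaling
          # One-sided spectrum: double every bin except DC (and Nyquist)
          if nsample % 2 == 0:
              spectra[:, 1:-1] *= 2.0
          else:
              spectra[:, 1:] *= 2.0
          freqs = np.fft.rfftfreq(nsample, d=1.0 / fs)
          return freqs, spectra.mean(axis=0)

      fs = 1.0
      t = np.arange(1024) / fs
      signal = np.sin(2.0 * np.pi * 0.1 * t)
      freqs, mean_ps = mean_power_spectrum(signal, fs=fs, nsample=256)
      peak_freq = freqs[np.argmax(mean_ps)]

   For this sine wave the spectral peak falls in the frequency bin
   closest to 0.1, within one bin width of ``fs / nsample``.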


.. py:function:: integrate_r2(PS_mod, PS_obs, f, threshold=np.inf, threshold_type='inv_freq')

   Estimates the representativeness error r2 by integrating the difference
   between the average power spectra of the model and the observations.
   Integrates up to a given threshold resolution (inverse of the frequency),
   or from a given threshold frequency.

   PS_mod (numpy array): power spectra for the model
   PS_obs (numpy array): power spectra for the observations
   f (numpy array): frequencies of the power spectra
   threshold (float): either upper resolution to integrate to, or minimum
                      frequency to integrate from
   threshold_type (str): indicate whether the threshold corresponds to a minimum
                         frequency ('freq') or to a maximum resolution ('inv_freq')

   returns:
   r2 (float): representativeness error estimate
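
   The integration can be sketched with a trapezoidal rule, as below.
   This is an illustration, not wavy's implementation: the helper
   ``integrate_spectral_difference`` is hypothetical, and both the sign
   convention (observations minus model) and the inclusive thresholds
   are assumptions.

   .. code-block:: python

      import numpy as np

      def integrate_spectral_difference(ps_mod, ps_obs, f, threshold=np.inf,
                                        threshold_type="inv_freq"):
          """Integrate (ps_obs - ps_mod) over the selected frequencies."""
          ps_mod, ps_obs, f = (np.asarray(a, dtype=float)
                               for a in (ps_mod, ps_obs, f))
          if threshold_type == "inv_freq":
              # Keep resolutions (1 / f) up to the threshold
              with np.errstate(divide="ignore"):
                  keep = (1.0 / f) <= threshold
          elif threshold_type == "freq":
              # Keep frequencies from the threshold upwards
              keep = f >= threshold
          else:
              raise ValueError("threshold_type must be 'freq' or 'inv_freq'")
          diff = (ps_obs - ps_mod)[keep]
          fk = f[keep]
          # Trapezoidal rule written out explicitly
          return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(fk)))

      f = np.linspace(0.0, 1.0, 11)
      ps_obs = np.full_like(f, 2.0)
      ps_mod = np.where(f < 0.5, 2.0, 1.0)  # model lacks energy above 0.5
      r2_freq = integrate_spectral_difference(ps_mod, ps_obs, f, 0.5, "freq")
      r2_res = integrate_spectral_difference(ps_mod, ps_obs, f, 2.0, "inv_freq")

   In this example a minimum frequency of 0.5 and a maximum resolution of
   2.0 select the same part of the spectrum, so both calls integrate a
   constant difference of 1 over a band of width 0.5.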