Mozambique Validation Workshop 2024
The following examples are tailored to the wavy Mozambique Validation Workshop in fall 2024. This workshop will focus on some simple examples that can be used as python code snippets in your workflow.
0. wavy installation
Installing wavy can be done via conda. The steps are as follows:
clone the github repo like:
$ cd ~ $ git clone https://github.com/bohlinger/wavy.git or for one single branch try: $ git clone --single-branch --branch master https://github.com/bohlinger/wavy.git
install wavy:
$ cd ~/wavy $ conda env create -f environment.yml $ conda activate wavy
A much faster installation method would be using mamba if you have that installed.
$ cd ~/wavy $ mamba env create -f environment.yml $ conda activate wavy
Now, append wavy root directory to $PYTHONPATH, for instance add the following to your .bashrc:
export PYTHONPATH=$PYTHONPATH:/path/to/your/wavy
Note
/path/to/your/wavy/ should be replace with the full path of your wavy folder. It will be the case throughout all this documentation.
1. wavy config files
Create a new project directory, in this example we will call it Moz_ws24_wavy. Within this folder, create another folder config, where you will store the configuration files used by wavy for your project. This could look like:
:~$ mkdir Moz_ws24_wavy
:~$ cd Moz_ws24_wavy
:~/Moz_ws24_wavy$ mkdir config
In the config folder located in the path/to/your/wavy/wavy/ directory, you will find some default config files. For this workshop you will need the following:
satellite_cfg.yaml.default region_cfg.yaml.default
quicklook_cfg.yaml.default validation_metrics.yaml.default
model_cfg.yaml.default variable_def.yaml.default
Now copy these files listed above into the config folder you just created in your project directory (/Moz_ws24_wavy/config/) and remove the suffix .default. It should now look like:
:~/Moz_ws24_wavy/config$ ls satellite_cfg.yaml region_cfg.yaml quicklook_cfg.yaml validation_metrics.yaml model_cfg.yaml variable_def.yaml
At the root of your project directory, establish an .env file such that wavy knows where to find the config files it should use. This could look like:
And the .env file should contain the following line:
WAVY_CONFIG=/home/USER/Moz_ws24_wavy/config
Replace USER with you username that you get when typing
echo ${USER}
In order to download satellite data, you also need to copy the wavyDownload.py file from /wavy/apps/standalone/ into your project directory.
This should be the structure of your project directory:
:~/Moz_ws24_wavy$ ls -la
total 16
drwxrwxr-x 3 user user 4096 Nov 14 09:04 .
drwx------ 79 user user 4096 Nov 14 09:06 ..
drwxrwxr-x 2 user user 4096 Nov 14 09:10 config
-rwxr-xr-x 1 user user 44 Nov 14 09:04 .env
-rwxrwxr-x 1 user user 3814 Nov 5 09:12 wavyDownload.py
2. Download L3 satellite altimetry data
L3 satellite data is obtained from Copernicus with the product identifier WAVE_GLO_WAV_L3_SWH_NRT_OBSERVATIONS_014_001. User credentials are required for this task. So before you can start you have to get a Copernicus account (free of costs). Prepare access to Copernicus products. Your should add the following lines to your .bashrc, adapted with your username and password from Copernicus.
export COPERNICUSMARINE_SERVICE_USERNAME=YOUR_COPERNICUS_USERNAME
export COPERNICUSMARINE_SERVICE_PASSWORD=YOUR_COPERNICUS_PASSWORD
Adjust the satellite config file called satellite_cfg.yaml. Remember, this is the file you copied to ~/Moz_ws24_wavy/config. In this file you should adapt the default paths with the ones from your project. It should include the following section and could look like:
--- # specifications for satellite missions
cmems_L3_NRT:
# mandatory
name:
s3a: s3a
s3b: s3b
c2: c2
j3: j3
h2b: h2b
al: al
cfo: cfo
s6a: s6a
swon: swon
# mandatory when downloading
# where to store downloaded data
download:
ftp: # downloading method
src_tmplt: "/Core/\
WAVE_GLO_PHY_SWH_L3_NRT_014_001/\
cmems_obs-wave_glo_phy-swh_nrt_name-l3_PT1S/\
%Y/%m/"
trgt_tmplt: /path/to/Moz_ws24_wavy/altimeter_data/L3/name/%Y/%m
path_date_incr_unit: 'm'
path_date_incr: 1
search_str: '%Y%m%dT'
strsub: ['name']
server: "nrt.cmems-du.eu"
copernicus:
dataset_id: cmems_obs-wave_glo_phy-swh_nrt_name-l3_PT1S
trgt_tmplt: /path/to/Moz_ws24_wavy/altimeter_data/L3/name/%Y/%m
path_date_incr_unit: 'm'
path_date_incr: 1
strsub: ['name']
server: "nrt.cmems-du.eu"
time_incr: 'd' # 'h', 'd', 'm'
# optional: where to read from
# can be defined directly when calling wavy
wavy_input:
src_tmplt: /path/to/Moz_ws24_wavy/altimeter_data/L3/name/%Y/%m
fl_tmplt: "varalias_name_region_\
%Y%m%d%H%M%S_%Y%m%d%H%M%S.nc"
strsub: ['name']
path_date_incr_unit: 'm'
path_date_incr: 1
# optional: where to write to
# can be defined directly when calling wavy
wavy_output:
trgt_tmplt: /path/to/Moz_ws24_wavy/altimeter_data/L3/name/%Y/%m
fl_tmplt: "varalias_name_region_\
%Y%m%d%H%M%S_%Y%m%d%H%M%S.nc"
strsub: ['varalias','name','region']
file_date_incr: m
# optional, if not defined the class default is used
reader: read_local_ncfiles
collector: get_remote_files_copernicusmarine
# optional, needs to be defined if not cf and in variable_info.yaml
vardef:
Hs: VAVH
U: WIND_SPEED
coords:
# optional, info that can be used by class functions
misc:
processing_level:
provider:
obs_type:
# optional, to ease grouping
tags:
You can proceed now and download L3 data using the wavyDownload.py script you copied in your project folder. You can get help with:
$ ./wavyDownload.py -h
And then download some satellite altimeter data:
$ ./wavyDownload.py --name s3a --sd 20241017T07 --ed 20241017T08 --nID cmems_L3_NRT
If you need to download satellite data from Copernicus for more than a day or month, you can change the time increment in time_incr. ‘h’ will download 3-hours files at a time, ‘d’ will download all available files for a day at a time and ‘m’ all available files for a month at a time. Make sure to change this parameter if you need to download long periods of data as this will considerably shorten the time it takes to do so.
You can also download the data directly with python as follows:
>>> from wavy.satellite_module import satellite_class as sc
>>> nID = "cmems_L3_NRT"
>>> name = "s3a"
>>> sd = "2024-10-17 07"
>>> ed = "2024-10-17 09"
>>> sco = sc(sd=sd, nID=nID, name=name, ed=ed).download()
3. Read satellite data
Once the satellite data is downloaded one can access and read the data for further use with wavy. Let’s have a look at some examples in a python script.
In python L3-data can be read by importing the satellite_class, choosing a region of interest, the variable of interest (Hs or U), the satellite mission, which product should be used, and whether a time window should be used as well as a start and possibly an end date. This could look like:
>>> from wavy.satellite_module import satellite_class as sc
>>> nID = "cmems_L3_NRT"
>>> name = "s3a"
>>> sd = "2024-10-17 07"
>>> ed = "2024-10-17 09"
>>> sco = sc(sd=sd, nID=nID, name=name, ed=ed).populate()
This would result in a satellite_class object and a similar output message as:
# -----
### Initializing satellite_class object ###
Given kwargs:
{'sd': '2024-10-17 07', 'nID': 'cmems_L3_NRT', 'name': 's3a', 'ed': '2024-10-17 09'}
### satellite_class object initialized ###
# -----
### Read files and populate satellite_class object
## Find and list files ...
path is None -> checking config file
Object is iterable
9 valid files found
source template: /path/to/Moz_ws24_wavy/altimeter_data/L3/name/%Y/%m
....
## Summary:
5238 footprints retrieved.
Time used for retrieving data:
0.3 seconds
### satellite_class object populated ###
# -----
Investigating the satellite_object you will find something like:
>>> sco.
sco.apply_limits( sco.filter_main(
sco.cfg sco.filter_NIGP(
sco.cleaner_blockQ( sco.filter_runmean(
sco.cleaner_blockStd( sco.get_item_child(
sco.compute_pulse_limited_footprint_radius() sco.get_item_parent(
sco.coords sco.list_input_files(
sco.crop_to_period( sco.meta
sco.crop_to_poi( sco.name
sco.crop_to_region( sco.nID
sco.despike_blockQ( sco.pathlst
sco.despike_blockStd( sco.poi
sco.despike_GP( sco.populate(
sco.despike_linearGAM( sco.quick_anim(
sco.despike_NIGP( sco.quicklook(
sco.distlim sco.reader(
sco.download( sco.region
sco.ed sco.sd
sco.filter sco.slider_chunks(
sco.filter_blockMean( sco.stdvarname
sco.filter_distance_to_coast( sco.time_gap_chunks(
sco.filter_footprint_land_interaction( sco.twin
sco.filter_footprint_radius( sco.units
sco.filter_GP( sco.varalias
sco.filter_lanczos( sco.varname
sco.filter_landMask( sco.vars
sco.filter_linearGAM( sco.write_to_nc(
With the retrieved data in sco.vars:
>>> sco.vars
Dimensions: (time: 5238)
Coordinates:
* time (time) datetime64[ns] 42kB 2024-10-17T06:30:00 ... 2024-10-17T09...
Data variables:
Hs (time) float64 42kB 2.068 2.065 2.063 2.063 ... 2.406 2.386 2.374
lons (time) float64 42kB -138.7 -138.7 -138.7 ... -168.3 -168.3 -168.4
lats (time) float64 42kB 52.22 52.27 52.33 52.39 ... -25.42 -25.36 -25.3
Attributes:
title: wavy dataset
Using the quicklook function you can quickly visualize the data you have retrieved:
>>> sco.quicklook(ts=True) # for time series
>>> sco.quicklook(m=True) # for a map
>>> sco.quicklook(a=True) # for all
4. Define your own region
In wavy you can define your own region over which you want to gather satellite data. The region has to be defined in the region_cfg.yaml file. It can either be defined as a rectangular region, a polynom, a geojson format, or a model. If region is a model defined in model_specs.yaml, this will automatically be noticed and a model file will be loaded to cross-check the model domain with the satellite footprints. Let’s define Mozambique as a new region:
Moz:
llcrnrlon: 28.3
llcrnrlat: -27.8
urcrnrlon: 46
urcrnrlat: -10
Now, we use this region to retrieve only data over this region.
>>> from wavy.satellite_module import satellite_class as sc
>>> nID = "cmems_L3_NRT"
>>> name = "s3a"
>>> sd = "2024-10-17 07"
>>> ed = "2024-10-17 09"
>>> sco = sc(sd=sd, nID=nID, name=name, ed=ed, region="Moz").populate()
>>> sco.quicklook(m=True)
Another option is to define the region directly in the script:
>>> from wavy.satellite_module import satellite_class as sc
>>> region_dict = {'name': 'Moz',
>>> 'region': {
>>> 'llcrnrlon': 28.3,
>>> 'llcrnrlat': -27.8,
>>> 'urcrnrlon': 46,
>>> 'urcrnrlat': -10}}
>>> nID = "cmems_L3_NRT"
>>> name = "s3a"
>>> sd = "2024-10-17 07"
>>> ed = "2024-10-17 09"
>>> sco = sc(sd=sd, nID=nID, name=name, ed=ed).populate(region=region_dict)
You can adapt the window for the map as well as follows:
>>> sco.quicklook(m=True, map_extent_llon=28.3, map_extent_ulon=46,
map_extent_llat=-27.8, map_extent_ulat=-10)
5. access/read model data
Model output can be accessed and read using the model_module module. The model_module config file model_cfg.yaml needs adjustments if you want to include a model that is not present as default. Given that the model output file you would like to read follows the cf-conventions and standard_names are unique, the minimum information you have to provide are usually:
modelname:
vardef:
Hs:
time:
lons:
lats:
wavy_input:
fl_tmplt:
reader:
misc:
init_times:
init_step:
grid_date:
date_incr_unit:
date_incr:
The variable aliases (left hand side below vardef) need to be specified in the variable_def.yaml. Basic variables are already defined. Adding your model output files to wavy means to add something like:
era5Moz:
name:
download:
vardef:
Hs: swh
time: valid_time
lons: longitude
lats: latitude
coords:
wavy_input:
src_tmplt: /path/to/Moz_ws24_wavy/data/
fl_tmplt: era5_reanalysis_241016_241024_moz.nc
reader: read_era
collector:
misc:
init_times: [0]
init_step: 24
grid_date: 2024-10-23 00:00:00
convention: meteorological
date_incr_unit: h
date_incr: 1
proj4: "+proj=longlat +a=6367470 +e=0 +no_defs"
Now you can proceed to load your model in wavy. Start python and type:
>>> from wavy.model_module import model_class as mc
>>> nID = 'era5Moz' # default
>>> varalias = 'Hs' # default
>>> sd = "2024-10-17 07"
>>> mco = mc(nID=nID,sd=sd).populate() # one time slice
Whenever the keyword “leadtime” is None, a best estimate is assumed and retrieved. In this case you are using reanalysis data, meaning that there is no leadtime to take into account. The output will be something like:
>>> mco.
mco.cfg mco.filter mco.meta mco.quick_anim( mco.stdvarname
mco.coords mco.get_item_child( mco.model mco.quicklook( mco.units
mco.crop_to_period( mco.get_item_parent( mco.nID mco.reader( mco.varalias
mco.distlim mco.leadtime mco.pathlst mco.region mco.varname
mco.ed mco.list_input_files( mco.populate( mco.sd mco.vars
>>> mco.vars
Dimensions: (time: 1, lats: 43, lons: 41)
Coordinates:
* lons (lons) float64 328B 30.0 30.5 31.0 31.5 ... 48.5 49.0 49.5 50.0
* lats (lats) float64 344B -9.0 -9.5 -10.0 -10.5 ... -29.0 -29.5 -30.0
number int64 8B ...
* time (time) datetime64[ns] 8B 2024-10-16T23:00:00
expver <U4 16B ...
Data variables:
Hs (time, lats, lons) float32 7kB nan nan nan ... 5.322 5.505 5.725
Attributes:
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2024-10-30T14:48 GRIB to CDM+CF via cfgrib-0.9.1...
For the model_class objects a quicklook function exists to depict a certain time step of what you loaded:
>>> mco.quicklook(m=True) # for a map
>>> mco.quicklook(a=True) # for a map
6. Collocating model and observations
One main focus of wavy is to ease the collocation of observations and numerical wave models for the purpose of model validation. If you have available the necessary satellite data and model data you can proceed with collocation:
Collocation of satellite and wave model
>>> from wavy.satellite_module import satellite_class as sc
>>> from wavy.collocation_module import collocation_class as cc
>>> # retrieve the satellite data for the region
>>> nID = "cmems_L3_NRT"
>>> name = "s3a"
>>> sd = "2024-10-17 07"
>>> ed = "2024-10-17 09"
>>> sco = sc(sd=sd, nID=nID, name=name, ed=ed, region='Moz').populate()
>>> # collocate the model
>>> model = 'era5Moz'
>>> cco = cc(oco=sco, model=model, leadtime='best', distlim=6, twin=180)
distlim is the distance limit for collocation in km and date_incr is the time step increase in hours. One can also add a keyword for the collocation time window. The default is +-30min which is equivalent to adding twin=30. In this case ERA only had 6h time steps which makes it a bit more unlikely that satellite crossings and model time steps coincide. Increasing twin helps, however, it means we assume quasi-stationarity for this time period.
Using the quicklook function again (cco.quicklook(a=True)) will enable three plots this time, a time series plot (ts=True), a map plot (m=True), and a scatter plot (sc=True)
cco.quicklook(a=True)
7. Validate the collocated time series
Having collocated a quick validation can be performed using the validationmod. validation_specs.yaml can be adjusted.
>>> val_dict = cco.validate_collocated_values()
# ---
Validation stats
# ---
Correlation Coefficient: 0.60
Mean Absolute Difference: 0.54
Root Mean Squared Difference: 0.63
Normalized Root Mean Squared Difference: 0.20
Debiased Root Mean Squared Difference: 0.59
Bias: -0.22
Normalized Bias: -0.07
Scatter Index: 18.82
Model Activity Ratio: 1.31
Mean of Model: 2.91
Mean of Observations: 3.13
Number of Collocated Values: 6
The entire validation dictionary will then be in val_dict.