XARRAY & GEOVIEWS A new perspective on oceanographic data - part II
In a previous post, I introduced xarray with some simple manipulation and data plotting. In this super-short post, I'm going to do some more manipulation, using multiple input files to create a new dimension, reorganize the data and store them in multiple output files. All but with a few lines of code.
GOAL:¶
The ultimate goal here is to create new datasets, one for band, that aggregate results across experiments so as to facilitate inter-experiment comparisons.
HOW:¶
I will load netCDF files from a number of Monte-Carlo uncertainty experiments, among which the source of the uncertainty differs; Lt (sensor noise), wind, pressure, relative humidity, all the above. At the end of this post, I will have 6 files, one per visible SeaWiFS visible band containing one 3D array where dimensions are latitude, longitude, experiment.
WHY:¶
I'm doing this to create an interactive visualization (cf. next post) using GeoViews, where the goal is to compare, band-wise, cross-experiment results.
As usual, start with some imports...
import xarray as xr
import os
import glob
Now I set up some file path logic to avoid rewriting full file paths. I then accrue file paths into a list. I, fpaths. The new files I will next create will be stored in the 'Synthesis' directory for later retrieval.
dataDir = '/accounts/ekarakoy/disk02/UNCERTAINTIES/Monte-Carlo/DATA/AncillaryMC/'
expDirs = ['Lt', 'AllAnc_Lt', 'Pressure', 'RH', 'WindSpeed', 'O3']
outDir = 'Synthesis'
fpattern = 'S20031932003196.L3m_4D_SU*.nc'
fpaths = [glob.glob(os.path.join(dataDir, expDir, fpattern))[0] for expDir in expDirs]
I'm only interested in the visible bands because of the black pixel assumption used in the atmospheric correction applied during the processing phase, which renders Rrs in the near-infrared bands useless.
bands = [412, 443, 490, 510, 555, 670]
xarray has a nifty feature that allows opening multiple datasets, and automatically concatenating matching (by name and dimension) arrays, with the option of naming the thus newly created dimension. In our case, this is 'experiment'. The next line of code, below, opens what will end up being a temporary xarray Dataset - note that you will need dask installed for this. I'll then label the experiment dimension with the appropriate experiment names. Importantly, the concatenation direction reflects the order in which the file paths are specified, and it's also the order the experiment names are in in the 'expDirs' list defined above. I also make sure that the Rrs uncertainty data is labeled the same, 'rrs_unc'.
with xr.open_mfdataset(fpaths, concat_dim='experiment') as allData:
allData.coords['experiment'] = expDirs
for band in bands:
foutpath = os.path.join(dataDir, outDir, '%s%d%s' %(fpattern.split('SU')[0],
band, '.nc'))
if not os.path.exists(os.path.dirname(foutpath)):
os.makedirs(os.path.dirname(foutpath))
data = allData.data_vars['Rrs_unc_%d' % band]
data.name='rrs_unc'
dsData = data.to_dataset()
dsData.to_netcdf(path=foutpath, engine='netcdf4')
Verify that all the files are where they should be - in the Synthesis directory
os.listdir(os.path.dirname(foutpath))
Success!! I now have six files, one per Rrs band, and each file contains the results of five experiments. That's it. This will make it easy for me, in the next post, to use GeoViews, HoloViews, with a Matplotlib backend to create some nice, but more importantly, intuitive and hopefully useful visualizations with very little code. Happy hacking!!
Comments