Correlation Matrix

https://colab.research.google.com/assets/colab-badge.svg https://mybinder.org/badge_logo.svg
plot_corr_map(sourceTable, sourceVar, targetTables, targetVars, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2, temporalTolerance, latTolerance, lonTolerance, depthTolerance, method='spearman', exportDataFlag=False, show=True)

This function computes and plots the pair-correlation coefficient between the source and target variables. The results are visualized in form of a correlation matrix. To compute the correlations, the source and target variables have to be colocalized first (see Match (colocalize) Datasets). The colocalization procedure relies on the tolerance parameters because they set the matching boundaries between the source and target datasets. Note the source has to be a single non-climatological variable. In principle, if the source dataset is fully covered by the target variable’s spatio-temporal range, there should always be matching results if the tolerance parameters are larger than half of their corresponding spatial/temporal resolutions. Please explore the Data Catalog to find appropriate target variables. Currently, this visualization is only supported by plotly visualization library.

Returns the generated correlation graph object. One may modify the graph properties (see example below).

Note

This method requires a valid API key. It is not necessary to set the API key every time because the API properties are stored locally after being called the first time.


Parameters:
sourceTable: string

Table name of the source dataset. A full list of table names can be found in Data Catalog.

sourceVar: string

The source variable short name. The target variables are matched (colocalized) with this variable. A full list of variable short names can be found in Data Catalog.

targetTables: list of string

Table names of the target datasets to be matched with the source data. Note source dataset can be matched with multiple target datasets. A full list of table names can be found in Data Catalog.

dt1: string

Start date or datetime. Both source and target datasets are filtered before matching. This parameter sets the lower bound of the temporal cut.

Example values: ‘2016-05-25’ or ‘2017-12-10 17:25:00’.

dt2: string

End date or datetime. Both source and target datasets are filtered before matching. This parameter sets the upper bound of the temporal cut.

lat1: float

Start latitude [degree N]. Both source and target datasets are filtered before matching. This parameter sets the lower bound of the meridional cut. Note latitude ranges from -90 to 90 degrees.

lat2: float

End latitude [degree N]. Both source and target datasets are filtered before matching. This parameter sets the upper bound of the meridional cut. Note latitude ranges from -90 to 90 degrees.

lon1: float

Start longitude [degree E]. Both source and target datasets are filtered before matching. This parameter sets the lower bound of the zonal cut. Note longitude ranges from -180 to 180 degrees.

lon2: float

End longitude [degree E]. Both source and target datasets are filtered before matching. This parameter sets the upper bound of the zonal cut. Note longitude ranges from -180 to 180 degrees.

depth1: float

Start depth [m]. Both source and target datasets are filtered before matching. This parameter sets the lower bound of the vertical cut. Note depth is a positive number (depth is 0 at the surface and increases towards the ocean floor).

depth2: float

End depth [m]. Both source and target datasets are filtered before matching. This parameter sets the upper bound of the vertical cut. Note depth is a positive number (depth is 0 at the surface and increases towards the ocean floor).

temporalTolerance: list of int

Temporal tolerance values between pairs of source and target datasets. The size and order of values in this list should match those of targetTables. If only a single integer value is given, that would be applied to all target datasets. This parameter is in day units except when the target variable represents monthly climatology data in which case it is in month units. Note fractional values are not supported in the current version.

latTolerance: list of float or int

Spatial tolerance values in meridional direction [deg] between pairs of source and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets. A “safe” value for this parameter can be slightly larger than the half of the target variable’s spatial resolution.

lonTolerance: list of float or int

Spatial tolerance values in zonal direction [deg] between pairs of source and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets. A “safe” value for this parameter can be slightly larger than the half of the target variable’s spatial resolution.

depthTolerance: list of float or int

Spatial tolerance values in vertical direction [m] between pairs of source and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets.

method: str, default: ‘spearman’

Correlation algorithm. ‘spearman’ is a rank correlation algorithm and is a metric for monotonic relationships. Other options involve ‘pearson’ and ‘kendall’. ‘pearson’ is the standard correlation coefficient, more favorable for linear correlations. ‘kendall’ evaluates Kendall Tau correlation coefficient.

exportDataFlag: boolean, default: False

If True, the graph data points are stored on the local machine. The export path and file format are set by the APIs parameters.

show: boolean, default: True

If True, the graph object is returned and is displayed. The graph file is saved on the local machine at the figureDir directory. If False, the graph object is returned but not displayed.

Returns:

the graph object

Below are the graph’s properties and methods.

Properties:
x: list of string

Correlation matrix column titles (covariate names).

y: list of string

Correlation matrix row titles (covariate names).

z: numpy.ndarray

Computed pairwise correlation coefficients.

cmap: str or cmocean colormap

Colormap name. Any matplotlib (e.g. ‘viridis’, ..) or cmocean (e.g. cmocean.cm.thermal, ..) colormaps can be passed to this property. A full list of matplotlib and cmocean color palettes can be found at the following links: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html

https://matplotlib.org/cmocean/

vmin: float

This parameter defines the lower bound of the colorbar.

vmax: float

This parameter defines the upper bound of the colorbar.

height: int

Graph’s height in pixels.

width: int

Graph’s width in pixels.

title: str

The graphs’s title.

Methods:
render()

Displays the plot according to the set properties.


Example

In this example the abundance of a prochlorococcus strain (MIT9313PCR, see lines 37-38) measured by Chisholm lab during the AMT13 cruise (Atlantic Meridional Transect Cruise 13) is colocalized with 7 target variables (lines 7-8):

  • ‘MIT9312PCR*Chisholm’, ‘MED4PCR*Chisholm’, and ‘sbact_Chisholm’ from the same source dataset
  • ‘phosphate*WOA*clim’, and ‘nitrate*WOA*clim’ from World Ocean Atlas monthly climatology dataset
  • ‘chl’ from weekly averaged satellite chlorophyll dataset
  • ‘picoprokaryote’ from 3-day averaged Darwin model. Colocalizing this variable will take longer time than others as the 3-day averaged Darwin dataset is massive (multi-decadal global 3D dataset)!

Tip

The space-time cut parameters (lines 41-48) have been set in such a way to encompass the entire source dataset ‘tblAMT13_Chisholm’ (see the dataset page for more details). Notice that the last data point at the source dataset has been measured at ‘2003-10-12 12:44:00’. For simplicity dt2 has been set to ‘2003-10-13’, but you could also use the exact date-time ‘2003-10-12 12:44:00’.

Please review the Example 1 at Match (colocalize) Datasets page since all of the mentioned tips directly apply to this example too.

#!pip install pycmap -q     #uncomment to install pycmap, if necessary
# uncomment the lines below if the API key has not been registered on your machine, previously.
# import pycmap
# pycmap.API(token='YOUR_API_KEY>', vizEngine='plotly')

from collections import namedtuple
from pycmap.viz import plot_corr_map



def match_params():
    Param = namedtuple('Param', ['table', 'variable', 'temporalTolerance', 'latTolerance', 'lonTolerance', 'depthTolerance'])
    params = []
    ######## self-matching: colocalizing with some other variables in the tblAMT13_Chisholm dataset
    params.append(Param('tblAMT13_Chisholm', 'MIT9312PCR_Chisholm', 0, 0, 0, 0))
    params.append(Param('tblAMT13_Chisholm', 'MED4PCR_Chisholm', 0, 0, 0, 0))
    params.append(Param('tblAMT13_Chisholm', 'sbact_Chisholm', 0, 0, 0, 0))
    ####### WOA: World Ocean Atlas Monthly Climatology
    params.append(Param('tblWOA_Climatology', 'nitrate_WOA_clim', 0, .5, .5, 5))
    params.append(Param('tblWOA_Climatology', 'phosphate_WOA_clim', 0, 0.5, 0.5, 5))
    ####### Satellite
    params.append(Param('tblCHL_REP', 'chl', 4, 0.25, 0.25, 0))
    ####### Darwin Model
    params.append(Param('tblDarwin_Phytoplankton', 'picoprokaryote', 2, 0.25, 0.25, 5))


    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = [], [], [], [], [], []
    for i in range(len(params)):
        tables.append(params[i].table)
        variables.append(params[i].variable)
        temporalTolerance.append(params[i].temporalTolerance)
        latTolerance.append(params[i].latTolerance)
        lonTolerance.append(params[i].lonTolerance)
        depthTolerance.append(params[i].depthTolerance)
    return tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance



targetTables, targetVars, temporalTolerance, latTolerance, lonTolerance, depthTolerance = match_params()
go = plot_corr_map(
                  sourceTable='tblAMT13_Chisholm',
                  sourceVar='MIT9313PCR_Chisholm',
                  targetTables=targetTables,
                  targetVars=targetVars,
                  dt1='2003-09-14',
                  dt2='2003-10-13',
                  lat1=-48,
                  lat2=48,
                  lon1=-52,
                  lon2=-11,
                  depth1=0,
                  depth2=240,
                  temporalTolerance=temporalTolerance,
                  latTolerance=latTolerance,
                  lonTolerance=lonTolerance,
                  depthTolerance=depthTolerance
                  )
# here is how to modify the graph:
import numpy as np

# print correlation values
# print(go.z)
# print(go.x)
# print(go.y)
go.z = np.abs(go.z)
go.cmap = 'Greys'
go.width = 1000
go.height = 1000
go.render()