Examples for Data Access and Visualization
Prerequisites
To provide progressive streaming capability for large datasets, the data has been converted to OpenVisus IDX format.
Option 1: Using Conda
Download the environment.yml file from the Pythia OpenVisus Cookbook GitHub repository and use conda to create your environment. This will install all required libraries for data access and visualization.
- Step 1: Download environment.yml from here.
- Step 2: Create the environment using conda:
conda env create -f environment.yml
- Step 3: Activate the environment:
conda activate nsdf-cookbook
For more instructions on managing conda environments, see the official documentation.
Option 2: Using pip
If you prefer to use pip, you can create a Python virtual environment and install the minimal required libraries manually:
- Step 1: Create a new virtual environment:
python -m venv .venv
source .venv/bin/activate
- Step 2: Install the required libraries:
python -m pip install jupyterlab matplotlib requests aiohttp bokeh panel xmltodict colorcet boto3 basemap OpenVisus openvisuspy
Binder
You can use the following link to open the binder and try the notebooks. It might take a while if you are launching it for the first time.
Launch Binder
1. Access LLC4320 ECCO Data
Below are the steps based on the GitHub instructions. Check out this GitHub repo for examples.
- Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
import openvisuspy as ovp
- Step 2: Define the field you want to access
# Available options: salt, theta, u, v, w; choose one below. Here we select w.
variable = 'w'
- Step 3: Load the IDX metadata:
In this section, you can use any variable declared in the cells above and substitute it into LoadDataset. Here we only read the metadata for the dataset.
# Step 3: Load the LLC4320 dataset from OSDF. If salt or theta is selected above, change climate2 to climate1 below.
field = f"pelican://osg-htc.org/nasa/nsdf/climate2/llc4320/idx/{variable}/{variable}_llc4320_x_y_depth.idx"
db=ovp.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
- Step 4: Read Data (since the data is very large, we extract only one z level; check the data descriptions for more details)
# This section shows how to load the data you want. You can select any timestep and region (x, y, z),
# and set the quality (resolution) of the read. Higher quality means finer (more) data. Omitting time reads
# the first available timestep; omitting quality reads full-resolution data, which takes a while because of the larger file size.
# Here you select the resolution at which you query the data: -15 is very coarse, 0 is full resolution
# (dangerous, since you may fetch a lot of data and wait a long time).
data_resolution = -9 # try values among -15, -12, -9, -6, -3, 0
data3D = db.db.read(time=0, quality=data_resolution, z=[0, 1])  # extract only one z level since the data is very large
print(data3D.shape)
print(np.min(data3D),np.max(data3D))
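As a rough mental model of what the quality parameter does (this is an approximation, not the exact OpenVisus algorithm: in the IDX hierarchy each dropped level roughly halves one axis in turn, and the precise order depends on the dataset's bitstring), you can estimate the shape of the array a given quality will return. The full-resolution dimensions below are hypothetical stand-ins; read the real ones from db.getLogicBox().

```python
# Rough sketch: estimate the per-axis size after dropping -quality
# resolution levels, assuming each level halves one axis in turn.
def approx_shape(full_dims, quality):
    """Approximate array shape returned at a given (negative) quality."""
    ndim = len(full_dims)
    dims = list(full_dims)
    for level in range(-quality):
        axis = level % ndim          # levels alternate across axes
        dims[axis] = max(1, dims[axis] // 2)
    return tuple(dims)

# Hypothetical full-resolution dimensions (use db.getLogicBox() for real ones)
full = (8640, 6480, 90)
print(approx_shape(full, -9))   # each axis roughly halved three times
print(approx_shape(full, 0))    # quality 0 leaves the data untouched
```

This makes it easy to pick a quality value that fits in memory before issuing the actual read.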
2. Access DYAMOND Data (Atmospheric - GEOS and Oceanic - LLC2160)
You can follow these steps to access the DYAMOND atmospheric (GEOS) and oceanic (LLC2160) data. You can find the individual data descriptions and field descriptions in the Data Section.
2.1 Access DYAMOND Atmospheric Data (GEOS)
Binder
You can use the following link to open the binder and try the notebooks. It might take a while if you are launching it for the first time.
Launch Binder
Below are the steps to access the DYAMOND Atmospheric (GEOS) data. Check out this GitHub repo for more Jupyter notebook examples.
- Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
- Step 2: Define the field and face you want to access. Remember that the GEOS data is projected to a cubed sphere, so it has 6 faces.
Available options are: CO, CO2, DELP, DTHDT, DTHDTCN, FCLD, H, P, P_TAVG, QI, QL, QV, RI, RL, T, U, V, W. Set the variable based on your selection:
# Example available options: CO, CO2, DELP, DTHDT, DTHDTCN, FCLD, H, P, P_TAVG, QI, QL, QV, RI, RL, T, U, V, W
variable = 'u'
face=0
- Step 3: Load the IDX metadata
This step allows you to read the metadata for the selected field. You can replace the variable in the URL to choose the data you want:
field= f"https://nsdf-climate3-origin.nationalresearchplatform.org:50098/nasa/nsdf/climate3/dyamond/GEOS/GEOS_{variable.upper()}/{variable.lower()}_face_{face}_depth_52_time_0_10269.idx"
db = ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
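Because the cubed-sphere projection gives six face datasets per variable, you may want to iterate over all faces. The sketch below expands the URL template shown above for faces 0 through 5 (the base URL and path pattern are taken from the example; only the face index changes):

```python
# Build the IDX URL for each of the six cubed-sphere faces, following the
# template used in Step 3 above.
base = "https://nsdf-climate3-origin.nationalresearchplatform.org:50098/nasa/nsdf/climate3/dyamond/GEOS"
variable = "u"

urls = [
    f"{base}/GEOS_{variable.upper()}/{variable.lower()}_face_{face}_depth_52_time_0_10269.idx"
    for face in range(6)
]
for url in urls:
    print(url)
```

Each URL can then be passed to ov.LoadDataset exactly as in the single-face example.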
- Step 4: Read Data
This section shows how to load the data for the specified field. You can select any timestep and region (face number) or resolution you want:
# This selects the resolution for querying the data. -15 is very coarse, 0 is full resolution.
# Be cautious: full resolution (0) may take longer to load because of the file size.
data_resolution = -6 # Try values among -15, -12, -9, -6, -3, 0
data3D = db.read(time=0, quality=data_resolution)
print(data3D.shape)
print(np.min(data3D), np.max(data3D))
2.2 Access DYAMOND Oceanic Data (LLC2160)
Below are the steps based on the GitHub instructions:
- Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
import openvisuspy as ovp
- Step 2: Define the field you want to access
# Available options: salt, theta, u, v, w; choose one below
variable = 'salt'
- Step 3: Load the IDX metadata:
In this section, you can use any variable declared in the cells above and substitute it into LoadDataset. Here we only read the metadata for the dataset.
# Step 3: Load the LLC2160 dataset from OSDF
variable='salt' # options are: u,v,w,salt,theta
base_url= "https://nsdf-climate3-origin.nationalresearchplatform.org:50098/nasa/nsdf/climate3/dyamond/"
if variable=="theta" or variable=="w":
base_dir=f"mit_output/llc2160_{variable}/llc2160_{variable}.idx"
elif variable=="u":
base_dir= "mit_output/llc2160_arco/visus.idx"
else:
base_dir=f"mit_output/llc2160_{variable}/{variable}_llc2160_x_y_depth.idx"
field= base_url+base_dir
db=ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
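The URL-selection branching above can be wrapped in a small helper so it is easy to reuse across notebooks. This sketch mirrors the logic in Step 3: u lives under llc2160_arco, theta and w use the llc2160_<var>.idx naming, and the remaining variables use the <var>_llc2160_x_y_depth.idx naming.

```python
# Helper mirroring the URL-selection logic above for the LLC2160 data.
BASE_URL = "https://nsdf-climate3-origin.nationalresearchplatform.org:50098/nasa/nsdf/climate3/dyamond/"

def llc2160_idx_url(variable):
    """Return the IDX URL for an LLC2160 variable (u, v, w, salt, theta)."""
    if variable in ("theta", "w"):
        path = f"mit_output/llc2160_{variable}/llc2160_{variable}.idx"
    elif variable == "u":
        path = "mit_output/llc2160_arco/visus.idx"
    else:  # salt, v
        path = f"mit_output/llc2160_{variable}/{variable}_llc2160_x_y_depth.idx"
    return BASE_URL + path

print(llc2160_idx_url("salt"))
```

The returned string can be passed straight to ov.LoadDataset.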
- Step 4: Read Data
# This section shows how to load the data you want. You can select any timestep and region (x, y, z),
# and set the quality (resolution) of the read. Higher quality means finer (more) data. Omitting time reads
# the first available timestep; omitting quality reads full-resolution data, which takes a while because of the larger file size.
# Here you select the resolution at which you query the data: -15 is very coarse, 0 is full resolution
# (dangerous, since you may fetch a lot of data and wait a long time).
data_resolution = -9 # try values among -15, -12, -9, -6, -3, 0
data3D=db.read(time=0,quality=data_resolution)
print(data3D.shape)
print(np.min(data3D),np.max(data3D))
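Once data3D is in memory, 2D slices are easy to extract for plotting. The sketch below uses a small synthetic array as a stand-in for a real read (the real shape comes from the dataset) and assumes depth is the leading axis, which is what the printed shape above suggests:

```python
import numpy as np

# Stand-in for an array returned by db.read(); this small made-up shape
# (depth, y, x) is for illustration only -- real shapes come from the read.
data3D = np.random.rand(11, 810, 1080).astype(np.float32)

# Extract the surface level (index 0 along the assumed depth axis)
surface = data3D[0, :, :]
print(surface.shape)
```

The resulting 2D array can be handed directly to plt.imshow(surface) for a quick look.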
3. Access NEX GDDP CMIP6 Data
We demonstrate how to load the data from the NEX-GDDP-CMIP6 dataset using OpenVisus and visualize it with matplotlib. Additionally, you can save the plotted data to a file. In just a few lines of Python code, you can generate a plot as shown in Figure 3.1. Feel free to try it from Binder or Quarto (links at the bottom). You can use the link below to open the binder and try the notebooks. It might take a while if you are launching it for the first time.
Launch Binder
Table of Contents
- 3.1 Notebook Code
- 3.1.1 Loading the Data
- 3.1.2 Plotting and Saving Data
3.1 Notebook Code
Below is a sample Jupyter notebook to load one timestep of a selected variable and display it using matplotlib. Use this GitHub example as a reference.
# import libraries
import numpy as np
import OpenVisus as ov
# Set climate variables
model = "ACCESS-CM2"
variable = "huss"
year = 2020
scenario = "ssp585"
field = f"{variable}_day_{model}_{scenario}_r1i1p1f1_gn"
# Open remote dataset to variable db
db = ov.LoadDataset("http://atlantis.sci.utah.edu/mod_visus?dataset=nex-gddp-cmip6&cached=arco")
print("Dataset loaded successfully!")
print(f"Available fields: {db.getFields()}")
3.1.1 Loading the Data
We load a specific timestep (for July 21, 2020) and print the information about the data.
# Set the timestep for July 21. See https://nsidc.org/data/user-resources/help-center/day-year-doy-calendar
day_of_the_year = 202
timestep = year * 365 + day_of_the_year
# Load the data into a numpy array
data = db.read(field=field, time=timestep)
print(f"Data shape: {data.shape}")
print(f"Min value: {np.min(data)}, Max value: {np.max(data)}")
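The formula above (timestep = year * 365 + day_of_year) implies the dataset indexes time on a fixed 365-day, no-leap calendar, which is consistent with July 21 being day 202. Assuming that calendar holds, a small helper can turn a calendar date into a timestep (the fixed month lengths below are that assumption made explicit; February always has 28 days):

```python
# Timestep helper assuming the dataset's fixed 365-day (no-leap) calendar,
# matching the formula used above: timestep = year * 365 + day_of_year.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def day_of_year_noleap(month, day):
    """Day-of-year on a 365-day calendar (February always has 28 days)."""
    return sum(DAYS_IN_MONTH[:month - 1]) + day

def timestep_for(year, month, day):
    return year * 365 + day_of_year_noleap(month, day)

print(day_of_year_noleap(7, 21))   # 202, matching the example above
print(timestep_for(2020, 7, 21))   # 737502
```

This avoids hand-counting days when you want a different date.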
3.1.2 Plotting and Saving Data
Below, we use matplotlib to plot the data and save it as a PNG image.
import matplotlib.pyplot as plt
# Plot and save data
my_cmap = 'gist_rainbow'
plt.subplots(figsize=(18, 9))
plt.imshow(data, cmap=my_cmap, origin='lower')
plt.colorbar(label=f'{variable} values')
plt.title(f'{model} {variable} {scenario} on Day {day_of_the_year}, {year}')
plt.savefig("NEX-GDDP-CMIP6_ACCESS-CM2_huss_ssp585_2020_day202.png")
plt.show()
Figure 3.1: Plot of NEX-GDDP-CMIP6 data (huss, ACCESS-CM2, ssp585)
Dashboard
Check out this NEX GDDP CMIP6 Dashboard we deployed for interactive exploration and visualization of the dataset. This dashboard allows users to select variables, timesteps, and generate visualizations interactively.
Quarto Documentation
Check out this Quarto documentation for more details on accessing the NEX-GDDP-CMIP6 data. The documentation includes step-by-step instructions for loading and visualizing climate model data using Python and OpenVisus.
Pythia OpenVisus Cookbook
The Pythia OpenVisus Cookbook provides a comprehensive guide to working with large-scale scientific data using OpenViSUS. It includes working Jupyter notebooks, conda environment setup, and practical workflows for data access, analysis, and visualization.
Motivation
OpenViSUS enables interactive analysis and visualization of petabyte-scale scientific datasets on any device. The cookbook teaches efficient storage, querying, and visualization using hierarchical Z-order data layouts.
Main Sections
- Preamble: How to cite the NSDF-OpenViSUS Cookbook.
- Introduction: Overview of the NSDF-OpenViSUS framework and its role in scientific data visualization.
- NASA DYAMOND Datasets (C1440–LLC2160): Workflows for visualizing and analyzing NASA DYAMOND atmospheric and ocean datasets.
- ECCO LLC4320 Datasets: Visualization and analysis of the ECCO LLC4320 ocean dataset, including data access and interactive exploration.
Running the Notebooks
- On Binder: Click the rocket icon in any chapter to launch an interactive Jupyter notebook in the cloud. No installation required.
- Locally: Clone the GitHub repository, create and activate the conda environment from environment.yml, and start JupyterLab in the notebooks directory.
# Clone the repository
$ git clone https://github.com/ProjectPythia/nsdf-openvisus-cookbook
$ cd nsdf-openvisus-cookbook
# Create and activate the environment
$ conda env create -f environment.yml
$ conda activate nsdf-cookbook
# Start JupyterLab
$ cd notebooks
$ jupyter lab
References & Further Reading
- National Science Data Fabric
- OpenVisus
- OpenVisuspy
- Web-based Visualization and Analytics of Petascale data
- Interactive Visualization of Terascale Data in the Browser
- Fast Multiresolution Reads of Massive Simulation Datasets
For questions, contact Aashish Panta, Giorgio Scorzelli, or Valerio Pascucci.