Prerequisites
To provide progressive streaming capability for large datasets, the data has been converted to OpenVisus IDX format.
Users can create a new Python environment and install the required libraries with the following steps:
Step 1: Create a new virtual environment using Python
# Create a python virtual environment
python -m venv .venv
Step 2: Activate the environment you just created
# Activate the environment
source .venv/bin/activate
Step 3: Install required libraries
# Install required libraries
python -m pip install --verbose --no-cache --no-warn-script-location boto3 colorcet fsspec numpy imageio pympler==1.0.1 urllib3 pillow xarray xmltodict plotly requests scikit-image scipy seaborn tifffile pandas tqdm matplotlib zarr altair cartopy dash fastparquet lxml numexpr scikit-learn sqlalchemy xlrd yfinance pyarrow pydeck netcdf4 nexpy nexusformat nbgitpuller intake ipysheet ipywidgets bokeh ipywidgets-bokeh panel pyvista trame trame-vtk trame-vuetify notebook "jupyterlab==3.6.6" jupyter_bokeh jupyter-server-proxy jupyterlab-system-monitor "pyviz_comms>=2.0.0,<3.0.0" "jupyterlab-pygments>=0.2.0,<0.3.0"
Step 4: Install OpenVisus
# Install OpenVisus
python -m pip install OpenVisus
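A quick, optional way to confirm the installation worked is to try importing the package from the command line:
# Optional: verify that OpenVisus imports without errors
python -c "import OpenVisus"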
Conda Environment File
For convenience, here is a conda environment file you can use to create the environment. Save it as an environment.yml file and create the environment using conda env create -f environment.yml.
If you need more instructions on how to manage conda environments, please check the official documentation here.
# environment.yml file
name: scivis2026
channels:
- conda-forge
dependencies:
- python=3.8
- boto3
- colorcet
- fsspec
- numpy
- imageio
- pympler=1.0.1
- urllib3
- pillow
- xarray
- xmltodict
- plotly
- requests
- scipy
- seaborn
- tifffile
- pandas
- matplotlib
- cartopy
- fastparquet
- lxml
- numexpr
- sqlalchemy
- statsmodels
- xlrd
- intake
- ipysheet
- ipywidgets
- bokeh
- ipywidgets-bokeh
- panel
- notebook
- jupyterlab=3.6.6
- jupyter_bokeh
- jupyter-server-proxy
- jupyterlab-system-monitor
- pyviz_comms>=2.0.0,<3.0.0
- jupyterlab-pygments>=0.2.0,<0.3.0
- OpenVisus
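Once the file is saved, you can create and activate the environment (the environment name scivis2026 comes from the name field above):
# Create the conda environment from the file
conda env create -f environment.yml
# Activate it
conda activate scivis2026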
1. Access LLC4320 ECCO Data
Below are the steps based on the GitHub instructions. Check out this GitHub repo for examples: https://github.com/sci-visus/sciviscontest2026/blob/main/notebooks_examples/ieee_scivis_dyamond_ocean-Copy1.ipynb
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
Step 2: Define the field you want to access
#available options=[salt, theta, u, v, w]; choose one below
variable = 'theta'
Step 3: Load the IDX metadata:
In this section, you can take any of the variables declared in the cells above and pass it to LoadDataset. We are only reading the metadata for the dataset here.
# Step 3: Load the 4320 dataset from OSDF
field= f"https://s3.nsdf.chtc.io/nasa-ecco/llc4320/idx/{variable}/{variable}_llc4320_x_y_depth.idx?access_key=any&secret_key=any&endpoint_url=https://s3.nsdf.chtc.io"
db=ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
Step 4: Read Data (Since the data is very large, we only extract one level. Check the data description for more details.)
# This section shows how to load the data you want. You can select any timestep and any region (x, y, z), and you can also set the quality (resolution) of the data. Higher quality means finer (more) data. If no time is set, the first available timestep is used. If no quality is set, the full-resolution data is returned, which takes a while to load because of the larger file size.
# Here you can select the resolution at which you query the data: -15 is very coarse, 0 is full resolution (dangerous, since you may fetch a lot of data and wait a long time).
data_resolution = -9 # try values among -15, -12, -9, -6, -3, 0
data3D = db.read(time=0, quality=data_resolution, z=[0,1])  # Since the data is very large, we only extract one level.
print(data3D.shape)
print(np.min(data3D),np.max(data3D))
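As a quick sanity check, you can plot the level you just read with matplotlib (imported in Step 1). A minimal sketch, assuming data3D has shape (1, ny, nx) so that data3D[0] is the 2D surface slice:
# Plot the extracted surface level (assumes data3D has shape (1, ny, nx))
plt.figure(figsize=(10, 6))
plt.imshow(data3D[0], cmap='viridis', origin='lower')
plt.colorbar(label=variable)
plt.title(f'{variable} at time 0 (resolution {data_resolution})')
plt.show()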
2. Access DYAMOND Data (Atmospheric - GEOS and Oceanic - LLC2160)
You can follow these steps to access the DYAMOND atmospheric (GEOS) and oceanic (LLC2160) data. You can find the individual data descriptions and field descriptions in the Data section.
2.1 Access DYAMOND Atmospheric Data (GEOS)
Below are the steps to access the DYAMOND Atmospheric (GEOS) data. Check out this GitHub repo for more Jupyter notebook examples.
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
Step 2: Define the field and face you want to access. Remember that the GEOS data is projected to a cubed sphere, so it has 6 faces.
Available options are: CO, CO2, DELP, DTHDT, DTHDTCN, FCLD, H, P, P_TAVG, QI, QL, QV, RI, RL, T, U, V, W. Set the variable based on your selection:
# Example available options: CO, CO2, DELP, DTHDT, DTHDTCN, FCLD, H, P, P_TAVG, QI, QL, QV, RI, RL, T, U, V, W
variable = 'CO'
face=0
Step 3: Load the IDX metadata
This step allows you to read the metadata for the selected field. You can replace the variable in the URL to choose the data you want:
field= f"https://maritime.sealstorage.io/api/v0/s3/utah/nasa/dyamond/GEOS/GEOS_{variable}/{variable}_face_{face}_depth_52_time_0_10269.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco"
db = ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
This section shows how to load the data for the specified field. You can select any timestep and region (face number) or resolution you want:
# This selects the resolution for querying the data. -15 is very coarse, 0 is full resolution.
# Be cautious: full resolution (0) may take longer to load because of the file size.
data_resolution = -6 # Try values among -15, -12, -9, -6, -3, 0
data3D = db.read(time=0, quality=data_resolution)
print(data3D.shape)
print(np.min(data3D), np.max(data3D))
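If you only need a single vertical level instead of the full 3D field, you can restrict the read along z, as in the LLC4320 example above. A minimal sketch (the z bounds here are illustrative; per the dataset URL, the GEOS fields have 52 levels):
# Read only the lowest vertical level of the selected face at coarse resolution
data_level = db.read(time=0, quality=data_resolution, z=[0, 1])
print(data_level.shape)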
2.2 Access DYAMOND Oceanic Data (LLC2160)
Below are the steps based on the GitHub instructions:
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
Step 2: Define the field you want to access
#available options=[salt, theta, u, v, w]; choose one below
variable = 'salt'
Step 3: Load the IDX metadata:
In this section, you can take any of the variables declared in the cells above and pass it to LoadDataset. We are only reading the metadata for the dataset here.
# Step 3: Load the LLC2160 dataset from Sealstorage
field= "https://maritime.sealstorage.io/api/v0/s3/utah/nasa/dyamond/mit_output/llc2160_arco/visus.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco" if variable=="salt" else f"https://maritime.sealstorage.io/api/v0/s3/utah/nasa/dyamond/mit_output/llc2160_{variable}/{variable}_llc2160_x_y_depth.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco"
db=ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
# This section shows how to load the data you want. You can select any timestep and any region (x, y, z), and you can also set the quality (resolution) of the data. Higher quality means finer (more) data. If no time is set, the first available timestep is used. If no quality is set, the full-resolution data is returned, which takes a while to load because of the larger file size.
# Here you can select the resolution at which you query the data: -15 is very coarse, 0 is full resolution (dangerous, since you may fetch a lot of data and wait a long time).
data_resolution = -9 # try values among -15, -12, -9, -6, -3, 0
data3D=db.read(time=0,quality=data_resolution)
print(data3D.shape)
print(np.min(data3D),np.max(data3D))
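To get a feel for how the field evolves in time, you can read a few timesteps at a coarse resolution and print their value ranges. A small sketch under the same assumptions as above; the coarse quality keeps each read fast:
# Print the value range of the first few timesteps at a coarse resolution
timesteps = list(db.getTimesteps())
for t in timesteps[:3]:
    snapshot = db.read(time=t, quality=-12)
    print(t, snapshot.shape, np.min(snapshot), np.max(snapshot))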
3. Access NEX GDDP CMIP6 Data
We demonstrate how to load the data from the NEX-GDDP-CMIP6 dataset using OpenVisus and visualize it with matplotlib. Additionally, you can save the plotted data to a file. In just a few lines of Python code, you can generate a plot as shown in Figure 3.1.
3.1 Notebook Code
Below is a sample Jupyter notebook to load one timestep of a selected variable and display it using matplotlib.
Use this GitHub example as a reference.
# import libraries
import numpy as np
import OpenVisus as ov
# Set climate variables
model = "ACCESS-CM2"
variable = "huss"
year = 2020
scenario = "ssp585"
field = f"{variable}_day_{model}_{scenario}_r1i1p1f1_gn"
# Open remote dataset to variable db
db = ov.LoadDataset("http://atlantis.sci.utah.edu/mod_visus?dataset=nex-gddp-cmip6&cached=arco")
print("Dataset loaded successfully!")
print(f"Available fields: {db.getFields()}")
3.1.1 Loading the Data
We load a specific timestep (for July 21, 2020) and print the information about the data.
# Set the timestep for July 21. See https://nsidc.org/data/user-resources/help-center/day-year-doy-calendar
day_of_the_year = 202
timestep = year * 365 + day_of_the_year
# Load the data into a numpy array
data = db.read(field=field, time=timestep)
print(f"Data shape: {data.shape}")
print(f"Min value: {np.min(data)}, Max value: {np.max(data)}")
3.1.2 Plotting and Saving Data
Below, we use matplotlib to plot the data and save it as a PNG image.
import matplotlib.pyplot as plt
# Plot and save data
my_cmap = 'gist_rainbow'
plt.subplots(figsize=(18, 9))
plt.imshow(data, cmap=my_cmap, origin='lower')
plt.colorbar(label=f'{variable} values')
plt.title(f'{model} {variable} {scenario} on Day {day_of_the_year}, {year}')
plt.savefig("NEX-GDDP-CMIP6_ACCESS-CM2_huss_ssp585_2020_day202.png")
plt.show()
Figure 3.1: Plot of NEX-GDDP-CMIP6 data (huss, ACCESS-CM2, ssp585)
Dashboard
Check out this NEX GDDP CMIP6 Dashboard we deployed for interactive exploration and visualization of the dataset. This dashboard allows users to select variables, timesteps, and generate visualizations interactively.
Quarto Documentation
Check out this Quarto documentation for more details on accessing the NEX-GDDP-CMIP6 data. The documentation includes step-by-step instructions for loading and visualizing climate model data using Python and OpenVisus.
Visualization Example: Using Matplotlib
This page provides resources on generating visualizations with matplotlib and much more. Whether you're new to visualizing scientific data or looking for advanced techniques, you'll find valuable information below.
Below is a basic example of how you can visualize some of the ocean data using Python and matplotlib:
import numpy as np
import matplotlib.pyplot as plt
# Example data (simulated sea surface temperature)
x = np.linspace(0, 10, 100)
y = np.linspace(0, 10, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)
# Create the plot
plt.figure(figsize=(10, 6))
plt.contourf(X, Y, Z, cmap='coolwarm')
plt.colorbar(label='Sea Surface Temperature (C)')
plt.title('Sea Surface Temperature Visualization')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
This is a simple example of visualizing 2D data. You can modify the code to work with the real dataset, adding more complexity and details as needed.
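For example, the same contourf call works on a real 2D slice. A minimal sketch, assuming you have followed the LLC2160 steps above so that data3D is a coarse-resolution 3D snapshot and data3D[0] is a 2D surface slice:
import matplotlib.pyplot as plt
# Contour a real surface slice instead of the synthetic field
plt.figure(figsize=(10, 6))
plt.contourf(data3D[0], cmap='coolwarm')
plt.colorbar(label='Sea surface value')
plt.title('LLC2160 surface slice (coarse resolution)')
plt.show()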
Other Visualization Tools
- Seaborn – An advanced Python library for statistical data visualization.
- ParaView – For handling large datasets and creating 3D visualizations.
- Bokeh – Interactive visualization in modern web browsers.
Data Analysis Resources
For advanced data analysis techniques, we recommend using libraries like NumPy, Pandas, and Xarray. These libraries allow you to handle multi-dimensional arrays and efficiently work with large-scale scientific data.
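For example, a slice read with OpenVisus can be wrapped in a labeled xarray DataArray for easier downstream analysis. A minimal sketch, assuming data3D[0] is a 2D (y, x) slice read as in the sections above (the dimension names are illustrative):
import xarray as xr
# Wrap a 2D slice in a labeled array (dimension names are illustrative)
da = xr.DataArray(data3D[0], dims=("y", "x"), name=variable)
print(da.mean().item(), da.max().item())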
Learning Resources
If you're new to scientific computing and visualization, the following resources may be helpful: