Prerequisites
To provide progressive streaming capability for large datasets, the data has been converted to OpenVisus IDX format.
Users can create a new Python environment and install the required libraries with the following steps:
Step 1: Create a new virtual environment using Python
# Create a python virtual environment
python -m venv .venv
Step 2: Activate the environment you just created
# Activate the environment
source .venv/bin/activate
Step 3: Install required libraries
# Install required libraries
python -m pip install --verbose --no-cache --no-warn-script-location boto3 colorcet fsspec numpy imageio pympler==1.0.1 urllib3 pillow xarray xmltodict plotly requests scikit-image scipy seaborn tifffile pandas tqdm matplotlib zarr altair cartopy dash fastparquet lxml numexpr scikit-learn sqlalchemy xlrd yfinance pyarrow pydeck netcdf4 nexpy nexusformat nbgitpuller intake ipysheet ipywidgets bokeh ipywidgets-bokeh panel pyvista trame trame-vtk trame-vuetify notebook "jupyterlab==3.6.6" jupyter_bokeh jupyter-server-proxy jupyterlab-system-monitor "pyviz_comms>=2.0.0,<3.0.0" "jupyterlab-pygments>=0.2.0,<0.3.0"
Step 4: Install OpenVisus
# Install OpenVisus
python -m pip install OpenVisus
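A quick, optional way to confirm the installation worked is to try importing the package from the command line:
# Optional: verify that OpenVisus imports without errors
python -c "import OpenVisus"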
Conda Environment File
For convenience, here is a conda environment file you can use to create the environment. Save it as an environment.yml file and create the environment using conda env create -f environment.yml.
If you need more instructions on how to manage conda environments, please check the official documentation here.
# environment.yml file
name: scivis2026
channels:
- conda-forge
dependencies:
- python=3.8
- boto3
- colorcet
- fsspec
- numpy
- imageio
- pympler=1.0.1
- urllib3
- pillow
- xarray
- xmltodict
- plotly
- requests
- scipy
- seaborn
- tifffile
- pandas
- matplotlib
- cartopy
- fastparquet
- lxml
- numexpr
- sqlalchemy
- statsmodels
- xlrd
- intake
- ipysheet
- ipywidgets
- bokeh
- ipywidgets-bokeh
- panel
- notebook
- jupyterlab=3.6.6
- jupyter_bokeh
- jupyter-server-proxy
- jupyterlab-system-monitor
- pyviz_comms>=2.0.0,<3.0.0
- jupyterlab-pygments>=0.2.0,<0.3.0
- OpenVisus
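Once the file is saved, you can create and activate the environment (the environment name scivis2026 comes from the name field above):
# Create the conda environment from the file
conda env create -f environment.yml
# Activate it
conda activate scivis2026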
1. Access LLC4320 ECCO Data
Below are the steps based on the GitHub instructions. Check out this GitHub repo for examples: https://github.com/sci-visus/sciviscontest2026/blob/main/notebooks_examples/ieee_scivis_dyamond_ocean-Copy1.ipynb
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
Step 2: Define the field you want to access
#available options=[salt, theta, u, v, w]; choose one below
variable = 'theta'
Step 3: Load the IDX metadata:
In this section, you can take any of the variables declared in the cells above and pass it to LoadDataset. We are only reading the metadata for the dataset here.
# Step 3: Load the 4320 dataset from OSDF
field= f"https://s3.nsdf.chtc.io/nasa-ecco/llc4320/idx/{variable}/{variable}_llc4320_x_y_depth.idx?access_key=any&secret_key=any&endpoint_url=https://s3.nsdf.chtc.io"
db=ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
Step 4: Read Data (Since the data is very large, we only extract one level. Check the data description for more details.)
# This section shows how to load the data you want. You can select any timestep and any region (x, y, z), and you can also set the quality (resolution) of the data. Higher quality means finer (more) data. If no time is set, the first available timestep is used. If no quality is set, the full-resolution data is returned, which takes a while to load because of the larger file size.
# Here you can select the resolution at which you query the data: -15 is very coarse, 0 is full resolution (dangerous, since you may fetch a lot of data and wait a long time).
data_resolution = -9 # try values among -15, -12, -9, -6, -3, 0
data3D = db.read(time=0, quality=data_resolution, z=[0,1])  # Since the data is very large, we only extract one level.
print(data3D.shape)
print(np.min(data3D),np.max(data3D))
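As a quick sanity check, you can plot the level you just read with matplotlib (imported in Step 1). A minimal sketch, assuming data3D has shape (1, ny, nx) so that data3D[0] is the 2D surface slice:
# Plot the extracted surface level (assumes data3D has shape (1, ny, nx))
plt.figure(figsize=(10, 6))
plt.imshow(data3D[0], cmap='viridis', origin='lower')
plt.colorbar(label=variable)
plt.title(f'{variable} at time 0 (resolution {data_resolution})')
plt.show()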
2. Access DYAMOND Data (Atmospheric - GEOS and Oceanic - LLC2160)
You can follow these steps to access the DYAMOND atmospheric (GEOS) and oceanic (LLC2160) data. You can find the individual data descriptions and field descriptions in the Data section.
2.1 Access DYAMOND Atmospheric Data (GEOS)
Below are the steps to access the DYAMOND Atmospheric (GEOS) data. Check out this GitHub repo for more Jupyter notebook examples.
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
Step 2: Define the field and face you want to access. Remember that the GEOS data is projected to a cubed sphere, so it has 6 faces.
Available options are: CO, CO2, DELP, DTHDT, DTHDTCN, FCLD, H, P, P_TAVG, QI, QL, QV, RI, RL, T, U, V, W. Set the variable based on your selection:
# Example available options: CO, CO2, DELP, DTHDT, DTHDTCN, FCLD, H, P, P_TAVG, QI, QL, QV, RI, RL, T, U, V, W
variable = 'CO'
face=0
Step 3: Load the IDX metadata
This step allows you to read the metadata for the selected field. You can replace the variable in the URL to choose the data you want:
field= f"https://maritime.sealstorage.io/api/v0/s3/utah/nasa/dyamond/GEOS/GEOS_{variable}/{variable}_face_{face}_depth_52_time_0_10269.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco"
db = ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
This section shows how to load the data for the specified field. You can select any timestep and region (face number) or resolution you want:
# This selects the resolution for querying the data. -15 is very coarse, 0 is full resolution.
# Be cautious: full resolution (0) may take longer to load because of the file size.
data_resolution = -6 # Try values among -15, -12, -9, -6, -3, 0
data3D = db.read(time=0, quality=data_resolution)
print(data3D.shape)
print(np.min(data3D), np.max(data3D))
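If you only need a single vertical level instead of the full 3D field, you can restrict the read along z, as in the LLC4320 example above. A minimal sketch (the z bounds here are illustrative; per the dataset URL, the GEOS fields have 52 levels):
# Read only the lowest vertical level of the selected face at coarse resolution
data_level = db.read(time=0, quality=data_resolution, z=[0, 1])
print(data_level.shape)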
2.2 Access DYAMOND Oceanic Data (LLC2160)
Below are the steps based on the GitHub instructions:
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import OpenVisus as ov
Step 2: Define the field you want to access
#available options=[salt, theta, u, v, w]; choose one below
variable = 'salt'
Step 3: Load the IDX metadata:
In this section, you can take any of the variables declared in the cells above and pass it to LoadDataset. We are only reading the metadata for the dataset here.
# Step 3: Load the LLC2160 dataset from Sealstorage
field= "https://maritime.sealstorage.io/api/v0/s3/utah/nasa/dyamond/mit_output/llc2160_arco/visus.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco" if variable=="salt" else f"https://maritime.sealstorage.io/api/v0/s3/utah/nasa/dyamond/mit_output/llc2160_{variable}/{variable}_llc2160_x_y_depth.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco"
db=ov.LoadDataset(field)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
# This section shows how to load the data you want. You can select any timestep and any region (x, y, z), and you can also set the quality (resolution) of the data. Higher quality means finer (more) data. If no time is set, the first available timestep is used. If no quality is set, the full-resolution data is returned, which takes a while to load because of the larger file size.
# Here you can select the resolution at which you query the data: -15 is very coarse, 0 is full resolution (dangerous, since you may fetch a lot of data and wait a long time).
data_resolution = -9 # try values among -15, -12, -9, -6, -3, 0
data3D=db.read(time=0,quality=data_resolution)
print(data3D.shape)
print(np.min(data3D),np.max(data3D))
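To get a feel for how the field evolves in time, you can read a few timesteps at a coarse resolution and print their value ranges. A small sketch under the same assumptions as above; the coarse quality keeps each read fast:
# Print the value range of the first few timesteps at a coarse resolution
timesteps = list(db.getTimesteps())
for t in timesteps[:3]:
    snapshot = db.read(time=t, quality=-12)
    print(t, snapshot.shape, np.min(snapshot), np.max(snapshot))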
3. Access NEX GDDP CMIP6 Data
We demonstrate how to load the data from the NEX-GDDP-CMIP6 dataset using OpenVisus and visualize it with matplotlib. Additionally, you can save the plotted data to a file. In just a few lines of Python code, you can generate a plot as shown in Figure 3.1.
3.1 Notebook Code
Below is a sample Jupyter notebook to load one timestep of a selected variable and display it using matplotlib.
Use this GitHub example as a reference.
# import libraries
import numpy as np
import OpenVisus as ov
# Set climate variables
model = "ACCESS-CM2"
variable = "huss"
year = 2020
scenario = "ssp585"
field = f"{variable}_day_{model}_{scenario}_r1i1p1f1_gn"
# Open remote dataset to variable db
db = ov.LoadDataset("http://atlantis.sci.utah.edu/mod_visus?dataset=nex-gddp-cmip6&cached=arco")
print("Dataset loaded successfully!")
print(f"Available fields: {db.getFields()}")
3.1.1 Loading the Data
We load a specific timestep (for July 21, 2020) and print the information about the data.
# Set the timestep for July 21. See https://nsidc.org/data/user-resources/help-center/day-year-doy-calendar
day_of_the_year = 202
timestep = year * 365 + day_of_the_year
# Load the data into a numpy array
data = db.read(field=field, time=timestep)
print(f"Data shape: {data.shape}")
print(f"Min value: {np.min(data)}, Max value: {np.max(data)}")
3.1.2 Plotting and Saving Data
Below, we use matplotlib to plot the data and save it as a PNG image.
import matplotlib.pyplot as plt
# Plot and save data
my_cmap = 'gist_rainbow'
plt.subplots(figsize=(18, 9))
plt.imshow(data, cmap=my_cmap, origin='lower')
plt.colorbar(label=f'{variable} values')
plt.title(f'{model} {variable} {scenario} on Day {day_of_the_year}, {year}')
plt.savefig("NEX-GDDP-CMIP6_ACCESS-CM2_huss_ssp585_2020_day202.png")
plt.show()
Figure 3.1: Plot of NEX-GDDP-CMIP6 data (huss, ACCESS-CM2, ssp585)
Dashboard
Check out this NEX GDDP CMIP6 Dashboard we deployed for interactive exploration and visualization of the dataset. This dashboard allows users to select variables, timesteps, and generate visualizations interactively.
Quarto Documentation
Check out this Quarto documentation for more details on accessing the NEX-GDDP-CMIP6 data. The documentation includes step-by-step instructions for loading and visualizing climate model data using Python and OpenVisus.
Visualization Example: Using Matplotlib
This page provides resources on generating visualizations with matplotlib and much more. Whether you're new to visualizing scientific data or looking for advanced techniques, you'll find valuable information below.
Below is a basic example of how you can visualize some of the ocean data using Python and matplotlib:
import numpy as np
import matplotlib.pyplot as plt
# Example data (simulated sea surface temperature)
x = np.linspace(0, 10, 100)
y = np.linspace(0, 10, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)
# Create the plot
plt.figure(figsize=(10, 6))
plt.contourf(X, Y, Z, cmap='coolwarm')
plt.colorbar(label='Sea Surface Temperature (C)')
plt.title('Sea Surface Temperature Visualization')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
This is a simple example of visualizing 2D data. You can modify the code to work with the real dataset, adding more complexity and details as needed.
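For example, the same contourf call works on a real 2D slice. A minimal sketch, assuming you have followed the LLC2160 steps above so that data3D is a coarse-resolution 3D snapshot and data3D[0] is a 2D surface slice:
import matplotlib.pyplot as plt
# Contour a real surface slice instead of the synthetic field
plt.figure(figsize=(10, 6))
plt.contourf(data3D[0], cmap='coolwarm')
plt.colorbar(label='Sea surface value')
plt.title('LLC2160 surface slice (coarse resolution)')
plt.show()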
Other Visualization Tools
- Seaborn – An advanced Python library for statistical data visualization.
- ParaView – For handling large datasets and creating 3D visualizations.
- Bokeh – Interactive visualization in modern web browsers.
Data Analysis Resources
For advanced data analysis techniques, we recommend using libraries like NumPy, Pandas, and Xarray. These libraries allow you to handle multi-dimensional arrays and efficiently work with large-scale scientific data.
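For example, a slice read with OpenVisus can be wrapped in a labeled xarray DataArray for easier downstream analysis. A minimal sketch, assuming data3D[0] is a 2D (y, x) slice read as in the sections above (the dimension names are illustrative):
import xarray as xr
# Wrap a 2D slice in a labeled array (dimension names are illustrative)
da = xr.DataArray(data3D[0], dims=("y", "x"), name=variable)
print(da.mean().item(), da.max().item())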
Learning Resources
If you're new to scientific computing and visualization, the following resources may be helpful: