Exploring the USGS Science Data Life Cycle in the Cloud



Executive Summary

Traditionally in the USGS, data are processed and analyzed on researchers' local computers, then moved to centralized, remote computers for preservation and publishing (ScienceBase, Pubs Warehouse). This approach requires each researcher to have the necessary hardware and software for processing and analysis, and to download over the internet all external data the workflow requires. In search of a more efficient and effective scientific workflow, we explored an alternate model: storing scientific data remotely and performing data analysis and visualization close to the data, using only a local web browser as an interface. Although this environment was not a good fit for the policies of [...]


Attached Files

Example using Jupyter notebooks and USGS Yeti HPC to compute on an 80GB dataset. (thumbnail, 304.92 KB, image/png)

Project Extension

  • We will create an implementation of the THREDDS Data Server and JupyterHub in the USGS Cloud Hosting Solutions environment.
  • We will spin up a Microsoft Windows server instance to run the Delft Flexible Mesh model, so that data are created directly in the Cloud.
  • We will also spin up multiple Linux server instances and deploy JupyterHub and the THREDDS Data Server via Docker containers.
  • We will also provide enhancements to the existing pyugrid Python package, which implements the UGRID community conventions for unstructured grid models.
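
To make the unstructured-grid idea concrete, the following is a minimal sketch of the kind of data layout the UGRID conventions describe: node coordinates stored separately from a face-to-node connectivity table. It is plain Python and is not the pyugrid API; the class name `TriMesh`, its fields, and the two-triangle example mesh are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TriMesh:
    """Hypothetical container for a 2D triangular mesh (NOT the pyugrid API).

    Mirrors the UGRID idea of node coordinates plus a face_node_connectivity
    table; indices here are assumed to be zero-based.
    """

    node_x: Sequence[float]
    node_y: Sequence[float]
    face_nodes: Sequence[Tuple[int, int, int]]  # three node indices per face

    def _corners(self, face: Tuple[int, int, int]) -> List[Tuple[float, float]]:
        # Look up the (x, y) coordinates of each node that bounds the face.
        return [(self.node_x[i], self.node_y[i]) for i in face]

    def face_areas(self) -> List[float]:
        # Triangle area from the cross product of two edge vectors.
        areas = []
        for face in self.face_nodes:
            (x0, y0), (x1, y1), (x2, y2) = self._corners(face)
            cross = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
            areas.append(0.5 * abs(cross))
        return areas

    def face_centroids(self) -> List[Tuple[float, float]]:
        # Centroid of a triangle is the mean of its three corner coordinates.
        centroids = []
        for face in self.face_nodes:
            corners = self._corners(face)
            cx = sum(x for x, _ in corners) / 3.0
            cy = sum(y for _, y in corners) / 3.0
            centroids.append((cx, cy))
        return centroids


# Illustrative mesh: a unit square split into two triangles.
mesh = TriMesh(
    node_x=[0.0, 1.0, 1.0, 0.0],
    node_y=[0.0, 0.0, 1.0, 1.0],
    face_nodes=[(0, 1, 2), (0, 2, 3)],
)
print(mesh.face_areas())
print(mesh.face_centroids())
```

Because the connectivity table refers to nodes by index, shared nodes are stored once, and per-face quantities such as area can be computed without any assumption of a regular grid.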


  • Community for Data Integration (CDI)



Data source
Input directly
