1. Prepare data and metadata
2. Create a new record in IPDS
3. Create a new landing page
4. Finalize metadata
5. Decide how to organize and display data
6. Upload files and edit the landing page
7. Format citation
8. Final steps
► For an overview of the ScienceBase data release process, view the tutorial video.
Frequently Asked Questions
- Where can I find information about how to create and/or review a metadata record?
- How can I grant read/write permissions to USGS and non-USGS users while a data release is still in progress?
- What if I need to update my data after they have been released?
- Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?
- Why is CSV format recommended instead of Excel?
- What is the file size limit for uploading and downloading files?
- Can I release legacy data in ScienceBase?
- A). My data release is associated with a publication. How will the two reference each other?
B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?
- Which repository should I use to release code?
- What repository services does ScienceBase provide for USGS data release products?
Links to additional information:
- The USGS Fundamental Science Practices (FSP) website contains an FAQ page about data release and a guide to the publishing path options.
- The USGS data management website contains a guide to the steps of data release, with links to tools and resources.
- Before beginning the ScienceBase data release workflow, scientists should view the ScienceBase User Agreement.
- Options to browse the public data release products in ScienceBase.
1. Prepare data and metadata
A data release should contain only data and metadata.
- A best practice is to release data in an open, machine-readable format. For example, tabular data in .csv or .txt format is preferable to Excel.
- Data obtained from published sources do not need to be included - simply document the source and methods in your metadata.
- Proprietary or sensitive data should not be included.
- Metadata should be in XML format and should conform to an FGDC-endorsed metadata standard: FGDC CSDGM* or ISO**.
*Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata
**International Organization for Standardization
- Learn about metadata creation tools and how to finalize metadata for ScienceBase.
The tutorial video "Structuring and Documenting a USGS Public Data Release" can help you decide how many data and metadata files to include in your data release.
According to USGS Fundamental Science Practices (FSP) guidance, a data release is an information product that is non-interpretive and does not include extended descriptions beyond what is required in the full metadata record. Extended text descriptions, figures, maps, and files in PDF format are more appropriate for USGS series publications handled by the USGS Science Publishing Network (SPN).
2. Create a new record in IPDS
► Data and metadata should be reviewed and approved according to the USGS Fundamental Science Practices (FSP) process.
The review process is tracked in the Information Product Data System (IPDS). When you create a new record in IPDS, select "Data Release" in the Product Type dropdown menu.
New records in IPDS are assigned an IP number. Each ScienceBase data release product should correspond to only one IP number. That is, materials that are reviewed and approved together in IPDS should be released together in one data release product. Materials that are reviewed and approved separately should be released as separate products.
Data releases often have associated manuscripts that also go through review. In these cases, the review processes are separate. There should be an IPDS record for the data release and another for the manuscript.
For more information on data and metadata reviews, see section 5 on the USGS data management website page: https://www2.usgs.gov/datamanagement/share/datarelease.php.
3. Create a new landing page
Sign in to the ScienceBase Data Release Tool and provide some basic information about the data you are releasing. Upon submitting the form, you will receive an automated email with a link to your new landing page and a reserved Digital Object Identifier (DOI).
4. Finalize metadata
Note: The following instructions are for metadata records in the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) format. The USGS metadata creation tools create metadata in this format: the Online Metadata Editor, the Metadata Wizard (Esri toolbox), and the Metadata Wizard 2.x (standalone desktop application).
Digital Object Identifier
USGS policy requires the use of a Digital Object Identifier (DOI) for data releases. If you use the ScienceBase Data Release Tool to start a new data release, you will have the option to reserve a DOI for your landing page.
► Add your full DOI URL (e.g., https://doi.org/10.5066/xxxxxxxx) to your metadata:
- Add the DOI to the online linkage element (<onlink>) in the citation information section of your metadata (instructions). Full XML path: idinfo/citation/citeinfo/onlink
- We also recommend adding the DOI to the network resource element (<networkr>) in the distribution section (some advanced metadata authors use this field for direct data download links). Full XML path: distinfo/stdorder/digform/digtopt/onlinopt/computer/networka/networkr
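As a rough illustration of the two edits above, the following Python sketch uses the standard library's xml.etree.ElementTree to write a DOI into both elements. The sample record, the helper name set_doi, and the placeholder DOI are illustrative assumptions and not part of any USGS tool. The distribution path uses the CSDGM short name onlinopt.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CSDGM record; a real record has many more elements.
SAMPLE = """<metadata>
  <idinfo><citation><citeinfo>
    <title>Example data release</title>
  </citeinfo></citation></idinfo>
  <distinfo><stdorder><digform><digtopt><onlinopt><computer>
    <networka></networka>
  </computer></onlinopt></digtopt></digform></stdorder></distinfo>
</metadata>"""

# Parent path and child tag for each place the DOI should appear.
DOI_TARGETS = (
    ("idinfo/citation/citeinfo", "onlink"),
    ("distinfo/stdorder/digform/digtopt/onlinopt/computer/networka", "networkr"),
)


def set_doi(root, doi_url):
    """Write doi_url into <onlink> and <networkr>, creating each if absent."""
    for parent_path, tag in DOI_TARGETS:
        parent = root.find(parent_path)
        if parent is None:
            raise ValueError(f"missing parent element: {parent_path}")
        element = parent.find(tag)
        if element is None:
            # Appended as the last child; see the note on element order below.
            element = ET.SubElement(parent, tag)
        element.text = doi_url


root = ET.fromstring(SAMPLE)
set_doi(root, "https://doi.org/10.5066/xxxxxxxx")  # placeholder DOI
print(root.findtext("idinfo/citation/citeinfo/onlink"))
```

The sketch appends a missing element as the last child of its parent, which may not match the element order the standard expects in a fuller record, so the result should still be checked with a metadata validation tool.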
Please include the following content in the distribution section of your metadata (instructions).
► Distribution contact information: If you search for "GS ScienceBase" in the directory look-up tool of the Online Metadata Editor or the Metadata Wizard, you can auto-populate this content into your metadata.
- Contact Organization and/or Contact Person: "U.S. Geological Survey - ScienceBase"
- Contact Address: "Denver Federal Center, Building 810, Mail Stop 302" "Denver" "CO" "80225"
- Contact Phone: "1-888-275-8747"
- Contact Email: "firstname.lastname@example.org"
► Distribution liability statement: please select the USGS disclaimer statement(s) that are relevant to your data release. Disclaimer statements are available on the FSP website.
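For authors who edit metadata by script rather than through the directory look-up tool, the distribution contact listed above could be assembled roughly as follows. This is a sketch under assumptions: the helper name build_distributor is hypothetical, the addrtype value is a guess that the guidance above does not specify, and the email address is passed in by the caller.

```python
import xml.etree.ElementTree as ET


def build_distributor(email):
    """Return a CSDGM <distrib> element holding the ScienceBase contact details."""
    distrib = ET.Element("distrib")
    cntinfo = ET.SubElement(distrib, "cntinfo")

    # Organization-primary contact block.
    cntorgp = ET.SubElement(cntinfo, "cntorgp")
    ET.SubElement(cntorgp, "cntorg").text = "U.S. Geological Survey - ScienceBase"

    cntaddr = ET.SubElement(cntinfo, "cntaddr")
    # Assumption: the address type is not given in the guidance above.
    ET.SubElement(cntaddr, "addrtype").text = "mailing and physical"
    ET.SubElement(cntaddr, "address").text = (
        "Denver Federal Center, Building 810, Mail Stop 302"
    )
    ET.SubElement(cntaddr, "city").text = "Denver"
    ET.SubElement(cntaddr, "state").text = "CO"
    ET.SubElement(cntaddr, "postal").text = "80225"

    ET.SubElement(cntinfo, "cntvoice").text = "1-888-275-8747"
    ET.SubElement(cntinfo, "cntemail").text = email
    return distrib


# "contact@example.org" is a placeholder, not the real ScienceBase address.
distributor = build_distributor("contact@example.org")
print(ET.tostring(distributor, encoding="unicode"))
```

The liability statement is left out on purpose, because the relevant disclaimer text has to be chosen from the FSP website for each data release.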
5. Decide how to organize and display data
ScienceBase data releases can be organized in several ways. The ScienceBase team has recorded a tutorial video to help scientists determine the best way to structure and document their data releases.
Please upload only one metadata record per page in ScienceBase, because the USGS Science Data Catalog, which harvests metadata records from data releases in ScienceBase, can pull only one metadata record from a page. Additional records can be uploaded if they are inside zipped files.
► If you have one metadata record to describe your data, upload your files directly to the landing page (example).
► If you have multiple metadata records and data sets, you have two options:
- Create subpages that are nested under the landing page (example). Use this option if you would like your data sets to be independently discoverable. Nested pages in ScienceBase are called "child items". To create a new child item, click the "Add" dropdown menu, then select “Add Child Item”. On each child item, upload one metadata record and its associated data file(s). All metadata records will be harvested by the Science Data Catalog.
- Upload data and metadata directly to the landing page in zipped bundles (example). There should be one metadata record uploaded separately - a summary metadata record that describes the entire data release. The summary metadata record will be the only one harvested by the Science Data Catalog.
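For the zipped-bundle option, one possible way to package a data file together with its metadata record is sketched below using Python's standard zipfile module. The helper name make_bundle and all file names are hypothetical. The summary metadata record would still be uploaded separately, outside any bundle.

```python
import tempfile
import zipfile
from pathlib import Path


def make_bundle(bundle_path, files):
    """Write the given files into one zip archive, stored by file name only."""
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for file_path in files:
            file_path = Path(file_path)
            # arcname drops the directory part so the archive has a flat layout.
            bundle.write(file_path, arcname=file_path.name)
    return Path(bundle_path)


# Demonstration with throwaway files; the names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    data_file = tmp_dir / "site_a.csv"
    metadata_file = tmp_dir / "site_a_metadata.xml"
    data_file.write_text("site,value\nA,1\n")
    metadata_file.write_text("<metadata></metadata>")

    bundle = make_bundle(tmp_dir / "site_a.zip", [data_file, metadata_file])
    with zipfile.ZipFile(bundle) as archive:
        print(sorted(archive.namelist()))
```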
If you would like to display an image on a ScienceBase page, upload the image in .JPG or .PNG format. The image will be automatically displayed on the page.
ScienceBase can generate web services for certain geospatial file types (shapefiles, GeoTIFF and ESRI Service Definition (.SD) files). The web services can be used to serve the data to outside applications and to display the data in the preview map on a ScienceBase page. For more information, see the ScienceBase Geospatial Services page.
6. Upload files and edit the landing page
Note: The current file size limit for uploads in ScienceBase is 10 GB per file. If any of your files exceed 10 GB, please contact email@example.com. There is also a limit of 100 files that can be attached to a single item. A zip file bundle can contain multiple files. For guidance on structuring data for efficient release and optimizing presentation, please review point #5 above or contact the ScienceBase Data Release team (firstname.lastname@example.org).
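Before uploading, a local folder could be checked against these limits with a short script such as the sketch below. The function names are hypothetical, and the script assumes that 1 GB means 1024**3 bytes, which this page does not state. The 1 GB threshold for the Large File Uploader comes from the FAQ on this page.

```python
from pathlib import Path

GB = 1024 ** 3  # assumption: binary gigabytes
MAX_FILES_PER_ITEM = 100
MAX_UPLOAD_BYTES = 10 * GB
LARGE_FILE_BYTES = 1 * GB


def classify_size(size_bytes):
    """Suggest an upload route for a file of the given size."""
    if size_bytes > MAX_UPLOAD_BYTES:
        return "contact ScienceBase"
    if size_bytes > LARGE_FILE_BYTES:
        return "Large File Uploader"
    return "standard upload"


def check_item_folder(folder):
    """Summarize the files in a folder intended for one ScienceBase item."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    return {
        "too_many_files": len(files) > MAX_FILES_PER_ITEM,
        # Loose XML files are counted as metadata records (a simplification).
        "metadata_records": sum(1 for p in files if p.suffix.lower() == ".xml"),
        "files": {p.name: classify_size(p.stat().st_size) for p in files},
    }


print(classify_size(3 * GB))
```

A metadata_records value greater than 1 suggests that the extra records belong on child items or inside zipped bundles.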
► The most efficient way to populate an empty ScienceBase page is to start by uploading an XML metadata record in an FGDC-endorsed format. Click the "Add" dropdown menu on the upper right side of the page, then select "Attach Files":
When you upload a metadata record, ScienceBase will recognize the format and bring up a popup window to ask if you would like to pull content from the metadata:
Select "Yes" to automatically populate the key fields in the edit form. You may still need to manually edit some of the information. Click "Save" to save your changes.
► To edit your page, click the "Manage Item" dropdown menu on the upper right side of the page, then select "Edit Item":
► To add a child item (subpage nested under the landing page), click the "Add" dropdown menu, then select "Add Child Item":
► If you need to give additional people access to your ScienceBase item while it is private, click the "Manage Item" dropdown menu, then select "Manage Item Permissions". You can then search for ScienceBase user accounts and grant read/write permissions.
To share a private data release with people outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page:
You can generate a temporary URL to share with your reviewers, who can view the data release without having to sign up for a ScienceBase account. (Note: the data release will be locked for editing while the link is active).
7. Format citation
The data release citation should include each author (last name, first and middle initials), the year, the title, the publication type (U.S. Geological Survey data release), and the Digital Object Identifier link. ScienceBase can automatically generate citations from the content of uploaded metadata records, but the citation format usually needs to be modified. Please verify that automatically generated citations have the correct format and author order. The citation field can be edited in the first tab of the edit form.
If a data release has child items, the ScienceBase team will propagate the landing page citation to all child items, so only the landing page citation needs to be edited.
Cartwright, J.M., 2015, Hydrologic and soil data collected in limestone cedar glades at Stones River National Battlefield, Tennessee: U.S. Geological Survey data release, https://doi.org/10.5066/F7NV9G9C.
Coates, P.S., Casazza, M.L., Ricca, M.A., Brussee, B.E., Blomberg, E.J., Gustufson, K.B., Overton, C.T., Davis, D.M., Niell, L.E., Espinosa, S.C., Gardner, S.C., and Delehanty, D.J., 2015, Integrating spatially explicit indices of abundance and habitat quality: an applied example for greater sage-grouse management: U.S. Geological Survey data release, https://doi.org/10.5066/F75D8PW8.
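The pattern in the two example citations can be expressed as a small formatting function. The sketch below is illustrative only: the name format_citation is hypothetical, and the author-joining rule (", and" before the final author) is inferred from the examples rather than taken from a style specification.

```python
def format_citation(authors, year, title, doi):
    """Build a data release citation.

    authors: list of "Last, F.M." strings, already in the desired order.
    doi: either a bare DOI ("10.5066/...") or a full https://doi.org/ URL.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        author_text = authors[0]
    else:
        author_text = ", ".join(authors[:-1]) + ", and " + authors[-1]
    doi_url = doi if doi.startswith("https://doi.org/") else "https://doi.org/" + doi
    return (
        f"{author_text}, {year}, {title}: "
        f"U.S. Geological Survey data release, {doi_url}."
    )


print(format_citation(
    ["Cartwright, J.M."],
    2015,
    "Hydrologic and soil data collected in limestone cedar glades at "
    "Stones River National Battlefield, Tennessee",
    "10.5066/F7NV9G9C",
))
```

Called with the values from the first example, the function reproduces that citation, which gives a quick way to compare an automatically generated citation against the expected format.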
For more information, see guidance at: https://www.usgs.gov/products/data-and-tools/data-management/data-citation
8. Final steps
► When you are ready to make the data release public, contact your ScienceBase point of contact or email@example.com. If your data release has a related primary publication, please share the publication's DOI or IPDS number with your point of contact.
Your point of contact will check the data release against the checklist and share any recommendations they have. Please allow up to 2 business days for completion of this step.
When the data release has been finalized, the ScienceBase team will make it public and it will no longer be open for modifications. They will also register the DOI so that it's an active link.
You can use the recommended citation on the landing page to cite your data. If you cite the data in a publication, please send the publication's citation to firstname.lastname@example.org so that it can be added to the landing page.
Frequently Asked Questions
Where can I find information about how to create and/or review a metadata record?
See the USGS data management website: https://www2.usgs.gov/datamanagement/describe/metadata.php.
The USGS has two tools for metadata creation: the Online Metadata Editor (OME) and the Metadata Wizard. In both tools, users fill out a form by answering questions about their data. They can then generate and output XML metadata records in the correct format. The OME is an online application and the Metadata Wizard is a desktop application. The Wizard is recommended for geospatial data and tabular data because it has the ability to parse information from certain geospatial file types, as well as automate the process of describing column (and value) definitions.
The USGS Metadata Parser tool (https://mrdata.usgs.gov/validation/) allows users to validate an XML metadata file against the FGDC CSDGM standard and view it in an easy-to-read format.
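Before submitting a record to the Metadata Parser, a quick local check can catch XML that is not well-formed or that lacks the elements discussed on this page. The sketch below is not a substitute for validation against the standard. The list of paths is an assumption drawn from the guidance on this page, and the function name missing_elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a handful of paths mentioned in this guidance, not the full standard.
REQUIRED_PATHS = (
    "idinfo/citation/citeinfo/title",
    "idinfo/citation/citeinfo/onlink",
    "distinfo/distrib/cntinfo",
    "distinfo/distliab",
)


def missing_elements(xml_text):
    """Return the paths from REQUIRED_PATHS that are absent or empty.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED_PATHS:
        element = root.find(path)
        has_text = element is not None and bool((element.text or "").strip())
        has_children = element is not None and len(element) > 0
        if not (has_text or has_children):
            missing.append(path)
    return missing


record = """<metadata>
  <idinfo><citation><citeinfo><title>Example</title></citeinfo></citation></idinfo>
</metadata>"""
for path in missing_elements(record):
    print("missing:", path)
```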
How can I grant read/write permissions to USGS and non-USGS users while a data release is still in progress?
- To give permissions to USGS employees and other users with ScienceBase accounts, select the "Manage Item" dropdown menu, then "Manage Item Permissions":
Select "Custom Permissions". Enter a user’s name or email address into the "User" text box. Wait for the autocomplete to find the user's ScienceBase account, then select it and click "Add".
ScienceBase accounts are automatically created for users the first time they log in with their Active Directory credentials. If someone hasn't logged in to ScienceBase before, they won’t yet have an account. Users without Active Directory credentials can request a ScienceBase account if they are collaborating with USGS partners.
- To share a private data release with someone outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page:
Select "Create New Anonymous Entry Link". This will create a temporary URL you can share with reviewers, allowing them to view the data release without having to sign up for a ScienceBase account. The data release will be locked for editing while the link is active. To unlock, select "Manage Anonymous Access Links" again and remove the link.
What if I need to update my data after they have been released?
The USGS Fundamental Science Practices (FSP) website describes procedures for documenting revisions to data releases. Please follow this guidance if you need to correct or add to published data. Contact the ScienceBase team at email@example.com when you are ready to update your data release.
Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?
Yes, by default ScienceBase will automatically perform this function for authors. Metadata records attached to a formal USGS data release product in ScienceBase will be sent to the USGS Science Data Catalog (SDC) after the data release is finalized.
Some science centers and programs have alternate methods of submitting metadata records to the SDC and may not wish for their records to be sent from ScienceBase. This option is also supported; ScienceBase keeps a list of these centers, and XML records associated with their data release products will not be sent from ScienceBase. If you would like to add your center to this list, please contact firstname.lastname@example.org.
Why is CSV format recommended instead of Excel?
Comma-separated values format (.csv) is preferable to Microsoft Excel format (.xlsx) because .csv is often more machine-readable and can be more easily incorporated into other workflows. While both .csv and .xlsx are considered open formats (that is, you don't need proprietary software to view them), .xlsx supports features that can make it less machine-readable. For example, if there are multiple worksheets in an Excel workbook or if some of the information is conveyed through formatting, it would be more difficult to use or work with the data in other applications (e.g., Python, R).
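One practical consequence of machine-readability is that a .csv file can be checked with a few lines of code and no extra software. The sketch below uses Python's standard csv module to find rows whose column count differs from the header row. The function name ragged_rows and the sample text are hypothetical.

```python
import csv
import io


def ragged_rows(csv_text):
    """Return line numbers of rows whose column count differs from the header.

    The number reported is the line on which the row ends, which matters only
    when a quoted field spans several lines.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare
    return [reader.line_num for row in reader if len(row) != len(header)]


sample = "site,value\nA,1\nB\nC,3\n"
print(ragged_rows(sample))
```

A similar check on an Excel workbook would first require a third-party reader and a decision about which worksheet to inspect.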
What is the file size limit for uploading and downloading files?
The current file size limit for uploads and downloads in ScienceBase is 10 GB. Files larger than 1 GB should be uploaded using the Large File Uploader tool available in the “Item Actions” section at the bottom of a ScienceBase page.
Can I release legacy data in ScienceBase?
Yes, but ScienceBase has a formal process for publicly releasing data, which enables the ScienceBase team to catalog, track, and update these resources in a uniform way. If you would like to release your legacy data in ScienceBase, you will need to go through FSP review and work with the ScienceBase team.
A). My data release is associated with a publication. How will the two reference each other?
B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?
A). The citation will be added to the landing page in the "Related External Resources" section (see example). In associated publications, data release citations should be included in the reference section. USGS publications have links to their associated data releases at the top of their landing pages in the USGS Publications Warehouse.
B). Yes, a publication’s citation can be added to a data release at any time, even after it has been made public and the edit permissions have been restricted. If you would like to add a citation to a public data release, please send the citation to email@example.com (or to someone on the ScienceBase team) and we’ll add it to the landing page. If you’ve updated the metadata to include the publication’s citation, please also send the most recent version of the metadata and we’ll replace the metadata in the data release.
Which repository should I use to release code?
The recommended repository for software is USGS GitLab (https://code.usgs.gov), a Git-based platform for software development. Users can mint a DOI using the USGS DOI Tool to point to the software in GitLab.
If a data release has associated code (e.g., a Python script used to process the data), it can be included as part of the data release in ScienceBase. All code uploaded to ScienceBase must be well-documented.
What repository services does ScienceBase provide for USGS data release products?
ScienceBase supports the following services:
- Providing reliable access to public data release items
- Curating landing page content
- Creating multiple backups of data and metadata
- Calculating checksums to ensure file integrity
- Directing inquiries about the data to the point of contact listed for the data release
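Regarding the checksum service in the list above, data users who want to confirm that a downloaded file is intact can compute a checksum locally and compare it with a published value. The sketch below assumes SHA-256. This page does not say which algorithm ScienceBase uses, so the choice of algorithm and the function name file_sha256 are assumptions.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, file_sha256("downloaded_data.zip") (a made-up file name) returns a 64-character hexadecimal string that can be compared with the expected checksum.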
Science centers / data authors are responsible for the following:
- Answering questions about the data
- Correcting any errors discovered in the data
- Records management and data archival responsibilities for internal Bureau purposes (e.g., Scientific Case Files) according to the USGS Records Program. These responsibilities extend beyond public data access requirements for open data. Contact your local Records Management Contact or the USGS Records Management Program at firstname.lastname@example.org for additional information.
- Performing file format migrations or data transcriptions, if necessary