1. Prepare data and metadata
2. Create a new record in IPDS
3. Create a new landing page
4. Finalize metadata
5. Decide how to organize and display data
6. Upload files and edit the landing page
7. Format citation
8. Final steps
► For an overview of the ScienceBase data release process, view the tutorial video.
Frequently Asked Questions
- Where can I find information about how to create and/or review a metadata record?
- How can I give other people permission to view the data release when it’s still in progress?
- What if I need to update my data after they have been released?
- Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?
- Why is CSV format recommended instead of Excel?
- What is the file size limit for uploading and downloading files?
- Can I release legacy data in ScienceBase?
- A). My data release is associated with a publication. How will the two reference each other?
B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?
- Which repository should I use to release code?
Links to additional information:
- The USGS Fundamental Science Practices (FSP) website contains an FAQ page about data release and a guide to the publishing path options.
- The USGS data management website contains a guide to the steps of data release, with links to tools and resources.
- Before beginning the ScienceBase data release workflow, scientists should view the ScienceBase User Agreement.
1. Prepare data and metadata
A data release should contain only data and metadata.
- A best practice is to release data in an open, machine-readable format. For example, tabular data in .csv or .txt format is preferable to Excel.
- Data obtained from published sources do not need to be included - simply document the source and methods in your metadata.
- Proprietary or sensitive data should not be included.
- Metadata should be in XML format and should conform to an FGDC-endorsed metadata standard, FGDC CSDGM* or ISO**.
*Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata
**International Organization for Standardization
- Learn about metadata creation tools and how to finalize metadata for ScienceBase.
The tutorial video "Structuring and Documenting a USGS Public Data Release" can help you decide how many data and metadata files to include in your data release.
According to USGS Fundamental Science Practices (FSP) guidance, a data release is an information product that is non-interpretive and does not include extended descriptions beyond what is required in the full metadata record. Extended text descriptions, figures, maps, and files in PDF format are more appropriate for USGS series publications handled by the USGS Science Publishing Network (SPN). ScienceBase may be used to distribute model archives and all of their constituent files as USGS data releases, per the Office of Groundwater policy memo 2016.02.
2. Create a new record in IPDS
► Data and metadata should be reviewed and approved according to the USGS Fundamental Science Practices (FSP) process.
The review process is tracked in the Information Product Data System (IPDS). When you create a new record in IPDS, select "Data Release" in the Product Type dropdown menu.
New records in IPDS are assigned an IP number. Each ScienceBase data release product should correspond to only one IP number. That is, materials that are reviewed and approved together in IPDS should be released together in one data release product. Materials that are reviewed and approved separately should be released as separate products.
Data releases often have associated manuscripts that also go through review. In these cases, the review processes are separate. There should be an IPDS record for the data release and another for the manuscript.
For more information on data and metadata reviews, see section 5 on the USGS data management website page: https://www2.usgs.gov/datamanagement/share/datarelease.php.
3. Create a new landing page
Sign in to the ScienceBase Data Release Tool and provide some basic information about the data you are releasing. After you submit the form, you will receive an automated email with a link to your new landing page and a reserved Digital Object Identifier (DOI).
4. Finalize metadata
Note: The following instructions are for metadata records in the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) format. The USGS metadata creation tools, the Online Metadata Editor and the Metadata Wizard, create metadata in this format.
Digital Object Identifier
USGS policy requires the use of a Digital Object Identifier (DOI) for data releases. If you use the ScienceBase Data Release Tool to start a new data release, you will have the option to reserve a DOI for your landing page.
► Add your full DOI URL (i.e., https://doi.org/10.5066/xxxxxxxx) to your metadata:
- Add the DOI to the online linkage element (<onlink>) in the citation information section of your metadata. Full XML path: idinfo/citation/citeinfo/onlink
- We also recommend adding the DOI to the network resource element (<networkr>) in the distribution section (some advanced metadata authors use this field for direct data download links). Full XML path: distinfo/stdorder/digform/digtopt/onlineopt/computer/networka/networkr
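The two DOI locations above can also be filled in with a short script. The following is only a sketch using Python's standard library: the helper names (`set_element_text`, `add_doi`) and the tiny sample record are hypothetical, and elements that are missing are simply appended to their parent. CSDGM expects elements in a fixed order, so a record edited this way should be validated again before upload.

```python
import xml.etree.ElementTree as ET

# Element paths quoted from the guidance above.
ONLINK_PATH = "idinfo/citation/citeinfo/onlink"
NETWORKR_PATH = (
    "distinfo/stdorder/digform/digtopt/onlineopt/computer/networka/networkr"
)


def set_element_text(root, path, text):
    """Walk `path` below `root`, creating missing elements, then set the text."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            # Appended as the last child; element order is NOT checked here.
            child = ET.SubElement(node, tag)
        node = child
    node.text = text
    return node


def add_doi(root, doi_url):
    """Write the DOI URL into <onlink> and <networkr> (first match is overwritten)."""
    set_element_text(root, ONLINK_PATH, doi_url)
    set_element_text(root, NETWORKR_PATH, doi_url)
    return root


# Hypothetical, heavily abbreviated record used only for illustration.
sample = (
    "<metadata><idinfo><citation><citeinfo>"
    "<title>Example data release</title>"
    "</citeinfo></citation></idinfo></metadata>"
)
root = ET.fromstring(sample)
add_doi(root, "https://doi.org/10.5066/xxxxxxxx")
print(ET.tostring(root, encoding="unicode"))
```

For a real record, `ET.parse(path)` and `tree.write(path)` would replace the inline sample string.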
Please include the following content in the distribution section of your metadata. If you search for "GS ScienceBase" in the directory look-up tool of the Online Metadata Editor or the Metadata Wizard, you can auto-populate this content into your metadata.
- Contact Organization and/or Contact Person: “U.S. Geological Survey - ScienceBase”
- Contact Address: “Denver Federal Center, Building 810, Mail Stop 302” “Denver” “CO” “80225”
- Contact Phone: “1-888-275-8747”
- Contact Email: “firstname.lastname@example.org”
- Distribution Liability: please select the USGS disclaimer statement(s) that are relevant to your data release. Disclaimer statements are available at http://www.usgs.gov/fsp/fsp_disclaimers.asp.
5. Decide how to organize and display data
ScienceBase data releases can be organized in several ways. The ScienceBase team has recorded a tutorial video to help scientists determine the best way to structure and document their data releases.
Please upload only one metadata record per page in ScienceBase. The USGS Science Data Catalog, which harvests metadata records from data releases in ScienceBase, can pull only one metadata record from a page. Additional records can be uploaded if they are in zipped files.
► If you have one metadata record to describe your data, upload your files directly to the landing page (example).
► If you have multiple metadata records and data sets, you have two options:
- Create subpages that are nested under the landing page (example). Use this option if you would like your data sets to be independently discoverable. Nested pages in ScienceBase are called "child items". To create a new child item, click the "Add" dropdown menu, then select “Add Child Item”. On each child item, upload one metadata record and its associated data file(s). All metadata records will be harvested by the Science Data Catalog.
- Upload data and metadata directly to the landing page in zipped bundles (example). One metadata record should be uploaded separately: a summary metadata record that describes the entire data release. The summary metadata record will be the only one harvested by the Science Data Catalog.
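For the zipped-bundle option, each data set and its metadata record can be packaged with Python's standard library before upload. This is a minimal sketch: the function name `make_bundle` and the file names are made up for illustration, and the demo writes throwaway files to a temporary directory.

```python
import tempfile
import zipfile
from pathlib import Path


def make_bundle(bundle_path, files):
    """Write the given data and metadata files into a single zip archive."""
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # Store each file under its bare name, without the directory prefix.
            zf.write(f, arcname=f.name)
    return Path(bundle_path)


# Demo with hypothetical files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    data = tmp / "site_a.csv"
    meta = tmp / "site_a_metadata.xml"
    data.write_text("site_id,depth_cm\nA1,12.5\n")
    meta.write_text("<metadata></metadata>")
    bundle = make_bundle(tmp / "site_a.zip", [data, meta])
    with zipfile.ZipFile(bundle) as zf:
        bundle_contents = sorted(zf.namelist())

print(bundle_contents)
```

The summary metadata record for the whole release stays outside the bundles so that it can be uploaded separately.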
If you would like to display an image on a ScienceBase page, upload the image in .JPG or .PNG format. The image will be automatically displayed on the page.
ScienceBase can generate web services for certain geospatial file types (shapefiles, GeoTIFF and ESRI Service Definition (.SD) files). The web services can be used to serve the data to outside applications and to display the data in the preview map on a ScienceBase page. For more information, see the ScienceBase Geospatial Services page.
Note: The current file size limit for uploads and downloads in ScienceBase is 10GB.
6. Upload files and edit the landing page
► The most efficient way to populate an empty ScienceBase page is to start by uploading an XML metadata record in an FGDC-endorsed format. Click the "Add" dropdown menu on the upper right side of the page, then select "Attach Files".
When you upload a metadata record, ScienceBase will recognize the format and open a popup window asking whether you would like to pull content from the metadata.
Select "Yes" to automatically populate the key fields in the edit form. You may still need to manually edit some of the information. Click "Save" to save your changes.
► To edit your page, click the "Manage Item" dropdown menu on the upper right side of the page, then select "Edit Item".
► If you need to give additional people access to your ScienceBase item while it is private, click the "Manage Item" dropdown menu, then select "Manage Item Permissions".
► To add a child item (subpage nested under the landing page), click the "Add" dropdown menu, then select "Add Child Item".
7. Format citation
The data release citation should include each author (last name, first and middle initials), the year, the title, the publication type (U.S. Geological Survey data release), and the Digital Object Identifier link. ScienceBase can automatically generate citations from the content of uploaded metadata records, but the citation format usually needs to be modified. Please verify that automatically generated citations have the correct format and author order. The citation field can be edited in the first tab of the edit form.
If a data release has child items, the ScienceBase team will propagate the landing page citation to all child items, so only the landing page citation needs to be edited.
Cartwright, J.M., 2015, Hydrologic and soil data collected in limestone cedar glades at Stones River National Battlefield, Tennessee: U.S. Geological Survey data release, https://doi.org/10.5066/F7NV9G9C.
Coates, P.S., Casazza, M.L., Ricca, M.A., Brussee, B.E., Blomberg, E.J., Gustufson, K.B., Overton, C.T., Davis, D.M., Niell, L.E., Espinosa, S.C., Gardner, S.C., and Delehanty, D.J., 2015, Integrating spatially explicit indices of abundance and habitat quality: an applied example for greater sage-grouse management: U.S. Geological Survey data release, https://doi.org/10.5066/F75D8PW8.
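The pattern in the two example citations can be expressed as a small helper for checking an automatically generated citation. This is a sketch, not an official formatter: the function name `format_citation` is hypothetical, and the author-joining rule (commas between authors, with ", and" before the last one) is inferred from the examples above rather than taken from a style specification.

```python
def format_citation(authors, year, title, doi_url):
    """Build a data release citation string.

    `authors` is a list of strings already in "Last, F.M." form,
    in the order they should appear.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        names = authors[0]
    else:
        # Inferred from the examples: "A, B, ..., and Z".
        names = ", ".join(authors[:-1]) + ", and " + authors[-1]
    return (
        f"{names}, {year}, {title}: "
        f"U.S. Geological Survey data release, {doi_url}."
    )


citation = format_citation(
    ["Cartwright, J.M."],
    2015,
    "Hydrologic and soil data collected in limestone cedar glades "
    "at Stones River National Battlefield, Tennessee",
    "https://doi.org/10.5066/F7NV9G9C",
)
print(citation)
```

With these inputs the helper reproduces the single-author example above; the result should still be compared against the citing guidance linked in this section.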
For more information, see guidance at: https://www2.usgs.gov/datamanagement/share/citing.php
8. Final steps
Your point of contact will check the data release against the checklist and share any recommendations they have. Please allow up to 2 business days for completion of this step. When the data release has been finalized, the ScienceBase team will make it public and it will no longer be open for modifications.
You can use the recommended citation on the landing page to cite your data. If you do cite the data in a publication, please send the publication's citation to email@example.com so that it can be added to the landing page.
Frequently Asked Questions
Where can I find information about how to create and/or review a metadata record?
The USGS data management website: http://www.usgs.gov/datamanagement/describe/metadata.php.
The USGS Fundamental Science Practices (FSP) website: http://www.usgs.gov/fsp/faqs_metadata_for_scientific_data.asp.
The USGS has two tools for metadata creation. In the Online Metadata Editor (https://www1.usgs.gov/csas/ome/), users fill out a form by answering questions about their data. The tool can then generate and output an XML metadata record. The Metadata Wizard (https://www.sciencebase.gov/metadatawizard) is a toolbox for ESRI ArcDesktop and is recommended for geospatial data. It also generates XML metadata records based on user input and has the additional capability to parse information from geospatial data and .DBF tables.
The USGS Metadata Parser tool (https://mrdata.usgs.gov/validation/) allows users to validate an XML metadata file against the FGDC CSDGM standard and view it in an easy-to-read format.
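Before submitting a record to the Metadata Parser, a quick local check can catch two simple problems: XML that is not well-formed and a missing DOI link. The sketch below is not a substitute for validation against the CSDGM standard. The function name `precheck_metadata` is hypothetical, and the check only looks at the `<onlink>` path mentioned in the metadata instructions.

```python
import xml.etree.ElementTree as ET


def precheck_metadata(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "metadata":
        problems.append(f"unexpected root element <{root.tag}>")

    # Collect the text of every <onlink> in the citation information section.
    links = [
        (el.text or "").strip()
        for el in root.findall("idinfo/citation/citeinfo/onlink")
    ]
    if not any(link.startswith("https://doi.org/") for link in links):
        problems.append("no DOI URL found in idinfo/citation/citeinfo/onlink")
    return problems


good = (
    "<metadata><idinfo><citation><citeinfo>"
    "<onlink>https://doi.org/10.5066/xxxxxxxx</onlink>"
    "</citeinfo></citation></idinfo></metadata>"
)
print(precheck_metadata(good))
print(precheck_metadata("<metadata><idinfo/></metadata>"))
```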
How can I give other people permission to view and edit the data release when it’s still in progress?
To give permissions to USGS employees and other users with ScienceBase accounts, select the "Manage Item" dropdown menu, then "Manage Item Permissions". Select "Custom Permissions". Enter a user’s name or email address into the "User" text box. Wait for the autocomplete to find the user's ScienceBase account, then select it and click "Add".
ScienceBase accounts are automatically created for users the first time they log in with their Active Directory credentials. If someone hasn't logged in to ScienceBase before, they won’t yet have an account. Users without Active Directory credentials can request a ScienceBase account if they are collaborating with USGS partners.
To share the link to a private data release with someone outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page. This will generate a temporary access URL that you can share with your reviewer. The URL will allow them to view the data release without having to sign up for a ScienceBase account. The data release will be locked for editing while the link is active. To unlock, select "Manage Anonymous Access Links" again and remove the link.
What if I need to update my data after they have been released?
The USGS Fundamental Science Practices (FSP) website describes procedures for documenting revisions to data releases. Please follow this guidance if you need to correct or add to published data. Contact the ScienceBase team at firstname.lastname@example.org when you are ready to update your data release.
Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?
Yes, by default ScienceBase will automatically perform this function for authors. Metadata records attached to a formal USGS data release product in ScienceBase will be sent to the USGS Science Data Catalog (SDC) after the data release is finalized.
Some science centers and programs have alternate methods of submitting metadata records to the SDC and may not wish for their records to be sent from ScienceBase. This option is also supported; ScienceBase keeps a list of these centers, and XML records associated with their data release products will not be sent from ScienceBase. If you would like to add your center to this list, please contact email@example.com.
Why is CSV format recommended instead of Excel?
Comma-separated values format (.csv) is preferable to Microsoft Excel format (.xlsx) because .csv is often more machine-readable and can be more easily incorporated into other workflows. While both .csv and .xlsx are considered open formats (that is, you don't need proprietary software to view them), .xlsx supports features that can make it less machine-readable. For example, if an Excel workbook contains multiple worksheets or conveys some of its information through formatting, the data are harder to use in other applications (e.g., Python, R).
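As a small illustration of that point, a .csv table can be read with nothing more than Python's standard library. The column names and values below are invented for the example.

```python
import csv
import io

# Invented sample table; a real file would be opened with open(path, newline="").
sample = (
    "site_id,date,depth_cm\n"
    "A1,2015-06-01,12.5\n"
    "A2,2015-06-01,8.0\n"
)

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(sample)))
mean_depth = sum(float(row["depth_cm"]) for row in rows) / len(rows)

print(rows[0]["site_id"])  # A1
print(mean_depth)          # 10.25
```

Reading the same table from a workbook would require a third-party reader and a decision about which worksheet holds the data.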
What is the file size limit for uploading and downloading files?
The current file size limit for uploads and downloads in ScienceBase is 10GB. Files larger than 1GB should be uploaded using the Large File Uploader tool available in the “Item Actions” section at the bottom of a ScienceBase page.
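The two thresholds can be checked before starting an upload. This is a sketch under one stated assumption: the page does not say whether "GB" means 10^9 or 2^30 bytes, so the decimal value is assumed here. The function name and return labels are hypothetical.

```python
import os

GB = 10**9  # assumption: decimal gigabytes; the page does not specify
LARGE_FILE_THRESHOLD = 1 * GB  # above this, use the Large File Uploader
MAX_FILE_SIZE = 10 * GB        # current upload/download limit


def upload_route(size_bytes):
    """Classify a file by size according to the limits described above."""
    if size_bytes > MAX_FILE_SIZE:
        return "exceeds limit"
    if size_bytes > LARGE_FILE_THRESHOLD:
        return "large file uploader"
    return "standard upload"


def upload_route_for_file(path):
    """Same classification, using the size of a file on disk."""
    return upload_route(os.path.getsize(path))


print(upload_route(250 * 10**6))  # standard upload
print(upload_route(3 * GB))       # large file uploader
print(upload_route(12 * GB))      # exceeds limit
```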
Can I release legacy data in ScienceBase?
Yes, but ScienceBase has a formal process for publicly releasing data, which enables the ScienceBase team to catalog, track, and update these resources in a uniform way. If you would like to release your legacy data in ScienceBase, you will need to go through FSP review and work with the ScienceBase team.
A). My data release is associated with a publication. How will the two reference each other?
B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?
A). The citation will be added to the landing page in the "Related External Resources" section (see example). In associated publications, data release citations are included in the reference section. USGS publications have links to their associated data releases at the top of their landing pages in the USGS Publications Warehouse.
B). Yes, a publication’s citation can be added to a data release at any time, even after it has been made public and the edit permissions have been restricted. If you would like to add a citation to a public data release, please send the citation to firstname.lastname@example.org (or to someone on the ScienceBase team) and we’ll add it to the landing page. If you’ve updated the metadata to include the publication’s citation, please also send the most recent version of the metadata and we’ll replace the metadata in the data release.
Which repository should I use to release code?
The recommended option depends on the nature of the code.
ScienceBase could be a good option if the code isn’t going to be updated over time. Code that is associated with a data release (e.g., it was used to process the data) could be included as part of that data release in ScienceBase. Code that isn’t associated with a data release could have its own landing page in ScienceBase with a unique citation and DOI. All code uploaded to ScienceBase must have associated documentation.
Versioned software that will be updated over time is best hosted in USGS Bitbucket or another repository backed by a version control system (VCS).