Linking Publications and Data
Data are often connected to other research outputs, particularly publications. The USGS Science Data Management (SDM) Branch is working to ensure that these connections are documented and available to users of our USGS data.
Related Primary Publications
One of the most important connections is the primary publication that describes the initial data collection and the first analyses of the data. This publication can give users additional information to help them understand the purpose and scope of the data. The ScienceBase Data Release (SBDR) team calls this publication the "Related Primary Publication." We collect information about known related primary publications when an author first starts a data release using the ScienceBase Data Release Tool. In many cases, authors can provide only the Information Product Data System (IPDS) number for the related primary publication when they publish their data release. We have now established an automated pipeline with the USGS Publications Warehouse that collects the digital object identifier (DOI) for a publication given its IPDS number. The publication's DOI is then associated with the data release in two ways: it is added to the data release landing page in ScienceBase (see image), and it is added to the data release DOI's metadata in the DOI Tool.
Tracking Data Citations
One of the reasons that we are required to publicly release data is to increase scientific productivity by allowing others to reuse existing data. Data release authors may find it helpful to understand how others are using their data, both to improve future data releases and to provide evidence of impact. The USGS Science Data Management (SDM) Branch has been working on an automated process to track these data citations using the eXtract Dark Data Database (xDD, formerly known as GeoDeepDive, https://geodeepdive.org/). xDD is a tool that enables text mining of a growing collection of more than 12 million published research documents. The SDM team is using the xDD API to track references to USGS DOIs, that is, DOIs with the '10.5066' prefix. References to USGS DOIs are stored in the USGS DOI Tool database (see image). To see where a given USGS DOI has been referenced, go to the DOI Tool and search for the data release DOI. In the future, the ScienceBase team plans to display these citations on data release landing pages. Stay tuned for more information.
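The prefix-matching idea can be illustrated with a minimal Python sketch. This is not the SDM team's pipeline and it does not call the xDD API; it only shows how DOIs with the '10.5066' prefix might be picked out of a passage of text. The function name, the regular expression, and the sample sentence are all assumptions made for illustration.

```python
import re

# Assumption: a USGS DOI is the '10.5066' prefix, a slash, and then a run of
# characters that are not whitespace or common delimiters.
USGS_DOI_PATTERN = re.compile(r"10\.5066/[^\s\"<>)\],;]+", re.IGNORECASE)


def find_usgs_dois(text):
    """Return the unique USGS DOIs found in text, in order of first appearance."""
    found = []
    for match in USGS_DOI_PATTERN.findall(text):
        # Drop a sentence-ending period and normalize case
        # (DOIs are case-insensitive).
        doi = match.rstrip(".").upper()
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical sentence containing one USGS DOI and one non-USGS DOI.
sample = (
    "Landslide inventories (https://doi.org/10.5066/F7DZ06F9) were compared "
    "with results from https://doi.org/10.1016/j.geomorph.2018.10.022."
)
print(find_usgs_dois(sample))
```

Only the DOI with the USGS prefix is returned; the journal article DOI is ignored because its prefix differs.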
Featured Data Release
Map data of landslides triggered by the 25 April 2015 Mw 7.8 Gorkha, Nepal earthquake
Science Center: Geologic Hazards Science Center
This data release, created in cooperation with partners at the University of Michigan, ETH, and in Nepal, mapped earthquake-triggered landslides using high-resolution (<1 m pixel resolution) pre- and post-event satellite imagery. Since its publication in ScienceBase in 2017, the data have been cited by five publications, and the related primary publication (https://doi.org/10.1016/j.geomorph.2017.01.030) has been cited 68 times per Scopus and read 178 times on Mendeley. Additionally, the dataset helps underpin USGS models used to describe the extent and severity of landslides triggered by earthquakes. For example, one of the citing publications reused the data to propose a comprehensive method for near real-time landslide probability estimation using a logistic regression model based on slope units (Tanyas et al. 2019). Models like this are used operationally and provide situational awareness for earthquake response worldwide.
Data citation and reuse, as shown in the example above, are only one way of measuring the impact of a data release. If you know of a data product available in ScienceBase that has gone on to be reused in other projects, inform policy decisions, garner attention in major media outlets, or serve any other interesting purpose, we'd love to hear about it. Please complete this form to contribute your data story.
Tanyas, H., Rossi, M., Alvioli, M., van Westen, C.J. and Marchesini, I., 2019, A global slope unit-based method for the near real-time prediction of earthquake-induced landslides: Geomorphology, 327, pp.126-146, https://doi.org/10.1016/j.geomorph.2018.10.022.
Data citation: Roback, K., Clark, M.K., West, A.J., Zekkos, D., Li, G., Gallen, S.F., Chamlagain, D., and Godt, J.W., 2017, Map data of landslides triggered by the 25 April 2015 Mw 7.8 Gorkha, Nepal earthquake: U.S. Geological Survey data release, https://doi.org/10.5066/F7DZ06F9.
Image citation: Roback, K., Clark, M.K., West, A.J., Zekkos, D., Li, G., Gallen, S.F., Chamlagain, D., and Godt, J.W., 2018, The size, distribution, and mobility of landslides caused by the 2015 Mw 7.8 Gorkha earthquake, Nepal: Geomorphology, 301, pp.121-138, https://doi.org/10.1016/j.geomorph.2017.01.030.
Data Manager Resources: ScienceBase Scripts
The SBDR team recently shared a collection of ScienceBase data release scripts for use by USGS data managers. There are currently two Python scripts included in the collection: SBDR_Metrics and ScienceCenterRevisionCode.
SBDR_Metrics provides users with basic metrics about ScienceBase data releases, either overall or within a given time period. The script can report the number of public, in-progress, and revised data releases, as well as data releases by mission area and science center. Note: users will only be able to see in-progress data releases for which they have read permissions, as in-progress data releases are not yet public. Additionally, a quality control section can check for missing fields, including mission area, science center, and publication date, and can identify problems such as incorrect publication dates, invalid DOIs, and public data releases that still have an in-progress tag in place.
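As a rough illustration of the kinds of quality control checks described above, here is a minimal Python sketch. It is not taken from SBDR_Metrics: the record layout, the field names (mission_area, science_center, publication_date, doi, public, tags), and the check_release function are hypothetical, and the actual script may work quite differently.

```python
import re
from datetime import date

# Hypothetical field names; the real ScienceBase item structure may differ.
REQUIRED_FIELDS = ("mission_area", "science_center", "publication_date")
USGS_DOI = re.compile(r"^10\.5066/\S+$")


def check_release(record, today=None):
    """Return a list of problem descriptions for one data release record."""
    today = today or date.today()
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if not record.get(field)]

    pub_date = record.get("publication_date")
    if pub_date:
        try:
            if date.fromisoformat(pub_date) > today:
                problems.append("publication date is in the future")
        except ValueError:
            problems.append("publication date is not a valid YYYY-MM-DD date")

    if not USGS_DOI.match(record.get("doi") or ""):
        problems.append("invalid DOI")

    if record.get("public") and "in-progress" in record.get("tags", []):
        problems.append("public release still tagged in-progress")
    return problems


# Hypothetical record with three deliberate problems.
record = {
    "mission_area": "Natural Hazards",
    "science_center": "",
    "publication_date": "2017-13-01",
    "doi": "10.5066/F7DZ06F9",
    "public": True,
    "tags": ["in-progress"],
}
for problem in check_release(record, today=date(2020, 1, 1)):
    print(problem)
```

Returning a list of plain-text problems per record keeps the checks easy to extend and easy to summarize across many data releases.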
The ScienceCenterRevisionCode notebook allows users to see all revisions completed in ScienceBase for their science center within a given time period. Both scripts are simple and well documented, and require only a beginner’s knowledge of Python and Jupyter Notebooks. Find these notebooks at the SBDR team’s code repository.
Autofill IPDS Update
Do you find yourself entering information into the ScienceBase Data Release (SBDR) tool when you've already entered the same information into IPDS? A feature is now available in the SBDR tool to help streamline this process for authors and data managers. Now, if a user enters an IPDS number associated with their data release and clicks the autofill button, the information that has been entered into IPDS will auto-populate the SBDR tool form. Information including, but not limited to, authors and ORCIDs, title, and science center will be pulled into the SBDR tool. This feature will help reduce both the number of errors and the time needed to enter information.
Please note that we currently retrieve information from IPDS once a day, at 5:20 a.m. CT, and we are working to increase how often the information is retrieved.
Standardizing Publication Dates
To make a data release public, the SBDR team uses a Jupyter Notebook script that automatically runs through a set of steps that finalize the product. For example, the script moves a completed data release to a public folder, updates the read/write permissions, and adds the current date to the landing page as the publication date. One recent update to the Notebook is that it now overwrites the publication dates in attached .xml metadata files, ensuring that they match the publication date on the landing page.
This update can save time for authors, especially if there are many child items with metadata records, or if authors don't know the publication date in advance. The publication date field in metadata records is displayed on USGS web pages via the Drupal content management system, so ensuring that metadata records have accurate, well-formatted publication dates can help standardize this field across systems.
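The date-overwriting step can be sketched in a few lines of Python. This is an illustration only, not the SBDR team's Notebook. It assumes the metadata follows the FGDC CSDGM layout, in which the publication date is stored at idinfo/citation/citeinfo/pubdate and written as YYYYMMDD; the function name and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumption: FGDC CSDGM metadata, where the publication date is stored at
# this path (relative to the root <metadata> element) as YYYYMMDD.
PUBDATE_PATH = "idinfo/citation/citeinfo/pubdate"


def set_publication_date(xml_text, pub_date):
    """Return xml_text with its publication date overwritten by pub_date."""
    root = ET.fromstring(xml_text)
    node = root.find(PUBDATE_PATH)
    if node is None:
        raise ValueError("no pubdate element found at " + PUBDATE_PATH)
    node.text = pub_date.strftime("%Y%m%d")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed metadata record.
sample = (
    "<metadata><idinfo><citation><citeinfo>"
    "<pubdate>2017</pubdate><title>Example data release</title>"
    "</citeinfo></citation></idinfo></metadata>"
)
updated = set_publication_date(sample, date(2017, 6, 12))
print(updated)
```

Raising an error when the element is missing, rather than silently skipping the file, makes it easier to spot metadata records that do not follow the expected layout.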