The main factors we found to influence the management, inventory, and preservation of legacy data were the organization of the center, the involvement of outside entities in funding and data ownership, and the use of institutional knowledge.
Organization:
UMESC’s existing archival organization system made the legacy data inventory process more efficient and more focused. The archivist had already reviewed and cataloged the records so that particular studies could be easily identified and located. This saved the DaR team time and allowed us to go directly to studies on a particular topic. The same ease of access has also benefited UMESC scientists who want to find a particular dataset for further study or reference.
As a tool, the master index sheet is useful for quickly assessing what a study or file folder contains, and it offers an efficient way to request project records from a Federal Records Center (FRC). In many cases, UMESC has sent physical records, master index sheet included, to an FRC. Keeping a copy of the master index sheet enables UMESC staff to identify a study, or part of a study, of interest and request the records from the specific FRC if needed. Managing the master index sheets locally saves time and storage space by minimizing the need to digitize and store on-site every record from a study that may never be used again. Sending records to an FRC or to the National Archives and Records Administration (NARA) on schedule is not only good records management practice but also an effective way to free up space in the center and keep track of the constant flux of data.
Although UMESC’s system is not fully developed, its systematic organization of, and control over, the center’s scientific records, particularly its legacy data, offers clear advantages that are worth recognizing.
Outside Ownership:
When outside entities fund all or most of a branch’s projects, this directly affects how the data can be handled once a study is complete. During the UMESC inventory, we found both advantages and disadvantages to data being owned by an entity other than USGS.
Tight regulation by outside entities proved advantageous for legacy data preservation because it enforced excellent records management and detailed documentation throughout each study. One example at UMESC is the Aquatic Ecosystem Health (AEH) Branch, which is funded and regulated by the EPA. This interagency relationship was established to regulate studies of chemicals (mostly pesticides targeting invasive aquatic species) that had to be approved by the EPA and FDA before they could be used. Because of these tight regulations, AEH Branch studies were better organized and documented than those of other branches that were not held to such strict requirements. The EPA- and FDA-regulated studies also required that all records be stored in fire-protected file cabinets.
The main disadvantage of outside ownership was that the data were often restricted from uses we were interested in, such as public release. The restrictions varied by project, funding entity, data agreement, and so on, but in every case where an outside entity owned the data, we had to obtain that entity’s permission before we could release the data.
We examined these policies more closely for some of the datasets we inventoried from the AEH Branch. One example is a collection of studies on TFM, a chemical toxic to sea lampreys. The establishment of the Great Lakes Fishery Commission (GLFC) in 1955 spurred a regional effort to control the sea lamprey, a destructive invasive species, across the Great Lakes and Midwestern waterways. The GLFC still controls the sea lamprey control studies conducted in the Midwest, so in this case the Data at Risk project will follow up on the UMESC-GLFC relationship to determine whether these high-profile legacy data can be released.
Institutional Knowledge:
One of the most valuable resources in our inventory process was the center’s institutional knowledge. Communication is a key part of legacy data preservation. In general, a study is best understood by the scientists who conducted the work and the information managers who assisted them. The UMESC data manager and archivist knew whom to talk to about specific studies and how to get information from them effectively. Engaging UMESC scientists directly provided project-level context and a practical understanding of the legal restrictions on the work; in many cases, legacy data preservation would not have been possible without that input. Drawing on current institutional knowledge is therefore a crucial part of preserving USGS legacy data. Without it, the data can lose much, if not all, of its value.