Data and Model Archive for Preliminary Machine Learning Models of Manganese and 1,4-Dioxane in Groundwater on Long Island, New York

Dates

Publication Date
Time Period
2022

Citation

DeSimone, L.A., 2023, Data and model archive for preliminary machine learning models of manganese and 1,4-dioxane in groundwater on Long Island, New York: U.S. Geological Survey data release, https://doi.org/10.5066/P90AT9YG.

Summary

Data and preliminary machine-learning models used to predict manganese and 1,4-dioxane in groundwater on Long Island are documented in this data release. Concentration data used to develop the models were from 910 wells for manganese and 553 wells for 1,4-dioxane, primarily public supply wells, from U.S. Geological Survey, U.S. Environmental Protection Agency (USEPA), and Suffolk County Water Authority sources. Thirty-two explanatory variables describe depth, groundwater flow, land use, soil properties, and other features of the aquifer system. The models use XGBoost, an ensemble tree machine learning method. Four models are documented for manganese, predicting the probability of concentrations relative to four thresholds: 10 micrograms [...]
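The threshold-based setup described in the summary, one model per concentration threshold, implies that each well's measured concentration is turned into a binary label for each threshold before a classifier is fit. The following is a minimal sketch of that labeling step only, not the release's actual code. The function name, the sample concentrations, the threshold values, and the use of a strict "greater than" comparison are all assumptions made for illustration.

```python
from typing import Dict, List


def exceedance_labels(
    concentrations: List[float], thresholds: List[float]
) -> Dict[float, List[int]]:
    """For each threshold, label a sample 1 if its concentration is
    strictly greater than the threshold, else 0.

    The strict comparison is an assumption; the data release may define
    exceedance differently.
    """
    return {
        t: [1 if c > t else 0 for c in concentrations]
        for t in thresholds
    }


# Hypothetical concentrations (same unit as the thresholds); not taken
# from the data release.
sample_concentrations = [2.0, 15.0, 60.0, 340.0, 8.0]

# Hypothetical threshold values, chosen only for illustration.
sample_thresholds = [10.0, 50.0]

labels = exceedance_labels(sample_concentrations, sample_thresholds)
for threshold, values in labels.items():
    print(threshold, values)
```

Each list of labels would then serve as the target for a separate binary classifier, whose predicted probability is the probability of exceeding that threshold.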

Attached Files

LI_mn_14dx_exp_vars.txt 20.69 KB text/plain
LI_mn_14dx_well_data.txt 353.17 KB text/plain
LI_mn_14dx_predinput_griddata.zip 86.04 MB application/zip
LI_mn_14dx_predoutput_rasters.zip 22.17 MB application/zip
LI_mn_14dx_prediction_grid.tif 1.74 MB image/geotiff
LI_mn_14dx_models.zip 235.57 KB application/zip

Purpose

These data and models were compiled and developed to demonstrate the use of machine learning methods to model and map contaminants in groundwater on Long Island. Groundwater on Long Island is the sole source of drinking water for 2.9 million people and is susceptible to contamination from a variety of sources. Manganese and 1,4-dioxane were chosen as representative of contaminants from natural and anthropogenic sources that are of concern for drinking water on Long Island. The models are considered preliminary for two reasons: they were based on only a selected fraction of the available data that could potentially be used for modeling, and they could be improved by modeling enhancements, such as methods to address class imbalance in the concentration data. The models are not intended to provide precise estimates of manganese or 1,4-dioxane at any given location. Although based on limited data, they can be used to identify, in general terms, areas where these contaminants may occur, in order to prioritize future monitoring or guide future modeling and mapping efforts.
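The class imbalance mentioned above arises when samples above a threshold are much rarer than samples below it. One common way to quantify it is the ratio of negative to positive samples, which the XGBoost documentation suggests as a typical value for its `scale_pos_weight` parameter. The sketch below only computes that ratio. It is not a method the data release states it used, and the function name and example labels are hypothetical.

```python
from typing import Sequence


def negative_to_positive_ratio(labels: Sequence[int]) -> float:
    """Return (number of 0 labels) / (number of 1 labels).

    A ratio well above 1 indicates that positive samples (for example,
    exceedances of a threshold) are rare relative to negatives.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples; ratio is undefined")
    return negatives / positives


# Hypothetical binary labels: 8 samples below a threshold, 2 above.
example_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

ratio = negative_to_positive_ratio(example_labels)
print(ratio)
```

A ratio computed this way could be passed as a class weight to a classifier, so that the rarer positive samples count more heavily during training. Whether that improves these particular models is an open question the release leaves for future work.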

Additional Information

Identifiers

Type Scheme Key
DOI https://www.sciencebase.gov/vocab/category/item/identifier doi:10.5066/P90AT9YG
