- Publication Date
- 2019-03-20
- Start Date
- 2018-01-01
- End Date
- 2018-12-31

Engle, M.A., 2019, Codebook vectors and predicted rare earth potential from a trained emergent self-organizing map displaying multivariate topology of geochemical and reservoir temperature data from produced and geothermal waters of the United States: U.S. Geological Survey data release, https://doi.org/10.5066/P9GCYKG0.

This data release consists of three products relating to an 82 x 50 neuron Emergent Self-Organizing Map (ESOM), which describes the multivariate topology of reservoir temperature and geochemical data for 190 samples of produced and geothermal waters from across the United States. Variables included in the ESOM are coordinates derived from reservoir temperature and concentration of Sc, Nd, Pr, Tb, Lu, Gd, Tm, Ce, Yb, Sm, Ho, Er, Eu, Dy, F, alkalinity as bicarbonate, Si, B, Br, Li, Ba, Sr, sulfate, H (derived from pH), K, Mg, Ca, Cl, and Na converted to units of proportion. The concentration data were converted to isometric log-ratio coordinates (following Hron et al., 2010), where the first ratio is Sc serving as the denominator to the [...]

This data release is provided to: 1) allow users to map new sample sources to the ESOM using a minimum distance measurement (such as Euclidean distance) through an algorithm such as k-nearest neighbor and 2) provide predicted rare earth element potential output from the exercise for produced and geothermal waters of the United States. Any data sets used for mapping to the trained ESOM need to be isometrically log-ratio transformed and standardized (using means and standard deviations from the first table) using exactly the same formulation as was applied to the training dataset used to create this matrix. This can be useful both for data classification and for non-linear estimation. In the case of the latter, missing values (i.e., those in need of estimation) can be imputed from the codebook vector for the best match unit (i.e., the neuron with the smallest multivariate distance to the point being estimated). The imputed value can then be converted back into the original units through the inverse of data standardization and, for concentration data, the inverse of the isometric log-ratio transformation (Hron et al., 2010). Note that for concentration data, the results are in units of proportion and can be converted back into the original units by multiplying each row by the sum of the compositional data in the original dataset.
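The mapping and imputation steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure used to build the release: the function names, the toy three-variable codebook, and the toy means and standard deviations are all hypothetical, and the codebook is assumed to be flattened to one row per neuron. The isometric log-ratio transformation and its inverse are omitted here; real inputs would need to be transformed as described in the release before this step.

```python
import numpy as np


def best_match_unit(codebook, sample):
    """Index of the neuron nearest to `sample` (Euclidean distance).

    Variables that are NaN in `sample` are left out of the distance,
    so a partially observed sample can still be mapped.
    """
    known = ~np.isnan(sample)
    diffs = codebook[:, known] - sample[known]
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return int(np.argmin(dists))


def impute_from_bmu(codebook, sample):
    """Fill NaN entries of `sample` with the best match unit's codebook values."""
    bmu = best_match_unit(codebook, sample)
    filled = np.where(np.isnan(sample), codebook[bmu], sample)
    return bmu, filled


def unstandardize(z, means, stds):
    """Invert z = (x - mean) / std."""
    return z * stds + means


# Hypothetical toy values: 3 neurons x 3 standardized variables.
codebook = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, -1.0, 0.5],
])
means = np.array([10.0, 20.0, 30.0])  # assumed training means
stds = np.array([2.0, 4.0, 5.0])      # assumed training standard deviations

# A new sample in coordinate space with the third variable missing.
raw = np.array([11.8, 24.4, np.nan])
z = (raw - means) / stds              # standardize with the training statistics

bmu, filled = impute_from_bmu(codebook, z)
restored = unstandardize(filled, means, stds)
print(bmu, restored)
```

In this toy case the second neuron is the nearest on the two observed variables, so its third codebook value is used as the estimate and then returned to unstandardized units. For concentration variables, the inverse isometric log-ratio transformation and the rescaling by the original row sum would follow this step.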

- Energy Resources Program
- USGS Data Release Products

Please see attached metadata record for full dataset provenance.