November 7, 2023
Journal Article

Characterizing Families of Spectral Similarity Scores and their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification

Abstract

Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this “molecular snapshot” is only as informative as the number of metabolites confidently identified within it. The spectral similarity (SS) score is traditionally used to identify compound(s) in mass spectrometry approaches to metabolomics, where spectra are matched to reference libraries of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. This lack of a standard SS score creates analytic uncertainty and potentially leads to issues in reproducibility, especially as these data are integrated across other domains. In this work, we use metabolomic spectral similarity as a case study to showcase the challenges in consistency within just one piece of the One Health framework that must be addressed to enable data science approaches for One Health problems. Here, using a large cohort of datasets comprised of both standards and complex datasets with expert-verified truth annotations, we evaluated the effectiveness of 66 similarity metrics to delineate between correct matches (true positives) and incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families of scores) tend to perform better than others, with no single similarity metric performing optimally for all queried spectra. This work and its findings provide an empirically based resource for researchers to use in their selection of similarity metrics for GC-MS identification, increasing
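
As a rough illustration of what a spectral similarity score does, the sketch below computes cosine similarity, a commonly used member of the Inner Product family, between a query spectrum and each entry in a small reference library, then ranks the candidates. It is not the authors' implementation. The function names, the `{m/z: intensity}` dictionary representation, and the example spectra are all hypothetical and chosen only for illustration; the sketch also assumes peaks have already been binned to integer m/z values.

```python
import math


def cosine_similarity(query, reference):
    """Cosine (normalized inner product) of two spectra.

    Each spectrum is a dict mapping a binned m/z value to an intensity.
    This representation is an assumption made for this sketch.
    """
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_q * norm_r)


def rank_candidates(query, library):
    """Score the query against every library spectrum, best match first."""
    scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical spectra, invented for illustration only.
query = {73: 100.0, 147: 40.0, 217: 10.0}
library = {
    "candidate_a": {73: 100.0, 147: 40.0, 217: 10.0},  # same peaks as the query
    "candidate_b": {57: 80.0, 71: 60.0},               # no peaks in common
}

ranking = rank_candidates(query, library)
```

Here `ranking` places `candidate_a` first, since it shares every peak with the query, while `candidate_b` shares none and scores zero. Swapping `cosine_similarity` for a different metric changes only the scoring function, which is why the choice of metric can alter which candidate is ranked first.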

Published: November 7, 2023

Citation

Degnan D.J., J.E. Flores, E. Brayfindley, V.L. Paurus, B.M. Webb-Robertson, C.S. Clendinen, and L.M. Bramer. 2023. Characterizing Families of Spectral Similarity Scores and their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification. Metabolites 13, no. 10:Art. No. 1101. PNNL-SA-189894. doi:10.3390/metabo13101101

Research topics