US 12,327,716 B2
Mass spectrometry extraction and selection pipeline for machine learning
Mohit Jain, San Diego, CA (US); Saumya Tiwari, San Diego, CA (US); and Jeramie Watrous, San Diego, CA (US)
Assigned to Sapient Bioanalytics, LLC, San Diego, CA (US)
Filed by Sapient Bioanalytics, LLC, San Diego, CA (US)
Filed on May 20, 2022, as Appl. No. 17/750,245.
Prior Publication US 2023/0377860 A1, Nov. 23, 2023
Int. Cl. H01J 49/00 (2006.01)

CPC H01J 49/0036 (2013.01) [H01J 49/0009 (2013.01)]

17 Claims

1. A computer-implemented method, comprising:

obtaining raw mass spectrometry data from samples;

determining signals present across the samples;

separating the raw mass spectrometry data into discrete intervals in each of the samples;

at each interval of the discrete intervals of the raw mass spectrometry data:

determining a local highest intensity signal, relative to any other signal within each interval; and

determining a frequency of occurrence of each local highest intensity signal across the samples;

retrieving a subset of local highest intensity signals based on respective frequencies of occurrence of the local highest intensity signals;

normalizing the subset of the local highest intensity signals, wherein the normalizing comprises:

segmenting the subset of the local highest intensity signals;

generating a three-dimensional representation indicating peak intensities within windows corresponding to the segmented subset of local highest intensity signals across different samples, wherein each of the windows is based on a size of each of the discrete intervals, wherein three dimensions of the three-dimensional representation comprise a mass-to-charge ratio or a retention time, a sample number, and a respective peak intensity corresponding to the sample number and the mass-to-charge ratio or the retention time; and

transforming the three-dimensional representation into a two-dimensional representation, the two-dimensional representation indicating normalized peak intensities corresponding to the segmented subset of local highest intensity signals across different samples, wherein the two-dimensional representation represents the normalized peak intensities based on a color or shading rather than as a separate dimension or axis;

ingesting the two-dimensional representation into a machine learning model, wherein the machine learning model comprises a neural network classifier;

obtaining, from the machine learning model, veracities of each of the ingested subset of the local highest intensity signals; and

based on the obtained veracities, inferring one or more constituents of the samples.
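
For illustration only, the data-preparation steps recited in claim 1 (separating into discrete intervals, finding the local highest intensity signal per interval, selecting by frequency of occurrence, and reducing a three-dimensional representation to a two-dimensional shaded one) might be sketched as follows. This is not the patented implementation. All function names, the sample data, the interval width, and the frequency threshold are hypothetical. Two further assumptions are made: frequency of occurrence is taken as the fraction of samples having any signal in an interval, and normalization divides each interval's intensities by that interval's maximum across samples. The neural network classifier and the inference of constituents are not shown.

```python
from collections import defaultdict

def local_maxima(sample, width):
    """Map each interval index to the most intense (mz, intensity) signal in it."""
    best = {}
    for mz, intensity in sample:
        idx = int(mz // width)  # discrete interval this signal falls into
        if idx not in best or intensity > best[idx][1]:
            best[idx] = (mz, intensity)
    return best

def frequent_intervals(per_sample_maxima, min_fraction):
    """Keep intervals whose local maximum occurs in enough samples (assumed rule)."""
    counts = defaultdict(int)
    for maxima in per_sample_maxima:
        for idx in maxima:
            counts[idx] += 1
    n = len(per_sample_maxima)
    return sorted(idx for idx, c in counts.items() if c / n >= min_fraction)

def to_triples(per_sample_maxima, kept):
    """Three-dimensional view: (interval index, sample number, peak intensity)."""
    return [
        (idx, s, maxima[idx][1] if idx in maxima else 0.0)
        for s, maxima in enumerate(per_sample_maxima)
        for idx in kept
    ]

def to_shade_grid(triples, kept, n_samples):
    """Two-dimensional view: rows are samples, columns are intervals,
    and each cell holds a shade in [0, 1] instead of a third axis."""
    col = {idx: j for j, idx in enumerate(kept)}
    grid = [[0.0] * len(kept) for _ in range(n_samples)]
    for idx, s, intensity in triples:
        grid[s][col[idx]] = intensity
    for j in range(len(kept)):
        peak = max(row[j] for row in grid)
        if peak > 0:  # assumed normalization: scale each column by its maximum
            for row in grid:
                row[j] /= peak
    return grid

# Hypothetical raw data: one list of (m/z, intensity) pairs per sample.
samples = [
    [(100.02, 50.0), (100.07, 80.0), (250.31, 10.0)],
    [(100.05, 40.0), (300.10, 5.0)],
    [(100.03, 20.0), (250.33, 30.0)],
]
width = 0.5  # hypothetical interval size in m/z units

maxima = [local_maxima(s, width) for s in samples]
kept = frequent_intervals(maxima, min_fraction=2 / 3)
triples = to_triples(maxima, kept)
grid = to_shade_grid(triples, kept, len(samples))
```

In this sketch the resulting grid is the kind of two-dimensional input that could be rendered as a shaded image and passed to a classifier; the interval seen in only one of the three samples is dropped before the grid is built.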