US 12,094,579 B2
Machine-learning method and apparatus to isolate chemical signatures
Gerrad Jones, Corvallis, OR (US)
Assigned to Oregon State University, Corvallis, OR (US)
Filed by Oregon State University, Corvallis, OR (US)
Filed on Mar. 29, 2021, as Appl. No. 17/216,401.
Claims priority of provisional application 63/005,090, filed on Apr. 3, 2020.
Prior Publication US 2021/0313016 A1, Oct. 7, 2021
Int. Cl. G16C 20/70 (2019.01); G01N 33/18 (2006.01); G01N 33/49 (2006.01); H01J 49/00 (2006.01)
CPC G16C 20/70 (2019.02) [G01N 33/18 (2013.01); G01N 33/49 (2013.01); H01J 49/0036 (2013.01)]
13 Claims
 
1. A method for distinguishing a presence or absence of an individual chemical source of at least two chemical sources, the method comprising:
receiving a plurality of samples from the at least two chemical sources, wherein an individual sample of the plurality of samples is associated with one or more predictor variables representing a chemical characteristic associated with a chemical source, from the at least two chemical sources, from which the individual sample was taken;
analyzing the plurality of samples with a mass spectrometer to obtain a plurality of mass spectral data sets for each chemical source of the at least two chemical sources;
storing the plurality of mass spectral data sets on a machine-readable storage medium;
reading the plurality of mass spectral data sets by one or more processors; and
executing a set of machine-readable instructions by the one or more processors to perform:
binning a mass spectral data set for each individual chemical source of the at least two chemical sources into an individual bin to generate a binned source, wherein the binned source corresponds to an individual category of the at least two chemical sources;
converting the individual bin into a binary variable comprising 1s and 0s for the individual chemical source, wherein 1s represent one or more samples from a chemical source of interest and 0s represent other chemical sources from the at least two chemical sources;
selecting, based on a specification of a spectral signature derived from the mass spectral data set associated with the individual category, one or more representative sample sites;
analyzing the binned source by applying a supervised classification process in which a machine-learning classifier is trained on the one or more representative sample sites to differentiate between first one or more samples of the chemical source of interest and second one or more samples of the other chemical sources from the at least two chemical sources based on an associated chemical composition;
generating a set of coefficients for an individual predictor variable that evaluates a relevance of the individual predictor variable based on an ability to discriminate the spectral signature of the first one or more samples from the second one or more samples by applying the spectral signature of the one or more representative sample sites;
averaging and sorting coefficients of the set of coefficients for the individual predictor variable associated with the individual chemical source;
selecting chemicals with highest negative and positive coefficients from the sorted coefficients for the individual chemical source; and
generating an output, based on the sorted coefficients for the individual chemical source, indicative of a subset of chemical features that predicts the individual chemical source.
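
The processor-executed steps of claim 1 (binary conversion, supervised classification, coefficient generation, averaging, sorting, and selection) can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim names no particular classifier or resampling scheme, so the logistic regression, the bootstrap resampling used to obtain a set of coefficients, and every name and value below (`one_vs_rest`, `fit_logistic`, `rank_features`, the `mz_*` feature labels, the sources "A" and "B", the intensities) are assumptions made for illustration. The selection of representative sample sites is not modeled.

```python
import math
import random

def one_vs_rest(sources, target):
    """Binary variable: 1 for samples from the source of interest, 0 otherwise."""
    return [1 if s == target else 0 for s in sources]

def fit_logistic(X, y, epochs=300, lr=0.5, l2=0.01):
    """Gradient-descent logistic regression on mean-centered features.
    Stand-in classifier (assumption); returns one coefficient per predictor."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(Xc, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w

def rank_features(X, sources, target, names, n_rounds=10, top_k=1, seed=0):
    """Average coefficients over bootstrap fits, sort them, and return the
    top_k most positive and top_k most negative predictors."""
    rng = random.Random(seed)
    y = one_vs_rest(sources, target)
    n, runs = len(X), []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain only one class
            continue
        runs.append(fit_logistic([X[i] for i in idx], ys))
    if not runs:
        raise ValueError("no bootstrap resample contained both classes")
    mean = [sum(r[j] for r in runs) / len(runs) for j in range(len(names))]
    ranked = sorted(zip(names, mean), key=lambda t: t[1])  # ascending
    return ranked[-top_k:][::-1], ranked[:top_k]

# Hypothetical intensities for three spectral features in six samples.
names = ["mz_101", "mz_202", "mz_303"]
X = [
    [0.90, 0.10, 0.5], [0.80, 0.20, 0.4], [0.85, 0.15, 0.6],  # source "A"
    [0.10, 0.90, 0.5], [0.20, 0.80, 0.6], [0.15, 0.85, 0.4],  # source "B"
]
sources = ["A", "A", "A", "B", "B", "B"]

top_pos, top_neg = rank_features(X, sources, "A", names)
print("predicts source A:", top_pos)
print("predicts other sources:", top_neg)
```

In this toy data, `mz_101` is elevated in source "A" and `mz_202` in source "B", so they should surface as the most positive and most negative averaged coefficients, while `mz_303` carries little signal. Repeating the call with each source in turn as `target` would give one ranked feature subset per chemical source.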