| CPC G16B 15/00 (2019.02) [G06N 3/0464 (2023.01); G16B 40/10 (2019.02); G16B 40/20 (2019.02)] | 19 Claims |

|
1. A method for molecule identification, comprising:
training a first encoder using a known match between a training sample and a set of training spectra, the training sample comprising a set of training molecules, wherein training the first encoder comprises:
using the first encoder, determining a set of molecule predictions based on the set of training spectra, wherein a number of true-positive molecule predictions and a number of true-negative molecule predictions are determined based on a comparison between the set of molecule predictions and the set of training molecules;
determining a loss for the set of molecule predictions, the loss comprising an accuracy metric determined based on the number of true-positive molecule predictions and the number of true-negative molecule predictions; and
training the first encoder based on the loss;
determining a mass spectrometry spectrum for a molecule;
determining an embedding for the mass spectrometry spectrum, using the first encoder;
for each candidate molecule in a set of candidate molecules:
determining an embedding for the candidate molecule based on a sequence for the candidate molecule, using a second encoder; and
using a scoring model, determining a score for the candidate molecule based on the embedding for the mass spectrometry spectrum and the embedding for the candidate molecule; and
selecting a candidate molecule from the set of candidate molecules based on the scores.
|
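The two computational steps recited in claim 1, an accuracy-based loss built from true-positive and true-negative counts and the scoring of candidate molecules against a spectrum embedding, can be illustrated with a minimal sketch. The claim does not specify how the accuracy metric enters the loss or what form the scoring model takes, so everything below is an assumption made for illustration: the loss is taken as `1 - accuracy`, the scoring model is taken as cosine similarity, and the embeddings are hand-written vectors standing in for the outputs of the first and second encoders. The function names `accuracy_loss`, `cosine_score`, and `select_candidate` are hypothetical.

```python
import math


def accuracy_loss(predicted, actual, universe):
    """Return 1 - accuracy, with accuracy = (TP + TN) / len(universe).

    Assumption: the claim only says the loss comprises an accuracy metric
    based on true-positive and true-negative counts; 1 - accuracy is one
    simple choice, not necessarily the one used in the patent.
    """
    predicted, actual = set(predicted), set(actual)
    # True positive: molecule predicted and actually in the training sample.
    tp = sum(1 for m in universe if m in predicted and m in actual)
    # True negative: molecule neither predicted nor in the training sample.
    tn = sum(1 for m in universe if m not in predicted and m not in actual)
    return 1.0 - (tp + tn) / len(universe)


def cosine_score(a, b):
    """Cosine similarity between two embeddings (assumed scoring model)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidate(spectrum_embedding, candidate_embeddings):
    """Score every candidate and return the best name plus all scores."""
    scores = {
        name: cosine_score(spectrum_embedding, emb)
        for name, emb in candidate_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy training step: four possible molecules, two predicted, two present.
universe = ["A", "B", "C", "D"]
loss = accuracy_loss(predicted={"A", "B"}, actual={"A", "C"}, universe=universe)

# Toy inference step: placeholder embeddings instead of real encoder outputs.
spectrum_embedding = [1.0, 0.0]
candidates = {"PEPTIDE": [0.9, 0.1], "PROTEIN": [0.0, 1.0]}
best, scores = select_candidate(spectrum_embedding, candidates)
```

A count-based accuracy is not differentiable with respect to encoder parameters, so a gradient-trained encoder would need a differentiable surrogate or a different optimization scheme; the sketch only shows how the counts combine into a metric.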