US 12,334,189 B2
System and method for processing experimental data
Maximilien Burq, San Diego, CA (US); Jure Zbontar, San Diego, CA (US); and Peter Cimermancic, San Diego, CA (US)
Assigned to Tesorai, Inc., San Diego, CA (US)
Filed by Tesorai, Inc., San Diego, CA (US)
Filed on Nov. 11, 2024, as Appl. No. 18/943,185.
Claims priority of provisional application 63/682,215, filed on Aug. 12, 2024.
Claims priority of provisional application 63/597,505, filed on Nov. 9, 2023.
Prior Publication US 2025/0157568 A1, May 15, 2025
Int. Cl. G16B 15/00 (2019.01); G06N 3/0464 (2023.01); G16B 40/10 (2019.01); G16B 40/20 (2019.01)
CPC G16B 15/00 (2019.02) [G06N 3/0464 (2023.01); G16B 40/10 (2019.02); G16B 40/20 (2019.02)] 19 Claims
 
1. A method for molecule identification, comprising:
    training a first encoder using a known match between a training sample and a set of training spectra, the training sample comprising a set of training molecules, wherein training the first encoder comprises:
        using the first encoder, determining a set of molecule predictions based on the set of training spectra, wherein a number of true-positive molecule predictions and a number of true-negative molecule predictions are determined based on a comparison between the set of molecule predictions and the set of training molecules;
        determining a loss for the set of molecule predictions, the loss comprising an accuracy metric determined based on the number of true-positive molecule predictions and the number of true-negative molecule predictions; and
        training the first encoder based on the loss;
    determining a mass spectrometry spectrum for a molecule;
    determining an embedding for the mass spectrometry spectrum, using the first encoder;
    for each candidate molecule in a set of candidate molecules:
        determining an embedding for the candidate molecule based on a sequence for the candidate molecule, using a second encoder; and
        using a scoring model, determining a score for the candidate molecule based on the embedding for the mass spectrometry spectrum and the embedding for the candidate molecule; and
    selecting a candidate molecule from the set of candidate molecules based on the scores.
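
For illustration only, the steps recited in claim 1 can be sketched as a toy Python example. Nothing below is taken from the patent's specification: the function names, the use of 1 minus accuracy as the loss, the histogram and residue-count stand-ins for the first and second encoders, and cosine similarity as the scoring model are all assumptions chosen to keep the sketch short. In practice the encoders would presumably be learned models (the classification G06N 3/0464 refers to convolutional networks), and a count-based accuracy would need a differentiable surrogate before it could drive gradient training.

```python
import math


def accuracy_loss(predicted, actual, universe):
    """Assumed loss form: 1 - accuracy, with accuracy = (TP + TN) / N.

    predicted: set of molecules the encoder predicts are in the sample
    actual:    set of training molecules known to be in the sample
    universe:  all molecules under consideration (hypothetical helper input)
    """
    true_pos = len(predicted & actual)             # predicted and present
    true_neg = len(universe - predicted - actual)  # not predicted, not present
    return 1.0 - (true_pos + true_neg) / len(universe)


def encode_spectrum(peaks, n_bins=4, max_mz=400.0):
    """Stand-in for the first encoder: intensity histogram over m/z bins."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = min(int(mz / max_mz * n_bins), n_bins - 1)
        vec[idx] += intensity
    return vec


def encode_sequence(seq, alphabet="ACDE"):
    """Stand-in for the second encoder: residue counts over a toy alphabet."""
    return [float(seq.count(ch)) for ch in alphabet]


def cosine_score(a, b):
    """Stand-in for the scoring model: cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidate(peaks, candidates):
    """Embed the spectrum once, score every candidate, return the best one."""
    spectrum_embedding = encode_spectrum(peaks)
    scores = {
        seq: cosine_score(spectrum_embedding, encode_sequence(seq))
        for seq in candidates
    }
    return max(scores, key=scores.get), scores


# Toy usage with made-up data.
loss = accuracy_loss({"A", "B"}, {"A", "C"}, {"A", "B", "C", "D"})
best, scores = select_candidate([(50.0, 2.0), (150.0, 1.0)], ["AAC", "DDE"])
```

The spectrum is embedded once and reused for every candidate, which mirrors the claim's ordering: one spectrum embedding, then a per-candidate loop that embeds and scores each sequence before a single selection step.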