US 11,967,436 B2
Methods and apparatus for making biological predictions using a trained multi-modal statistical model
Marylens Hernandez, Guilford, CT (US); Umut Eser, Lexington, MA (US); Michael Meyer, Guilford, CT (US); Henri Lichenstein, Guilford, CT (US); Tian Xu, Guilford, CT (US); and Jonathan M. Rothberg, Guilford, CT (US)
Assigned to Quantum-Si Incorporated, Branford, CT (US)
Filed by Quantum-Si Incorporated, Guilford, CT (US)
Filed on May 8, 2019, as Appl. No. 16/406,993.
Claims priority of provisional application 62/678,094, filed on May 30, 2018.
Prior Publication US 2019/0371476 A1, Dec. 5, 2019
Prior Publication US 2020/0350081 A9, Nov. 5, 2020
Int. Cl. G16H 70/40 (2018.01); G16H 50/20 (2018.01); G16H 50/50 (2018.01)
CPC G16H 70/40 (2018.01) [G16H 50/20 (2018.01); G16H 50/50 (2018.01)]
20 Claims
 
1. A method for predicting a new disease indication for a given drug, the method comprising using at least one processor to perform:
training a statistical model by applying a self-supervised learning technique to training data to obtain a trained statistical model, the training data comprising representations of drugs in a first modality and representations of diseases in a second modality, the statistical model comprising a drug encoder, a disease encoder, a common representation space, a drug decoder and a disease decoder, wherein the training comprises:
projecting the representations of the drugs into the common representation space using the drug encoder to obtain drug vectors;
projecting the representations of the diseases into the common representation space using the disease encoder to obtain disease vectors;
combining the drug vectors with the disease vectors to obtain a plurality of joint drug-disease vectors;
providing the joint drug-disease vectors as input to the drug decoder and/or the disease decoder to obtain decoded output vectors;
determining a difference between the decoded output vectors and at least some of the representations of drugs and diseases; and
updating parameters of the statistical model based on the difference between the decoded output vectors and the at least some representations of drugs and diseases;
obtaining a representation of the given drug comprising data from the first modality;
obtaining representations of a plurality of diseases comprising data from the second modality; and
predicting the new disease indication for the given drug using the trained statistical model, the trained statistical model comprising learned parameters for projecting data from the first and second modalities into the common representation space in which data from the first and second modalities can be compared, the learned parameters including parameters of the drug encoder trained to project drug representations into the common representation space and parameters of the disease encoder trained to project disease representations into the common representation space, the predicting comprising:
projecting the representation of the given drug into the common representation space using the learned parameters of the trained drug encoder to obtain a first vector in the common representation space representing the given drug;
projecting the representations of the plurality of diseases into the common representation space using the learned parameters of the trained disease encoder to obtain a plurality of vectors in the common representation space representing respective ones of the plurality of diseases;
determining a measure of similarity between the first vector representing the given drug and each of the plurality of vectors representing the plurality of diseases; and
identifying at least one disease of the plurality of diseases as the new disease indication based on the measure of similarity between the first vector representing the given drug and each of the plurality of vectors representing the plurality of diseases.
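
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete choice in it is an assumption made for illustration: the encoders and decoders are single linear maps, "combining" is taken to be an element-wise average of the drug and disease vectors, the "difference" is a squared reconstruction error, the "measure of similarity" is cosine similarity, and the toy feature vectors and all function names (train, rank_diseases, and so on) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def sgd_update(W, grad_out, x, lr):
    """In-place gradient step for a linear layer: W -= lr * outer(grad_out, x)."""
    for i, g in enumerate(grad_out):
        for j, xj in enumerate(x):
            W[i][j] -= lr * g * xj

def backprop_to_input(W, grad_out):
    """Gradient with respect to the layer input: W^T @ grad_out."""
    return [sum(W[i][j] * grad_out[i] for i in range(len(W)))
            for j in range(len(W[0]))]

def train(pairs, drug_dim, dis_dim, k=2, lr=0.02, epochs=200, seed=0):
    """Self-supervised training: reconstruct each (drug, disease) pair from its joint vector."""
    rng = random.Random(seed)
    E_drug, E_dis = init_matrix(k, drug_dim, rng), init_matrix(k, dis_dim, rng)  # encoders
    D_drug, D_dis = init_matrix(drug_dim, k, rng), init_matrix(dis_dim, k, rng)  # decoders
    losses = []
    for _ in range(epochs):
        total = 0.0
        for drug, dis in pairs:
            # Project both modalities into the common k-dimensional space.
            z_drug, z_dis = matvec(E_drug, drug), matvec(E_dis, dis)
            # Combine into a joint drug-disease vector (assumed: average).
            joint = [(a + b) / 2 for a, b in zip(z_drug, z_dis)]
            # Decode the joint vector back into each modality.
            out_drug, out_dis = matvec(D_drug, joint), matvec(D_dis, joint)
            # Difference between decoded outputs and the original representations.
            err_drug = [o - t for o, t in zip(out_drug, drug)]
            err_dis = [o - t for o, t in zip(out_dis, dis)]
            total += sum(e * e for e in err_drug) + sum(e * e for e in err_dis)
            # Gradients of the squared error, computed before any weights change.
            g_drug = [2 * e for e in err_drug]
            g_dis = [2 * e for e in err_dis]
            g_joint = [a + b for a, b in zip(backprop_to_input(D_drug, g_drug),
                                             backprop_to_input(D_dis, g_dis))]
            g_z = [g / 2 for g in g_joint]  # the average splits the gradient evenly
            # Update parameters of decoders and encoders.
            sgd_update(D_drug, g_drug, joint, lr)
            sgd_update(D_dis, g_dis, joint, lr)
            sgd_update(E_drug, g_z, drug, lr)
            sgd_update(E_dis, g_z, dis, lr)
        losses.append(total)
    return E_drug, E_dis, losses

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def rank_diseases(E_drug, E_dis, drug, diseases):
    """Score every candidate disease against the drug in the common space."""
    z_drug = matvec(E_drug, drug)
    scores = [(name, cosine(z_drug, matvec(E_dis, rep)))
              for name, rep in diseases.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Toy data (hypothetical): 4-feature drug vectors paired with 3-feature disease vectors.
pairs = [([1, 0, 1, 0], [1, 0, 0]),
         ([0, 1, 0, 1], [0, 1, 0]),
         ([1, 1, 0, 0], [0, 0, 1])]
E_drug, E_dis, losses = train(pairs, drug_dim=4, dis_dim=3)

candidates = {"disease_A": [1, 0, 0], "disease_B": [0, 1, 0], "disease_C": [0, 0, 1]}
ranking = rank_diseases(E_drug, E_dis, [1, 0, 1, 0], candidates)
predicted = ranking[0][0]  # highest-similarity disease is the proposed indication
print(ranking, predicted)
```

In this sketch only the two trained encoders are needed at prediction time; the decoders exist solely to supply the reconstruction signal during training. A practical system would presumably also exclude diseases already known to be indicated for the drug before reporting a new indication, which the sketch omits.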