US 12,014,831 B2
Approaches to reducing dimensionality of genetic information used for machine learning and systems for implementing the same
Cheuk Ying Tang, Cupertino, CA (US); Victor Solovyev, San Francisco, CA (US); and Gene Lee, Millbrae, CA (US)
Assigned to AIONCO, Inc., Menlo Park, CA (US)
Filed by AIONCO, Inc., Menlo Park, CA (US)
Filed on Dec. 1, 2022, as Appl. No. 18/073,471.
Claims priority of provisional application 63/285,429, filed on Dec. 2, 2021.
Prior Publication US 2023/0335279 A1, Oct. 19, 2023
Int. Cl. G16H 50/20 (2018.01); G16B 40/20 (2019.01); G16H 50/70 (2018.01)
CPC G16H 50/20 (2018.01) [G16B 40/20 (2019.02); G16H 50/70 (2018.01)]
19 Claims
 
1. A method comprising:
receiving an input indicative of an instruction to train a neural network to identify text phrases that are representative of mutations that are diagnostically relevant for a given type of cancer,
wherein each of the text phrases is representative of a different set of characters, each of which is representative of a nucleotide;
accessing a dataset that includes genetic information of multiple individuals that are known to have the given type of cancer;
generating a first set of locations by examining the dataset,
wherein each of the locations included in the first set is representative of a different molecular position at which a mutation is discovered through analysis of the genetic information;
producing a set of metrics by computing, for each of the locations included in the first set, a metric that is indicative of correlation with the given type of cancer, as determined based on an analysis of the genetic information of the multiple individuals;
generating a second set that includes fewer locations than the first set by
identifying the locations included in the first set in order based on the set of metrics, ordered from most correlated with the given type of cancer to least correlated with the given type of cancer, and
filtering at least some of the locations included in the first set, so as to produce the second set; and
training the neural network using the second set.
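
The location-ranking and filtering steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim requires only "a metric that is indicative of correlation with the given type of cancer"; the sketch assumes, purely for illustration, that the metric is the fraction of affected individuals who carry a mutation at a given position. The names `mutated_positions`, `rank_and_filter`, and `keep`, the toy reference sequence, and the equal-length sample strings are all hypothetical. The neural network training step is omitted.

```python
from collections import Counter


def mutated_positions(reference: str, sample: str) -> set[int]:
    """Return the positions at which a sample differs from the reference.

    Assumes both strings are pre-aligned and of equal length (illustrative only).
    """
    return {i for i, (r, s) in enumerate(zip(reference, sample)) if r != s}


def rank_and_filter(reference: str, samples: list[str], keep: int):
    """Build the first set, score each location, and keep the top `keep`.

    The metric used here (share of individuals mutated at a position) is an
    assumption standing in for the unspecified correlation metric of the claim.
    """
    counts = Counter()
    for sample in samples:
        counts.update(mutated_positions(reference, sample))

    # First set: every position at which a mutation was discovered.
    first_set = sorted(counts)

    # One metric per location in the first set.
    metrics = {pos: counts[pos] / len(samples) for pos in first_set}

    # Order from most to least correlated; ties are broken by position.
    ranked = sorted(first_set, key=lambda pos: (-metrics[pos], pos))

    # Second set: fewer locations than the first set.
    second_set = ranked[:keep]
    return first_set, metrics, second_set


# Toy data: four individuals known to have the given type of cancer.
reference = "ACGTACGT"
samples = ["ACGTACGA", "ACTTACGA", "ACTTACGA", "GCGTACGA"]

first_set, metrics, second_set = rank_and_filter(reference, samples, keep=2)
print(first_set)   # positions with at least one observed mutation
print(second_set)  # reduced set that would be passed on to training
```

In this toy example the reduced set holds two of the three discovered locations, so a model trained on it would see fewer input dimensions than one trained on the full first set.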