CPC G16H 50/20 (2018.01) [G16B 40/20 (2019.02); G16H 50/70 (2018.01)] | 19 Claims |
1. A method comprising:
receiving an input indicative of an instruction to train a neural network to identify text phrases that are representative of mutations that are diagnostically relevant for a given type of cancer,
wherein each of the text phrases is representative of a different set of characters, each of which is representative of a nucleotide;
accessing a dataset that includes genetic information of multiple individuals that are known to have the given type of cancer;
generating a first set of locations by examining the dataset,
wherein each of the locations included in the first set is representative of a different molecular position at which a mutation is discovered through analysis of the genetic information;
producing a set of metrics by computing, for each of the locations included in the first set, a metric that is indicative of correlation with the given type of cancer, as determined based on an analysis of the genetic information of the multiple individuals;
generating a second set that includes fewer locations than the first set by
identifying the locations included in the first set in order based on the set of metrics, ordered from most correlated with the given type of cancer to least correlated with the given type of cancer, and
filtering at least some of the locations included in the first set, so as to produce the second set; and
training the neural network using the second set.
|
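The ranking and filtering steps of claim 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the toy cohort, the helper names (`find_mutation_locations`, `score_locations`, `select_top_locations`), the use of recurrence frequency across the cohort as the correlation metric, and a fixed `keep` cutoff as the filter. The claim does not specify how the metric is computed or how many locations are retained, and the neural-network training step is omitted.

```python
from collections import Counter

def find_mutation_locations(cohort):
    """First set: every position at which at least one individual has a mutation."""
    return sorted({pos for positions in cohort.values() for pos in positions})

def score_locations(cohort, locations):
    """Metric per location: fraction of the cohort mutated there.

    Recurrence frequency is only an assumed stand-in for the claim's
    'metric that is indicative of correlation'.
    """
    counts = Counter(pos for positions in cohort.values() for pos in positions)
    total = len(cohort)
    return {loc: counts[loc] / total for loc in locations}

def select_top_locations(metrics, keep):
    """Second set: order from most to least correlated, then keep the first `keep`."""
    # Ties are broken by the location itself so the result is deterministic.
    ranked = sorted(metrics, key=lambda loc: (-metrics[loc], loc))
    return ranked[:keep]

# Hypothetical toy data: individual -> set of (chromosome, position) mutations.
cohort = {
    "patient_a": {("chr1", 100), ("chr2", 200)},
    "patient_b": {("chr1", 100), ("chr3", 300)},
    "patient_c": {("chr1", 100), ("chr2", 200)},
    "patient_d": {("chr4", 400)},
}

first_set = find_mutation_locations(cohort)
metrics = score_locations(cohort, first_set)
second_set = select_top_locations(metrics, keep=2)
```

With this toy cohort, `second_set` holds the two most recurrent locations and is smaller than `first_set`; in the claimed method it is this reduced set that would be used to train the neural network.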