| CPC G16B 30/00 (2019.02) [G16B 20/00 (2019.02); G16B 40/20 (2019.02)] | 29 Claims |

|
1. A computer-implemented method for predicting properties of an mRNA molecule, the method comprising:
obtaining data representing (i) a codon sequence of a coding sequence (CDS) of the mRNA molecule and (ii) a respective nucleotide sequence of each of one or more non-coding regions of the mRNA molecule;
generating an input token vector by numerically encoding the codon sequence;
generating an embedded feature vector for the CDS of the mRNA molecule by processing the input token vector using an embedding neural network, wherein the embedding neural network has a first set of model parameters that have been updated using a first training process of a first neural network, wherein the first training process is performed based on a dataset specifying known codon sequences of mRNA molecules, and the first neural network is configured to perform one or more pre-training tasks;
generating a joint embedding by combining the embedded feature vector generated for the CDS of the mRNA molecule with one or more embeddings generated from the nucleotide sequences of the one or more non-coding regions of the mRNA molecule;
processing the joint embedding using a property-prediction machine-learning model to generate an output that predicts one or more properties of the mRNA molecule, wherein the property-prediction machine-learning model has a second set of model parameters that have been updated using a second training process of a machine-learning model, wherein the second training process is based on a plurality of training examples, each training example comprising (i) a respective training input specifying a representation of a respective mRNA molecule and (ii) a respective label specifying one or more properties of the respective mRNA molecule; and
providing the one or more predicted properties of the mRNA molecule for physically generating the mRNA molecule.
|
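For illustration only, the token-encoding and joint-embedding steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the 64-codon vocabulary and its lexicographic ordering, the function names `encode_codons` and `combine_embeddings`, and the use of plain concatenation as the combining operation are all hypothetical choices that the claim does not specify. The embedding neural network and the property-prediction model are not shown; their outputs are represented by plain lists of numbers.

```python
from itertools import product

# Hypothetical vocabulary: all 64 codons over A/C/G/U in lexicographic order.
# The claim only requires "numerically encoding" the codon sequence; this
# particular codon-to-integer mapping is an assumption for illustration.
CODON_TO_ID = {"".join(c): i for i, c in enumerate(product("ACGU", repeat=3))}


def encode_codons(cds):
    """Split a coding sequence into codons and map each codon to an integer token."""
    cds = cds.upper().replace("T", "U")  # accept DNA-style input as well
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    # An unknown codon (e.g. one containing 'N') raises KeyError here.
    return [CODON_TO_ID[cds[i:i + 3]] for i in range(0, len(cds), 3)]


def combine_embeddings(cds_embedding, noncoding_embeddings):
    """Combine the CDS embedding with the non-coding-region embeddings.

    Concatenation is assumed here as one possible combining operation;
    the claim leaves the combining method open.
    """
    joint = list(cds_embedding)
    for embedding in noncoding_embeddings:
        joint.extend(embedding)
    return joint


# Input token vector for a three-codon CDS (start codon, alanine, stop codon).
tokens = encode_codons("AUGGCUUAA")

# Placeholder vectors standing in for the outputs of the embedding networks.
cds_embedding = [0.1, 0.2]
utr_embeddings = [[0.3], [0.4, 0.5]]  # e.g. one vector per non-coding region
joint_embedding = combine_embeddings(cds_embedding, utr_embeddings)
```

In this sketch the joint embedding is what would be passed to the property-prediction model; any fixed-length pooling or learned fusion could be substituted for the concatenation without changing the overall flow.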