| CPC G16B 40/20 (2019.02) [G06N 3/123 (2013.01)] | 18 Claims |

|
1. A system, comprising:
memory storing an inference protein sequence that requires translation from a protein input space to a codon output space;
a protein embedder configured to process the inference protein sequence through protein embedding coefficients to generate an inference protein embedding, wherein the protein embedding coefficients are trained to encode the inference protein sequence in a higher-dimensional protein latent space,
wherein each instance of the inference protein embedding is M-dimensional,
wherein each instance of the inference protein sequence is N-dimensional, and
wherein M is greater than N;
a protein-to-codon translator configured with translation coefficients trained using training protein and codon embedding pairs that are higher-dimensional representations of corresponding training protein and codon pairs,
wherein the protein-to-codon translator is an encoder-decoder neural network configured with an encoder neural network and a decoder neural network, and wherein the protein-to-codon translator provides an embedding space that generates the higher-dimensional representations,
wherein each of the training protein and codon embedding pairs is M-dimensional,
wherein each of the training protein and codon pairs is N-dimensional, and
wherein M is greater than N;
an inference logic configured to process the inference protein embedding through the translation coefficients by sampling the embedding space using the protein-to-codon translator to generate an inference codon embedding; and
a reverse mapping logic configured to process the inference codon embedding through the translation coefficients by sampling the embedding space using the protein-to-codon translator to generate an inference codon sequence, wherein the inference codon sequence is a translation of the inference protein sequence in the codon output space,
wherein each instance of the inference codon embedding is M-dimensional,
wherein each instance of the inference codon sequence is N-dimensional, and
wherein M is greater than N.
|
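
The data flow recited in claim 1 (protein embedder, protein-to-codon translator, reverse mapping logic) can be illustrated with a minimal sketch. It is not the claimed implementation. Every name below (`embed_protein`, `translate`, `reverse_map`, `protein_to_codons`) is hypothetical. The coefficients are random rather than trained, and a single linear map stands in for the encoder-decoder neural network. The sketch further assumes that each sequence token is a single symbol (N = 1), that each embedding is a 16-dimensional vector (M = 16), and that reverse mapping chooses only among synonymous codons of the standard genetic code. None of these choices is stated in the claim.

```python
import random

# Standard genetic code, sense codons only (stop codons omitted).
SYNONYMOUS_CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "C": ["TGT", "TGC"],
    "D": ["GAT", "GAC"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "M": ["ATG"],
    "N": ["AAT", "AAC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "Q": ["CAA", "CAG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
}
AMINO_ACIDS = sorted(SYNONYMOUS_CODONS)
CODONS = [codon for aa in AMINO_ACIDS for codon in SYNONYMOUS_CODONS[aa]]

M = 16  # assumed embedding width; each input token is one symbol (N = 1)
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_matrix(rows, cols):
    """Untrained placeholder for coefficients that the claim says are trained."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


PROTEIN_EMBEDDING = dict(zip(AMINO_ACIDS, random_matrix(len(AMINO_ACIDS), M)))
CODON_EMBEDDING = dict(zip(CODONS, random_matrix(len(CODONS), M)))
TRANSLATION = random_matrix(M, M)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def embed_protein(protein):
    """Protein embedder: one M-dimensional vector per residue."""
    return [PROTEIN_EMBEDDING[aa] for aa in protein]


def translate(protein_embedding):
    """Stand-in for the encoder-decoder translator: a per-position linear map."""
    return [[dot(row, vec) for row in TRANSLATION] for vec in protein_embedding]


def reverse_map(protein, codon_embedding):
    """Reverse mapping: pick the best-scoring synonymous codon at each position."""
    chosen = []
    for aa, vec in zip(protein, codon_embedding):
        candidates = SYNONYMOUS_CODONS[aa]
        chosen.append(max(candidates, key=lambda c: dot(vec, CODON_EMBEDDING[c])))
    return "".join(chosen)


def protein_to_codons(protein):
    return reverse_map(protein, translate(embed_protein(protein)))
```

Calling `protein_to_codons("MKTW")` returns a 12-nucleotide string whose codons translate back to the input protein. Restricting the candidates to synonymous codons is what guarantees that property here; a trained model might instead learn it from the training protein and codon pairs.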