CPC G16B 15/30 (2019.02) [G06N 3/08 (2013.01); G16B 5/20 (2019.02); G16B 15/20 (2019.02); G16B 40/00 (2019.02); G16B 40/20 (2019.02)] | 20 Claims |
1. A method comprising:
obtaining, by a computing system including one or more computing devices having one or more processors and memory, a first training dataset including first amino acid sequences of first proteins, individual first proteins having one or more first features of a first group of features;
performing, by the computing system, a first training process that includes:
encoding individual first amino acid sequences as a matrix according to amino acids located at positions of individual first proteins to produce a plurality of matrices corresponding to encoded versions of the first amino acid sequences;
generating input data using a random number generator or pseudo-random number generator, wherein the input data is provided to a generating component of a generative adversarial network;
producing, by the generating component and based on the input data, generated sequences that correspond to additional amino acid sequences, wherein the generated sequences are represented as vectors indicating amino acids located at a number of positions;
computationally analyzing, by a challenging component of the generative adversarial network that implements a distance function, the vectors corresponding to the generated sequences and the plurality of matrices corresponding to the encoded versions of the first amino acid sequences to produce challenging component output indicating differences between the generated sequences and the first amino acid sequences; and
modifying, based on the challenging component output, at least one of one or more parameters, one or more weights, or one or more variables of one or more machine learning models of the generating component until a loss function of the generating component is minimized to produce a trained version of the generating component;
obtaining, by the computing system, a second training dataset including second amino acid sequences of second proteins, individual second proteins having one or more second features of a second group of features, the second group of features being different from the first group of features;
performing, by the computing system, transfer learning with respect to the trained version of the generating component based on the second training dataset, wherein the transfer learning includes modifying at least one of the one or more parameters, the one or more weights, or the one or more variables of the one or more machine learning models of the generating component in response to minimizing the loss function of the trained version of the generating component with respect to the second training dataset to produce a modified version of the generating component, wherein the modified version of the generating component produces second additional amino acid sequences of second additional proteins having at least one second feature of the one or more second features;
producing, by the modified version of the generating component as implemented by the computing system, third amino acid sequences of third proteins, individual third proteins having at least a portion of the one or more second features, and the third amino acid sequences being greater in number than the second amino acid sequences;
performing, by the computing system, an additional training process for an inferential model using a third training dataset that includes at least a portion of the third amino acid sequences to produce a trained version of the inferential model, the trained version of the inferential model to classify amino acid sequences as having features that correspond to at least a portion of the second group of features;
obtaining, by the computing system, fourth amino acid sequences of fourth proteins; and
determining, by the trained version of the inferential model as implemented by the computing system, one or more classifications for the fourth proteins, the one or more classifications indicating at least one or more structural features of the fourth proteins.
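The encoding step and the distance-based comparison recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not fix the amino acid alphabet, the encoding scheme, or the distance function, so this sketch assumes the 20 standard one-letter codes, one-hot rows padded to a fixed length, row-major flattening into the vector form used for generated sequences, and Euclidean distance to the nearest encoded training sequence. All names (one_hot_encode, flatten, decode, distance_to_nearest) are hypothetical, and the fixed nearest-neighbor score stands in for what would, in an actual generative adversarial network, be a trained challenging component.

```python
import math
from typing import List, Sequence

# Assumption: the 20 standard amino acids, one-letter codes, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

Matrix = List[List[float]]


def one_hot_encode(sequence: str, length: int) -> Matrix:
    """Encode a sequence as a length x 20 matrix: row i marks the amino acid at position i.

    Rows past the end of the sequence stay all zero (padding to a fixed length).
    """
    if len(sequence) > length:
        raise ValueError("sequence is longer than the fixed encoding length")
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(length)]
    for pos, aa in enumerate(sequence):
        matrix[pos][INDEX[aa]] = 1.0  # raises KeyError for non-standard residues
    return matrix


def flatten(matrix: Matrix) -> List[float]:
    """Row-major flattening, giving the vector form assumed for generated sequences."""
    return [x for row in matrix for x in row]


def decode(matrix: Matrix) -> str:
    """Map each non-padding row back to its highest-scoring amino acid."""
    residues = []
    for row in matrix:
        best = max(row)
        if best == 0.0:  # all-zero row: padding reached
            break
        residues.append(AMINO_ACIDS[row.index(best)])
    return "".join(residues)


def distance_to_nearest(generated: Sequence[float], training: Sequence[Matrix]) -> float:
    """Hypothetical challenger score: Euclidean distance to the closest encoded training sequence."""
    return min(math.dist(generated, flatten(m)) for m in training)


if __name__ == "__main__":
    training = [one_hot_encode(s, 8) for s in ("MKTAYIAK", "MKTAYLAK")]
    generated = flatten(one_hot_encode("MKTAYIAR", 8))
    # One substitution (K -> R) from the first training sequence gives sqrt(2).
    print(round(distance_to_nearest(generated, training), 3))  # prints 1.414
```

Because the flattening is row-major, a generated vector can be reshaped back into position-by-residue rows and decoded into a sequence; a smaller distance indicates that the generated sequence is closer to some member of the training dataset.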