CPC G06F 30/27 (2020.01) [G06N 3/04 (2013.01); G06N 5/04 (2013.01); G06N 20/10 (2019.01)]
18 Claims
1. A computer-implemented method comprising:
generating, by running at least one automated computational tool on a computer, a set of material candidates expected to yield materials with target properties, at least some of the set of material candidates being split fragments of monomers previously produced;
interacting with a subject matter expert via a computer-implemented user interface by presenting to the subject matter expert the set of material candidates initially ordered by synthesizability scores on the user interface, the user interface presenting the set of material candidates with corresponding visual representations of molecule structures of the set of material candidates;
receiving via the computer-implemented user interface, subject matter expert's decision indicating accepted and rejected material candidates from the set of material candidates presented to the subject matter expert; and
based on the subject matter expert's input, including structure and properties of the accepted material candidates, training a machine learning model to replicate the subject matter expert's decision, wherein the machine learning model is trained to select and rank from the set of material candidates, a subset that includes the structure and properties of the accepted material candidates, wherein an input to the machine learning model includes molecule bit vector, the machine learning model being a convolutional neural network (CNN) having a bit-vector embedding matrix in an embedding layer that represents the input, the embedding layer being coupled with convolution layers operating with rectified linear unit and filters, the convolution layers being coupled with a max pooling layer, the max pooling layer being coupled with a fully connected softmax layer, the convolutional neural network performing feature extraction and combinations in each of the convolutional layers, in identifying patterns considered to be desirable in a candidate for synthesis, wherein given the input, the CNN outputs a score associated with the input based on the identified patterns, wherein the score is used to select and rank the subset from the set of material candidates;
wherein based on the subject matter expert selecting and unselecting the material candidates, features in the material candidates that are desirable and not desirable to the subject matter expert are learned by the machine learning model,
wherein the interacting with the subject matter expert and the receiving of the subject matter expert's decision are repeated and the machine learning model is retrained based on the repetition of the interacting and the receiving, wherein the repetitively retrained machine learning model captures an expertise of the subject matter expert in identifying material, which are viable and practical to synthesize and polymerize, among the set of material candidates.
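For illustration only, the layer sequence recited in claim 1 (embedding layer, convolution layers with rectified linear unit and filters, max pooling layer, fully connected softmax layer, output score used to rank candidates) might be sketched as a forward pass like the one below. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `init_params`, `cnn_score`, and `rank_candidates`, the layer sizes, the random untrained weights, and the choice to embed each bit by looking up one of two rows are all hypothetical, and training on the subject matter expert's accept/reject decisions is omitted.

```python
import math
import random


def init_params(embed_dim=4, n_filters=3, kernel=3, n_classes=2, seed=0):
    """Create random, untrained weights (hypothetical sizes chosen for illustration)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "embedding": mat(2, embed_dim),  # assumption: one row per bit value (0 or 1)
        "filters": [mat(kernel, embed_dim) for _ in range(n_filters)],
        "dense": mat(n_classes, n_filters),  # fully connected layer feeding softmax
    }


def cnn_score(bits, params):
    """Return the softmax probability of the 'accept' class for one molecule bit vector.

    The bit vector must be at least as long as the filter width.
    """
    # Embedding layer: each bit selects its row of the embedding matrix.
    embedded = [params["embedding"][b] for b in bits]

    pooled = []
    for filt in params["filters"]:
        k = len(filt)
        activations = []
        # 1-D convolution: slide the filter over consecutive bit positions.
        for start in range(len(embedded) - k + 1):
            window = embedded[start:start + k]
            total = sum(
                w * x
                for filt_row, emb_row in zip(filt, window)
                for w, x in zip(filt_row, emb_row)
            )
            activations.append(max(0.0, total))  # rectified linear unit
        pooled.append(max(activations))  # max pooling over all positions

    # Fully connected layer followed by a numerically stable softmax.
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in params["dense"]]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total_exp = sum(exps)
    probs = [e / total_exp for e in exps]
    return probs[1]  # assumption: index 1 is the 'accept' class


def rank_candidates(candidates, params):
    """Score every candidate and return (name, score) pairs, highest score first."""
    scored = [(name, cnn_score(bits, params)) for name, bits in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up 8-bit vectors standing in for molecule bit vectors.
    demo = {
        "candidate_a": [1, 0, 1, 1, 0, 0, 1, 0],
        "candidate_b": [0, 0, 0, 1, 1, 1, 0, 1],
    }
    for name, score in rank_candidates(demo, init_params()):
        print(name, round(score, 4))
```

In a system along the lines of the claim, the weights would presumably be fitted to the accepted and rejected candidates and refitted after each round of expert feedback; with the random weights used here, the ranking carries no chemical meaning.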