CPC G16C 20/50 (2019.02) [G06N 3/045 (2023.01); G06N 3/047 (2023.01); G06N 5/04 (2013.01); G06N 7/01 (2023.01); G06N 7/08 (2013.01); G06N 20/20 (2019.01); G16C 20/70 (2019.02)]
12 Claims
1. A method of designing molecules using a machine learning algorithm, the method comprising:
representing, by a Simplified Molecular Input Line Entry System (SMILES) representation unit, molecular structures included in a dataset as SMILES strings, wherein each SMILES string is composed of characters drawn from a set of characters;
converting, by a binary representation unit, the SMILES representation of the molecular structures into a binary representation;
pre-training, by a molecular structure generating unit, a stack of Restricted Boltzmann Machines (RBMs) using the binary representation of the molecular structures to determine a probability density function that estimates whether a candidate molecule comprises a valid molecular structure, the stack of RBMs comprising a three-layer deep belief network (DBN);
constructing, by the molecular structure generating unit, a four-layer Deep Boltzmann Machine (DBM) by combining the three-layer DBN with a two-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GBRBM);
determining, by the molecular structure generating unit, limited molecular property data by running Density Functional Theory (DFT) calculations on a subset of the molecular structures in the dataset;
training, by the molecular structure generating unit, the DBM with the limited molecular property data;
combining, by the molecular structure generating unit, the pre-trained stack of RBMs and the trained DBM in a Bayesian inference framework;
generating, by the molecular structure generating unit, a sample of molecules with target properties by using the Bayesian inference framework; and
manufacturing, based on the sample of molecules with target properties, one or more real molecules with the target properties.
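Claim 1 does not say how the binary representation unit turns a SMILES string into bits. One common choice, assumed here rather than taken from the claim, is to one-hot encode each character against the dataset's character set and pad every string to a fixed length. The helper names (`build_charset`, `smiles_to_binary`, `binary_to_smiles`) and the `_` padding symbol are hypothetical and used only for illustration.

```python
# Hypothetical sketch: the claim does not specify the encoding used by the
# binary representation unit; this version one-hot encodes each character.

PAD = "_"  # assumed padding symbol; must never appear in a real SMILES string


def build_charset(smiles_list):
    """Return the sorted set of characters in the dataset, with PAD at index 0."""
    return [PAD] + sorted(set("".join(smiles_list)) - {PAD})


def smiles_to_binary(smiles, charset, max_len):
    """One-hot encode a SMILES string as a flat 0/1 list of length max_len * len(charset)."""
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than max_len={max_len}: {smiles!r}")
    index = {ch: i for i, ch in enumerate(charset)}
    bits = []
    for ch in smiles.ljust(max_len, PAD):  # pad short strings to a fixed width
        row = [0] * len(charset)
        row[index[ch]] = 1  # a KeyError here means ch is outside the character set
        bits.extend(row)
    return bits


def binary_to_smiles(bits, charset):
    """Invert smiles_to_binary by reading the active bit in each character slot."""
    width = len(charset)
    chars = [charset[bits[s:s + width].index(1)] for s in range(0, len(bits), width)]
    return "".join(chars).rstrip(PAD)


dataset = ["CCO", "C=O", "c1ccccc1"]
charset = build_charset(dataset)
bits = smiles_to_binary("CCO", charset, max_len=8)
print(charset)                          # ['_', '1', '=', 'C', 'O', 'c']
print(len(bits), sum(bits))             # 48 8
print(binary_to_smiles(bits, charset))  # CCO
```

This character-level scheme treats two-letter atoms such as `Cl` or `Br` as two separate characters; a tokenizer that keeps them together would shrink the bit vector but is not required by the claim.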
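The pre-training step names a stack of RBMs forming a three-layer DBN but not the training rule. The sketch below assumes the widely used one-step contrastive divergence (CD-1) together with greedy layer-wise stacking. It also reads "three-layer" as one visible layer plus two hidden layers, which is an interpretation and not something the claim states. The class and function names, layer sizes, learning rate, and toy data are all hypothetical.

```python
import math
import random


def sigmoid(x):
    x = max(-500.0, min(500.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


class BernoulliRBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(self.b))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(self.c))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_update(self, v0, lr):
        ph0 = self.hidden_probs(v0)                 # positive phase
        pv1 = self.visible_probs(self.sample(ph0))  # one Gibbs step down...
        ph1 = self.hidden_probs(pv1)                # ...and back up (negative phase)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, data):
        """Mean squared error of a deterministic (mean-field) up-down pass."""
        total = 0.0
        for v in data:
            pv = self.visible_probs(self.hidden_probs(v))
            total += sum((vi - pi) ** 2 for vi, pi in zip(v, pv))
        return total / (len(data) * len(data[0]))


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activation probabilities produced by the RBM beneath it."""
    rbms, layer_input = [], data
    for k, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(len(layer_input[0]), n_hidden, seed=seed + k)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


# Toy binary "molecules" (hypothetical), e.g. short one-hot encodings.
data = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
]
before = BernoulliRBM(6, 4, seed=0).reconstruction_error(data)
dbn = pretrain_dbn(data, layer_sizes=[4, 3], epochs=50, lr=0.1, seed=0)
after = dbn[0].reconstruction_error(data)
print(round(before, 3), round(after, 3))
```

Reconstruction error is only a rough training diagnostic. The claim frames the trained stack as a density model that scores whether a candidate is a valid structure, and this sketch does not compute that density.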
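The DBM in claim 1 includes a Gaussian-Bernoulli RBM, whose visible units are real-valued and whose hidden units are binary. The claim gives no parameterization. For reference, one standard form (assumed here) uses visible biases $a_i$, hidden biases $b_j$, weights $W_{ij}$, and per-unit standard deviations $\sigma_i$:

```latex
\begin{aligned}
E(\mathbf{v},\mathbf{h}) &= \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j \\
p(h_j = 1 \mid \mathbf{v}) &= \operatorname{sigm}\Big(b_j + \sum_i W_{ij}\, \frac{v_i}{\sigma_i}\Big) \\
p(v_i \mid \mathbf{h}) &= \mathcal{N}\Big(v_i;\; a_i + \sigma_i \sum_j W_{ij} h_j,\; \sigma_i^2\Big)
\end{aligned}
```

The Gaussian conditional comes from completing the square in $v_i$. The terms of $E$ that involve $v_i$ equal $\frac{1}{2\sigma_i^2}\big(v_i - a_i - \sigma_i \sum_j W_{ij} h_j\big)^2$ plus a term independent of $v_i$. One plausible reading is that real-valued visible units hold continuous quantities such as the DFT-computed properties, but the claim does not state this.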
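The claim combines the pre-trained RBM stack and the trained DBM "in a Bayesian inference framework" without naming an inference algorithm. One plausible reading, assumed here, treats the RBM stack as a prior $p(x)$ over valid structures and the property model as a likelihood $p(y \mid x)$, then samples from $p(x \mid y = \text{target}) \propto p(y = \text{target} \mid x)\,p(x)$. The sketch below uses sampling-importance-resampling for that step. This technique is substituted for illustration and is not taken from the claim. The Gaussian likelihood width `sigma`, the function names, and the toy prior and property stand-ins are all hypothetical.

```python
import math
import random


def gaussian_likelihood(predicted, target, sigma):
    """Unnormalized Gaussian likelihood p(y = target | x) centred on the model's prediction."""
    return math.exp(-0.5 * ((predicted - target) / sigma) ** 2)


def sample_posterior(prior_sampler, property_model, target, sigma,
                     n_candidates=2000, n_keep=100, seed=0):
    """Sampling-importance-resampling from p(x | y = target) ∝ p(y = target | x) p(x).

    prior_sampler(rng) -> candidate   draws x from the generative prior p(x)
    property_model(x) -> float        predicts the property of candidate x
    """
    rng = random.Random(seed)
    candidates = [prior_sampler(rng) for _ in range(n_candidates)]
    weights = [gaussian_likelihood(property_model(x), target, sigma) for x in candidates]
    if sum(weights) == 0.0:
        raise ValueError("all importance weights underflowed; increase sigma or n_candidates")
    # Resample candidates in proportion to their importance weights.
    return rng.choices(candidates, weights=weights, k=n_keep)


# Toy stand-ins (hypothetical): 8-bit vectors as "molecules", bit count as "property".
def toy_prior(rng):
    return [rng.randint(0, 1) for _ in range(8)]


def toy_property(x):
    return float(sum(x))


picked = sample_posterior(toy_prior, toy_property, target=7.0, sigma=0.5, n_keep=500)
mean_property = sum(toy_property(x) for x in picked) / len(picked)
print(round(mean_property, 2))  # compare with the toy prior mean of 4.0
```

In a fuller version, `prior_sampler` would run Gibbs sampling through the pre-trained RBM stack and `property_model` would be the DBM trained on the DFT data. Resampling keeps the code simple, but it becomes inefficient when the target property is rare under the prior.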