US 11,721,413 B2
Method and system for performing molecular design using machine learning algorithms
Piyush Tagade, Bangalore (IN); Shanthi Pandian, Bangalore (IN); S Krishnan Hariharan, Bangalore (IN); and Parampalli Shashishekara Adiga, Bangalore (IN)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Gyeonggi-do (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Apr. 5, 2019, as Appl. No. 16/376,132.
Claims priority of application No. IN201841015526 (IN), filed on Apr. 24, 2018; and application No. 10-2018-0117878 (KR), filed on Oct. 2, 2018.
Prior Publication US 2019/0325983 A1, Oct. 24, 2019
Int. Cl. G16C 20/50 (2019.01); G16C 20/70 (2019.01); G06N 20/20 (2019.01); G06N 7/08 (2006.01); G06N 7/01 (2023.01); G06N 5/04 (2023.01); G06N 3/047 (2023.01); G06N 3/045 (2023.01)
CPC G16C 20/50 (2019.02) [G06N 3/045 (2023.01); G06N 3/047 (2023.01); G06N 5/04 (2013.01); G06N 7/01 (2023.01); G06N 7/08 (2013.01); G06N 20/20 (2019.01); G16C 20/70 (2019.02)]
12 Claims
 
1. A method of designing molecules using a machine learning algorithm, the method comprising:
representing, by a Simplified Molecular Input Line Entry System (SMILES) representation unit, molecular structures included in a dataset by using a SMILES, wherein the SMILES uses a set of characters;
converting, by a binary representation unit, a SMILES representation of the molecular structures into a binary representation;
pre-training, by a molecular structure generating unit, a stack of Restricted Boltzmann Machines (RBMs) using the binary representation of the molecular structures to determine a probability density function that estimates whether a candidate molecule comprises a valid molecular structure, the stack of RBMs comprising a three-layer deep belief network (DBN);
constructing, by the molecular structure generating unit, a four-layer Deep Boltzmann Machine (DBM) by combining the three-layer DBN with a two-layer Gaussian Bernoulli Restricted Boltzmann Machine (GBRBM);
determining, by the molecular structure generating unit, limited molecular property data by running a Density Functional Theory (DFT) on a subset of the molecule structures in the dataset;
training, by the molecular structure generating unit, the DBM with the limited molecular property data;
combining, by the molecular structure generating unit, the pre-trained stack of the RBMs and the trained DBM in a Bayesian inference framework;
generating, by the molecular structure generating unit, a sample of molecules with target properties by using the Bayesian inference framework; and
manufacturing, based on the sample of molecules with target properties, one or more real molecules with the target properties.
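
As an illustration only, the representing, converting, and pre-training steps of claim 1 can be sketched in Python as follows. The claim does not specify the binary encoding or the training rule, so the per-character one-hot encoding, the padding symbol, the single Bernoulli RBM, and the one-step contrastive divergence (CD-1) update are all assumptions made for this sketch. Every name in it (build_alphabet, encode, decode, RBM, cd1_update) is hypothetical. The sketch does not implement the three-layer DBN, the GBRBM, the DBM, the DFT calculation, or the Bayesian inference framework recited in the claim.

```python
import math
import random

PAD = " "  # assumed padding symbol; it does not occur in SMILES strings


def build_alphabet(smiles_list):
    """Return the padding symbol followed by the sorted dataset characters."""
    chars = {ch for s in smiles_list for ch in s} - {PAD}
    return [PAD] + sorted(chars)


def encode(smiles, alphabet, max_len):
    """One-hot encode each character into a flat list of 0/1 values."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string longer than max_len")
    index = {ch: i for i, ch in enumerate(alphabet)}
    bits = []
    for ch in smiles.ljust(max_len, PAD):
        row = [0] * len(alphabet)
        row[index[ch]] = 1
        bits.extend(row)
    return bits


def decode(bits, alphabet):
    """Invert encode(): read one character per one-hot group, drop padding."""
    n = len(alphabet)
    chars = [alphabet[bits[i:i + n].index(1)] for i in range(0, len(bits), n)]
    return "".join(chars).rstrip(PAD)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class RBM:
    """Single Bernoulli RBM trained with CD-1 (an assumed training rule)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j]
                                        for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j]
                                        for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: positive phase on data, negative phase on a
        one-step Gibbs reconstruction."""
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        v1 = self.sample(self.visible_probs(h0))
        ph1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])


# Toy dataset (illustrative molecules, not taken from the patent).
dataset = ["CCO", "C=O", "c1ccccc1"]
alphabet = build_alphabet(dataset)
max_len = max(len(s) for s in dataset)
data = [encode(s, alphabet, max_len) for s in dataset]

rbm = RBM(n_visible=len(data[0]), n_hidden=8, rng=random.Random(0))
for _ in range(20):          # a few passes over the toy dataset
    for v in data:
        rbm.cd1_update(v)
```

In this sketch each SMILES string becomes a fixed-length binary vector with exactly one set bit per character position, which is the kind of input a Bernoulli RBM accepts. A stack of such RBMs, trained layer by layer on the hidden activations of the layer below, would be one conventional way to obtain a deep belief network, but that stacking is not shown here.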