US 11,680,063 B2
Entangled conditional adversarial autoencoder for drug discovery
Daniil Polykovskiy, Moscow (RU); Artur Kadurin, Taipei (TW); Aleksandr M. Aliper, Moscow (RU); Alexander Zhebrak, Moscow (RU); and Aleksandrs Zavoronkovs, Pak Shek Kok (HK)
Assigned to INSILICO MEDICINE IP LIMITED, Hong Kong (HK)
Filed by INSILICO MEDICINE IP LIMITED, Hong Kong (HK)
Filed on Sep. 5, 2019, as Appl. No. 16/562,373.
Claims priority of provisional application 62/727,926, filed on Sep. 6, 2018.
Prior Publication US 2020/0082916 A1, Mar. 12, 2020
Int. Cl. G16C 20/30 (2019.01); G16C 20/70 (2019.01); G16C 20/50 (2019.01); C07D 471/04 (2006.01); G16C 20/40 (2019.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); G06F 18/21 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)
CPC C07D 471/04 (2013.01) [G06F 18/2178 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G16C 20/30 (2019.02); G16C 20/40 (2019.02); G16C 20/70 (2019.02)]
22 Claims
1. A method for generating an object comprising:
obtaining a plurality of objects and object properties thereof from a dataset, wherein at least an object data is coupled to an object property data;
inputting the plurality of objects and object properties into a machine learning platform;
creating a trained model with the machine learning platform that is trained with the plurality of objects and object properties, wherein the trained model is an entangled conditional adversarial autoencoder that is configured to:
include an encoder that encodes a distribution of latent code to a latent space;
a predictor model configured to extract object property data from latent code and updating the encoder to eliminate extracted object property data from the latent code;
include a discriminator model that forces a distribution on the latent code to match a prior distribution, wherein the discriminator model discriminates between samples of the distribution of the latent code in the latent space and the prior distribution;
include a decoder that decodes the latent code into decoded object data;
consider the object data that is coupled with the object property data; and
concatenating a defined property with the latent code for input into the decoder;
processing the trained model to obtain the distribution of the latent codes of the objects from the encoder;
reparameterizing the latent codes into reparameterized latent codes with the object properties;
disentangling the latent codes from the object properties with a combined disentanglement protocol of a predictive disentanglement with the predictor model and a joint disentanglement with the discriminator model to provide independence between the latent codes and the object properties;
determining a distribution difference loss of distribution of reparameterized latent codes with a defined prior distribution;
determining a property prediction loss of an estimated property prediction;
generating a plurality of generated objects with the decoder from the disentangled latent codes each having the defined property value of the object properties;
determining reconstruction loss of the generated objects from the input plurality of objects;
determining a total loss from the reconstruction loss, distribution difference loss, and property prediction loss, wherein the objects with the lowest total loss are selected as candidate objects; and
providing a report of the plurality of the candidate objects, wherein the report defines at least one defined property value of the plurality of the candidate objects, wherein the objects are molecules and the properties are one or more of physical properties, molecular fingerprint, lipophilicity, synthetic accessibility, biochemical properties, binding activity, or solubility.
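
The concatenation, total-loss, and candidate-selection steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. The claim does not state how the three loss terms are combined, so a weighted sum with unit default weights is assumed here, and the function names (decoder_input, total_loss, select_candidates) and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def decoder_input(latent_code: Sequence[float],
                  defined_property: Sequence[float]) -> List[float]:
    """Concatenate a latent code with a defined property vector.

    Mirrors the claim step of concatenating a defined property with the
    latent code before it is passed to the decoder.
    """
    return list(latent_code) + list(defined_property)


def total_loss(reconstruction: float,
               distribution_difference: float,
               property_prediction: float,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three loss terms named in the claim.

    ASSUMPTION: a weighted sum; the claim only says the total loss is
    determined "from" the three terms.
    """
    w_rec, w_dist, w_prop = weights
    return (w_rec * reconstruction
            + w_dist * distribution_difference
            + w_prop * property_prediction)


def select_candidates(losses: Sequence[float], k: int = 1) -> List[int]:
    """Return the indices of the k objects with the lowest total loss."""
    return sorted(range(len(losses)), key=lambda i: losses[i])[:k]


# Hypothetical per-object loss terms:
# (reconstruction, distribution difference, property prediction).
loss_terms = [(0.5, 0.25, 0.25), (0.25, 0.25, 0.25), (1.0, 0.5, 0.5)]
per_object_loss = [total_loss(*terms) for terms in loss_terms]
candidates = select_candidates(per_object_loss, k=2)
print(candidates)
```

Returning indices rather than the objects themselves keeps the selection step independent of how molecules are represented (for example, as SMILES strings or fingerprints), so the same indices can be used to look up both the generated objects and their defined property values when assembling the report.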