| CPC G16H 70/40 (2018.01) [G06N 5/022 (2013.01); G16C 20/70 (2019.02)] | 20 Claims |

|
1. A method, comprising:
receiving, by a device, a knowledge graph representing information and simplified molecular-input line-entry (SMILE) data identifying compounds;
training, by the device, embeddings based on the knowledge graph;
generating, by the device, graph embeddings for the SMILE data based on the embeddings;
encoding, by the device, the SMILE data into a latent space;
combining, by the device, the graph embeddings and the latent space to generate a combined latent-embedding space;
decoding, by the device, the combined latent-embedding space to generate decoded SMILE data;
utilizing, by the device, the decoded SMILE data to train an encoder and to generate a trained encoder;
processing, by the device, source SMILE data, with the trained encoder, to generate a source combined latent-embedding space;
searching, by the device, the source combined latent-embedding space to identify new SMILE data associated with new compounds;
decoding, by the device, the new SMILE data to generate decoded new SMILE data; and
evaluating, by the device, the decoded new SMILE data to identify particular SMILE data associated with a new compound.
|
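The data flow recited in claim 1 (encode, combine with graph embeddings, search the combined space, decode, evaluate) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the three example compounds, the hand-written two-dimensional `GRAPH_EMBEDDINGS` (standing in for embeddings trained on a knowledge graph), the character-count `encode` function (standing in for a trained encoder), concatenation as the combining operation, and the nearest-neighbour `decode` function (standing in for a trained decoder). The claim does not specify any of these choices.

```python
import math
import random

# Hypothetical toy data: hand-written 2-d vectors standing in for graph
# embeddings that would normally be trained on a knowledge graph.
GRAPH_EMBEDDINGS = {
    "CCO": [0.9, 0.1],       # ethanol
    "CCN": [0.8, 0.3],       # ethylamine
    "c1ccccc1": [0.1, 0.9],  # benzene
}

ALPHABET = "CNOc1()="  # assumed character set for the toy encoder


def encode(smiles):
    """Stand-in encoder: normalized character counts over ALPHABET."""
    counts = [smiles.count(ch) for ch in ALPHABET]
    total = sum(counts) or 1
    return [c / total for c in counts]


def combine(latent, graph_emb):
    """Combine by concatenation (an assumed choice, not fixed by the claim)."""
    return latent + graph_emb


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def decode(point, library):
    """Stand-in decoder: nearest library entry in the combined space."""
    return min(library, key=lambda s: distance(point, library[s]))


def search(source_point, library, n_samples=20, scale=0.2, seed=0):
    """Sample points around the source vector and decode each one."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(n_samples):
        probe = [x + rng.gauss(0.0, scale) for x in source_point]
        candidates.add(decode(probe, library))
    return candidates


# Combined latent-embedding vector for every known compound.
library = {s: combine(encode(s), g) for s, g in GRAPH_EMBEDDINGS.items()}

source = "CCO"
candidates = search(library[source], library)

# Evaluation stand-in: keep only candidates that differ from the source.
new_compounds = sorted(c for c in candidates if c != source)
print(new_compounds)
```

Because the stand-in decoder can only return compounds already in `library`, this sketch shows the order of operations rather than actual generation of novel structures; a generative decoder would be needed for that.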