US 12,462,902 B2
Artificial intelligence engine architecture for generating candidate drugs
Francis Lee, Cambridge, MA (US); Jonathan D. Steckbeck, Cranberry Township, PA (US); and Hannes Holste, Los Angeles, CA (US)
Assigned to Peptilogics, Inc., Pittsburgh, PA (US)
Filed by Peptilogics, Inc., Pittsburgh, PA (US)
Filed on Jun. 4, 2021, as Appl. No. 17/339,520.
Application 17/339,520 is a division of application No. 16/870,611, filed on May 8, 2020, granted, now Pat. No. 11,049,590.
Claims priority of provisional application 62/975,470, filed on Feb. 12, 2020.
Prior Publication US 2021/0366581 A1, Nov. 25, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G16C 20/70 (2019.01); G06N 3/042 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G16C 20/50 (2019.01); G16C 60/00 (2019.01)

CPC G16C 20/70 (2019.02) [G06N 3/042 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G16C 20/50 (2019.02); G16C 60/00 (2019.02)]

25 Claims

1. A computer-implemented method for using an artificial intelligence engine to generate candidate drug compounds, wherein the computer-implemented method comprises:

generating a multi-dimensional representation of a plurality of protein drug compounds, wherein for each protein drug compound of the plurality of protein drug compounds, the multi-dimensional representation includes one or more relationships between protein drug compound structural information, protein drug compound activity information, and protein drug compound semantic information;

translating, by the artificial intelligence engine, the multi-dimensional representation to a plurality of encodings;

concatenating, by the artificial intelligence engine, a plurality of encodings to form a concatenated vector, wherein:

the encodings are each respective protein sequences represented in a vector,

a first encoding of the plurality of encodings pertains to the protein drug compound structural information,

a second encoding of the plurality of encodings pertains to the protein drug compound activity information, and

a third encoding of the plurality of encodings pertains to the protein drug compound semantic information;

using an autoencoder of the artificial intelligence engine to compress the concatenated vector from a higher-dimensional vector to a lower-dimensional vector, wherein compressing the concatenated vector from the higher-dimensional vector to the lower-dimensional vector reduces processing complexity;

generating, using the lower-dimensional vector, a candidate drug compound comprising a protein sequence via a creator module of the artificial intelligence engine, wherein the artificial intelligence engine is executed by one or more processing devices;

determining, using a decoder of the artificial intelligence engine, which dimensions are included in the candidate drug compound by converting the candidate drug compound to the higher-dimensional vector and obtaining a set of coordinates from the higher-dimensional vector, wherein the coordinates represent the protein drug compound structural information, the protein drug compound activity information, the protein drug compound semantic information, or some combination thereof; and

based on the dimensions, determining, by the artificial intelligence engine, an effectiveness of a biomedical feature of the candidate drug compound, wherein the creator module comprises a generator machine learning model and a discriminator machine learning model, wherein:

the generator machine learning model is trained to receive the lower-dimensional vector and to generate, based on a counterfactual comprising a specification for modifying a different candidate drug compound, the candidate drug compound, and

the discriminator machine learning model is trained to receive the candidate drug compound as input and to predict, based on biomedical activity data pertaining to the plurality of protein drug compounds, an output related to the effectiveness of the biomedical feature which the candidate drug compound provides; and

using at least the output related to the effectiveness of the biomedical feature which the candidate drug compound provides, training, by the one or more processing devices, the generator machine learning model to remove the candidate drug compound from consideration, when the consideration occurs during an iteration of a subsequent generation.
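
For illustration only, the data flow recited in claim 1 (concatenating three encodings, compressing the concatenated vector to a lower-dimensional vector, expanding it back, and reading off per-category coordinates) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the claim recites a trained autoencoder and decoder, whereas the sketch substitutes a fixed average-pooling projection and a value-repeating expansion so that it runs with no dependencies. All function names, vector sizes, and numeric values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def concatenate(encodings: Sequence[Sequence[float]]) -> Vector:
    """Join the structural, activity, and semantic encodings into one vector."""
    out: Vector = []
    for enc in encodings:
        out.extend(enc)
    return out


def compress(vec: Sequence[float], factor: int = 2) -> Vector:
    """Toy stand-in for the encoder: average each block of `factor` values.

    ASSUMPTION: a real autoencoder would learn this mapping from data.
    """
    if len(vec) % factor != 0:
        raise ValueError("vector length must be a multiple of factor")
    return [sum(vec[i:i + factor]) / factor for i in range(0, len(vec), factor)]


def expand(latent: Sequence[float], factor: int = 2) -> Vector:
    """Toy stand-in for the decoder: repeat each latent value `factor` times."""
    return [value for value in latent for _ in range(factor)]


def split_coordinates(vec: Sequence[float], sizes: Sequence[int]) -> List[Vector]:
    """Slice a higher-dimensional vector back into per-encoding coordinates."""
    parts: List[Vector] = []
    start = 0
    for size in sizes:
        parts.append(list(vec[start:start + size]))
        start += size
    return parts


# Hypothetical encodings for a single protein drug compound.
structural = [1.0, 3.0]
activity = [0.5, 0.5, 2.0, 4.0]
semantic = [0.0, 1.0]

concatenated = concatenate([structural, activity, semantic])  # 8 dimensions
latent = compress(concatenated)                               # 4 dimensions
restored = expand(latent)                                     # 8 dimensions again
coords = split_coordinates(
    restored, [len(structural), len(activity), len(semantic)]
)
```

Here `coords[0]`, `coords[1]`, and `coords[2]` hold the reconstructed structural, activity, and semantic coordinates, respectively. The reconstruction is lossy, which is expected of any compression to fewer dimensions; the generator and discriminator steps of the claim are not modeled in this sketch.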