US 12,462,899 B2
Utilizing compound-protein machine learning representations to generate bioactivity predictions
Seyed Ali Madani Tonekaboni, Toronto (CA); Daniella Fiora Lato, Toronto (CA); and Stephen Scott MacKinnon, Burlington (CA)
Assigned to Recursion Pharmaceuticals, Inc., Salt Lake City, UT (US)
Filed by Recursion Pharmaceuticals, Inc., Salt Lake City, UT (US)
Filed on Nov. 9, 2023, as Appl. No. 18/505,754.
Prior Publication US 2025/0157585 A1, May 15, 2025
Int. Cl. G16B 40/20 (2019.01)

CPC G16B 40/20 (2019.02)

20 Claims

1. A computer-implemented method comprising:

training a target machine learning model by:

generating, utilizing the target machine learning model, a training predicted bioactivity result from a compound-protein machine learning representation comprising a plurality of training match scores;

comparing the training predicted bioactivity result from the compound-protein machine learning representation with a ground truth bioactivity result to determine a measure of loss; and

modifying parameters of the target machine learning model based on the measure of loss;

generating, utilizing a trained compound-protein interaction machine learning model, a plurality of match scores for a plurality of compound-protein pairs corresponding to a query compound and a plurality of proteins by generating a match score of the plurality of match scores by stripping one or more layers from the trained compound-protein interaction machine learning model, wherein the plurality of match scores comprise binding probabilities between the query compound and the plurality of proteins;

generating, utilizing the target machine learning model, a predicted bioactivity result for the query compound from the plurality of match scores for the plurality of compound-protein pairs; and

determining, utilizing a machine learning explainability model, one or more proteins from the plurality of compound-protein pairs contributing to the predicted bioactivity result for the query compound by modifying features comprising the plurality of match scores for the plurality of compound-protein pairs input to the target machine learning model to determine modified bioactivity results and marginal contributions of the plurality of proteins in generating the modified bioactivity results utilizing the target machine learning model.
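
The steps recited in claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that the claim does not state: the target model is stood in for by a small logistic regression, the measure of loss is binary cross-entropy, the match scores are synthetic random numbers rather than outputs of a trained compound-protein interaction model, and the "marginal contributions" are computed as exact Shapley values against a mean-score baseline. All function and variable names are hypothetical.

```python
import itertools
import math

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_target_model(X, y, lr=0.5, epochs=500):
    """Predict, compare with ground truth to get a loss, then update parameters."""
    n, d = X.shape
    w, b, losses = np.zeros(d), 0.0, []
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                      # training predicted bioactivity
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(float(loss))                  # measure of loss
        grad = p - y
        w -= lr * (X.T @ grad) / n                  # modify parameters
        b -= lr * float(grad.mean())
    return w, b, losses


def predict(w, b, scores):
    """Predicted bioactivity for one vector of per-protein match scores."""
    return float(sigmoid(scores @ w + b))


def shapley_contributions(w, b, scores, baseline):
    """Exact Shapley value per protein: proteins outside a coalition are
    replaced by the baseline score, and the model is re-evaluated."""
    d = len(scores)

    def value(coalition):
        x = baseline.copy()
        idx = np.array(coalition, dtype=int)
        x[idx] = scores[idx]                        # modified feature vector
        return predict(w, b, x)                     # modified bioactivity result

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Synthetic stand-in data (assumption): 200 training compounds, 4 proteins,
# match scores in [0, 1]; bioactivity is driven by protein 0 only.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] > 0.5).astype(float)

w, b, losses = train_target_model(X, y)

query = np.array([0.9, 0.5, 0.5, 0.5])              # match scores for a query compound
baseline = X.mean(axis=0)                           # reference scores (assumption)
predicted = predict(w, b, query)
phi = shapley_contributions(w, b, query, baseline)
top_protein = int(np.argmax(np.abs(phi)))           # protein contributing most
```

Exact enumeration visits every subset of proteins, so it is only practical for a handful of features; with a large protein panel a sampling-based approximation would be needed instead.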