US 12,462,903 B2
Utilizing compound-protein machine learning representations to generate bioactivity predictions
Seyed Ali Madani Tonekaboni, Toronto (CA); Daniella Fiora Lato, Toronto (CA); and Stephen Scott MacKinnon, Burlington (CA)
Assigned to Recursion Pharmaceuticals, Inc., Salt Lake City, UT (US)
Filed by Recursion Pharmaceuticals, Inc., Salt Lake City, UT (US)
Filed on Nov. 9, 2023, as Appl. No. 18/505,728.
Prior Publication US 2025/0157595 A1, May 15, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. G16C 20/70 (2019.01); G16B 40/20 (2019.01)
CPC G16C 20/70 (2019.02) [G16B 40/20 (2019.02)]
20 Claims
 
1. A computer-implemented method comprising:
identifying a plurality of compound-protein pairs comprising a training compound matched to a plurality of proteins;
generating, utilizing a compound-protein interaction machine learning model from the plurality of compound-protein pairs, a plurality of binding scores between the training compound and the plurality of proteins;
generating a compound-protein machine learning representation comprising the plurality of binding scores between the training compound and the plurality of proteins;
generating, utilizing the compound-protein interaction machine learning model, an additional compound-protein machine learning representation comprising an additional plurality of binding scores between an additional training compound and the plurality of proteins; and
iteratively training a target machine learning model to improve accuracy of the target machine learning model by,
for a first training iteration:
inputting the compound-protein machine learning representation comprising the plurality of binding scores between the training compound and the plurality of proteins into the target machine learning model to generate a first predicted bioactivity result for the training compound; and
modifying parameters of the target machine learning model by comparing the first predicted bioactivity result to a ground truth bioactivity result corresponding to the training compound; and
for a second training iteration:
inputting the additional compound-protein machine learning representation comprising the additional plurality of binding scores between the additional training compound and the plurality of proteins into the target machine learning model to generate an additional predicted bioactivity result for the additional training compound; and
modifying the parameters of the target machine learning model by comparing the additional predicted bioactivity result to an additional ground truth bioactivity result corresponding to the additional training compound.
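
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify the architecture of either model, so the toy scorer `cpi_binding_score`, the logistic-regression `TargetModel`, the learning rate, and all feature vectors and labels below are hypothetical stand-ins chosen only to make the example runnable. The sketch shows each compound being scored against the same protein panel, the resulting vector of binding scores serving as the compound-protein representation, and the target model being updated once per training iteration by comparing its prediction to a ground truth bioactivity result.

```python
import math


def cpi_binding_score(compound, protein):
    """Hypothetical stand-in for the compound-protein interaction model.

    Returns a pseudo binding score in (0, 1) from a dot product of toy
    feature vectors. A real model would be a trained predictor.
    """
    dot = sum(c * p for c, p in zip(compound, protein))
    return 1.0 / (1.0 + math.exp(-dot))


def build_representation(compound, proteins):
    """Compound-protein representation: one binding score per protein."""
    return [cpi_binding_score(compound, protein) for protein in proteins]


class TargetModel:
    """Illustrative target model (logistic regression; an assumption)."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, rep):
        z = sum(w * x for w, x in zip(self.w, rep)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def train_step(self, rep, truth):
        """One training iteration: predict, compare to truth, update."""
        pred = self.predict(rep)
        err = pred - truth  # gradient of the log loss with respect to z
        self.w = [w - self.lr * err * x for w, x in zip(self.w, rep)]
        self.b -= self.lr * err
        return pred


# Hypothetical protein panel and training compounds with made-up labels.
proteins = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
training = [
    ([2.0, -1.0], 1.0),  # training compound, ground truth bioactivity
    ([-1.0, 2.0], 0.0),  # additional training compound, ground truth
]

# Both representations are built against the same plurality of proteins.
reps = [build_representation(compound, proteins) for compound, _ in training]

model = TargetModel(n_features=len(proteins))
first_pred = model.train_step(reps[0], training[0][1])   # first iteration
second_pred = model.train_step(reps[1], training[1][1])  # second iteration
```

In this sketch the representation length equals the number of proteins in the panel, so the target model's input size is fixed by the panel rather than by any compound descriptor. That is one reading of the claim, not a requirement stated in it.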