| CPC G16B 15/30 (2019.02) [G06N 3/08 (2013.01); G16B 15/20 (2019.02)] | 20 Claims |

|
1. A method, comprising:
a) receiving, at a processor, a pre-trained mixed modality neural network:
i) wherein the representation modalities are for representations of features of proteins,
ii) wherein the represented features include one or more of sequence, structure, function, interactions, interactors, binding partners, attributes, and properties,
iii) wherein the neural network is configured to accept as input data, a query consisting of one or more of the modalities, and to yield as output data, a response to the query, wherein the response also consists of one or more of the modalities,
iv) wherein the neural network is autoregressive,
v) wherein the neural network has one or more output heads, each with its own loss function,
vi) wherein for each respective output head of the neural network, the final output is a probability distribution over a set of possible values at that head;
b) using a plurality of mixed modality reason-oriented query response pairs to perform supervised fine tuning of the pre-trained neural network:
i) wherein for each reason-oriented query used, the pre-trained neural network's output is scored against a corresponding chain-of-thought response,
ii) wherein an optimization process is used to iteratively update the weights of the pre-trained neural network,
iii) wherein the supervised fine tuning weight updates proceed until termination criteria are met,
iv) wherein the output is a reasoning-oriented mixed modality neural network;
c) using the reasoning-oriented mixed modality neural network as an output generator method for obtaining a representation of an output ligand, in response to an input query specifying conditions on the ligand;
d) using the output generator method to generate an output ligand by randomly sampling the output probability distribution of the neural network's active head at each iteration of the autoregression;
e) running the random-sampling based generation process a plurality of times with a given input query, wherein each of the plurality of runs uses the same input query, and wherein each of the plurality of runs yields a representation of a candidate ligand;
f) obtaining the plurality of generated representations of ligands as output.
|
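Steps d) through f) of claim 1 can be illustrated with a minimal sketch, which is not the claimed implementation. It assumes a hypothetical stand-in function `toy_logits` in place of the fine-tuned network's active output head, a hypothetical vocabulary of the 20 standard amino-acid letters plus an end token, and a plain text string as the input query. At each autoregressive iteration the logits are converted to a probability distribution and one token is drawn at random from it. The whole process is then repeated several times with the same query to collect a plurality of candidate ligand representations.

```python
import math
import random

# Hypothetical vocabulary: 20 standard amino-acid letters plus an end token.
VOCAB = list("ACDEFGHIKLMNPQRSTVWY") + ["<eos>"]


def toy_logits(query, prefix):
    """Stand-in for the active output head (an assumption, not a trained model).

    Returns one logit per vocabulary entry given the query and the tokens
    generated so far. Letters that occur in the query are mildly favoured,
    and the end token becomes more likely as the sequence grows.
    """
    logits = [1.0 if tok in query else 0.0 for tok in VOCAB[:-1]]
    logits.append(0.5 * len(prefix) - 2.0)
    return logits


def softmax(logits):
    """Turn logits into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_once(query, rng, max_len=30):
    """Step d): sample one token per autoregressive iteration."""
    prefix = []
    for _ in range(max_len):
        probs = softmax(toy_logits(query, prefix))
        tok = rng.choices(VOCAB, weights=probs, k=1)[0]
        if tok == "<eos>":
            break
        prefix.append(tok)
    return "".join(prefix)


def generate_candidates(query, n_runs, seed=0):
    """Steps e) and f): repeat sampling with the same query, collect outputs."""
    rng = random.Random(seed)
    return [generate_once(query, rng) for _ in range(n_runs)]


# Every run uses the same (hypothetical) query text.
candidates = generate_candidates("binds target X; motif: GKS", n_runs=5)
for c in candidates:
    print(c)
```

Because each token is drawn at random rather than chosen by taking the most probable value, repeated runs with an identical query can yield different candidates, which is what makes the plurality of runs in step e) useful.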