CPC G16B 40/00 (2019.02) [G06N 3/044 (2023.01); G16B 5/20 (2019.02); G16B 15/00 (2019.02)] | 20 Claims |
1. A computer-implemented method comprising:
receiving, by one or more processors of a computing device, a request to predict a pair of an enzyme primary sequence and a substrate, and an interaction probability for the pair, the request including a representation of a primary sequence of an enzyme, a representation of a substrate composition, and an indication of reactivity of the substrate and the enzyme;
applying, by the one or more processors, an embedding to the representation of the primary sequence of the enzyme to generate an enzyme vector;
decomposing, by the one or more processors, the substrate into a 2-D connection matrix with 1-D embeddings for each node of the 2-D connection matrix;
tiling, by the one or more processors, the 1-D embeddings to form 2-D representations for each node of the 2-D connection matrix;
stacking, by the one or more processors, the 2-D connection matrix with the 2-D representations and upsampling, thereby generating a 2-D upsampled stack;
convolving, by the one or more processors, the 2-D upsampled stack to generate a substrate vector;
concatenating, by the one or more processors, the enzyme vector, substrate vector, and interaction indication to form a machine learning (ML) model input;
receiving, by a multi-headed self-attention ML model executed by the one or more processors, the ML input;
predicting, using the multi-headed self-attention ML model based at least on the ML input, one or more pairs of an enzyme primary sequence and a substrate, and an interaction probability for each pair; and
outputting, by the multi-headed self-attention ML model, a result of the predicting using the multi-headed self-attention ML model, the result including the one or more pairs of an enzyme primary sequence and a substrate, and an interaction probability for each pair.
|
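By way of non-limiting illustration, the data flow recited in claim 1 can be sketched in Python with NumPy. This is a toy sketch under stated assumptions, not the claimed implementation. The function names (`embed_sequence`, `substrate_stack`, `convolve_to_vector`, `multi_head_self_attention`, `predict_interaction`), the `params` dictionary, the random weights, the mean pooling, the nearest-neighbour upsampling, the padding of the concatenated input into fixed-width tokens, and the sigmoid output head are all hypothetical choices that the claim does not specify. The sketch also assumes each node's 1-D embedding has length equal to the number of nodes, so that the tiled planes match the shape of the connection matrix, and it scores a single enzyme/substrate pair rather than returning several pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_sequence(seq, table):
    # Assumed pooling: average the per-residue embeddings into one enzyme vector.
    return np.mean([table[aa] for aa in seq], axis=0)

def substrate_stack(adjacency, node_embeddings, factor=2):
    n = adjacency.shape[0]
    # Tiling: repeat each node's 1-D embedding (length n, by assumption) over n rows.
    planes = [np.tile(e, (n, 1)) for e in node_embeddings]
    # Stacking: connection matrix first, then one tiled plane per node.
    stack = np.stack([adjacency.astype(float)] + planes, axis=0)
    # Upsampling: nearest-neighbour repetition along both spatial axes.
    return stack.repeat(factor, axis=1).repeat(factor, axis=2)

def convolve_to_vector(stack, kernel):
    # "Valid" sliding-window product summed over channels (cross-correlation,
    # as is conventional in ML), flattened into a substrate vector.
    _, h, w = stack.shape
    _, kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(stack[:, i:i + kh, j:j + kw] * kernel)
    return out.ravel()

def multi_head_self_attention(x, wq, wk, wv, heads):
    # x has shape (tokens, d_model); d_model must be divisible by heads.
    t, d = x.shape
    dh = d // heads
    def split(m):
        return m.reshape(t, heads, dh).transpose(1, 0, 2)  # (heads, t, dh)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)        # (heads, t, t)
    scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return (weights @ v).transpose(1, 0, 2).reshape(t, d)  # merge heads

def predict_interaction(seq, adjacency, node_embeddings, reactive, params):
    enzyme_vec = embed_sequence(seq, params["aa_table"])
    substrate_vec = convolve_to_vector(
        substrate_stack(adjacency, node_embeddings), params["kernel"])
    # Concatenate enzyme vector, substrate vector, and the reactivity indication.
    ml_input = np.concatenate([enzyme_vec, substrate_vec, [float(reactive)]])
    # Assumption: zero-pad and reshape into tokens of width d_model.
    d = params["d_model"]
    tokens = np.pad(ml_input, (0, (-ml_input.size) % d)).reshape(-1, d)
    attended = multi_head_self_attention(
        tokens, params["wq"], params["wk"], params["wv"], params["heads"])
    logit = attended.mean(axis=0) @ params["w_out"]
    return 1.0 / (1.0 + np.exp(-logit))                    # value in (0, 1)

# Hypothetical toy inputs: a 3-node substrate graph and a short sequence.
d_model = 8
params = {
    "aa_table": {aa: rng.normal(size=d_model) for aa in "ACDEFGHIKLMNPQRSTVWY"},
    "kernel": rng.normal(size=(4, 3, 3)),   # channels = 1 matrix + 3 node planes
    "d_model": d_model,
    "heads": 2,
    "wq": rng.normal(size=(d_model, d_model)),
    "wk": rng.normal(size=(d_model, d_model)),
    "wv": rng.normal(size=(d_model, d_model)),
    "w_out": rng.normal(size=d_model),
}
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
node_embeddings = rng.normal(size=(3, 3))
prob = predict_interaction("MKTAY", adjacency, node_embeddings, True, params)
```

With these toy inputs the stacked substrate tensor has 4 channels (the connection matrix plus three tiled node planes), and `prob` is a single score between 0 and 1. Because the weights are random, the score carries no biochemical meaning; a trained model would learn the embedding table, kernel, and attention weights from labelled enzyme/substrate pairs.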