US 11,929,152 B1
Systems, methods, and apparatuses to predict pairs of enzyme primary sequences and substrates and an interaction probability for each predicted pair
Alexander Sewall Ford, Seattle, WA (US); Zachary Wu, College Station, TX (US); Layne Christopher Price, Seattle, WA (US); Franziska Seeger, Seattle, WA (US); and Yen Ling Adelene Sim, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 9, 2020, as Appl. No. 16/896,877.
Int. Cl. G01N 33/48 (2006.01); G06N 3/044 (2023.01); G16B 5/20 (2019.01); G16B 15/00 (2019.01); G16B 40/00 (2019.01)

CPC G16B 40/00 (2019.02) [G06N 3/044 (2023.01); G16B 5/20 (2019.02); G16B 15/00 (2019.02)]

20 Claims

1. A computer-implemented method comprising:

receiving, by one or more processors of a computing device, a request to predict a pair of an enzyme primary sequence and a substrate, and an interaction probability for the pair, the request including a representation of a primary sequence of an enzyme, a representation of a substrate composition, and an indication of reactivity of the substrate and the enzyme;

applying, by the one or more processors, an embedding to the representation of the primary sequence of the enzyme to generate an enzyme vector;

decomposing, by the one or more processors, the substrate into a 2-D connection matrix with 1-D embeddings for each node of the 2-D connection matrix;

tiling, by the one or more processors, the 1-D embeddings to form 2-D representations for each node of the 2-D connection matrix;

stacking, by the one or more processors, the 2-D connection matrix with the 2-D representations and upsampling, thereby generating a 2-D upsampled stack;

convolving, by the one or more processors, the 2-D upsampled stack to generate a substrate vector;

concatenating, by the one or more processors, the enzyme vector, substrate vector, and interaction indication to form a machine learning (ML) model input;

receiving, by a multi-headed self-attention ML model executed by the one or more processors, the ML input;

predicting, using the multi-headed self-attention ML model based at least on the ML input, one or more pairs of an enzyme primary sequence and a substrate, and an interaction probability for each pair; and

outputting, by the multi-headed self-attention ML model, a result of the predicting using the multi-headed self-attention ML model, the result including the one or more pairs of an enzyme primary sequence and a substrate, and an interaction probability for each pair.
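
The substrate-encoding and concatenation steps recited above can be illustrated with a minimal sketch. This is one possible reading of the claim language, not the patented implementation. The following are assumptions made only for illustration: the function names, the three-atom example graph, the one-hot node embeddings (chosen so that each embedding's length equals the node count, which makes every tiled plane the same shape as the connection matrix), nearest-neighbour upsampling by a factor of 2, a single fixed 2x2 averaging kernel in place of learned convolution weights, global average pooling to reduce each feature map to one number, and the placeholder enzyme vector. The multi-headed self-attention model itself is not shown.

```python
def tile(embedding, rows):
    """Repeat a 1-D embedding as identical rows to form a 2-D plane."""
    return [list(embedding) for _ in range(rows)]


def upsample(plane, factor):
    """Nearest-neighbour upsampling: repeat each cell `factor` times on both axes."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def convolve_valid(plane, kernel):
    """'Valid' 2-D cross-correlation of a square plane with a square kernel."""
    k = len(kernel)
    size = len(plane) - k + 1
    return [
        [
            sum(plane[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(size)
        ]
        for i in range(size)
    ]


def substrate_vector(adjacency, node_embeddings, kernel, factor=2):
    """Tile, stack, upsample, and convolve a substrate graph into a vector."""
    n = len(adjacency)
    for emb in node_embeddings:
        if len(emb) != n:
            # Assumption of this sketch: embeddings are as long as the node count.
            raise ValueError("each node embedding must have length n")
    # Stack the connection matrix with one tiled plane per node.
    stack = [adjacency] + [tile(emb, n) for emb in node_embeddings]
    upsampled = [upsample(plane, factor) for plane in stack]
    vector = []
    for plane in upsampled:
        feature_map = convolve_valid(plane, kernel)
        cells = [v for row in feature_map for v in row]
        vector.append(sum(cells) / len(cells))  # global average pooling (assumed)
    return vector


def build_model_input(enzyme_vector, sub_vector, reactivity):
    """Concatenate enzyme vector, substrate vector, and reactivity indication."""
    return list(enzyme_vector) + list(sub_vector) + [float(reactivity)]


# Hypothetical three-node substrate graph (heavy atoms C-C-O, as in ethanol).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
node_embeddings = [[1, 0, 0],   # C, illustrative one-hot over (C, N, O)
                   [1, 0, 0],   # C
                   [0, 0, 1]]   # O
kernel = [[0.25, 0.25],
          [0.25, 0.25]]         # fixed averaging kernel, stands in for learned weights

sub_vec = substrate_vector(adjacency, node_embeddings, kernel)
enzyme_vec = [0.1, 0.2]         # placeholder for an embedded primary sequence
model_input = build_model_input(enzyme_vec, sub_vec, reactivity=1)
```

In this sketch the substrate vector has one entry per stacked plane (the connection matrix plus one plane per node), and the model input is the enzyme vector followed by the substrate vector and the reactivity flag. A practical system would presumably use learned embeddings and learned convolution filters and would pass the concatenated input to the attention model.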