US 12,288,599 B2
Protein structure information prediction method and apparatus, device, and storage medium
Jiaxiang Wu, Shenzhen (CN); Yuzhi Guo, Shenzhen (CN); and Junzhou Huang, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on Dec. 1, 2021, as Appl. No. 17/539,946.
Application 17/539,946 is a continuation of application No. PCT/CN2020/114386, filed on Sep. 10, 2020.
Claims priority of application No. 201911042649.9 (CN), filed on Oct. 30, 2019.
Prior Publication US 2022/0093213 A1, Mar. 24, 2022
Int. Cl. G01N 33/48 (2006.01); G01N 33/50 (2006.01); G16B 5/00 (2019.01); G16B 20/30 (2019.01); G16B 30/10 (2019.01); G16B 50/00 (2019.01)
CPC G16B 30/10 (2019.02) [G16B 5/00 (2019.02); G16B 20/30 (2019.02); G16B 50/00 (2019.02)] 16 Claims
 
1. A method for predicting structure information of a protein, performed by a computer device, the method comprising:
performing sequence alignment query in a first database according to an amino acid sequence of the protein to obtain multi-sequence aligned data;
performing feature extraction on the multi-sequence aligned data to obtain an initial sequence feature;
processing the initial sequence feature by using a sequence feature augmentation model to obtain an augmented sequence feature of the protein, the sequence feature augmentation model being a machine learning model trained by using a sample initial sequence feature and a sample augmented sequence feature and updated according to an augmented sample initial sequence feature and the sample augmented sequence feature, the sample initial sequence feature being obtained by performing sequence alignment query in the first database according to a sample amino acid sequence, the sample augmented sequence feature being obtained by performing sequence alignment query in a second database according to the sample amino acid sequence, wherein a data scale of the second database is greater than a data scale of the first database, and the augmented sample initial sequence feature being obtained by processing the sample initial sequence feature by using the sequence feature augmentation model, wherein the sequence feature augmentation model comprises one of:
a fully convolutional network (FCN) model for one-dimensional sequence data;
a recurrent neural network (RNN) model comprising a plurality of layers of long short-term memory (LSTM) units; or
an RNN model comprising bidirectional LSTM units; and
predicting structure information of the protein based on the augmented sequence feature.
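
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: the initial sequence feature is taken to be a per-position residue frequency profile, the augmentation model is reduced to a single untrained one-dimensional convolution layer (the simplest instance of a fully convolutional network over sequence positions), and mean squared error stands in for whatever training objective is actually used. The names profile_from_msa, conv1d_same, identity_kernel, and mse are hypothetical, as are the toy alignments.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def profile_from_msa(msa):
    """Per-position residue frequencies (L x 20) from equal-length aligned rows."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gap characters and non-standard symbols are ignored.
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        # A column with no standard residues yields an all-zero row.
        profile.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return profile


def conv1d_same(features, kernel, bias):
    """One 1D convolution over sequence positions with zero padding.

    features: L x C_in, kernel: K x C_in x C_out (K odd), bias: C_out.
    Returns L x C_out, so the layer accepts any sequence length L.
    """
    length, width = len(features), len(kernel)
    half = width // 2
    c_in, c_out = len(kernel[0]), len(bias)
    out = []
    for pos in range(length):
        row = list(bias)
        for tap in range(width):
            src = pos + tap - half
            if 0 <= src < length:  # positions outside the sequence act as zeros
                for i in range(c_in):
                    x = features[src][i]
                    for o in range(c_out):
                        row[o] += x * kernel[tap][i][o]
        out.append(row)
    return out


def identity_kernel(width, channels):
    """Kernel that copies the input unchanged (an illustrative starting point)."""
    half = width // 2
    return [[[1.0 if (tap == half and i == o) else 0.0 for o in range(channels)]
             for i in range(channels)] for tap in range(width)]


def mse(predicted, target):
    """Mean squared error between two L x C feature matrices."""
    diffs = [(p - t) ** 2 for pr, tr in zip(predicted, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


# Toy alignments: the "second database" returns more homologous rows than the first.
msa_small = ["ACDA", "ACEA", "A-DA"]
msa_large = msa_small + ["GCDA", "ACDV", "ACDA"]

initial = profile_from_msa(msa_small)    # sample initial sequence feature
target = profile_from_msa(msa_large)     # sample augmented sequence feature
kernel, bias = identity_kernel(3, 20), [0.0] * 20
augmented = conv1d_same(initial, kernel, bias)  # augmented sample initial feature
loss = mse(augmented, target)            # quantity a training update would reduce
print(f"loss before any training: {loss:.6f}")
```

In this sketch, training would adjust kernel and bias so that the feature computed from the smaller alignment approaches the feature computed from the larger one. At prediction time only the first database and the trained model would be needed, and the augmented feature would be passed to a separate structure predictor, which is omitted here.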