CPC G06N 3/08 (2013.01) [G06F 18/213 (2023.01); G06N 3/045 (2023.01); G06N 20/20 (2019.01); G06F 18/22 (2023.01)] | 8 Claims |
1. A method of embedding a sentence feature vector, which is performed by a computing device comprising one or more processors and a memory in which one or more programs to be executed by the one or more processors are stored, the method comprising:
acquiring a first sentence and a second sentence, each including one or more words;
extracting a first feature vector corresponding to the first sentence and a second feature vector corresponding to the second sentence by independently inputting each of the first sentence and the second sentence into a bidirectional encoder representations from transformers (BERT)-based feature extraction network; and
compressing the first feature vector and the second feature vector into a first compressed vector and a second compressed vector, respectively, by independently inputting each of the first feature vector and the second feature vector into a convolutional neural network (CNN)-based vector compression network,
wherein the CNN-based vector compression network comprises:
a plurality of convolution filters configured to reduce a dimension of an input feature vector, the input feature vector being the first feature vector or the second feature vector,
an activation function application unit configured to generate a plurality of 1×N activation vectors by applying a predetermined activation function to the feature vectors with the reduced dimension, and
a pooling layer configured to perform max pooling in which a plurality of elements located in a same row and a same column of the plurality of 1×N activation vectors are compared along a depth direction, and the element with a maximum value is selected as the element in the same row and the same column of a 1×N compressed vector,
wherein N is a natural number,
wherein the plurality of 1×N activation vectors have the same numbers of rows and columns,
wherein the BERT-based feature extraction network is trained by updating training parameters based on a similarity between the first compressed vector and the second compressed vector,
wherein the BERT-based feature extraction network comprises a Siamese network architecture composed of a first feature extraction network configured to receive the first sentence and extract the first feature vector and a second feature extraction network configured to receive the second sentence and extract the second feature vector,
wherein the CNN-based vector compression network comprises a Siamese network architecture composed of a first vector compression network configured to receive the first feature vector and compress the first feature vector into the first compressed vector and a second vector compression network configured to receive the second feature vector and compress the second feature vector into the second compressed vector, and
wherein the CNN-based vector compression network is provided separately from the BERT-based feature extraction network.
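The compression stage recited above can be illustrated with a minimal NumPy sketch. This is not the claimed implementation: the filter values, the use of a valid (dimension-reducing) 1-D convolution, and the choice of ReLU as the "predetermined activation function" are all assumptions made for illustration. Each filter produces one 1×N activation vector (all filters share a length, so the activation vectors have equal size), and the depth-wise max pooling compares elements at the same position across the stack of activation vectors to form the 1×N compressed vector.

```python
import numpy as np

def compress(feature_vec, filters):
    """Sketch of the claimed CNN-based vector compression (assumptions noted above).

    feature_vec: 1-D feature vector, standing in for the output of one branch
                 of the Siamese BERT-based feature extraction network.
    filters:     list of equal-length 1-D convolution kernels (hypothetical values).
    Returns a 1xN compressed vector, N = len(feature_vec) - len(filter) + 1.
    """
    activations = []
    for k in filters:
        # Valid-mode correlation reduces the dimension of the input feature vector.
        # (np.convolve flips the kernel, so we pre-flip it to get a correlation.)
        conv = np.convolve(feature_vec, k[::-1], mode="valid")
        # Assumed activation function: ReLU, yielding one 1xN activation vector.
        activations.append(np.maximum(conv, 0.0))
    stacked = np.stack(activations)   # shape: (depth, N) — one row per filter
    # Depth-wise max pooling: at each column, keep the maximum across filters.
    return stacked.max(axis=0)        # the 1xN compressed vector
```

In the claimed Siamese arrangement, the same `compress` (with shared filters) would be applied independently to the first and second feature vectors, and the similarity between the two compressed vectors would drive the parameter updates of the feature extraction network.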