CPC G06F 16/3329 (2019.01) [G06F 16/3344 (2019.01); G06F 16/3347 (2019.01); G06N 3/08 (2013.01)] | 1 Claim |
1. A computer-implemented multi-turn dialogue method based on retrieval, comprising:
(1) converting each turn of dialogue into a cascade vector Eu of the dialogue, and converting a candidate answer r into a cascade vector Er of the candidate answer; the cascade vector Eu of the dialogue is obtained by cascading a word level vector and a character level vector in the dialogue; the cascade vector Er of the candidate answer is obtained by cascading a word level vector and a character level vector in the candidate answer; the word level vector is obtained by the Word2vec tool; the character level vector is obtained by encoding character information through a convolutional neural network;
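An illustrative sketch of the cascading in step (1), assuming PyTorch; the class name CascadeEmbedding, the character CNN hyperparameters, and the max-pooling over characters are assumptions, since the claim fixes only Word2vec word vectors and a CNN character encoder:

```python
import torch
import torch.nn as nn

class CascadeEmbedding(nn.Module):
    """Cascade a pretrained word-level vector with a character-level
    vector produced by a CNN (hyperparameters are assumptions)."""

    def __init__(self, word_vectors, n_chars, char_dim=16, char_out=32, kernel=3):
        super().__init__()
        # Word-level vectors, e.g. pretrained with Word2vec and frozen.
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character CNN: convolve over the characters of each word, then max-pool.
        self.char_cnn = nn.Conv1d(char_dim, char_out, kernel_size=kernel, padding=1)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        w = self.word_emb(word_ids)                              # (B, L, d_word)
        B, L, W = char_ids.shape
        c = self.char_emb(char_ids.view(B * L, W))               # (B*L, W, d_char)
        c = self.char_cnn(c.transpose(1, 2)).max(dim=2).values  # (B*L, char_out)
        c = c.view(B, L, -1)
        return torch.cat([w, c], dim=-1)                         # cascade vector E
```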
(2) taking the cascade vector of the dialogue and the cascade vector of the candidate answer as an input, dynamically absorbing context information based on a global attention mechanism, and recursively calculating a k-th layer self-attention dialogue representation Ûk, a k-th layer self-attention candidate answer representation R̂k, a k-th layer mutual attention dialogue representation Ūk, a k-th layer mutual attention candidate answer representation R̄k, a k-th layer dialogue synthesis representation Uk, and a k-th layer candidate answer synthesis representation Rk, by the following formulas, to obtain matching vectors (v1, . . . , vl):
Ûk=fcatt(Uk-1,Uk-1,C)
R̂k=fcatt(Rk-1,Rk-1,C)
Ūk=fcatt(Uk-1,Rk-1,C)
R̄k=fcatt(Rk-1,Uk-1,C)
Ũk=[Uk-1,Ûk,Ūk,Uk-1⊙Ūk]
R̃k=[Rk-1,R̂k,R̄k,Rk-1⊙R̄k]
Uk=max(0,WhŨk+bh)+Uk-1
Rk=max(0,WhR̃k+bh)+Rk-1
in the formulas, Uk-1 and Rk-1 are the (k-1)-th layer dialogue and candidate answer representations, C is a context representation matrix, Wh and bh are trainable parameters, [ ] represents cascading, and ⊙ represents element-wise multiplication;
fcatt( ) represents the global attention mechanism, which is specifically defined as follows:
fcatt(Q,K,C)=Q̄+FNN(Q̄)
where, FNN(Q̄)=max(0,Q̄Wf+bf)Wg+bg, wherein Wf, Wg, bf and bg are trainable parameters;
Q̄=S(Q,K,C)·K
where, Q is a query sequence, K is a key sequence, and S(Q,K,C) is an attention weight matrix whose (i,j)-th element is calculated from Qi, Kj, Ciq and Cjk; W{b,c,d,e} are trainable parameters of this calculation; Ciq represents an i-th row of Cq, and its physical meaning is fused context information related to an i-th word in the query sequence Q; Cjk represents a j-th row of Ck, and its physical meaning is fused context information related to a j-th word of the key sequence K;
Cq and Ck are fused context representations, which are calculated as follows:
Cq=softmax(QWaCᵀ)·C
Ck=softmax(KWaCᵀ)·C
where, Wa is a trainable parameter matrix;
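A minimal sketch of the global attention mechanism fcatt, assuming PyTorch. The claim specifies the fused contexts Cq and Ck, the residual form Q̄+FNN(Q̄), and trainable parameters W{b,c,d,e} for the score S(Q,K,C), but not the exact score formula; the additive score below is an illustrative assumption, as are all layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttention(nn.Module):
    """Sketch of fcatt(Q, K, C); the form of S(Q, K, C) is assumed."""

    def __init__(self, d):
        super().__init__()
        self.Wa = nn.Linear(d, d, bias=False)   # shared map for fused contexts
        self.Wb = nn.Linear(d, d, bias=False)   # assumed score parameters
        self.Wc = nn.Linear(d, d, bias=False)
        self.Wd = nn.Linear(d, d, bias=False)
        self.We = nn.Linear(d, d, bias=False)
        self.v = nn.Linear(d, 1, bias=False)
        # FNN(x) = max(0, x Wf + bf) Wg + bg, as in the claim.
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, Q, K, C):
        # Fused contexts: Cq = softmax(Q Wa Cᵀ)·C, Ck = softmax(K Wa Cᵀ)·C
        Cq = F.softmax(self.Wa(Q) @ C.transpose(-1, -2), dim=-1) @ C
        Ck = F.softmax(self.Wa(K) @ C.transpose(-1, -2), dim=-1) @ C
        # Assumed additive score e_ij built from Q_i, K_j, C_i^q, C_j^k.
        e = self.v(torch.tanh(
            (self.Wb(Q) + self.Wd(Cq)).unsqueeze(2)    # (B, nq, 1, d)
            + (self.Wc(K) + self.We(Ck)).unsqueeze(1)  # (B, 1, nk, d)
        )).squeeze(-1)                                 # (B, nq, nk)
        Qbar = F.softmax(e, dim=-1) @ K                # Q̄ = S(Q,K,C)·K
        return Qbar + self.ffn(Qbar)                   # residual + FNN
```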
extracting a d-dimensional matching vector vi from a matching image Mi of an i-th turn of dialogue by a convolutional neural network, wherein the matching vectors of the first to l-th turns of dialogue are represented by (v1, . . . , vl); the matching image Mi of the i-th turn of dialogue is calculated according to the following formula:
Mi=Mi,self⊕Mi,interaction⊕Mi,enhanced
where, Mi, Mi,self, Mi,interaction and Mi,enhanced are real-valued matching images of the i-th turn of dialogue;
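A sketch of extracting the matching vector vi from the matching image Mi with a convolutional neural network, assuming the three matching images are stacked as input channels (the claim leaves the meaning of ⊕ in this formula unspecified); all channel and kernel sizes are assumptions:

```python
import torch
import torch.nn as nn

class MatchingCNN(nn.Module):
    """Extract a d-dimensional matching vector v_i from the stacked
    matching images of one turn (layer sizes are assumptions)."""

    def __init__(self, in_channels=3, d=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool2d((4, 4)),
        )
        self.proj = nn.Linear(64 * 4 * 4, d)

    def forward(self, M):
        # M: (batch, 3, n_u, n_r) with M_self, M_interaction, M_enhanced as channels.
        x = self.conv(M)
        return self.proj(x.flatten(1))   # v_i: (batch, d)
```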
(3) receiving the matching vectors (v1, . . . , vl), processing the matching vectors by an RNN network to obtain a short-term dependence information sequence (h1, . . . , hl), and processing the matching vectors by a Transformer network to obtain a long-term dependence information sequence (g1, . . . , gl);
wherein a specific calculation process of the short-term dependence information sequence (h1, . . . , hl) is:
obtaining l hidden layer state vectors by processing the matching vectors (v1, . . . , vl) through a GRU model, wherein an i-th hidden layer state is:
hi=GRU(vi,hi-1)
where, h0 is initialized randomly;
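A sketch of this short-term branch, assuming PyTorch; nn.GRU consumes the whole sequence (v1, . . . , vl) at once and returns all hidden states (h1, . . . , hl), and the batch size, number of turns and dimension d are placeholders:

```python
import torch
import torch.nn as nn

# Short-term branch of step (3): h_i = GRU(v_i, h_{i-1}).
d = 128
gru = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
V = torch.randn(8, 10, d)    # (batch, l turns, d) matching vectors
h0 = torch.randn(1, 8, d)    # randomly initialized h_0, as in the claim
H, _ = gru(V, h0)            # H = (h_1, ..., h_l)
```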
a specific calculation process of the long-term dependence information sequence (g1, . . . , gl) is:
(g1, . . . , gl)=MultiHead(Q,K,V)
where,
Q=VmWQ, K=VmWK, V=VmWV,
where WQ, WK and WV are trainable parameters; MultiHead( ) represents a multi-head attention function; Vm=(v1, . . . , vl);
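A sketch of the long-term branch, assuming PyTorch; nn.MultiheadAttention applies its own learned input projections, which play the role of WQ, WK and WV here, and the head count and dimensions are assumptions:

```python
import torch
import torch.nn as nn

# Long-term branch of step (3): (g_1, ..., g_l) = MultiHead(Q, K, V)
# with Q = Vm·WQ, K = Vm·WK, V = Vm·WV (projections are internal to the module).
d, heads = 128, 4
mha = nn.MultiheadAttention(embed_dim=d, num_heads=heads, batch_first=True)
Vm = torch.randn(8, 10, d)   # (batch, l, d) matching vectors
G, _ = mha(Vm, Vm, Vm)       # G = (g_1, ..., g_l)
```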
(4) calculating a matching score of the context c and the candidate answer r involved in matching according to the short-term dependence information sequence (h1, . . . , hl) and the long-term dependence information sequence (g1, . . . , gl), wherein the calculating includes:
calculating ĝi=gi⊕hi to obtain (ĝ1, . . . , ĝl), wherein ⊕ represents element-wise multiplication;
then inputting (ĝ1, . . . , ĝl) into a GRU model, to obtain:
gi=GRU(ĝi,gi-1)
wherein g0 is initialized randomly; the final hidden layer state of the GRU model is gl;
calculating the matching score of the context c and the candidate answer r involved in matching based on gl:
g(c,r)=σ(gl·wo+bo)
where, σ(·) represents the sigmoid function, and wo and bo are trainable parameters;
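A sketch of step (4), assuming PyTorch; ĝi=gi⊕hi is taken as element-wise multiplication per the claim, the fused sequence is run through a GRU, and the final state is scored with a sigmoid. The class name ScoreHead and the dimensions are assumptions:

```python
import torch
import torch.nn as nn

class ScoreHead(nn.Module):
    """Fuse the short- and long-term sequences by element-wise
    multiplication, run a GRU, and score the final state."""

    def __init__(self, d):
        super().__init__()
        self.gru = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
        self.out = nn.Linear(d, 1)

    def forward(self, H, G):
        # H, G: (batch, l, d) short/long-term sequences; ĝ_i = g_i ⊕ h_i
        fused = G * H
        h0 = torch.randn(1, H.size(0), H.size(2), device=H.device)  # random g_0
        _, last = self.gru(fused, h0)                    # last: (1, batch, d)
        return torch.sigmoid(self.out(last.squeeze(0)))  # g(c, r) = σ(g_l·w_o + b_o)
```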
(5) selecting a candidate answer with a highest matching score as a correct answer.