US 12,292,905 B2
Multi-turn dialogue system and method based on retrieval
Haifeng Sun, Beijing (CN); Zirui Zhuang, Beijing (CN); Bing Ma, Beijing (CN); Jingyu Wang, Beijing (CN); Cheng Zhang, Beijing (CN); Tong Xu, Beijing (CN); and Jing Wang, Beijing (CN)
Assigned to BEIJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS, Beijing (CN)
Filed by BEIJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS, Beijing (CN)
Filed on Jan. 10, 2023, as Appl. No. 18/095,196.
Claims priority of application No. 202210649202.3 (CN), filed on Jun. 9, 2022.
Prior Publication US 2023/0401243 A1, Dec. 14, 2023
Int. Cl. G06F 17/00 (2019.01); G06F 16/3329 (2025.01); G06F 16/334 (2025.01); G06N 3/08 (2023.01)
CPC G06F 16/3329 (2019.01) [G06F 16/3344 (2019.01); G06F 16/3347 (2019.01); G06N 3/08 (2013.01)] 1 Claim
OG exemplary drawing
 
1. A computer-implemented multi-turn dialogue method based on retrieval, comprising:
(1) converting each turn of dialogue into a cascade vector Eu of the dialogue, and converting a candidate answer r into a cascade vector Er of the candidate answer; the cascade vector Eu of the dialogue is obtained by cascading a word level vector and a character level vector in the dialogue; the cascade vector Er of the candidate answer is obtained by cascading a word level vector and a character level vector in the candidate answer; the word level vector is obtained by a tool Word2vec; the character level vector is obtained by encoding character information through a convolutional neural network;
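Step (1) lends itself to a short illustration. Below is a minimal sketch, assuming PyTorch, of cascading a pretrained Word2vec word-level vector with a character-level vector produced by a convolutional network; the class name, dimensions, and CNN configuration are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch of step (1): cascade (concatenate) word-level and
# character-level vectors for each token.  Names, dimensions and the
# char-CNN configuration are illustrative assumptions.
import torch
import torch.nn as nn


class CascadeEmbedder(nn.Module):
    def __init__(self, word_vectors, char_vocab_size,
                 char_dim=16, char_channels=50, kernel_size=3):
        super().__init__()
        # Word-level vectors, e.g. pretrained with Word2vec and frozen.
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        # Character-level path: embed characters, run a 1-D CNN, max-pool.
        self.char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_channels, kernel_size,
                                  padding=kernel_size // 2)

    def forward(self, word_ids, char_ids):
        # word_ids: (num_words,); char_ids: (num_words, max_chars)
        w = self.word_emb(word_ids)                          # word-level vectors
        c = self.char_emb(char_ids).transpose(1, 2)          # (num_words, char_dim, max_chars)
        c = torch.relu(self.char_cnn(c)).max(dim=-1).values  # char-level vectors
        return torch.cat([w, c], dim=-1)                     # cascade vector E_u or E_r
```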
(2) taking the cascade vector of the dialogue and the cascade vector of the candidate answer as an input, dynamically absorbing context information based on a global attention mechanism, and recursively calculating a k-th layer self-attention dialogue representation Û^k, a k-th layer self-attention candidate answer representation R̂^k, a k-th layer mutual attention dialogue representation Ū^k, a k-th layer mutual attention candidate answer representation R̄^k, a k-th layer dialogue synthesis representation U^k, and a k-th layer candidate answer synthesis representation R^k, by the following formulas, to obtain a matching vector (v1, . . . , vl):
Û^k = f_catt(U^{k-1}, U^{k-1}, C)
R̂^k = f_catt(R^{k-1}, R^{k-1}, C)
Ū^k = f_catt(U^{k-1}, R^{k-1}, C)
R̄^k = f_catt(R^{k-1}, U^{k-1}, C)
Ũ^k = [U^{k-1}, Û^k, Ū^k, U^{k-1} ⊙ Ū^k]
R̃^k = [R^{k-1}, R̂^k, R̄^k, R^{k-1} ⊙ R̄^k]
U^k = max(0, W_hŨ^k + b_h)
R^k = max(0, W_hR̃^k + b_h) + R^{k-1}
in the formulas, U^{k-1} ∈ ℝ^{m×d} and R^{k-1} ∈ ℝ^{n×d} represent inputs of a k-th global interaction layer, wherein m and n represent the number of words contained in a current turn of dialogue and the number of words contained in the candidate answer, respectively, and inputs of a first global interaction layer are U^0 = E_u, R^0 = E_r; W_h ∈ ℝ^{4d×d} and b_h are training parameters; the operator ⊙ represents element-wise multiplication; d represents a dimension of a vector;
C ∈ ℝ^{l_c×d} represents the context obtained by cascading the contents of all l turns of dialogue; all l turns of dialogue together contain l_c words, and C can be obtained by cascading the word level vectors of the l_c words;
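A minimal sketch of one global interaction layer implementing the recursion above, assuming PyTorch; f_catt is the global attention mechanism defined next in the claim and is taken here as a callable, and the layer and parameter names are illustrative.

```python
# Minimal sketch of one k-th global interaction layer (step (2) recursion).
# `f_catt` is the global attention mechanism defined later in the claim;
# everything else here is illustrative.
import torch
import torch.nn as nn


class GlobalInteractionLayer(nn.Module):
    def __init__(self, d, f_catt):
        super().__init__()
        self.f_catt = f_catt                    # callable: (Q, K, C) -> (n_q, d)
        self.proj = nn.Linear(4 * d, d)         # plays the role of W_h, b_h

    def forward(self, U_prev, R_prev, C):
        # Self-attention and mutual-attention representations.
        U_hat = self.f_catt(U_prev, U_prev, C)  # self-attention dialogue repr.
        R_hat = self.f_catt(R_prev, R_prev, C)  # self-attention answer repr.
        U_bar = self.f_catt(U_prev, R_prev, C)  # mutual attention dialogue repr.
        R_bar = self.f_catt(R_prev, U_prev, C)  # mutual attention answer repr.
        # Cascade the four views; `*` is the element-wise product (⊙).
        U_tilde = torch.cat([U_prev, U_hat, U_bar, U_prev * U_bar], dim=-1)
        R_tilde = torch.cat([R_prev, R_hat, R_bar, R_prev * R_bar], dim=-1)
        # Synthesis representations U^k and R^k.
        U_k = torch.relu(self.proj(U_tilde))
        R_k = torch.relu(self.proj(R_tilde)) + R_prev   # residual term as in the claim
        return U_k, R_k
```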
in the formulas, f_catt( ) represents the global attention mechanism, which is specifically defined as follows:
f_catt(Q,K,C) = Q̄ + FNN(Q̄)
where, FNN(Q̄) = max(0, Q̄W_f + b_f)W_g + b_g, wherein W_{f,g} ∈ ℝ^{d×d} and b_{f,g} are trainable parameters; Q̂ and Q are mixed using a residual connection to obtain Q̄, wherein Q̂ is calculated according to the following formula:
Q̂ = S(Q,K,C)·K
where, Q ∈ ℝ^{n_q×d} represents a query sequence, K ∈ ℝ^{n_k×d} represents a key sequence, wherein n_q and n_k represent the number of words, and S(Q,K,C) ∈ ℝ^{n_q×n_k} represents a similarity of Q and K in the context C; S(Q,K,C) is calculated according to the following formula:

OG Complex Work Unit Math
where, W_{b,c,d,e} are trainable parameters, C_i^q represents an i-th row of C^q, and its physical meaning is the fused context information related to an i-th word in the query sequence Q; C_j^k represents a j-th row of C^k, and its physical meaning is the fused context information related to a j-th word of the key sequence K;
C^q ∈ ℝ^{n_q×d} and C^k ∈ ℝ^{n_k×d} represent a context information compression vector fusing the query vector Q and a context information compression vector fusing the key vector K, respectively, and are calculated according to the following formulas:
C^q = softmax(QW_aC^T)·C
C^k = softmax(KW_aC^T)·C
W_a ∈ ℝ^{d×d} is a training parameter; and
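The word-pair similarity S(Q,K,C) itself appears only in the formula reproduced above as an OG work unit, so the sketch below, assuming PyTorch, implements only the pieces the claim text spells out: the context compression vectors C^q and C^k, the attended query Q̂ = S(Q,K,C)·K, the residual mix Q̄, and the feed-forward term FNN(Q̄). The scaled dot-product over context-augmented inputs used for S is a stand-in assumption, not the patented formula.

```python
# Minimal sketch of the global attention mechanism f_catt(Q, K, C).
# The context-compression vectors and the residual + FNN structure follow the
# claim text; the similarity S(Q, K, C) below is a stand-in (an assumption).
import math
import torch
import torch.nn as nn


class GlobalAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.W_a = nn.Linear(d, d, bias=False)   # plays the role of W_a
        self.ffn = nn.Sequential(                # FNN(x) = max(0, xW_f + b_f)W_g + b_g
            nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, Q, K, C):
        d = Q.size(-1)
        # Context-information compression vectors fused with Q and with K.
        C_q = torch.softmax(self.W_a(Q) @ C.t(), dim=-1) @ C   # (n_q, d)
        C_k = torch.softmax(self.W_a(K) @ C.t(), dim=-1) @ C   # (n_k, d)
        # Stand-in similarity over context-augmented Q and K; the patent's
        # S(Q, K, C) has its own form (see the formula above).
        S = torch.softmax((Q + C_q) @ (K + C_k).t() / math.sqrt(d), dim=-1)
        Q_hat = S @ K                     # attended query, Q̂ = S(Q,K,C)·K
        Q_bar = Q + Q_hat                 # residual mix of Q and Q̂
        return Q_bar + self.ffn(Q_bar)    # f_catt(Q,K,C) = Q̄ + FNN(Q̄)
```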
extracting a d-dimensional matching vector v_i from a matching image M_i of an i-th turn of dialogue by a convolutional neural network, and the matching vectors from the first to the l-th turn of dialogue are represented by (v1, . . . , vl); the matching image M_i of the i-th turn of dialogue is calculated according to the following formula:
M_i = M_{i,self} ⊕ M_{i,interaction} ⊕ M_{i,enhanced}
where, M_i ∈ ℝ^{m_i×n×3}, ⊕ is a cascading operation, m_i is the number of words contained in the i-th turn of dialogue u_i; M_{i,self}, M_{i,interaction} and M_{i,enhanced} are calculated according to the following formulas:

OG Complex Work Unit Math
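The three channels M_{i,self}, M_{i,interaction} and M_{i,enhanced} are given by the formulas referenced above (reproduced only as an OG work unit), so the sketch below, assuming PyTorch, takes them as ready-made m_i×n word-pair maps, cascades them into the m_i×n×3 matching image M_i, and extracts a d-dimensional v_i with a small 2-D CNN whose configuration is illustrative.

```python
# Minimal sketch of building the matching image M_i and extracting the
# d-dimensional matching vector v_i with a CNN.  The three channels are
# computed by the formulas the claim references and are taken here as
# ready-made inputs; the CNN configuration is an illustrative assumption.
import torch
import torch.nn as nn


class MatchingVectorExtractor(nn.Module):
    def __init__(self, d, conv_channels=32, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(3, conv_channels, kernel_size, padding=kernel_size // 2)
        self.out = nn.Linear(conv_channels, d)

    def forward(self, m_self, m_inter, m_enh):
        # Each input is an (m_i, n) word-pair map; cascade them into M_i.
        M_i = torch.stack([m_self, m_inter, m_enh], dim=0).unsqueeze(0)  # (1, 3, m_i, n)
        feat = torch.relu(self.conv(M_i))                                # (1, C, m_i, n)
        feat = feat.amax(dim=(2, 3))                                     # global max-pool
        return self.out(feat).squeeze(0)                                 # v_i, a d-dim vector
```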
(3) receiving the matching vector (v1, . . . , vl), processing the matching vector by an RNN network to obtain a short-term dependence information sequence (h1, . . . , hl), and processing the matching vector by a Transformer network to obtain a long-term dependence information sequence (g1, . . . , gl);
wherein a specific calculation process of the short-term dependence information sequence (h1, . . . , hl) is:
obtaining l hidden layer state vectors by processing the matching vector (v1, . . . , vl) through a GRU model, wherein an i-th hidden layer state is:
h_i = GRU(v_i, h_{i-1})
where, h_0 is initialized randomly;
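A minimal sketch of this short-term dependence path, assuming PyTorch's nn.GRU; all sizes are illustrative.

```python
# Short-term dependence path of step (3): feed the matching vectors
# (v_1, ..., v_l) through a GRU with a randomly initialized h_0.
import torch
import torch.nn as nn

d, l = 64, 5                             # illustrative dimension and number of turns
gru = nn.GRU(input_size=d, hidden_size=d, batch_first=True)

V = torch.randn(1, l, d)                 # matching vectors (v_1, ..., v_l)
h0 = torch.randn(1, 1, d)                # h_0 initialized randomly
H, _ = gru(V, h0)                        # H[0, i] is the (i+1)-th hidden state h_{i+1}
```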
a specific calculation process of the long-term dependence information sequence (g1, . . . , gl) is:
(g1, . . . , gl)=MultiHead(Q,K,V)
where,
Q = V_mW_Q, K = V_mW_K, V = V_mW_V,
where W_Q, W_K and W_V are training parameters; MultiHead( ) represents a multi-head attention function; V_m = (v1, . . . , vl);
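A minimal sketch of this long-term dependence path, assuming PyTorch's nn.MultiheadAttention, which applies the W_Q, W_K and W_V projections internally; all sizes are illustrative.

```python
# Long-term dependence path of step (3): multi-head self-attention over
# V_m = (v_1, ..., v_l) to obtain (g_1, ..., g_l).
import torch
import torch.nn as nn

d, l, num_heads = 64, 5, 4               # illustrative sizes
attn = nn.MultiheadAttention(embed_dim=d, num_heads=num_heads, batch_first=True)

V_m = torch.randn(1, l, d)               # matching vectors as a sequence
G, _ = attn(V_m, V_m, V_m)               # G[0, i] is g_{i+1}
```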
(4) calculating a matching score of the context c and the candidate answer involved in matching according to the short-term dependence information sequence (h1, . . . , hl) and the long-term dependence information sequence (g1, . . . , gl), wherein the calculating includes:
calculating

OG Complex Work Unit Math
to obtain (ĝ1, . . . , ĝl), wherein ⊙ represents element-wise multiplication;
then inputting (ĝ1, . . . , ĝl) into a GRU model, to obtain:
g_i = GRU(ĝ_i, g_{i-1})
wherein g_0 is initialized randomly; a final hidden layer state of the GRU model is g_l;
calculating the matching score of the context c and the candidate answer r involved in matching based on g_l:
g(c,r) = σ(g_l·w_o + b_o)
where, σ(·) represents a sigmoid function, and w_o and b_o are training parameters;
(5) selecting a candidate answer with a highest matching score as a correct answer.
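Steps (4) and (5) can be illustrated together. The exact fusion formula appears only as an OG work unit above; since the claim states that element-wise multiplication is used to combine the two sequences, the sketch below, assuming PyTorch, takes ĝ_i = h_i ⊙ g_i as an assumed fusion, runs a GRU over it, scores each (context, candidate) pair with a sigmoid, and selects the candidate with the highest score.

```python
# Minimal sketch of steps (4)-(5): fuse the short- and long-term sequences,
# run a GRU, score each (context, candidate) pair, and pick the best one.
# The fusion ĝ_i = h_i ⊙ g_i is an assumption; the claim only states that
# element-wise multiplication is involved.
import torch
import torch.nn as nn

d, l, num_candidates = 64, 5, 10                 # illustrative sizes
gru = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
score_layer = nn.Linear(d, 1)                    # plays the role of w_o, b_o

scores = []
for _ in range(num_candidates):
    H = torch.randn(1, l, d)                     # (h_1, ..., h_l) for this candidate
    G = torch.randn(1, l, d)                     # (g_1, ..., g_l) for this candidate
    G_hat = H * G                                # assumed fusion: element-wise product
    _, g_last = gru(G_hat, torch.randn(1, 1, d)) # final hidden layer state
    scores.append(torch.sigmoid(score_layer(g_last.squeeze(0))))  # g(c, r)

best = int(torch.argmax(torch.cat(scores)))      # step (5): highest matching score
```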