US 12,282,747 B2
Method for human-machine dialogue, computing device and computer-readable storage medium
Li Ma, Shenzhen (CN)
Assigned to UBTECH ROBOTICS CORP LTD, Shenzhen (CN)
Filed by UBTECH ROBOTICS CORP LTD, Shenzhen (CN)
Filed on Jul. 21, 2022, as Appl. No. 17/870,813.
Application 17/870,813 is a continuation of application No. PCT/CN2021/131221, filed on Nov. 17, 2021.
Claims priority of application No. 202011591934.9 (CN), filed on Dec. 29, 2020.
Prior Publication US 2022/0358297 A1, Nov. 10, 2022
Int. Cl. G06F 40/58 (2020.01); G06F 16/33 (2025.01); G06F 16/332 (2025.01); G06F 16/3329 (2025.01); G06F 16/3332 (2025.01); G06F 16/334 (2025.01); G06F 16/9032 (2019.01); G06F 40/47 (2020.01); G06N 20/00 (2019.01)

CPC G06F 40/58 (2020.01) [G06F 16/3329 (2019.01); G06F 16/3337 (2019.01); G06F 16/3343 (2019.01); G06F 16/3344 (2019.01); G06F 16/90332 (2019.01); G06F 40/47 (2020.01); G06N 20/00 (2019.01)]

20 Claims

20. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a method for human-machine dialogue, the method comprising:

acquiring an input sentence in a first language in a current round of conversation;

translating the input sentence in the first language in the current round of conversation to obtain an input sentence in a second language in the current round of conversation, according to dialogue contents in the first language and dialogue contents in the second language that have a mutual translation relationship with the dialogue contents in the first language in historical rounds of conversation, wherein the dialogue contents in the first language comprise input sentences in the first language and output sentences in the first language in corresponding rounds of conversation, and the dialogue contents in the second language comprise input sentences in the second language and output sentences in the second language in corresponding rounds of conversation;

invoking a pre-stored multi-round conversation generation model to parse the input sentence in the second language in the current round of conversation, according to the dialogue contents in the second language in the historical rounds of conversation, to generate an output sentence in the second language in the current round of conversation, wherein the multi-round conversation generation model is obtained by performing training based on multi-round dialogue corpora in the second language;

translating the output sentence in the second language in the current round of conversation, according to the dialogue contents in the first language and in the second language in the historical rounds of conversation, and the input sentence in the first language and the input sentence in the second language in the current round of conversation, to obtain at least one candidate result in the first language; and

determining an output sentence in the first language in the current round of conversation from the at least one candidate result in the first language for output;

wherein determining the output sentence in the first language in the current round of conversation from the at least one candidate result in the first language for output comprises:

for each of the at least one candidate result in the first language, invoking a pre-stored coherence evaluation model to calculate an expression coherence of the candidate result in the first language according to the dialogue contents in the first language in the historical rounds of conversation and the input sentence in the first language in the current round of conversation; and

selecting one of the at least one candidate result with a largest expression coherence as the output sentence in the first language in the current round of conversation and outputting the output sentence in the first language in the current round of conversation;

wherein the method further comprises:

acquiring a plurality of valid dialogue corpus samples in the first language, wherein each of the valid dialogue corpus samples comprises dialogue contents in the first language in multiple consecutive rounds of conversation;

for each of the valid dialogue corpus samples, constructing a negative dialogue corpus sample corresponding to the valid dialogue corpus sample; and

training an initial classification model using the plurality of the valid dialogue corpus samples and the negative dialogue corpus samples to obtain the coherence evaluation model;

wherein constructing a negative dialogue corpus sample corresponding to the valid dialogue corpus sample comprises:

extracting, from the valid dialogue corpus sample, a target output sentence in the first language corresponding to a last one of the multiple consecutive rounds of conversation;

translating the target output sentence in the first language to obtain a corresponding expression sentence in the second language;

translating the expression sentence in the second language to a corresponding expression sentence in the first language;

calculating a minimum edit distance between the target output sentence in the first language and the expression sentence in the first language;

comparing the minimum edit distance with a preset distance threshold to obtain a comparison result, and determining a negative output sentence in the first language matching the target output sentence in the first language according to the comparison result; and

replacing the target output sentence in the first language in the valid dialogue corpus sample with the negative output sentence in the first language to obtain the negative dialogue corpus sample; and

wherein determining the negative output sentence in the first language matching the target output sentence in the first language according to the comparison result comprises:

in response to the minimum edit distance being greater than the preset distance threshold, determining the expression sentence in the first language as the negative output sentence in the first language; and

in response to the minimum edit distance being equal to the preset distance threshold, performing a synonym substitution on at least one word in the expression sentence in the first language to obtain the negative output sentence in the first language.
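
The negative-sample construction recited in claim 20 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `min_edit_distance`, `substitute_synonyms`, `negative_output` and `build_negative_sample` are hypothetical, as are the translator callables and the synonym table, which stand in for whatever translation and synonym resources an embodiment would use. The sketch further assumes that a dialogue sample is a flat list of sentences ending with the last round's output sentence, that the edit distance is computed over whitespace-separated words, and that the threshold defaults to 0. The claim names only the "greater than" and "equal to" cases; treating every distance that is not greater than the threshold as the substitution case is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for any first<->second language translation component.
Translator = Callable[[str], str]


def min_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of tokens or characters."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def substitute_synonyms(sentence: str, synonyms: Dict[str, str]) -> str:
    """Replace every word that has an entry in the (assumed) synonym table."""
    return " ".join(synonyms.get(word, word) for word in sentence.split())


def negative_output(target: str,
                    to_second: Translator,
                    to_first: Translator,
                    synonyms: Dict[str, str],
                    threshold: int = 0) -> str:
    """Derive a negative output sentence from a target output sentence."""
    # Round trip: first language -> second language -> first language.
    back_translated = to_first(to_second(target))
    # Assumption: word-level distance; the claim does not fix the unit.
    distance = min_edit_distance(target.split(), back_translated.split())
    if distance > threshold:
        # The round trip already changed the sentence enough.
        return back_translated
    # Assumption: any distance not greater than the threshold is handled
    # like the "equal to" case named in the claim.
    return substitute_synonyms(back_translated, synonyms)


def build_negative_sample(valid_sample: List[str],
                          to_second: Translator,
                          to_first: Translator,
                          synonyms: Dict[str, str],
                          threshold: int = 0) -> List[str]:
    """Copy a valid sample, replacing its last output sentence."""
    negative = negative_output(valid_sample[-1], to_second, to_first,
                               synonyms, threshold)
    return valid_sample[:-1] + [negative]
```

With a threshold of 0, a round trip that returns the target sentence unchanged falls into the substitution branch, so the negative sample always differs from the valid sample whenever the synonym table covers at least one word of the sentence. The valid and negative samples produced this way would then be labeled and passed to the classifier training step.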