US 12,112,746 B2
Method and device for processing voice interaction, electronic device and storage medium
Jinfeng Bai, Beijing (CN); Zhijian Wang, Beijing (CN); and Cong Gao, Beijing (CN)
Assigned to Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Filed by Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Filed on Sep. 15, 2021, as Appl. No. 17/476,333.
Claims priority of application No. 202011246776.3 (CN), filed on Nov. 10, 2020.
Prior Publication US 2022/0005474 A1, Jan. 6, 2022
Int. Cl. G10L 15/22 (2006.01)

CPC G10L 15/22 (2013.01) [G10L 2015/223 (2013.01)]

16 Claims

1. A method for processing voice interaction, comprising:

determining a first integrity of a voice instruction from a user by using a pre-trained integrity detection model in response to detecting that the voice instruction from the user is not a high-frequency instruction;

determining a waiting duration for the voice instruction based on the first integrity and a preset integrity threshold, wherein the waiting duration for the voice instruction indicates a length of period between a time when a voice interaction device determines that receiving the voice instruction is completed and a time when the voice interaction device performs an operation in response to the voice instruction of the user; and

controlling the voice interaction device to respond to the voice instruction of the user based on the waiting duration;

wherein the method further comprises: prior to responding to the voice instruction based on the waiting duration for the voice instruction,

receiving a supplementary voice instruction from the user within the waiting duration for the voice instruction;

determining a second integrity of a combined instruction composed of the voice instruction of the user and the supplementary voice instruction of the user by using the integrity detection model in response to detecting that the supplementary voice instruction is not a high-frequency instruction; and

determining a waiting duration for the combined instruction based on the second integrity and the preset integrity threshold in response to determining that the second integrity is greater than the first integrity, wherein the waiting duration for the combined instruction indicates a length of period between a time when the voice interaction device determines that receiving the supplementary voice instruction is completed and a time when the voice interaction device performs an operation in response to the combined instruction.
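
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not state how the waiting duration is derived from the integrity and the threshold, so the mapping here (a short wait at or above the threshold, a longer wait below it) is an assumption. The same holds for the numeric values, the `HIGH_FREQUENCY` set, the immediate response to high-frequency instructions, the fallback branches, and the names `waiting_duration`, `plan_response` and `toy_model`, which are all hypothetical. The pre-trained integrity detection model is replaced by a lookup stub.

```python
from typing import Callable, Optional, Tuple

# Hypothetical values: the claim does not specify any of them.
INTEGRITY_THRESHOLD = 0.8
SHORT_WAIT_S = 0.3
LONG_WAIT_S = 1.5
HIGH_FREQUENCY = {"play", "pause", "next"}


def waiting_duration(integrity: float, threshold: float = INTEGRITY_THRESHOLD) -> float:
    """Assumed mapping: short wait if the instruction looks complete, else a longer wait."""
    return SHORT_WAIT_S if integrity >= threshold else LONG_WAIT_S


def plan_response(
    instruction: str,
    supplement: Optional[str],
    integrity_model: Callable[[str], float],
) -> Tuple[str, float]:
    """Return (text to respond to, seconds to wait before responding)."""
    # Assumption: a high-frequency instruction is answered without extra waiting.
    if instruction in HIGH_FREQUENCY:
        return instruction, 0.0

    # First integrity and the waiting duration for the voice instruction.
    first_integrity = integrity_model(instruction)
    wait = waiting_duration(first_integrity)

    # Assumption: with no supplement, or a high-frequency one, the plan is unchanged.
    if supplement is None or supplement in HIGH_FREQUENCY:
        return instruction, wait

    # Second integrity of the combined instruction.
    combined = f"{instruction} {supplement}"
    second_integrity = integrity_model(combined)

    # Recompute the wait only when the combined instruction is more complete.
    if second_integrity > first_integrity:
        return combined, waiting_duration(second_integrity)

    # Assumption: otherwise keep responding to the original instruction.
    return instruction, wait


def toy_model(text: str) -> float:
    """Stand-in for the pre-trained integrity detection model (made-up scores)."""
    scores = {"I want to listen to": 0.2, "I want to listen to jazz": 0.9}
    return scores.get(text, 0.5)
```

With these made-up scores, `plan_response("I want to listen to", None, toy_model)` keeps the long wait of 1.5 seconds, while supplying `"jazz"` as the supplementary instruction raises the integrity from 0.2 to 0.9 and shortens the wait for the combined instruction to 0.3 seconds.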