US 12,217,747 B2
	Electronic apparatus for processing user utterance and controlling method thereof
Euisuk Chung, Suwon-si (KR); Sangki Kang, Suwon-si (KR); Sunghwan Baek, Suwon-si (KR); Seokyeong Jung, Suwon-si (KR); and Kyungtae Kim, Suwon-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Appl. No. 17/271,182
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
PCT Filed Aug. 23, 2019, PCT No. PCT/KR2019/010769 § 371(c)(1), (2) Date Feb. 24, 2021, PCT Pub. No. WO2020/040595, PCT Pub. Date Feb. 27, 2020.
Claims priority of application No. 10-2018-0099474 (KR), filed on Aug. 24, 2018.
Prior Publication US 2021/0335360 A1, Oct. 28, 2021
Int. Cl. G10L 15/22 (2006.01); G10L 15/06 (2013.01); G10L 15/10 (2006.01); G10L 15/14 (2006.01); G10L 15/16 (2006.01); G10L 15/30 (2013.01)

CPC G10L 15/22 (2013.01) [G10L 15/063 (2013.01); G10L 15/10 (2013.01); G10L 15/142 (2013.01); G10L 15/16 (2013.01); G10L 15/30 (2013.01)]

12 Claims

1. An electronic device comprising:

a communication interface;

a memory;

a microphone;

a speaker;

a display;

a main processor; and

a sub-processor configured to activate the main processor by recognizing a wake-up word included in a voice input,

wherein the memory stores instructions that, when executed, cause the main processor to:

receive from a user a first voice input to register the wake-up word through the microphone;

determine whether the first voice input includes a specified word, wherein to determine whether the first voice input includes the specified word, the instructions cause the main processor to:

transmit the first voice input to an external server through the communication interface; and

receive a determination result of determining whether the first voice input includes the specified word, from the external server through the communication interface;

when the determination result indicates that the first voice input does not include the specified word:

display a first user interface (UI) guiding the user to speak a word identical to the first voice input, through the display, wherein the first UI requests to receive a second voice input including the word identical to the first voice input;

receive the second voice input from the user, through the microphone;

when receiving the second voice input, determine whether the second voice input is identical to the first voice input;

when the first voice input is different from the second voice input, output a second UI for receiving a third voice input identical to the first voice input, through the display;

when the first voice input is identical to the second voice input, generate a wake-up word recognition model for recognizing the wake-up word based on the first voice input and the second voice input; and

store the generated wake-up word recognition model in the memory based on the first voice input and the second voice input; and

when the determination result indicates that the first voice input includes the specified word, output a third UI for requesting a fourth voice input different from the first voice input, through the display, wherein the third UI does not include the word identical to the first voice input,

wherein the sub-processor is configured to activate the main processor by recognizing the registered wake-up word,

wherein the sub-processor consumes less power than the main processor, and

wherein when the main processor is activated, the instructions cause the activated main processor to:

receive a fifth voice input to perform a specified task and the wake-up word, through the microphone;

transmit the fifth voice input and the received wake-up word to the external server through the communication interface;

determine whether the received wake-up word includes the specified word by receiving a determination result of determining whether the transmitted wake-up word includes the specified word, from the external server through the communication interface;

when the determination result indicates that the transmitted wake-up word does not include the specified word:

receive a first response corresponding to the fifth voice input from the external server through the communication interface, wherein the first response includes action information for performing the specified task and a wake-up name of the electronic device included in the wake-up word; and

output the first response, through the speaker or the display; and

when the determination result indicates that the transmitted wake-up word includes the specified word:

receive a second response corresponding to the fifth voice input from the external server through the communication interface, wherein the second response includes the action information for performing the specified task and a name different from the wake-up word; and

output the second response, through the speaker or the display.
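The decision logic in claim 1 (wake-up word registration followed by choosing which name to use in a response) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not define the "specified word" list, so a caller-supplied set is assumed and the external server's check is stubbed locally. Whether two utterances are "identical" is approximated by comparing normalized transcripts rather than acoustic features. The recognition model is a placeholder dictionary. The wake-up word is treated as the device's wake-up name for simplicity. The sub-processor and power handling are not modeled. All function names (`make_specified_word_checker`, `register_wake_word`, `build_response`) and the fallback name `"Assistant"` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


def make_specified_word_checker(specified: FrozenSet[str]) -> Callable[[str], bool]:
    # Hypothetical local stand-in for the external server's check.
    # The claim does not define the specified words; the caller supplies them.
    def includes_specified(utterance: str) -> bool:
        return any(token in specified for token in utterance.lower().split())
    return includes_specified


def normalize(utterance: str) -> str:
    # Simplification: compare lower-cased transcripts, not acoustic features.
    return " ".join(utterance.lower().split())


@dataclass
class RegistrationStep:
    status: str                    # "await_repeat", "retry_same", "request_different", "registered"
    prompt: str                    # text shown on the display (first, second, or third UI)
    model: Optional[dict] = None   # placeholder for the wake-up word recognition model


def register_wake_word(first: str, second: Optional[str],
                       includes_specified: Callable[[str], bool]) -> RegistrationStep:
    if includes_specified(first):
        # Third UI: request a different word without repeating the first input.
        return RegistrationStep("request_different", "Please choose a different wake-up word.")
    if second is None:
        # First UI: ask the user to say the same word again.
        return RegistrationStep("await_repeat", f"Say '{first}' again.")
    if normalize(second) != normalize(first):
        # Second UI: the repeat did not match, so ask for the first word once more.
        return RegistrationStep("retry_same", f"That did not match. Say '{first}' again.")
    # Both inputs match: build a placeholder "model" from the two samples.
    model = {"wake_word": normalize(first), "samples": [first, second]}
    return RegistrationStep("registered", "Wake-up word registered.", model)


def build_response(action_info: str, wake_word: str,
                   includes_specified: Callable[[str], bool],
                   fallback_name: str = "Assistant") -> str:
    if includes_specified(wake_word):
        # Second response: use a name different from the wake-up word.
        name = fallback_name
    else:
        # First response: reuse the wake-up word as the device's name (assumption).
        name = wake_word
    return f"{name}: {action_info}"
```

For example, with `check = make_specified_word_checker(frozenset({"forbidden"}))`, calling `register_wake_word("Hi Galaxy", "hi galaxy", check)` returns a step whose status is `"registered"`, while `build_response("Playing music.", "forbidden", check)` answers under the fallback name instead of the wake-up word. Returning a status value rather than driving a real display keeps each branch of the claim separately testable.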