US 12,456,461 B2
Electronic apparatus for processing user utterance and controlling method thereof
Sangmin Park, Suwon-si (KR); and Jaeyung Yeo, Suwon-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Jul. 20, 2022, as Appl. No. 17/869,411.
Application 17/869,411 is a continuation of application No. PCT/KR2021/015453, filed on Oct. 29, 2021.
Claims priority of application No. 10-2020-0142314 (KR), filed on Oct. 29, 2020.
Prior Publication US 2022/0358925 A1, Nov. 10, 2022
Int. Cl. G10L 15/22 (2006.01); G06F 3/16 (2006.01); G10L 15/30 (2013.01)

CPC G10L 15/22 (2013.01) [G06F 3/167 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01); G10L 2015/228 (2013.01)]

20 Claims

1. An electronic device comprising:

a microphone;

memory storing instructions, a plurality of domain sets and a capsule database including a plurality of capsules corresponding to the plurality of domain sets, the plurality of capsules including relationships between a plurality of concepts and actions corresponding to the plurality of domain sets; and

at least one processor, comprising processing circuitry, electrically connected to the microphone and the memory,

wherein the instructions, when executed by the at least one processor individually or collectively, cause the electronic device to:

acquire a voice signal using the microphone;

acquire context information associated with at least one of the electronic device or a user;

determine whether domain sets are identified from among the plurality of domain sets stored in the memory based on the context information;

when a single domain set is identified among the plurality of domain sets based on the context information, determine a first domain set as the identified single domain set;

when more than one domain set are identified among the plurality of domain sets based on the context information, select the first domain set among the identified more than one domain set based on an utterance of the user in the voice signal;

when no domain sets are identified among the plurality of domain sets based on the context information:

transmit, to a server, the context information and the voice signal,

receive, from the server, the first domain set corresponding to the context information and the voice signal;

generate a plan including a plurality of operations for processing a task corresponding to the voice signal based on the determined first domain set and capsules corresponding to the determined first domain set; and

perform the plurality of operations corresponding to the voice signal.
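
The three-way branching recited in claim 1 (a single domain set identified, more than one identified, none identified) can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (DomainSet, identify_domain_sets, select_first_domain_set, ask_server), the use of a single context tag, the use of utterance text in place of the voice signal, and the keyword-overlap scoring are assumptions made only for illustration; the claim does not specify how identification or selection is carried out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DomainSet:
    """Hypothetical stand-in for one stored domain set."""
    name: str
    contexts: frozenset  # context tags under which this set applies (assumed)
    keywords: frozenset  # words used to match an utterance (assumed)


def identify_domain_sets(domain_sets: List[DomainSet], context: str) -> List[DomainSet]:
    """Return the stored domain sets that match the context information."""
    return [d for d in domain_sets if context in d.contexts]


def select_first_domain_set(
    domain_sets: List[DomainSet],
    context: str,
    utterance: str,
    ask_server: Callable[[str, str], DomainSet],
) -> DomainSet:
    """Pick the 'first domain set' following the three cases of claim 1."""
    candidates = identify_domain_sets(domain_sets, context)

    # Case 1: exactly one domain set identified from the context.
    if len(candidates) == 1:
        return candidates[0]

    # Case 2: more than one identified, so choose among them using the
    # utterance. Keyword overlap is an assumed scoring rule; ties resolve
    # to the earliest candidate.
    if len(candidates) > 1:
        words = set(utterance.lower().split())
        return max(candidates, key=lambda d: len(words & d.keywords))

    # Case 3: none identified, so defer to the server, passing both the
    # context and the utterance (standing in for the voice signal).
    return ask_server(context, utterance)
```

In this sketch the server round trip is passed in as a callable so that the on-device cases can be exercised without a network. The returned domain set would then feed the plan-generation step, which is not modeled here.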