CPC G10L 15/22 (2013.01) [G06F 3/0481 (2013.01); G06F 3/167 (2013.01); G06N 3/042 (2023.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/1815 (2013.01); G10L 15/183 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01)] | 17 Claims |
1. A method for controlling an electronic device, the method comprising:
identifying, by the electronic device, at least one user interface (UI) element displayed on a screen of the electronic device using at least one of a screen reading application programming interface (API), optical character recognition (OCR), or image classification;
identifying, by the electronic device, at least one characteristic of the identified at least one UI element;
predicting, by the electronic device, a natural language utterance based on the at least one characteristic of the identified at least one UI element;
generating, by the electronic device, a database including the predicted natural language utterance corresponding to the identified at least one UI element;
based on receiving a voice input, identifying, by the electronic device, whether an utterance of the received voice input matches the natural language utterance included in the generated database; and
based on identifying that the utterance of the voice input matches the natural language utterance, automatically accessing, by the electronic device, the at least one UI element,
wherein the at least one characteristic of the identified at least one UI element comprises:
at least one of positions of respective UI elements, relative positions of the respective UI elements with respect to other UI elements, functions of the respective UI elements, capabilities of the respective UI elements, types of the respective UI elements, or appearances of the respective UI elements,
wherein the at least one UI element comprises a non-textual UI element,
wherein the predicting of the natural language utterance comprises determining a textual representation of the non-textual UI element and a viewable relative position of the non-textual UI element among viewable positions of the other UI elements,
wherein the generating of the database comprises:
identifying similarities between respective UI elements based on positions of the respective UI elements and on a similarity in at least one of relative positions of the respective UI elements, functions of the respective UI elements, capabilities of the respective UI elements, or shapes of the respective UI elements;
acquiring a knowledge graph by clustering the respective UI elements based on the identified similarities; and
storing the knowledge graph in the database, and
wherein the method further comprises:
performing a semantic translation on the knowledge graph to acquire natural language variations for at least one of a single-step intent or a multi-step intent;
identifying at least one action and at least one action sequence for at least one of the single-step intent or the multi-step intent using the knowledge graph; and
dynamically generating a natural language model for predicting a natural language utterance of the identified UI element by mapping the acquired natural language variations to the identified at least one action and the identified at least one action sequence.
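
The first three steps of claim 1 (identifying on-screen UI elements, extracting their characteristics, and predicting a natural language utterance) amount to a screen-to-text pipeline. The following Python sketch is one hypothetical reading of those steps; the UIElement structure, field names, and utterance template are illustrative assumptions, not the patent's implementation, and a real system would populate such a structure from a platform accessibility API or an OCR pass.

from dataclasses import dataclass, field

@dataclass
class UIElement:
    """One on-screen element, as reported by a screen-reading API or an OCR pass."""
    element_id: str
    element_type: str                      # e.g. "button", "icon", "text_field"
    label: str | None                      # visible text; None for a non-textual element
    bbox: tuple[int, int, int, int]        # (x, y, width, height) in screen pixels
    function: str = ""                     # e.g. "share", "submit"
    capabilities: list[str] = field(default_factory=list)  # e.g. ["tap", "long_press"]

def textual_representation(elem: UIElement) -> str:
    """Derive a speakable name for a non-textual element from its characteristics."""
    return elem.label or f"{elem.function or elem.element_type} icon"

def relative_position(elem: UIElement, others: list[UIElement]) -> str:
    """Describe elem's viewable position relative to the nearest labeled element."""
    labeled = [o for o in others if o.label and o is not elem]
    if not labeled:
        return "on the screen"
    nearest = min(labeled, key=lambda o: abs(o.bbox[0] - elem.bbox[0]) + abs(o.bbox[1] - elem.bbox[1]))
    side = "below" if elem.bbox[1] > nearest.bbox[1] else "above"
    return f"{side} {nearest.label}"

def predict_utterance(elem: UIElement, others: list[UIElement]) -> str:
    """Predict one natural language utterance a user might say to reach elem."""
    action = elem.capabilities[0] if elem.capabilities else "tap"
    return f"{action} the {textual_representation(elem)} {relative_position(elem, others)}"

elements = [
    UIElement("e1", "text", "Photo of receipt", (10, 10, 300, 200)),
    UIElement("e2", "icon", None, (10, 230, 48, 48), function="share", capabilities=["tap"]),
]
print(predict_utterance(elements[1], elements))   # tap the share icon below Photo of receipt

The output for the labelless icon shows how the claim's textual representation and viewable relative position combine into a speakable phrase for a non-textual UI element.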
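
The matching clause (receive a voice input, check whether its utterance matches an entry in the generated database, and automatically access the matched element) leaves the comparison technique open. A minimal sketch, assuming the database is a plain utterance-to-element mapping and using difflib's string similarity as a stand-in matcher; the 0.8 threshold is an arbitrary illustration.

import difflib

# Hypothetical utterance database: predicted utterance -> identifier of the UI element.
utterance_db = {
    "tap the share icon below photo of receipt": "e2",
    "open settings": "e3",
}

def match_voice_input(transcript: str, db: dict[str, str], threshold: float = 0.8) -> str | None:
    """Return the element whose predicted utterance best matches the spoken transcript."""
    transcript = transcript.lower().strip()
    best_id, best_score = None, 0.0
    for utterance, element_id in db.items():
        score = difflib.SequenceMatcher(None, transcript, utterance).ratio()
        if score > best_score:
            best_id, best_score = element_id, score
    return best_id if best_score >= threshold else None

def access_element(element_id: str) -> None:
    """Placeholder for dispatching the tap or click to the real UI element."""
    print(f"accessing element {element_id}")

target = match_voice_input("tap the share icon below the photo of receipt", utterance_db)
if target is not None:
    access_element(target)    # accessing element e2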
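
The database-generation clause builds a knowledge graph by clustering UI elements on similarity of position plus at least one of relative position, function, capability, or shape. Below is one distance-based reading, with illustrative weights and threshold (both assumptions, since the claim does not fix a metric).

from itertools import combinations

# Minimal element records for clustering: (id, (x, y), function, shape).
elements = [
    ("e1", (10, 10),  "share",  "square"),
    ("e2", (60, 10),  "like",   "square"),
    ("e3", (10, 400), "submit", "rounded"),
]

def similarity(a, b) -> float:
    """Score two elements on position plus agreement in function and shape."""
    (_, pa, fa, sa), (_, pb, fb, sb) = a, b
    dist = abs(pa[0] - pb[0]) + abs(pa[1] - pb[1])
    pos_score = max(0.0, 1.0 - dist / 500.0)        # nearer elements score higher
    feat_score = 0.5 * (fa == fb) + 0.5 * (sa == sb)
    return 0.5 * pos_score + 0.5 * feat_score

def build_knowledge_graph(elems, threshold: float = 0.5) -> dict[str, set[str]]:
    """Cluster elements: an edge links every pair whose similarity clears the threshold."""
    graph: dict[str, set[str]] = {e[0]: set() for e in elems}
    for a, b in combinations(elems, 2):
        if similarity(a, b) >= threshold:
            graph[a[0]].add(b[0])
            graph[b[0]].add(a[0])
    return graph

print(build_knowledge_graph(elements))   # e1 and e2 cluster together; e3 stands apart

Persisting the adjacency dict (or a serialized form of it) then corresponds to the claim's step of storing the knowledge graph in the database.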
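
The final clauses perform a semantic translation of the knowledge graph into natural language variations for single-step and multi-step intents, then map each variation to its action or action sequence. A hypothetical template-expansion sketch; the intent names, templates, and action tuples are invented for illustration.

# Hypothetical intents derived from the knowledge graph: each maps to the
# action sequence needed to carry it out (one step or several).
intents = {
    "share_photo":   {"steps": [("tap", "e1"), ("tap", "e2")], "object": "the photo"},
    "open_settings": {"steps": [("tap", "e3")],                "object": "settings"},
}

VARIATION_TEMPLATES = [
    "{verb} {obj}",
    "please {verb} {obj}",
    "can you {verb} {obj}",
]

def semantic_translation(intent_name: str, obj: str) -> list[str]:
    """Expand one intent into several natural language variations."""
    verb = intent_name.split("_")[0]      # crude verb extraction from the intent name
    return [t.format(verb=verb, obj=obj) for t in VARIATION_TEMPLATES]

def generate_language_model(intents: dict) -> dict[str, list[tuple[str, str]]]:
    """Dynamically build the variation-to-action-sequence mapping of the claim."""
    model: dict[str, list[tuple[str, str]]] = {}
    for name, spec in intents.items():
        for variation in semantic_translation(name, spec["object"]):
            model[variation] = spec["steps"]   # multi-step intents keep the full sequence
    return model

model = generate_language_model(intents)
print(model["please share the photo"])    # [('tap', 'e1'), ('tap', 'e2')]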