US 12,411,654 B1
Voice control hub methods and systems
Joseph Kessler, Grayslake, IL (US); Suresh Bellam, Vernon Hills, IL (US); Andre Coetzee, Cary, IL (US); and Dan Verdeyen, Glenview, IL (US)
Assigned to CDW LLC, Vernon Hills, IL (US)
Filed by CDW LLC, Vernon Hills, IL (US)
Filed on Feb. 28, 2023, as Appl. No. 18/176,461.
Application 18/176,461 is a continuation of application No. 16/519,736, filed on Jul. 23, 2019, abandoned.
Int. Cl. G06F 3/16 (2006.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01); G06V 10/70 (2022.01); G10L 13/00 (2006.01); G10L 15/08 (2006.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)

CPC G06F 3/167 (2013.01) [G06F 40/295 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01); G06V 10/70 (2022.01); G10L 13/00 (2013.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)]

20 Claims

1. A voice control hub computing system for performing a task within an enterprise business software application, comprising:

one or more processors, and

a memory containing instructions that, when executed, cause the voice control hub computing system to:

receive a handler registration request specifying a dynamic object handler to respond to voice commands, wherein the dynamic object handler is compiled when the application initializes;

receive an utterance of a user of the enterprise business software application;

transmit the utterance of the user to a remote cloud services layer;

convert the utterance of the user to a text string representing speech-to-text output using a custom speech model, wherein the custom speech model filters out background noise from the utterance of the user;

analyze the text string using one or more trained machine learning models to generate an intent and an entity corresponding to the task,

wherein at least one of the models is trained using a respective command interpreter for interpreting utterances of a respective type, and at least one of the models is a convolutional neural network trained to map intents to graphical user interface components;

receive the intent and the entity from the remote cloud services layer, wherein the intent is associated with the entity;

dispatch the intent and the entity to the dynamic object handler; and

analyze the intent, the entity, and an image of a graphical user interface using at least one of the one or more trained machine learning models to map the entity to a component of the graphical user interface corresponding to performance of the task.
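Claim 1 recites the data flow (register a handler, send the utterance to a cloud layer, get back an intent and its associated entity, dispatch both to the handler) but not an implementation. The following is a minimal, hypothetical Python sketch of only the handler-registration and dispatch steps, assuming handlers are keyed by intent name. `IntentResult`, `VoiceControlHub`, `register_handler`, `dispatch`, and `fake_cloud_nlu` are illustrative names that do not come from the patent. The remote speech-to-text, custom speech model, and trained machine learning models (including the convolutional neural network that maps intents to GUI components) are replaced by a trivial keyword stub, and the GUI-component mapping step is not modeled at all.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical shape of what the remote cloud services layer returns:
# an intent plus the entity associated with that intent.
@dataclass(frozen=True)
class IntentResult:
    intent: str  # e.g. "open_order"
    entity: str  # e.g. an order number extracted from the utterance


# A handler receives the intent/entity pair and returns a status string.
Handler = Callable[[IntentResult], str]


class VoiceControlHub:
    """Hypothetical registry that dispatches recognized intents to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register_handler(self, intent: str, handler: Handler) -> None:
        # Loosely mirrors the "handler registration request" step; in this
        # sketch registration simply happens at application start-up.
        if intent in self._handlers:
            raise ValueError(f"handler already registered for {intent!r}")
        self._handlers[intent] = handler

    def dispatch(self, result: IntentResult) -> str:
        # Loosely mirrors "dispatch the intent and the entity to the handler".
        handler = self._handlers.get(result.intent)
        if handler is None:
            raise KeyError(f"no handler registered for intent {result.intent!r}")
        return handler(result)


def fake_cloud_nlu(utterance: str) -> IntentResult:
    # Stand-in for the remote speech-to-text and ML layer. It only looks for
    # the word "order" and treats the following word as the entity.
    words = utterance.lower().split()
    if "order" in words:
        idx = words.index("order")
        entity = words[idx + 1] if idx + 1 < len(words) else ""
        return IntentResult("open_order", entity)
    return IntentResult("unknown", "")


if __name__ == "__main__":
    hub = VoiceControlHub()
    hub.register_handler("open_order", lambda r: f"Opening order {r.entity}")
    print(hub.dispatch(fake_cloud_nlu("Please open order 4711")))
    # prints "Opening order 4711"
```

Keying handlers by intent keeps dispatch a single dictionary lookup. A system along the lines of the claim would instead receive the `IntentResult` from a network call and would additionally pass a screenshot of the application to a trained model to locate the matching GUI component.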