US 12,243,519 B2
Automatic adaptation of multi-modal system components
Mohammad Niknazar, Laramie, NY (US); Aditya Vempaty, Yorktown Heights, NY (US); Robert Smith, New York, NY (US); Amol Nayate, Yorktown Heights, NY (US); Javier Villafana, Long Beach, CA (US); Ravindranath Kokku, Yorktown Heights, NY (US); Shom Ponoth, Irvine, CA (US); Sharad Sundararajan, Union City, NJ (US); and Satya Nitta, Cross River, NY (US)
Assigned to Merlyn Mind, Inc., New York, NY (US)
Filed by Merlyn Mind, Inc., New York, NY (US)
Filed on Nov. 3, 2021, as Appl. No. 17/518,099.
Prior Publication US 2023/0134400 A1, May 4, 2023
Int. Cl. G10L 15/20 (2006.01); G06F 3/16 (2006.01); G08B 5/22 (2006.01); G10L 15/01 (2013.01); G10L 15/06 (2013.01); G10L 15/07 (2013.01); G10L 15/08 (2006.01); G10L 15/22 (2006.01); G10L 15/26 (2006.01); G10L 25/60 (2013.01); G10L 25/84 (2013.01)
CPC G10L 15/20 (2013.01) [G08B 5/22 (2013.01); G10L 15/01 (2013.01); G10L 15/063 (2013.01); G10L 15/07 (2013.01); G10L 15/08 (2013.01); G10L 15/22 (2013.01); G10L 25/60 (2013.01); G10L 25/84 (2013.01); G10L 2015/088 (2013.01); G10L 2015/223 (2013.01)]
21 Claims
1. A computer-implemented method of automatic adaptation in a multi-modal system, comprising:
receiving, by a processor, component data related to a plurality of components in a physical room, including a plurality of input devices and a plurality of output devices, the component data including a ranking of multiple components of the plurality of components;
continuously receiving, by the processor located in the physical room, input data generated by a default input device of the plurality of input devices, the input data including digital sound data;
detecting an utterance of a spoken word from specific input data including specific digital sound data;
generating, in response to the detecting, one or more sound metrics;
wherein a first sound metric comprises a speech metric for the specific digital sound data;
wherein a second sound metric comprises a noise level metric for digital sound data corresponding to a time interval prior to detecting the utterance;
when the first sound metric of the one or more sound metrics meets a first criterion, activating a first component of the plurality of components for improving the first sound metric;
when the second sound metric of the one or more sound metrics meets a second criterion, activating a second component of the plurality of components for improving the second sound metric,
wherein the activating the first component or the second component comprises activating an input component of the plurality of components that is ranked higher in the ranking than the default input device to help receive the input data in the physical room;
causing one or more input devices of the plurality of input devices or one or more output devices of the plurality of output devices to execute an action that alerts a user of the activated component.
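
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how the metrics or criteria are computed, so everything specific here is an assumption made for illustration: the RMS level in dB stands in for both the speech metric and the noise level metric, the thresholds `min_speech_db` and `max_noise_db` are arbitrary, rank 1 is treated as the highest rank, and the names `Component`, `level_db`, `next_higher_ranked_input`, and `adapt` are hypothetical. Utterance detection is assumed to have already happened upstream.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Component:
    name: str
    rank: int            # assumption: 1 is the highest rank
    is_input: bool
    active: bool = False


def level_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (sample amplitude 1.0)."""
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else -math.inf


def next_higher_ranked_input(components: Sequence[Component],
                             default: Component) -> Optional[Component]:
    """Best inactive input component ranked above the default input device."""
    candidates = [c for c in components
                  if c.is_input and not c.active and c.rank < default.rank]
    return min(candidates, key=lambda c: c.rank) if candidates else None


def adapt(components: Sequence[Component], default: Component,
          utterance: Sequence[float], pre_utterance: Sequence[float],
          min_speech_db: float = -30.0,      # assumed first criterion
          max_noise_db: float = -45.0) -> List[str]:  # assumed second criterion
    """Activate higher-ranked inputs when a metric meets its criterion.

    Returns the alert messages a caller would route to an output device.
    """
    # First metric: speech level of the samples containing the utterance.
    speech_db = level_db(utterance)
    # Second metric: noise level in the interval before the utterance.
    noise_db = level_db(pre_utterance)

    criteria = [
        ("speech level low", speech_db < min_speech_db),
        ("background noise high", noise_db > max_noise_db),
    ]
    alerts: List[str] = []
    for reason, met in criteria:
        if not met:
            continue
        chosen = next_higher_ranked_input(components, default)
        if chosen is None:
            continue  # no higher-ranked input left to activate
        chosen.active = True
        alerts.append(f"{reason}: activated {chosen.name}")
    return alerts
```

In this sketch each criterion that is met activates the next-best inactive input component, and the returned strings stand in for the alerting action, which a real system might render as a light, tone, or on-screen message.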