US 12,136,431 B2
Multimodal beamforming and attention filtering for multiparty interactions
Paolo Pirjanian, Glendale, CA (US); Stefan Scherer, Santa Monica, CA (US); and Mario E Munich, La Canada, CA (US)
Assigned to Embodied, Inc., Pasadena, CA (US)
Appl. No. 17/622,703
Filed by Embodied, Inc., Pasadena, CA (US)
PCT Filed Feb. 28, 2021, PCT No. PCT/US2021/020148 § 371(c)(1), (2) Date Dec. 23, 2021, PCT Pub. No. WO2021/174162, PCT Pub. Date Sep. 2, 2021.
Claims priority of provisional application 63/154,727, filed on Feb. 27, 2021.
Claims priority of provisional application 62/983,595, filed on Feb. 29, 2020.
Prior Publication US 2022/0180887 A1, Jun. 9, 2022
Int. Cl. G10L 21/0208 (2013.01); G06V 40/16 (2022.01); G06V 40/20 (2022.01); G10L 15/25 (2013.01); G10L 17/06 (2013.01); G10L 21/0216 (2013.01)

CPC G10L 21/0208 (2013.01) [G06V 40/172 (2022.01); G06V 40/176 (2022.01); G06V 40/20 (2022.01); G10L 15/25 (2013.01); G10L 17/06 (2013.01); G10L 2021/02087 (2013.01); G10L 2021/02166 (2013.01)]

19 Claims

1. A method of creating a view of an environment, comprising:

accessing computer-readable instructions from one or more memory devices for execution by one or more processors of a computing device;

executing the computer-readable instructions accessed from the one or more memory devices by the one or more processors of the computing device; and

wherein executing the computer-readable instructions further comprises:

receiving, at the computing device, voice files, visual effect files, facial expression files and/or mobility files;

receiving parameters and measurements from at least two of one or more microphones, one or more imaging devices, a radar sensor, a lidar sensor and/or one or more infrared imaging devices located in the computing device;

analyzing the parameters and measurements received from the multimodal input;

generating a world map of the environment around the computing device, the world map including two or more users and objects;

repeating the receiving of parameters and measurements from the input devices and the analyzing steps on a periodic basis to maintain a persistent world map of the environment;

tracking the engagement of the two or more users utilizing the received parameters and measurements to determine the one or more users that are engaged with the computing device;

determining a noise level for the environment based on receipt of sounds and/or sound files from the two or more users and the environment; and

generating mobility commands to cause the computing device to move closer to a user that is speaking to the computing device if the environment is too noisy to hear the user that is speaking.
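
For illustration only, the last three steps of claim 1 (engagement tracking, noise-level determination, and the conditional mobility command) can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the claim does not specify how noise is measured or what counts as "too noisy", so the RMS level in dB relative to full scale, the `NOISE_THRESHOLD_DB` value, and the names `TrackedUser`, `MobilityCommand`, `noise_level_db`, and `plan_approach` are all hypothetical. The `engaged` and `speaking` flags are assumed to come from the world map and engagement tracking described earlier in the claim.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical threshold: the claim only says "too noisy", not how noisy.
NOISE_THRESHOLD_DB = -30.0


@dataclass
class TrackedUser:
    """One user entry in the world map (hypothetical structure)."""
    user_id: str
    x: float        # metres, relative to the device
    y: float        # metres, relative to the device
    engaged: bool   # assumed output of the engagement-tracking step
    speaking: bool  # assumed output of audio/visual speaker detection


@dataclass
class MobilityCommand:
    heading_rad: float  # direction to travel, relative to the device frame
    distance_m: float   # how far to travel along that heading


def noise_level_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def plan_approach(
    users: Sequence[TrackedUser],
    ambient_samples: Sequence[float],
    step_m: float = 0.5,
    min_distance_m: float = 0.5,
) -> Optional[MobilityCommand]:
    """Return a command to move toward the engaged speaker if it is too noisy."""
    speaker = next((u for u in users if u.engaged and u.speaking), None)
    if speaker is None:
        return None  # nobody is speaking to the device
    if noise_level_db(ambient_samples) <= NOISE_THRESHOLD_DB:
        return None  # quiet enough to hear the speaker from here
    distance = math.hypot(speaker.x, speaker.y)
    # Move at most one step, and never closer than min_distance_m.
    travel = min(step_m, max(0.0, distance - min_distance_m))
    if travel <= 0.0:
        return None
    return MobilityCommand(math.atan2(speaker.y, speaker.x), travel)


users = [
    TrackedUser("a", x=2.0, y=0.0, engaged=True, speaking=True),
    TrackedUser("b", x=0.0, y=1.0, engaged=False, speaking=False),
]
loud_room = [0.5] * 100     # about -6 dBFS, above the assumed threshold
quiet_room = [0.001] * 100  # about -60 dBFS, below the assumed threshold
print(plan_approach(users, loud_room))
print(plan_approach(users, quiet_room))
```

In this sketch the device advances in bounded steps rather than travelling the full distance at once, so that the noise check can be repeated on each periodic update of the world map. That design choice is an assumption and is not stated in the claim.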