US 12,282,606 B2
VPA with integrated object recognition and facial expression recognition
Ajay Divakaran, Monmouth Junction, NJ (US); Amir Tamrakar, Philadelphia, PA (US); Girish Acharya, Redwood City, CA (US); William Mark, San Mateo, CA (US); Greg Ho, South Brunswick, NJ (US); Jihua Huang, Philadelphia, PA (US); David Salter, Bensalem, PA (US); Edgar Kalns, San Jose, CA (US); Michael Wessel, Palo Alto, CA (US); Min Yin, San Jose, CA (US); James Carpenter, Mountain View, CA (US); Brent Mombourquette, Menlo Park, CA (US); Kenneth Nitz, Redwood City, CA (US); Elizabeth Shriberg, Berkeley, CA (US); Eric Law, Hayward, CA (US); Michael Frandsen, Helena, MT (US); Hyong-Gyun Kim, Santa Clara, CA (US); Cory Albright, Helena, MT (US); and Andreas Tsiartas, Santa Clara, CA (US)
Assigned to SRI International, Menlo Park, CA (US)
Filed by SRI International, Menlo Park, CA (US)
Filed on Dec. 1, 2020, as Appl. No. 17/107,958.
Application 17/107,958 is a continuation of application No. 15/332,494, filed on Oct. 24, 2016.
Claims priority of provisional application 62/264,228, filed on Dec. 7, 2015.
Claims priority of provisional application 62/329,055, filed on Apr. 28, 2016.
Claims priority of provisional application 62/339,547, filed on May 20, 2016.
Prior Publication US 2021/0081056 A1, Mar. 18, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 3/03 (2006.01); G06F 3/01 (2006.01); G06F 3/16 (2006.01); G06N 3/006 (2023.01); G06N 5/022 (2023.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06V 40/16 (2022.01); G06V 40/20 (2022.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 25/63 (2013.01); G06N 7/01 (2023.01)
CPC G06F 3/017 (2013.01) [G06F 3/0304 (2013.01); G06F 3/167 (2013.01); G06N 3/006 (2013.01); G06N 5/022 (2013.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06V 40/16 (2022.01); G06V 40/20 (2022.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 25/63 (2013.01); G06N 7/01 (2023.01); G10L 15/1822 (2013.01); G10L 2015/228 (2013.01)] 22 Claims
OG exemplary drawing
 
1. A method comprising:
receiving, by a computing device, sensory input comprising at least audio input and visual input;
determining semantic information from the sensory input;
determining a first questioning style to question a user using the semantic information;
determining scene information from the visual input;
determining, after waiting for audio input from the user, an input state associated with the user using the audio input and the scene information;
changing the first questioning style to a second questioning style using the determined input state, the second questioning style being different from the first questioning style; and
outputting a question to the user using the second questioning style.
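The method of claim 1 amounts to a two-stage questioning loop: an initial questioning style is chosen from the semantics of the sensed audio and visual input, and that style is then revised once the user's response (or silence) together with the visual scene reveals the user's input state. The sketch below is a minimal, hypothetical Python rendering of that control flow; every class and function name (QuestioningStyle, determine_semantic_information, and so on) is an illustrative stand-in, not the patented implementation, and the homework scenario is invented for illustration only.

    # Hypothetical sketch of the control flow recited in claim 1.
    # All names and the scenario are illustrative stand-ins.
    from dataclasses import dataclass
    from enum import Enum, auto


    class QuestioningStyle(Enum):
        OPEN_ENDED = auto()   # e.g., "What do you think ... ?"
        DIRECTED = auto()     # e.g., a narrow yes/no question


    @dataclass
    class SensoryInput:
        audio: bytes          # raw audio from a microphone
        visual: bytes         # raw frames from a camera


    def determine_semantic_information(inp: SensoryInput) -> dict:
        # Stand-in for speech understanding over the audio channel,
        # fused with recognition results from the visual channel.
        return {"topic": "homework", "user_engaged": True}


    def determine_scene_information(visual: bytes) -> dict:
        # Stand-in for object recognition and facial-expression
        # recognition over the visual input.
        return {"facial_expression": "confused", "objects": ["book"]}


    def determine_input_state(audio: bytes, scene: dict) -> str:
        # Combine what the user said (or did not say, after waiting)
        # with the scene to infer the user's input state.
        if not audio:
            return "silent"
        if scene.get("facial_expression") == "confused":
            return "confused"
        return "confident"


    def choose_style(semantics: dict) -> QuestioningStyle:
        # First questioning style, from semantic information alone.
        if semantics["user_engaged"]:
            return QuestioningStyle.OPEN_ENDED
        return QuestioningStyle.DIRECTED


    def adapt_style(style: QuestioningStyle, state: str) -> QuestioningStyle:
        # A confused or silent user gets a more directed question;
        # this is where the first style changes to the second.
        if state in ("confused", "silent"):
            return QuestioningStyle.DIRECTED
        return style


    def output_question(style: QuestioningStyle, semantics: dict) -> str:
        if style is QuestioningStyle.DIRECTED:
            return f"Is the {semantics['topic']} problem about addition?"
        return f"What do you think the {semantics['topic']} problem asks?"


    if __name__ == "__main__":
        sensed = SensoryInput(audio=b"...", visual=b"...")
        semantics = determine_semantic_information(sensed)
        first_style = choose_style(semantics)        # first questioning style
        scene = determine_scene_information(sensed.visual)
        user_audio = b""                             # waited; user stayed silent
        state = determine_input_state(user_audio, scene)
        second_style = adapt_style(first_style, state)   # differs from the first
        print(output_question(second_style, semantics))

Run as written, the sketch starts with an open-ended style, infers a "silent" input state from the empty response and the confused expression in the scene, switches to the directed style, and outputs a directed question, tracing the receiving, determining, changing, and outputting steps of the claim in order.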