US 11,734,926 B2
Resolving automated assistant requests that are based on image(s) and/or other sensor data
Ibrahim Badr, Zurich (CH); Nils Grimsmo, Adliswil (CH); and Gökhan Bakir, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Nov. 10, 2020, as Appl. No. 17/093,880.
Application 17/093,880 is a continuation of application No. 16/360,789, filed on Mar. 21, 2019, granted, now 10,867,180.
Application 16/360,789 is a continuation of application No. 15/631,274, filed on Jun. 23, 2017, granted, now 10,275,651, issued on Apr. 30, 2019.
Claims priority of provisional application 62/507,153, filed on May 16, 2017.
Prior Publication US 2021/0056310 A1, Feb. 25, 2021
Int. Cl. G06K 9/00 (2022.01); G06V 20/20 (2022.01); G06F 3/16 (2006.01); G06F 16/9032 (2019.01); G06F 16/583 (2019.01); H04L 51/02 (2022.01); G06V 20/68 (2022.01)
CPC G06V 20/20 (2022.01) [G06F 3/167 (2013.01); G06F 16/5854 (2019.01); G06F 16/90332 (2019.01); H04L 51/02 (2013.01); G06V 20/68 (2022.01)] 17 Claims
 
1. A method implemented by one or more processors, comprising:
receiving, via an automated assistant interface of a client device, a voice input provided by a user;
determining, based on processing the voice input, that the voice input indicates a request, by the user, related to noise being generated by an object in an environment with the client device and the user;
in response to determining that the voice input indicates the request related to the noise:
processing audio data, that is captured via one or more microphones of the client device and that captures the noise being generated by the object, to determine one or more attributes of the noise being generated by the object;
determining whether the request is resolvable utilizing the one or more attributes of the noise being generated by the object;
in response to determining that the request is not resolvable utilizing the one or more attributes of the noise being generated by the object:
providing a prompt for presentation at the client device or an additional client device;
receiving, in response to the prompt, one or both of:
an image, of the object, captured by the client device or the additional client device, and
further voice input;
resolving the request utilizing the one or more attributes of the noise being generated by the object and based on processing of one or both of the image and the further voice input; and
causing output, that reflects the resolution of the request, to be rendered at the client device or the additional client device.
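The flow recited in claim 1 can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the patented implementation. The function names, the keyword test for a noise request, the zero-crossing frequency estimate, the band thresholds, and the candidate table are all assumptions. The image is represented only by a text label that a separate image recognizer is assumed to have produced.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical lookup table: frequency band -> objects that might emit such a noise.
# The claim does not say how candidate objects are found; this table is an assumption.
CANDIDATES_BY_BAND = {
    "low": ["refrigerator", "washing machine"],
    "mid": ["smoke detector"],
    "high": ["smoke detector", "carbon monoxide alarm"],
}

# Hypothetical signature for the prompt step: it returns (image label, further voice input),
# either of which may be None, mirroring "one or both of" in the claim.
AskUser = Callable[[str], Tuple[Optional[str], Optional[str]]]


@dataclass
class NoiseAttributes:
    dominant_hz: float  # rough dominant frequency of the captured noise
    band: str           # "low", "mid" or "high" (illustrative thresholds)


def is_noise_request(transcript: str) -> bool:
    """Keyword stand-in for the natural-language processing of the voice input."""
    text = transcript.lower()
    return any(word in text for word in ("noise", "sound", "beeping", "buzzing"))


def noise_attributes(samples: List[float], sample_rate: int) -> NoiseAttributes:
    """Estimate noise attributes from microphone samples via the zero-crossing rate."""
    if not samples:
        raise ValueError("no audio captured")
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    hz = crossings / (2 * len(samples) / sample_rate)  # two sign changes per cycle
    band = "low" if hz < 300 else "mid" if hz < 2000 else "high"
    return NoiseAttributes(hz, band)


def resolve_noise_request(transcript: str, samples: List[float],
                          sample_rate: int, ask_user: AskUser) -> Optional[str]:
    """Return text to render at the client device, or None if not a noise request."""
    if not is_noise_request(transcript):
        return None
    attrs = noise_attributes(samples, sample_rate)
    candidates = CANDIDATES_BY_BAND[attrs.band]
    if len(candidates) > 1:
        # Not resolvable from the noise attributes alone: prompt for an image and/or more speech.
        image_label, further_voice = ask_user("Can you show me or describe what is making the noise?")
        hints = " ".join(h for h in (image_label, further_voice) if h).lower()
        # Keep only candidates (already narrowed by noise band) that the follow-up mentions.
        candidates = [c for c in candidates if c in hints] or candidates
    if len(candidates) == 1:
        return f"The noise (about {attrs.dominant_hz:.0f} Hz) is most likely the {candidates[0]}."
    return "It could be one of: " + ", ".join(candidates) + "."
```

In this sketch, "not resolvable" simply means that more than one candidate object matches the noise band; a real assistant might instead compare a model's confidence score against a threshold. The final answer still depends on the noise attributes, because the follow-up input only filters the candidates that the noise band already produced.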