US 12,073,831 B1
Using visual context to improve a virtual assistant
Saurabh Adya, San Jose, CA (US); Sameer Badaskar, San Jose, CA (US); Akanksha Bindal, Mountain View, CA (US); Ahmed S. Hussen Abdelaziz, San Ramon, CA (US); Xiaochuan Niu, Santa Clara, CA (US); Alkeshkumar M. Patel, San Jose, CA (US); and Srikanth Vishnubhotla, Santa Clara, CA (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Jan. 14, 2022, as Appl. No. 17/576,419.
Claims priority of provisional application 63/138,156, filed on Jan. 15, 2021.
Int. Cl. G10L 15/22 (2006.01); G06F 18/214 (2023.01); G06V 10/82 (2022.01); G06V 20/50 (2022.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/18 (2013.01); G10L 15/24 (2013.01)

CPC G10L 15/22 (2013.01) [G06F 18/214 (2023.01); G06V 10/82 (2022.01); G06V 20/50 (2022.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/18 (2013.01); G10L 15/24 (2013.01)]

33 Claims

12. An electronic device comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:

receiving an image;

generating, based on the image, a question corresponding to a first object in the image;

retrieving, a plurality of speech recognition results based on a received utterance;

determining whether an attribute of an object referenced by a speech recognition result of the plurality of speech recognition results matches an attribute of an object referenced by the generated question; and

in accordance with a determination that the attribute of the object referenced by the speech recognition result of the plurality of speech recognition results matches the attribute of the object referenced by the generated question, determining that the received utterance is directed to a digital assistant.
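
The decision logic recited in claim 12 can be illustrated with a minimal sketch. This is not the patented implementation. The names below (ReferencedObject, GeneratedQuestion, SpeechResult, generate_question, is_directed_to_assistant) are hypothetical. Image analysis and speech recognition are assumed to happen elsewhere and are represented only by their outputs. "Matches" is assumed here to mean that the two referenced objects share at least one attribute; the claim itself does not fix a particular matching rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ReferencedObject:
    """An object mentioned in text, with descriptive attributes (hypothetical model)."""
    label: str
    attributes: FrozenSet[str]


@dataclass(frozen=True)
class GeneratedQuestion:
    """A question generated from the image, tied to the object it asks about."""
    text: str
    referenced: ReferencedObject


@dataclass(frozen=True)
class SpeechResult:
    """One speech recognition hypothesis and the object it refers to, if any."""
    text: str
    referenced: Optional[ReferencedObject]


def generate_question(detected: List[ReferencedObject]) -> GeneratedQuestion:
    # Assumption: the image has already been analyzed into a list of objects;
    # the question is built for the first object in that list.
    first = detected[0]
    return GeneratedQuestion(f"What is this {first.label}?", first)


def is_directed_to_assistant(question: GeneratedQuestion,
                             results: List[SpeechResult]) -> bool:
    # Assumption: a match means at least one shared attribute between the
    # object in a recognition result and the object in the generated question.
    question_attrs = question.referenced.attributes
    for result in results:
        if result.referenced is None:
            continue
        if question_attrs & result.referenced.attributes:
            return True
    return False


# Usage: an image containing a painting, and two recognition hypotheses
# for one received utterance.
painting = ReferencedObject("painting", frozenset({"artwork", "framed"}))
question = generate_question([painting])
results = [
    SpeechResult("who pointed at", None),
    SpeechResult("who painted that",
                 ReferencedObject("that", frozenset({"artwork"}))),
]
directed = is_directed_to_assistant(question, results)
print(directed)
```

In this sketch a single matching hypothesis is enough to treat the utterance as directed to the digital assistant, which mirrors the claim's wording of "a speech recognition result of the plurality of speech recognition results".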