US 11,996,094 B2
Automated assistant with audio presentation interaction
Victor Carbune, Zurich (CH); and Matthew Sharifi, Kilchberg (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jul. 15, 2020, as Appl. No. 16/947,030.
Prior Publication US 2022/0020365 A1, Jan. 20, 2022
Int. Cl. G10L 15/22 (2006.01); G06F 3/16 (2006.01); G10L 15/16 (2006.01); G10L 15/183 (2013.01); G10L 15/26 (2006.01)
CPC G10L 15/22 (2013.01) [G06F 3/165 (2013.01); G10L 15/16 (2013.01); G10L 15/183 (2013.01); G10L 15/26 (2013.01)] 12 Claims
OG exemplary drawing
 
1. A computer-implemented method, comprising:
    analyzing spoken audio content associated with an audio presentation to identify one or more entities addressed in the audio presentation;
    receiving, in an assistant device capable of generating audio responses and visual responses, first and second user queries during playback of the audio presentation;
    in response to receiving the first user query:
        determining that the first user query is directed to the assistant device;
        in response to determining that the first user query is directed to the assistant device, determining that the first user query is directed to the audio presentation by determining that the first user query references at least one of the identified one or more entities; and
        in response to determining that the first user query is directed to the audio presentation, generating a first response to the first user query, wherein generating the first response to the first user query uses at least one of the identified one or more entities;
    in response to receiving the second user query:
        determining that the second user query is directed to the assistant device;
        in response to determining that the second user query is directed to the assistant device, determining that the second user query is not directed to the audio presentation by determining that the second user query does not reference at least one of the identified one or more entities; and
        in response to determining that the second user query is not directed to the audio presentation, generating a second response to the second user query that is independent of the audio presentation;
    determining whether the first user query can be responded to with a visual response;
    in response to determining that the first user query can be responded to with a visual response, presenting the first response visually and without pausing the audio presentation;
    in response to determining that the first user query cannot be responded to with a visual response, determining whether the audio presentation is being played on a pauseable device, wherein determining whether the audio presentation is being played on a pauseable device includes determining whether the audio presentation is being played on a device other than the assistant device that is not controllable by the assistant device;
    in response to determining that the audio presentation is being played on a pauseable device, pausing the audio presentation and presenting the first response while the audio presentation is paused; and
    in response to determining that the audio presentation is not being played on a pauseable device, presenting the first response without pausing the audio presentation.
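The branching logic recited in claim 1 can be sketched as follows. This is an illustrative reading of the claim only, not the patentee's implementation: every function name, data shape, and the substring-based entity match are assumptions introduced for the sketch, and the upstream step of determining that a query is directed to the assistant device at all (e.g., hotword detection) is taken as given.

```python
def query_references_entities(query: str, entities: list[str]) -> bool:
    """Assumed entity test: does the query mention any entity identified
    in the spoken audio content? (The claim does not specify matching.)"""
    q = query.lower()
    return any(entity.lower() in q for entity in entities)


def handle_query(query: str,
                 entities: list[str],
                 can_answer_visually: bool,
                 playing_on_pauseable_device: bool) -> tuple[str, str, bool]:
    """Route a query per the claim's branches.

    Returns (modality, response_kind, pause_presentation). The query is
    assumed to already be directed to the assistant device.
    """
    # Second-query branch: no entity reference, so the response is
    # independent of the audio presentation and playback is untouched.
    if not query_references_entities(query, entities):
        return ("audio", "general response independent of the presentation", False)

    # First-query branch: the response uses the identified entities.
    if can_answer_visually:
        # Visual response; the audio presentation keeps playing.
        return ("visual", "presentation-related response", False)

    if playing_on_pauseable_device:
        # Pause the presentation, speak the response, then resume.
        return ("audio", "presentation-related response", True)

    # Not pauseable (e.g., playing on a device the assistant cannot
    # control): speak the response without pausing.
    return ("audio", "presentation-related response", False)
```

For example, with `entities=["host", "podcast"]`, the query "who is the host" takes the presentation-related path, while "turn on the lights" falls through to the presentation-independent branch.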