US 12,348,469 B2
Assistance during audio and video calls
Fredrik Bergenlid, Mountain View, CA (US); Vladyslav Lysychkin, Zurich (CH); Denis Burakov, Zurich (CH); Behshad Behzadi, Mountain View, CA (US); Andrea Terwisscha Van Scheltinga, Zurich (CH); Quentin Lascombes De Laroussilhe, Zurich (CH); Mikhail Golikov, Mountain View, CA (US); Koa Metter, Mountain View, CA (US); Ibrahim Badr, Mountain View, CA (US); and Zaheed Sabur, Zurich (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on May 21, 2024, as Appl. No. 18/670,389.
Application 18/670,389 is a continuation of application No. 18/215,221, filed on Jun. 28, 2023, granted, now 12,028,302.
Application 18/215,221 is a continuation of application No. 17/991,300, filed on Nov. 21, 2022, granted, now 11,765,113, issued on Sep. 19, 2023.
Application 17/991,300 is a continuation of application No. 17/031,416, filed on Sep. 24, 2020, granted, now 11,509,616, issued on Nov. 22, 2022.
Application 17/031,416 is a continuation of application No. 15/953,266, filed on Apr. 13, 2018, granted, now 10,791,078, issued on Sep. 29, 2020.
Claims priority of provisional application 62/538,764, filed on Jul. 30, 2017.
Prior Publication US 2024/0314094 A1, Sep. 19, 2024
Int. Cl. H04L 51/10 (2022.01); G06F 16/44 (2019.01); G10L 15/22 (2006.01); H04N 7/15 (2006.01); H04N 21/439 (2011.01); H04N 21/4788 (2011.01); G10L 15/00 (2013.01); G10L 15/16 (2006.01); G10L 25/63 (2013.01)
CPC H04L 51/10 (2013.01) [G06F 16/44 (2019.01); G10L 15/22 (2013.01); H04N 7/15 (2013.01); H04N 21/4394 (2013.01); H04N 21/4788 (2013.01); G10L 15/005 (2013.01); G10L 15/16 (2013.01); G10L 2015/223 (2013.01); G10L 25/63 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving session video content during a video communication session between a first computing device and a second computing device;
detecting, in the session video content, a gesture performed by a user associated with the first computing device;
determining, with a trained machine-learning model and based on the gesture, that the user invoked a request for assistance, wherein the request comprises a request for media and wherein the trained machine-learning model outputs a confidence score associated with the determination;
in response to the confidence score meeting a threshold, outputting, by the trained machine-learning model, the media; and
sending a first command to at least one of the first computing device or the second computing device to display the media.
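The following is a minimal illustrative sketch, in Python, of the flow recited in claim 1: detecting a gesture in session video content, using a trained model to determine whether the gesture invoked a request for media and to score that determination, and, when the confidence score meets a threshold, commanding at least one of the participating devices to display the media. All identifiers (Frame, AssistanceDecision, detect_gesture, classify_request, send_display_command, CONFIDENCE_THRESHOLD) and the threshold value are assumptions introduced for illustration only and are not taken from the patent disclosure.

    # Illustrative sketch of the claim 1 flow; all names and values are
    # hypothetical placeholders, not part of the patent disclosure.
    from dataclasses import dataclass
    from typing import Optional

    CONFIDENCE_THRESHOLD = 0.8  # illustrative value; the claim only recites "a threshold"

    @dataclass
    class Frame:
        """One frame of session video content received during the call."""
        device_id: str
        pixels: bytes

    @dataclass
    class AssistanceDecision:
        """Hypothetical output of the trained machine-learning model."""
        is_media_request: bool   # did the gesture invoke a request for media?
        confidence: float        # confidence score associated with the determination
        media: Optional[bytes]   # the media output by the model, if any

    def detect_gesture(frame: Frame) -> Optional[str]:
        """Stub gesture detector run on session video content; returns a
        gesture label or None. A real system might use a pose-estimation
        or video-classification model here."""
        ...
        return None

    def classify_request(gesture: str) -> AssistanceDecision:
        """Stub for the trained model that determines whether the user
        invoked a request for assistance (here, a request for media) and
        outputs a confidence score for that determination."""
        ...
        return AssistanceDecision(is_media_request=False, confidence=0.0, media=None)

    def send_display_command(device_id: str, media: bytes) -> None:
        """Stub for sending a command instructing a device to display the media."""
        ...

    def handle_frame(frame: Frame, first_device: str, second_device: str) -> None:
        """End-to-end flow mirroring the steps recited in claim 1."""
        gesture = detect_gesture(frame)          # detect gesture in the session video
        if gesture is None:
            return
        decision = classify_request(gesture)     # trained model + confidence score
        if (decision.is_media_request
                and decision.media is not None
                and decision.confidence >= CONFIDENCE_THRESHOLD):
            # Send a command to at least one of the two devices to display the media.
            send_display_command(first_device, decision.media)
            send_display_command(second_device, decision.media)

The sketch gates display on the confidence threshold exactly as the claim does; the particular models behind detect_gesture and classify_request are left as stubs, since the claim requires only "a trained machine-learning model" and does not specify an architecture.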