US 12,407,894 B2
Digital assistant for providing graphical overlays of video events
Gavin K. Duffy, Los Gatos, CA (US); Raymond M. Macharia, San Francisco, CA (US); Jessica J. Peck Brown, Morgan Hill, CA (US); and Robert M. Schulman, Los Gatos, CA (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Feb. 26, 2024, as Appl. No. 18/587,689.
Application 18/587,689 is a continuation of application No. PCT/US2022/041912, filed on Aug. 29, 2022.
Claims priority of provisional application 63/239,290, filed on Aug. 31, 2021.
Prior Publication US 2024/0205489 A1, Jun. 20, 2024
Int. Cl. H04N 21/431 (2011.01); G06F 3/01 (2006.01); G06F 3/14 (2006.01); G06F 3/16 (2006.01); G10L 15/18 (2013.01)
CPC H04N 21/4312 (2013.01) [G06F 3/013 (2013.01); G06F 3/017 (2013.01); G06F 3/1431 (2013.01); G06F 3/167 (2013.01); G10L 15/1822 (2013.01)]
43 Claims
 
1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device having a display, cause the electronic device to:
while displaying, on the display, a video event:
receive, by a digital assistant operating on the electronic device, a first natural language speech input corresponding to a first participant of the video event;
detecting a user gesture input;
in accordance with receiving the first natural language speech input, identify, by the digital assistant, based on context information associated with the video event, a first location of the first participant, including:
in accordance with a determination that the first natural language speech input refers to the first participant in the present tense, identifying the first location of the first participant as a location corresponding to the user gesture input when a portion of the first natural language speech input is received; and
in accordance with identifying the first location of the first participant, augment, by the digital assistant, the display of the video event with a first graphical overlay displayed at a first display location corresponding to the first location of the first participant.
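
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name below (`GestureSample`, `SpeechInput`, `Overlay`, `refers_in_present_tense`, `gesture_location_during`, `overlay_for`) is hypothetical, the keyword-based tense check is a stand-in for whatever natural language processing the digital assistant actually performs, and choosing the latest gesture sample within the speech interval is only one assumed reading of "when a portion of the first natural language speech input is received". The claim does not recite what happens when the speech is not in the present tense, so the sketch returns nothing in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # display coordinates (x, y); an assumed representation


@dataclass(frozen=True)
class GestureSample:
    timestamp: float  # seconds since the video event started
    location: Point   # where on the display the gesture points


@dataclass(frozen=True)
class SpeechInput:
    text: str
    start: float  # time the first portion of speech was received
    end: float    # time the last portion of speech was received


@dataclass(frozen=True)
class Overlay:
    label: str
    display_location: Point


# Hypothetical cue list; a real assistant would use a proper language model.
PRESENT_TENSE_CUES = ("who is", "who's", "what is", "what's")


def refers_in_present_tense(speech: SpeechInput) -> bool:
    """Naive stand-in for the present-tense determination."""
    lowered = speech.text.lower()
    return any(cue in lowered for cue in PRESENT_TENSE_CUES)


def gesture_location_during(
    speech: SpeechInput, gestures: Sequence[GestureSample]
) -> Optional[Point]:
    """Return the latest gesture location sampled while speech was being received."""
    during = [g for g in gestures if speech.start <= g.timestamp <= speech.end]
    if not during:
        return None
    return max(during, key=lambda g: g.timestamp).location


def overlay_for(
    speech: SpeechInput, gestures: Sequence[GestureSample], label: str
) -> Optional[Overlay]:
    """Build an overlay at the gestured location for present-tense speech."""
    if not refers_in_present_tense(speech):
        return None  # other cases are outside what this sketch covers
    location = gesture_location_during(speech, gestures)
    if location is None:
        return None
    return Overlay(label=label, display_location=location)
```

For example, a request such as "Who is that?" spoken between 10.0 s and 11.5 s, with a gesture sampled at 10.8 s, would yield an overlay at that gesture's display location, while gestures sampled before or after the speech interval would be ignored.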