US 12,438,739 B2
Method and system for processing remote active speech during a call
Joseph M. Williams, Morgan Hill, CA (US); Eric H. Zhang, San Jose, CA (US); Taylor G. Carrigan, San Francisco, CA (US); Darin B. Adler, Los Gatos, CA (US); and David L. Biderman, Los Gatos, CA (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on May 6, 2022, as Appl. No. 17/738,943.
Claims priority of provisional application 63/189,075, filed on May 15, 2021.
Prior Publication US 2022/0368554 A1, Nov. 17, 2022
Int. Cl. H04L 12/18 (2006.01); H04N 21/43 (2011.01); H04N 21/4788 (2011.01)
CPC H04L 12/1818 (2013.01) [H04L 12/1831 (2013.01); H04N 21/43076 (2020.08); H04N 21/4788 (2013.01)] 24 Claims
 
1. A method comprising:
initiating a call between a first electronic device and a second electronic device;
during the call, initiating, at the first electronic device, a joint media playback session in which the first and second electronic devices independently stream media content for synchronous playback;
receiving output from a voice activity detector (VAD) at a first instance along a playback duration of the media content;
determining that a downlink signal from the second electronic device includes speech based on the output from the VAD;
in response to determining that the downlink signal includes speech, applying a scalar gain to an audio signal of the media content to reduce a signal level of the audio signal;
driving a speaker with a mix of the downlink signal and the audio signal;
receiving subsequent output from the VAD at a second instance along the playback duration that is subsequent to the first instance; and
in response to determining that the downlink signal has ceased to include the speech based on the subsequent output:
pausing playback of the media content; and
continuing playback of the media content at or before the first instance along the playback duration of the media content,
wherein the synchronous playback of the media content occurs at least between the receiving of the output from the VAD and the receiving of the subsequent output from the VAD.
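Claim 1 recites a VAD-driven flow. When remote speech is detected, the media audio is ducked with a scalar gain and mixed with the downlink. When the speech ends, playback pauses and resumes at or before the point where the speech began. The Python sketch below shows one way to model that control flow. It is a minimal illustration under stated assumptions, not the patentee's implementation.

It assumes the following:

- The VAD output arrives as a boolean per update, together with the current playback position in seconds.
- Audio frames are lists of float samples in [-1.0, 1.0].

All names are hypothetical: `DuckingController`, `on_vad`, `media_gain`, `mix`, `resume`, `duck_gain`, and `rewind_margin_s`. The 0.25 gain, the output clipping, and restoring full gain once speech ends are also assumptions. The claim does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DuckingController:
    """Hypothetical model of the claimed duck/pause/resume flow (illustrative only)."""
    duck_gain: float = 0.25        # assumed scalar gain while remote speech is present
    rewind_margin_s: float = 0.0   # assumed offset; resume point is at or before the first instance
    ducking: bool = False
    paused: bool = False
    first_instance_s: Optional[float] = None
    resume_position_s: Optional[float] = None

    def on_vad(self, speech: bool, position_s: float) -> None:
        """Consume one VAD output at the given playback position."""
        if speech and not self.ducking:
            # First instance: downlink contains speech, so start ducking the media audio.
            self.ducking = True
            self.first_instance_s = position_s
        elif not speech and self.ducking:
            # Subsequent instance: speech has ceased, so pause and choose a resume
            # point at or before the first instance (clamped to the start of the media).
            self.ducking = False  # assumption: full gain is restored after speech ends
            self.paused = True
            self.resume_position_s = max(0.0, self.first_instance_s - self.rewind_margin_s)

    def media_gain(self) -> float:
        """Scalar gain currently applied to the media audio signal."""
        return self.duck_gain if self.ducking else 1.0

    def mix(self, downlink: List[float], media: List[float]) -> List[float]:
        """Mix downlink and (possibly ducked) media samples for the speaker, clipping to [-1, 1]."""
        g = self.media_gain()
        return [max(-1.0, min(1.0, d + g * m)) for d, m in zip(downlink, media)]

    def resume(self) -> Optional[float]:
        """Unpause and return the stored resume position, clearing it."""
        self.paused = False
        pos = self.resume_position_s
        self.resume_position_s = None
        return pos
```

In use, a call such as `ctl.on_vad(True, 30.0)` marks the first instance and ducks the media. A later `ctl.on_vad(False, 42.0)` pauses playback. With `rewind_margin_s=2.0`, `ctl.resume()` then yields a resume position of 28.0 seconds.

Storing only the first instance, and ignoring repeated "speech" outputs while already ducking, keeps the resume point anchored to where the remote speaker started talking. That matches the claim's "at or before the first instance" language.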