US 11,722,571 B1
Recipient device presence activity monitoring for a communications session
Mario Chenier, Woodinville, WA (US); Tony Roy Hardie, Seattle, WA (US); Nawdesh Uppal, Mississauga (CA); Brian Oliver, Seattle, WA (US); and Ran Mokady, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 20, 2016, as Appl. No. 15/385,315.
Int. Cl. H04L 67/143 (2022.01); H04L 65/1069 (2022.01); H04L 67/306 (2022.01); G10L 25/84 (2013.01); G10L 15/18 (2013.01); G10L 17/06 (2013.01); G10L 15/22 (2006.01); H04L 67/54 (2022.01)

CPC H04L 67/143 (2013.01) [G10L 15/18 (2013.01); G10L 15/22 (2013.01); G10L 17/06 (2013.01); G10L 25/84 (2013.01); H04L 65/1069 (2013.01); H04L 67/306 (2013.01); H04L 67/54 (2022.05); G10L 2015/223 (2013.01)]

22 Claims

1. A method, comprising:

receiving, at an electronic device, first audio data representing an utterance, the first audio data being received from an initiating device;

generating first text data representing the first audio data by executing speech-to-text processing on the first audio data;

determining, using natural language understanding functionality, that the utterance is a request to establish a communications session with a named contact;

determining that a user account is associated with the initiating device;

determining that a recipient device is associated with a contact user account of the named contact;

determining, prior to establishing the communications session, that the user account is pre-authorized to establish a communications session with the contact user account of the named contact;

after determining the communications session has been pre-authorized, establishing a first communications session for the initiating device and the recipient device, the first communications session to be an audio call using a real time communication protocol;

receiving first data indicating that first sounds were received by a first microphone of the recipient device during a first amount of time after the first communications session is initiated;

determining from the first data a first time that the first sounds were received;

determining that the first sounds correspond to detection of a first speech activity;

after determining that the first sounds correspond to the detection of the first speech activity, determining the first sounds are non-device directed audio;

receiving second data indicating that second sounds were received by the first microphone during a second amount of time after an end of the first amount of time;

determining from the second data a second time that the second sounds were received;

determining that the second sounds correspond to a second detection of speech activity;

after determining that the second sounds correspond to the second detection of speech activity, determining the second sounds are non-device directed audio;

determining that a third amount of time between the first time and the second time is greater than a predefined temporal threshold value;

determining that the first communications session is to end based, at least in part, on the first sounds and the second sounds corresponding to the non-device directed audio received at the recipient device and the third amount of time being greater than the predefined temporal threshold value; and

causing the first communications session to end.
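
The closing steps of claim 1 amount to a decision rule over two microphone reports, which can be sketched in Python. This is an illustrative sketch only, not the patented implementation: the `SoundEvent` type, its field names, the `should_end_session` function, and the use of seconds as the time unit are all assumptions introduced here. The speech-activity and device-directedness determinations are taken as already-computed inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundEvent:
    """One report about sounds captured by the recipient device's microphone.

    Hypothetical shape; the claim does not specify a data format.
    """
    time_s: float          # time the sounds were received, in seconds
    is_speech: bool        # True if the sounds correspond to speech activity
    device_directed: bool  # True if the speech was directed at the device


def should_end_session(first: SoundEvent, second: SoundEvent,
                       threshold_s: float) -> bool:
    """Return True when the end-of-session conditions of claim 1 all hold."""
    # Both sets of sounds must correspond to detected speech activity.
    both_speech = first.is_speech and second.is_speech
    # Both sets of sounds must be non-device directed audio.
    both_non_directed = not first.device_directed and not second.device_directed
    # The "third amount of time": the gap between the first and second times.
    gap_s = second.time_s - first.time_s
    # The gap must exceed the predefined temporal threshold value.
    return both_speech and both_non_directed and gap_s > threshold_s
```

For example, two non-device-directed speech events at 5 s and 50 s satisfy the rule with an assumed 30 s threshold, whereas the same events with a 60 s threshold, or with either event directed at the device, do not.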