US 11,722,571 B1
Recipient device presence activity monitoring for a communications session
Mario Chenier, Woodinville, WA (US); Tony Roy Hardie, Seattle, WA (US); Nawdesh Uppal, Mississauga (CA); Brian Oliver, Seattle, WA (US); and Ran Mokady, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 20, 2016, as Appl. No. 15/385,315.
Int. Cl. H04L 67/143 (2022.01); H04L 65/1069 (2022.01); H04L 67/306 (2022.01); G10L 25/84 (2013.01); G10L 15/18 (2013.01); G10L 17/06 (2013.01); G10L 15/22 (2006.01); H04L 67/54 (2022.01)
CPC H04L 67/143 (2013.01) [G10L 15/18 (2013.01); G10L 15/22 (2013.01); G10L 17/06 (2013.01); G10L 25/84 (2013.01); H04L 65/1069 (2013.01); H04L 67/306 (2013.01); H04L 67/54 (2022.05); G10L 2015/223 (2013.01)]
22 Claims
1. A method, comprising:
receiving, at an electronic device, first audio data representing an utterance, the first audio data being received from an initiating device;
generating first text data representing the first audio data by executing speech-to-text processing to the first audio data;
determining, using natural language understanding functionality, that the utterance is a request to establish a communications session with a named contact;
determining that a user account is associated with the initiating device;
determining that a recipient device is associated with a contact user account of the named contact;
determining, prior to establishing the communications session, that the user account is pre-authorized to establish a communications session with the contact user account of the named contact;
after determining the communications session has been pre-authorized, establishing a first communications session for the initiating device and the recipient device, the first communications session to be an audio call using a real time communication protocol;
receiving first data indicating that first sounds were received by a first microphone of the recipient device during a first amount of time after the first communications session is initiated;
determining from the first data a first time that the first sounds were received;
determining that the first sounds correspond to detection of a first speech activity;
after determining that the first sounds correspond to the detection of the first speech activity, determining the first sounds are non-device directed audio;
receiving second data indicating that second sounds were received by the first microphone during a second amount of time after an end of the first amount of time;
determining from the second data a second time that the second sounds were received;
determining that the second sounds correspond to a second detection of speech activity;
after determining the detection that the second sounds correspond to the second detection of speech activity, determining the second sounds are non-device directed audio;
determining that a third amount of time between the first time and the second time is greater than a predefined temporal threshold value;
determining that the first communications session is to end based, at least in part, on the first sounds and the second sounds corresponding to the non-device directed audio received at the recipient device and the third amount of time being greater than the predefined temporal threshold value; and
causing the first communications session to end.
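
The session-ending condition recited at the end of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the `SoundEvent` record, its field names, the `should_end_session` function, and the 60-second threshold in the usage note are all hypothetical, and the speech-activity and device-directedness determinations are assumed to have been made elsewhere and are passed in as booleans.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    """Hypothetical record for sounds reported by the recipient device's microphone."""
    time_s: float          # time the sounds were received, in seconds
    is_speech: bool        # outcome of speech-activity detection (assumed given)
    device_directed: bool  # True if the speech was directed at the device


def should_end_session(first: SoundEvent, second: SoundEvent,
                       threshold_s: float) -> bool:
    """Return True when both events are non-device-directed speech and the
    time between them is strictly greater than the temporal threshold."""
    both_speech = first.is_speech and second.is_speech
    both_non_directed = (not first.device_directed
                         and not second.device_directed)
    gap_s = second.time_s - first.time_s  # the claim's "third amount of time"
    return both_speech and both_non_directed and gap_s > threshold_s
```

For example, with an illustrative threshold of 60 seconds, `should_end_session(SoundEvent(5.0, True, False), SoundEvent(70.0, True, False), 60.0)` evaluates the 65-second gap against the threshold and returns `True`. If either event were device-directed, or the gap were 60 seconds or less, the function would return `False`.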