US 12,223,034 B2
Secure voice interface in a streaming media device to avoid vulnerability attacks
Vinod Jatti, Karnataka (IN); and Remesh Kousalya Sugunan, Trivandrum (IN)
Assigned to ARRIS ENTERPRISES LLC, Horsham, PA (US)
Filed by ARRIS Enterprises LLC, Suwanee, GA (US)
Filed on Sep. 16, 2021, as Appl. No. 17/476,622.
Claims priority of provisional application 63/106,044, filed on Oct. 27, 2020.
Prior Publication US 2022/0129543 A1, Apr. 28, 2022
Int. Cl. G06F 21/53 (2013.01); G06F 21/60 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G06F 21/53 (2013.01) [G06F 21/602 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/223 (2013.01)]
14 Claims
 
1. A smart media device for secure voice commands to an application executing within the smart media device while attached to the Internet, the Internet communicatively interconnects the smart media device with a remote voice-speech server and one or more content servers, the smart media device comprising:
a secure microphone input for accepting audio data containing voice commands and providing encrypted audio data using a pre-stored encryption key;
a trusted execution environment configured to decrypt the encrypted audio data from the secure microphone input;
a memory having instructions stored thereon; and
a processor configured to execute one or more instructions on the memory to cause the smart media device to:
based on a determination that the application uses secure voice commands and the remote voice-speech server, perform the following:
within the trusted execution environment: receive and decrypt the encrypted audio data;
send the decrypted audio data to the remote voice-speech server for processing so as to cause the remote voice-speech server to generate one or more application commands;
receive, by the application, the one or more application commands; and
perform, by the application, the one or more application commands; and
based on a determination that the secure voice commands are not to be processed by the remote voice-speech server, perform the following:
receive the encrypted audio data in the trusted execution environment;
receive and decrypt, within the trusted execution environment, the encrypted audio data;
generate, within the trusted environment, one or more other application commands corresponding to processing decrypted voice commands;
receive, by the application, the one or more other application commands; and
perform, by the application, the one or more other application commands.
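
The two branches recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every class and function name below (SecureMicrophone, TrustedExecutionEnvironment, Application, remote_voice_server, handle_voice) is hypothetical, the XOR routine is a toy stand-in for real encryption and offers no security, and "speech recognition" is reduced to decoding bytes as text. The behavior when an application does not use secure voice commands is not recited in the claim, so the sketch simply returns no commands in that case.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (assumption; NOT secure). Applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class SecureMicrophone:
    key: bytes  # stands in for the pre-stored encryption key

    def capture(self, audio: bytes) -> bytes:
        # Audio leaves the microphone input only in encrypted form.
        return xor_cipher(audio, self.key)


@dataclass
class TrustedExecutionEnvironment:
    key: bytes

    def decrypt(self, encrypted: bytes) -> bytes:
        return xor_cipher(encrypted, self.key)

    def recognize_locally(self, encrypted: bytes) -> List[str]:
        # Decrypt and generate commands without leaving the trusted environment.
        audio = self.decrypt(encrypted)
        return ["local:" + audio.decode()]


@dataclass
class Application:
    uses_secure_voice: bool
    uses_remote_server: bool
    performed: List[str] = field(default_factory=list)

    def perform(self, commands: List[str]) -> None:
        self.performed.extend(commands)


def remote_voice_server(audio: bytes) -> List[str]:
    """Hypothetical stand-in for the remote voice-speech server."""
    return ["remote:" + audio.decode()]


def handle_voice(
    app: Application,
    mic: SecureMicrophone,
    tee: TrustedExecutionEnvironment,
    audio: bytes,
    remote: Callable[[bytes], List[str]] = remote_voice_server,
) -> List[str]:
    encrypted = mic.capture(audio)
    if not app.uses_secure_voice:
        # Not covered by the claim; the sketch does nothing here.
        return []
    if app.uses_remote_server:
        # First branch: decrypt in the TEE, then send the audio to the remote server.
        commands = remote(tee.decrypt(encrypted))
    else:
        # Second branch: decrypt and generate commands inside the TEE.
        commands = tee.recognize_locally(encrypted)
    app.perform(commands)
    return commands


if __name__ == "__main__":
    mic = SecureMicrophone(key=b"k3y")
    tee = TrustedExecutionEnvironment(key=b"k3y")
    print(handle_voice(Application(True, True), mic, tee, b"play"))
    print(handle_voice(Application(True, False), mic, tee, b"pause"))
```

Run as a script, the sketch prints ['remote:play'] for the application that uses the remote server and ['local:pause'] for the one that does not. In both branches the application only ever receives generated commands, never the raw audio.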