US 11,869,537 B1
Language agnostic automated voice activity detection
Mayank Sharma, Bhopal (IN); Sandeep Joshi, Bangalore (IN); and Muhammad Raffay Hamid, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 10, 2021, as Appl. No. 17/523,777.
Application 17/523,777 is a continuation of application No. 16/436,351, filed on Jun. 10, 2019, granted, now 11,205,445.
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 25/84 (2013.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01); G10L 15/16 (2006.01); G10L 25/18 (2013.01)
CPC G10L 25/84 (2013.01) [G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01); G10L 25/18 (2013.01)] 20 Claims
 
1. A method comprising:
determining, by one or more computer processors coupled to memory, an audio file associated with video content;
generating a plurality of audio segments using the audio file, the plurality of audio segments comprising a first segment and a second segment;
determining that the first segment comprises first voice activity;
determining that the second segment comprises second voice activity;
determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment;
generating an empty subtitle file comprising an indication that the voice activity is present between the first timestamp and the second timestamp; and
generating text data representing the voice activity that is present between the first timestamp and the second timestamp.
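The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how voice activity is detected or what subtitle format is used. The fixed-length segmentation, the energy-threshold detector `energy_vad`, the SRT-style cue layout, and the `transcribe` callback are all assumptions introduced for illustration, as are all function and parameter names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Segment:
    start: float               # segment start time in seconds
    end: float                 # segment end time in seconds
    samples: Sequence[float]   # raw audio samples for this segment


def split_audio(samples: Sequence[float], sample_rate: int,
                segment_seconds: float = 1.0) -> List[Segment]:
    """Cut the audio into fixed-length segments (assumed segmentation scheme)."""
    step = max(1, int(sample_rate * segment_seconds))
    segments = []
    for i in range(0, len(samples), step):
        chunk = samples[i:i + step]
        segments.append(Segment(i / sample_rate, (i + len(chunk)) / sample_rate, chunk))
    return segments


def energy_vad(segment: Segment, threshold: float = 0.01) -> bool:
    """Hypothetical stand-in detector: mean squared amplitude above a threshold."""
    if not segment.samples:
        return False
    energy = sum(s * s for s in segment.samples) / len(segment.samples)
    return energy > threshold


def voiced_intervals(segments: List[Segment],
                     has_voice: Callable[[Segment], bool]) -> List[Tuple[float, float]]:
    """Merge adjacent voiced segments into (first timestamp, second timestamp) pairs."""
    intervals: List[Tuple[float, float]] = []
    for seg in segments:
        if not has_voice(seg):
            continue
        if intervals and abs(intervals[-1][1] - seg.start) < 1e-9:
            intervals[-1] = (intervals[-1][0], seg.end)   # extend the open interval
        else:
            intervals.append((seg.start, seg.end))
    return intervals


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def empty_subtitle_file(intervals: List[Tuple[float, float]]) -> str:
    """Build cues that mark where voice activity is present but carry no text."""
    cues = [f"{n}\n{format_timestamp(a)} --> {format_timestamp(b)}\n"
            for n, (a, b) in enumerate(intervals, start=1)]
    return "\n".join(cues)


def fill_subtitles(intervals: List[Tuple[float, float]],
                   transcribe: Callable[[float, float], str]) -> str:
    """Attach text to each cue; `transcribe` is a caller-supplied placeholder."""
    cues = [f"{n}\n{format_timestamp(a)} --> {format_timestamp(b)}\n{transcribe(a, b)}\n"
            for n, (a, b) in enumerate(intervals, start=1)]
    return "\n".join(cues)
```

In this sketch the empty subtitle file and the filled one share the same cue timing, so the text-generation step only has to supply text for intervals already marked as voiced.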