US 11,854,536 B2
Keyword spotting apparatus, method, and computer-readable recording medium thereof
Sang Il Ahn, Chungcheongbuk-do (KR); Seung Woo Choi, Seoul (KR); Seok Jun Seo, Seoul (KR); and Beom Jun Shin, Seoul (KR)
Assigned to Hyperconnect Inc., Seoul (KR)
Filed by Hyperconnect Inc., Seoul (KR)
Filed on Sep. 4, 2020, as Appl. No. 17/013,391.
Claims priority of application No. 10-2019-0111046 (KR), filed on Sep. 6, 2019; and application No. 10-2019-0130044 (KR), filed on Oct. 18, 2019.
Prior Publication US 2021/0074270 A1, Mar. 11, 2021
Prior Publication US 2023/0162724 A9, May 25, 2023
Int. Cl. G10L 15/16 (2006.01); G10L 25/24 (2013.01); G10L 15/08 (2006.01)
CPC G10L 15/16 (2013.01) [G10L 25/24 (2013.01); G10L 2015/088 (2013.01)]
17 Claims
1. A method of operation of an apparatus for keyword spotting, the method comprising:
obtaining, from an input voice, an input feature map;
wherein lengths in a channel direction of the input feature map are independently determined for a plurality of sections;
wherein the plurality of sections is obtained by dividing the input voice by a predetermined period; and
wherein each length is defined based on frequency data extracted from a corresponding section of the input voice and corresponds to a frequency value for the corresponding section;
performing a convolution operation between the input feature map and at least one filter;
wherein performing the convolution operation comprises performing a first convolution operation between the input feature map and each of n different filters;
wherein the n different filters cover a frequency range of the input feature map and each have a channel length that is the same as the channel length for the input feature map; and
wherein the channel length is above zero;
storing a result of the convolution operation as an output feature map; and
extracting a keyword from the input voice based on the output feature map,
wherein each filter of the n different filters used in the first convolution operation is configured to distinguish characteristics of different voices corresponding to letter sounds.
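
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the FFT-magnitude features, the section length, the filter width, and the pooled linear scoring step are all assumptions chosen for illustration. What it is meant to show is the layout the claim describes: frequency data runs along the channel direction, the n filters have the same channel length as the input feature map, and the convolution therefore slides along the time direction only.

```python
import numpy as np

def input_feature_map(voice, section_len, n_bins):
    """Divide the voice into fixed-period sections and take frequency data per section.

    Assumption: plain FFT magnitudes stand in for whatever frequency features are used.
    Returns an array of shape (channels, time) = (n_bins, number of sections).
    """
    n_sections = len(voice) // section_len
    sections = voice[: n_sections * section_len].reshape(n_sections, section_len)
    spectrum = np.abs(np.fft.rfft(sections, axis=1))[:, :n_bins]
    return spectrum.T

def first_convolution(fmap, filters):
    """Convolve the feature map with n filters of shape (n, channels, width).

    Each filter spans every channel of fmap, so it only moves along time.
    Returns the output feature map of shape (n, time - width + 1).
    """
    n, c, w = filters.shape
    assert c == fmap.shape[0] and c > 0  # same, non-zero channel length
    t_out = fmap.shape[1] - w + 1
    out = np.empty((n, t_out))
    for t in range(t_out):
        window = fmap[:, t : t + w]  # (channels, width)
        out[:, t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return out

def extract_keyword(out_fmap, class_weights, labels):
    """Hypothetical scoring step: average over time, then pick the best linear score."""
    pooled = out_fmap.mean(axis=1)
    scores = class_weights @ pooled
    return labels[int(np.argmax(scores))]

# Toy run on random data; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
labels = ["yes", "no", "unknown"]
voice = rng.standard_normal(16000)
fmap = input_feature_map(voice, section_len=160, n_bins=40)
filters = rng.standard_normal((16, 40, 3))
out_fmap = first_convolution(fmap, filters)
keyword = extract_keyword(out_fmap, rng.standard_normal((3, 16)), labels)
```

Because every filter covers the full channel length, each output value is a weighted sum over all frequency channels within a short time window, which is why the output feature map has one row per filter rather than one row per frequency band.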