US 11,854,536 B2
Keyword spotting apparatus, method, and computer-readable recording medium thereof
Sang Il Ahn, Chungcheongbuk-do (KR); Seung Woo Choi, Seoul (KR); Seok Jun Seo, Seoul (KR); and Beom Jun Shin, Seoul (KR)
Assigned to Hyperconnect Inc., Seoul (KR)
Filed by Hyperconnect Inc., Seoul (KR)
Filed on Sep. 4, 2020, as Appl. No. 17/013,391.
Claims priority of application No. 10-2019-0111046 (KR), filed on Sep. 6, 2019; and application No. 10-2019-0130044 (KR), filed on Oct. 18, 2019.
Prior Publication US 2021/0074270 A1, Mar. 11, 2021
Prior Publication US 2023/0162724 A9, May 25, 2023
Int. Cl. G10L 15/16 (2006.01); G10L 25/24 (2013.01); G10L 15/08 (2006.01)

CPC G10L 15/16 (2013.01) [G10L 25/24 (2013.01); G10L 2015/088 (2013.01)]

17 Claims

1. A method of operation of an apparatus for keyword spotting, the method comprising:

obtaining, from an input voice, an input feature map;

wherein lengths in a channel direction of the input feature map are independently determined for a plurality of sections;

wherein the plurality of sections is obtained by dividing the input voice by a predetermined period; and

wherein each length is defined based on frequency data extracted from a corresponding section of the input voice and corresponds to frequency value for the corresponding section;

performing a convolution operation between the input feature map and at least one filter;

wherein performing the convolution operation comprises performing a first convolution operation between the input feature map and each of n different filters;

wherein the n different filters cover a frequency range of the input feature map and each have a channel length that is the same as the channel length for the input feature map; and

wherein the channel length is above zero;

storing a result of the convolution operation as an output feature map; and

extracting a keyword from the input voice based on the output feature map,

wherein each filter of the n different filters used in the first convolution operation is configured to distinguish characteristics of different voices corresponding to letter sounds.
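
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and everything the claim leaves open is an assumption made here for illustration: the log-magnitude FFT used as the per-section frequency data, the filter width k, the mean pooling, the linear scoring layer, the random (untrained) weights, and all function and variable names. What the sketch does take from the claim is the layout: frequency data lies along the channel axis of the input feature map, and each of the n filters has the same channel length as the feature map, so each filter covers the full frequency range and the convolution slides only along time.

```python
import numpy as np


def input_feature_map(voice, section_len, n_bins):
    """Divide the voice into fixed-period sections and return one frequency
    vector per section. Output shape: (t, c), with frequency on the channel axis."""
    n_sections = len(voice) // section_len
    sections = voice[: n_sections * section_len].reshape(n_sections, section_len)
    # Assumption: log-magnitude FFT stands in for the claim's "frequency data".
    spectrum = np.abs(np.fft.rfft(sections * np.hanning(section_len), axis=1))
    return np.log1p(spectrum[:, :n_bins])


def first_convolution(fmap, filters):
    """Convolve along time with n filters of shape (k, c).
    Each filter's channel length c equals the feature map's channel length."""
    n, k, c = filters.shape
    t, c_map = fmap.shape
    assert c == c_map and c > 0, "filter channel length must match the feature map"
    # Sliding windows over time: shape (t - k + 1, k, c).
    windows = np.stack([fmap[i : i + k] for i in range(t - k + 1)])
    # Sum over the window and the full channel axis: output shape (t - k + 1, n).
    return np.einsum("tkc,nkc->tn", windows, filters)


def spot_keyword(output_map, weights, labels):
    """Hypothetical decision step: mean-pool over time, then score linearly."""
    pooled = output_map.mean(axis=0)   # shape (n,)
    scores = pooled @ weights          # shape (len(labels),)
    return labels[int(np.argmax(scores))]


# Usage with synthetic data; the weights are random, so the label is meaningless.
rng = np.random.default_rng(0)
voice = rng.standard_normal(16000)             # stand-in for 1 s of audio at 16 kHz
fmap = input_feature_map(voice, section_len=400, n_bins=40)
filters = rng.standard_normal((16, 3, 40))     # n = 16 filters, k = 3, c = 40
output_map = first_convolution(fmap, filters)  # stored as the output feature map
labels = ["yes", "no", "unknown"]
weights = rng.standard_normal((16, len(labels)))
keyword = spot_keyword(output_map, weights, labels)
```

Because every filter spans the whole channel axis, the first convolution produces a two-dimensional output of (time steps, n) rather than a map that also retains a frequency dimension.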