US 12,254,893 B2
Electronic device for recognizing sound and method thereof
Jubum Han, Suwon-si (KR); Hosang Sung, Suwon-si (KR); Yeaseul Song, Suwon-si (KR); and Jeonghoon Lee, Suwon-si (KR)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Feb. 8, 2023, as Appl. No. 18/107,185.
Application 18/107,185 is a continuation of application No. PCT/KR2023/000604, filed on Jan. 12, 2023.
Claims priority of application No. 10-2022-0032999 (KR), filed on Mar. 16, 2022; and application No. 10-2022-0122409 (KR), filed on Sep. 27, 2022.
Prior Publication US 2023/0298614 A1, Sep. 21, 2023
Int. Cl. G10L 25/51 (2013.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01); G10L 21/12 (2013.01); G10L 21/14 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01)

CPC G10L 25/51 (2013.01) [G10L 15/063 (2013.01); G10L 21/12 (2013.01); G10L 21/14 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01)]

15 Claims

1. A sound recognition method comprising:

sampling input sound based on a preset sampling rate; and

performing Fast Fourier Transform (FFT) on the sampled input sound based on at least one of random FFT numbers or random hop lengths, and generating a two-dimensional (2D) feature map, with a time axis and a frequency axis, from the sampled input sound on which FFT is performed,

wherein the generating of the 2D feature map comprises:

transforming the sampled input sound into first FFT data based on at least one of a first FFT number among the random FFT numbers or a first hop length among the random hop lengths,

generating a first 2D feature map including a first feature from the first FFT data,

transforming the sampled input sound into n-th FFT data based on at least one of an n-th FFT number among the random FFT numbers and an n-th hop length among the random hop lengths, and

generating an n-th 2D feature map including an n-th feature from the n-th FFT data, where n is greater than 1; and

training a neural network model, which recognizes sound, with a plurality of 2D feature maps including the first 2D feature map and the n-th 2D feature map as training data.
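
The feature-map generation recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes a synthetic 1 kHz tone sampled at 8 kHz, a Hann window, magnitude spectra as the features, and small hypothetical candidate sets for the FFT number and hop length (`fft_choices`, `hop_choices`). All function names are illustrative, and the neural network training step is omitted.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def feature_map(samples, n_fft, hop):
    """Return a 2D map: a list of frames (time axis), each a list of
    n_fft // 2 + 1 magnitude bins (frequency axis)."""
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        frame = samples[start:start + n_fft]
        # Hann window (an assumption; the claim does not name a window).
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft))
            for i, s in enumerate(frame)
        ]
        spectrum = fft(windowed)
        frames.append([abs(c) for c in spectrum[: n_fft // 2 + 1]])
    return frames


def random_feature_maps(samples, count, rng,
                        fft_choices=(64, 128, 256),
                        hop_choices=(16, 32, 64)):
    """Build `count` feature maps from the same samples, drawing an FFT
    number and a hop length at random for each (hypothetical choice sets)."""
    maps = []
    for _ in range(count):
        n_fft = rng.choice(fft_choices)
        hop = rng.choice(hop_choices)
        maps.append((n_fft, hop, feature_map(samples, n_fft, hop)))
    return maps


SAMPLE_RATE = 8000  # assumed preset sampling rate, in Hz
samples = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1024)]
maps = random_feature_maps(samples, count=3, rng=random.Random(0))
```

Each entry of `maps` pairs one randomly drawn FFT number and hop length with the resulting 2D map, so the same input sound yields several maps with different time and frequency resolutions. In this sketch those maps would together serve as the training data for the recognition model.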