US 11,967,340 B2
Method for detecting speech in audio data
Subong Choi, Seoul (KR); Dongchan Shin, Seoul (KR); and Jihwa Lee, Seoul (KR)
Assigned to ActionPower Corp., Seoul (KR)
Filed by ActionPower Corp., Seoul (KR)
Filed on Jun. 23, 2023, as Appl. No. 18/340,767.
Claims priority of application No. 10-2022-0077482 (KR), filed on Jun. 24, 2022.
Prior Publication US 2023/0419988 A1, Dec. 28, 2023
Int. Cl. G10L 25/78 (2013.01); G10L 21/0272 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01)

CPC G10L 25/78 (2013.01) [G10L 21/0272 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01)]

10 Claims

1. A method for detecting a voice from audio data, performed by a computing device including at least one processor, the method comprising:

obtaining audio data;

generating image data based on a spectrum of the obtained audio data;

extracting a feature on the image data by utilizing a first neural network model; and

extracting a time-series feature of the image data by utilizing a second neural network model sequentially connected with the first neural network model;

determining whether an automated response system (ARS) voice is included in each of a plurality of sections of the audio data, based on the feature on the image data and the time-series feature of the image data;

eliminating a section including the ARS voice, among the plurality of sections of the audio data; and

generating input data or training data for analysis of spoken voice based on remaining sections excluding the eliminated section, among the plurality of sections of the audio data.
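
The steps of claim 1 can be illustrated with a minimal sketch, offered under stated assumptions rather than as the patented implementation. The claim does not specify network architectures, section lengths, or thresholds, so every name and parameter below (spectrogram, split_sections, remove_ars_sections, toy_ars_probability, frame_len, hop, threshold) is hypothetical. In particular, toy_ars_probability is a hand-written spectral-peak heuristic that only stands in for the first (image feature) and second (time-series feature) neural network models; it is not a neural network.

```python
import cmath
import math
from typing import Callable, List, Sequence

Spectrogram = List[List[float]]  # rows = time frames, columns = frequency bins


def spectrogram(samples: Sequence[float], frame_len: int = 64, hop: int = 32) -> Spectrogram:
    """Build "image data" from the spectrum: naive DFT magnitudes of Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    image: Spectrogram = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        image.append(row)
    return image


def split_sections(samples: Sequence[float], section_len: int) -> List[List[float]]:
    """Cut the audio into consecutive sections (the last one may be shorter)."""
    return [list(samples[i:i + section_len]) for i in range(0, len(samples), section_len)]


def toy_ars_probability(image: Spectrogram) -> float:
    """ASSUMPTION: placeholder for the two neural network models in the claim.

    Returns the mean share of each frame's spectral magnitude held by its
    strongest bin, so tone-like sections score high and noise-like ones low.
    """
    ratios = []
    for row in image:
        total = sum(row)
        ratios.append(max(row) / total if total > 0 else 0.0)
    return sum(ratios) / len(ratios) if ratios else 0.0


def remove_ars_sections(
    sections: List[List[float]],
    ars_probability: Callable[[Spectrogram], float],
    threshold: float = 0.5,
) -> List[List[float]]:
    """Keep only the sections whose ARS score is below the threshold."""
    kept = []
    for section in sections:
        if ars_probability(spectrogram(section)) < threshold:
            kept.append(section)
    return kept
```

In this sketch the sections returned by remove_ars_sections would serve as the input data or training data for spoken-voice analysis. A real system would replace toy_ars_probability with trained models, for example a convolutional network followed by a recurrent network, which is one common pairing but not something the claim text mandates.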