US 12,322,407 B2
Artificial intelligence device configured to generate a mask value
Jaepil Seo, Seoul (KR); Sungmoon Cho, Seoul (KR); Sangjun Oh, Seoul (KR); and Hyeonsik Choi, Seoul (KR)
Assigned to LG ELECTRONICS INC., Seoul (KR)
Filed by LG ELECTRONICS INC., Seoul (KR)
Filed on Aug. 16, 2022, as Appl. No. 17/820,220.
Claims priority of application No. PCT/KR2022/007632 (WO), filed on May 30, 2022.
Prior Publication US 2023/0386491 A1, Nov. 30, 2023
Int. Cl. G10L 21/0216 (2013.01)

CPC G10L 21/0216 (2013.01) [G10L 2021/02166 (2013.01)]

9 Claims

1. An artificial intelligence device comprising:

a plurality of microphones; and

a processor configured to:

receive a video signal and a plurality of voice signals each respectively input from a corresponding microphone among the plurality of microphones;

obtain, based on the received video signal, an angle between a reference microphone and a specific speaker corresponding to a specific speaker image from the received video signal;

determine a first output value by performing adaptive beamforming based on the received plurality of voice signals and the obtained angle;

determine a second output value by performing fixed beamforming based on two voice signals input through two preset microphones among the received plurality of voice signals and the obtained angle;

generate a mask value based on the determined first output value, the determined second output value, and a video zooming magnification;

generate an enhancement signal based on the generated mask value and a phase of the second output value;

convert each of the two voice signals into a power spectrum;

obtain the second output value by performing the fixed beamforming to increase power of a point corresponding to the obtained angle from the converted power spectrum; and

generate the mask value according to Equation 1 below:

wherein E_Adaptive(k,l) denotes the first output value according to a k-th frequency and an l-th adaptive beamformer,

|E_Adaptive(k,l)| denotes a square root value of gain of the first output value,

E_Fixed(k,l) denotes the second output value according to a k-th frequency and an l-th fixed beamformer,

|E_Fixed(k,l)| denotes a square root value of gain of the second output value,

β is set to 0 in the case of a minimum magnification, to |E_Fixed(k,l)| in the case of a maximum magnification, and to MAX(α)/α in any other case, and

α denotes the video zooming magnification.
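
As an illustration only, the piecewise selection of β recited above, and the step of combining a mask value with the phase of the second output value, might be sketched as follows. The function names `select_beta` and `enhance_bin` and the parameters `alpha_min` and `alpha_max` are hypothetical and do not appear in the claim. The sketch assumes that MAX(α) means the maximum zooming magnification and that the enhancement signal is the mask value multiplied by a unit phasor carrying the phase of E_Fixed(k,l). Equation 1 itself is not reproduced in this text, so the mask value is taken as a given input rather than computed.

```python
import cmath


def select_beta(alpha, alpha_min, alpha_max, e_fixed_mag):
    """Return beta for one (k, l) bin following the three cases in the claim.

    alpha       -- current video zooming magnification
    alpha_min   -- minimum magnification (hypothetical parameter)
    alpha_max   -- maximum magnification, assumed to be what MAX(alpha) means
    e_fixed_mag -- |E_Fixed(k,l)| for the bin
    """
    if not (alpha_min <= alpha <= alpha_max):
        raise ValueError("alpha must lie between alpha_min and alpha_max")
    if alpha == alpha_min:
        return 0.0                 # minimum magnification
    if alpha == alpha_max:
        return e_fixed_mag         # maximum magnification
    return alpha_max / alpha       # any other magnification: MAX(alpha)/alpha


def enhance_bin(mask, e_fixed):
    """Combine a real mask value with the phase of the fixed beamformer output.

    Assumed form: mask * exp(j * phase(E_Fixed)). The claim only states that
    the enhancement signal is based on the mask value and that phase.
    """
    return mask * cmath.exp(1j * cmath.phase(e_fixed))
```

For example, with a zoom range of 1x to 4x, `select_beta(2.0, 1.0, 4.0, 0.5)` takes the third branch and evaluates 4.0 / 2.0, while the two endpoint magnifications return 0 and the supplied |E_Fixed(k,l)| respectively.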