US 12,424,227 B2
Sound signal refinement method, sound signal decode method, apparatus thereof, program, and storage medium
Ryosuke Sugiura, Tokyo (JP); Takehiro Moriya, Tokyo (JP); and Yutaka Kamamoto, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 18/031,595
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Nov. 5, 2020, PCT No. PCT/JP2020/041402
§ 371(c)(1), (2) Date Apr. 12, 2023,
PCT Pub. No. WO2022/097239, PCT Pub. Date May 12, 2022.
Prior Publication US 2023/0386481 A1, Nov. 30, 2023
Int. Cl. G10L 19/008 (2013.01)
CPC G10L 19/008 (2013.01)
13 Claims
 
1. A sound signal purification method for obtaining, for each frame, an n-th channel purified decoded sound signal ˜Xn that is a sound signal of each channel of stereo by using at least an n-th channel decoded sound signal  Xn (n is each integer of 1 or more and 2 or less) that is a decoded sound signal of the each channel of the stereo obtained by decoding a stereo code CS and a monaural decoded sound signal  XM that is a monaural decoded sound signal obtained by decoding a monaural code CM that is a code different from the stereo code CS, wherein
the n-th channel decoded sound signal  Xn is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and
the sound signal purification method comprises
a decoded sound common signal estimation step of obtaining, for the each frame, a decoded sound common signal  YM that is a signal common to all channels of the stereo by using at least all of one or more and two or less n-th channel decoded sound signals  Xn,
a decoded sound common signal upmixing step of obtaining, for the each frame, an n-th channel upmixed common signal  YMn that is a signal obtained by upmixing the decoded sound common signal  YM for the each channel by an upmixing process using the decoded sound common signal  YM and inter-channel relationship information that is information indicating a relationship between the channels of the stereo,
a monaural decoded sound upmixing step of obtaining, for the each frame, an n-th channel upmixed monaural decoded sound signal  XMn that is a signal obtained by upmixing the monaural decoded sound signal  XM for the each channel by an upmixing process using the monaural decoded sound signal  XM and information indicating a relationship between the channels of the stereo,
an n-th channel signal purification step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ˜yMn(t)=(1−αMn)× yMn(t)+αMn× xMn(t) obtained by adding a value αMn× xMn(t) obtained by multiplying an n-th channel purification weight αMn by a sample value  xMn(t) of the n-th channel upmixed monaural decoded sound signal  XMn and a value (1−αMn)× yMn(t) obtained by multiplying a value (1−αMn) obtained by subtracting the n-th channel purification weight αMn from 1 by a sample value  yMn(t) of the n-th channel upmixed common signal  YMn, as an n-th channel purified upmixed signal ˜YMn,
an n-th channel separation combination weight estimation step of obtaining, for the each frame with respect to the each channel n, a normalized inner product value for the n-th channel upmixed common signal  YMn of the n-th channel decoded sound signal  Xn as an n-th channel separation combination weight βn, and
an n-th channel separation combination step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ˜xn(t)= xn(t)−βn× yMn(t)+βnטyMn(t) obtained by subtracting a value βn× yMn(t) obtained by multiplying the n-th channel separation combination weight βn by the sample value  yMn(t) of the n-th channel upmixed common signal  YMn from a sample value  xn(t) of the n-th channel decoded sound signal  Xn and adding a value βnטyMn(t) obtained by multiplying the n-th channel separation combination weight βn by a sample value ˜yMn(t) of the n-th channel purified upmixed signal ˜YMn, as the n-th channel purified decoded sound signal ˜Xn,
the inter-channel relationship information includes information indicating a number of samples |τ| corresponding to a time difference between channels of a first channel and a second channel, information indicating which of the first channel and the second channel is preceding, and an inter-channel correlation coefficient γ that is a correlation coefficient between a first channel decoded sound signal and a second channel decoded sound signal, and
the decoded sound common signal upmixing step
uses the decoded sound common signal without change as a temporary first channel upmixed common signal Y′M1 and uses a signal obtained by delaying the decoded sound common signal by |τ| samples as a temporary second channel upmixed common signal Y′M2 in a case where the first channel is preceding,
uses a signal obtained by delaying the decoded sound common signal by |τ| samples as a temporary first channel upmixed common signal Y′M1 and uses the decoded sound common signal without change as a temporary second channel upmixed common signal Y′M2 in a case where the second channel is preceding, and
obtains, with respect to the each channel n, a sequence based on  yMn(t)=(1−γ)× xn(t)+γ×y′Mn(t) based on a sample value y′Mn(t) of the temporary n-th channel upmixed common signal Y′Mn, a sample value  xn(t) of the n-th channel decoded sound signal  Xn, and the inter-channel correlation coefficient γ as the n-th channel upmixed common signal  YMn.
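
The per-frame arithmetic recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: all function and parameter names are hypothetical, the delay zero-fills the start of the frame instead of carrying samples over from the previous frame, and the "normalized inner product" is assumed to be the inner product of Xn and YMn divided by the energy of YMn. The decoded sound common signal, the upmixed monaural decoded sound signal, and the purification weight αMn are taken as given inputs, since claim 1 does not fix how they are computed.

```python
from typing import List, Sequence


def delay(signal: Sequence[float], num_samples: int) -> List[float]:
    """Delay one frame by num_samples (assumption: zero-fill, no inter-frame history)."""
    n = len(signal)
    if num_samples <= 0:
        return list(signal)
    return [0.0] * min(num_samples, n) + list(signal[: max(n - num_samples, 0)])


def upmix_common(y_m: Sequence[float], x: Sequence[Sequence[float]],
                 tau_abs: int, first_precedes: bool, gamma: float) -> List[List[float]]:
    """Decoded sound common signal upmixing step for the two stereo channels."""
    # Temporary upmixed signals: the lagging channel gets the delayed copy.
    temp = [list(y_m), delay(y_m, tau_abs)] if first_precedes \
        else [delay(y_m, tau_abs), list(y_m)]
    # yMn(t) = (1 - gamma) * xn(t) + gamma * y'Mn(t)
    return [[(1.0 - gamma) * xn_t + gamma * yp_t for xn_t, yp_t in zip(x[n], temp[n])]
            for n in range(2)]


def purify_upmixed(y_mn: Sequence[float], x_mn: Sequence[float], alpha: float) -> List[float]:
    """n-th channel signal purification step: (1 - alpha) * yMn(t) + alpha * xMn(t)."""
    return [(1.0 - alpha) * y + alpha * xm for y, xm in zip(y_mn, x_mn)]


def separation_weight(x_n: Sequence[float], y_mn: Sequence[float]) -> float:
    """Separation combination weight (assumed normalization: energy of yMn)."""
    energy = sum(y * y for y in y_mn)
    if energy == 0.0:
        return 0.0  # assumption: no common component to separate
    return sum(xv * y for xv, y in zip(x_n, y_mn)) / energy


def separate_combine(x_n: Sequence[float], y_mn: Sequence[float],
                     y_mn_purified: Sequence[float], beta: float) -> List[float]:
    """Separation combination step: xn(t) - beta * yMn(t) + beta * purified yMn(t)."""
    return [xv - beta * y + beta * yp for xv, y, yp in zip(x_n, y_mn, y_mn_purified)]
```

For one channel n, the functions would be chained in the order of the claim: `upmix_common` yields YMn, `purify_upmixed` yields ˜YMn, `separation_weight` yields βn, and `separate_combine` yields ˜Xn. With αMn = 0 the purified upmixed signal equals YMn, so the output reduces to the decoded signal Xn unchanged.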