US 12,451,148 B2
Sound signal refinement method, sound signal decode method, apparatus thereof, program, and storage medium
Ryosuke Sugiura, Tokyo (JP); Takehiro Moriya, Tokyo (JP); and Yutaka Kamamoto, Tokyo (JP)
Assigned to NTT, Inc., Tokyo (JP)
Appl. No. 18/031,579
Filed by NTT, Inc., Tokyo (JP)
PCT Filed Nov. 5, 2020, PCT No. PCT/JP2020/041400
§ 371(c)(1), (2) Date Apr. 12, 2023,
PCT Pub. No. WO2022/097237, PCT Pub. Date May 12, 2022.
Prior Publication US 2023/0377585 A1, Nov. 23, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 19/008 (2013.01); G10L 21/043 (2013.01)
CPC G10L 19/008 (2013.01) [G10L 21/043 (2013.01)] 14 Claims
1. A sound signal purification method for obtaining a purified decoded sound signal representing a sound signal of a channel of stereo,
the sound signal purification method comprising:
a decoded sound common signal estimation step of obtaining, for each frame, a decoded sound common signal  YM that is a signal common to all channels of the stereo by using at least all of one or more and N or less n-th channel decoded sound signals  Xn, wherein n represents each integer of 1 or more and N or less, the n-th channel decoded sound signal  Xn represents a decoded sound signal of the each channel of the stereo, the n-th channel decoded sound signal  Xn is obtained by decoding a stereo code CS and a monaural decoded sound signal  XM represents a monaural decoded sound signal, the monaural decoded sound signal  XM is obtained by decoding a monaural code CM that is a code different from the stereo code CS, the n-th channel decoded sound signal  Xn is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM;
a decoded sound common signal upmixing step of obtaining, for the each frame, an n-th channel upmixed common signal  YMn, wherein the n-th channel upmixed common signal  YMn is obtained by upmixing the decoded sound common signal  YM for the each channel by an upmixing process using the decoded sound common signal  YM and inter-channel relationship information, and the inter-channel relationship information indicates a relationship between the channels of the stereo;
a monaural decoded sound upmixing step of obtaining, for the each frame, an n-th channel upmixed monaural decoded sound signal  XMn, wherein the n-th channel upmixed monaural decoded sound signal  XMn is obtained by upmixing the monaural decoded sound signal  XM for the each channel by an upmixing process using the monaural decoded sound signal  XM and the inter-channel relationship information;
an n-th channel signal purification step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ˜yMn(t)=(1−αMn)× yMn(t)+αMn× xMn(t) as an n-th channel purified upmixed signal ˜YMn, wherein the value ˜yMn(t) is obtained by adding a value αMn× xMn(t) and a value (1−αMn)× yMn(t), the value αMn× xMn(t) is obtained by multiplying an n-th channel purification weight αMn by a sample value  xMn(t) of the n-th channel upmixed monaural decoded sound signal  XMn, the value (1−αMn)× yMn(t) is obtained by multiplying a value (1−αMn) by a sample value  yMn(t) of the n-th channel upmixed common signal  YMn, and the value (1−αMn) is obtained by subtracting the n-th channel purification weight αMn from 1;
an n-th channel separation combination weight estimation step of obtaining, for the each frame with respect to the each channel n, a normalized inner product value for the n-th channel upmixed common signal  YMn of the n-th channel decoded sound signal  Xn as an n-th channel separation combination weight βn; and
an n-th channel separation combination step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ˜xn(t)= xn(t)−βn× yMn(t)+βnטyMn(t) as an n-th channel purified decoded sound signal ˜Xn, wherein the value ˜xn(t) is obtained by subtracting a value βn× yMn(t) from a sample value  xn(t) of the n-th channel decoded sound signal  Xn and adding a value βnטyMn(t), the value βn× yMn(t) is obtained by multiplying the n-th channel separation combination weight βn by the sample value  yMn(t) of the n-th channel upmixed common signal  YMn, and the value βnטyMn(t) is obtained by multiplying the n-th channel separation combination weight βn by a sample value ˜yMn(t) of the n-th channel purified upmixed signal ˜YMn.
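
The last three steps of claim 1 can be sketched for one frame and one channel n as follows. This is an illustrative sketch only, not the patented implementation. The function name purify_channel and its argument names are hypothetical. The upmixed signals YMn and XMn and the purification weight αMn are taken as given inputs, since the claim does not state how αMn is obtained. The normalized inner product is assumed here to be the inner product of Xn and YMn divided by the energy of YMn, and the zero fallback for a silent frame is likewise an assumption.

```python
def purify_channel(x_n, y_mn, x_mn, alpha_mn):
    """Sketch of the purification and separation combination steps
    for one frame of one channel n (hypothetical helper).

    x_n      -- samples of the n-th channel decoded sound signal Xn
    y_mn     -- samples of the n-th channel upmixed common signal YMn
    x_mn     -- samples of the n-th channel upmixed monaural decoded signal XMn
    alpha_mn -- n-th channel purification weight (given; not derived here)
    """
    # Signal purification step:
    # ~yMn(t) = (1 - alpha) * yMn(t) + alpha * xMn(t)
    y_tilde = [(1.0 - alpha_mn) * y + alpha_mn * xm
               for y, xm in zip(y_mn, x_mn)]

    # Separation combination weight estimation step.
    # ASSUMPTION: "normalized inner product" is taken as
    # sum(xn * yMn) / sum(yMn ** 2); a silent frame falls back to 0.
    energy = sum(y * y for y in y_mn)
    if energy > 0.0:
        beta_n = sum(x * y for x, y in zip(x_n, y_mn)) / energy
    else:
        beta_n = 0.0

    # Separation combination step:
    # ~xn(t) = xn(t) - beta * yMn(t) + beta * ~yMn(t)
    x_tilde = [x - beta_n * y + beta_n * yt
               for x, y, yt in zip(x_n, y_mn, y_tilde)]
    return x_tilde, beta_n
```

Substituting the purification formula into the combination formula gives ˜xn(t) = xn(t) + βn × αMn × (xMn(t) − yMn(t)), so with αMn = 0 the decoded signal Xn passes through unchanged, and a larger αMn moves the common component of Xn toward the upmixed monaural decoded signal.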