US 11,922,933 B2
Voice processing device and voice processing method
Tetsuto Kawai, Aichi (JP)
Assigned to YAMAHA CORPORATION, Hamamatsu (JP)
Filed by YAMAHA CORPORATION, Hamamatsu (JP)
Filed on Jun. 2, 2020, as Appl. No. 16/889,965.
Claims priority of application No. 2019-106859 (JP), filed on Jun. 7, 2019.
Prior Publication US 2020/0388275 A1, Dec. 10, 2020
Int. Cl. G10L 25/78 (2013.01); G06F 17/18 (2006.01); G10L 15/16 (2006.01); G10L 15/20 (2006.01); G10L 25/24 (2013.01)

CPC G10L 15/20 (2013.01) [G06F 17/18 (2013.01); G10L 15/16 (2013.01); G10L 25/24 (2013.01); G10L 25/78 (2013.01)]

12 Claims

1. A voice processing method comprising:

collecting a first audio signal collected by a first microphone from a sound source and a second audio signal collected by a second microphone from the sound source;

estimating probability of the first audio signal including a person's voice;

setting a probability value to:

a first probability value indicative of the first audio signal including a person's voice; and

a second probability value of zero indicative of the first audio signal not including a person's voice;

estimating a correlation value of the first audio signal and the second audio signal;

obtaining a time difference between waveforms of the first audio signal and the second audio signal based on the estimated correlation value;

determining that the first audio signal:

includes a person's voice, in a state where the time difference is greater than a predetermined value; and

does not include a person's voice, in a state where the time difference is not greater than the predetermined value;

determining a gain of the first audio signal to be:

from among a range of greater than zero and less than one, in a state where the first probability value is set and the time difference is greater than the predetermined value; and

zero, in a state where the time difference is not greater than the predetermined value;

processing the first audio signal based on the determined gain of the first audio signal to improve an audio quality at a far-end side; and

sending the processed audio signal to the far-end side, where a voice processing device located at the far-end side reproduces the received processed audio signal to emit sound from a speaker.
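
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It is one possible reading under stated assumptions: the function names, the brute-force cross-correlation, the sign convention for the lag, the `eps` clamp that keeps the gain strictly between zero and one, and the choice of zero gain when the probability value is zero are all hypothetical and do not appear in the claim.

```python
from typing import List, Sequence


def estimate_time_difference(first: Sequence[float],
                             second: Sequence[float],
                             max_lag: int) -> int:
    """Return the lag (in samples) at which the cross-correlation peaks.

    Assumed convention: a positive lag means the second signal is
    delayed relative to the first signal.
    """
    n = min(len(first), len(second))
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += first[i] * second[j]
        if total > best_value:
            best_lag, best_value = lag, total
    return best_lag


def determine_gain(voice_probability: float,
                   time_difference: float,
                   threshold: float,
                   eps: float = 0.01) -> float:
    """Gain rule following the structure of claim 1.

    Zero when the time difference is not greater than the threshold.
    Otherwise, when a non-zero probability value is set, a value
    strictly between zero and one (here: the probability clamped to
    [eps, 1 - eps], which is an assumption of this sketch).
    """
    if time_difference <= threshold:
        return 0.0
    if voice_probability <= 0.0:
        # The claim does not state a gain for this case; zero is assumed.
        return 0.0
    return min(max(voice_probability, eps), 1.0 - eps)


def apply_gain(signal: Sequence[float], gain: float) -> List[float]:
    """Scale every sample of the first audio signal by the gain."""
    return [sample * gain for sample in signal]


if __name__ == "__main__":
    # Toy signals: an impulse that reaches the second microphone 3 samples later.
    first = [0.0] * 16
    second = [0.0] * 16
    first[5] = 1.0
    second[8] = 1.0
    lag = estimate_time_difference(first, second, max_lag=4)
    gain = determine_gain(voice_probability=0.8, time_difference=lag, threshold=1)
    processed = apply_gain(first, gain)
    print(lag, gain, processed[5])
```

In this sketch the processed signal would then be sent to the far-end side. A practical system would more likely compute the correlation per frame with an FFT-based method instead of the quadratic loop used here.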