US 12,148,432 B2
Signal processing device, signal processing method, and signal processing system
Atsuo Hiroe, Tokyo (JP)
Assigned to SONY GROUP CORPORATION, Tokyo (JP)
Appl. No. 17/756,874
Filed by SONY GROUP CORPORATION, Tokyo (JP)
PCT Filed Dec. 10, 2020, PCT No. PCT/JP2020/046023
§ 371(c)(1), (2) Date Jun. 3, 2022,
PCT Pub. No. WO2021/125037, PCT Pub. Date Jun. 24, 2021.
Claims priority of application No. 2019-227192 (JP), filed on Dec. 17, 2019.
Prior Publication US 2023/0005488 A1, Jan. 5, 2023
Int. Cl. G10L 25/78 (2013.01); G10L 17/18 (2013.01); G10L 21/034 (2013.01); H04R 1/40 (2006.01); H04R 3/00 (2006.01)
CPC G10L 17/18 (2013.01) [G10L 21/034 (2013.01); G10L 25/78 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01)]
12 Claims
1. A signal processing device, comprising:
a main speech detection unit configured to:
receive a first signal from a first sound collection device, a second signal from a second sound collection device, and a third signal from a third sound collection device, wherein
the first sound collection device is associated with a first speaker,
the second sound collection device is associated with a second speaker, and
the third sound collection device is associated with a third speaker;
input, to a first neural network, the first signal and the second signal to obtain first information;
input, to a second neural network, the first signal and the third signal to obtain second information;
input, to a third neural network, the second signal and the third signal to obtain third information;
detect, by integration of the first information and the second information, a presence or an absence of a main speech of the first speaker in the first signal;
output first frame information indicating the presence or the absence of the main speech of the first speaker in the first signal;
detect, by integration of the first information and the third information, a presence or an absence of a main speech of the second speaker in the second signal;
output second frame information indicating the presence or the absence of the main speech of the second speaker in the second signal;
detect, by integration of the second information and the third information, a presence or an absence of a main speech of the third speaker in the third signal; and
output third frame information indicating the presence or the absence of the main speech of the third speaker in the third signal.
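
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify the network architecture, the form of the "information" each network returns, or the integration rule, so everything concrete below is an assumption made for illustration: each pairwise neural network is replaced by a hypothetical frame-energy comparison (`energy_pair_model`), each network is assumed to return one pair of booleans per frame, and "integration" is assumed to be a logical AND of the two pairwise results that involve a given signal. All function and type names are hypothetical.

```python
from typing import Callable, List, Tuple

Frames = List[List[float]]          # one inner list of samples per frame
PairInfo = List[Tuple[bool, bool]]  # per frame: (main speech in a?, main speech in b?)
PairModel = Callable[[Frames, Frames], PairInfo]


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def energy_pair_model(sig_a: Frames, sig_b: Frames, floor: float = 1e-3) -> PairInfo:
    """Hypothetical stand-in for one pairwise neural network.

    Assumption: a signal carries main speech in a frame when its energy
    exceeds a noise floor and exceeds the energy of the other signal.
    """
    info = []
    for fa, fb in zip(sig_a, sig_b):
        ea, eb = frame_energy(fa), frame_energy(fb)
        info.append((ea > floor and ea > eb, eb > floor and eb > ea))
    return info


def detect_main_speech(
    sig1: Frames,
    sig2: Frames,
    sig3: Frames,
    net12: PairModel = energy_pair_model,
    net13: PairModel = energy_pair_model,
    net23: PairModel = energy_pair_model,
) -> Tuple[List[bool], List[bool], List[bool]]:
    """Return per-frame presence flags for speakers 1, 2 and 3."""
    first = net12(sig1, sig2)   # first information  (signals 1 and 2)
    second = net13(sig1, sig3)  # second information (signals 1 and 3)
    third = net23(sig2, sig3)   # third information  (signals 2 and 3)

    # Assumed integration rule: both pairwise results must agree (logical AND).
    frames1 = [f[0] and s[0] for f, s in zip(first, second)]
    frames2 = [f[1] and t[0] for f, t in zip(first, third)]
    frames3 = [s[1] and t[1] for s, t in zip(second, third)]
    return frames1, frames2, frames3
```

Under these assumptions, a frame in which one microphone is loud and the other two pick up only weaker crosstalk is flagged for that microphone's speaker alone, and an all-silent frame is flagged for no one. Trained models could be passed through the `net12`, `net13` and `net23` parameters in place of the energy heuristic, provided they return the same per-frame pair of flags.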