US 12,148,432 B2
Signal processing device, signal processing method, and signal processing system
Atsuo Hiroe, Tokyo (JP)
Assigned to SONY GROUP CORPORATION, Tokyo (JP)
Appl. No. 17/756,874
Filed by SONY GROUP CORPORATION, Tokyo (JP)
PCT Filed Dec. 10, 2020, PCT No. PCT/JP2020/046023 § 371(c)(1), (2) Date Jun. 3, 2022, PCT Pub. No. WO2021/125037, PCT Pub. Date Jun. 24, 2021.
Claims priority of application No. 2019-227192 (JP), filed on Dec. 17, 2019.
Prior Publication US 2023/0005488 A1, Jan. 5, 2023
Int. Cl. G10L 25/78 (2013.01); G10L 17/18 (2013.01); G10L 21/034 (2013.01); H04R 1/40 (2006.01); H04R 3/00 (2006.01)

CPC G10L 17/18 (2013.01) [G10L 21/034 (2013.01); G10L 25/78 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01)]

12 Claims

1. A signal processing device, comprising:

a main speech detection unit configured to:

receive a first signal from a first sound collection device, a second signal from a second sound collection device, and a third signal from a third sound collection device, wherein

the first sound collection device is associated with a first speaker,

the second sound collection device is associated with a second speaker, and

the third sound collection device is associated with a third speaker;

input, to a first neural network, the first signal and the second signal to obtain first information;

input, to a second neural network, the first signal and the third signal to obtain second information;

input, to a third neural network, the second signal and the third signal to obtain third information;

detect, by integration of the first information and the second information, a presence or an absence of a main speech of the first speaker in the first signal;

output first frame information indicating the presence or the absence of the main speech of the first speaker in the first signal;

detect, by integration of the first information and the third information, a presence or an absence of a main speech of the second speaker in the second signal;

output second frame information indicating the presence or the absence of the main speech of the second speaker in the second signal;

detect, by integration of the second information and the third information, a presence or an absence of a main speech of the third speaker in the third signal; and

output third frame information indicating the presence or the absence of the main speech of the third speaker in the third signal.
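Claim 1 fixes the topology of the detector. There is one neural network per microphone pair, which gives three pairs for three microphones. Each speaker's decision integrates the two pairwise outputs whose pairs include that speaker's microphone. The claim does not state what is inside each network or which integration rule is used. The Python sketch illustrates only the topology and rests on two loudly labeled assumptions. First, `pair_model` is a hypothetical frame-level-difference heuristic that stands in for a trained pairwise network. Second, integration takes the minimum of the pairwise scores, so a microphone must "win" against every other microphone. Neither the heuristic nor the minimum rule is taken from the patent.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# One signal = a list of frames; each frame = a list of samples.
Signal = Sequence[Sequence[float]]


def frame_level_db(frame: Sequence[float]) -> float:
    """Mean-square level of one frame in dB, floored at -120 dB to avoid log(0)."""
    power = sum(x * x for x in frame) / max(len(frame), 1)
    return 10.0 * math.log10(max(power, 1e-12))


def pair_model(sig_a: Signal, sig_b: Signal,
               silence_db: float = -40.0) -> List[Tuple[float, float]]:
    """HYPOTHETICAL stand-in for one pairwise neural network (not from the patent).

    Returns, per frame, (score_a, score_b): how strongly the frame looks like
    main speech on mic a (resp. mic b) rather than crosstalk from the other mic.
    """
    out = []
    for fa, fb in zip(sig_a, sig_b):
        la, lb = frame_level_db(fa), frame_level_db(fb)
        # Logistic of the clamped level difference: the closer mic is louder.
        d = max(-50.0, min(50.0, la - lb))
        sa = 1.0 / (1.0 + math.exp(-d))
        sb = 1.0 - sa
        # A frame that is silent on a mic cannot hold that mic's main speech.
        out.append((sa if la > silence_db else 0.0,
                    sb if lb > silence_db else 0.0))
    return out


def detect_main_speech(signals: Sequence[Signal],
                       threshold: float = 0.5) -> List[List[bool]]:
    """Per-mic frame flags: is the main speech of that mic's own speaker present?"""
    n = len(signals)
    # One pairwise model per unordered mic pair; for 3 mics: (0,1), (0,2), (1,2).
    pair_info: Dict[Tuple[int, int], List[Tuple[float, float]]] = {
        (i, j): pair_model(signals[i], signals[j])
        for i, j in combinations(range(n), 2)
    }
    num_frames = min(len(s) for s in signals)
    flags = []
    for k in range(n):
        per_frame = []
        for t in range(num_frames):
            # Collect mic k's score from every pair that involves mic k.
            scores = [info[t][0] if i == k else info[t][1]
                      for (i, j), info in pair_info.items() if k in (i, j)]
            # ASSUMED integration rule: minimum over pairs, then threshold.
            per_frame.append(min(scores) > threshold)
        flags.append(per_frame)
    return flags


if __name__ == "__main__":
    def frame(amp: float, n: int = 4) -> List[float]:
        return [amp] * n

    # Frame 0: speaker 0 talks; frame 1: speaker 1 talks; frame 2: silence.
    mic0 = [frame(1.0), frame(0.1), frame(0.0)]
    mic1 = [frame(0.1), frame(1.0), frame(0.0)]
    mic2 = [frame(0.1), frame(0.1), frame(0.0)]
    print(detect_main_speech([mic0, mic1, mic2]))
    # [[True, False, False], [False, True, False], [False, False, False]]
```

Building the pairs with `itertools.combinations` extends the same pairing to N microphones. For N = 3 it yields exactly the three pairs recited in the claim. Each speaker's decision then draws on the N - 1 pairs that include that speaker's microphone. A level-difference heuristic like this one can misjudge frames in which two speakers talk at once at similar levels. That case is where a trained pairwise network would be expected to do better.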