US 12,444,424 B2
Sound data processing device and sound data processing method
Shota Ohkubo, Kanagawa (JP); Hirofumi Inoue, Kanagawa (JP); Masaki Okamoto, Kanagawa (JP); Jo Nishiyama, Kanagawa (JP); Jun Kasai, Kanagawa (JP); Takehito Teraguchi, Kanagawa (JP); Yu Shikoda, Kanagawa (JP); and Fangge Chen, Kanagawa (JP)
Assigned to Nissan Motor Co., Ltd., Kanagawa (JP)
Appl. No. 17/907,037
Filed by Nissan Motor Co., Ltd., Kanagawa (JP)
PCT Filed Mar. 25, 2020, PCT No. PCT/IB2020/000323 § 371(c)(1), (2) Date Sep. 22, 2022, PCT Pub. No. WO2021/191651, PCT Pub. Date Sep. 30, 2021.
Prior Publication US 2023/0121586 A1, Apr. 20, 2023
Int. Cl. G10L 17/22 (2013.01); G06F 3/01 (2006.01); G06V 20/59 (2022.01); G06V 40/20 (2022.01)

CPC G10L 17/22 (2013.01) [G06F 3/013 (2013.01); G06V 20/59 (2022.01); G06V 40/28 (2022.01)]

13 Claims

1. A sound data processing device comprising:

a sound collecting device, an imaging device for capturing a cabin of a vehicle, a speaker, a database, an output device, a sound data processing device, and a control device,

the control device being configured to:

acquire first sound data in a cabin of a vehicle from the sound collecting device;

acquire position information of a sound source, including installation position of the speaker installed in the cabin and position of an occupant in a state of sitting in a seat;

specify an attention object that is an object to which an occupant (U) of the vehicle directs attention and the sound source corresponding to the attention object, based on a captured image captured by the imaging device and the position information of the sound source, wherein a first occupant is specified as the sound source when a conversation is held between the occupant (U) and the first occupant and the speaker is specified as the sound source when the occupant (U) is gazing at a device in the cabin;

recognize a second occupant conversing with the first occupant as a related object in the cabin in which the occupant (U), the first occupant, and the second occupant are conversing;

generate second sound data from the first sound data by emphasizing sound related to the attention object and the related object without changing a number of sound sources of sound heard by the occupant and a sound position at which a sound image is localized with reference to the occupant; and

output the second sound data from the speaker to the occupant.
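
The control-device steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names SoundSource, specify_attention_source, and emphasize, the linear gain model, the boost factor, and the coordinates are all hypothetical. The rule that a conversation takes priority over gazing at a device is also an assumption, since the claim does not state an order. The sketch only shows the claimed constraint that emphasis changes neither the number of sound sources nor the position at which each sound image is localized.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class SoundSource:
    """Hypothetical model of one sound source in the cabin."""
    name: str                      # e.g. "first_occupant", "speaker"
    position: Tuple[float, float]  # localized position relative to occupant U (assumed 2-D)
    gain: float = 1.0              # linear playback gain (assumed emphasis model)


def specify_attention_source(conversing_with: Optional[str],
                             gazing_at_device: bool) -> Optional[str]:
    """Pick the sound source that corresponds to the attention object.

    Assumption: a conversation takes priority over gazing at a device;
    the claim does not define a priority between the two conditions.
    """
    if conversing_with is not None:
        return conversing_with     # the conversation partner is the sound source
    if gazing_at_device:
        return "speaker"           # the in-cabin speaker is the sound source
    return None                    # no attention object identified


def emphasize(sources: List[SoundSource],
              targets: Set[str],
              boost: float = 2.0) -> List[SoundSource]:
    """Return new sources with the targets' gain raised.

    Every source is kept and no position is modified, so the source count
    and the sound image localization stay as they were.
    """
    return [replace(s, gain=s.gain * boost) if s.name in targets else s
            for s in sources]


# First sound data, modeled as three localized sources (illustrative values).
first_sound = [
    SoundSource("first_occupant", (0.5, 0.0)),
    SoundSource("second_occupant", (0.5, -1.0)),
    SoundSource("speaker", (0.0, 1.0)),
]

# Occupant U converses with the first occupant; the second occupant, who is
# conversing with the first occupant, is treated as the related object.
attention = specify_attention_source("first_occupant", gazing_at_device=False)
second_sound = emphasize(first_sound, {attention, "second_occupant"})
```

In this sketch second_sound holds the same three sources at the same positions as first_sound, with only the gains of the attention object and the related object raised.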