US 12,328,566 B2
Information processing device, information processing terminal, information processing method, and program
Takuto Onishi, Tokyo (JP); Kazunobu Ookuri, Tokyo (JP); Hiroaki Shinohara, Tokyo (JP); Asako Tomura, Tokyo (JP); and Satsuki Sato, Tokyo (JP)
Assigned to Sony Group Corporation, Tokyo (JP); and Sony Interactive Entertainment Inc., Tokyo (JP)
Appl. No. 18/024,577
Filed by Sony Group Corporation, Tokyo (JP); and Sony Interactive Entertainment Inc., Tokyo (JP)
PCT Filed Sep. 10, 2021, PCT No. PCT/JP2021/033280
§ 371(c)(1), (2) Date Mar. 3, 2023
PCT Pub. No. WO2022/054900, PCT Pub. Date Mar. 17, 2022.
Claims priority of application No. 2020-152419 (JP), filed on Sep. 10, 2020.
Prior Publication US 2023/0362571 A1, Nov. 9, 2023
Int. Cl. H04S 7/00 (2006.01)
CPC H04S 7/302 (2013.01) [H04S 2420/01 (2013.01)] 11 Claims
OG exemplary drawing
 
1. An information processing device comprising:
a storage unit that stores HRTF data corresponding to a plurality of positions based on a listening position;
a sound image localization processing unit that performs a sound image localization process on sound data of an utterer by using the HRTF data according to an utterance situation of a participant participating in a conversation via a network; and
a transmission processing unit that transmits the sound data of the utterer obtained by performing the sound image localization process to a terminal used by each of the participants, each of the participants being a listener, wherein
the sound image localization processing unit performs the sound image localization process using the HRTF data according to a relationship between a position of the listener and a position of the utterer in a virtual space, and
when a localization position of a sound image of an utterance voice, which is a voice of the utterer, is selected based on the utterance situation, performs the sound image localization process using the HRTF data according to a relationship between the position of the listener and the localization position of the sound image of the utterance voice; and
the sound image localization processing unit selects a localization position of a sound image of each of the utterance voices according to a sound volume of each of the utterance voices as the utterance situation, or the sound image localization processing unit selects a localization position of a sound image of each of the utterance voices according to a content of an utterance as the utterance situation.
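The pipeline recited in the claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the HRIR store is filled with placeholder random filters (a real system would use measured HRTF data), the 30-degree position grid, the coordinate convention, and all function names are assumptions introduced for the example. It shows the three claimed roles: storing HRTF data per position, selecting a localization position from the listener/utterer relationship in the virtual space, and performing sound image localization by convolving the utterer's mono voice into a binaural signal for transmission to a listener's terminal.

```python
import numpy as np

def make_hrir_store(positions, taps=64, seed=0):
    """Placeholder storage unit: a random left/right HRIR pair per
    azimuth in degrees. Real HRTF data would be loaded here."""
    rng = np.random.default_rng(seed)
    return {az: (rng.standard_normal(taps), rng.standard_normal(taps))
            for az in positions}

def select_position(listener_xy, utterer_xy):
    """Azimuth of the utterer relative to the listener in the virtual
    space, snapped to the assumed 30-degree grid of stored positions."""
    dx = utterer_xy[0] - listener_xy[0]
    dy = utterer_xy[1] - listener_xy[1]
    az = np.degrees(np.arctan2(dx, dy)) % 360.0
    return int(round(float(az) / 30) * 30) % 360

def localize(store, voice, azimuth):
    """Sound image localization: convolve the mono voice with the HRIR
    pair for the selected position, yielding a (2, N) stereo signal."""
    h_left, h_right = store[azimuth]
    return np.stack([np.convolve(voice, h_left),
                     np.convolve(voice, h_right)])

store = make_hrir_store(range(0, 360, 30))
voice = np.sin(2 * np.pi * 440 * np.arange(4800) / 48000)  # 0.1 s tone
az = select_position(listener_xy=(0.0, 0.0), utterer_xy=(1.0, 1.0))
binaural = localize(store, voice, az)  # sent to the listener's terminal
```

Selecting the localization position from the utterance situation instead (the claim's sound-volume or utterance-content alternative) would only change `select_position`; the stored filters and the convolution step stay the same.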