US 11,989,976 B2
Nonverbal information generation apparatus, nonverbal information generation model learning apparatus, methods, and programs
Ryo Ishii, Tokyo (JP); Ryuichiro Higashinaka, Tokyo (JP); Taichi Katayama, Tokyo (JP); Junji Tomita, Tokyo (JP); Nozomi Kobayashi, Tokyo (JP); and Kyosuke Nishida, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 16/969,765
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Feb. 15, 2019; PCT No. PCT/JP2019/005639; § 371(c)(1), (2) Date Aug. 13, 2020; PCT Pub. No. WO2019/160100; PCT Pub. Date Aug. 22, 2019.
Claims priority of application No. 2018-026516 (JP), filed on Feb. 16, 2018; application No. 2018-026517 (JP), filed on Feb. 16, 2018; application No. 2018-097338 (JP), filed on May 21, 2018; application No. 2018-097339 (JP), filed on May 21, 2018; and application No. 2018-230310 (JP), filed on Dec. 7, 2018.
Prior Publication US 2020/0401794 A1, Dec. 24, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 40/20 (2022.01); G06N 20/00 (2019.01); G06V 40/10 (2022.01); G10L 15/22 (2006.01)

CPC G06V 40/28 (2022.01) [G06N 20/00 (2019.01); G06V 40/10 (2022.01); G10L 15/22 (2013.01); G10L 2015/225 (2013.01)]

18 Claims

12. A nonverbal information generation model learning apparatus comprising:

a hardware processor that:

acquires voice information corresponding to voice of a speaker and time information representing times of predetermined units when the voice information is emitted;

acquires nonverbal information representing information relating to behavior of the speaker when the speaker performed speaking corresponding to the voice and time information representing times at which the behavior was performed and corresponding to the nonverbal information, and creates time-information-stamped nonverbal information;

extracts time-information-stamped voice feature quantities representing feature quantities of the voice information from the acquired voice information and the time information corresponding to the voice information; and

learns a nonverbal information generation model for generating the acquired time-information-stamped nonverbal information on the basis of the extracted time-information-stamped voice feature quantities.
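
For illustration only, the four steps recited in claim 12 (acquiring time-stamped voice, acquiring time-stamped nonverbal information, extracting time-stamped voice feature quantities, and learning a model) might be sketched as below. Everything in the sketch is an assumption and is not taken from the claim: the names `TimedItem`, `extract_features`, `align`, `learn_centroids`, and `predict` are hypothetical, the two toy features (mean absolute amplitude and zero-crossing rate) merely stand in for the unspecified voice feature quantities, and a nearest-centroid classifier is swapped in for the unspecified learning method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class TimedItem:
    """A value stamped with the time span (in seconds) it covers. Hypothetical type."""
    start: float
    end: float
    value: Any


def extract_features(units: Sequence[TimedItem]) -> List[TimedItem]:
    """Turn each timed voice unit (a list of samples) into a timed feature vector.

    Assumption: the features are mean absolute amplitude and zero-crossing rate.
    """
    out = []
    for u in units:
        s = u.value
        energy = sum(abs(x) for x in s) / max(len(s), 1)
        crossings = sum(1 for a, b in zip(s, s[1:]) if (a < 0) != (b < 0))
        zcr = crossings / max(len(s) - 1, 1)
        out.append(TimedItem(u.start, u.end, (energy, zcr)))
    return out


def overlap(a: TimedItem, b: TimedItem) -> float:
    """Length in seconds of the intersection of two time spans (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align(features: Sequence[TimedItem], behaviors: Sequence[TimedItem],
          default: str = "none") -> List[Tuple[Tuple[float, ...], str]]:
    """Pair each feature vector with the behavior label that overlaps it most in time."""
    pairs = []
    for f in features:
        best = max(behaviors, key=lambda b: overlap(f, b), default=None)
        label = best.value if best is not None and overlap(f, best) > 0 else default
        pairs.append((f.value, label))
    return pairs


def learn_centroids(pairs) -> Dict[str, Tuple[float, ...]]:
    """Learn one mean feature vector per behavior label (nearest-centroid model)."""
    grouped = defaultdict(list)
    for feat, label in pairs:
        grouped[label].append(feat)
    return {label: tuple(sum(col) / len(col) for col in zip(*feats))
            for label, feats in grouped.items()}


def predict(model: Dict[str, Tuple[float, ...]], feature: Sequence[float]) -> str:
    """Return the label whose centroid is closest (squared Euclidean) to the feature."""
    return min(model, key=lambda lb: sum((c - x) ** 2 for c, x in zip(model[lb], feature)))


if __name__ == "__main__":
    # Made-up data: two one-second voice units and one annotated nod.
    voice_units = [TimedItem(0.0, 1.0, [0.9, -0.8, 0.9, -0.7]),
                   TimedItem(1.0, 2.0, [0.1, 0.1, 0.05, 0.1])]
    behaviors = [TimedItem(0.2, 0.9, "nod")]
    model = learn_centroids(align(extract_features(voice_units), behaviors))
    print(predict(model, (0.8, 0.9)))
```

The time stamps are what connect the two data streams in this sketch: a behavior label is attached to a voice unit only when their time spans overlap, which is one simple way to read the claim's pairing of time-information-stamped feature quantities with time-information-stamped nonverbal information.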