US 12,444,432 B2
Emotion tag assigning system, method, and program
Mitsuru Sawano, Kanagawa (JP)
Assigned to FUJIFILM Corporation, Tokyo (JP)
Filed by FUJIFILM Corporation, Tokyo (JP)
Filed on Aug. 9, 2022, as Appl. No. 17/883,608.
Claims priority of application No. 2021-130617 (JP), filed on Aug. 10, 2021.
Prior Publication US 2023/0049225 A1, Feb. 16, 2023
Int. Cl. G10L 25/63 (2013.01); G10L 17/04 (2013.01); G10L 25/78 (2013.01)

CPC G10L 25/63 (2013.01) [G10L 17/04 (2013.01); G10L 25/78 (2013.01)]

9 Claims

1. An emotion tag assigning system comprising:

a processor;

a microphone that detects voice data indicating a voice uttered by a person who participates in an event using a content during execution of the event; and

an emotion recognizer that recognizes an emotion of the person based on the voice data, wherein the emotion recognizer is implemented by the processor,

wherein the processor

acquires emotion information indicating the emotion of the person recognized by the emotion recognizer during the execution of the event using the content, and

assigns an emotion rank calculated from the acquired emotion information to the content as an emotion tag,

wherein the emotion recognizer is a recognizer implemented by the processor that is subjected to machine learning using, as training data, a large number of pieces of voice data including voice data of a voice uttered in a case where a person is delighted and voice data of a voice uttered in a case where the person is not delighted,

wherein the content is a plurality of images,

wherein the processor acquires, from the emotion recognizer, a plurality of pieces of emotion information in a time zone in which the plurality of images are reproduced, calculates an emotion rank corresponding to each image from a representative value of the plurality of pieces of emotion information, and assigns the calculated emotion rank to each image as the emotion tag by recording the emotion tag in a header of an image file comprising the plurality of images,

wherein in a case where a plurality of persons participate in the event, the processor specifies one or more main speakers in the time zone in which the plurality of images are reproduced based on the voice data detected by the microphone, and assigns speaker identification information indicating the specified one or more main speakers to each image by recording the speaker identification information in the header of the image file,

wherein the processor displays the emotion rank and the speaker identification information simultaneously with the plurality of images during the reproduction of the plurality of images by the image reproduction device.
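
The per-image tagging recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions the claim does not state: the recognizer is assumed to emit a "delighted" score between 0 and 1 for each utterance, the representative value is taken to be the median, the rank is a five-level scale, and a main speaker is anyone holding at least 40% of the talk time while an image is displayed. All names (EmotionSample, emotion_rank, main_speakers, tag_images) are hypothetical. Writing the resulting tags into the image file header is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class EmotionSample:
    time_s: float      # time at which the utterance was recognized, in seconds
    speaker: str       # hypothetical speaker label
    score: float       # assumed "delighted" score in [0, 1]
    duration_s: float  # length of the utterance, in seconds


def emotion_rank(scores, levels=5):
    """Map the median score (the assumed representative value) to 1..levels."""
    if not scores:
        return None  # no voice data was detected while the image was shown
    representative = median(scores)
    return min(levels, int(representative * levels) + 1)


def main_speakers(samples, share=0.4):
    """Return speakers whose talk time is at least `share` of the total."""
    totals = defaultdict(float)
    for s in samples:
        totals[s.speaker] += s.duration_s
    total = sum(totals.values())
    if total == 0:
        return []
    return sorted(name for name, t in totals.items() if t / total >= share)


def tag_images(display_windows, samples):
    """Build per-image tags from the samples inside each display window.

    display_windows maps an image name to its (start, end) display time.
    """
    tags = {}
    for image, (start, end) in display_windows.items():
        window = [s for s in samples if start <= s.time_s < end]
        tags[image] = {
            "emotion_rank": emotion_rank([s.score for s in window]),
            "main_speakers": main_speakers(window),
        }
    return tags


if __name__ == "__main__":
    windows = {"a.jpg": (0, 10), "b.jpg": (10, 20)}
    samples = [
        EmotionSample(1, "A", 0.90, 3.0),
        EmotionSample(4, "B", 0.70, 1.0),
        EmotionSample(7, "A", 0.75, 2.0),
        EmotionSample(12, "B", 0.10, 4.0),
    ]
    for image, tag in tag_images(windows, samples).items():
        print(image, tag)
```

The median is used here only because it is robust to a single outlying utterance; a mean or maximum would satisfy the same claim language, which requires only "a representative value".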