US 11,943,074 B2
Real-time video-based audience reaction sentiment analysis
Vi Dinh Chau, Seattle, WA (US)
Assigned to Zoom Video Communications, Inc., San Jose, CA (US)
Filed by Zoom Video Communications, Inc., San Jose, CA (US)
Filed on Oct. 29, 2021, as Appl. No. 17/514,918.
Prior Publication US 2023/0134143 A1, May 4, 2023
Int. Cl. H04L 12/18 (2006.01); G06V 10/50 (2022.01); G06V 40/20 (2022.01)

CPC H04L 12/1827 (2013.01) [G06V 10/507 (2022.01); G06V 40/20 (2022.01); H04L 12/1831 (2013.01)]

19 Claims

1. A method, comprising:

generating, by a transcription engine configured for automatic speech recognition, a real-time transcription of audio data from one or more participants temporally related to video data, wherein participants include a speaker participant and one or more audience participants;

determining, by a processing of the video data using a machine learning system, meanings of reactions of the one or more audience participants to the speaker participant during a video conference based on the real-time transcription of the video conference and the reactions,

wherein the video data visually represents the reactions,

wherein determining the meanings of the reactions based on the real-time transcription of the video conference and the reactions includes determining a context associated with the speaker participant based on the real-time transcription of the video conference, and

wherein the context associated with the speaker participant relates to a purpose of the conference;

determining, by a server during the video conference, sentiment types of the one or more audience participants based on the determined meanings of the reactions, and maintaining a count of the determined sentiment types aggregated as bins in a histogram;

determining, by the server during the video conference, an engagement level based on the most frequent bin in the histogram;

causing, by the server, an outputting of a real-time recommendation based on the engagement level at a device associated with the speaker participant during the video conference; and

providing, by the server using the machine learning system, a real-time recommendation to the speaker participant based on real-time recommendations that were previously effective, which are determined based on an aggregation of engagement levels, real-time recommendations, and speaker participant behaviors over multiple conferences.
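
As an illustration only, the histogram and engagement-level steps recited in claim 1 might be sketched as follows. This is a minimal sketch, not the patented implementation: the class name `SentimentHistogram`, the sentiment labels, and the `ENGAGEMENT_BY_SENTIMENT` mapping are all hypothetical, since the claim does not specify which sentiment types exist or how the most frequent bin maps to an engagement level.

```python
from collections import Counter

# Hypothetical mapping from a sentiment type to an engagement level.
# The claim only states that the engagement level is based on the most
# frequent bin; the labels and levels here are assumptions.
ENGAGEMENT_BY_SENTIMENT = {
    "interested": "high",
    "neutral": "medium",
    "bored": "low",
    "confused": "low",
}


class SentimentHistogram:
    """Keeps a running count of sentiment types, one bin per type."""

    def __init__(self):
        self.bins = Counter()

    def record(self, sentiment_type):
        # Increment the bin for a sentiment type determined for one reaction.
        self.bins[sentiment_type] += 1

    def most_frequent_bin(self):
        # Return the sentiment type with the highest count, or None if no
        # reactions have been recorded. On a tie, the type that was
        # recorded first wins (Counter.most_common keeps insertion order).
        if not self.bins:
            return None
        return self.bins.most_common(1)[0][0]

    def engagement_level(self):
        # Derive an engagement level from the most frequent bin.
        top = self.most_frequent_bin()
        if top is None:
            return "unknown"
        return ENGAGEMENT_BY_SENTIMENT.get(top, "unknown")


hist = SentimentHistogram()
for sentiment in ["bored", "interested", "bored", "neutral"]:
    hist.record(sentiment)

print(hist.most_frequent_bin())
print(hist.engagement_level())
```

In this sketch the counts are updated as each reaction is classified, so the engagement level can be re-read at any point during the conference without reprocessing earlier reactions.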