US 12,456,300 B2
Detection of titles in presentation slides of a communication session
Renjie Tao, Sunnyvale, CA (US); and Ling Tsou, Lawndale, CA (US)
Assigned to Zoom Communications, Inc., San Jose, CA (US)
Filed by Zoom Communications, Inc., San Jose, CA (US)
Filed on Jun. 4, 2022, as Appl. No. 17/832,637.
Prior Publication US 2023/0394827 A1, Dec. 7, 2023
Int. Cl. G06V 20/40 (2022.01); G06V 40/12 (2022.01)

CPC G06V 20/46 (2022.01) [G06V 40/1347 (2022.01)]

20 Claims

1. A method, comprising:

receiving video content of a communication session comprising a plurality of participants;

extracting frames from the video content;

classifying the frames of the video content;

identifying one or more distinguishing frames comprising a presentation slide, wherein identifying the one or more distinguishing frames comprises:

removing a thumbnail of a participant video feed from each of the frames;

inverting colors of each of the frames to obtain inverted frames to enhance text detection speed;

obtaining a pixel value summation for each of the inverted frames;

calculating a pixel value summation difference between adjacent inverted frames; and

determining a distinguishing frame when the pixel value summation difference meets a threshold;

for each distinguishing frame comprising the presentation slide, detecting a title within the frame, wherein detecting the title within the frame is based on one or more title detection rules, wherein one or more candidate titles are determined prior to determining the title, wherein one of the title detection rules comprises determining that a number of candidate titles determined for the frame does not exceed a threshold number of candidate titles, and wherein one of the title detection rules comprises determining that a font size for the title meets or exceeds a threshold ratio of font size relative to other text within the frame;

formatting, by a processing engine, the titles for each distinguishing frame into a JavaScript Object Notation (JSON) file format; and

transmitting, by the processing engine, to one or more client devices, the titles for each of the distinguishing frames, which are extracted, via optical character recognition (OCR) technology, from the distinguishing frames comprising the presentation slide, in the JSON file format.
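
The frame-differencing, title-rule, and JSON-formatting steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each frame is a grayscale grid of 0-255 integers, that the pixel value summation difference "meets" the threshold when its absolute value is at least the threshold, and that candidate titles are the text boxes sharing the largest font size. The names `TextBox`, `distinguishing_indices`, `detect_title`, and `titles_to_json`, the default values of `max_candidates` and `min_ratio`, and the JSON field names are all hypothetical. Frame classification, thumbnail removal, OCR, and transmission are omitted.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

Frame = List[List[int]]  # assumed representation: grayscale pixels, 0-255


def invert(frame: Frame) -> Frame:
    """Invert the colors of a grayscale frame."""
    return [[255 - p for p in row] for row in frame]


def pixel_sum(frame: Frame) -> int:
    """Sum every pixel value in the frame."""
    return sum(sum(row) for row in frame)


def distinguishing_indices(frames: List[Frame], threshold: int) -> List[int]:
    """Return indices of frames whose inverted pixel sum differs from the
    previous frame's by at least `threshold` (assumed meaning of "meets")."""
    sums = [pixel_sum(invert(f)) for f in frames]
    return [i for i in range(1, len(sums))
            if abs(sums[i] - sums[i - 1]) >= threshold]


@dataclass
class TextBox:
    """Hypothetical OCR result: recognized text and its estimated font size."""
    text: str
    font_size: float


def detect_title(boxes: List[TextBox], max_candidates: int = 2,
                 min_ratio: float = 1.5) -> Optional[str]:
    """Apply the two title detection rules; return None if either fails."""
    if not boxes:
        return None
    largest = max(b.font_size for b in boxes)
    # Assumption: candidates are the boxes that share the largest font size.
    candidates = [b for b in boxes if b.font_size == largest]
    others = [b.font_size for b in boxes if b.font_size < largest]
    # Rule 1: the number of candidates must not exceed the threshold number.
    if len(candidates) > max_candidates:
        return None
    # Rule 2: the title font must meet a threshold ratio relative to other text.
    if others and largest / max(others) < min_ratio:
        return None
    return candidates[0].text


def titles_to_json(titles_by_frame: Dict[int, str]) -> str:
    """Serialize frame-index/title pairs as a JSON array (assumed schema)."""
    return json.dumps([{"frame": i, "title": t}
                       for i, t in sorted(titles_by_frame.items())])
```

In this sketch, a sequence of two blank frames followed by a visually different frame flags only the third frame, and a slide whose largest text is 40 pt against 18 pt body text passes both title rules. A production system would more likely operate on image arrays and real OCR bounding boxes than on nested lists.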