US 12,444,221 B2
Extraction of textual content from video of a communication session
Renjie Tao, Sunnyvale, CA (US); and Ling Tsou, Lawndale, CA (US)
Assigned to Zoom Communications, Inc., San Jose, CA (US)
Filed by Zoom Communications, Inc., San Jose, CA (US)
Filed on Jun. 4, 2022, as Appl. No. 17/832,635.
Prior Publication US 2023/0394861 A1, Dec. 7, 2023
Int. Cl. G06V 30/19 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 20/62 (2022.01); G06V 30/14 (2022.01); G06V 30/148 (2022.01)

CPC G06V 30/19173 (2022.01) [G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/62 (2022.01); G06V 30/1448 (2022.01); G06V 30/15 (2022.01)]

20 Claims

1. A method, comprising:

receiving video content of a communication session comprising a plurality of participants;

extracting frames from the video content;

classifying the frames of the video content into categories including black frames that are devoid of content, slide frames that include presentation slides, and demo frames that include a demonstration;

identifying one or more distinguishing frames comprising text;

for each distinguishing frame comprising text:

detecting a title within the frame,

cropping a title area with the title within the frame, and

extracting, via optical character recognition (OCR), the title from the cropped title area of the frame;

extracting, via OCR, textual content from the distinguishing frames comprising text; and

transmitting, to one or more client devices, the extracted textual content and the extracted titles.
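
The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the frame representation (a grid of grayscale pixel values), the brightness threshold, the rule that a "distinguishing" frame is one that differs from the previously kept frame, and the names `is_black`, `crop`, and `extract_session_text` are all assumptions made for illustration. The frame classifier, title detector, and OCR engine are passed in as hypothetical callables rather than tied to any particular library, and the transmitting step is represented only by the returned payload.

```python
from typing import Callable, Dict, List, Optional, Tuple

Frame = List[List[int]]            # assumed: rows of grayscale pixels, 0-255
Box = Tuple[int, int, int, int]    # assumed: (top, left, bottom, right)


def is_black(frame: Frame, threshold: float = 10.0) -> bool:
    """Treat a frame as 'black' when its mean intensity is below a threshold."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        return True
    return sum(pixels) / len(pixels) < threshold


def crop(frame: Frame, box: Box) -> Frame:
    """Return the sub-image covered by box."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def extract_session_text(
    frames: List[Frame],
    classify: Callable[[Frame], str],               # hypothetical: "slide" or "demo"
    detect_title: Callable[[Frame], Optional[Box]],  # hypothetical title detector
    ocr: Callable[[Frame], str],                     # hypothetical OCR engine
) -> List[Dict[str, Optional[str]]]:
    """Classify frames, keep distinguishing frames with text, OCR title and body."""
    payload: List[Dict[str, Optional[str]]] = []
    previous: Optional[Frame] = None
    for frame in frames:
        # Classify into black / slide / demo; black frames carry no content.
        category = "black" if is_black(frame) else classify(frame)
        if category == "black":
            continue
        # Assumed notion of "distinguishing": differs from the last kept frame.
        if frame == previous:
            continue
        previous = frame
        text = ocr(frame)
        if not text.strip():
            continue  # keep only frames comprising text
        # Detect the title, crop its area, and OCR the cropped area.
        box = detect_title(frame)
        title = ocr(crop(frame, box)) if box is not None else None
        payload.append({"category": category, "title": title, "text": text})
    # The caller would transmit this payload to the client devices.
    return payload
```

In practice the three callables would be backed by trained models and a real OCR engine; the sketch only fixes the order of operations described in the claim.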