US 12,131,566 B2
Systems and methods for extracting text from portable document format data
Stephan Schwiebert, Sydney (AU); Velislava Yanchina, Byron Bay (AU); Henrry Eduardo Iguaro Jaramillo, Sydney (AU); and Andrew Bennett, Sydney (AU)
Assigned to CANVA PTY LTD, Surry Hills (AU)
Filed by Canva Pty Ltd, Surry Hills (AU)
Filed on Feb. 28, 2022, as Appl. No. 17/682,402.
Claims priority of application No. 2021201345 (AU), filed on Mar. 2, 2021.
Prior Publication US 2022/0284724 A1, Sep. 8, 2022
Int. Cl. G06V 30/416 (2022.01); G06T 3/60 (2024.01); G06T 11/60 (2006.01); G06V 30/414 (2022.01)
CPC G06V 30/416 (2022.01) [G06T 3/60 (2013.01); G06T 11/60 (2013.01); G06V 30/414 (2022.01); G06T 2210/12 (2013.01)] 15 Claims
 
1. A computer implemented method including:
receiving, at a computer processing system including a processing unit, data for an area of text of a document, the area of text containing a plurality of glyphs, each glyph associated with position information defining a position of the glyph in the document; and
by the processing unit:
grouping the glyphs into a plurality of lines based on the position information, including reverting a rotation parameter of the glyphs in the data to a common rotation and then determining a bounding box for each glyph based on the position information and grouping the glyphs based on positions of the determined bounding boxes;
for each line, determining one or more paragraph attributes, wherein a difference in the one or more paragraph attributes between different lines indicates a likelihood that the different lines are in different paragraphs;
responsive to the determination of the one or more paragraph attributes, grouping the plurality of lines into one or more paragraphs; and
generating an editable document containing text in paragraphs that corresponds to the one or more paragraphs.
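Claim 1 describes a pipeline with four steps. First, each glyph's rotation is reverted to a common rotation. Next, per-glyph bounding boxes are computed and glyphs are grouped into lines. Then per-line paragraph attributes are computed, and finally lines are grouped into paragraphs. The sketch below illustrates one way those steps could fit together. It is not the patentee's implementation, and every name in it (`Glyph`, `revert_rotation`, `group_into_lines`, `line_attributes`, `group_into_paragraphs`, `extract_paragraphs`, `make_glyphs`) is hypothetical.

The sketch rests on several assumptions:

- Glyph positions are top-left origins with y increasing downward.
- Rotation is given in degrees about the page origin, and the "common rotation" is 0.
- The paragraph attributes are the left edge, the median glyph height (a font-size proxy) and the vertical gap to the previous line.
- The "editable document" is reduced to a list of paragraph strings.

```python
import math
from dataclasses import dataclass
from statistics import median

@dataclass
class Glyph:
    char: str
    x: float            # top-left origin of the glyph; y grows downward (assumption)
    y: float
    width: float        # size in the glyph's own (unrotated) frame
    height: float
    rotation: float = 0.0  # degrees, applied about the page origin (assumption)

@dataclass
class LineAttrs:
    text: str
    left: float   # left edge of the line
    top: float
    bottom: float
    size: float   # median glyph height, used as a font-size proxy

def revert_rotation(g: Glyph) -> Glyph:
    """Rotate the glyph origin by -rotation so all glyphs share rotation 0."""
    theta = math.radians(-g.rotation)
    c, s = math.cos(theta), math.sin(theta)
    return Glyph(g.char, g.x * c - g.y * s, g.x * s + g.y * c, g.width, g.height, 0.0)

def bbox(g: Glyph) -> tuple:
    return (g.x, g.y, g.x + g.width, g.y + g.height)

def group_into_lines(glyphs: list) -> list:
    """Group glyphs whose vertical centre falls inside the current line's span."""
    items = [(bbox(revert_rotation(g)), g.char) for g in glyphs]
    items.sort(key=lambda it: (it[0][1] + it[0][3]) / 2)  # sort by vertical centre
    lines = []
    for box, ch in items:
        centre = (box[1] + box[3]) / 2
        if lines:
            top = min(b[1] for b, _ in lines[-1])
            bottom = max(b[3] for b, _ in lines[-1])
            if top <= centre <= bottom:
                lines[-1].append((box, ch))
                continue
        lines.append([(box, ch)])
    # Order glyphs left to right within each line.
    return [sorted(line, key=lambda it: it[0][0]) for line in lines]

def line_attributes(line: list) -> LineAttrs:
    boxes = [b for b, _ in line]
    size = median(b[3] - b[1] for b in boxes)
    parts = [line[0][1]]
    for (prev, _), (box, ch) in zip(line, line[1:]):
        if box[0] - prev[2] > 0.25 * size:  # wide horizontal gap -> word break
            parts.append(" ")
        parts.append(ch)
    return LineAttrs("".join(parts), boxes[0][0],
                     min(b[1] for b in boxes), max(b[3] for b in boxes), size)

def group_into_paragraphs(lines, indent_tol=2.0, size_tol=0.5, gap_factor=0.8):
    """Start a new paragraph when any attribute differs beyond a tolerance."""
    paragraphs, prev = [], None
    for attrs in lines:
        if (prev is None
                or abs(attrs.left - prev.left) > indent_tol
                or abs(attrs.size - prev.size) > size_tol
                or attrs.top - prev.bottom > gap_factor * prev.size):
            paragraphs.append([attrs])
        else:
            paragraphs[-1].append(attrs)
        prev = attrs
    return paragraphs

def extract_paragraphs(glyphs: list) -> list:
    lines = [line_attributes(line) for line in group_into_lines(glyphs)]
    return [" ".join(a.text for a in para) for para in group_into_paragraphs(lines)]

def make_glyphs(text, x, y, size=10.0, advance=6.0, rotation=0.0):
    """Hypothetical test helper: lay out one monospaced line (spaces leave a gap)."""
    return [Glyph(ch, x + i * advance, y, advance - 1.0, size, rotation)
            for i, ch in enumerate(text) if ch != " "]
```

For example, consider `make_glyphs("Hi yo", 0, 0) + make_glyphs("ok", 0, 12) + make_glyphs("Go", 0, 40)`. This yields three lines and two paragraphs, `["Hi yo ok", "Go"]`, because the 18-unit gap before "Go" exceeds `gap_factor * size`. The fixed tolerances are a simplification. Comparing left edges strictly also splits an indented first line from the rest of its paragraph. A fuller approach would weigh the attribute differences as likelihoods, as the claim's wording suggests.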