CPC G06V 30/416 (2022.01) [G06T 3/60 (2013.01); G06T 11/60 (2013.01); G06V 30/414 (2022.01); G06T 2210/12 (2013.01)] | 15 Claims |
1. A computer implemented method including:
receiving, at a computer processing system including a processing unit, data for an area of text of a document, the area of text containing a plurality of glyphs, each glyph associated with position information defining a position of the glyph in the document; and
by the processing unit:
grouping the glyphs into a plurality of lines based on the position information, including reverting a rotation parameter of the glyphs in the data to a common rotation and then determining a bounding box for each glyph based on the position information and grouping the glyphs based on positions of the determined bounding boxes;
for each line determining one or more paragraph attributes, wherein a difference in the one or more paragraph attributes between different lines indicates a likelihood that the different lines are in different paragraphs;
responsive to the determination of the one or more paragraph attributes, grouping the plurality of lines into one or more paragraphs; and
generating an editable document containing text in paragraphs that corresponds to the one or more paragraphs.
|
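The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (`Glyph`, `bounding_box`, `group_into_lines`, `paragraph_attributes`, `group_into_paragraphs`) is hypothetical, and several choices are assumptions the claim does not state: the common rotation is taken to be 0 degrees, positions are rotated about the document origin, y grows downward, the paragraph attributes are the gap above a line and the line height, and the thresholds `gap_ratio` and `height_tol` are arbitrary. A list of paragraph strings stands in for the editable document.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float               # left edge in document coordinates
    y: float               # top edge (assumption: y grows downward)
    w: float
    h: float
    rotation: float = 0.0  # degrees


def bounding_box(g):
    """Revert the glyph's rotation to 0 degrees (assumed common rotation),
    then return an axis-aligned box as (left, top, right, bottom)."""
    t = math.radians(-g.rotation)
    x = g.x * math.cos(t) - g.y * math.sin(t)
    y = g.x * math.sin(t) + g.y * math.cos(t)
    return (x, y, x + g.w, y + g.h)


def group_into_lines(glyphs):
    """Group glyphs whose bounding boxes overlap vertically into one line."""
    boxed = sorted(((bounding_box(g), g) for g in glyphs),
                   key=lambda p: (p[0][1], p[0][0]))
    lines = []
    for box, g in boxed:
        if lines and box[1] < lines[-1]["bottom"]:
            line = lines[-1]
            line["items"].append((box, g))
            line["bottom"] = max(line["bottom"], box[3])
        else:
            lines.append({"top": box[1], "bottom": box[3], "items": [(box, g)]})
    for line in lines:
        line["items"].sort(key=lambda p: p[0][0])  # left-to-right reading order
        line["text"] = "".join(g.char for _, g in line["items"])
    return lines


def paragraph_attributes(lines):
    """Per-line attributes (assumed here: gap above the line and line height)."""
    attrs = []
    for i, line in enumerate(lines):
        gap = line["top"] - lines[i - 1]["bottom"] if i else 0.0
        attrs.append({"gap_above": gap, "height": line["bottom"] - line["top"]})
    return attrs


def group_into_paragraphs(lines, gap_ratio=0.5, height_tol=0.2):
    """Start a new paragraph when attributes differ enough between adjacent lines."""
    attrs = paragraph_attributes(lines)
    paragraphs = []
    for i, (line, a) in enumerate(zip(lines, attrs)):
        new_para = i == 0
        if i > 0:
            prev = attrs[i - 1]
            new_para = (a["gap_above"] > gap_ratio * a["height"]
                        or abs(a["height"] - prev["height"]) > height_tol * prev["height"])
        if new_para:
            paragraphs.append([line["text"]])
        else:
            paragraphs[-1].append(line["text"])
    return [" ".join(p) for p in paragraphs]


# Usage: three lines of two glyphs each; the third line sits after a large gap.
glyphs = [
    Glyph("a", 0, 0, 5, 10), Glyph("b", 6, 0, 5, 10),
    Glyph("c", 0, 12, 5, 10), Glyph("d", 6, 12, 5, 10),
    Glyph("e", 0, 40, 5, 10), Glyph("f", 6, 40, 5, 10),
]
paragraphs = group_into_paragraphs(group_into_lines(glyphs))
```

In this sketch a line break within a paragraph is joined with a single space. A real system would likely consider further attributes (indentation, alignment, font) and would write the result into an editable document format rather than a list of strings.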