| CPC G06F 40/109 (2020.01) [G06F 40/30 (2020.01)] | 21 Claims |

|
1. A method of device-dependent display of an article from a PDF file that has multiple columns in at least parts of the article, the method including:
using a library to render the article from the PDF file, including rendering of a plurality of bounding boxes, positioned at on-page coordinates, that contain one or more images and multiple text blocks of glyphs, with font information for the glyphs;
setting a reading order of the article after the rendering, including pulling out text blocks spanning more than half of a width of a page and pulling out images, then reflowing the text blocks to produce the reading order;
merging the text blocks as they appear in the reading order into one or more paragraphs of text using the font information and using starting and ending positions of horizontally arranged text elements in the text blocks to delimit the paragraphs;
inferring semantic information about typographic roles of the paragraphs in the merged text blocks from at least the font information, including font name and font size distribution for sequences of the glyphs; and
causing display of the article in a device-dependent format, including the merged text blocks, using the semantic information and the reading order.
|
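As an illustration only, the reading-order and semantic-inference steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Block` record, its field names, the two-column flow, the rule that a pulled-out block closes the band of columns above it, and the role labels (`heading`, `body`, `small`) are all hypothetical and do not appear in the claim. Font name is omitted for brevity; only font size is used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical rendered bounding box; field names are illustrative."""
    x0: float
    y0: float  # top edge, with y growing downward (assumption)
    x1: float
    y1: float
    kind: str = "text"  # "text" or "image"
    text: str = ""
    font_size: float = 0.0


def _flow(band, page_width):
    """Assumed two-column flow: left column top to bottom, then right column."""
    mid = page_width / 2
    return sorted(band, key=lambda b: ((b.x0 + b.x1) / 2 >= mid, b.y0))


def reading_order(blocks, page_width):
    """Pull out images and text blocks wider than half the page, then reflow."""
    def is_pulled(b):
        return b.kind == "image" or (b.x1 - b.x0) > page_width / 2

    pulled = sorted((b for b in blocks if is_pulled(b)), key=lambda b: b.y0)
    columns = [b for b in blocks if not is_pulled(b)]

    ordered = []
    for p in pulled:
        # Assumption: a pulled-out block closes the band of columns above it.
        band = [b for b in columns if b.y0 < p.y0]
        columns = [b for b in columns if b.y0 >= p.y0]
        ordered += _flow(band, page_width) + [p]
    return ordered + _flow(columns, page_width)


def infer_roles(text_blocks):
    """Label each block relative to the dominant (body) font size."""
    if not text_blocks:
        return []
    # Font size distribution, weighted by the number of characters per block.
    counts = Counter()
    for b in text_blocks:
        counts[b.font_size] += len(b.text)
    body_size = counts.most_common(1)[0][0]

    roles = []
    for b in text_blocks:
        if b.font_size > body_size:
            roles.append("heading")
        elif b.font_size < body_size:
            roles.append("small")
        else:
            roles.append("body")
    return roles


page = [
    Block(310, 100, 550, 300, text="right column text " * 5, font_size=10),
    Block(50, 40, 550, 70, text="Title", font_size=18),
    Block(50, 320, 290, 500, text="lower left text " * 5, font_size=10),
    Block(310, 310, 550, 400, kind="image"),
    Block(50, 100, 290, 300, text="left column text " * 5, font_size=10),
]
order = reading_order(page, page_width=600)
roles = infer_roles([b for b in order if b.kind == "text"])
```

In this sketch the full-width title and the image are pulled out first, and the remaining column blocks are reflowed around them. A production system would also need column detection for layouts with more than two columns, as well as the paragraph-merging step of the claim, which is not shown here.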