US RE50,675 E1
Extracting information from tables embedded within documents
David Richard Milward, Cambridge (GB); Himanshu Agrawal, Boston, MA (US); James Robert Walton Cormack, London (GB); and Francisco Nuno Quintiliano Mendonca Carapeto Costa, Cambridge (GB)
Assigned to Linguamatics Ltd., Cambridge (GB)
Filed by LINGUAMATICS LTD., Cambridge (GB)
Filed on Jul. 7, 2022, as Appl. No. 17/859,132.
Application 17/859,132 is a reissue of application No. 15/594,762, filed on May 15, 2017, granted, now 10,706,218, issued on Jul. 7, 2020.
Claims priority of provisional application 62/337,216, filed on May 16, 2016.
Int. Cl. G06F 16/00 (2019.01); G06F 16/84 (2019.01); G06F 40/14 (2020.01); G06F 40/143 (2020.01); G06F 40/154 (2020.01); G06F 40/166 (2020.01); G06F 40/177 (2020.01); G06F 40/18 (2020.01)
CPC G06F 16/86 (2019.01) [G06F 40/143 (2020.01); G06F 40/154 (2020.01); G06F 40/166 (2020.01); G06F 40/177 (2020.01); G06F 40/18 (2020.01)]
20 Claims
OG exemplary drawing
 
1. A [computing device implemented] method of extracting information from heterogeneous tables in semi-structured text and unstructured text, the method comprising steps of:
identifying, by a computing device, target content from a table in an electronic document, wherein the target content is presented in a plurality of cells [table cell context within a document];
classifying, by the computing device, [each table cell as] the plurality of cells into one or more of [a] header cells and a plurality of [cell or] data cells [cell] based on at least one of explicit coding of the plurality of cells, formatting of the plurality of cells, relationship between the one or more header cells and columns in the table, presence of horizontal lines in the table, type of the target content in the plurality of cells, presence of measurement units within brackets in the table, and presence of words referring to mathematical operations on values in a table [its context or content];
annotating [directly encoding], automatically by the computing device, the plurality of data cells [cell with annotations] to indicate their positions [the data cell's position] in the [a] table and an association between each of the plurality of data cells [cell] and the one or more header cells [cell] to enable extraction of the target content from the table; and
indexing, by the computing device, the electronic document utilizing the association between the plurality of data cells [cell] and the one or more header cells for responding to search queries [cell].
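
The classifying, annotating, and indexing steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the table has already been identified and parsed into rows of strings, and it uses only one of the recited classification cues (the type of content in the cells): leading rows with no numeric cell are treated as header rows. The names `NUMERIC`, `Annotation`, `count_header_rows`, `annotate`, and `build_index` are hypothetical and do not come from the patent.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed notion of a numeric cell; the claim does not define one.
NUMERIC = re.compile(r"^[+-]?\d+(\.\d+)?%?$")


@dataclass(frozen=True)
class Annotation:
    row: int      # 0-based row of the data cell
    col: int      # 0-based column of the data cell
    header: str   # text of the associated header cell(s)
    value: str    # target content of the data cell


def count_header_rows(table):
    """Classify leading rows that contain no numeric cell as header rows."""
    n = 0
    for row in table:
        if any(NUMERIC.match(cell.strip()) for cell in row):
            break
        n += 1
    if n == len(table):  # no numeric row found: assume a single header row
        n = min(1, len(table))
    return n


def annotate(table):
    """Attach a position and the column's header text to every data cell."""
    n = count_header_rows(table)
    annotations = []
    for r, row in enumerate(table[n:], start=n):
        for c, cell in enumerate(row):
            header = " ".join(h[c].strip() for h in table[:n] if c < len(h))
            annotations.append(Annotation(r, c, header, cell.strip()))
    return annotations


def build_index(annotations):
    """Map each lower-cased header word to the data cells associated with it."""
    index = defaultdict(list)
    for a in annotations:
        for word in re.findall(r"[a-z0-9]+", a.header.lower()):
            index[word].append(a)
    return index


# Illustrative table (made-up values, not taken from the patent).
table = [
    ["Drug", "Dose (mg)"],
    ["aspirin", "75"],
    ["ibuprofen", "200"],
]
index = build_index(annotate(table))
doses = [a.value for a in index["dose"]]  # data cells under the "Dose (mg)" header
```

In this sketch a query for the header word "dose" returns the data cells in that column together with their row and column positions. A fuller treatment would combine further recited cues, such as explicit markup, formatting, horizontal rules, bracketed units, and words denoting mathematical operations.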