US RE50,675 E1
Extracting information from tables embedded within documents
David Richard Milward, Cambridge (GB); Himanshu Agrawal, Boston, MA (US); James Robert Walton Cormack, London (GB); and Francisco Nuno Quintiliano Mendonca Carapeto Costa, Cambridge (GB)
Assigned to Linguamatics Ltd., Cambridge (GB)
Filed by LINGUAMATICS LTD., Cambridge (GB)
Filed on Jul. 7, 2022, as Appl. No. 17/859,132.
Application 17/859,132 is a reissue of application No. 15/594,762, filed on May 15, 2017, granted, now 10,706,218, issued on Jul. 7, 2020.
Claims priority of provisional application 62/337,216, filed on May 16, 2016.
Int. Cl. G06F 16/00 (2019.01); G06F 16/84 (2019.01); G06F 40/14 (2020.01); G06F 40/143 (2020.01); G06F 40/154 (2020.01); G06F 40/166 (2020.01); G06F 40/177 (2020.01); G06F 40/18 (2020.01)

CPC G06F 16/86 (2019.01) [G06F 40/143 (2020.01); G06F 40/154 (2020.01); G06F 40/166 (2020.01); G06F 40/177 (2020.01); G06F 40/18 (2020.01)]

20 Claims

1. A [ computing device implemented ] method ~~of extracting information from heterogeneous tables in semi-structured text and unstructured text~~, the method comprising steps of:

identifying ~~, by a computing device, target content from a table in an electronic document, wherein the target content is presented in a plurality of cells~~ [ table cell context within a document] ;

classifying ~~, by the computing device,~~ [ each table cell as ] ~~the plurality of cells into one or more of~~ [ a ] header ~~cells and a plurality of~~ [ cell or ] data ~~cells~~ [ cell ] based on ~~at least one of explicit coding of the plurality of cells, formatting of the plurality of cells, relationship between the one or more header cells and columns in the table, presence of horizontal lines in the table, type of the target content in the plurality of cells, presence of measurement units within brackets in the table, and presence of words referring to mathematical operations on values in a table~~ [ its context or content] ;

~~annotating~~ [ directly encoding] , automatically by the computing device, the ~~plurality of~~ data ~~cells~~ [ cell with annotations ] to indicate ~~their positions~~ [ the data cell's position ] in ~~the~~ [ a ] table and an association between ~~each of~~ the ~~plurality of~~ data ~~cells~~ [ cell ] and the ~~one or more~~ header ~~cells~~ [ cell ] to enable extraction of ~~the~~ target content from the table; and

indexing, by the computing device, the ~~electronic~~ document utilizing the association between the ~~plurality of~~ data ~~cells~~ [ cell ] and the ~~one or more~~ header ~~cells for responding to search queries~~ [ cell] .
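
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `Cell`, `classify`, `annotate`, and `build_index` are hypothetical, the table is assumed to be already parsed into a list of rows, and the header test (first row, non-numeric text) is only one assumed stand-in for classification "based on its context or content".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    row: int                      # position of the cell in the table
    col: int
    text: str
    kind: str = "data"            # "header" or "data"
    header: Optional[str] = None  # associated header text (data cells only)


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify(table):
    """Label each cell as header or data.

    Assumed heuristic: a cell is a header if it sits in the first row
    (context) and its text is not numeric (content).
    """
    cells = []
    for r, row in enumerate(table):
        for c, text in enumerate(row):
            kind = "header" if r == 0 and not _is_number(text) else "data"
            cells.append(Cell(r, c, text, kind))
    return cells


def annotate(cells):
    """Attach to each data cell the header cell of its column."""
    headers = {c.col: c.text for c in cells if c.kind == "header"}
    for c in cells:
        if c.kind == "data":
            c.header = headers.get(c.col)
    return cells


def build_index(cells):
    """Index data cells by associated header: header -> [(row, col, text)]."""
    index = defaultdict(list)
    for c in cells:
        if c.kind == "data" and c.header is not None:
            index[c.header].append((c.row, c.col, c.text))
    return dict(index)


# Hypothetical example table, already extracted from a document.
table = [
    ["Drug", "Dose (mg)"],
    ["Aspirin", "100"],
    ["Ibuprofen", "200"],
]
index = build_index(annotate(classify(table)))
```

In this sketch the annotation is stored directly on each data cell (its row, column, and header text), so a lookup such as `index["Dose (mg)"]` returns the data cells under that header together with their positions in the table.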