US 12,032,546 B2
Systems and methods for populating a structured database based on an image representation of a data table
Ashim Prasad, Bangalore (IN); Melwin Babu, Thrissur (IN); and Dibakar Saha, Jalpaiguri (IN)
Assigned to nference, Inc., Cambridge, MA (US)
Filed by nference, Inc., Cambridge, MA (US)
Filed on Jul. 16, 2020, as Appl. No. 16/931,074.
Claims priority of provisional application 62/874,830, filed on Jul. 16, 2019.
Prior Publication US 2021/0019287 A1, Jan. 21, 2021
Int. Cl. G06F 16/22 (2019.01); G06F 16/21 (2019.01); G06F 16/28 (2019.01); G06F 18/214 (2023.01); G06N 3/08 (2023.01); G06V 10/22 (2022.01); G06V 10/82 (2022.01); G06V 30/19 (2022.01); G06V 30/412 (2022.01)
CPC G06F 16/221 (2019.01) [G06F 16/21 (2019.01); G06F 16/2282 (2019.01); G06F 16/287 (2019.01); G06F 18/214 (2023.01); G06N 3/08 (2013.01); G06V 10/235 (2022.01); G06V 10/82 (2022.01); G06V 30/19173 (2022.01); G06V 30/412 (2022.01)]
25 Claims
1. A method comprising:
accessing, by one or more computer processors, an image representation of a data table, the data table comprising one or more cells arranged in one or more rows and one or more columns, the one or more cells comprising a first cell that belongs to at least one first row and at least one first column, the first cell being populated with a first content object, the first content object comprising a progress bar, wherein a length of the progress bar is associated with sequence information;
providing, by the one or more computer processors, the image representation as an input to a neural network model that is trained to identify locations of content objects in image representations;
executing, by the one or more computer processors, the neural network model to identify a location of the first content object in the image representation;
identifying, by the one or more computer processors, a location of the first cell based on the location of the first content object;
determining, by the one or more computer processors, that the first cell belongs to the at least one first row and the first column based on one or more of the location of the first cell and the first content object in relation to a plurality of content objects associated with the one or more rows and the one or more columns;
associating, by the one or more computer processors, the first content object with one or more categorical identifiers;
determining, by the one or more computer processors, the length of the progress bar based on the image representation;
extracting, by the one or more computer processors, the sequence information from the progress bar based on the determined length of the progress bar; and
populating, by the one or more computer processors, a structured database with the sequence information and the one or more categorical identifiers based on determining that the first cell belongs to the at least one first row and the at least one first column, the structured database including at least one data table row associated with the at least one first row and at least one data table column associated with the at least one first column.
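For illustration only, the steps recited in claim 1 after the neural network has located the content objects might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the bounding boxes are taken as given (standing in for the detector output), and the names `Box`, `group_1d`, `bar_fill_fraction`, `to_stage`, `populate`, the stage labels, the grayscale threshold, and the SQLite schema are all hypothetical. Row and column membership is approximated by clustering box centers, and the progress bar length is approximated by counting dark pixels along the bar's middle row.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box (pixels) of a detected content object."""
    x0: int
    y0: int
    x1: int
    y1: int


def group_1d(values, tol):
    """Label values by clustering: a gap larger than tol starts a new group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current, last = -1, None
    for i in order:
        if last is None or values[i] - last > tol:
            current += 1
        labels[i] = current
        last = values[i]
    return labels


def bar_fill_fraction(image, box, threshold=128):
    """Fraction of the bar that is filled, measured along its middle pixel row.

    Assumes a grayscale image (list of rows) with a dark fill on a light track.
    """
    mid_row = image[(box.y0 + box.y1) // 2][box.x0:box.x1]
    filled = 0
    for pixel in mid_row:
        if pixel >= threshold:  # first light pixel marks the end of the fill
            break
        filled += 1
    return filled / len(mid_row)


def to_stage(fraction, stages):
    """Map a fill fraction to an entry of an ordered (assumed) stage list."""
    index = max(0, math.ceil(fraction * len(stages)) - 1)
    return stages[index]


def populate(image, boxes, stages, category, tol=3.0):
    """Assign each box to a row/column, read its bar, and store the result."""
    rows = group_1d([(b.y0 + b.y1) / 2 for b in boxes], tol)
    cols = group_1d([(b.x0 + b.x1) / 2 for b in boxes], tol)
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cells "
        "(row_idx INTEGER, col_idx INTEGER, category TEXT, stage TEXT)"
    )
    for box, r, c in zip(boxes, rows, cols):
        stage = to_stage(bar_fill_fraction(image, box), stages)
        db.execute("INSERT INTO cells VALUES (?, ?, ?, ?)", (r, c, category, stage))
    return db.execute(
        "SELECT row_idx, col_idx, category, stage FROM cells "
        "ORDER BY row_idx, col_idx"
    ).fetchall()


def demo():
    """Build a synthetic 40x20 image with two bars and run the sketch on it."""
    image = [[255] * 40 for _ in range(20)]
    boxes = [Box(10, 2, 30, 6), Box(10, 12, 30, 16)]
    for box, filled in zip(boxes, (10, 20)):  # half-filled and fully filled
        for y in range(box.y0, box.y1):
            for x in range(box.x0, box.x0 + filled):
                image[y][x] = 0
    stages = ["Phase 1", "Phase 2", "Phase 3", "Approved"]
    return populate(image, boxes, stages, category="development stage")
```

Calling `demo()` runs the sketch on two synthetic bars stacked in a single column and returns the rows stored in the in-memory table. A real system would replace the fixed boxes with the neural network's detections and would likely need a more robust length measurement than a single pixel row.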