US 11,055,524 B2
Data extraction pipeline
Jacques Cali, London (GB); Peter Roelants, London (GB); Christos Sagonas, London (GB); and Romain Sabathe, London (GB)
Assigned to ONFIDO LTD, London (GB)
Filed by ONFIDO LTD, London (GB)
Filed on Jan. 11, 2019, as Appl. No. 16/245,480.
Claims priority of application No. 18151489 (EP), filed on Jan. 12, 2018.
Prior Publication US 2019/0220660 A1, Jul. 18, 2019
Int. Cl. G06K 9/00 (2006.01); G06F 16/35 (2019.01); G06N 3/08 (2006.01)
CPC G06K 9/00456 (2013.01) [G06F 16/35 (2019.01); G06K 9/00442 (2013.01); G06K 9/00463 (2013.01); G06N 3/08 (2013.01); G06K 2209/01 (2013.01)]
17 Claims
 
1. A method for extracting information from an image of a document comprising:
acquiring an image comprising image data relating to at least a part of the document;
manipulating the image data to obtain a document image;
performing optical character recognition of the image data to produce recognized data;
attempting non-textual classification of the document image by machine recognition of non-textual characters to obtain classification data;
attempting textual classification of the document image by machine recognition of textual characters to obtain classification data; and
using the classification data and recognized data to extract information from the image;
wherein attempting textual classification of the document image by machine recognition of textual characters to obtain classification data comprises keyword classification of the document image to obtain classification data, wherein keyword classification comprises:
obtaining the recognized data;
retrieving a list of template data;
obtaining a weighting matrix using the list of template data;
classifying the document to a determined document type using the list of template data and the weighting matrix; and
if a threshold level of confidence in the classification is achieved, obtaining classification data by semantic labelling of the document image.
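
The keyword classification steps recited above can be illustrated with a minimal Python sketch. The claim does not specify how the weighting matrix is derived or how confidence is measured, so everything below is an assumption made for illustration: the template list `TEMPLATES`, the inverse-document-frequency style weighting, the cosine-similarity confidence score, the threshold value of 0.5, and the function names `build_weighting_matrix` and `classify` are all hypothetical. Semantic labelling itself is not implemented here; the sketch only marks the point at which it would be attempted.

```python
import math
import re

# Hypothetical template data: document type -> keywords expected on it.
TEMPLATES = {
    "uk_driving_licence": ["driving", "licence", "dvla", "uk"],
    "uk_passport": ["passport", "united", "kingdom", "uk"],
    "fr_id_card": ["carte", "nationale", "identite"],
}


def build_weighting_matrix(templates):
    """Return (vocabulary, matrix) with one L2-normalised row per template.

    Assumed weighting: a keyword that appears in fewer templates gets a
    higher weight (inverse-document-frequency style).
    """
    vocab = sorted({kw for kws in templates.values() for kw in kws})
    n_templates = len(templates)
    matrix = {}
    for name, kws in templates.items():
        row = []
        for term in vocab:
            if term in kws:
                # Number of templates that list this keyword.
                df = sum(term in other for other in templates.values())
                row.append(math.log(n_templates / df) + 1.0)
            else:
                row.append(0.0)
        norm = math.sqrt(sum(w * w for w in row))
        matrix[name] = [w / norm for w in row]
    return vocab, matrix


def classify(recognized_text, templates, threshold=0.5):
    """Return (document_type, confidence); document_type is None when the
    confidence (cosine similarity, an assumed measure) is below threshold."""
    vocab, matrix = build_weighting_matrix(templates)
    tokens = set(re.findall(r"[a-z]+", recognized_text.lower()))
    presence = [1.0 if term in tokens else 0.0 for term in vocab]
    norm = math.sqrt(sum(presence))
    if norm == 0.0:
        return None, 0.0  # no template keyword found in the OCR output
    scores = {
        name: sum(w * p for w, p in zip(row, presence)) / norm
        for name, row in matrix.items()
    }
    best = max(scores, key=scores.get)
    confidence = scores[best]
    if confidence < threshold:
        return None, confidence
    # Threshold achieved: semantic labelling would be attempted here.
    return best, confidence


if __name__ == "__main__":
    print(classify("DRIVING LICENCE DVLA UK", TEMPLATES))
```

In this sketch, recognized text containing several keywords unique to one template scores close to 1 for that template, while text matching only a keyword shared by several templates (such as "uk") stays below the threshold and yields no classification.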