US 11,055,525 B2
Determining experiments represented by images in documents
Anshuman Sahoo, Toronto (CA); Thomas Kai Him Leung, Toronto (CA); David Qixiang Chen, Toronto (CA); and Elvis Mboumien Wianda, Toronto (CA)
Assigned to Scinapsis Analytics Inc., Toronto (CA)
Filed by Scinapsis Analytics Inc., Toronto (CA)
Filed on Jun. 24, 2019, as Appl. No. 16/450,490.
Prior Publication US 2020/0401799 A1, Dec. 24, 2020
Int. Cl. G06K 9/62 (2006.01); G06K 9/00 (2006.01); G06N 20/00 (2019.01)
CPC G06K 9/00456 (2013.01) [G06K 9/00463 (2013.01); G06K 9/6256 (2013.01); G06N 20/00 (2019.01)] 17 Claims
 
1. A method, comprising:
acquiring, from a document, (i) one or more image texts of an image and (ii) a plurality of sub-legend texts, wherein the image is a visual representation of one or more experiments;
segmenting the image into one or more sub-images by:
recognizing a plurality of sub-image labels in the one or more image texts, wherein the plurality of sub-legend texts describe one or more experiments corresponding to the one or more sub-images, and
for each sub-legend text of the plurality of sub-legend texts, attempting a match between the sub-legend text and a sub-image label of the plurality of sub-image labels by:
determining whether the number of sub-legend texts equals the number of sub-image labels, and
determining whether the sub-legend text comprises the sub-image label;
for each sub-image of the one or more sub-images, determining, by applying a machine learning model, that the sub-image is a visual representation of an experimental technique used in the one or more experiments; and
adding, to a knowledge base, one or more mappings of the one or more sub-images to the one or more experiments.
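The matching and mapping steps of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patentee's implementation. It assumes that panel labels appear in the image texts as single letters such as "A" or "(B)", and that a sub-legend "comprises" a label when it contains a "(X)" or "X." style token. It also assumes that sub-images are already cropped and keyed by label, and that the machine learning model is stood in for by a caller-supplied `classify` function. The names `recognize_labels`, `legend_contains_label`, `match_legends`, `KnowledgeBase`, and `process_figure` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: panel labels appear in the image texts as a lone letter,
# optionally parenthesized, e.g. "A", "b", "(C)".
_LABEL_RE = re.compile(r"^\(?([A-Za-z])\)?$")


def recognize_labels(image_texts: List[str]) -> List[str]:
    """Return the distinct sub-image labels (upper-cased) found in the image texts."""
    labels: List[str] = []
    for text in image_texts:
        m = _LABEL_RE.match(text.strip())
        if m and m.group(1).upper() not in labels:
            labels.append(m.group(1).upper())
    return labels


def legend_contains_label(legend: str, label: str) -> bool:
    """Assumed containment test: '(X)' anywhere, or 'X.' / 'X:' / 'X)' at the start."""
    pattern = rf"(?:^|\()\s*{re.escape(label)}\s*[).:]"
    return re.search(pattern, legend, flags=re.IGNORECASE) is not None


def match_legends(sub_legends: List[str], labels: List[str]) -> Dict[str, str]:
    """Pair sub-legend texts with sub-image labels; returns {label: legend}.

    Applies the claim's two checks: the counts must agree, and the
    sub-legend text must contain the label.
    """
    if len(sub_legends) != len(labels):
        return {}  # one reading of the claim: a count mismatch fails the attempt
    matches: Dict[str, str] = {}
    for legend in sub_legends:
        for label in labels:
            if label not in matches and legend_contains_label(legend, label):
                matches[label] = legend
                break
    return matches


@dataclass
class KnowledgeBase:
    """Hypothetical store: (document id, sub-image label) -> experiment record."""
    entries: Dict[Tuple[str, str], Dict[str, str]] = field(default_factory=dict)

    def add(self, doc_id: str, label: str, experiment: str, technique: str) -> None:
        self.entries[(doc_id, label)] = {"experiment": experiment, "technique": technique}


def process_figure(
    doc_id: str,
    image_texts: List[str],
    sub_legends: List[str],
    sub_images: Dict[str, object],
    classify: Callable[[object], Optional[str]],
    kb: KnowledgeBase,
) -> int:
    """Match legends to labels, classify each matched sub-image, record mappings.

    `sub_images` (pre-cropped, keyed by label) and `classify` (a stand-in for
    the trained model) are assumptions. Returns the number of mappings added.
    """
    labels = recognize_labels(image_texts)
    added = 0
    for label, legend in match_legends(sub_legends, labels).items():
        image = sub_images.get(label)
        technique = classify(image) if image is not None else None
        if technique is not None:
            kb.add(doc_id, label, legend, technique)
            added += 1
    return added


# Usage with toy inputs; the "classifier" is a lookup table, not a real model.
kb = KnowledgeBase()
n = process_figure(
    "doc-1",
    image_texts=["A", "(B)", "GAPDH", "50 kDa"],
    sub_legends=["(A) Western blot of GAPDH.", "(B) qPCR of IL6 expression."],
    sub_images={"A": "crop_a", "B": "crop_b"},
    classify=lambda img: {"crop_a": "western blot", "crop_b": "qPCR"}.get(img),
    kb=kb,
)
print(n)  # 2
```

Treating a count mismatch as a failed match attempt is only one reading of the claim; a practical system might instead fall back to positional pairing or accept partial matches.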