US 11,869,260 B1
Extracting structured data from an image
Bingyan Liu, Mountain View, CA (US); Pengxiang Hu, Mountain View, CA (US); Maxwell C. Goldberg, San Francisco, CA (US); and Taylor Harwin, San Anselmo, CA (US)
Assigned to Kargo Technologies Corporation, San Francisco, CA (US)
Filed by Kargo Technologies Corporation, San Francisco, CA (US)
Filed on Dec. 30, 2022, as Appl. No. 18/092,164.
Claims priority of provisional application 63/413,925, filed on Oct. 6, 2022.
Int. Cl. G06V 30/18 (2022.01); G06V 30/19 (2022.01); G06V 30/12 (2022.01); G06V 20/62 (2022.01); G06V 30/412 (2022.01); G06V 30/10 (2022.01); G06K 7/14 (2006.01)
CPC G06V 30/18019 (2022.01) [G06V 20/63 (2022.01); G06V 30/10 (2022.01); G06V 30/12 (2022.01); G06V 30/19013 (2022.01); G06V 30/1916 (2022.01); G06V 30/19093 (2022.01); G06V 30/412 (2022.01); G06K 7/1413 (2013.01)]
27 Claims
 
1. A computer-implemented method for extracting structured data from an image, comprising:
receiving a captured image depicting a readable item;
at a hardware processing device, extracting a plurality of anchor points from the captured image;
at the hardware processing device, generating an anchor point arrangement specifying locations of anchor points within the captured image;
at the hardware processing device, generating a plurality of templates, wherein each template specifies a plurality of anchor points and field read locations;
at the hardware processing device, automatically selecting one of the plurality of templates, based on a relative degree of similarity between the templates and the anchor point arrangement;
at the hardware processing device, automatically generating a transform for mapping points in the selected template to corresponding points in the captured image; and
at the hardware processing device, applying the generated transform to extract structured data from the captured image;
wherein generating the plurality of templates comprises, for each template:
displaying an image on a display screen, the displayed image comprising a plurality of fields at locations within the image;
receiving first input from a human user identifying read locations for the fields in the displayed image;
receiving second input from the human user specifying information about the identified read locations; and
at a storage device, storing the identified read locations and the information as the template.
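
The template selection, transform generation, and transform application steps recited above can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: anchor points are taken to be labeled 2D locations, similarity between a template and the anchor point arrangement is measured as the mean residual of a least-squares fit, the transform is restricted to rotation, uniform scale, and translation, and the names `Template`, `fit_similarity`, `score_template`, and `select_and_map` are hypothetical. Points are stored as complex numbers (x + yj) so the transform reduces to `a * p + b`.

```python
from dataclasses import dataclass

Point = complex  # x + yj; complex arithmetic keeps the transform compact


@dataclass
class Template:
    name: str
    anchors: dict[str, Point]  # anchor label -> location in template coordinates
    fields: dict[str, Point]   # field name -> read location in template coordinates


def fit_similarity(src: list[Point], dst: list[Point]) -> tuple[complex, complex]:
    """Least-squares fit of dst ~ a * src + b (rotation, uniform scale, translation)."""
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((d - dst_mean) * (s - src_mean).conjugate() for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


def score_template(template: Template, observed: dict[str, Point]):
    """Return (mean residual, transform), or None if the template cannot be scored.

    Assumption: at least three shared, non-coincident anchors are required so
    that the residual says something about how well the layouts agree.
    """
    shared = sorted(template.anchors.keys() & observed.keys())
    src = [template.anchors[k] for k in shared]
    dst = [observed[k] for k in shared]
    if len(shared) < 3 or len(set(src)) < 2:
        return None
    a, b = fit_similarity(src, dst)
    residual = sum(abs(a * s + b - d) for s, d in zip(src, dst)) / len(shared)
    return residual, (a, b)


def select_and_map(templates: list[Template], observed: dict[str, Point]):
    """Pick the template with the lowest residual and map its field read
    locations into image coordinates using the fitted transform."""
    best = None
    for template in templates:
        scored = score_template(template, observed)
        if scored is None:
            continue
        residual, transform = scored
        if best is None or residual < best[0]:
            best = (residual, template, transform)
    if best is None:
        raise ValueError("no template shares enough anchors with the image")
    _, template, (a, b) = best
    mapped = {name: a * p + b for name, p in template.fields.items()}
    return template.name, mapped


# Hypothetical data: two stored templates and the anchors found in one image.
templates = [
    Template("layout_a",
             anchors={"SHIP TO": 0 + 0j, "PO": 100 + 0j, "QTY": 0 + 50j},
             fields={"po_number": 120 + 0j}),
    Template("layout_b",
             anchors={"SHIP TO": 0 + 0j, "PO": 0 + 100j, "QTY": 50 + 0j},
             fields={"po_number": 0 + 120j}),
]
# The image shows layout_a scaled by 2 and shifted by (10, 20).
observed = {"SHIP TO": 10 + 20j, "PO": 210 + 20j, "QTY": 10 + 120j}

selected_name, read_locations = select_and_map(templates, observed)
```

In a fuller pipeline, each mapped read location would then be passed to an OCR or barcode reader to obtain the field value; that step is omitted here. A production system might also prefer an affine or projective transform with outlier rejection, which this sketch does not attempt.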