US 12,033,415 B2
Systems and methods for generating document numerical representations
Jerome Gleyzes, Wellington (NZ); Mohamed Khodeir, Wellington (NZ); Salim Fakhouri, Wellington (NZ); Yu Wu, Wellington (NZ); and Soon-Ee Cheah, Wellington (NZ)
Assigned to XERO Limited, (NZ)
Filed by Xero Limited, Wellington (NZ)
Filed on Feb. 16, 2023, as Appl. No. 18/169,878.
Application 18/169,878 is a continuation of application No. 17/869,044, filed on Jul. 20, 2022, granted, now 11,694,463.
Application 17/869,044 is a continuation of application No. PCT/NZ2021/050133, filed on Aug. 19, 2021.
Claims priority of application No. 2021900419 (AU), filed on Feb. 18, 2021.
Prior Publication US 2023/0206676 A1, Jun. 29, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 30/416 (2022.01); G06V 30/14 (2022.01); G06V 30/18 (2022.01); G06V 30/19 (2022.01)
CPC G06V 30/416 (2022.01) [G06V 30/1448 (2022.01); G06V 30/18 (2022.01); G06V 30/19147 (2022.01)]
17 Claims
1. A method comprising:
determining a candidate document comprising image data and character data;
extracting the image data and the character data from the candidate document;
providing, to an image-based numerical representation generation model, the image data;
generating, by the image-based numerical representation generation model, an image-based numerical representation of the image data;
providing, to a character-based numerical representation generation model, the character data;
generating, by the character-based numerical representation generation model, a character-based numerical representation of the character data;
providing, to a consolidated image-character based numerical representation generation model, the image-based numerical representation and the character-based numerical representation;
generating, by the consolidated image-character based numerical representation generation model, a combined image-character based numerical representation of the candidate document;
comparing the combined image-character based numerical representation of the candidate document with an index of combined image-character based numerical representations, each combined image-character based numerical representation of the index being indicative of a respective document having a first attribute value;
determining a combined image-character based numerical representation of the index that substantially corresponds with the combined image-character based numerical representation of the candidate document; and
associating the candidate document with the first attribute value of the determined combined image-character based numerical representation of the index;
wherein the image-based numerical representation generation model, the character-based numerical representation generation model and the consolidated image-character based numerical representation generation model are trained using an objective function configured to maximise a similarity metric between numerical representations of training documents with an identifier common set of attributes.
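The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the three functions embed_image, embed_characters and combine are hypothetical, untrained stand-ins for the image-based, character-based and consolidated models, cosine similarity with a fixed threshold is assumed as one possible reading of "substantially corresponds", and similarity_objective only shows the kind of quantity a training objective of the recited type might maximise (no training is performed). All names, dimensions and sample values are assumptions made for illustration.

```python
import math
from itertools import combinations

def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def embed_image(pixels, bins=4):
    """Hypothetical image model: histogram of intensities in [0, 1]."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(int(p * bins), bins - 1)] += 1.0
    return l2_normalise(hist)

def embed_characters(text, dims=8):
    """Hypothetical character model: alphanumeric characters bucketed by code."""
    vec = [0.0] * dims
    for ch in text.lower():
        if ch.isalnum():
            vec[ord(ch) % dims] += 1.0
    return l2_normalise(vec)

def combine(image_vec, char_vec):
    """Hypothetical consolidated model: concatenate, then renormalise."""
    return l2_normalise(list(image_vec) + list(char_vec))

def represent(pixels, text):
    """Combined image-character representation of one document."""
    return combine(embed_image(pixels), embed_characters(text))

def cosine(a, b):
    """Cosine similarity; both inputs are assumed to be unit length."""
    return sum(x * y for x, y in zip(a, b))

def associate(candidate_vec, index, threshold=0.9):
    """Return the attribute value of the best index entry at or above threshold."""
    best_value, best_score = None, threshold
    for vec, value in index:
        score = cosine(candidate_vec, vec)
        if score >= best_score:
            best_value, best_score = value, score
    return best_value

def similarity_objective(vectors, attributes):
    """Mean cosine similarity over document pairs sharing an attribute value."""
    scores = [cosine(vectors[i], vectors[j])
              for i, j in combinations(range(len(vectors)), 2)
              if attributes[i] == attributes[j]]
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative index: each entry pairs a representation with an attribute value.
index = [
    (represent([0.1, 0.1, 0.2, 0.9], "Acme Ltd invoice"), "Acme Ltd"),
    (represent([0.8, 0.9, 0.9, 0.6], "Globex statement"), "Globex"),
]

# A candidate identical to the first indexed document is associated with "Acme Ltd".
candidate = represent([0.1, 0.1, 0.2, 0.9], "Acme Ltd invoice")
matched_attribute = associate(candidate, index)
```

In this sketch a candidate with no index entry at or above the threshold is left unassociated (associate returns None); the claim itself does not state what happens in that case.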