US 12,481,831 B2
Information processing apparatus, information processing method, storage medium, and learning apparatus
Ryo Kosaka, Tokyo (JP)
Assigned to CANON KABUSHIKI KAISHA, Tokyo (JP)
Filed by CANON KABUSHIKI KAISHA, Tokyo (JP)
Filed on Sep. 8, 2022, as Appl. No. 17/940,102.
Claims priority of application No. 2021-148490 (JP), filed on Sep. 13, 2021.
Prior Publication US 2023/0083959 A1, Mar. 16, 2023
Int. Cl. G06F 40/30 (2020.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06N 20/00 (2019.01)]
24 Claims
 
1. An information processing apparatus that extracts a candidate character string to be a candidate of an item value corresponding to a predetermined item, from among a plurality of character strings included in a document image to be processed, the information processing apparatus comprising:
one or more hardware processors; and
one or more memories storing one or more programs configured to be executed by the one or more hardware processors, the one or more programs including instructions for:
obtaining information on a conceptual feature group of a preset extraction target among a predetermined plurality of conceptual feature groups;
obtaining an attribute information table in which each of the plurality of conceptual feature groups is associated in advance with attribute information indicating items of attribute groups;
obtaining one or more conceptual feature groups associated with the same attribute information as the attribute information associated with the conceptual feature group of the extraction target, from among the plurality of conceptual feature groups as a conceptual feature group set based on the attribute information table;
identifying the conceptual feature group to which each of the plurality of character strings belongs from among the plurality of conceptual feature groups based on a feature vector corresponding to each of the plurality of character strings, and extracting the character string whose identified conceptual feature group is the same as any of the conceptual feature groups in the conceptual feature group set, from among the plurality of character strings as the candidate character string; and
outputting the extracted candidate character string.
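
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it only shows one way the recited data flow could fit together. Everything named here is an assumption made for illustration: the group names, the attribute labels, the toy two-dimensional feature vectors, the sample strings, and in particular the use of nearest-centroid cosine similarity to identify the group of each character string (the claim only requires that identification be based on a feature vector).

```python
from math import sqrt

# Hypothetical attribute information table:
# conceptual feature group -> attribute information.
ATTRIBUTE_TABLE = {
    "person_name": "proper_noun",
    "company_name": "proper_noun",
    "date": "numeric",
    "amount": "numeric",
}

# Hypothetical representative vector for each conceptual feature group
# (toy 2-D values; a real system would use learned embeddings).
CENTROIDS = {
    "person_name": (1.0, 0.0),
    "company_name": (0.8, 0.6),
    "date": (0.0, 1.0),
    "amount": (-1.0, 0.0),
}


def group_set_for(target_group, table):
    """Collect every group sharing the target group's attribute information."""
    target_attribute = table[target_group]
    return {g for g, attr in table.items() if attr == target_attribute}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def identify_group(vector, centroids):
    """Assumed identification rule: pick the group with the closest centroid."""
    return max(centroids, key=lambda g: cosine(vector, centroids[g]))


def extract_candidates(strings_with_vectors, target_group, table, centroids):
    """Keep strings whose identified group is in the target's group set."""
    group_set = group_set_for(target_group, table)
    return [
        text
        for text, vector in strings_with_vectors
        if identify_group(vector, centroids) in group_set
    ]


# Hypothetical character strings from a document image, with toy vectors.
SAMPLE = [
    ("Taro Yamada", (0.9, 0.1)),
    ("Example Corp.", (0.7, 0.7)),
    ("2022-09-08", (0.1, 0.9)),
    ("1,200", (-0.9, 0.1)),
]

print(extract_candidates(SAMPLE, "company_name", ATTRIBUTE_TABLE, CENTROIDS))
# prints ['Taro Yamada', 'Example Corp.']
```

In this sketch the extraction target is "company_name", yet the string identified as "person_name" is also returned, because both groups carry the same attribute information in the table. This mirrors the claim's widening of the match from a single conceptual feature group to the conceptual feature group set.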