US 12,380,717 B2
Selecting files for intensive text extraction
Tohru Hasegawa, Tokyo (JP); Takuya Goto, Kodaira (JP); and Shunsuke Ishikawa, Shinjuku-ku (JP)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Mar. 14, 2022, as Appl. No. 17/694,003.
Prior Publication US 2023/0290168 A1, Sep. 14, 2023
Int. Cl. G06V 30/19 (2022.01); G06F 16/3331 (2025.01); G06V 30/418 (2022.01)
CPC G06V 30/19093 (2022.01) [G06F 16/3331 (2019.01); G06V 30/418 (2022.01)]
20 Claims
 
1. A method comprising:
identifying a feature of a subject file;
comparing the feature to a historical feature of a historical file;
calculating, based on the comparing, a similarity between the subject file and the historical file;
identifying a historical success metric for the historical file, wherein the historical success metric is an average historical success metric, and wherein the identifying comprises:
    identifying a first historical success metric for the historical file;
    identifying a second historical success metric for a second historical file; and
    calculating an average of the first historical success metric and the second historical success metric, resulting in the average historical success metric;
calculating, based on the similarity and the historical success metric, an intensive text extraction success value for the subject file; and
determining, based on the intensive extraction success value, that more than one intensive text extraction method should be performed on the subject file.
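
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify how similarity is measured, how similarity and the historical success metric are combined, or how the final determination is made. The sketch assumes features are sets of labels compared by Jaccard similarity, that the success value is the product of the similarity and the average historical success metric, and that a value at or above a threshold means more than one intensive text extraction method should be performed. All names (`HistoricalFile`, `jaccard`, `extraction_success_value`, `needs_multiple_methods`) and the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class HistoricalFile:
    """A previously processed file (hypothetical representation)."""
    features: FrozenSet[str]  # assumed: features are simple string labels
    success_metric: float     # assumed: a score in [0, 1]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed similarity measure: shared features / all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_success_metric(first: HistoricalFile, second: HistoricalFile) -> float:
    """Average of the first and second historical success metrics."""
    return (first.success_metric + second.success_metric) / 2


def extraction_success_value(
    subject_features: FrozenSet[str],
    historical: HistoricalFile,
    second_historical: HistoricalFile,
) -> float:
    """Combine similarity and average metric (assumed: by multiplication)."""
    similarity = jaccard(subject_features, historical.features)
    return similarity * average_success_metric(historical, second_historical)


def needs_multiple_methods(success_value: float, threshold: float = 0.3) -> bool:
    """Assumed decision rule: a value at or above the threshold selects
    the file for more than one intensive text extraction method."""
    return success_value >= threshold


subject = frozenset({"pdf", "scanned", "table"})
first = HistoricalFile(frozenset({"pdf", "scanned", "handwriting"}), 1.0)
second = HistoricalFile(frozenset({"pdf", "table"}), 0.5)

value = extraction_success_value(subject, first, second)
decision = needs_multiple_methods(value)
```

In this example the subject file shares two of four distinct features with the first historical file, so the assumed similarity is 0.5; the average historical success metric is 0.75, giving a success value of 0.375 and a positive determination under the assumed threshold of 0.3.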