US 12,437,155 B1
Information extraction system for unstructured documents using independent tabular and textual retrieval augmentation
Lei Zhang, New York, NY (US); and Christopher Cirelli, Roswell, GA (US)
Assigned to American International Group, Inc., New York, NY (US)
Filed by American International Group, Inc., New York, NY (US)
Filed on Jan. 24, 2025, as Appl. No. 18/831,434.
Int. Cl. G06F 40/289 (2020.01); G06F 40/143 (2020.01)
CPC G06F 40/289 (2020.01) [G06F 40/143 (2020.01)]
20 Claims
 
1. A method for extracting particular information from a document, the method comprising:
receiving, by one or more processors, a response payload that includes document text of the document and one or more tables of the document represented using markdown language, wherein the response payload is generated from an optical character recognition tool;
separating, by the one or more processors using the markdown language, the response payload into a first portion having the one or more tables and a second portion having the document text;
forming, by the one or more processors using a chunking methodology, one or more table chunks from the first portion of the response payload and one or more text chunks from the second portion of the response payload;
identifying, by the one or more processors, a relevant table chunk of the one or more table chunks or a relevant text chunk of the one or more text chunks based on a search criterion related to a prompt for a large language model; and
storing a response from the large language model to the prompt and the relevant table chunk or the relevant text chunk.
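
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It makes several assumptions that the claim does not state: tables are detected as consecutive lines starting with a pipe character, text is chunked by a fixed word count, tables are chunked by rows with the header repeated, and the "search criterion" is plain keyword overlap with the prompt. All function names, the sample payload, and the `llm` callable are hypothetical stand-ins.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical OCR response payload: prose plus one markdown table.
PAYLOAD = """Policy summary for the insured party.
The policy period begins on the effective date.

| Coverage | Limit |
| --- | --- |
| Property | 500000 |
| Liability | 1000000 |

Claims must be reported within 30 days."""


def split_payload(payload: str) -> Tuple[List[str], str]:
    """Separate markdown tables (assumed: lines starting with '|') from text."""
    tables: List[str] = []
    text_lines: List[str] = []
    current: List[str] = []
    for line in payload.splitlines():
        if line.strip().startswith("|"):
            current.append(line)
        else:
            if current:  # a table just ended
                tables.append("\n".join(current))
                current = []
            text_lines.append(line)
    if current:  # payload ended inside a table
        tables.append("\n".join(current))
    return tables, "\n".join(text_lines)


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Assumed chunking methodology for text: fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def chunk_table(table: str, rows_per_chunk: int = 20) -> List[str]:
    """Assumed chunking methodology for tables: row groups, header repeated."""
    lines = table.splitlines()
    header, body = lines[:2], lines[2:]
    if not body:
        return [table]
    return ["\n".join(header + body[i:i + rows_per_chunk])
            for i in range(0, len(body), rows_per_chunk)]


def tokens(s: str) -> set:
    """Lowercased alphanumeric tokens used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def most_relevant(chunks: List[str], query: str) -> Optional[str]:
    """Assumed search criterion: the chunk sharing the most tokens with the query."""
    if not chunks:
        return None
    q = tokens(query)
    return max(chunks, key=lambda c: len(q & tokens(c)))


def extract(payload: str, prompt: str, llm: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Run the split, chunk, retrieve, and store steps for one prompt."""
    tables, text = split_payload(payload)
    table_chunks = [c for t in tables for c in chunk_table(t, rows_per_chunk=1)]
    text_chunks = chunk_text(text)
    # Tables and text are searched independently of each other.
    best_table = most_relevant(table_chunks, prompt)
    best_text = most_relevant(text_chunks, prompt)
    context = "\n\n".join(c for c in (best_table, best_text) if c)
    response = llm(f"{prompt}\n\nContext:\n{context}")
    # "Storing" is modeled here as returning a record the caller can persist.
    return {"prompt": prompt, "response": response,
            "table_chunk": best_table, "text_chunk": best_text}
```

Calling `extract(PAYLOAD, "What is the liability limit?", llm)` with any callable that maps a prompt string to a response string returns a record pairing the response with the chunks that were retrieved for it. A production system would likely replace the keyword overlap with embedding-based search, which the claim language also appears to permit.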