| CPC G06N 5/01 (2023.01) [G06F 16/31 (2019.01); G06F 16/3344 (2019.01)] | 20 Claims |

|
3. A method for generating responses to natural-language queries regarding items in unstructured documents, the method comprising:
receiving, at an application instance communicatively coupled to a subscriber computing system of a plurality of subscriber computing systems, a query and a document comprising unstructured data;
performing pre-processing operations on at least a portion of the document comprising the unstructured data, the pre-processing operations comprising generating an optimized model input comprising at least one parsed document section that includes alphanumeric data by:
using a computer vision machine learning model to:
detect a first image in the unstructured data, wherein the first image comprises the alphanumeric data;
generate a bounding box to encapsulate the first image; and
parse, from the first image, a second image comprising the alphanumeric data, the second image defined by the bounding box; and
extracting, by a trained machine learning model, the alphanumeric data from the second image;
identifying, in the unstructured data, a particular image comprising a globally applicable item related to the alphanumeric data;
extracting, from the particular image, the globally applicable item;
generating a searchable data structure comprising the alphanumeric data stored relationally to the globally applicable item; and
for each of the at least one parsed document section and using the searchable data structure,
generating a query response by performing, by a semantic similarity model, a semantic search; and
transmitting the query response to a target application operated or hosted at least in part by the subscriber computing system or a subscriber entity associated with the subscriber computing system.
|
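The steps recited in the claim (bounding-box parsing, extraction, a searchable structure that stores the alphanumeric data relationally to the globally applicable item, and a per-section semantic search) can be illustrated with a minimal sketch. It is not the claimed implementation. Every name here (`BoundingBox`, `crop`, `build_index`, `search`) is hypothetical, the "image" is a toy character grid standing in for pixel data, and a bag-of-words cosine similarity stands in for the computer vision, extraction, and semantic similarity models named in the claim.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box; bottom and right are exclusive bounds."""
    top: int
    left: int
    bottom: int
    right: int


def crop(image, box):
    """Return the sub-image (the claim's "second image") defined by the box.

    Assumption: the image is a list of equal-length strings, a toy stand-in
    for a pixel array.
    """
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity, a stand-in for a semantic model."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def build_index(sections, global_item):
    """Store each section's extracted text together with the global item."""
    return [
        {"section": name, "text": text, "global_item": global_item}
        for name, text in sections.items()
    ]


def search(query, index):
    """Score every parsed section against the query, best match first."""
    return sorted(
        index,
        key=lambda rec: cosine_similarity(query, rec["text"]),
        reverse=True,
    )
```

For example, with `sections = {"parts": "bolt M8 quantity 4", "notes": "all dimensions in millimetres"}` and a global item such as `"units: mm"`, calling `search("how many bolts quantity", build_index(sections, "units: mm"))` ranks the `parts` section first, and the returned record carries the global item alongside the matched text. A real system would replace the token-overlap score with learned embeddings.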