US 12,260,342 B2
Multimodal table extraction and semantic search in a machine learning platform for structuring data in organizations
Chaithanya Manda, Jersey City, NJ (US); Anupam Kumar, Jersey City, NJ (US); Solmaz Torabi, Austin, TX (US); Raman Kumar, New Delhi (IN); Anish Goswami, Pune (IN); Sidhant Agarwal, Ranchi (IN); Md Sharique, Bandel (IN); Diksha Malhotra, Mohali (IN); Garimella Venkata BhanuTeja, Vijayawada (IN); Arvind Singh, Dehradun (IN); and Pavan Praneeth, Hyderabad (IN)
Assigned to ExlService Holdings, Inc., New York, NY (US)
Filed by ExlService Holdings, Inc., New York, NY (US)
Filed on Sep. 13, 2023, as Appl. No. 18/367,920.
Application 18/367,920 is a continuation-in-part of application No. 17/988,684, filed on Nov. 16, 2022, granted, now Pat. No. 11,842,286.
Claims priority of provisional application 63/280,062, filed on Nov. 16, 2021.
Prior Publication US 2024/0160953 A1, May 16, 2024
Int. Cl. G06F 16/33 (2019.01); G06F 16/31 (2019.01); G06F 16/334 (2025.01); G06N 5/01 (2023.01)
CPC G06N 5/01 (2023.01) [G06F 16/31 (2019.01); G06F 16/3344 (2019.01)] 20 Claims
OG exemplary drawing
 
3. A method for generating responses to natural-language queries regarding items in unstructured documents, the method comprising:
receiving, at an application instance communicatively coupled to a subscriber computing system of a plurality of subscriber computing systems, a query and a document comprising unstructured data;
performing pre-processing operations on at least a portion of the document comprising the unstructured data, the pre-processing operations comprising generating an optimized model input comprising at least one parsed document section that includes alphanumeric data by:
using a computer vision machine learning model to:
detect a first image in the unstructured data, wherein the first image comprises the alphanumeric data;
generate a bounding box to encapsulate the first image; and
parse, from the first image, a second image comprising the alphanumeric data, the second image defined by the bounding box; and
extracting, by a trained machine learning model, the alphanumeric data from the second image;
identifying, in the unstructured data, a particular image comprising a globally applicable item related to the alphanumeric data;
extracting, from the particular image, the globally applicable item;
generating a searchable data structure comprising the alphanumeric data stored relationally to the globally applicable item; and
for each of the at least one parsed document section and using the searchable data structure,
generating a query response by performing, by a semantic similarity model, a semantic search; and
transmitting the query response to a target application operated or hosted at least in part by the subscriber computing system or a subscriber entity associated with the subscriber computing system.
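For illustration only, the pre-processing limitation of the exemplary claim (detecting a first image comprising alphanumeric data, generating a bounding box, parsing a second image defined by that box, and extracting the alphanumeric data with a trained model) can be sketched as follows. The region detector is a hypothetical placeholder for the claimed computer vision model, and pytesseract stands in for the claimed trained extraction model; neither is specified by the patent.

```python
# Illustrative sketch of the claimed pre-processing step (not the patented
# implementation): detect a first image containing alphanumeric data, bound it,
# crop a second image defined by the bounding box, and extract the text.
from PIL import Image
import pytesseract


def detect_alphanumeric_regions(page_image: Image.Image) -> list[tuple[int, int, int, int]]:
    # Placeholder for the claimed computer vision model: a real system would
    # localize table or figure regions containing alphanumeric data. Here the
    # whole page is returned as a single (left, top, right, bottom) bounding box.
    return [(0, 0, page_image.width, page_image.height)]


def parse_document_sections(page_image: Image.Image) -> list[str]:
    # For each detected first image, crop the second image defined by the
    # bounding box and extract its alphanumeric data with an OCR model
    # (pytesseract here, as one possible "trained machine learning model").
    sections = []
    for box in detect_alphanumeric_regions(page_image):
        second_image = page_image.crop(box)
        sections.append(pytesseract.image_to_string(second_image))
    return sections
```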
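The searchable data structure limitation, in which extracted alphanumeric data is stored relationally to a globally applicable item (for example, a currency or reporting period that applies to an entire extracted table), might be realized as a simple relational table. The schema, column names, and sample rows below are assumptions for illustration, not taken from the specification.

```python
import sqlite3

# Illustrative sketch: store each extracted alphanumeric value relationally to
# the globally applicable item extracted from the same document. Schema and
# sample rows are assumptions, not taken from the patent.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE parsed_items (
           section_id INTEGER,   -- parsed document section the value came from
           value TEXT,           -- alphanumeric data extracted from the second image
           global_item TEXT      -- globally applicable item (e.g. units, period)
       )"""
)
conn.executemany(
    "INSERT INTO parsed_items VALUES (?, ?, ?)",
    [
        (1, "Net revenue: 1,250", "FY2021 (USD thousands)"),
        (1, "Operating cost: 430", "FY2021 (USD thousands)"),
    ],
)
# The structure can be searched by value or by its globally applicable item.
print(conn.execute(
    "SELECT value FROM parsed_items WHERE global_item LIKE '%FY2021%'"
).fetchall())
```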
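The query-answering limitation, a semantic search performed by a semantic similarity model over the parsed document sections, could be sketched with a sentence-embedding model; the model name and the cosine-similarity ranking below are illustrative choices rather than the claimed implementation.

```python
# Illustrative sketch of the claimed semantic search: embed the query and the
# parsed document sections, then return the best-matching section as the basis
# of the query response. The model name is an arbitrary public choice, not the
# semantic similarity model recited in the claim.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")


def generate_query_response(query: str, parsed_sections: list[str]) -> str:
    query_emb = model.encode(query, convert_to_tensor=True)
    section_embs = model.encode(parsed_sections, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, section_embs)[0]  # cosine similarity per section
    return parsed_sections[int(scores.argmax())]       # best-matching section
```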