| CPC G06F 16/24578 (2019.01) [G06F 16/9024 (2019.01); G06F 16/9038 (2019.01); G06F 16/93 (2019.01)] | 16 Claims |

|
1. A method comprising:
obtaining a hierarchical document structure of a raw document, the hierarchical document structure comprising a plurality of sections arranged in a hierarchy of successive document levels;
constructing a hierarchical document graph having a graph hierarchical structure corresponding to the hierarchical document structure of the raw document in a physical storage device, comprising:
extracting the hierarchical document structure from the raw document,
initializing the hierarchical document graph in the physical storage device by adding a root node in a root storage location, wherein the root node comprises a root node identifier as a document identifier of the raw document,
adding first level section nodes in first individual corresponding storage locations to the hierarchical document graph, corresponding to respective first level sections of a first level of the hierarchical document structure,
connecting the root node to the first level section nodes by adding respective first storage references corresponding to the first individual corresponding storage locations to the root node,
obtaining first content embeddings corresponding to content of the respective first level sections of the hierarchical document structure, and
adding the first content embeddings to the first level section nodes corresponding to the respective first level sections of the first level of the hierarchical document structure in the first individual corresponding storage locations;
retrieving, responsive to a user query, the hierarchical document graph corresponding to the raw document matching the user query;
performing a hierarchical search operation on the hierarchical document graph using a user query embedding of the user query, to obtain a set of relevant nodes of the hierarchical document graph, wherein performing the hierarchical search operation comprises:
traversing the physical storage locations of the hierarchical document graph, to select a first set of nodes, connected to a root node of the hierarchical document graph, and having first context embeddings satisfying a similarity score threshold of similarity with respect to the user query embedding, as a search set of nodes, and
processing a first node of the search set of nodes by performing operations comprising:
adding the first node to a set of candidate nodes, and
responsive to the first node being connected to one or more second nodes,
traversing the physical storage locations of the hierarchical document graph to select a second set of nodes, connected to the first node, and having second context embeddings satisfying a similarity score threshold of similarity with respect to the user query embedding, as the search set of nodes;
retrieving a set of relevant content embeddings for the set of relevant nodes;
generating, by a large language model (LLM), a response to the user query from the set of relevant content embeddings; and
presenting the response in a user interface (UI) of a user application.
|
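
The graph construction and hierarchical search steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading, not the claimed implementation: the names `Node`, `build_graph`, `hierarchical_search`, and `cosine` are hypothetical, in-memory object references stand in for storage references in a physical storage device, cosine similarity is assumed as the similarity score, a queue is assumed as the search set, and recursion below the first level is an assumption that extends the first-level steps to deeper levels.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Optional


@dataclass
class Node:
    node_id: str                      # root uses the document identifier
    embedding: Optional[list] = None  # root carries no embedding in this sketch
    children: list = field(default_factory=list)  # stand-in for storage references


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_graph(doc_id, sections, embed):
    """Build the graph from nested (title, text, subsections) tuples.

    `embed` is a caller-supplied function mapping section text to a vector.
    """
    root = Node(doc_id)

    def add(parent, items):
        for title, text, subsections in items:
            node = Node(title, embed(text))
            parent.children.append(node)  # connect parent to the section node
            add(node, subsections)        # assumption: same steps at deeper levels

    add(root, sections)
    return root


def hierarchical_search(root, query_embedding, threshold):
    """Collect nodes whose embeddings meet the threshold, level by level.

    Children of a node that fails the threshold are never visited.
    """
    def passing(nodes):
        return [n for n in nodes if cosine(n.embedding, query_embedding) >= threshold]

    search = passing(root.children)  # first set of nodes, connected to the root
    candidates = []
    while search:
        node = search.pop(0)
        candidates.append(node)
        search.extend(passing(node.children))  # second set, connected to this node
    return candidates


# Toy data: a lookup table stands in for a real embedding model.
toy_vectors = {
    "intro": [1.0, 0.0],
    "background": [0.9, 0.1],
    "methods": [0.0, 1.0],
    "setup": [1.0, 0.0],
}
sections = [
    ("Intro", "intro", [("Background", "background", [])]),
    ("Methods", "methods", [("Setup", "setup", [])]),
]
root = build_graph("doc-1", sections, toy_vectors.__getitem__)
hits = hierarchical_search(root, [1.0, 0.0], threshold=0.8)
print([n.node_id for n in hits])
```

In this toy run, "Setup" matches the query exactly but is never scored, because its parent "Methods" falls below the threshold. That pruning is what distinguishes the hierarchical traversal from a flat similarity search over all sections.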