US 12,437,007 B2
System and method for document retrieval with domain adaption and out of distribution identification
Ajay Kumar Kuruba, Anantapur (IN); Saurabh Jha, Austin, TX (US); Adya Jha, Bengaluru (IN); and Atul Kumar, Bangalore (IN)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Jan. 5, 2024, as Appl. No. 18/405,215.
Prior Publication US 2025/0225187 A1, Jul. 10, 2025
Int. Cl. G06F 7/00 (2006.01); G06F 16/215 (2019.01); G06F 16/28 (2019.01); G06F 16/93 (2019.01)
CPC G06F 16/93 (2019.01) [G06F 16/215 (2019.01); G06F 16/285 (2019.01)]
18 Claims
 
1. A method comprising:
topic modelling a preprocessed dataset to identify topics, each of the topics associated with a topic identifier, wherein the preprocessed dataset is specific to a domain;
creating a custom layer in a model, wherein the model includes a plurality of frozen base layers and the custom layer is positioned between at least two of the base layers;
associating each topic identifier with a set of topic-specific weights in a lookup table;
freezing weights associated with the base layers of the model such that the weights are not updated during training; and
training the model using the topic identifiers and the preprocessed dataset by inputting a title and a masked passage associated with a topic identifier into the model and updating only the weights of the custom layer, weights being selected from the lookup table based on the topic identifier;
wherein the domain adapted model is configured to generate outputs based on queries from users, the outputs including documents or passages of the documents associated with the domain, and wherein the domain adapted model is further configured to return an out-of-distribution response when a probability or similarity of an output is below a threshold.
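
The steps of claim 1 can be illustrated with a toy sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the frozen base layers are reduced to fixed elementwise scalings (BASE_IN, BASE_OUT), the custom layer is an elementwise weight vector fetched from a lookup table keyed by topic identifier (topic_weights), the title and masked-passage input is replaced by a plain numeric vector with a scalar target, and the out-of-distribution check uses cosine similarity with a hypothetical threshold of 0.5.

```python
import math

# Hypothetical stand-ins for the frozen base layers: fixed elementwise
# scalings stored as tuples, so training cannot modify them.
BASE_IN = (0.5, -1.0, 2.0)
BASE_OUT = (1.0, 0.5, -0.5)

# Lookup table: topic identifier -> topic-specific weights of the custom layer.
topic_weights = {0: [1.0, 1.0, 1.0], 1: [1.0, 1.0, 1.0]}


def forward(x, topic_id):
    """Frozen base layer -> custom layer (weights chosen by topic) -> frozen base layer."""
    w = topic_weights[topic_id]
    h = [b * xi for b, xi in zip(BASE_IN, x)]
    a = [wi * hi for wi, hi in zip(w, h)]
    return sum(o * ai for o, ai in zip(BASE_OUT, a))


def train_step(x, target, topic_id, lr=0.05):
    """One gradient step on squared error; only the selected topic's weights change."""
    w = topic_weights[topic_id]
    h = [b * xi for b, xi in zip(BASE_IN, x)]
    err = forward(x, topic_id) - target
    # For loss = 0.5 * err**2, d(loss)/d(w[i]) = err * BASE_OUT[i] * h[i].
    for i in range(len(w)):
        w[i] -= lr * err * BASE_OUT[i] * h[i]
    return 0.5 * err * err


OOD_RESPONSE = "out-of-distribution"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, doc_vecs, threshold=0.5):
    """Return the best-matching document id, or OOD_RESPONSE below the threshold."""
    best_id, best_sim = None, -1.0
    for doc_id, vec in doc_vecs.items():
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_id, best_sim = doc_id, sim
    return best_id if best_sim >= threshold else OOD_RESPONSE
```

Calling train_step repeatedly with topic identifier 0 changes only topic_weights[0]; the entry for topic 1 and the base scalings stay as they were, which mirrors the claim's requirement that only the custom layer's weights are updated. In a real system the base layers would be those of a pretrained language model and the training signal would come from predicting the masked tokens of the passage.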