| CPC G06F 16/345 (2019.01) [G06F 40/295 (2020.01)] | 20 Claims |

|
1. A system comprising:
a non-transitory memory; and
one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising:
receiving a corpus of documents having text data from a plurality of sources that enable a generation of the documents in the corpus;
determining a domain having computing services that generate the documents in the corpus, wherein the computing services utilize the plurality of sources for the generation of the documents;
determining a named entity recognition (NER) selection module of a plurality of NER modules based on the domain, wherein the NER selection module enables identifications of different named entities found in the corpus based on a keyword analysis task associated with the domain and a category for the domain;
executing an unsupervised machine learning (ML) framework comprising the NER selection module and at least one unsupervised ML model configured for keyword analysis of the documents;
performing a keyword extraction of the documents in the corpus for a plurality of keywords based on the executing of the unsupervised ML framework, wherein the plurality of keywords include at least one of the different named entities;
weighting, based on the executing of the unsupervised ML framework, the plurality of keywords from the keyword extraction based on an occurrence of each of the plurality of keywords in the documents;
determining a keyword weighting setting associated with the domain in which the corpus of documents was generated for a service provider, wherein the keyword weighting setting indicates an elevated importance of one of the plurality of keywords for the domain;
reweighting at least one of the plurality of keywords based on the keyword weighting setting and a frequency that the one of the plurality of keywords occurs in the corpus of documents; and
outputting, based on the weighting and the reweighting, a searchable summarization of the corpus comprising the reweighted plurality of keywords, wherein the searchable summarization provides a search function of the plurality of keywords in the documents and correlations between each of the documents that are based on the occurrence of each of the plurality of keywords in each of the documents.
|
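
The weighting, reweighting, and searchable-summarization steps recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: plain token counting stands in for the NER selection module and the unsupervised ML model, and every name here (`extract_keywords`, `weight_keywords`, `reweight`, `build_index`, `search`, `correlation`, the `boosts` mapping, and the sample documents) is hypothetical.

```python
import re
from collections import Counter

def extract_keywords(doc, stopwords):
    """Stand-in for keyword extraction: lowercase tokens minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", doc.lower())
    return [t for t in tokens if t not in stopwords]

def weight_keywords(docs, stopwords):
    """Weight each keyword by its share of all keyword occurrences in the corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(extract_keywords(doc, stopwords))
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.items()}

def reweight(weights, boosts):
    """Scale frequency-based weights by a per-domain setting (assumed multiplier)."""
    return {kw: w * boosts.get(kw, 1.0) for kw, w in weights.items()}

def build_index(docs, stopwords):
    """Map each keyword to the set of document positions that contain it."""
    index = {}
    for i, doc in enumerate(docs):
        for kw in set(extract_keywords(doc, stopwords)):
            index.setdefault(kw, set()).add(i)
    return index

def search(index, keyword):
    """Return the sorted positions of documents containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))

def correlation(docs, stopwords, i, j):
    """Jaccard overlap of the keyword sets of two documents (one possible measure)."""
    a = set(extract_keywords(docs[i], stopwords))
    b = set(extract_keywords(docs[j], stopwords))
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical corpus and domain setting, for illustration only.
docs = [
    "Refund failed for card payment",
    "Card payment declined",
    "Login error on mobile app",
]
stopwords = {"for", "on"}
boosts = {"refund": 3.0}  # assumed setting: "refund" has elevated importance

weights = weight_keywords(docs, stopwords)
reweighted = reweight(weights, boosts)
index = build_index(docs, stopwords)
top_keyword = max(reweighted, key=reweighted.get)
```

In this sketch the domain setting multiplies the frequency-based weight, so a keyword that occurs only once can still outrank more frequent keywords when the domain marks it as important. The Jaccard overlap is only one way to express correlations between documents based on keyword occurrence; the claim does not specify a particular measure.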