CPC H04L 63/1433 (2013.01) [G06F 18/213 (2023.01); G06F 18/217 (2023.01); G06F 18/22 (2023.01); G06F 21/128 (2013.01); G06F 21/577 (2013.01); G06F 40/295 (2020.01); G06N 20/00 (2019.01)] | 11 Claims |
1. A method for improved cybersecurity named entity recognition comprising:
conducting a machine learning process, by a processor, including:
accessing a training dataset, by a processor, the training dataset including a list of predetermined keyword-category pairs, each of the list of predetermined keyword-category pairs matching a keyword related to a cyber security entity to a corresponding category of a list of categories associated with cybersecurity risks,
conducting a search of incident reports by the processor, to return a sentence list that includes one or more sentences containing a keyword of the training dataset,
applying, by the processor, a sentence categorization process to assign a category of the predetermined list of categories associated with cybersecurity to an ambiguous keyword from the one or more sentences based on semantic similarity and context of the ambiguous keyword within a given sentence, including:
assigning a similarity score to each keyword of a list of keywords from a sentence including the ambiguous keyword based on semantic similarity and context of the keyword within the sentence for an untested category,
assigning a similarity score to each keyword of the list of keywords based on the semantic similarity and context of the keyword within the sentence for the untested category,
comparing a total similarity score of the sentence for the untested category to a highest total similarity score of the sentence corresponding to a most similar category,
assigning the category having a higher total similarity score between the untested category and the most similar category as a new most similar category, and
pairing the ambiguous keyword to the new most similar category,
wherein the ambiguous keyword has multiple meanings such that prior to application of the sentence categorization process the ambiguous keyword corresponds to more than one category of the list of categories, and
supplementing the training dataset with the ambiguous keyword paired to the category assigned under the sentence categorization process; and
training a named entity recognition model using the training dataset, such that the named entity recognition model as trained is executable by the processor to categorize keyword types and resolve ambiguous keyword recognition.
|
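The search, sentence categorization, and dataset supplementing steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how similarity scores are computed, so a small hand-written lookup table (`SIMILARITY`) stands in for whatever semantic-similarity measure is actually used. The keywords, categories, incident reports, and all function names are hypothetical, and the final step of training the named entity recognition model is omitted.

```python
import re

# Hypothetical keyword-category pairs (toy stand-in for the training dataset).
# "backdoor" appears under two categories, which makes it ambiguous.
training_pairs = [
    ("ransomware", "malware"),
    ("phishing", "attack_technique"),
    ("backdoor", "malware"),
    ("backdoor", "vulnerability"),
]

# Hypothetical similarity table: (token, category) -> score.
# Assumption: a real system would derive these from a semantic model.
SIMILARITY = {
    ("attackers", "malware"): 0.5,
    ("installed", "malware"): 0.6,
    ("backdoor", "malware"): 0.5,
    ("backdoor", "vulnerability"): 0.5,
    ("vendor", "vulnerability"): 0.4,
    ("patched", "vulnerability"): 0.8,
    ("firmware", "vulnerability"): 0.7,
}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def ambiguous_keywords(pairs):
    """Map each keyword paired with more than one category to its categories."""
    categories = {}
    for keyword, category in pairs:
        categories.setdefault(keyword, set()).add(category)
    return {k: sorted(c) for k, c in categories.items() if len(c) > 1}


def search_reports(reports, keywords):
    """Return the sentences that contain at least one of the keywords."""
    sentence_list = []
    for report in reports:
        for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
            if set(tokenize(sentence)) & keywords:
                sentence_list.append(sentence)
    return sentence_list


def total_similarity(tokens, category):
    """Sum the per-keyword similarity scores of a sentence for one category."""
    return sum(SIMILARITY.get((token, category), 0.0) for token in tokens)


def categorize(sentence, categories):
    """Test each category in turn and keep the one with the highest total."""
    tokens = tokenize(sentence)
    most_similar, highest_total = None, float("-inf")
    for untested in categories:
        total = total_similarity(tokens, untested)
        if total > highest_total:  # untested category becomes the new best
            most_similar, highest_total = untested, total
    return most_similar


# Hypothetical incident reports.
reports = [
    "Attackers installed a backdoor on the mail server. The team rebuilt the host.",
    "The vendor patched a backdoor in the router firmware.",
]

ambiguous = ambiguous_keywords(training_pairs)
sentence_list = search_reports(reports, set(ambiguous))

# Pair each ambiguous keyword with the category chosen for its sentence.
resolved = []
for sentence in sentence_list:
    for keyword, categories in ambiguous.items():
        if keyword in tokenize(sentence):
            resolved.append((keyword, categorize(sentence, categories), sentence))

# Supplement the training data with the context-resolved pairs.
supplemented = training_pairs + [(k, c) for k, c, _ in resolved]

for keyword, category, sentence in resolved:
    print(f"{keyword!r} -> {category!r} in: {sentence}")
```

In this toy data the same keyword resolves to a different category in each sentence, because the surrounding tokens shift the total score. Keeping the sentence alongside each resolved pair is a design choice of the sketch, made so that a downstream model could learn the keyword in context.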