US 12,231,456 B2
System and method using a large language model (LLM) and/or regular expressions for feature extractions from unstructured or semi-structured data to generate ontological graph
Andrew Zawadowskiy, Hollis, NH (US); Oleg Bessonov, San Jose, CA (US); and Vincent Parla, North Hampton, NH (US)
Assigned to Cisco Technology, Inc., San Jose, CA (US)
Filed by Cisco Technology, Inc., San Jose, CA (US)
Filed on Jul. 28, 2023, as Appl. No. 18/361,405.
Claims priority of provisional application 63/493,552, filed on Mar. 31, 2023.
Prior Publication US 2024/0330365 A1, Oct. 3, 2024
Int. Cl. G06F 21/31 (2013.01); G06F 11/34 (2006.01); G06F 16/334 (2025.01); G06F 16/34 (2019.01); G06F 16/901 (2019.01); G06F 21/55 (2013.01); G06F 21/56 (2013.01); G06F 21/57 (2013.01); H04L 9/40 (2022.01)
CPC H04L 63/1433 (2013.01) [G06F 11/3476 (2013.01); G06F 16/334 (2019.01); G06F 16/345 (2019.01); G06F 16/9024 (2019.01); G06F 21/31 (2013.01); G06F 21/552 (2013.01); G06F 21/563 (2013.01); G06F 21/577 (2013.01); H04L 63/1425 (2013.01); H04L 63/145 (2013.01); H04L 63/1483 (2013.01); H04L 63/1491 (2013.01)]
17 Claims
 
1. A method of generating a graph from a data file comprising unstructured or semi-structured data, the method comprising:
applying the data file to a machine learning (ML) method and generating, from the data file, entities and relations between said entities, wherein the entities and the relations are constrained by a predefined ontology or a predefined schema;
generating a graph using the entities and relations that are generated from the data file;
applying the data file to the ML method to generate regular expressions, wherein the regular expressions comprise patterns for how the entities and the relations are expressed in the data file, and the regular expressions are constrained by the predefined ontology or the predefined schema;
generating other entities and other relations by parsing another data file using the regular expressions generated from the data file, the parsed other data file comprising the other entities and the other relations;
calculating a score based on comparing statistics of the other entities and/or the other relations to baseline statistics for the entities and/or the relations;
comparing the score to a predefined threshold; and
determining that one or more criteria are met when the score exceeds the predefined threshold, thereby indicating that the regular expressions are not effective for parsing the other data file; and
in response to the regular expressions generated from the data file being determined to not be effective for parsing the other data file, updating the regular expressions with additional entities and additional relations that are constrained to entity categories of the predefined ontology or the predefined schema.
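
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the ontology, the log format, the two regular expressions (hand-written stand-ins for patterns an ML method would generate), the choice of relation frequency as the statistic, and the threshold value are all hypothetical assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical ontology: the only entity categories and relation types allowed.
ONTOLOGY_ENTITIES = {"process", "file", "ip"}
ONTOLOGY_RELATIONS = ("process_opened_file", "process_connected_ip")

# Hand-written stand-ins for the regular expressions an ML method would generate.
# Each entry maps relation -> (pattern, source category, target category).
PATTERNS = {
    "process_opened_file": (
        re.compile(r"proc=(\S+) opened (\S+)"), "process", "file"),
    "process_connected_ip": (
        re.compile(r"proc=(\S+) connect (\d{1,3}(?:\.\d{1,3}){3})"), "process", "ip"),
}

def parse(lines):
    """Extract (source, relation, target) edges that conform to the ontology."""
    edges = []
    for line in lines:
        for relation, (pattern, src_cat, dst_cat) in PATTERNS.items():
            match = pattern.search(line)
            if not match:
                continue
            if relation in ONTOLOGY_RELATIONS and {src_cat, dst_cat} <= ONTOLOGY_ENTITIES:
                edges.append(((src_cat, match.group(1)), relation,
                              (dst_cat, match.group(2))))
            break  # at most one relation per line in this sketch
    return edges

def build_graph(edges):
    """Adjacency map: source node -> list of (relation, target node)."""
    graph = {}
    for src, relation, dst in edges:
        graph.setdefault(src, []).append((relation, dst))
    return graph

def relation_stats(edges, n_lines):
    """Fraction of input lines that yielded each relation type."""
    counts = Counter(relation for _, relation, _ in edges)
    return {rel: counts[rel] / n_lines for rel in ONTOLOGY_RELATIONS}

def drift_score(stats, baseline):
    """Sum of absolute differences between per-relation frequencies."""
    return sum(abs(stats[rel] - baseline[rel]) for rel in ONTOLOGY_RELATIONS)

# The original data file (hypothetical log format).
baseline_lines = [
    "proc=sshd opened /etc/passwd",
    "proc=curl connect 10.0.0.5",
    "proc=sshd opened /etc/shadow",
    "proc=curl connect 10.0.0.6",
]
# Another data file whose format has partly changed.
other_lines = [
    "proc=sshd opened /etc/passwd",
    "process curl -> tcp 10.0.0.5",
    "process wget -> tcp 10.0.0.7",
    "process nc -> tcp 10.0.0.9",
]

baseline_edges = parse(baseline_lines)
graph = build_graph(baseline_edges)
baseline = relation_stats(baseline_edges, len(baseline_lines))

other_edges = parse(other_lines)
score = drift_score(relation_stats(other_edges, len(other_lines)), baseline)

THRESHOLD = 0.3  # hypothetical predefined threshold
needs_update = score > THRESHOLD  # True means the patterns should be regenerated
print(score, needs_update)
```

In this sketch a score above the threshold only flags that the patterns no longer fit the other data file. The update step of the claim, in which the regular expressions are extended with additional ontology-constrained entities and relations, would be a further call to the ML method and is not shown.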