US 12,333,281 B2
Method and system for automated discovery of artificial intelligence (AI)/machine learning (ML) assets in an enterprise
Baskar Jayaraman, Fremont, CA (US); and Debashish Chatterjee, Fremont, CA (US)
Assigned to KONFER, INC., Fremont, CA (US)
Filed by KONFER, INC., Fremont, CA (US)
Filed on Apr. 5, 2023, as Appl. No. 18/296,171.
Claims priority of provisional application 63/328,206, filed on Apr. 6, 2022.
Prior Publication US 2023/0385037 A1, Nov. 30, 2023
Int. Cl. G06F 8/36 (2018.01); G06F 8/41 (2018.01)

CPC G06F 8/36 (2013.01) [G06F 8/433 (2013.01)]

14 Claims

1. A computer-implemented method for automatic discovery of artificial intelligence/machine learning (AI/ML) models, and parameters, data input and output specifications, and data transforms thereof in a production code repository using AI/ML, the method comprising:

a. automatically analyzing a plurality of source codes from a plurality of sources in conjunction with a production code repository, to identify a method of working on the plurality of source codes using the AI/ML, and wherein the plurality of sources comprise open source AI/ML libraries with Application Programming Interface (API) documentation and tagged/pre-classified code for the AI/ML models, including the data transforms and the data input and output specifications, and wherein automatically analyzing the plurality of source codes further includes:

recursively crawling through the plurality of source codes embedded in the production code repository;

selecting a source code file from the plurality of source codes based on an extension of the source code file; and

building a knowledge graph in a knowledge graph database for the selected source code file, based on significant characteristics of the selected source code file, and wherein the significant characteristics include imported libraries, classes, methods, functions, and variables referenced, set, and used in the selected source code file from the imported libraries, and wherein the significant characteristics further include the selected source code file and line numbers of the selected source code file;

b. performing a semantic match for the plurality of source codes embedded in the plurality of sources in conjunction with the production code repository, and wherein the semantic match provides location and identification of the plurality of sources, and the parameters thereof in the production code repository, and wherein the semantic match further provides location, identification, classification and definition of the data transforms and the data input and output specifications of the tagged/pre-classified code in the production code repository; and

c. generating a graphical view of data and a source code flow, and wherein the graphical view of the data tracks a flow of data through the plurality of source codes and depicts that the data is read into a variable, and the data read into the variable is fed into a function that fits a machine learning (ML) model, and wherein the graphical view of the source code flow is a graph depicting function calls, and wherein the function calls read files and train the ML model.
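
For illustration only, the crawling, extension-based selection, and graph-building sub-steps of step (a), together with the data-flow tracking of step (c), might be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: it assumes Python source files only, uses the standard-library `ast` module (Python 3.9 or later for `ast.unparse`) in place of any AI/ML-based analysis, and stands in a plain dictionary for the knowledge graph database. The names `SOURCE_EXTENSIONS`, `crawl`, `extract_facts`, `build_graph`, and `data_flow_edges` are hypothetical and do not appear in the patent.

```python
import ast
import os
from collections import defaultdict

# Assumption: only Python files are treated as source code in this sketch.
SOURCE_EXTENSIONS = {".py"}


def crawl(root):
    """Recursively yield paths under root, selected by file extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                yield os.path.join(dirpath, name)


def extract_facts(path, source):
    """Return (kind, name, line) triples for one source file.

    Kinds: imports, defines (functions/classes), calls, sets (variables).
    """
    facts = []
    for node in ast.walk(ast.parse(source, filename=path)):
        if isinstance(node, ast.Import):
            facts += [("imports", a.name, node.lineno) for a in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or "."  # relative import without a module
            facts += [("imports", f"{module}.{a.name}", node.lineno)
                      for a in node.names]
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            facts.append(("defines", node.name, node.lineno))
        elif isinstance(node, ast.Call):
            facts.append(("calls", ast.unparse(node.func), node.lineno))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    facts.append(("sets", target.id, node.lineno))
    return facts


def build_graph(root):
    """Map each selected file to its facts; a dict stands in for a graph DB."""
    graph = defaultdict(list)
    for path in crawl(root):
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        graph[path].extend(extract_facts(path, source))
    return dict(graph)


def data_flow_edges(source):
    """Return (producer, consumer, line) edges.

    An edge runs from a call to the variable it is assigned to, and from a
    variable to any call that receives it as an argument.
    """
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    producer = ast.unparse(node.value.func)
                    edges.append((producer, target.id, node.lineno))
        if isinstance(node, ast.Call):
            callee = ast.unparse(node.func)
            arguments = node.args + [kw.value for kw in node.keywords]
            for argument in arguments:
                for sub in ast.walk(argument):
                    if isinstance(sub, ast.Name):
                        edges.append((sub.id, callee, node.lineno))
    return edges
```

Under these assumptions, a file that reads a CSV into a variable `df` with `pd.read_csv` and then passes `df` to `model.fit` would yield an edge from `pd.read_csv` to `df` and an edge from `df` to `model.fit`, which is the kind of read-then-fit flow that the graphical view of step (c) depicts. The semantic match of step (b) is not modeled here.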