US 12,443,396 B2
Identification of relevant code block within relevant software package for a query
Carl Emil Orm Wareus, Malmo (SE); Nils Valdemar Barr Zeilon, Kista (SE); and Per Filip Heden, Kista (SE)
Assigned to Micro Focus LLC, Santa Clara, CA (US)
Filed by MICRO FOCUS LLC, Santa Clara, CA (US)
Filed on Nov. 23, 2022, as Appl. No. 17/993,508.
Prior Publication US 2024/0168728 A1, May 23, 2024
Int. Cl. G06F 8/36 (2018.01); G06F 8/73 (2018.01); G06F 16/2455 (2019.01); G06F 16/28 (2019.01)
CPC G06F 8/36 (2013.01) [G06F 8/73 (2013.01); G06F 16/2455 (2019.01); G06F 16/285 (2019.01)] 15 Claims
 
1. A method comprising:
for each of a plurality of software packages:
generating, by a processor, code block embeddings respectively representing a plurality of code blocks of the software package, wherein:
the software package comprises source code in a computer programming language that is interpretable or compilable for execution by a computing device,
the plurality of code blocks each correspond to a function, class, object, or method of the source code that is individually reusable in development of a software project, and
the code block embeddings each comprise a vector of programming language syntax present in the source code of a corresponding one of the code blocks represented by the code block embedding;
clustering, by the processor, the code block embeddings into functionality clusters;
generating, by the processor, functionality embeddings, wherein:
the functionality embeddings each represent a corresponding one of the functionality clusters,
the functionality embeddings each comprise a vector, and
each functionality embedding is generated by combining the vectors of the code block embeddings of the functionality cluster into the vector of the functionality embedding representing the corresponding one of the functionality clusters that is represented by the functionality embedding;
generating, by the processor, a software package embedding representing the software package, using either the functionality embeddings or both the functionality embeddings and the code block embeddings, wherein:
the software package embedding comprises a vector, and
the software package embedding is generated at least by combining the vectors of the functionality embeddings into the vector of the software package embedding;
storing, by the processor, the software package embedding, the functionality embeddings, and the code block embeddings in a database;
generating, by the processor, a query embedding representing a query;
querying, by the processor, the database using the query embedding to identify a relevant code block within a relevant software package for the query; and
returning, by the processor, the relevant code block within the relevant software package that has been identified.
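The steps of claim 1 can be illustrated with a minimal, self-contained Python sketch. Everything in it is an assumption made for illustration, not the patented implementation. The fixed keyword list `VOCAB` stands in for the "vector of programming language syntax". A greedy threshold clustering stands in for the unspecified clustering step, and element-wise averaging is one possible way of "combining" vectors. A plain Python list stands in for the database, and the query is resolved top-down (package, then functionality cluster, then code block). The helper names `embed_text`, `combine`, `cosine`, `cluster`, `index_package`, and `query` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical syntax vocabulary; the claim only says the code block
# embedding is a vector of programming language syntax in the source.
VOCAB = ["def", "class", "return", "for", "while", "if", "import",
         "open", "read", "write", "json", "http", "request"]


def embed_text(text):
    """Count VOCAB tokens in text and L2-normalize (assumed scheme)."""
    counts = Counter(re.findall(r"[A-Za-z_]+", text.lower()))
    vec = [float(counts[token]) for token in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def combine(vectors):
    """Combine vectors by element-wise mean (one possible 'combining')."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy clustering: add each vector to the first cluster whose
    centroid is at least `threshold` similar, else open a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for members in clusters:
            centroid = combine([vectors[j] for j in members])
            if cosine(vec, centroid) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def index_package(name, code_blocks, db):
    """Embed blocks, cluster them, build functionality and package
    embeddings, and store all three levels in `db` (a plain list)."""
    block_vecs = [embed_text(src) for src in code_blocks]
    groups = cluster(block_vecs)
    func_vecs = [combine([block_vecs[i] for i in g]) for g in groups]
    db.append({
        "name": name,
        "vec": combine(func_vecs),  # software package embedding
        "functionalities": [
            {"vec": fv, "blocks": [(code_blocks[i], block_vecs[i]) for i in g]}
            for fv, g in zip(func_vecs, groups)
        ],
    })


def query(db, text):
    """Embed the query, then pick the best package, the best cluster in
    it, and the best code block in that cluster (greedy top-down)."""
    q = embed_text(text)
    pkg = max(db, key=lambda p: cosine(q, p["vec"]))
    func = max(pkg["functionalities"], key=lambda f: cosine(q, f["vec"]))
    block, _ = max(func["blocks"], key=lambda b: cosine(q, b[1]))
    return pkg["name"], block
```

For example, after indexing one package whose blocks read and write files and another whose blocks make HTTP requests and parse JSON, `query(db, "read a file with open")` returns the file-reading block of the first package. The top-down search only scores code blocks inside the best-matching cluster, which keeps it cheap, but it is greedy and can miss a better block elsewhere. A flat search over every stored code block embedding is an alternative, since the claim does not specify how the database is queried.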