US 11,853,745 B2
Methods and systems for automated open source software reuse scoring
Ashok Balasubramanian, Chennai (IN); Karthikeyan Krishnaswamy Raja, Chennai (IN); Meenakshisundaram Chinnappan, Chennai (IN); and Lakshmipathy Ganesh Eswaran, Chennai (IN)
Assigned to Open Weaver Inc., Miami, FL (US)
Filed by Open Weaver Inc., Miami, FL (US)
Filed on Feb. 22, 2022, as Appl. No. 17/676,987.
Claims priority of provisional application 63/154,354, filed on Feb. 26, 2021.
Prior Publication US 2022/0276860 A1, Sep. 1, 2022
Int. Cl. G06F 8/70 (2018.01); G06F 8/36 (2018.01)

CPC G06F 8/70 (2013.01) [G06F 8/36 (2013.01)]

18 Claims

1. A system for automatically scoring open-source libraries on a state of reuse in a software project, the system comprising:

one or more processors and memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:

extracting information about the software project from a source code repository;

determining whether the extracted information includes information regarding any forked projects;

calculating, upon determining that a forked project is included, a useful fork reuse score for the forked project based on source code attributes including a source code class;

creating a tree structure for the source code class;

identifying functions from the tree structure of the source code class;

identifying similar code sections from the two source code files;

calculating a code attributes reuse score based on the similar code sections;

calculating a dependent consumption reuse score which indicates how much a function is reused by a dependent class; and

calculating a unified reuse score based on the useful fork reuse score of the forked project and the dependent consumption reuse score for the analyzed project;

wherein calculating the useful fork reuse score comprises:

collecting data of commit history records associated with source code of forked open-source projects;

retrieving each commit history record with the date and timestamp, and the number of files affected by each commit in the commit history record;

determining a number of commits performed during a defined interval to generate a source code commit activity score, wherein whether the fork is active or not is dependent on the source code commit activity score;

selecting useful forks by verifying whether regular commits are happening to a forked repository and ignoring other forks based on one of: no activity and activity being less than a threshold limit;

validating the forked project for its usefulness based on commit history trends of the forked project and a parent project;

comparing respective source code commit history rates of the parent project and the forked project to generate a weighted score based on increased or decreased rate of the commits;

combining the respective source code commit history rates of the parent project and the forked project to generate a final score for the forked projects by further comparing their scores against a set threshold baseline score; and

determining, via the scores, the useful fork reuse score of the forked project.
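
For illustration only, the useful fork reuse score steps recited above can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Commit` record, the 90-day interval, the `min_commits` threshold, the `baseline` value, and the weighting formula (the fork's share of the combined commit rate) are all hypothetical choices that the claim does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    """One commit history record (hypothetical shape)."""
    timestamp: datetime  # date and timestamp of the commit
    files_changed: int   # files affected; carried in the record, unused by this sketch


def commit_activity_score(commits, now, window_days=90):
    """Count the commits made during the interval ending at `now`."""
    start = now - timedelta(days=window_days)
    return sum(1 for c in commits if start <= c.timestamp <= now)


def commit_rate(commits, now, window_days=90):
    """Commits per day over the same interval."""
    return commit_activity_score(commits, now, window_days) / window_days


def useful_fork_reuse_score(fork_commits, parent_commits, now,
                            min_commits=5, baseline=0.5, window_days=90):
    """Return 0.0 for forks that are ignored, otherwise a score in (0, 1]."""
    activity = commit_activity_score(fork_commits, now, window_days)
    # Ignore forks with no activity or activity below the threshold limit.
    if activity == 0 or activity < min_commits:
        return 0.0
    fork_rate = commit_rate(fork_commits, now, window_days)
    parent_rate = commit_rate(parent_commits, now, window_days)
    # Assumed weighting: the fork's share of the combined commit rate,
    # so a fork that outpaces its parent scores above 0.5.
    weighted = fork_rate / (fork_rate + parent_rate)
    # Compare against the assumed baseline; forks below it are not useful.
    return weighted if weighted >= baseline else 0.0
```

Under these assumptions, a fork with ten commits in the interval against a parent with five would score about 0.67, while a fork with only two commits would be ignored and score 0.0.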