US 12,474,907 B2
Method and system for matching source code and binary code
Sanket Achal Sinha, Pune (IN); Venkata Rajan Mindigal Alasingara Bhattachar, Bangalore (IN); Vaibhav Harihar Agasti, Pune (IN); Kumar Mansukhlal Vidhani, Pune (IN); and Sachin Premsukh Lodha, Pune (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Jan. 9, 2024, as Appl. No. 18/408,164.
Claims priority of application No. 202321012791 (IN), filed on Feb. 24, 2023.
Prior Publication US 2024/0289102 A1, Aug. 29, 2024
Int. Cl. G06F 8/41 (2018.01)
CPC G06F 8/433 (2013.01)
12 Claims
1. A processor implemented method, comprising:
receiving, via one or more hardware processors, a source code file and a binary file of an application as an input;
generating, via the one or more hardware processors, an intermediate representation of the source code file;
generating, via the one or more hardware processors, an intermediate representation of the binary file;
generating, via the one or more hardware processors, a matching score for each of a plurality of code fragments in the intermediate representation of the source code file for each of a plurality of binary fragments in the intermediate representation of the binary file, further comprising:
computing an equivalence for each code fragment—binary fragment pair from among a plurality of code fragment—binary fragment pairs in the intermediate representation of the source code file and the intermediate representation of the binary file, by comparing individual parts of a plurality of program statements in the intermediate representation of the source code file and the intermediate representation of the binary file;
linking a plurality of operations in the intermediate representation of the source code file and the intermediate representation of the binary file, based on a determined data dependency between the plurality of program statements in the intermediate representation of the source code file and the intermediate representation of the binary file; and
generating the matching score based on a) the computed equivalence for the plurality of code fragment—binary fragment pairs, and b) the linking between the plurality of operations;
refining, via the one or more hardware processors, the generated matching score, if more than one program segment is identified as having an identical matching score; and
matching, via the one or more hardware processors, each of a plurality of code fragments with associated one or more of the plurality of binary fragments, based on the refined matching score.
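
For illustration only, the steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not specify the three-address statement form (Stmt), the choice of compared parts (opcode, operand count, constant operands), the equal weighting of equivalence and linking, or the opcode-order tie-break used for refinement; all of these are hypothetical. The sketch also returns a single best binary fragment per code fragment, whereas the claim allows one or more.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import NamedTuple


class Stmt(NamedTuple):
    """Hypothetical IR statement: dest = op(args)."""
    dest: str
    op: str
    args: tuple


def stmt_parts(s):
    # Parts compared individually: opcode, operand count, constant operands.
    # Variable and register names are ignored because they differ between IRs.
    consts = tuple(sorted(a for a in s.args if isinstance(a, int)))
    return (s.op, len(s.args), consts)


def equivalence(src, bin_):
    # Share of statements whose parts agree (multiset intersection).
    a = Counter(stmt_parts(s) for s in src)
    b = Counter(stmt_parts(s) for s in bin_)
    total = max(len(src), len(bin_))
    return sum((a & b).values()) / total if total else 0.0


def dependency_links(frag):
    # Def-use edges, recorded as (producer opcode, consumer opcode).
    last_def, links = {}, Counter()
    for s in frag:
        for a in s.args:
            if a in last_def:
                links[(last_def[a], s.op)] += 1
        last_def[s.dest] = s.op
    return links


def link_similarity(src, bin_):
    a, b = dependency_links(src), dependency_links(bin_)
    total = max(sum(a.values()), sum(b.values()))
    # Two fragments without any dependencies agree trivially.
    return sum((a & b).values()) / total if total else 1.0


def matching_score(src, bin_):
    # Assumed weighting: equal parts equivalence and linking.
    return 0.5 * equivalence(src, bin_) + 0.5 * link_similarity(src, bin_)


def match_fragments(src_frags, bin_frags):
    # Assumes bin_frags is non-empty.
    result = {}
    for name, src in src_frags.items():
        scores = {b: matching_score(src, f) for b, f in bin_frags.items()}
        best = max(scores.values())
        tied = [b for b, sc in scores.items() if sc == best]
        if len(tied) > 1:
            # Refinement (assumed): prefer the candidate whose opcode
            # order is closest to that of the source fragment.
            src_ops = [s.op for s in src]
            tied.sort(
                key=lambda b: SequenceMatcher(
                    None, src_ops, [s.op for s in bin_frags[b]]
                ).ratio(),
                reverse=True,
            )
        result[name] = tied[0]
    return result


src_frags = {
    "add_mul": [Stmt("t1", "add", ("a", "b")), Stmt("t2", "mul", ("t1", 2))],
    "add_sub": [Stmt("t1", "add", ("a", "b")), Stmt("t2", "sub", ("c", "d"))],
}
bin_frags = {
    "f2": [Stmt("r3", "mul", ("r0", 2)), Stmt("r1", "add", ("r3", "r2"))],
    "f1": [Stmt("r1", "add", ("r0", "r2")), Stmt("r3", "mul", ("r1", 2))],
    "g2": [Stmt("r2", "sub", ("r4", "r5")), Stmt("r1", "add", ("r0", "r3"))],
    "g1": [Stmt("r1", "add", ("r0", "r3")), Stmt("r2", "sub", ("r4", "r5"))],
}
matches = match_fragments(src_frags, bin_frags)
```

In this example, f1 and f2 contain the same operations, so part-wise equivalence alone cannot separate them; the data-dependency linking does, because only f1 feeds the result of the add into the mul. The candidates g1 and g2 receive identical scores for add_sub, so the assumed refinement step decides between them.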