US 11,934,458 B2
Binary code similarity detection system
Yuede Ji, Washington, DC (US); and Hao Howie Huang, McLean, VA (US)
Assigned to The George Washington University, Washington, DC (US)
Filed by The George Washington University, Washington, DC (US)
Filed on May 21, 2021, as Appl. No. 17/327,351.
Claims priority of provisional application 63/028,700, filed on May 22, 2020.
Prior Publication US 2022/0244953 A1, Aug. 4, 2022
Int. Cl. G06F 16/901 (2019.01); G06F 8/41 (2018.01); G06F 8/75 (2018.01); G06F 21/56 (2013.01); G06N 3/084 (2023.01)
CPC G06F 16/9024 (2019.01) [G06F 8/41 (2013.01); G06F 8/751 (2013.01); G06F 21/563 (2013.01); G06N 3/084 (2013.01)]
18 Claims
1. A method for comparing a source code and a target binary code, wherein the source code cannot be directly compared to the target binary code, the method comprising:
identifying a target compiling configuration of the target binary code, the target compiling configuration indicating the compilation configuration for the target binary code, wherein identifying the target compiling configuration comprises:
generating an attributed function call graph of the target binary code and each of the binary codes in a training dataset, where the attributed function call graph represents a calling relationship between functions in the target binary code and the binary codes in the training dataset;
training a graph attention network on the attributed function call graph of the binary codes in the training dataset; and
identifying the target compiling configuration, by the graph attention network, based on the attributed function call graph of the target binary code;
generating a comparing binary for the source code, the comparing binary compiled from the source code using the target compiling configuration; and
comparing the target binary code to the generated comparing binary to determine a similarity between the source code and the target binary code.
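The flow of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical. The graph attention network is not reproduced here: the configuration identifier is passed in as a plain callable (identify_config), the compiler step is passed in as another callable (compile_to_graph), and the final comparison uses a simple Jaccard overlap of attributed nodes and call edges as a stand-in for whatever similarity measure an actual system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class AttributedCallGraph:
    """Hypothetical container: per-function attributes plus caller -> callees edges."""
    attributes: Dict[str, Tuple[float, ...]]
    calls: Dict[str, Set[str]] = field(default_factory=dict)


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def graph_similarity(g1: AttributedCallGraph, g2: AttributedCallGraph) -> float:
    """Stand-in similarity (an assumption): mean of node and edge overlap."""
    nodes1, nodes2 = set(g1.attributes.items()), set(g2.attributes.items())
    edges1 = {(src, dst) for src, dsts in g1.calls.items() for dst in dsts}
    edges2 = {(src, dst) for src, dsts in g2.calls.items() for dst in dsts}
    return 0.5 * jaccard(nodes1, nodes2) + 0.5 * jaccard(edges1, edges2)


def compare_source_to_binary(
    source: str,
    target_graph: AttributedCallGraph,
    identify_config: Callable[[AttributedCallGraph], str],
    compile_to_graph: Callable[[str, str], AttributedCallGraph],
) -> Tuple[str, float]:
    # Step 1: identify the target compiling configuration from the target's graph.
    config = identify_config(target_graph)
    # Step 2: build the comparing binary from the source with that configuration.
    comparing_graph = compile_to_graph(source, config)
    # Step 3: compare the target binary to the comparing binary.
    return config, graph_similarity(target_graph, comparing_graph)


# Toy data (invented for illustration): one attribute per function.
target = AttributedCallGraph({"main": (3.0,), "helper": (1.0,)}, {"main": {"helper"}})
builds = {
    "O2": AttributedCallGraph({"main": (3.0,), "helper": (1.0,)}, {"main": {"helper"}}),
    "O0": AttributedCallGraph({"main": (5.0,), "helper": (1.0,)}, {"main": {"helper"}}),
}

# Correctly identified configuration versus a wrongly identified one.
config, score = compare_source_to_binary(
    "demo.c", target, lambda g: "O2", lambda src, cfg: builds[cfg]
)
_, mismatched = compare_source_to_binary(
    "demo.c", target, lambda g: "O0", lambda src, cfg: builds[cfg]
)
```

In this toy setup, compiling with the wrongly identified configuration yields a lower score than compiling with the matching one, which is the motivation for identifying the target compiling configuration before the comparison step.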