CPC G06F 21/563 (2013.01) [G06F 8/74 (2013.01)] | 8 Claims |
1. A method for intelligent determination of similarity of big data mobile softwares based on descriptive entropy, comprising the following steps:
S1, acquiring a path for each of the mobile softwares and reading the mobile softwares according to the paths;
S2, performing a preliminary reverse-engineering decompilation on each of the mobile softwares to acquire function characteristics for each of the mobile softwares;
S3, summarizing a descriptive entropy distribution for each of the mobile softwares through descriptive entropies in the function characteristics;
S4, integrating the descriptive entropies of the mobile softwares, comparing the descriptive entropy distributions of mobile software pairs based on the integrated descriptive entropy distributions, and calculating similarity scores of the mobile software pairs; and
S5, outputting the similarity scores of the mobile softwares to give a mobile software similarity result; wherein
in step S2, the preliminary reverse-engineering decompilation specifically comprises:
acquiring source codes for each of the mobile softwares using a decompilation tool, acquiring function compression codes for each of the mobile softwares through the source codes, and calculating a floating point number representing an amount of information of a function or class, that is, the descriptive entropy, from each of the function compression codes by the following formula:
$$H_d(\mathrm{substr}_i) = -\sum_{i=0}^{n} p(\mathrm{substr}_i)\,\log_2 p(\mathrm{substr}_i)$$
wherein, assuming that each of the function compression codes has n substrings, $\mathrm{substr}_i$ is the i-th substring of the function compression code, and $p(\mathrm{substr}_i)$ is the occurrence probability of the i-th substring; and
storing the function compression codes, descriptive entropies, and hash values for the mobile softwares in corresponding text files.
|
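The descriptive entropy formula in step S2 can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not state: the function compression code is taken to be already split into a list of substrings, the occurrence probability of a substring is taken to be its relative frequency within that list, and the sum is taken once per distinct substring (the usual Shannon entropy reading). The names `descriptive_entropy` and `split_fixed` are hypothetical and serve only for illustration; in particular, fixed-length splitting is just one possible way to obtain substrings.

```python
import math
from collections import Counter
from typing import List, Sequence


def split_fixed(code: str, width: int) -> List[str]:
    """Hypothetical helper: cut a compression code into fixed-width substrings.

    The claim does not say how substrings are formed; this is an assumption.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [code[i:i + width] for i in range(0, len(code), width)]


def descriptive_entropy(substrings: Sequence[str]) -> float:
    """Shannon entropy, in bits, of the substring frequencies.

    Assumes p(substr_i) is the relative frequency of substr_i and that the
    sum runs over distinct substrings.
    """
    total = len(substrings)
    if total == 0:
        return 0.0  # convention chosen here for an empty code
    counts = Counter(substrings)
    # p * log2(1/p) is the same quantity as -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    parts = split_fixed("ABABCD", 2)  # ["AB", "AB", "CD"]
    print(parts, descriptive_entropy(parts))
```

A code made of one repeated substring yields an entropy of 0, while a code whose substrings are all different yields the maximum value, log2 of the number of substrings. The result is the floating point number that the claim associates with each function or class.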