US 11,838,330 B2
Selective information extraction from network traffic traces both encrypted and non-encrypted
Ahmad Amro, Dhahran (SA); and Sultan Ahmad Almuhammadi, Dhahran (SA)
Assigned to King Fahd University of Petroleum and Minerals, Dhahran (SA)
Filed by King Fahd University of Petroleum and Minerals, Dhahran (SA)
Filed on May 16, 2018, as Appl. No. 15/981,431.
Claims priority of provisional application 62/608,400, filed on Dec. 20, 2017.
Prior Publication US 2019/0190960 A1, Jun. 20, 2019
Int. Cl. G06F 21/60 (2013.01); H04L 43/04 (2022.01); H04L 43/08 (2022.01); H04L 9/40 (2022.01); G06F 21/70 (2013.01); H04L 43/065 (2022.01)
CPC H04L 63/30 (2013.01) [H04L 43/04 (2013.01); H04L 43/08 (2013.01); H04L 63/0464 (2013.01); H04L 63/067 (2013.01); H04L 63/1408 (2013.01); H04L 63/166 (2013.01); H04L 63/302 (2013.01); H04L 43/065 (2013.01); H04L 63/0414 (2013.01); H04L 63/0428 (2013.01)] 16 Claims
 
1. A system for information extraction from network traffic traces that are both encrypted and non-encrypted, the system comprising:
a client computer configured to set a session key log file environment variable, such that when the client computer launches a supported browser, a session key log file (KLF) is created, capture the computer network traffic traces by retrieving data from encrypted traffic, the encrypted traffic being part of a data exchange over the World Wide Web, and transfer the KLF and captured traffic traces to a remote server; and
a remote computer performing traffic mining to analyze the captured traffic traces and extract sensitive pieces of information,
wherein the performing traffic mining includes analyzing the captured traffic traces and identifying potential obfuscated login credentials using heuristics,
wherein the identifying potential obfuscated login credentials includes:
detecting a login request by filtering for Post methods, Content-type headers including URL-encoded forms and JSON, and the target URL of the login; and
extracting parameters, including username and password parameters, by searching for parameter names that match keywords;
wherein when the parameter name does not match a keyword, the parameter name will be passed to a heuristic engine to determine how related the parameter is to the respective credential, which is a username or password, and decide whether the parameter is a username, password, or should be ignored, the heuristic engine using a set of low-level heuristics to compute and assign a pair of scores for the parameter indicating how the parameter is related to the respective credential.
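
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions that the claim does not specify: the keyword lists, the substring hints, the scoring weights, the threshold, and every function name are hypothetical. The sketch also assumes that the "session key log file environment variable" is SSLKEYLOGFILE, the variable honored by browsers such as Firefox and Chrome. It shows (a) building an environment in which a launched browser would write a KLF, (b) filtering for a login request by method, content type, and target URL, and (c) labeling parameters by keyword match, falling back to a pair of heuristic scores when no keyword matches.

```python
import json
import os
from urllib.parse import parse_qsl

# Hypothetical keyword lists; the claim does not enumerate them.
USERNAME_KEYWORDS = {"username", "user", "login", "email"}
PASSWORD_KEYWORDS = {"password", "passwd", "pass", "pwd"}
FORM_TYPES = ("application/x-www-form-urlencoded", "application/json")


def keylog_env(path, base_env=None):
    """Return a copy of the environment with SSLKEYLOGFILE set (assumed variable)."""
    env = dict(os.environ if base_env is None else base_env)
    env["SSLKEYLOGFILE"] = path
    return env


def media_type_of(content_type):
    """Strip parameters such as charset from a Content-Type header value."""
    return content_type.split(";")[0].strip().lower()


def is_login_request(method, content_type, url, login_url):
    """Filter for POST requests with a form or JSON body sent to the login URL."""
    return (
        method.upper() == "POST"
        and media_type_of(content_type) in FORM_TYPES
        and url == login_url
    )


def parse_params(content_type, body):
    """Decode a URL-encoded or JSON body into a flat name -> value dict."""
    if media_type_of(content_type) == "application/json":
        data = json.loads(body)
        if not isinstance(data, dict):
            return {}
        return {str(k): str(v) for k, v in data.items()}
    return dict(parse_qsl(body, keep_blank_values=True))


def heuristic_scores(name, value):
    """Toy low-level heuristics returning (username_score, password_score)."""
    lowered = name.lower()
    user_score = 0
    pass_score = 0
    if any(hint in lowered for hint in ("usr", "uid", "mail", "acct")):
        user_score += 2
    if "@" in value:
        user_score += 1
    if any(hint in lowered for hint in ("pw", "secret", "key")):
        pass_score += 2
    has_digit = any(c.isdigit() for c in value)
    has_alpha = any(c.isalpha() for c in value)
    if len(value) >= 8 and has_digit and has_alpha:
        pass_score += 1
    return user_score, pass_score


def classify(name, value, threshold=2):
    """Label a parameter as 'username', 'password', or 'ignore'."""
    lowered = name.lower()
    if lowered in USERNAME_KEYWORDS:
        return "username"
    if lowered in PASSWORD_KEYWORDS:
        return "password"
    # No keyword match: fall back to the pair of heuristic scores.
    user_score, pass_score = heuristic_scores(name, value)
    if max(user_score, pass_score) < threshold or user_score == pass_score:
        return "ignore"
    return "username" if user_score > pass_score else "password"


def extract_credentials(content_type, body):
    """Return the first username and password candidates found in a request body."""
    found = {}
    for name, value in parse_params(content_type, body).items():
        label = classify(name, value)
        if label != "ignore":
            found.setdefault(label, value)
    return found
```

As a usage illustration, a body such as `x_usr=alice%40example.com&x_pw=s3cretpass1&csrf=abc123` contains no keyword-named parameters, so every parameter goes through the scoring fallback; under the toy weights above, `x_usr` and `x_pw` are labeled as username and password, and `csrf` is ignored. A real heuristic engine would presumably use a richer set of low-level heuristics than these few substring and value-shape checks.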