US 12,254,116 B2
System and method for detecting and obfuscating confidential information in task logs
Pratap Dande, Saint Johns, FL (US); Akhila Mylaram, Frisco, TX (US); Gilberto R. Dos Santos, Jacksonville, FL (US); and JayaBalaji Murugan, Tamil Nadu (IN)
Assigned to Bank of America Corporation, Charlotte, NC (US)
Filed by Bank of America Corporation, Charlotte, NC (US)
Filed on Mar. 28, 2024, as Appl. No. 18/620,470.
Application 18/620,470 is a continuation of application No. 17/744,930, filed on May 16, 2022, granted, now U.S. Pat. No. 12,013,970.
Prior Publication US 2024/0265143 A1, Aug. 8, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/62 (2013.01); G06F 16/31 (2019.01)
CPC G06F 21/6254 (2013.01) [G06F 16/31 (2019.01)]
20 Claims
 
1. A system, comprising:
a memory configured to store:
a plurality of task logs comprising text that is confidential information; and
a training dataset comprising a set of keywords associated with the confidential information;
a processor operably coupled with the memory, and configured to:
access the plurality of task logs;
for a first task log from among the plurality of task logs:
select a first portion of the first task log;
for a first word in the first portion:
compare the first word with each of the set of keywords, wherein comparing the first word with each of the set of keywords comprises:
 extracting a first set of features from the first word, wherein the first set of features indicates a first identity of the first word, wherein the first set of features is represented by a first vector of numerical values;
 extracting a second set of features from a second word of the set of keywords, wherein the second set of features indicates a second identity of the second word, wherein the second set of features is represented by a second vector of numerical values; and
 comparing the first vector with the second vector;
determine that the first word is among the set of keywords, wherein determining that the first word is among the set of keywords comprises:
 determining a percentage of numerical values in the first vector that correspond to counterpart numerical values in the second vector;
 comparing the determined percentage of numerical values in the first vector that correspond to the counterpart numerical values in the second vector to a threshold percentage; and
 determining that the determined percentage of numerical values exceeds the threshold percentage;
determine a hierarchical relationship between the first word and neighboring words in the first portion, wherein the hierarchical relationship between the first word and the neighboring words indicates whether or not the first word is associated with each of the neighboring words;
determine that the hierarchical relationship between the first word and the neighboring words indicates that the first word is associated with at least a third word in the first portion;
generate a template pattern comprising the first word and the third word, wherein the template pattern indicates that the first word and the third word are among the confidential information; and
obfuscate the first word and the third word.
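
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify how features are extracted or how the hierarchical relationship is determined, so the choices below are assumptions made only for illustration: character codes stand in for the feature vectors, a "keyword separator value" layout (with "=" or ":" as the separator) stands in for the hierarchical relationship, and the names extract_features, match_percentage, is_keyword, find_template_patterns, and obfuscate, the 80 percent threshold, and the "*" mask are all hypothetical.

```python
import re

SEPARATORS = {"=", ":"}  # assumed markers linking a keyword to its value


def extract_features(word):
    """Assumed feature extractor: a vector of lowercase character codes."""
    return [ord(ch) for ch in word.lower()]


def match_percentage(first_vector, second_vector):
    """Percentage of positions whose numerical values agree.

    Dividing by the longer length keeps short words from matching long ones.
    """
    longest = max(len(first_vector), len(second_vector))
    if longest == 0:
        return 0.0
    matches = sum(1 for a, b in zip(first_vector, second_vector) if a == b)
    return 100.0 * matches / longest


def is_keyword(word, keywords, threshold=80.0):
    """True if the word's vector exceeds the threshold against any keyword."""
    vector = extract_features(word)
    return any(
        match_percentage(vector, extract_features(keyword)) > threshold
        for keyword in keywords
    )


def find_template_patterns(portion, keywords, threshold=80.0):
    """Return (first_word, third_word) pairs found in a portion of a task log.

    Assumed relationship rule: a keyword followed by a separator is
    associated with the word right after that separator.
    """
    tokens = re.findall(r"[^\s=:]+|[=:]", portion)
    patterns = []
    for i, token in enumerate(tokens):
        if token in SEPARATORS or not is_keyword(token, keywords, threshold):
            continue
        if (
            i + 2 < len(tokens)
            and tokens[i + 1] in SEPARATORS
            and tokens[i + 2] not in SEPARATORS
        ):
            patterns.append((token, tokens[i + 2]))
    return patterns


def obfuscate(portion, patterns, mask="*"):
    """Mask both words of every template pattern, keeping the separator."""
    for first_word, third_word in patterns:
        regex = re.compile(
            "(" + re.escape(first_word) + r")(\s*[=:]\s*)(" + re.escape(third_word) + ")"
        )
        portion = regex.sub(
            lambda m: mask * len(m.group(1)) + m.group(2) + mask * len(m.group(3)),
            portion,
        )
    return portion


if __name__ == "__main__":
    keywords = {"password", "ssn"}  # stands in for the training dataset
    line = "login user=alice password=hunter2 ok"
    found = find_template_patterns(line, keywords)
    print(obfuscate(line, found))
```

In this sketch the threshold comparison is strict ("exceeds"), mirroring the claim wording, and both the keyword and its associated value are masked because the claim recites obfuscating the first word and the third word.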