US 12,271,489 B2
Use of word embeddings to locate sensitive text in computer programming scripts
Vincent Pham, Champaign, IL (US); Kenneth Taylor, Champaign, IL (US); Jeremy Edward Goodsitt, Champaign, IL (US); Fardin Abdi Taghi Abad, Champaign, IL (US); Austin Grant Walters, Savoy, IL (US); Reza Farivar, Champaign, IL (US); Anh Truong, Champaign, IL (US); and Mark Louis Watson, Sedona, AZ (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Dec. 20, 2023, as Appl. No. 18/391,440.
Application 18/391,440 is a continuation of application No. 16/992,371, filed on Aug. 13, 2020, granted, now 11,853,431.
Application 16/992,371 is a continuation of application No. 16/722,867, filed on Dec. 20, 2019, granted, now 10,783,257, issued on Sep. 22, 2020.
Prior Publication US 2024/0202344 A1, Jun. 20, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/57 (2013.01); G06F 21/31 (2013.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01)
CPC G06F 21/577 (2013.01) [G06F 21/31 (2013.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06F 2221/033 (2013.01)]
20 Claims
1. A system, comprising:
memory to store at least a portion of a neural network; and
one or more processors coupled with the memory to perform operations to:
train a neural network on a corpus of content, wherein the training results in word embeddings for words in the corpus, wherein each of the word embeddings is a numeric vector in a vector or matrix space;
identify an initial word of interest;
locate a vector that encodes the initial word of interest in the vector or matrix space;
identify vectors in the vector or matrix space that lie in a specified proximity to the vector for the initial word of interest and identifying words encoded by the identified vectors as additional words of interest, wherein the identifying comprises identifying ones of the vectors in the vector or matrix space that have distances or cosine values within a specified range as being in the specified proximity;
perform a security scan of a set of input to identify instances of the initial word of interest and instances of the additional words of interest in the input; and
generate output that specifies the identified instances of the initial word of interest in the set of input and that specifies that the instances of the additional words of interest in the set of input may be of interest.
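For illustration only, the operations recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the `EMBEDDINGS` table is a hand-written, hypothetical stand-in for vectors that the claim obtains by training a neural network on a corpus, and the function names (`cosine`, `additional_words`, `scan`), the 0.95 cosine threshold, and the tokenization rule are all assumptions introduced here.

```python
import math
import re

# Hypothetical toy embeddings. In the claim these vectors result from training
# a neural network on a corpus; here they are hand-written for illustration.
EMBEDDINGS = {
    "password": [0.90, 0.10, 0.00],
    "passwd":   [0.85, 0.15, 0.05],
    "secret":   [0.80, 0.30, 0.10],
    "token":    [0.70, 0.40, 0.10],
    "color":    [0.00, 0.20, 0.90],
    "width":    [0.10, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def additional_words(initial, embeddings, min_cosine=0.95):
    """Words whose vectors lie within the specified proximity of `initial`.

    Proximity is taken here to mean cosine >= min_cosine (an assumed range).
    """
    target = embeddings[initial]
    return sorted(
        word
        for word, vec in embeddings.items()
        if word != initial and cosine(target, vec) >= min_cosine
    )


def scan(lines, initial, extra):
    """Scan input lines and report instances of the words of interest.

    Returns (line_number, word, label) tuples. Instances of the initial word
    are labeled "identified"; instances of the additional words are labeled
    "may be of interest".
    """
    extra = set(extra)
    findings = []
    for lineno, line in enumerate(lines, start=1):
        # Assumed tokenization: runs of letters, so db_password -> db, password.
        for tok in re.findall(r"[a-z]+", line.lower()):
            if tok == initial:
                findings.append((lineno, tok, "identified"))
            elif tok in extra:
                findings.append((lineno, tok, "may be of interest"))
    return findings


if __name__ == "__main__":
    script = ['db_password = "hunter2"', "api_secret = load()", "width = 10"]
    extra = additional_words("password", EMBEDDINGS)
    for finding in scan(script, "password", extra):
        print(finding)
```

In this sketch the initial word of interest is `password`; the toy vectors for `passwd` and `secret` fall inside the assumed cosine range and so become additional words of interest, while `token`, `color`, and `width` do not. A real system would substitute trained embeddings and a tuned proximity range.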