US 12,079,574 B1
Anomalous source detection in high-dimensional sequence data
Brendan Cruz Colon, Seattle, WA (US); Jason L. Thalken, Woodland Hills, CA (US); Aaron Boswell, Draper, UT (US); Matthew Michael Sommer, Issaquah, WA (US); and Kellen K. Axten, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 3, 2021, as Appl. No. 17/541,833.
Int. Cl. G06V 30/40 (2022.01); G06F 18/211 (2023.01); G06F 18/214 (2023.01); G06F 40/279 (2020.01); G06N 7/01 (2023.01)
CPC G06F 40/279 (2020.01) [G06F 18/211 (2023.01); G06F 18/214 (2023.01); G06N 7/01 (2023.01)]
20 Claims
 
4. A method comprising:
generating, for first text data, a first vector, wherein each element of the first vector comprises a value indicating whether the first text data includes a respective n-gram included in a corpus of text data;
determining, for the first text data, first label data indicating that a user associated with the first text data has connected to a first computer-implemented service more than a threshold number of times during a past time period;
training a first machine learning model based at least in part on the first vector and the first label data;
determining, using the first machine learning model, a first probability associated with a first n-gram of the first vector; and
determining at least a first user associated with the first n-gram.
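
The steps recited in claim 4 can be illustrated with a minimal sketch. The claim does not name a tokenization scheme, a model family, or a data layout, so everything below is an assumption made for illustration: character trigrams stand in for the n-grams, a Bernoulli-style estimate with Laplace smoothing stands in for the "first machine learning model", and the function names, the `(user, text, connection_count)` record layout, and the sample values are all hypothetical.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams in text (assumed tokenization)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_vocabulary(corpus, n=3):
    """Sorted list of every n-gram that appears anywhere in the corpus."""
    vocab = set()
    for text in corpus:
        vocab |= char_ngrams(text, n)
    return sorted(vocab)


def to_binary_vector(text, vocab, n=3):
    """One element per corpus n-gram: 1 if the text contains it, else 0."""
    grams = char_ngrams(text, n)
    return [1 if g in grams else 0 for g in vocab]


def label_for(connection_count, threshold):
    """1 if the user connected more than `threshold` times, else 0."""
    return 1 if connection_count > threshold else 0


def train_bernoulli_model(vectors, labels):
    """Estimate P(n-gram present | label) with Laplace smoothing.

    This is an assumed stand-in for the claimed model, not the
    patent's method.
    """
    n_features = len(vectors[0])
    counts = {0: [0] * n_features, 1: [0] * n_features}
    totals = {0: 0, 1: 0}
    for vec, y in zip(vectors, labels):
        totals[y] += 1
        for j, v in enumerate(vec):
            counts[y][j] += v
    return {
        y: [(counts[y][j] + 1) / (totals[y] + 2) for j in range(n_features)]
        for y in (0, 1)
    }


def users_with_ngram(records, ngram, n=3):
    """Users whose text contains the given n-gram, in record order."""
    return [user for user, text, _ in records if ngram in char_ngrams(text, n)]


# Hypothetical records: (user, text data, connections in the past period).
records = [
    ("alice", "login ok", 12),
    ("bob", "login fail", 2),
    ("carol", "login ok", 15),
]
threshold = 10  # assumed threshold number of connections

vocab = build_vocabulary([text for _, text, _ in records])
vectors = [to_binary_vector(text, vocab) for _, text, _ in records]
labels = [label_for(count, threshold) for _, _, count in records]
model = train_bernoulli_model(vectors, labels)

ngram = " ok"
probability = model[1][vocab.index(ngram)]  # P(ngram present | label 1)
matching_users = users_with_ngram(records, ngram)
```

In this sketch the probability tied to an n-gram is the smoothed rate at which it appears among texts carrying the positive label, and the final lookup maps that n-gram back to the users whose text contains it. Other model families that expose per-feature probabilities would fit the claim language equally well.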