US 11,756,097 B2
Methods and apparatus for automatically detecting data attacks using machine learning processes
Kannan Achan, Saratoga, CA (US); Durga Deepthi Singh Sharma, Bangalore (IN); Behzad Shahrasbi, Santa Clara, CA (US); Saurabh Agrawal, Bangalore (IN); Venugopal Mani, Sunnyvale, CA (US); Soumya Wadhwa, Sunnyvale, CA (US); Kamiya Motwani, Madhya Pradesh (IN); Evren Korpeoglu, San Jose, CA (US); and Sushant Kumar, Sunnyvale, CA (US)
Assigned to Walmart Apollo, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Jan. 5, 2021, as Appl. No. 17/141,794.
Prior Publication US 2022/0215453 A1, Jul. 7, 2022
Int. Cl. G06Q 30/00 (2023.01); G06Q 30/0601 (2023.01); G06F 21/62 (2013.01); G06F 18/214 (2023.01); G06N 3/044 (2023.01)
CPC G06Q 30/0631 (2013.01) [G06F 18/214 (2023.01); G06F 21/6218 (2013.01); G06N 3/044 (2023.01)]
20 Claims
 
1. A system comprising: a computing device configured to:
determine a first portion of aggregated user session data for a plurality of users based on at least one rule, wherein the first portion includes user sessions for users that have at least a minimum level of interaction with a corresponding website;
generate a session dataset based on the determined first portion of the aggregated user session data;
train a machine learning model based on the session dataset;
receive user session data for a user from a server, wherein the user session data is aggregated with the session dataset;
apply the trained machine learning model to the user session data to generate output data, the output data including a first value associated with the user session data;
compare the first value to a predetermined threshold to generate a comparison;
determine, based on the comparison, whether the user session data includes polluted data;
generate item recommendation data identifying at least one item to advertise based on the determination of whether the user session data includes polluted data;
transmit the item recommendation data to the server;
refine, based on the determination of whether the user session data includes polluted data, the machine learning model, wherein refining the machine learning model includes removing the polluted data from the aggregated session dataset;
generate loss data based on the output data, the loss data being a difference between the output data and session data within the session dataset; and
refine the machine learning model based on the loss data.
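
For illustration only, the flow recited in claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: a smoothed item-to-item transition count stands in for the machine learning model (the classification codes point to a recurrent network, which is not reproduced here), and the names MIN_INTERACTIONS, THRESHOLD, select_sessions, train, score, is_polluted, recommend, and refine are all hypothetical. The "first value" is assumed to be a mean negative log-likelihood, a polluted session is assumed to receive a fallback recommendation, and the loss-based refinement step is omitted.

```python
import math
from collections import Counter, defaultdict

MIN_INTERACTIONS = 3  # hypothetical rule for "minimum level of interaction"
THRESHOLD = 1.0       # hypothetical predetermined threshold


def select_sessions(aggregated, min_interactions=MIN_INTERACTIONS):
    """Keep sessions with enough events (the claim's 'first portion')."""
    return [s for s in aggregated if len(s) >= min_interactions]


def train(sessions):
    """Count item-to-item transitions; a simple stand-in for the trained model."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return counts


def score(model, session, vocab_size):
    """Mean negative log-likelihood of the transitions (assumed 'first value')."""
    pairs = list(zip(session, session[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for prev, nxt in pairs:
        row = model.get(prev, Counter())
        # Add-one smoothing gives unseen transitions a finite penalty.
        prob = (row[nxt] + 1) / (sum(row.values()) + vocab_size)
        total -= math.log(prob)
    return total / len(pairs)


def is_polluted(model, session, vocab_size, threshold=THRESHOLD):
    """Compare the score to the threshold to flag a session as polluted."""
    return score(model, session, vocab_size) > threshold


def recommend(model, session, polluted, fallback):
    """Pick an item to advertise; polluted sessions get the fallback (assumption)."""
    row = model.get(session[-1]) if session and not polluted else None
    return row.most_common(1)[0][0] if row else fallback


def refine(sessions, vocab_size):
    """Drop sessions flagged as polluted from the dataset, then retrain."""
    model = train(sessions)
    kept = [s for s in sessions if not is_polluted(model, s, vocab_size)]
    return train(kept), kept


# Toy data: item IDs are single letters; the vocabulary has four items.
aggregated = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c", "d"], ["a"]]
dataset = select_sessions(aggregated)
model = train(dataset)

incoming = ["d", "a", "d", "b"]  # an unusual browsing pattern
flag = is_polluted(model, incoming, vocab_size=4)
item = recommend(model, incoming, flag, fallback="a")
```

In this toy run the incoming session scores above the assumed threshold, so it is flagged and the fallback item is returned; an ordinary session such as ["a", "b"] would instead yield the most frequent follow-on item from the transition counts. A real system would replace the counting model with a trained sequence model and calibrate the threshold on held-out sessions.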