CPC G06Q 30/0631 (2013.01) [G06F 18/214 (2023.01); G06F 21/6218 (2013.01); G06N 3/044 (2023.01)] | 20 Claims |
1. A system comprising: a computing device configured to:
determine a first portion of aggregated user session data for a plurality of users based on at least one rule, wherein the first portion includes user sessions for users that have at least a minimum level of interaction with a corresponding website;
generate a session dataset based on the determined first portion of the aggregated user session data;
train a machine learning model based on the session dataset;
receive user session data for a user from a server, wherein the user session data is aggregated with the session dataset;
apply the trained machine learning model to the user session data to generate output data, the output data including a first value associated with the user session data;
compare the first value to a predetermined threshold to generate a comparison;
determine, based on the comparison, whether the user session data includes polluted data;
generate item recommendation data identifying at least one item to advertise based on the determination of whether the user session data includes polluted data;
transmit the item recommendation data to the server;
refine, based on the determination of whether the user session data includes polluted data, the machine learning model, wherein refining the machine learning model includes removing the polluted data from the aggregated session dataset;
generate loss data based on the output data, the loss data being a difference between the output data and session data within the session dataset; and
refine the machine learning model based on the loss data.
|
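The filtering, scoring, threshold comparison, and refinement steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The names `Session`, `MIN_EVENTS`, `THRESHOLD`, `select_sessions`, `train`, `score`, `is_polluted`, `loss`, and `refine` are hypothetical, and the "model" is a simple mean over event values standing in for whatever machine learning model is actually trained. The claim does not say which side of the threshold indicates polluted data; the sketch assumes a value above the threshold does.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    user_id: str
    events: list[float]  # hypothetical per-event feature, e.g. dwell time


MIN_EVENTS = 3   # hypothetical "minimum level of interaction" rule
THRESHOLD = 2.0  # hypothetical predetermined threshold


def select_sessions(sessions, min_events=MIN_EVENTS):
    """Keep only sessions that meet the minimum-interaction rule."""
    return [s for s in sessions if len(s.events) >= min_events]


def train(dataset):
    """Stand-in 'model': the mean event value over the session dataset."""
    return mean(v for s in dataset for v in s.events)


def score(model, session):
    """First value: mean absolute deviation of a session from the model."""
    return mean(abs(v - model) for v in session.events)


def is_polluted(first_value, threshold=THRESHOLD):
    """Assumption: a first value above the threshold marks polluted data."""
    return first_value > threshold


def loss(model, dataset):
    """Loss: mean absolute difference between model output and session data."""
    return mean(abs(v - model) for s in dataset for v in s.events)


def refine(model, dataset, threshold=THRESHOLD):
    """Drop sessions flagged as polluted, then retrain on what remains."""
    kept = [s for s in dataset if not is_polluted(score(model, s), threshold)]
    return train(kept), kept


# Illustrative data only; none of these values come from the patent.
raw = [
    Session("a", [1.0, 1.0, 1.0]),
    Session("b", [1.0, 2.0, 3.0]),
    Session("c", [5.0]),  # too few events, excluded by the rule
]
dataset = select_sessions(raw)
model = train(dataset)

suspect = Session("d", [10.0, 10.0, 10.0])
normal = Session("e", [1.0, 2.0, 1.0])
suspect_flag = is_polluted(score(model, suspect))
normal_flag = is_polluted(score(model, normal))

# Incoming sessions are aggregated with the dataset before refinement.
refined_model, kept = refine(model, dataset + [suspect, normal])
```

In this sketch the recommendation step is left to the caller: a server could, for example, ignore a session flagged by `is_polluted` when choosing items to advertise, though the claim does not prescribe that policy.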