CPC G06Q 30/0204 (2013.01) [G06F 16/2379 (2019.01); G06F 16/24578 (2019.01)] | 20 Claims |
1. A system comprising:
a computing device including a processor, an input system, a processing system and an output system, wherein:
the input system is configured to:
receive one or more parameters that correspond to each one of a plurality of sellers,
generate a plurality of time series metrics based on the one or more parameters,
receive scalar and vectorial data for each one of the plurality of sellers, wherein the scalar and vectorial data for each seller includes one or more of the following information of the respective seller: geolocation, payment account information, or categories of items sold;
the processing system is configured to:
perform a feature selection based on characteristics of the plurality of sellers,
transmit the feature selection as a feedback back to the input system;
the input system is further configured to:
receive the feedback of the feature selection from the processing system,
gather, according to the feature selection from the processing system, a set of relevant scalar and vectorial data among the scalar and vectorial data for each one of the plurality of sellers, and
gather, according to the feature selection from the processing system, a set of relevant time series metrics among the plurality of time series metrics for each one of the plurality of sellers based on the respective seller's one or more parameters over a time period, wherein the time period comprises a plurality of subperiods;
the processing system is further configured to:
calculate one or more aggregated metrics for each corresponding subperiod based on each seller's respective set of relevant time series metrics, wherein each aggregated metric comprises at least one numeric value for the corresponding subperiod,
cluster the plurality of sellers using a Gaussian Mixture Model to generate a plurality of seller persona clusters based at least in part on the calculated one or more aggregated metrics for each seller and each seller's respective set of relevant scalar and vectorial data, wherein each of the plurality of seller persona clusters includes a group of sellers among the plurality of sellers,
calculate an overall score for all of the generated plurality of seller persona clusters, wherein the overall score indicates how well the seller persona clusters are separated, and
transmit the plurality of seller persona clusters to the output system when the overall score is greater than or equal to a predetermined threshold value;
the output system is configured to:
receive the plurality of seller persona clusters from the processing system,
generate a respective target label for each of the plurality of seller persona clusters, and
apply each respective target label to the group of sellers within each respective seller persona cluster to generate training data for a supervised machine learning model, wherein the generated target labels comprise: bad seller, high volume seller, seasonal seller, high customer dispute seller, and fraudulent seller; and
the processor is configured to:
train the supervised machine learning model based on the training data, and
apply the trained supervised machine learning model to classify and detect sellers on a retail platform.
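As an illustration of the processing and output steps recited in claim 1, the following is a minimal sketch and not the claimed implementation. It shows per-subperiod aggregation of a time series metric, an overall separation score, the threshold check, and the application of target labels to each cluster. Several points are assumptions not found in the claim: the aggregate is taken to be a mean, the overall score is taken to be the mean silhouette coefficient (the claim only requires a score that indicates how well the clusters are separated), the cluster assignments are supplied by hand rather than produced by a Gaussian Mixture Model, and the names `aggregate_by_subperiod`, `silhouette`, `label_if_separated`, the sample data, and the cluster-to-label mapping are hypothetical.

```python
from math import dist
from statistics import mean


def aggregate_by_subperiod(series, subperiod_len):
    """Mean of each consecutive subperiod (assumed aggregate; the claim
    only requires at least one numeric value per subperiod)."""
    return [mean(series[i:i + subperiod_len])
            for i in range(0, len(series), subperiod_len)]


def silhouette(points, assignments):
    """Mean silhouette coefficient over all points, in [-1, 1].
    Assumed stand-in for the claim's 'overall score'.
    Requires at least two clusters."""
    clusters = {}
    for p, c in zip(points, assignments):
        clusters.setdefault(c, []).append(p)
    scores = []
    for p, c in zip(points, assignments):
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(mean([dist(p, q) for q in other])
                for k, other in clusters.items() if k != c)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def label_if_separated(seller_ids, points, assignments, cluster_labels, threshold):
    """Return (score, labels); labels is None when the score is below
    the threshold, mirroring the claim's transmit condition."""
    score = silhouette(points, assignments)
    if score < threshold:
        return score, None
    labels = {s: cluster_labels[c] for s, c in zip(seller_ids, assignments)}
    return score, labels


# Hypothetical weekly sales counts for four sellers (illustrative data only).
sales = {
    "s1": [100, 120, 110, 130],
    "s2": [90, 110, 115, 125],
    "s3": [5, 7, 6, 4],
    "s4": [8, 6, 5, 5],
}
seller_ids = list(sales)
# Two subperiods of two weeks each give one aggregated value per subperiod.
points = [aggregate_by_subperiod(sales[s], 2) for s in seller_ids]
# Cluster assignments supplied by hand; the claim obtains them from a GMM.
assignments = [0, 0, 1, 1]
# Hypothetical mapping from cluster index to two of the claim's target labels.
cluster_labels = {0: "high volume seller", 1: "seasonal seller"}

score, labels = label_if_separated(
    seller_ids, points, assignments, cluster_labels, threshold=0.5
)
print(score, labels)
```

The resulting `labels` dictionary plays the role of the training data targets for the supervised model; in a fuller sketch the hand-written `assignments` would come from a Gaussian Mixture Model fitted on the aggregated metrics together with the selected scalar and vectorial data.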