CPC G06Q 30/0206 (2013.01) [G06Q 30/0201 (2013.01); G06Q 30/0223 (2013.01); G06Q 30/0641 (2013.01)] | 18 Claims |
1. A system comprising:
a non-transitory computer readable medium storing instructions; and
a processor communicatively coupled to the non-transitory computer readable medium, the processor configured to execute the instructions to:
receive a request for a price for an item;
obtain experimental sales data for the item from a database;
obtain historical inventory data for the item from the database;
determine a demand for the item by applying the experimental sales data and the historical inventory data as inputs to a first machine learning model, wherein the first machine learning model is trained with historical experimental transaction data identifying historical experimental sales of a first plurality of items;
determine a current inventory level of the item;
generate a price for the item by applying the demand for the item, the current inventory level of the item, and the experimental sales data as inputs to a second machine learning model,
wherein the second machine learning model includes a reinforcement learning model with a set of parameters including, for each product j and each time step t:
a state S_j representing a vector of feature values;
a_j representing a price markdown percentage;
r_j representing a reward value obtained based on revenue minus costs; and
P_j representing a transition probability that price markdown percentage a_j in state S_j will lead to state S_j′,
wherein the second machine learning model is trained with historical transaction data identifying historical sales of a second plurality of items by recursively:
computing an expected reward value Q based on adjusted values of a set of parameters given a relationship between selected parameters; and
adjusting a value of one or more parameters in the set of parameters to increase the expected reward value Q for a subsequent iteration; and
transmit, in response to the request, the price for the item.
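The recursive training recited for the second machine learning model can be illustrated with a minimal Python sketch. The claim does not state the relationship used to compute the expected reward value Q; the sketch assumes the standard Bellman optimality backup, Q(s, a) = sum over s′ of P(s′ | s, a) × [r + γ × max over a′ of Q(s′, a′)], applied in repeated sweeps until the values stop changing. The state here is reduced to a hypothetical pair (time step, inventory level), and the list price, unit cost, horizon, candidate markdowns, and demand model are all illustrative assumptions rather than values from the claim.

```python
from itertools import product

# All constants below are hypothetical, chosen only to make the example small.
MARKDOWN_PCTS = [0, 10, 30, 50]  # candidate actions a_j, as markdown percentages
BASE_PRICE = 20.0                # assumed list price
UNIT_COST = 8.0                  # assumed per-unit cost
HORIZON = 4                      # number of time steps t
MAX_INVENTORY = 6                # largest inventory level in the toy state space
GAMMA = 1.0                      # no discounting over the short horizon


def transitions(state, pct):
    """Return [(probability, reward, next_state)] for markdown `pct` in `state`.

    Assumed demand model: a deeper markdown sells more units, and demand is
    one unit above its base value half of the time. The probabilities play
    the role of P_j and the rewards the role of r_j (revenue minus costs).
    """
    t, inventory = state
    margin = BASE_PRICE * (100 - pct) / 100 - UNIT_COST  # per-unit revenue minus cost
    base_demand = 1 + pct // 10
    outcomes = []
    for prob, units in ((0.5, base_demand), (0.5, base_demand + 1)):
        sold = min(inventory, units)
        outcomes.append((prob, sold * margin, (t + 1, inventory - sold)))
    return outcomes


def is_terminal(state):
    """The episode ends when the horizon is reached or the stock runs out."""
    t, inventory = state
    return t >= HORIZON or inventory == 0


def q_value_iteration(tol=1e-9, max_sweeps=100):
    """Repeatedly recompute Q from the Bellman backup until it stops changing."""
    states = list(product(range(HORIZON + 1), range(MAX_INVENTORY + 1)))
    q = {s: [0.0] * len(MARKDOWN_PCTS) for s in states}
    for _ in range(max_sweeps):
        largest_change = 0.0
        for s in states:
            if is_terminal(s):
                continue  # terminal states keep Q = 0
            for i, pct in enumerate(MARKDOWN_PCTS):
                new_q = sum(
                    p * (r + GAMMA * max(q[nxt]))
                    for p, r, nxt in transitions(s, pct)
                )
                largest_change = max(largest_change, abs(new_q - q[s][i]))
                q[s][i] = new_q
        if largest_change < tol:
            break
    return q


def best_markdown(q, state):
    """Return the markdown percentage with the highest Q value in `state`."""
    values = q[state]
    return MARKDOWN_PCTS[max(range(len(values)), key=values.__getitem__)]


q_table = q_value_iteration()
print(best_markdown(q_table, (0, MAX_INVENTORY)))
```

Because the transition probabilities are given explicitly, the sketch uses a model-based sweep over every state and markdown rather than sampling sales episodes. A production system would more plausibly estimate those probabilities, or learn Q directly, from the historical transaction data recited in the claim.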