US 11,720,911 B2
Methods and apparatus for electronically determining item pricing
Prakhar Mehrotra, Sunnyvale, CA (US); and Yixian Chen, Sunnyvale, CA (US)
Assigned to Walmart Apollo, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Jan. 22, 2020, as Appl. No. 16/749,246.
Prior Publication US 2021/0224835 A1, Jul. 22, 2021
Int. Cl. G06Q 10/00 (2023.01); G06Q 30/0201 (2023.01); G06Q 30/0601 (2023.01); G06Q 30/0207 (2023.01)
CPC G06Q 30/0206 (2013.01) [G06Q 30/0201 (2013.01); G06Q 30/0223 (2013.01); G06Q 30/0641 (2013.01)]
14 Claims
1. A system comprising:
a computing device comprising:
a user interface;
a non-transitory computer readable medium storing instructions; and
at least one processor communicatively coupled to the user interface and the non-transitory computer readable medium, the at least one processor configured to execute the instructions to:
receive, from a first processor associated with a sales channel, a request for a price for an item at the sales channel for a period of time, wherein the sales channel is one of: a first store of a plurality of stores of a retailer, or a web site of an online marketplace for the retailer;
obtain experimental sales data for the item from a database, wherein the experimental sales data identifies: previous sales transactions for the item resulting from experimentally pricing the item at one or more levels;
obtain inventory data for the item from the database, wherein the inventory data identifies an amount of the item in inventory at the sales channel at each of the one or more levels, and a previous period of time of when each of the one or more levels was in effect;
determine a demand for the item at the sales channel for the period of time based on applying the experimental sales data and the inventory data as inputs to a first machine learning model to output the demand for the item, wherein the first machine learning model is trained with historical experimental transaction data identifying historical experimental sales of a first plurality of items;
determine a current inventory level of the item at the sales channel;
determine a price for the item at the sales channel and a confidence value corresponding to the price, based on applying all of: (a) the demand for the item output by the first machine learning model, (b) the current inventory level of the item at the sales channel, (c) the experimental sales data, (d) the period of time received in the request, and (e) an inventory level of the item during each of the previous sales transactions resulting from experimentally pricing the item at the one or more levels, as inputs to a second machine learning model to output the price for the item and the confidence value corresponding to the price,
wherein the second machine learning model includes a reinforcement learning model with a set of parameters including, for each product j and each time step t:
a state Sj representing a vector of feature values,
aj representing a price markdown percentage,
rj representing a reward value obtained based on revenue minus costs, and
Pj representing a transition probability that price markdown percentage aj in state Sj will lead to state Sj′,
wherein the second machine learning model is trained with historical transaction data identifying historical sales of a second plurality of items based on:
computing an expected reward value Q based on adjusted values of the set of parameters given a policy π representing a relationship between Sj and aj,
wherein the set of parameters are initialized based on a random process and the historical transaction data,
adjusting values of the set of parameters to increase the expected reward value Q for next iteration,
recursively iterating the above steps of computing and adjusting over all products and all time steps, to derive an optimal policy π* that maximizes the expected reward value Q;
determine whether the confidence value is beyond a predetermined threshold;
transmit the price for the item to the first processor when the confidence value is beyond the predetermined threshold; and
transmit, when the confidence value is not beyond the predetermined threshold, the price for the item to a plurality of processors located at a plurality of test stores respectively, wherein the plurality of test stores is a subset of the plurality of stores of the retailer.
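
Illustrative sketch (not part of the claim): the reinforcement learning elements recited above (state Sj, markdown action aj, reward rj as revenue minus costs, transition probability Pj, expected reward Q, policy π) can be pictured with a toy Markov decision process. The claim does not say which algorithm derives the optimal policy; the sketch below substitutes plain Q-value iteration over a known transition model. Every number and name in it (MARKDOWNS, SELL_PROB, LIST_PRICE, UNIT_COST, GAMMA, route_price) is hypothetical, the state is reduced to a single inventory count rather than a feature vector, and "beyond a predetermined threshold" is read as "strictly greater than", which is an assumption.

```python
# Toy illustration only: all constants, names, and the demand model are assumptions,
# not values or methods taken from the patent.
MARKDOWNS = [0.0, 0.25, 0.5]                   # actions a: markdown percentages
SELL_PROB = {0.0: 0.2, 0.25: 0.5, 0.5: 0.9}    # assumed chance of selling one unit per step
LIST_PRICE = 10.0
UNIT_COST = 4.0
MAX_INVENTORY = 5                              # states s: units left in inventory (0..5)
GAMMA = 0.9                                    # discount factor (assumed)


def transitions(state, markdown):
    """Return [(probability, next_state, reward)] for state s under action a."""
    if state == 0:
        return [(1.0, 0, 0.0)]                 # nothing left to sell
    p = SELL_PROB[markdown]
    reward = LIST_PRICE * (1.0 - markdown) - UNIT_COST   # revenue minus cost
    return [(p, state - 1, reward), (1.0 - p, state, 0.0)]


def q_value_iteration(tol=1e-9, max_iters=10_000):
    """Repeatedly recompute Q(s, a) until the largest change falls below tol."""
    q = {(s, a): 0.0 for s in range(MAX_INVENTORY + 1) for a in MARKDOWNS}
    for _ in range(max_iters):
        delta = 0.0
        for (s, a) in q:
            new = sum(
                p * (r + GAMMA * max(q[(s2, a2)] for a2 in MARKDOWNS))
                for p, s2, r in transitions(s, a)
            )
            delta = max(delta, abs(new - q[(s, a)]))
            q[(s, a)] = new
        if delta < tol:
            break
    return q


def greedy_policy(q):
    """Pick, for each state, the markdown with the highest expected reward Q."""
    return {s: max(MARKDOWNS, key=lambda a: q[(s, a)])
            for s in range(MAX_INVENTORY + 1)}


def route_price(confidence, threshold):
    """Mirror the final two claim steps: choose where the price is transmitted."""
    return "requesting_channel" if confidence > threshold else "test_stores"


if __name__ == "__main__":
    q = q_value_iteration()
    print(greedy_policy(q))
    print(route_price(confidence=0.9, threshold=0.8))
    print(route_price(confidence=0.5, threshold=0.8))
```

In this toy setup the greedy policy plays the role of π*, and route_price shows the branch between sending the price back to the requesting sales channel and sending it to the test stores. A production system as claimed would instead learn from historical transaction data and output the confidence value from the model itself.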