US 12,079,831 B2
	Methods and apparatus for electronically determining item pricing
Prakhar Mehrotra, Sunnyvale, CA (US); and Yixian Chen, Sunnyvale, CA (US)
Assigned to Walmart Apollo, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Jun. 2, 2023, as Appl. No. 18/328,065.
Application 18/328,065 is a continuation of application No. 16/749,246, filed on Jan. 22, 2020, granted, now 11,720,911.
Prior Publication US 2023/0316317 A1, Oct. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06Q 10/00 (2023.01); G06Q 30/0201 (2023.01); G06Q 30/0207 (2023.01); G06Q 30/0601 (2023.01)

CPC G06Q 30/0206 (2013.01) [G06Q 30/0201 (2013.01); G06Q 30/0223 (2013.01); G06Q 30/0641 (2013.01)]

18 Claims

1. A system comprising:

a non-transitory computer readable medium storing instructions; and

a processor communicatively coupled to the non-transitory computer readable medium, the processor configured to execute the instructions to:

receive a request for a price for an item;

obtain experimental sales data for the item from a database;

obtain historical inventory data for the item from the database;

determine a demand for the item by applying the experimental sales data and the historical inventory data as inputs to a first machine learning model, wherein the first machine learning model is trained with historical experimental transaction data identifying historical experimental sales of a first plurality of items;

determine a current inventory level of the item;

generate a price for the item by applying the demand for the item, the current inventory level of the item, and the experimental sales data as inputs to a second machine learning model,

wherein the second machine learning model includes a reinforcement learning model with a set of parameters including, for each product j and each time step t:

a state S_j representing a vector of feature values;

a_j representing a price markdown percentage;

r_j representing a reward value obtained based on revenue minus costs; and

P_j representing a transition probability that price markdown percentage a_j in state S_j will lead to state S_j′,

wherein the second machine learning model is trained with historical transaction data identifying historical sales of a second plurality of items by recursively:

computing an expected reward value Q based on adjusted values of a set of parameters given a relationship between selected parameters; and

adjusting a value of one or more parameters in the set of parameters to increase the expected reward value Q for a subsequent iteration; and

transmit, in response to the request, the price for the item.
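Claim 1 recites a two-stage pipeline. A first model estimates demand from experimental sales and inventory data. A second, reinforcement-learning model then maps a state (demand, inventory) to a markdown action, scored by revenue minus costs. The Python sketch below shows one way such a pipeline could be wired together. It is not the patented implementation. Several pieces are illustrative assumptions: the linear demand regression, the discrete markdown set `MARKDOWNS`, and the toy environment `step` with its hypothetical `elasticity`, `unit_cost` and `holding_cost` parameters. The same applies to the use of tabular Q-learning for the claim's recursive "compute Q, then adjust to increase Q" training. For brevity, the experimental sales data reach the second model only through the estimated demand.

```python
import random
from collections import defaultdict

# Hypothetical discrete action set: candidate markdown percentages a_j.
MARKDOWNS = [0.0, 0.1, 0.2, 0.3]


def fit_demand_model(samples):
    """First model (assumed linear): units ~ w0 + w1*exp_sales + w2*inventory.

    samples: list of (exp_sales, inventory, units) tuples (hypothetical schema).
    Solves the 3x3 least-squares normal equations with Cramer's rule.
    """
    rows = [(1.0, s, i) for s, i, _ in samples]
    ys = [u for _, _, u in samples]
    a = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    b = [sum(r[p] * y for r, y in zip(rows, ys)) for p in range(3)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(a)
    # Replace column k of the normal matrix with b to solve for weight k.
    return [det3([row[:k] + [b[p]] + row[k + 1:] for p, row in enumerate(a)]) / d
            for k in range(3)]


def predict_demand(weights, exp_sales, inventory):
    """Estimated demand from the fitted first model, clipped at zero."""
    w0, w1, w2 = weights
    return max(0.0, w0 + w1 * exp_sales + w2 * inventory)


def step(state, markdown, base_price, unit_cost, holding_cost, elasticity):
    """Toy environment (hypothetical): returns (reward r_j, next state S_j')."""
    demand, inventory = state
    # Assumed linear price response: deeper markdowns lift unit sales.
    units = min(inventory, round(demand * (1 + elasticity * markdown)))
    revenue = units * base_price * (1 - markdown)
    costs = units * unit_cost + holding_cost * (inventory - units)
    return revenue - costs, (demand, inventory - units)


def train_pricing_policy(start_state, episodes=2000, horizon=5, alpha=0.1,
                         gamma=0.9, epsilon=0.2, seed=0, **env):
    """Tabular Q-learning: repeatedly estimate Q and nudge it toward higher reward."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(MARKDOWNS))
    for _ in range(episodes):
        state = start_state
        for t in range(horizon):
            if rng.random() < epsilon:  # explore a random markdown
                a = rng.randrange(len(MARKDOWNS))
            else:  # exploit the current best estimate
                a = max(range(len(MARKDOWNS)), key=lambda i: q[state][i])
            reward, nxt = step(state, MARKDOWNS[a], **env)
            done = t == horizon - 1
            # Bootstrap from the next state's best Q unless the horizon ends here.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def recommend_price(q, state, base_price):
    """Greedy price for a state: base price minus the highest-Q markdown."""
    best = max(range(len(MARKDOWNS)), key=lambda i: q[state][i])
    return round(base_price * (1 - MARKDOWNS[best]), 2)
```

In use, the demand estimate from `predict_demand` and the current inventory form the state `(round(demand), inventory)`. That state is passed to `train_pricing_policy` along with the hypothetical environment parameters, and `recommend_price` returns the price to transmit. Tabular Q-learning is used here because it directly mirrors the claim's loop of estimating an expected reward and adjusting values to raise it. With many products or continuous features, the table would typically be replaced by a function approximator.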