CPC G06Q 30/0202 (2013.01) [G06N 20/00 (2019.01); G06Q 10/067 (2013.01)] | 15 Claims |
1. A method for performing a simulation, the method being implemented by at least one processor in a market simulation and calibration device, the method comprising:
assigning, by the at least one processor to each respective computer agent from among a plurality of computer agents, a type value that relates to a state of the respective computer agent, such that the plurality of computer agents have differing type values, each of the differing type values indicating a different probability distribution of risk aversion and connectivity to external clients;
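By way of a non-limiting illustration, the following minimal Python sketch shows how a type value can index a probability distribution of risk aversion and connectivity to external clients; the class names and the Beta/Bernoulli parameterization are hypothetical choices, not recited in the claim.

```python
import random
from dataclasses import dataclass

@dataclass
class Supertype:
    """Hypothetical type value: parameters of the distributions an agent samples from."""
    risk_aversion_alpha: float    # Beta(alpha, beta) over risk aversion in [0, 1]
    risk_aversion_beta: float
    client_connectivity_p: float  # Bernoulli probability of a link to each external client

def sample_agent_params(supertype: Supertype, n_clients: int):
    """Draw one agent's concrete risk aversion and client connections from its type."""
    risk_aversion = random.betavariate(supertype.risk_aversion_alpha,
                                       supertype.risk_aversion_beta)
    connections = [random.random() < supertype.client_connectivity_p
                   for _ in range(n_clients)]
    return risk_aversion, connections

# Agents with differing type values draw from differing distributions.
conservative = Supertype(risk_aversion_alpha=8.0, risk_aversion_beta=2.0,
                         client_connectivity_p=0.2)
aggressive = Supertype(risk_aversion_alpha=2.0, risk_aversion_beta=8.0,
                       client_connectivity_p=0.8)
print(sample_agent_params(conservative, n_clients=5))
print(sample_agent_params(aggressive, n_clients=5))
```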
receiving, by the at least one processor and from a plurality of servers over a network, computer agent-specific data for the plurality of computer agents, the plurality of computer agents including computer agents of different types that behave differently, wherein the computer agent-specific data include real-world data that include a market-based observation, a market-based action and a market-based reward;
acquiring, by the at least one processor and from a network database over the network, simulator parameters;
generating, by a simulation processor of the market simulation and calibration device and providing on a display of the market simulation and calibration device, a simulation based on the assigned type values, the acquired simulator parameters, and the received computer agent-specific data, with each of the plurality of computer agents being in a different state, and by using a shared policy that is shared by all of the plurality of computer agents, wherein the shared policy indicates a probability of a respective individual computer agent action for a corresponding state of the respective individual computer agent, wherein each of the plurality of computer agents uses the same shared policy, and wherein each of the plurality of computer agents is restricted to observing only its own state and action to achieve partial observability;
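A minimal sketch of such a shared policy follows; the linear-softmax parameterization and all identifiers are illustrative assumptions. Each agent queries the same parameter set but conditions only on its own state and type value, which yields the partial observability recited above.

```python
import numpy as np

class SharedPolicy:
    """One parameter set shared by all agents; each agent conditions only on
    its own (state, type) pair, giving partial observability."""
    def __init__(self, n_state_features: int, n_actions: int,
                 n_type_features: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        # Linear logits over [own state features, own type features].
        self.weights = self.rng.normal(
            scale=0.1, size=(n_state_features + n_type_features, n_actions))

    def action_probs(self, own_state: np.ndarray, own_type: np.ndarray) -> np.ndarray:
        """Probability of each action given this agent's own state and type."""
        logits = np.concatenate([own_state, own_type]) @ self.weights
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    def sample_action(self, own_state: np.ndarray, own_type: np.ndarray) -> int:
        probs = self.action_probs(own_state, own_type)
        return int(self.rng.choice(len(probs), p=probs))

# Every agent calls the *same* policy object; differing type vectors make
# differing agents behave differently under one shared parameter set.
policy = SharedPolicy(n_state_features=4, n_actions=3, n_type_features=2)
a_i = policy.sample_action(np.zeros(4), np.array([0.9, 0.2]))  # agent i
a_j = policy.sample_action(np.ones(4), np.array([0.1, 0.8]))   # agent j
```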
acquiring actual real-world agent-specific data corresponding to a target locality;
performing first reinforcement learning calibration on the simulation processor of the market simulation and calibration device for performing the simulation,
wherein the first reinforcement learning calibration is performed using the actual real-world data to first constrain shared equilibria to match a specific real-world target value,
wherein the first reinforcement learning calibration using the actual real-world data specifies a distribution of different computer agent types to correspond to the first constrained shared equilibria,
wherein the distribution of the different computer agent types is reflected on the market simulation and calibration device,
wherein the specific real-world target value is different for each of the plurality of computer agents, such that the plurality of computer agents collectively satisfy certain constraints,
wherein the first reinforcement learning calibration includes modifying at least one of the type values assigned to the plurality of computer agents based on a result of the simulation until a calibration target is reached, and
wherein the first reinforcement learning calibration is performed by:
inputting learning rates (β_m^cal), (β_m^shared) satisfying a target condition, initial calibrator and shared policies π_0^Λ, π_0, and an initial supertype profile Λ_0^b = Λ_0 across episodes b ∈ [1, B],
while π_m^Λ, π_m not converged do, wherein π_m^Λ and π_m are the calibrator and shared policies for stage m,
for each episode b ∈ [1, B] do,
sample a supertype increment δΛ^b ∼ π_m^Λ(·|Λ_{m-1}^b) and set Λ_m^b := Λ_{m-1}^b + δΛ^b,
sample a multi-agent episode with supertype profile Λ_m^b and shared policy π_m, with λ_i ∼ p_{Λ_m^b} and a_t^(i) ∼ π_m(·|·, λ_i), i ∈ [1, n],
update π_m with learning rate β_m^shared based on a gradient of a first equation over the episodes b ∈ [1, B],
update π_m^Λ with learning rate β_m^cal based on a gradient of a second equation over the episodes b ∈ [1, B],
the target condition specifies that the learning rates (β_m^cal), (β_m^shared) satisfy
β_m^cal / β_m^shared → 0 as m → ∞,
as well as the Robbins-Monro conditions, that is, their respective sums are infinite and the sums of their squares are finite,
the first equation specifies:
(1/B) Σ_{b=1}^{B} ∇_{θ_1} V_{Λ_m^b}(π_{θ_1}, π_{θ_2}) |_{θ_1 = θ_2 = θ_m},
wherein the first equation indicates that the shared policy is a Nash equilibrium of the 2-player symmetric game with payoff V, wherein a first player receives V(π_1, π_2) while the other receives V(π_2, π_1), and wherein ∇_{θ_1} V(π_θ, π_θ) corresponds to trying to improve the utility of the first player while keeping the second player fixed, starting from the symmetric point (π_θ, π_θ), and
the second equation specifies:
(1/B) Σ_{b=1}^{B} R^cal(Λ_m^b) ∇ ln π_m^Λ(δΛ^b | Λ_{m-1}^b),
wherein R^cal(Λ_m^b) denotes a calibration reward of episode b, and wherein the second equation optimizes an objective of the stage m via the calibrator's policy π^Λ;
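Read as pseudocode, the calibration loop above admits a compact two-timescale sketch. The Python below is a toy stand-in, not the claimed implementation: the one-dimensional simulator, the Gaussian calibrator policy, and the reward shapes are assumptions, while the learning-rate schedules are one concrete choice satisfying both the Robbins-Monro conditions and β_m^cal/β_m^shared → 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D stand-ins: theta is the shared-policy parameter, mu is the
# calibrator-policy mean, Lam[b] is the supertype profile for episode b.
TARGET, B = 3.0, 16
theta, mu = 0.0, 0.0
Lam = np.zeros(B)                                # Lambda_0^b = Lambda_0

shared_lr = lambda m: 0.2 / m ** 0.6             # Robbins-Monro: sum diverges,
cal_lr = lambda m: 0.05 / m ** 0.9               # squared sum converges; cal/shared -> 0
sigma = lambda m: 0.3 / m ** 0.6                 # decaying calibrator exploration noise

for m in range(1, 2001):
    # Sample supertype increments dLam^b ~ pi^Lambda(.|Lam_{m-1}^b), set Lam_m^b.
    dLam = rng.normal(mu, sigma(m), size=B)
    Lam = Lam + dLam
    # Toy "multi-agent episode": utility peaks when theta matches the supertype,
    # and the calibrated observable is Lam + theta.
    utility_grad = -2.0 * (theta - Lam)          # stand-in for grad_theta1 V(pi, pi)
    r_cal = -(Lam + theta - TARGET) ** 2         # reward for matching the target value
    # Two-timescale updates: shared policy on the fast schedule (first equation),
    # calibrator via a REINFORCE-style gradient on the slow schedule (second equation).
    theta += shared_lr(m) * utility_grad.mean()
    score = (dLam - mu) / sigma(m) ** 2          # grad_mu log N(dLam; mu, sigma^2)
    mu += cal_lr(m) * ((r_cal - r_cal.mean()) * score).mean()

print(f"theta={theta:.2f}  mean Lambda={Lam.mean():.2f}  "
      f"mean observable={(Lam + theta).mean():.2f}  target={TARGET}")
```

Because cal_lr decays faster than shared_lr, the shared policy effectively equilibrates against a slowly moving supertype profile, which is the point of the two-timescale target condition.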
updating the shared policy for the generating of the simulation by the simulation processor of the market simulation and calibration device based on a result of the first reinforcement learning calibration including the modifying of the at least one of the type values assigned to the plurality of computer agents, wherein the updated shared policy modifies the probability of the respective individual computer agent action for the corresponding state of the respective individual computer agent based on the distribution of the different computer agent types corresponding to the first constrained shared equilibria for at least one of the plurality of computer agents;
regenerating, by the simulation processor of the market simulation and calibration device, the simulation using the updated shared policy for obtaining a different output; and
performing second reinforcement learning calibration on the simulation processor of the market simulation and calibration device based on the first reinforcement learning calibration to modify the distribution of different computer agent types differently from the distribution of different computer agent types corresponding to the first reinforcement learning calibration, and to second constrain the shared equilibria more accurately than the first constraining of the shared equilibria in order to more closely match the specific real-world target value corresponding to the target locality, wherein the modified distribution of the different computer agent types is updated on the market simulation and calibration device.
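Under the same toy assumptions as the sketch above, the second reinforcement learning calibration can be illustrated as a further pass that starts from the supertype distribution and shared policy produced by the first pass; the function below merely repackages the earlier loop and is likewise hypothetical, not the claimed implementation.

```python
import numpy as np

def calibration_pass(theta, mu, Lam, target, n_stages, rng):
    """One reinforcement-learning calibration pass (same toy updates as above)."""
    for m in range(1, n_stages + 1):
        sig = 0.3 / m ** 0.6
        dLam = rng.normal(mu, sig, size=Lam.size)
        Lam = Lam + dLam
        utility_grad = -2.0 * (theta - Lam)
        r_cal = -(Lam + theta - target) ** 2
        theta += (0.2 / m ** 0.6) * utility_grad.mean()
        score = (dLam - mu) / sig ** 2
        mu += (0.05 / m ** 0.9) * ((r_cal - r_cal.mean()) * score).mean()
    return theta, mu, Lam

rng = np.random.default_rng(1)
# First calibration constrains the shared equilibria toward the target...
theta, mu, Lam = calibration_pass(0.0, 0.0, np.zeros(16), target=3.0,
                                  n_stages=500, rng=rng)
# ...and the second calibration starts from the first pass's type distribution
# and shared policy, tightening the match to the target locality's data.
theta, mu, Lam = calibration_pass(theta, mu, Lam, target=3.0,
                                  n_stages=2000, rng=rng)
```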