US 12,236,369 B2
System and method for machine learning architecture with adaptive importance sampling with normalizing flows
Zhexin Lai, Toronto (CA); Amir H. Khoshaman, Toronto (CA); and Marcus A. Brubaker, Toronto (CA)
Assigned to ROYAL BANK OF CANADA, Toronto (CA)
Filed by ROYAL BANK OF CANADA, Toronto (CA)
Filed on Jan. 29, 2021, as Appl. No. 17/163,106.
Claims priority of provisional application 62/968,860, filed on Jan. 31, 2020.
Prior Publication US 2021/0241156 A1, Aug. 5, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 17/18 (2006.01); G06N 7/01 (2023.01)

CPC G06N 7/01 (2023.01) [G06F 17/18 (2013.01); G06N 20/00 (2019.01)]

20 Claims

1. A computer system for generating a Monte Carlo estimation value from a target distribution p(x) using a proposal distribution q(x) representative of a learned importance distribution to encourage computational efficiency of the generation of the Monte Carlo estimation value, the computer system comprising:

one or more processors configured to:

instantiate a proposal distribution model data object having a first set of parameters representative of adjustable features of the proposal distribution q(x);

perform an iterative exploitation training stage for training the proposal distribution model data object, the iterative exploitation training stage including:

iteratively obtaining one or more new sample observations from the proposal distribution q(x) using a normalizing flow model to represent the learned importance distribution, wherein the normalizing flow model is defined using a succession of bijective transformations conducted on a random variable with a defined distribution having a base distribution p_z(z) representative of a probability density function of the random variable z, the base distribution having a second set of parameters, including a degree of freedom parameter adapted for modelling tail-behavior of the target distribution p(x);

for each new sample observation of the one or more new sample observations, iteratively determining a corresponding weight value based at least upon a ratio of the target distribution p(x) and the proposal distribution q(x) corresponding to the respective new sample observation;

updating both the first set of parameters representative of adjustable features of the proposal distribution q(x) and the second set of parameters of the base distribution based on the one or more new sample observations such that the proposal distribution q(x) is updated for a next iteration of the exploitation training stage;

recording the one or more new sample observations from the proposal distribution q(x) along with the corresponding weight values; and

continuing to alternately obtain new sample observations and update both the first set of parameters of the proposal distribution q(x) and the second set of parameters of the base distribution until a number of exploitation training steps are taken; and

generate the Monte Carlo estimation value based at least on a sum of the recorded weight values applied to each of the recorded sample observations; and

store the Monte Carlo estimation value as a data value in an output data structure.
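
The loop recited in claim 1 can be illustrated with a minimal one-dimensional sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the flow is a single affine bijection x = mu + exp(s) * z, the base distribution p_z(z) is a Student-t whose degree-of-freedom parameter nu is learned, the target is a toy Student-t density, and the update is a finite-difference gradient step on a weighted log-likelihood (a cross-entropy style rule chosen here for simplicity). The names `log_q`, `log_p`, `adaptive_importance_sampling`, and `estimates` are hypothetical.

```python
import math
import random


def student_t_logpdf(z, nu):
    """Log density of a standard Student-t with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def sample_student_t(nu, rng):
    """Draw z ~ Student-t(nu) as a normal divided by sqrt(chi2 / nu)."""
    chi2 = rng.gammavariate(nu / 2, 2.0)  # chi-square with nu degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)


def log_q(x, theta):
    """Proposal log density for x = mu + exp(s) * z (change of variables)."""
    mu, s, log_nu = theta
    z = (x - mu) * math.exp(-s)
    return student_t_logpdf(z, math.exp(log_nu)) - s


def log_p(x):
    """Assumed toy target: Student-t(5) with location 2.0 and scale 1.5."""
    return student_t_logpdf((x - 2.0) / 1.5, 5.0) - math.log(1.5)


def weighted_loglik(theta, xs, ws):
    """Weighted average of log q(x); maximizing it pulls q toward p."""
    return sum(w * log_q(x, theta) for x, w in zip(xs, ws)) / sum(ws)


def adaptive_importance_sampling(steps=40, batch=500, lr=0.3, eps=1e-4, seed=0):
    rng = random.Random(seed)
    # theta = [mu, s, log_nu]: mu and s play the role of the first parameter
    # set (flow), log_nu the second parameter set (base distribution).
    theta = [0.0, 0.0, math.log(3.0)]
    record = []  # recorded (sample, weight) pairs
    for _ in range(steps):
        mu, s, log_nu = theta
        nu = math.exp(log_nu)
        # Obtain new samples from the current proposal q(x).
        xs = [mu + math.exp(s) * sample_student_t(nu, rng) for _ in range(batch)]
        # Weight each sample by the ratio p(x) / q(x).
        ws = [math.exp(log_p(x) - log_q(x, theta)) for x in xs]
        record.extend(zip(xs, ws))
        # Update flow and base parameters together (central differences).
        grad = []
        for k in range(3):
            up, dn = list(theta), list(theta)
            up[k] += eps
            dn[k] -= eps
            diff = weighted_loglik(up, xs, ws) - weighted_loglik(dn, xs, ws)
            grad.append(diff / (2 * eps))
        theta = [t + lr * g for t, g in zip(theta, grad)]
        # Numerical safeguard (an assumption of this sketch): 1 <= nu <= 30.
        theta[2] = min(max(theta[2], 0.0), math.log(30.0))
    return record, theta


def estimates(record):
    """Return the mean weight and the self-normalized estimate of E_p[x]."""
    xs, ws = zip(*record)
    z_hat = sum(ws) / len(ws)
    mean_hat = sum(w * x for x, w in zip(xs, ws)) / sum(ws)
    return z_hat, mean_hat


if __name__ == "__main__":
    record, theta = adaptive_importance_sampling()
    z_hat, mean_hat = estimates(record)
    print("mean weight:", z_hat, "estimated E_p[x]:", mean_hat)
    print("learned mu, sigma, nu:", theta[0], math.exp(theta[1]), math.exp(theta[2]))
```

Because the toy target is already normalized, the mean of the recorded weights should be close to 1, and the self-normalized weighted sum should be close to the target location of 2.0. Each sample is weighted by the proposal density in force when it was drawn, so samples recorded in early iterations remain usable in the final estimate.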