| CPC G06Q 30/0631 (2013.01) | 20 Claims |

|
1. A computer-implemented method comprising:
receiving historical slate data comprising observed rewards from selecting slate actions for a plurality of digital slots of a digital slate utilizing a first slate recommendation policy;
generating, for a second slate recommendation policy, a plurality of importance weights from the historical slate data by summing slot-level density ratios between the first slate recommendation policy and the second slate recommendation policy for the slate actions; and
generating a predicted reward distribution for the second slate recommendation policy by applying the plurality of importance weights to the historical slate data for the first slate recommendation policy.
|
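The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions that do not come from the claim text: both policies are taken to factor per slot, so each logged slate carries one selection probability per slot under each policy; the summed slot-level ratios are offset by (number of slots - 1), as in pseudoinverse-style slate estimators; and the predicted reward distribution is represented as a self-normalized, clipped weighted empirical CDF. The function names and the toy log are hypothetical.

```python
from typing import List, Sequence, Tuple


def slate_importance_weight(
    logging_probs: Sequence[float], target_probs: Sequence[float]
) -> float:
    """Importance weight for one logged slate.

    logging_probs[k] / target_probs[k] are the probabilities that the first
    (logging) and second (target) policy assign to the action shown in slot k.
    Assumption: the "- (num_slots - 1)" offset is not stated in the claim.
    """
    if len(logging_probs) != len(target_probs):
        raise ValueError("both policies need one probability per slot")
    num_slots = len(logging_probs)
    # Sum the slot-level density ratios between the two policies.
    ratio_sum = sum(t / l for t, l in zip(target_probs, logging_probs))
    return ratio_sum - (num_slots - 1)


def predicted_reward_distribution(
    rewards: Sequence[float], weights: Sequence[float]
) -> List[Tuple[float, float]]:
    """Weighted empirical CDF as (reward, estimated P(reward <= value)) pairs.

    Assumption: weights are self-normalized and the CDF is clipped to [0, 1],
    because summed-ratio weights can be negative.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cdf: List[Tuple[float, float]] = []
    running = 0.0
    for reward, weight in sorted(zip(rewards, weights)):
        running += weight
        value = min(1.0, max(0.0, running / total))
        if cdf and cdf[-1][0] == reward:
            cdf[-1] = (reward, value)  # merge ties on the same reward value
        else:
            cdf.append((reward, value))
    return cdf


# Hypothetical historical slate data: (logging probs, target probs, reward)
# for three logged two-slot slates.
logged = [
    ([0.5, 0.5], [0.5, 0.5], 1.0),
    ([0.5, 0.25], [0.25, 0.25], 0.0),
    ([0.25, 0.5], [0.5, 0.5], 2.0),
]
weights = [slate_importance_weight(l, t) for l, t, _ in logged]
cdf = predicted_reward_distribution([r for _, _, r in logged], weights)
print(weights)
print(cdf)
```

In this toy log, slates that the second policy favors more than the first receive weights above 1, so their observed rewards count for more in the estimated distribution.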