| CPC G06N 20/00 (2019.01) [G06N 7/01 (2023.01); G06N 10/00 (2019.01)] | 19 Claims |

|
1. A computing device comprising:
a processor configured to:
transmit, to a quantum coprocessor, instructions to encode a Markov decision process (MDP) model as a quantum oracle; and
train a reinforcement learning model at least in part by:
transmitting a plurality of superposition queries to the quantum oracle encoded at the quantum coprocessor, wherein:
a number of the superposition queries is proportional to an inverse of a target accuracy; and
the target accuracy is a predefined maximum distance between an optimal value estimate included in one or more measurement results and an optimal value approximated by the optimal value estimate;
performing one or more measurements at the quantum oracle as specified by the superposition queries;
receiving, from the quantum coprocessor, one or more measurement results of the one or more measurements in response to the plurality of superposition queries; and
updating a policy function of the reinforcement learning model based at least in part on the one or more measurement results.
|
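For illustration only, the training steps recited in claim 1 can be sketched classically. The sketch below is not the claimed implementation and uses no quantum hardware or quantum library. Everything named in it is a hypothetical stand-in introduced here: `MockOracle`, `num_superposition_queries`, `update_policy`, and the proportionality constant `constant`. It shows only how a query count proportional to the inverse of the target accuracy might be computed, and how measurement results could feed a policy update.

```python
import math
import random


def num_superposition_queries(target_accuracy, constant=1.0):
    """Return a query count proportional to 1 / target_accuracy.

    `constant` is an assumed placeholder for the proportionality factor;
    the claim does not specify its value.
    """
    if target_accuracy <= 0:
        raise ValueError("target_accuracy must be positive")
    return math.ceil(constant / target_accuracy)


class MockOracle:
    """Classical stand-in for an oracle encoding an MDP model (hypothetical)."""

    def __init__(self, optimal_values, seed=0):
        self.optimal_values = optimal_values  # maps action -> optimal value
        self.rng = random.Random(seed)

    def measure(self, n_queries):
        # Assumption: the estimate error is bounded by 1 / n_queries,
        # mimicking an accuracy that improves linearly with query count.
        bound = 1.0 / n_queries
        return {
            action: value + self.rng.uniform(-bound, bound)
            for action, value in self.optimal_values.items()
        }


def update_policy(policy, measurement_results):
    """Greedy update: pick the action with the highest estimated value."""
    policy["action"] = max(measurement_results, key=measurement_results.get)
    return policy


# Usage: target accuracy 0.125 gives 8 queries under the assumed constant of 1.
target_accuracy = 0.125
n_queries = num_superposition_queries(target_accuracy)
oracle = MockOracle({"left": 0.2, "right": 0.9})
results = oracle.measure(n_queries)
policy = update_policy({"action": None}, results)
```

In this sketch, halving the target accuracy doubles the number of queries, which is the scaling the claim recites. The greedy update is just one simple choice of policy update and is not taken from the claim.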