US 12,393,864 B2
Reinforcement learning with quantum oracle
Daochen Wang, College Park, MD (US); Aarthi Meenakshi Sundaram, Seattle, WA (US); Robin Ashok Kothari, Seattle, WA (US); Martin Henri Roetteler, Woodinville, WA (US); and Ashish Kapoor, Kirkland, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 27, 2021, as Appl. No. 17/160,309.
Prior Publication US 2022/0253743 A1, Aug. 11, 2022
Int. Cl. G06N 10/00 (2022.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01) [G06N 7/01 (2023.01); G06N 10/00 (2019.01)] 19 Claims
 
1. A computing device comprising:
a processor configured to:
transmit, to a quantum coprocessor, instructions to encode a Markov decision process (MDP) model as a quantum oracle; and
train a reinforcement learning model at least in part by:
transmitting a plurality of superposition queries to the quantum oracle encoded at the quantum coprocessor, wherein:
a number of the superposition queries is proportional to an inverse of a target accuracy; and
the target accuracy is a predefined maximum distance between an optimal value estimate included in one or more measurement results and an optimal value approximated by the optimal value estimate;
performing one or more measurements at the quantum oracle as specified by the superposition queries;
receiving, from the quantum coprocessor, one or more measurement results of the one or more measurements in response to the plurality of superposition queries; and
updating a policy function of the reinforcement learning model based at least in part on the one or more measurement results.
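The following Python sketch loosely illustrates the host-side control flow recited in claim 1: encode the MDP, issue a number of queries proportional to 1/ε, receive results, and update a policy. Everything in it is an assumption and not the patentee's implementation. The names `TabularMDP`, `MockCoprocessor`, `query_budget` and `train` are hypothetical, and the proportionality constant `c = 1` is arbitrary. The quantum coprocessor is replaced by classical value iteration, so the sketch models neither superposition queries nor any quantum speedup. The claim also does not say how the measurement results drive the policy update; acting greedily on estimated optimal Q-values is one assumed choice.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TabularMDP:
    """Hypothetical tabular MDP: P[s][a][t] is a transition probability, R[s][a] a reward."""
    P: List[List[List[float]]]
    R: List[List[float]]
    gamma: float  # discount factor, assumed to satisfy 0 <= gamma < 1


def query_budget(epsilon: float, c: float = 1.0) -> int:
    """Query count proportional to 1/epsilon; c is an assumed constant."""
    if epsilon <= 0:
        raise ValueError("target accuracy must be positive")
    return math.ceil(c / epsilon)


class MockCoprocessor:
    """Classical stand-in for the quantum coprocessor (hypothetical interface).

    It does NOT simulate superposition queries or quantum measurement: it only
    records the requested query count and returns Q-value estimates from
    classical value iteration, accurate to within epsilon in the max norm.
    """

    def __init__(self) -> None:
        self.mdp: Optional[TabularMDP] = None
        self.queries_received = 0

    def encode_oracle(self, mdp: TabularMDP) -> None:
        # Stands in for "instructions to encode the MDP model as a quantum oracle".
        self.mdp = mdp

    def run_queries(self, num_queries: int, epsilon: float) -> List[List[float]]:
        self.queries_received += num_queries
        mdp = self.mdp
        n_s, n_a = len(mdp.R), len(mdp.R[0])
        v = [0.0] * n_s
        while True:
            # One Bellman optimality backup: q = R + gamma * P v.
            q = [[mdp.R[s][a] + mdp.gamma * sum(p * v[t] for t, p in enumerate(mdp.P[s][a]))
                  for a in range(n_a)] for s in range(n_s)]
            new_v = [max(row) for row in q]
            delta = max(abs(x - y) for x, y in zip(new_v, v))
            v = new_v
            # Standard bound: max |q - Q*| <= gamma / (1 - gamma) * delta <= epsilon.
            if mdp.gamma * delta <= epsilon * (1 - mdp.gamma):
                return q


def train(mdp: TabularMDP, epsilon: float,
          cop: MockCoprocessor) -> Tuple[List[int], List[List[float]]]:
    cop.encode_oracle(mdp)           # transmit encoding instructions
    n = query_budget(epsilon)        # number of queries proportional to 1/epsilon
    q = cop.run_queries(n, epsilon)  # receive the (mock) measurement results
    # Policy update: act greedily with respect to the estimated optimal Q-values.
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return policy, q


# Two states, two actions: action 0 stays put, action 1 switches state; state 1 pays reward 1.
mdp = TabularMDP(P=[[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],
                 R=[[0.0, 0.0], [1.0, 1.0]], gamma=0.9)
cop = MockCoprocessor()
policy, q = train(mdp, 0.25, cop)
print(policy, cop.queries_received)  # [1, 0] 4
```

In this toy MDP the exact optimal values are Q*(0, switch) = 9 and Q*(1, stay) = 10. The greedy policy therefore switches out of state 0 and stays in state 1. With ε = 0.25 and c = 1, the mock records ⌈1/0.25⌉ = 4 queries.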