CPC E21B 44/00 (2013.01) [G06N 3/092 (2023.01); E21B 2200/20 (2020.05); E21B 2200/22 (2020.05)] | 20 Claims |
1. A method of automated drilling control and optimization, the method comprising:
acquiring training data including values of drilling parameters for a current stage of a drilling operation being performed within a subsurface formation;
training a reinforcement learning model to estimate values of the drilling parameters for a subsequent stage of the drilling operation to be performed, based on the acquired training data and a reward policy mapping inputs and outputs of the reinforcement learning model, wherein the reward policy maps the inputs and the outputs using an objective function that defines response values of one or more operating variables for the subsequent stage of the drilling operation based on current values of input parameters for the current stage of the drilling operation;
performing the subsequent stage of the drilling operation based on the values of the drilling parameters estimated using the trained reinforcement learning model;
calculating a difference between the estimated values and actual values of the drilling parameters, based on real-time data acquired as the subsequent stage of the drilling operation is performed within the subsurface formation and the reward policy associated with the reinforcement learning model;
retraining the reinforcement learning model using an on-policy reinforcement algorithm to refine the associated reward policy for at least one additional stage of the drilling operation to be performed, wherein the on-policy reinforcement algorithm refines the associated reward policy based on the actual values of the drilling parameters from performing the subsequent stage of the drilling operation based on the values of the drilling parameters estimated using the reward policy, and on the calculated difference; and
performing the at least one additional stage of the drilling operation using the retrained reinforcement learning model.
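The claimed loop — estimate parameters with a reward-policy-driven model, perform the stage, calculate the difference against real-time actuals, then retrain on-policy — can be sketched in a toy form. Everything below is a simplified illustration, not the patented implementation: the parameter names (`wob`, `rpm`), the negative-squared-error objective, and the mean-tracking "model" with a learning-rate update standing in for a full on-policy reinforcement algorithm are all assumptions made for the sketch.

```python
# Hypothetical drilling parameters (illustrative only, not from the claim):
# weight-on-bit (WOB, klbf) and rotary speed (RPM).
PARAMS = ["wob", "rpm"]


def objective(estimate, actual):
    """Reward policy: maps estimated vs. actual parameter values to a
    scalar reward (negative squared error, a common simplification)."""
    return -sum((estimate[p] - actual[p]) ** 2 for p in PARAMS)


class OnPolicyEstimator:
    """Toy stand-in for the reinforcement learning model: tracks a mean
    estimate per parameter and refines it with an on-policy update."""

    def __init__(self, init, lr=0.5):
        self.mean = dict(init)
        self.lr = lr

    def estimate_next_stage(self):
        # Estimated values of the drilling parameters for the
        # subsequent stage of the drilling operation.
        return dict(self.mean)

    def retrain(self, estimate, actual):
        # On-policy update: refine the policy using the actual values
        # observed while executing this policy's own estimates, scaled
        # by the calculated difference.
        for p in PARAMS:
            diff = actual[p] - estimate[p]
            self.mean[p] += self.lr * diff


def run_stages(model, stage_actuals):
    """Per stage: estimate, 'perform' the stage (actual values arrive
    as real-time data), score via the reward policy, then retrain."""
    rewards = []
    for actual in stage_actuals:
        est = model.estimate_next_stage()
        rewards.append(objective(est, actual))
        model.retrain(est, actual)
    return rewards


model = OnPolicyEstimator(init={"wob": 20.0, "rpm": 100.0})
# Simulated real-time data for three successive stages.
stages = [{"wob": 25.0, "rpm": 140.0}] * 3
rewards = run_stages(model, stages)
print(rewards)  # reward improves stage over stage as the policy refines
```

Because the update is driven by data gathered under the current policy's own estimates, the sketch is on-policy in the sense the claim uses; a practical system would replace the mean-tracking update with a proper on-policy algorithm (e.g., an actor-critic method) and the toy objective with the claim's objective function over operating variables.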