US 11,915,181 B2
Upper confidence bound algorithm for oilfield logic
John Pang, Menlo Park, CA (US); Vishakh Hegde, Menlo Park, CA (US); and Joseph Chalupsky, Menlo Park, CA (US)
Assigned to Schlumberger Technology Corporation, Sugar Land, TX (US)
Filed by Schlumberger Technology Corporation, Sugar Land, TX (US)
Filed on Nov. 14, 2019, as Appl. No. 16/684,495.
Prior Publication US 2021/0150440 A1, May 20, 2021
Int. Cl. G06Q 10/0637 (2023.01); G06N 5/02 (2023.01); G06Q 10/0631 (2023.01); G06F 30/20 (2020.01)

CPC G06Q 10/06375 (2013.01) [G06F 30/20 (2020.01); G06N 5/02 (2013.01); G06Q 10/06313 (2013.01)]

16 Claims

1. A method implemented by one or more processors, the method comprising:

receiving a set of well placement sequences for placing wells in a geographical region, each well placement sequence in the set defining a sequence of multiple wells to be placed within the geographical region;

executing a computer-implemented simulation on each of the well placement sequences in the set to determine, for each of the well placement sequences, a reward based upon a calculated hydrocarbon recovery for the well placement sequence and a cost of the calculated hydrocarbon recovery, wherein the cost of the calculated hydrocarbon recovery comprises a cost to drill at a particular location of the geographic region;

iteratively selecting, by an agent, well placement sequences in the set upon which to execute, by a simulator, computer-implemented simulations from among the plurality of well placement sequences to generate a respective updated reward for each of the iteratively selected well placement sequences, wherein the selecting is based upon the rewards determined for each of the plurality of well placement sequences, wherein the simulator and the agent comprise an agent-simulator environment that models a reinforcement learning environment;

obtaining an action space corresponding to the geographical region, the action space being an n dimensional representation of the geographical region, and the action space including one or more areas of interest indicative of predicted hydrocarbon saturation, wherein n is a positive integer;

obtaining a plurality of actions, wherein a given action of the actions is to be performed, at a given time step of the computer-implemented simulation, in the action space for each of the well placement sequences in the set;

configuring the simulator to execute the computer-implemented simulation on each of the well placement sequences in the set based on the action space and the set of well placement sequences; and

for each of the well placement sequences in the set:

performing, by the configured simulator, each action in the action space to determine:

the reward for each of the actions, at the given time step, based upon the calculated hydrocarbon recovery for the iteratively selected well placement sequence, and

the cost of the calculated hydrocarbon recovery for each of the actions, at the given time step, for the well placement sequence; and

generating, based on the reward and the cost for each of the actions, a reward distribution.
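
The selection loop recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the textbook UCB1 rule (mean reward plus an exploration bonus of sqrt(c ln t / n)), which the claim itself does not spell out, and it replaces the reservoir simulator with a toy function. All names here (`simulate_reward`, `ucb_select`, the `saturation` and `drill_cost` grids, the noise level) are hypothetical and chosen only for illustration. The action space is assumed to be a 2-dimensional grid, each action is assumed to be drilling one well at one time step, and the per-sequence list of sampled rewards stands in for the reward distribution.

```python
import math
import random
from typing import List, Sequence, Tuple

Well = Tuple[int, int]                   # hypothetical: a well is a grid cell (row, col)
PlacementSequence = Tuple[Well, ...]     # ordered wells, one drilled per time step
Grid = Sequence[Sequence[float]]


def simulate_reward(sequence: PlacementSequence, saturation: Grid,
                    drill_cost: Grid, rng: random.Random,
                    noise: float = 0.05) -> float:
    """Toy stand-in for a reservoir simulator (an assumption, not a real model).

    Each action (drilling one well) yields a noisy recovery proportional to the
    predicted saturation at that cell, minus the cost to drill at that cell.
    """
    reward = 0.0
    for row, col in sequence:
        recovery = saturation[row][col] * (1.0 + rng.gauss(0.0, noise))
        reward += recovery - drill_cost[row][col]
    return reward


def ucb_select(sequences: Sequence[PlacementSequence], saturation: Grid,
               drill_cost: Grid, rounds: int, c: float = 2.0,
               seed: int = 0) -> Tuple[List[int], List[float], List[List[float]]]:
    """Run a UCB1-style agent over the candidate sequences.

    Returns the simulation count, running mean reward, and sampled rewards
    (an empirical reward distribution) for each sequence.
    """
    rng = random.Random(seed)
    n = len(sequences)
    counts = [0] * n
    means = [0.0] * n
    samples: List[List[float]] = [[] for _ in range(n)]

    def run_simulation(i: int) -> None:
        r = simulate_reward(sequences[i], saturation, drill_cost, rng)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # incremental mean update
        samples[i].append(r)

    # Initial pass: simulate every sequence once to obtain a first reward.
    for i in range(n):
        run_simulation(i)

    # Iterative selection: pick the sequence with the highest upper bound.
    for t in range(n + 1, n + rounds + 1):
        scores = [means[i] + math.sqrt(c * math.log(t) / counts[i])
                  for i in range(n)]
        run_simulation(max(range(n), key=scores.__getitem__))

    return counts, means, samples


# Hypothetical 3x3 region: predicted saturation and a uniform drilling cost.
saturation = [[0.9, 0.1, 0.3],
              [0.4, 0.8, 0.1],
              [0.2, 0.1, 0.5]]
drill_cost = [[0.1] * 3 for _ in range(3)]
sequences = [((0, 0), (1, 1)), ((0, 2), (2, 0)), ((2, 2), (1, 0))]

counts, means, samples = ucb_select(sequences, saturation, drill_cost, rounds=200)
best = max(range(len(sequences)), key=means.__getitem__)
print("simulations per sequence:", counts)
print("best sequence:", sequences[best])
```

In this sketch the agent spends most of its simulation budget on the sequence whose wells sit in the high-saturation cells, while the exploration bonus guarantees that the weaker sequences are still re-simulated occasionally. A production setting would replace `simulate_reward` with calls to an actual reservoir simulator and would likely use a much larger candidate set.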