| CPC B29C 64/393 (2017.08) [B22F 10/30 (2021.01); B22F 10/85 (2021.01); B22F 12/90 (2021.01); B29C 64/209 (2017.08); B33Y 10/00 (2014.12); B33Y 50/02 (2014.12); G06F 18/2411 (2023.01); G06F 18/295 (2023.01); G06N 3/04 (2013.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 10/993 (2022.01); B22F 10/12 (2021.01); B22F 10/18 (2021.01); B22F 10/25 (2021.01); B22F 10/28 (2021.01)] | 20 Claims |

|
1. A method of training a reinforcement learning model for performing a corrective action in a manufacturing process executing in a manufacturing system, the method comprising:
receiving, by a computing system, an image of a specimen at a processing node in the manufacturing process;
detecting, by the computing system, an error in the specimen based on the image of the specimen;
determining, by the computing system using a reinforcement learning model, a change to a manufacturing parameter to correct the error based on a policy;
determining, by the computing system, state information of the specimen, the state information comprising a current action performed on the specimen and a previous action performed on the specimen at an upstream processing node in the manufacturing process;
generating, by the computing system, a quality metric for the specimen based on the state information;
generating, by the computing system, a reward corresponding to the state information;
comparing, by the computing system, an expected reward corresponding to the state information to the generated reward;
determining, by the computing system, that there is a deviation between the reward and the expected reward that exceeds a threshold amount; and
based on the determining, updating, by the computing system, the policy implemented by the reinforcement learning model.
|
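
The last four steps of claim 1 (generate a reward, compare it with an expected reward, test the deviation against a threshold, update the policy) can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `RewardDeviationUpdater`, the fields `threshold` and `learning_rate`, the action labels, and the use of a per-state expected-reward table as a stand-in for the policy are all assumptions made for illustration. Image capture, error detection, and the quality metric are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical state encoding: (previous action at the upstream node, current action).
State = Tuple[str, str]


@dataclass
class RewardDeviationUpdater:
    """Illustrative stand-in for a policy: a table of expected rewards per state."""

    threshold: float = 0.1       # assumed deviation threshold
    learning_rate: float = 0.5   # assumed step size for the update
    expected: Dict[State, float] = field(default_factory=dict)
    updates: int = 0

    def expected_reward(self, state: State) -> float:
        # States never seen before default to an expected reward of 0.0.
        return self.expected.get(state, 0.0)

    def observe(self, state: State, reward: float) -> bool:
        """Compare the generated reward with the expected reward for this state.

        The table is updated only when the absolute deviation exceeds the
        threshold. Returns True if an update happened.
        """
        deviation = reward - self.expected_reward(state)
        if abs(deviation) <= self.threshold:
            return False
        # Move the expectation part of the way toward the observed reward.
        self.expected[state] = self.expected_reward(state) + self.learning_rate * deviation
        self.updates += 1
        return True


if __name__ == "__main__":
    updater = RewardDeviationUpdater()
    state = ("deposit_layer_1", "deposit_layer_2")  # hypothetical action labels
    print(updater.observe(state, 1.0))   # large deviation, update applied
    print(updater.observe(state, 0.55))  # small deviation, no update
```

Gating the update on a threshold means small, expected fluctuations in reward leave the policy untouched, and only a substantial mismatch between expected and generated reward triggers a change.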