US 12,233,343 B2
Method for training AI bot in computer game
Chiu-Chou Lin, New Taipei (TW); I-Chen Wu, Hsinchu County (TW); Jung-Chang Kuo, Taipei (TW); Ying-Hau Wu, Taipei (TW); An-Lun Teng, Taipei (TW); and Pei-Wen Huang, Taipei (TW)
Assigned to Kabushiki Kaisha Ubitus, Tokyo (JP)
Filed by Kabushiki Kaisha Ubitus, Tokyo (JP)
Filed on Jan. 31, 2022, as Appl. No. 17/589,462.
Application 17/589,462 is a continuation-in-part of application No. 16/747,403, filed on Jan. 20, 2020, granted, now 11,253,783.
Claims priority of provisional application 62/796,174, filed on Jan. 24, 2019.
Prior Publication US 2022/0152512 A1, May 19, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. A63F 13/67 (2014.01)
CPC A63F 13/67 (2014.09) 6 Claims
OG exemplary drawing
 
1. A method for training an AI bot in a computer game, comprising:
in a first phase, collecting a plurality of sets of demonstration data to train a discrete encoder; wherein each set of said demonstration data comprises a plurality of gaming observations of a complete game-play; each said gaming observation is a gaming image frame observed during the game-play;
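The claim does not name an encoder architecture for the first phase. A minimal sketch in PyTorch, assuming a VQ-VAE-style discrete encoder trained to reconstruct demonstration frames, might look as follows; all layer sizes, loss weights, and identifiers (DiscreteEncoder, num_codes, code_dim) are illustrative assumptions, not the patented implementation.

    # Sketch of phase one: training a discrete encoder on demonstration frames.
    # Assumes a VQ-VAE-style model; the patent does not name an architecture,
    # so every layer size, loss weight, and name below is an illustrative guess.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DiscreteEncoder(nn.Module):
        def __init__(self, num_codes=512, code_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(              # image frame -> feature map
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, code_dim, 4, stride=2, padding=1),
            )
            self.codebook = nn.Embedding(num_codes, code_dim)
            self.decoder = nn.Sequential(              # quantized codes -> frame
                nn.ConvTranspose2d(code_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            )

        def quantize(self, z):
            # Map each spatial feature to its nearest codebook entry; the
            # resulting code indices act as the discrete "state" of the frame.
            flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
            idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
            q = self.codebook(idx).reshape(z.shape[0], z.shape[2], z.shape[3], -1)
            return q.permute(0, 3, 1, 2), idx.reshape(z.shape[0], -1)

        def forward(self, frame):
            z = self.encoder(frame)
            q, idx = self.quantize(z)
            q_st = z + (q - z).detach()                # straight-through gradient
            recon = self.decoder(q_st)
            loss = (F.mse_loss(recon, frame)           # reconstruction loss
                    + F.mse_loss(q, z.detach())        # codebook loss
                    + 0.25 * F.mse_loss(z, q.detach()))  # commitment loss
            return loss, idx

Training would iterate this loss over the collected demonstration frames; the code indices returned by quantize serve as the discrete state used in the later phases.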
in a second phase, collecting a few professional players' demonstration examples, and then using the discrete encoder trained in the first phase to convert the professional players' demonstration examples into a plurality of state sets and a plurality of state-action sets; wherein each demonstration example comprises a plurality of said gaming observations and a plurality of actions of a complete game-play played by a professional player; each said gaming observation is said gaming image frame observed during the game-play; each said action is a playing-control action performed by the professional player during the game-play at the time of a corresponding said gaming observation; the state sets are obtained by using the trained discrete encoder to convert each one of said gaming observations into a corresponding state; the state-action sets are obtained by merging each said state of said state sets with a corresponding said action referring to the same said gaming observation;
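Given an encoder trained as sketched above, the second-phase conversion can be sketched as follows. Representing each state as a tuple of code indices and the sets as Python sets of hashable tuples is an assumption made so that the third-phase membership tests are straightforward; build_demo_sets and its argument layout are hypothetical.

    # Sketch of phase two: converting professional demonstrations into state
    # sets and state-action sets, reusing the DiscreteEncoder sketched above.
    import torch

    def build_demo_sets(encoder, demonstrations):
        # demonstrations: iterable of (frame, action) pairs from expert play;
        # frame is a [3, H, W] tensor, action any hashable playing-control action.
        state_set, state_action_set = set(), set()
        encoder.eval()
        with torch.no_grad():
            for frame, action in demonstrations:
                _, idx = encoder(frame.unsqueeze(0))    # discrete code indices
                state = tuple(idx.squeeze(0).tolist())  # hashable state
                state_set.add(state)                    # state set entry
                state_action_set.add((state, action))   # merged state-action entry
        return state_set, state_action_set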
in a third phase, starting an AI (artificial intelligence) training process to train at least one AI bot by using a reward function; said reward function being built based on the state sets and state-action sets obtained in the second phase, wherein the following steps are performed in the third phase:
executing a gaming application in a server; said executed gaming application generating a playing environment;
executing an AI training application; said AI training application comprising at least one actor module and an AI learner module; wherein, when said AI training application is executed, said actor module generates an AI client for connecting to the playing environment and sending an AI playing command to the playing environment generated by said gaming application, and said playing environment generates a gaming output according to the AI playing command and sends the gaming output to the actor module; wherein said gaming output comprises rendered observations which are transmitted as a video stream containing a sequence of image frames;
said AI learner module executing said AI training process which retrieves said image frames of said gaming output from the actor module and uses at least one of said image frames of said gaming output as an input of the AI training process and then generates a feedback command according to the input; said AI learner module sequentially analyzing the image frames of said gaming output in order to obtain parameters required for proceeding with the AI training process; said actor module accepting said feedback command from said AI learner module and generating said AI playing command according to said feedback command; and then, said generated AI playing command being sent to the playing environment in order to generate a consequent said image frame of said gaming output to be input to the AI training process again as a training loop for training the AI client;
wherein said parameters are obtained by using the discrete encoder to sequentially analyze the image frames of said gaming output and include an AI state and a corresponding AI action for each said image frame; the reward function of the AI training process sequentially compares said AI state and said AI action of each one of said image frames with the state sets and state-action sets obtained in the second phase, respectively; if the AI state is in the state sets, a first reward value is obtained; if the AI action is in the state-action sets, a second reward value is obtained; the first reward value and the second reward value are summed up as a summed reward and then fed back to the AI training process; wherein the second reward value is greater than the first reward value.
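A sketch of the reward function the claim describes: the first reward value is granted when the AI state appears in the state sets, the larger second reward value when the state-action pair appears in the state-action sets, and the two are summed and fed back. The concrete magnitudes (0.5 and 1.0) and the helper name make_reward_fn are assumptions; the claim only constrains the second value to exceed the first.

    # Sketch of the claimed reward function over the phase-two sets.
    import torch

    def make_reward_fn(state_set, state_action_set, r_state=0.5, r_action=1.0):
        assert r_action > r_state      # second reward value > first reward value
        def reward_fn(encoder, frame, action):
            with torch.no_grad():
                _, idx = encoder(frame.unsqueeze(0))
            state = tuple(idx.squeeze(0).tolist())
            reward = 0.0
            if state in state_set:                    # first reward value
                reward += r_state
            if (state, action) in state_action_set:   # second reward value
                reward += r_action
            return reward                             # summed reward, fed back
        return reward_fn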
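The third-phase actor/learner loop can likewise be sketched. The playing-environment interface (apply_command) is a hypothetical stand-in for the claimed video-stream transport, and the REINFORCE-style update is one plausible choice for the unspecified AI training process; only the loop structure (frame in, feedback command out, consequent frame back in) follows the claim.

    # Sketch of the phase-three training loop: actor sends AI playing commands,
    # receives image frames, and the learner turns frames into feedback commands.
    from typing import Protocol
    import torch

    class PlayingEnvironment(Protocol):
        # Hypothetical stand-in for the server-side playing environment:
        # accepts an AI playing command, returns the consequent image frame.
        def apply_command(self, command: int) -> torch.Tensor: ...

    def actor_learner_loop(env, encoder, policy, reward_fn, steps=1000):
        # policy: maps a [1, 3, H, W] frame batch to action logits [1, A].
        opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
        frame = env.apply_command(0)              # initial no-op command (assumed)
        for _ in range(steps):
            logits = policy(frame.unsqueeze(0))   # learner analyzes the frame
            action = torch.distributions.Categorical(logits=logits).sample().item()
            reward = reward_fn(encoder, frame, action)  # summed reward from above
            # REINFORCE-style update as a stand-in for the unnamed RL algorithm.
            loss = -reward * torch.log_softmax(logits, dim=-1)[0, action]
            opt.zero_grad()
            loss.backward()
            opt.step()
            frame = env.apply_command(action)     # consequent frame closes the loop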