CPC G06T 13/40 (2013.01) [G06T 7/20 (2013.01); G06T 7/74 (2017.01); G06V 10/7747 (2022.01); G06V 20/46 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 18 Claims |
1. An information processing method, comprising:
determining, by processing circuitry of an information processing apparatus, a policy network to be trained according to a reference animation episode that includes a reference posture sequence for a reference virtual character to complete a target task, the policy network being an actor network of a proximal policy optimization (PPO) algorithm, the training of the policy network including iteratively performing:
inputting status information of a training virtual character at a current moment in a sample animation episode and a training task to the policy network to be trained to obtain posture data of the training virtual character at a following moment that is output from the policy network to be trained, the current moment corresponding to a current key frame of animation, the following moment corresponding to a following key frame of animation;
inputting the status information of the training virtual character at the current moment and the training task to a value assessment network of the PPO algorithm to obtain a status value at the current moment that is output from the value assessment network, the value assessment network being obtained by training according to the reference animation episode; and
adjusting parameters of the policy network to be trained according to the status value and the posture data of the training virtual character at the following moment until a training condition is satisfied to obtain the trained policy network.
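The iterative loop recited in claim 1 can be illustrated with a minimal numpy sketch. Everything here is an assumption for illustration only, not the claimed implementation: the state and posture dimensions, the linear actor and critic, the fixed Gaussian exploration noise, and the toy reward are all hypothetical stand-ins. Consistent with the claim, only the actor (policy network) is updated; the critic (value assessment network) is treated as already trained and is used only to produce status values.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 4, 3   # illustrative sizes for status information / posture data
STD = 0.5                      # fixed std of the Gaussian actor (assumption)
EPS, GAMMA, LR = 0.2, 0.99, 1e-2

# Actor ("policy network"): mean posture = W_actor @ state (toy linear model)
W_actor = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
# Critic ("value assessment network"): status value = w_critic @ state, assumed pre-trained
w_critic = rng.normal(scale=0.1, size=STATE_DIM)

def actor_mean(s):
    return W_actor @ s

def value(s):
    return float(w_critic @ s)

def log_prob(a, mu):
    # log-density of a diagonal Gaussian with fixed std
    return float(-0.5 * np.sum(((a - mu) / STD) ** 2)
                 - ACTION_DIM * np.log(STD * np.sqrt(2.0 * np.pi)))

# --- one pass of the claimed loop ---
s_t = rng.normal(size=STATE_DIM)                    # status information at the current moment
mu_t = actor_mean(s_t)
a_t = mu_t + STD * rng.normal(size=ACTION_DIM)      # posture data for the following moment
logp_old = log_prob(a_t, mu_t)

s_next = rng.normal(size=STATE_DIM)                 # status information at the following moment
reward = -float(np.linalg.norm(a_t))                # toy reward (stand-in for the task reward)
advantage = reward + GAMMA * value(s_next) - value(s_t)   # TD advantage from status values

# PPO clipped-surrogate update of the actor parameters
ratio = np.exp(log_prob(a_t, actor_mean(s_t)) - logp_old)   # equals 1.0 before any update
clipped_out = (advantage > 0 and ratio > 1 + EPS) or (advantage < 0 and ratio < 1 - EPS)
if not clipped_out:
    dlogp_dW = np.outer((a_t - actor_mean(s_t)) / STD**2, s_t)  # Gaussian score function
    W_actor += LR * ratio * advantage * dlogp_dW                # ascend the surrogate objective
```

In practice this single update would be repeated over many key frames and epochs until the training condition of the claim is satisfied; the sketch shows only the data flow of one iteration (status information into actor and critic, status value into the advantage, advantage into the parameter adjustment).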