CPC G06T 13/40 (2013.01) [G06T 7/20 (2013.01); G06T 7/74 (2017.01); G06V 10/7747 (2022.01); G06V 20/46 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 18 Claims |
1. An information processing method, comprising:
determining, by processing circuitry of an information processing apparatus, a policy network to be trained according to a reference animation episode that includes a reference posture sequence for a reference virtual character to complete a target task, the policy network being an actor network of a proximal policy optimization (PPO) algorithm, the training of the policy network including iteratively performing:
inputting status information of a training virtual character at a current moment in a sample animation episode and a training task to the policy network to be trained to obtain posture data of the training virtual character at a following moment that is output from the policy network to be trained, the current moment corresponding to a current key frame of animation, the following moment corresponding to a following key frame of animation;
inputting the status information of the training virtual character at the current moment and the training task to a value assessment network of the PPO algorithm to obtain a status value at the current moment that is output from the value assessment network, the value assessment network being obtained by training according to the reference animation episode; and
adjusting parameters of the policy network to be trained according to the status value and the posture data of the training virtual character at the following moment until a training condition is satisfied to obtain the trained policy network.
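The iterative loop recited in claim 1 can be illustrated with a minimal numpy sketch. Everything here is an assumption for illustration only, not the claimed implementation: the state and posture dimensions, the linear actor and critic, the fixed Gaussian exploration noise, and the toy reward are all hypothetical stand-ins. Consistent with the claim, only the actor (policy network) is updated; the critic (value assessment network) is treated as already trained and is used only to produce status values.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 4, 3   # illustrative sizes for status information / posture data
STD = 0.5                      # fixed std of the Gaussian actor (assumption)
EPS, GAMMA, LR = 0.2, 0.99, 1e-2

# Actor ("policy network"): mean posture = W_actor @ state (toy linear model)
W_actor = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
# Critic ("value assessment network"): status value = w_critic @ state, assumed pre-trained
w_critic = rng.normal(scale=0.1, size=STATE_DIM)

def actor_mean(s):
    return W_actor @ s

def value(s):
    return float(w_critic @ s)

def log_prob(a, mu):
    # log-density of a diagonal Gaussian with fixed std
    return float(-0.5 * np.sum(((a - mu) / STD) ** 2)
                 - ACTION_DIM * np.log(STD * np.sqrt(2.0 * np.pi)))

# --- one pass of the claimed loop ---
s_t = rng.normal(size=STATE_DIM)                    # status information at the current moment
mu_t = actor_mean(s_t)
a_t = mu_t + STD * rng.normal(size=ACTION_DIM)      # posture data for the following moment
logp_old = log_prob(a_t, mu_t)

s_next = rng.normal(size=STATE_DIM)                 # status information at the following moment
reward = -float(np.linalg.norm(a_t))                # toy reward (stand-in for the task reward)
advantage = reward + GAMMA * value(s_next) - value(s_t)   # TD advantage from status values

# PPO clipped-surrogate update of the actor parameters
ratio = np.exp(log_prob(a_t, actor_mean(s_t)) - logp_old)   # equals 1.0 before any update
clipped_out = (advantage > 0 and ratio > 1 + EPS) or (advantage < 0 and ratio < 1 - EPS)
if not clipped_out:
    dlogp_dW = np.outer((a_t - actor_mean(s_t)) / STD**2, s_t)  # Gaussian score function
    W_actor += LR * ratio * advantage * dlogp_dW                # ascend the surrogate objective
```

In practice this single update would be repeated over many key frames and epochs until the training condition of the claim is satisfied; the sketch shows only the data flow of one iteration (status information into actor and critic, status value into the advantage, advantage into the parameter adjustment).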