US 12,307,545 B2
Method, electronic device, and computer program product for processing virtual avatar
Zijia Wang, WeiFang (CN); Zhisong Liu, Shenzhen (CN); and Zhen Jia, Shanghai (CN)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Nov. 11, 2022, as Appl. No. 17/985,578.
Claims priority of application No. 202211275814.7 (CN), filed on Oct. 18, 2022.
Prior Publication US 2024/0135482 A1, Apr. 25, 2024
Int. Cl. G06T 1/00 (2006.01); G06T 13/40 (2011.01)

CPC G06T 1/0021 (2013.01) [G06T 13/40 (2013.01); G06T 2207/20081 (2013.01)]

20 Claims

1. A method of processing a virtual avatar, comprising:

generating an image feature of the virtual avatar based on a plurality of image blocks of the virtual avatar and corresponding positions of the plurality of image blocks in the virtual avatar;

generating, based on a watermark to be added to the virtual avatar, a text feature associated with text of the watermark; and

generating a watermarked virtual avatar based on the image feature and the text feature, wherein the watermark is invisible to human beings and identifies an identity of a user of the virtual avatar in a metaverse;

wherein generating the watermarked virtual avatar based on the image feature and the text feature comprises:

determining a key-value pair based on the image feature and the text feature, the key-value pair comprising (i) a key that represents a semantic embedding parameter of a self-attention mechanism in a machine learning model, and (ii) a value that represents the text feature;

determining the image feature as a query vector;

determining a weight set based on a similarity between the query vector and a key in the key-value pair; and

determining, based on the weight set, a watermarked image feature;

wherein generating the watermarked virtual avatar further comprises processing the watermarked virtual avatar through an invariant layer of the machine learning model, the invariant layer comprising at least one fully-connected layer of the machine learning model and implementing a mapping function that converts a first number of image channels of an input image space of the watermarked virtual avatar into a second number of image channels of a transformed image space of an updated watermarked virtual avatar, the second number being greater than the first number.
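
The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The function names, the shapes, the random weights standing in for trained parameters, the scaled dot-product similarity, the projection matrix `w_key`, and the residual addition of the attended text feature are all assumptions made for illustration; the claim does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_feature(blocks, pos_emb, w_patch):
    # blocks: (N, P) flattened image blocks; pos_emb: (N, D) position codes.
    # Assumed form: linear projection of each block plus its position code.
    return blocks @ w_patch + pos_emb

def embed_watermark(img_feat, text_feat, w_key):
    # Query = image feature; value = text feature.
    # Assumption: the key is the text feature projected by a parameter w_key.
    query = img_feat                      # (N, D)
    key = text_feat @ w_key               # (T, D)
    value = text_feat                     # (T, D)
    # Weight set from query/key similarity (assumed scaled dot product).
    weights = softmax(query @ key.T / np.sqrt(query.shape[-1]), axis=-1)
    # Assumption: the watermarked feature adds the attended text feature.
    return img_feat + weights @ value, weights

def expand_channels(image, w_fc, b_fc):
    # Fully-connected mapping applied per pixel: (H, W, C_in) -> (H, W, C_out),
    # with C_out > C_in as the claim requires.
    return image @ w_fc + b_fc

rng = np.random.default_rng(0)
N, P, D, T = 16, 48, 32, 5                # blocks, block size, feature dim, text tokens
blocks = rng.normal(size=(N, P))
img_feat = image_feature(blocks, rng.normal(size=(N, D)), rng.normal(size=(P, D)))
text_feat = rng.normal(size=(T, D))
marked_feat, weights = embed_watermark(img_feat, text_feat, rng.normal(size=(D, D)))

avatar = rng.normal(size=(8, 8, 3))       # stand-in for a decoded watermarked avatar
expanded = expand_channels(avatar, rng.normal(size=(3, 12)), np.zeros(12))
```

In this sketch each row of `weights` is a distribution over the watermark's text tokens, so every image block receives its own mixture of the text feature. The last step widens a 3-channel image to 12 channels, one instance of "the second number being greater than the first number".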