US 12,354,013 B2
Systems and methods for masked self-training of unsupervised image classification
Junnan Li, Singapore (SG); and Chu Hong Hoi, Singapore (SG)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on May 27, 2022, as Appl. No. 17/827,339.
Claims priority of provisional application 63/337,946, filed on May 3, 2022.
Prior Publication US 2023/0359900 A1, Nov. 9, 2023
Int. Cl. G06N 3/088 (2023.01); G06V 10/75 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)

CPC G06N 3/088 (2013.01) [G06V 10/751 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)]

20 Claims

1. A method of masked self-training for image classification, the method comprising:

receiving, via a communication interface, an image;

dividing the image into a plurality of image patches;

randomly replacing one or more image patches with a mask token;

encoding, via a first encoder, the plurality of image patches including one or more masked patches and a first start token into a first start embedding and a plurality of image embeddings including one or more mask embeddings;

normalizing, by a linear projection layer, the first start embedding and the one or more mask embeddings;

computing a global-local feature alignment loss based on an average squared distance between the normalized first start embedding and the normalized one or more mask embeddings; and

updating the first encoder based at least in part on the global-local feature alignment loss.
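
The steps recited in claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. It assumes a grayscale image stored as a list of rows, a fixed zero vector as the mask token, and L2 normalization applied after the linear projection (one possible reading of "normalizing, by a linear projection layer"). The function names, the 2x2 patch size, the 50% mask ratio, and in particular `toy_encoder` are hypothetical stand-ins. A real first encoder would typically be a learned network, and the final updating step (a gradient step on the loss) is omitted here.

```python
import math
import random


def split_into_patches(image, patch):
    """Divide an H x W image (list of rows) into flattened patch x patch blocks."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
    ]


def mask_patches(patches, mask_token, ratio, rng):
    """Randomly replace a fraction of the patches with the mask token."""
    k = max(1, round(ratio * len(patches)))
    masked = sorted(rng.sample(range(len(patches)), k))
    tokens = [list(mask_token) if i in masked else p for i, p in enumerate(patches)]
    return tokens, masked


def toy_encoder(tokens):
    """Hypothetical stand-in for the first encoder (NOT from the patent).

    The start embedding is the mean of all tokens; each patch embedding
    mixes its own token with that mean to mimic contextual encoding.
    """
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    embeddings = [[0.5 * t[d] + 0.5 * mean[d] for d in range(dim)] for t in tokens]
    return mean, embeddings


def project_and_normalize(vec, weight):
    """Apply a linear projection, then L2-normalize (assumed interpretation)."""
    proj = [sum(w * x for w, x in zip(row, vec)) for row in weight]
    norm = math.sqrt(sum(v * v for v in proj)) or 1.0
    return [v / norm for v in proj]


def alignment_loss(start, mask_embeddings):
    """Average squared distance between the start and the mask embeddings."""
    dists = [sum((a - b) ** 2 for a, b in zip(start, m)) for m in mask_embeddings]
    return sum(dists) / len(dists)


rng = random.Random(0)
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # toy 4x4 image
patches = split_into_patches(image, patch=2)                      # four 2x2 patches
tokens, masked = mask_patches(patches, [0.0] * 4, ratio=0.5, rng=rng)

start_emb, patch_embs = toy_encoder(tokens)
weight = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]  # 4 -> 3 projection

z_start = project_and_normalize(start_emb, weight)
z_masks = [project_and_normalize(patch_embs[i], weight) for i in masked]
loss = alignment_loss(z_start, z_masks)
print(f"masked patches: {masked}, alignment loss: {loss:.4f}")
```

Because both sides are unit vectors in this sketch, each squared distance lies between 0 and 4, so the loss is bounded. Minimizing it pulls the embeddings at the masked positions toward the global start embedding, which is the alignment that the claim names.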