CPC G06N 3/088 (2013.01) [G06V 10/751 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)] | 20 Claims |
1. A method of masked self-training for image classification, the method comprising:
receiving, via a communication interface, an image;
dividing the image into a plurality of image patches;
randomly replacing one or more image patches with a mask token;
encoding, via a first encoder, the plurality of image patches including one or more masked patches and a first start token into a first start embedding and a plurality of image embeddings including one or more mask embeddings;
normalizing, by a linear projection layer, the first start embedding and the one or more mask embeddings;
computing a global-local feature alignment loss based on an average squared distance between the normalized first start embedding and the normalized one or more mask embeddings; and
updating the first encoder based at least in part on the global-local feature alignment loss.
|
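The steps recited in claim 1 can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation. The toy single-step attention "encoder", the random weights, the positional embeddings, the 30% mask ratio, and the names `patchify`, `l2_normalize`, and `alignment_loss` are all hypothetical. "Normalizing, by a linear projection layer" is read here as a linear projection followed by L2 normalization, which is one possible interpretation.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def alignment_loss(start_emb, mask_embs, proj):
    """Average squared distance between the projected, normalized start
    embedding (global) and each projected, normalized mask embedding (local)."""
    g = l2_normalize(start_emb @ proj)   # shape (P,)
    l = l2_normalize(mask_embs @ proj)   # shape (M, P)
    return float(np.mean(np.sum((l - g) ** 2, axis=-1)))

rng = np.random.default_rng(0)
dim = 64

# Stand-in for the received image, divided into patches.
image = rng.random((32, 32, 3))
patches = patchify(image, patch=8)                      # (16, 192)

# Hypothetical parameters: patch embedding, tokens, positions, projection.
embed = rng.normal(size=(patches.shape[1], dim)) * 0.02
mask_token = rng.normal(size=dim) * 0.02
start_token = rng.normal(size=dim) * 0.02
pos = rng.normal(size=(len(patches) + 1, dim)) * 0.02
proj = rng.normal(size=(dim, 32)) * 0.02

# Randomly replace some patch tokens with the mask token.
tokens = patches @ embed
masked = rng.random(len(tokens)) < 0.3                  # assumed mask ratio
masked[0] = True                                        # ensure at least one mask
tokens[masked] = mask_token

# Toy encoder: one softmax-attention mixing step over [start token, patches].
seq = np.vstack([start_token, tokens]) + pos
scores = seq @ seq.T / np.sqrt(dim)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
encoded = seq + attn @ seq

start_emb = encoded[0]              # start embedding (global feature)
mask_embs = encoded[1:][masked]     # mask embeddings (local features)

loss = alignment_loss(start_emb, mask_embs, proj)
print(f"global-local alignment loss: {loss:.4f}")
```

Because both sets of vectors are unit-normalized, each squared distance lies between 0 and 4, and the loss reaches 0 only when every mask embedding points in the same direction as the start embedding. The final claimed step, updating the encoder, would need gradients of this loss with respect to the encoder parameters. That is omitted here and would typically be handled by an automatic-differentiation framework.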