| CPC G06V 10/774 (2022.01) [G06V 10/82 (2022.01)] | 20 Claims |

|
1. An electronic device, comprising: circuitry configured to: fine-tune, based on first training data including a first set of images, an autoencoder model and a transformer model associated with the autoencoder model, wherein the autoencoder model includes an encoder model, a learned codebook associated with the transformer model, a generator model, and a discriminator model; select a subset of images from the first training data; apply the encoder model on the selected subset of images based on the learned codebook to determine an encoded subset of images; generate second training data including a second set of images, based on the application of the encoder model, wherein the generated second training data corresponds to a quantized latent representation of the selected subset of images; and pre-train the autoencoder model to create a next generation of the autoencoder model, based on the generated second training data, wherein the next generation of the autoencoder model is used to generate specific training data for a next generation of the transformer model.
|
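
The select, encode, and quantize steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: `toy_encoder` is a hypothetical stand-in for the encoder model, the codebook is a hand-written list rather than a learned one, and nearest-neighbour lookup by squared Euclidean distance is assumed as the quantization rule, which the claim does not specify. Images are assumed to be flat lists of at least two pixel values.

```python
import random


def quantize(latent, codebook):
    """Return (index, code vector) of the codebook entry nearest to `latent`.

    Assumption: nearest means smallest squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))
    return index, codebook[index]


def toy_encoder(image):
    """Hypothetical encoder: maps a flat image to a 2-D latent (mean of each half)."""
    half = len(image) // 2
    return [sum(image[:half]) / half, sum(image[half:]) / (len(image) - half)]


def build_second_training_data(first_training_data, codebook, subset_size, seed=0):
    """Select a subset of images, encode each one, and snap it to the codebook.

    The returned list plays the role of the quantized latent representation
    that the claim calls the second training data.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    subset = rng.sample(first_training_data, subset_size)
    second_training_data = []
    for image in subset:
        latent = toy_encoder(image)
        _, code = quantize(latent, codebook)
        second_training_data.append(code)
    return second_training_data
```

For example, with `codebook = [[0.0, 0.0], [1.0, 1.0]]`, the call `quantize([0.9, 0.8], codebook)` returns `(1, [1.0, 1.0])`. In the claimed device the output of this stage would feed the pre-training of the next generation of the autoencoder model, a step this sketch does not cover.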