US 12,444,173 B2
Image component generation based on application of iterative learning on autoencoder model and transformer model
Marzieh Edraki, San Jose, CA (US); and Akira Nakamura, San Jose, CA (US)
Assigned to SONY GROUP CORPORATION, Tokyo (JP); and SONY CORPORATION OF AMERICA, New York, NY (US)
Filed by SONY GROUP CORPORATION, Tokyo (JP); and SONY CORPORATION OF AMERICA, New York, NY (US)
Filed on Mar. 1, 2023, as Appl. No. 18/177,084.
Claims priority of provisional application 63/368,264, filed on Jul. 13, 2022.
Prior Publication US 2024/0029411 A1, Jan. 25, 2024
Int. Cl. G06V 10/774 (2022.01); G06V 10/82 (2022.01)
CPC G06V 10/774 (2022.01) [G06V 10/82 (2022.01)] 20 Claims
OG exemplary drawing
 
1. An electronic device, comprising:
circuitry configured to:
fine-tune, based on first training data including a first set of images, an autoencoder model and a transformer model associated with the autoencoder model, wherein the autoencoder model includes an encoder model, a learned codebook associated with the transformer model, a generator model, and a discriminator model;
select a subset of images from the first training data;
apply the encoder model on the selected subset of images, based on the learned codebook, to determine an encoded subset of images;
generate second training data including a second set of images, based on the application of the encoder model, wherein the generated second training data corresponds to a quantized latent representation of the selected subset of images; and
pre-train the autoencoder model, based on the generated second training data, to create a next generation of the autoencoder model, wherein the next generation of the autoencoder model is used to generate specific training data for a next generation of the transformer model.
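The claimed loop can be sketched in miniature: select a subset of the first training data, encode it against a learned codebook, and emit the quantized latents as second training data for the next-generation model. The sketch below is purely illustrative, with hypothetical function names and a toy nearest-neighbor quantizer standing in for the encoder model and learned codebook; the patent does not specify this implementation.

```python
import numpy as np

def encode(images, codebook):
    """Map each image vector to the index of its nearest codebook entry
    (a toy stand-in for the encoder model applied with the learned codebook)."""
    # Pairwise distances: (n_images, n_codes) via broadcasting.
    d = np.linalg.norm(images[:, None, :] - codebook[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

def quantize(indices, codebook):
    """Quantized latent representation: replace each image with its code vector."""
    return codebook[indices]

def second_training_data(first_training_data, codebook, subset_size):
    """Select a subset, apply the encoder, and return the quantized latents
    used to pre-train the next generation of the autoencoder model."""
    subset = first_training_data[:subset_size]   # select a subset of images
    idx = encode(subset, codebook)               # apply the encoder model
    return quantize(idx, codebook)               # generate second training data

# Toy run: 8 "images" as 4-dim feature vectors, a codebook of 3 entries.
rng = np.random.default_rng(0)
images = rng.normal(size=(8, 4))
codebook = rng.normal(size=(3, 4))
second = second_training_data(images, codebook, subset_size=5)
print(second.shape)  # (5, 4): one quantized latent per selected image
```

Every row of the generated second training data is an exact codebook entry, reflecting the claim's quantized latent representation; in a real system the next-generation autoencoder would then be pre-trained on these latents before producing training data for the next-generation transformer.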