CPC G06V 30/1456 (2022.01) | 20 Claims
1. At least one non-transitory, computer-readable storage medium comprising instructions recorded thereon, the instructions, when executed by at least one processor of a text recognition system, causing the text recognition system to:
obtain an image file comprising a visual representation of alphanumeric characters;
receive a prompt relating to the image file,
wherein the prompt is associated with a query regarding a region of the image file;
using the prompt and the image file, cause a trained region encoder to determine a first region of interest in the image file,
wherein the trained region encoder includes an attention-based continual knowledge distillation model;
modify a first image associated with the first region of interest of the image file to generate a data augmentation entity,
wherein the data augmentation entity comprises a modified image;
using a trained instance encoder, generate a first set of visual instances corresponding to the first image associated with the first region of interest and a second set of visual instances corresponding to the data augmentation entity of the first region of interest,
wherein the trained instance encoder is trained using self-supervised gradient recursion;
generate a first ordered sequence associated with the first set of visual instances and a second ordered sequence associated with the second set of visual instances; and
using an output of executing a self-supervised contrastive loss function on the first ordered sequence and the second ordered sequence, perform operations comprising:
automatically further train the attention-based continual knowledge distillation model of the trained region encoder; and
provide the first ordered sequence to an instance decoder to generate, for display on a user interface, an output item in response to the prompt.
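Claim 1 recites a modified copy of the region image, two sets of visual instances, two ordered sequences, and a self-supervised contrastive loss computed over those sequences. It does not specify the modification, the ordering criterion, or the loss formula. The sketch below fills those gaps with assumptions, so it should not be read as the claimed implementation. The modification is assumed to be a photometric brightness shift, which leaves instance positions unchanged. The ordering is assumed to be reading order, meaning top-to-bottom, then left-to-right. The loss is assumed to be InfoNCE, in which element *i* of one sequence and element *i* of the other form the positive pair. All function names are hypothetical, and the instance encoder is not modeled: embeddings are taken as given.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a grayscale region is a list of pixel rows, and a
# "visual instance" is (x, y, embedding). The embedding is assumed to come
# from an instance encoder that is not modeled here.
Region = List[List[int]]
Instance = Tuple[float, float, List[float]]


def brightness_jitter(region: Region, delta: int) -> Region:
    """Return a modified copy of the region with every pixel shifted by delta,
    clipped to [0, 255]. A photometric change keeps instance positions intact."""
    return [[min(255, max(0, p + delta)) for p in row] for row in region]


def order_instances(instances: Sequence[Instance]) -> List[List[float]]:
    """Return embeddings in an assumed reading order: top-to-bottom by y,
    then left-to-right by x."""
    ordered = sorted(instances, key=lambda inst: (inst[1], inst[0]))
    return [list(inst[2]) for inst in ordered]


def _l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
    return [x / norm for x in vec]


def info_nce(seq_a: Sequence[Sequence[float]],
             seq_b: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE over two ordered sequences: element i of seq_a and element i of
    seq_b are the positive pair; all other elements of seq_b are negatives."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    a = [_l2_normalize(v) for v in seq_a]
    b = [_l2_normalize(v) for v in seq_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(a)
```

Suppose the same ordering is applied to instances from the original region and from its jittered copy. A small `info_nce` value then indicates that matching positions produce similar embeddings. If the two sequences are misaligned, the value rises sharply. Under these assumptions, this output is the kind of signal the claim feeds back into further training.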
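Claim 1 also recites further training an "attention-based continual knowledge distillation model" inside the region encoder, but it does not state the distillation objective. The following is a minimal sketch that substitutes two generic, named techniques. The first is soft-target distillation, which is temperature-scaled KL divergence between teacher and student outputs, multiplied by T². The second is an attention-transfer term, which is the mean squared error between L2-normalized attention maps. For the continual setting, the teacher is assumed to be a frozen snapshot of the region encoder taken before the update. The function names, the weighting parameter `beta`, and the flattened attention-map inputs are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax with max subtraction for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Soft-target distillation: T^2 * KL(teacher || student) on softened outputs."""
    p = softmax(teacher_logits, temperature)  # teacher = frozen earlier snapshot (assumed)
    q = softmax(student_logits, temperature)  # student = encoder being updated
    kl = sum(pi * (math.log(pi) - math.log(max(qi, 1e-12)))
             for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def attention_transfer_loss(student_attn: Sequence[float],
                            teacher_attn: Sequence[float]) -> float:
    """Mean squared error between L2-normalized, flattened attention maps."""
    if len(student_attn) != len(teacher_attn) or not student_attn:
        raise ValueError("attention maps must be non-empty and of equal size")

    def unit(v: Sequence[float]) -> List[float]:
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    s, t = unit(student_attn), unit(teacher_attn)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def continual_kd_objective(student_logits: Sequence[float],
                           teacher_logits: Sequence[float],
                           student_attn: Sequence[float],
                           teacher_attn: Sequence[float],
                           temperature: float = 2.0,
                           beta: float = 1.0) -> float:
    """Hypothetical combined objective: output distillation plus a weighted
    attention-transfer term that keeps the student's attention close to the teacher's."""
    return (distillation_loss(student_logits, teacher_logits, temperature)
            + beta * attention_transfer_loss(student_attn, teacher_attn))
```

One plausible design would add this objective to the contrastive-loss output when updating the region encoder. The distillation terms would discourage the encoder from drifting away from what it learned earlier. The claim itself does not state how the two losses are combined.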