US 12,033,408 B1
Continual text recognition using prompt-guided knowledge distillation
Ankit Malviya, Ujjain (IN); Shubhanshu Kumar Singh, Gajraula (IN); Vishu Mittal, Pilkhuwa (IN); Anish Goswami, Pune (IN); Chaithanya Manda, Jersey City, NJ (US); Saurabh Khanna, Gurgaon (IN); and Sarika Pal, Bangalore (IN)
Assigned to ExlService Holdings, Inc., New York, NY (US)
Filed by ExlService Holdings, Inc., New York, NY (US)
Filed on Dec. 19, 2023, as Appl. No. 18/389,641.
Application 18/389,641 is a continuation-in-part of application No. 18/367,920, filed on Sep. 13, 2023.
Int. Cl. G06V 30/14 (2022.01)
CPC G06V 30/1456 (2022.01) 20 Claims
OG exemplary drawing
 
1. At least one non-transitory, computer-readable storage medium comprising instructions recorded thereon, the instructions, when executed by at least one processor of a text recognition system, causing the text recognition system to:
obtain an image file comprising a visual representation of alphanumeric characters;
receive a prompt relating to the image file,
wherein the prompt is associated with a query regarding a region of the image file;
using the prompt and the image file, cause a trained region encoder to determine a first region of interest in the image file,
wherein the trained region encoder includes an attention-based continual knowledge distillation model;
modify a first image associated with the first region of interest of the image file to generate a data augmentation entity,
wherein the data augmentation entity comprises a modified image;
using a trained instance encoder, generate a first set of visual instances corresponding to the first image associated with the first region of interest and a second set of visual instances corresponding to the data augmentation entity of the first region of interest,
wherein the trained instance encoder is trained using self-supervised gradient recursion;
generate a first ordered sequence associated with the first set of visual instances and a second ordered sequence associated with the second set of visual instances; and
using an output of executing a self-supervised contrastive loss function on the first ordered sequence and the second ordered sequence, perform operations comprising:
automatically further training the attention-based continual knowledge distillation model of the trained region encoder; and
providing the first ordered sequence to an instance decoder to generate, for display on a user interface, an output item in response to the prompt.
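
The Gazette entry carries only the claim, not an implementation. As illustration only, here is a minimal PyTorch sketch of one plausible reading of the claimed trained region encoder with an attention-based continual knowledge distillation model: a student encoder cross-attends from a prompt embedding to image-patch embeddings, and its attention map is distilled against a frozen teacher's map so region knowledge learned earlier is retained. Every name here, the teacher/student split, and the temperature are assumptions, not the patented design.

```python
# Sketch (assumed design, not the patented one): attention-based continual
# knowledge distillation for a prompt-guided region encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionEncoder(nn.Module):
    """Cross-attends from a prompt embedding to image-patch embeddings;
    the attention weights over patches act as a region-of-interest map."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, prompt_emb, patch_emb):
        # prompt_emb: (B, 1, D) query; patch_emb: (B, N, D) keys/values
        out, attn_weights = self.attn(prompt_emb, patch_emb, patch_emb)
        return out, attn_weights  # attn_weights: (B, 1, N) over patches

def distillation_loss(student_attn, teacher_attn, tau=2.0):
    # KL divergence between temperature-softened student and (frozen)
    # teacher attention maps, so the student adapts to new data without
    # forgetting the teacher's learned regions.
    s = F.log_softmax(student_attn / tau, dim=-1)
    t = F.softmax(teacher_attn / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau * tau
```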
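
Likewise, a sketch of producing the claimed data augmentation entity from the region-of-interest crop and encoding both views as ordered sequences of visual instances. The particular augmentations and the instance-encoder interface are assumptions; the claim requires only a modified image and a pair of ordered sequences, and the encoder here is assumed to emit the ordered token sequence directly.

```python
# Sketch (assumed augmentations): build the data augmentation entity and
# encode both views of the first region of interest.
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4),
    T.RandomAffine(degrees=3, translate=(0.02, 0.02)),
    T.GaussianBlur(kernel_size=3),
])

def encode_views(instance_encoder, roi_image):
    # roi_image: (B, C, H, W) tensor crop of the first region of interest
    augmented = augment(roi_image)           # data augmentation entity
    seq_a = instance_encoder(roi_image)      # first ordered sequence
    seq_b = instance_encoder(augmented)      # second ordered sequence
    return seq_a, seq_b                      # each: (B, L, D) token sequence
```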
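
The claim does not name a specific self-supervised contrastive loss function; an NT-Xent/InfoNCE-style loss over pooled sequence embeddings is one common choice and is sketched below under that assumption. Matching (original, augmented) pairs are pulled together while other images in the batch serve as negatives.

```python
# Sketch (assumed formulation): symmetric NT-Xent-style contrastive loss
# over the two ordered sequences.
import torch
import torch.nn.functional as F

def contrastive_loss(seq_a, seq_b, temperature=0.07):
    # Pool each token sequence to one embedding per image, L2-normalize.
    z_a = F.normalize(seq_a.mean(dim=1), dim=-1)  # (B, D)
    z_b = F.normalize(seq_b.mean(dim=1), dim=-1)  # (B, D)
    logits = z_a @ z_b.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Matching (original, augmented) pairs sit on the diagonal.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```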
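
Finally, a sketch tying the pieces together as in the closing claim element: the contrastive-loss output (combined here with the distillation term) further trains the region encoder, and the first ordered sequence is decoded into the output item for display. It reuses the helpers defined above; the optimizer wiring and the teacher_encoder/instance_decoder interfaces are assumed.

```python
# Sketch (assumed wiring): one combined further-training and inference step.
import torch

def training_and_inference_step(region_encoder, teacher_encoder,
                                instance_encoder, instance_decoder,
                                optimizer, prompt_emb, patch_emb, roi_image):
    seq_a, seq_b = encode_views(instance_encoder, roi_image)
    _, student_attn = region_encoder(prompt_emb, patch_emb)
    with torch.no_grad():                     # teacher stays frozen
        _, teacher_attn = teacher_encoder(prompt_emb, patch_emb)

    loss = contrastive_loss(seq_a, seq_b) \
         + distillation_loss(student_attn, teacher_attn)
    optimizer.zero_grad()
    loss.backward()                           # further trains the encoders
    optimizer.step()

    # Decode the first ordered sequence into the output item for display.
    return instance_decoder(seq_a.detach())
```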