US 12,277,688 B2
Multi-task text inpainting of digital images
Vijay Kumar Baikampady Gopalkrishna, Santa Clara, CA (US); and Raja Bala, Pittsford, NY (US)
Assigned to CAREAR HOLDINGS LLC, Norwalk, CT (US)
Filed by CareAR Holdings LLC, Norwalk, CT (US)
Filed on Jun. 30, 2021, as Appl. No. 17/363,253.
Prior Publication US 2023/0005107 A1, Jan. 5, 2023
Int. Cl. G06K 9/00 (2022.01); G06T 5/77 (2024.01); G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/60 (2006.01)

CPC G06T 5/77 (2024.01) [G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/60 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)]

36 Claims

1. A digital image frame editing method comprising, by a processor:

receiving a digital image frame;

processing the digital image frame to define a region of interest (ROI) that contains original text;

processing the ROI through a multi-task machine learning model to predict, in parallel processes:

a foreground image of the ROI, wherein the foreground image comprises the original text,

a background image of the ROI, wherein the background image omits the original text, and

a binary mask that distinguishes foreground image pixels from background image pixels in the ROI;

receiving a target mask that contains replacement text; and

applying the target mask to blend the background image with the foreground image and yield a modified digital image that includes the replacement text and omits the original text,

wherein the multi-task machine learning model comprises:

a single deep neural encoder that receives the ROI, and

separate deep neural decoders for predicting each of the foreground image, the background image and the binary mask; and

wherein the method further comprises, before applying the target mask:

using the binary mask to extract an average background signal of the ROI, and

using the average background signal to modify the background image produced by the decoder that predicted the background image, wherein using the average background signal to modify the background image comprises:

generating a residual signal as a difference between the average background signal extracted by the binary mask and the average background signal of the predicted background image, and

modifying the predicted background image by adding the residual signal to substantially every pixel in the predicted background image.
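
The background correction and blending steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and it makes several assumptions that the claim does not state: the function names `correct_background` and `blend_with_target_mask` are hypothetical, images are float arrays of shape (H, W, C), a mask value of 0 marks background and nonzero marks text, averages are taken per colour channel, and the blend is a simple per-pixel linear mix. The multi-task model itself (one encoder, three decoders) is not shown; its three outputs are taken as given inputs.

```python
import numpy as np


def correct_background(roi, predicted_bg, binary_mask):
    """Shift predicted_bg so its mean matches the ROI's observed background mean.

    roi, predicted_bg: float arrays of shape (H, W, C).
    binary_mask: array of shape (H, W); assumed convention: 0 = background,
    nonzero = foreground (text).
    """
    background = binary_mask == 0
    if not background.any():
        # No background pixels to measure, so leave the prediction unchanged.
        return predicted_bg.copy()
    # Average background signal of the ROI, extracted with the binary mask.
    roi_bg_mean = roi[background].mean(axis=0)        # shape (C,)
    # Average signal of the predicted background image.
    predicted_mean = predicted_bg.mean(axis=(0, 1))   # shape (C,)
    # Residual signal: difference between the two averages.
    residual = roi_bg_mean - predicted_mean
    # Broadcasting adds the residual to every pixel.
    return predicted_bg + residual


def blend_with_target_mask(foreground, background, target_mask):
    """Per-pixel linear blend (an assumed blending rule).

    target_mask: array of shape (H, W) with values in [0, 1]; 1 marks
    replacement-text pixels, which take their value from `foreground`.
    """
    alpha = np.asarray(target_mask, dtype=float)[..., None]  # shape (H, W, 1)
    return alpha * foreground + (1.0 - alpha) * background


# Tiny worked example on a 2x2 ROI with 3 channels.
roi = np.full((2, 2, 3), 100.0)
roi[0, 0] = 0.0                                  # one original text pixel
binary_mask = np.array([[1, 0], [0, 0]])         # marks that text pixel
predicted_bg = np.full((2, 2, 3), 90.0)          # decoder output, slightly dark
predicted_fg = np.full((2, 2, 3), 20.0)          # decoder output, text colour
target_mask = np.array([[0, 1], [0, 0]])         # replacement text position

corrected_bg = correct_background(roi, predicted_bg, binary_mask)
result = blend_with_target_mask(predicted_fg, corrected_bg, target_mask)
```

In this example the observed background mean is 100 and the predicted background mean is 90, so the residual of 10 is added to every predicted background pixel before the replacement text is blended in. A real pipeline would likely also clip the result to the valid pixel range.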