| CPC G06T 5/77 (2024.01) [G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/60 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] | 36 Claims |

|
1. A digital image frame editing method comprising, by a processor:
receiving a digital image frame;
processing the digital image frame to define a region of interest (ROI) that contains original text;
processing the ROI through a multi-task machine learning model to predict, in parallel processes:
a foreground image of the ROI, wherein the foreground image comprises the original text,
a background image of the ROI, wherein the background image omits the original text, and
a binary mask that distinguishes foreground image pixels from background image pixels in the ROI;
receiving a target mask that contains replacement text; and
applying the target mask to blend the background image with the foreground image and yield a modified digital image that includes the replacement text and omits the original text,
wherein the multi-task machine learning model comprises:
a single deep neural encoder that receives the ROI, and
separate deep neural decoders for predicting each of the foreground image, the background image and the binary mask; and
wherein the method further comprises, before applying the target mask:
using the binary mask to extract an average background signal of the ROI, and
using the average background signal to modify the background image produced by the decoder that predicted the background image, wherein using the average background signal to modify the background image comprises:
generating a residual signal as a difference between the average background signal extracted by the binary mask and the average background signal of the predicted background image, and
modifying the predicted background image by adding the residual signal to substantially every pixel in the predicted background image.
|
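
As an illustration of the background-correction and blending steps recited in claim 1, the following is a minimal sketch, not the patented implementation. It assumes single-channel images stored as nested lists, a binary mask in which 1 marks text (foreground) pixels and 0 marks background pixels, and a target mask in which 1 marks replacement-text pixels. The function names `correct_background` and `blend` are hypothetical. The way `blend` derives the replacement-text intensity (the mean of the foreground image over the original text pixels) is an assumption, since the claim does not specify how the foreground image enters the blend. The fallbacks for empty masks are likewise assumptions.

```python
def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def correct_background(roi, predicted_bg, binary_mask):
    """Shift predicted_bg so its mean matches the ROI's observed background mean.

    Assumed convention: binary_mask[y][x] == 0 marks a background pixel.
    """
    # Average background signal of the original ROI, extracted via the mask.
    roi_bg_pixels = [
        p
        for row, mask_row in zip(roi, binary_mask)
        for p, m in zip(row, mask_row)
        if m == 0
    ]
    if not roi_bg_pixels:
        # Assumption: with no background pixels there is nothing to correct.
        return [row[:] for row in predicted_bg]

    avg_roi_bg = mean(roi_bg_pixels)
    # Average signal of the decoder's predicted background image.
    avg_pred_bg = mean(p for row in predicted_bg for p in row)

    # Residual = observed background average minus predicted background average.
    residual = avg_roi_bg - avg_pred_bg

    # Add the residual to every pixel of the predicted background.
    return [[p + residual for p in row] for row in predicted_bg]


def blend(background, foreground, binary_mask, target_mask):
    """Render replacement text onto the background (hypothetical blend rule).

    Assumption: text intensity is the mean of the foreground image over the
    original text pixels (binary_mask == 1).
    """
    text_pixels = [
        p
        for row, mask_row in zip(foreground, binary_mask)
        for p, m in zip(row, mask_row)
        if m == 1
    ]
    if not text_pixels:
        # Assumption: without a text sample, return the background unchanged.
        return [row[:] for row in background]
    text_level = mean(text_pixels)

    # Per pixel: target mask selects text intensity, elsewhere the background.
    return [
        [t * text_level + (1 - t) * b for b, t in zip(bg_row, t_row)]
        for bg_row, t_row in zip(background, target_mask)
    ]


# Toy 2x2 example: one dark text pixel on a bright background.
roi = [[100, 20], [100, 100]]
binary_mask = [[0, 1], [0, 0]]         # 1 = original text pixel
predicted_bg = [[90, 90], [90, 90]]    # decoder output, slightly too dark
foreground = [[0, 20], [0, 0]]         # decoder output containing the text
target_mask = [[1, 0], [0, 0]]         # replacement text goes top-left

corrected_bg = correct_background(roi, predicted_bg, binary_mask)
modified = blend(corrected_bg, foreground, binary_mask, target_mask)
```

In this toy example the predicted background is uniformly 10 levels darker than the observed background, so the residual lifts every predicted pixel by 10 before the replacement text is blended in. A multi-channel version would apply the same residual computation per color channel.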