CPC H04N 19/124 (2014.11) [G06V 10/774 (2022.01); H04N 19/132 (2014.11); H04N 19/91 (2014.11)] | 16 Claims |
1. A method for encoding video for machine vision and human/machine hybrid vision, the method being executed by one or more processors, the method comprising:
receiving, at a hybrid codec, an input including at least one of video or image data, the hybrid codec including a first codec and a second codec, wherein the first codec is a traditional codec designed for human consumption and the second codec is a learning-based codec designed for machine vision;
compressing the input using the first codec, wherein the compressing includes down-sampling the input using a down-sampling module and up-sampling the compressed input using an up-sampling module producing a residual signal;
quantizing the residual signal to obtain a quantized representation of the input;
entropy encoding the quantized representation of the input using one or more convolutional filter modules; and
training one or more networks using the entropy encoded quantized representation,
wherein the up-sampled compressed input is subtracted from the input to generate a second residual signal,
wherein the second residual signal is provided to the learning-based codec,
wherein the output of the second codec is added on top of the up-sampled compressed input to form the reconstructed video for machine vision tasks, and
wherein training the one or more networks using the entropy encoded quantized representation comprises determining a value of an index specifying which of the machine vision tasks is targeted by the training.
|
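The signal path recited in claim 1 (down-sample, compress with the first codec, up-sample, subtract to form a residual, quantize, then add the decoded residual back onto the up-sampled base) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the average-pooling and nearest-neighbour resampling, the coarse uniform quantizer standing in for the traditional first codec, and the plain uniform quantizer standing in for the learning-based second codec. The claimed entropy encoding with convolutional filter modules is not implemented; an empirical-entropy estimate only indicates how many bits per symbol an ideal entropy coder would need. The training step and the task index are omitted.

```python
import numpy as np


def down_sample(x, factor=2):
    """Average-pool a 2-D frame by `factor` (assumed filter; the claim names none)."""
    h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("frame dimensions must be divisible by factor")
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def up_sample(x, factor=2):
    """Nearest-neighbour up-sampling (assumed; the claim names no filter)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def first_codec(x, step=16.0):
    """Hypothetical stand-in for the traditional codec: coarse uniform quantization."""
    return np.round(x / step) * step


def quantize(residual, step=4.0):
    """Uniform quantization of the residual to integer symbols."""
    return np.round(residual / step).astype(np.int32)


def dequantize(symbols, step=4.0):
    """Inverse of `quantize`, up to rounding error of at most step / 2."""
    return symbols.astype(np.float64) * step


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol (a proxy, not an entropy coder)."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def hybrid_round_trip(frame, factor=2, base_step=16.0, res_step=4.0):
    """Run the base layer, form and quantize the residual, then reconstruct."""
    base = first_codec(down_sample(frame, factor), base_step)
    up = up_sample(base, factor)            # up-sampled compressed input
    residual = frame - up                   # input minus up-sampled base
    symbols = quantize(residual, res_step)  # quantized representation
    recon = up + dequantize(symbols, res_step)  # decoded residual added back
    return symbols, up, recon


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.uniform(0.0, 255.0, size=(8, 8))
    symbols, up, recon = hybrid_round_trip(frame)
    print("max reconstruction error:", np.abs(recon - frame).max())
    print("residual entropy (bits/symbol):", empirical_entropy_bits(symbols))
```

In this sketch the reconstruction error is bounded by half the residual quantization step, because the base layer is available unchanged at both encoder and decoder and only the residual is lossy. A learned second codec would replace `quantize`/`dequantize` with trained analysis and synthesis networks.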