| CPC H04N 7/147 (2013.01) [G06F 18/251 (2023.01); G06F 18/253 (2023.01); G06N 3/08 (2013.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 40/107 (2022.01); G06V 40/16 (2022.01); G06V 40/174 (2022.01); H04N 7/155 (2013.01)] | 23 Claims |

|
1. A video image transmission method, comprising:
acquiring a video image captured by a first video communication end;
determining an encoding mode, wherein the encoding mode comprises one of a preset object mode;
recognizing the preset object in the video image to obtain a sub-image of the preset object;
providing the sub-image of the preset object to a trained neural network, wherein the trained neural network comprises an encoder comprising a series of one or more convolution layers and a middle layer to sequentially process the sub-image, and wherein the one or more convolution layers comprise a lower convolution layer whose output is fed to the middle layer;
executing the trained neural network to output a part of feature vectors extracted from the lower convolution layer and a low-dimensional vector from the middle layer, the low-dimensional vector representing semantic information of the preset object in the video image; and
sending, through a communication network, the part of the feature vectors extracted from the lower convolution layer and the low-dimensional vector representing the semantic information to a second video communication end, wherein the semantic information is used by a decoder to reconstruct a reconstruction image of the video image at the second video communication end.
|
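The encoder-side steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `conv2d_relu`, `encode` and `pack`, the layer sizes, the fixed weights, the JSON message layout, and the choice of the first `keep` values as the transmitted "part" of the lower-layer features are all assumptions made for illustration. The claim does not specify how that part is selected or how the data is serialized. Training, object recognition and the decoder at the second video communication end are omitted.

```python
import json
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_relu(image: Matrix, kernel: Matrix, stride: int = 2) -> Matrix:
    """Single-channel valid convolution (cross-correlation) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out: Matrix = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def encode(sub_image: Matrix, kernels: List[Matrix],
           middle_weights: Matrix, keep: int) -> Tuple[List[float], List[float]]:
    """Run the convolution layers in sequence, then the middle layer.

    Returns (part of the lower-layer features, low-dimensional vector).
    """
    feat = sub_image
    for kernel in kernels:
        feat = conv2d_relu(feat, kernel)
    # Output of the last ("lower") convolution layer, flattened.
    lower = [v for row in feat for v in row]
    # Middle layer modeled as a linear projection to a low-dimensional vector.
    low_dim = [sum(w * x for w, x in zip(row, lower)) for row in middle_weights]
    # Assumption: the transmitted "part" is simply the first `keep` values.
    return lower[:keep], low_dim


def pack(partial: List[float], low_dim: List[float]) -> bytes:
    """Serialize both outputs into one message (hypothetical JSON layout)."""
    message = {"partial_features": partial, "semantic_vector": low_dim}
    return json.dumps(message).encode("utf-8")


# Usage: a synthetic 16x16 sub-image of the preset object, two 3x3 convolution
# layers (16x16 -> 7x7 -> 3x3, i.e. 9 features), and a 4-dimensional middle layer.
sub_image = [[(r * 16 + c) / 255.0 for c in range(16)] for r in range(16)]
kernels = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
middle_weights = [[0.01 * (k + 1)] * 9 for k in range(4)]

partial, low_dim = encode(sub_image, kernels, middle_weights, keep=3)
payload = pack(partial, low_dim)  # bytes to send to the second communication end
```

In this sketch the payload carries 3 + 4 values instead of the 256 pixel values of the sub-image, which illustrates the bandwidth motivation behind sending feature vectors and a semantic vector rather than the image itself.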