| CPC G06T 7/593 (2017.01) [G06V 10/803 (2022.01); H04N 13/106 (2018.05); G06T 2207/20081 (2013.01); G06T 2207/20228 (2013.01)] | 15 Claims |

|
6. An electronic device, comprising:
at least one processor; and
a data storage storing one or more programs which, when executed by the at least one processor, cause the at least one processor to:
acquire multiple sets of binocular images to build a dataset containing instance segmentation labels based on the multiple sets of binocular images;
train an autoencoder network based on the dataset containing instance segmentation labels to obtain a trained autoencoder network;
acquire a monocular image and input the monocular image into the trained autoencoder network to obtain a first disparity map; and
convert the first disparity map to obtain a depth image corresponding to the monocular image;
wherein each set of the multiple sets of binocular images comprises a first image and a second image, and training the autoencoder network based on the dataset containing instance segmentation labels to obtain the trained autoencoder network comprises:
inputting the first image of one set of the multiple sets of binocular images into the autoencoder network to obtain a second disparity map;
processing the second disparity map based on the instance segmentation labels to obtain a third disparity map;
adding the first image with the third disparity map to obtain a predicted image of the second image of the one set of the multiple sets of binocular images;
using a preset mean square error formula to calculate the error between the second image of the one set of the multiple sets of binocular images and the predicted image; and
determining the error as a training loss of the autoencoder network until the training loss converges to obtain the trained autoencoder network.
|
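
The training and conversion steps recited in claim 6 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not specify how the second disparity map is processed with the instance segmentation labels, what the "preset" mean square error formula is, or how the disparity map is converted to depth. The sketch therefore assumes (a) a simple binary instance mask, (b) a literal element-wise reading of "adding the first image with the third disparity map", (c) the ordinary per-pixel mean square error, and (d) the standard stereo relation depth = focal length × baseline / disparity. All function and parameter names (`mask_disparity`, `focal_px`, `baseline_m`, and so on) are hypothetical.

```python
from typing import List

Image = List[List[float]]  # row-major grayscale image or disparity map


def mask_disparity(disparity: Image, instance_mask: List[List[int]]) -> Image:
    """Assumed processing step: keep disparity only inside labeled instances."""
    return [
        [d if m else 0.0 for d, m in zip(d_row, m_row)]
        for d_row, m_row in zip(disparity, instance_mask)
    ]


def predict_second_image(first_image: Image, disparity: Image) -> Image:
    """Literal reading of the claim: element-wise sum of image and disparity."""
    return [
        [p + d for p, d in zip(p_row, d_row)]
        for p_row, d_row in zip(first_image, disparity)
    ]


def mean_square_error(target: Image, predicted: Image) -> float:
    """Ordinary per-pixel mean square error between two equally sized images."""
    total = 0.0
    count = 0
    for t_row, p_row in zip(target, predicted):
        for t, p in zip(t_row, p_row):
            total += (t - p) ** 2
            count += 1
    return total / count


def disparity_to_depth(
    disparity: Image, focal_px: float, baseline_m: float, eps: float = 1e-6
) -> Image:
    """Assumed conversion: depth = focal * baseline / disparity.

    Disparities below eps are clamped to eps to avoid division by zero.
    """
    return [
        [focal_px * baseline_m / max(d, eps) for d in row]
        for row in disparity
    ]


# Toy example with hand-picked values (not data from the patent).
first = [[1.0, 2.0], [3.0, 4.0]]
second = [[1.5, 2.0], [4.0, 7.0]]
second_disparity = [[0.5, 0.5], [1.0, 1.0]]  # stand-in for the network output
labels = [[1, 0], [1, 1]]                    # 1 = pixel belongs to an instance

third_disparity = mask_disparity(second_disparity, labels)
predicted = predict_second_image(first, third_disparity)
loss = mean_square_error(second, predicted)
depth = disparity_to_depth([[2.0, 4.0]], focal_px=100.0, baseline_m=0.5)
```

In a real training loop the loss would be back-propagated through the autoencoder and the steps repeated until the loss converges. The sketch only shows how the quantities named in the claim relate to one another.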