US 12,423,848 B2
Method for generating depth in images, electronic device, and non-transitory storage medium
Jung-Hao Yang, New Taipei (TW); Chih-Te Lu, New Taipei (TW); and Chin-Pin Kuo, New Taipei (TW)
Assigned to HON HAI PRECISION INDUSTRY CO., LTD., New Taipei (TW)
Filed by HON HAI PRECISION INDUSTRY CO., LTD., New Taipei (TW)
Filed on Jan. 13, 2023, as Appl. No. 18/097,080.
Claims priority of application No. 202210570782.7 (CN), filed on May 24, 2022.
Prior Publication US 2023/0386063 A1, Nov. 30, 2023
Int. Cl. G06T 7/593 (2017.01); G06V 10/80 (2022.01); H04N 13/106 (2018.01)

CPC G06T 7/593 (2017.01) [G06V 10/803 (2022.01); H04N 13/106 (2018.05); G06T 2207/20081 (2013.01); G06T 2207/20228 (2013.01)]

15 Claims

6. An electronic device, comprising:

at least one processor; and

a data storage storing one or more programs which, when executed by the at least one processor, cause the at least one processor to:

acquire multiple sets of binocular images to build a dataset containing instance segmentation labels based on the multiple sets of binocular images;

train an autoencoder network based on the dataset containing instance segmentation labels to obtain a trained autoencoder network;

acquire a monocular image and input the monocular image into the trained autoencoder network to obtain a first disparity map; and

convert the first disparity map to obtain a depth image corresponding to the monocular image;

wherein each set of the multiple sets of binocular images comprises a first image and a second image, training the autoencoder network based on the dataset containing instance segmentation labels to obtain the trained autoencoder network comprises:

inputting the first image of one set of the multiple sets of binocular images into the autoencoder network to obtain a second disparity map;

processing the second disparity map based on the instance segmentation labels to obtain a third disparity map;

adding the first image with the third disparity map to obtain a predicted image of the second image of the one set of the multiple sets of binocular images;

using a preset mean square error formula to calculate the error between the second image of the one set of the multiple sets of binocular images and the predicted image; and

determining the error as a training loss of the autoencoder network until the training loss converges to obtain the trained autoencoder network.
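
The training-loss steps and the final disparity-to-depth conversion recited above can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions the claim does not state: the instance segmentation labels are treated as a binary mask that zeroes disparity outside labelled instances, "adding" is read as element-wise addition on grayscale images, and depth is taken from the standard stereo relation depth = focal length x baseline / disparity. All function and parameter names (`apply_instance_mask`, `focal_px`, `baseline_m`, and so on) are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image or disparity map as rows of values


def apply_instance_mask(disparity: Image, mask: List[List[int]]) -> Image:
    """Second -> third disparity map. ASSUMPTION: keep disparity only where
    the instance label is non-zero; the claim does not specify the rule."""
    return [[d if m else 0.0 for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(disparity, mask)]


def predict_second_image(first: Image, disparity: Image) -> Image:
    """Element-wise sum of the first image and the third disparity map."""
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(first, disparity)]


def mean_squared_error(a: Image, b: Image) -> float:
    """Mean of squared per-pixel differences between two same-sized images."""
    diffs = [(x - y) ** 2 for a_row, b_row in zip(a, b)
             for x, y in zip(a_row, b_row)]
    return sum(diffs) / len(diffs)


def training_loss(first: Image, second: Image,
                  second_disparity: Image, mask: List[List[int]]) -> float:
    """Loss for one binocular pair, given the network's disparity output."""
    third_disparity = apply_instance_mask(second_disparity, mask)
    predicted = predict_second_image(first, third_disparity)
    return mean_squared_error(second, predicted)


def disparity_to_depth(disparity: Image, focal_px: float,
                       baseline_m: float, eps: float = 1e-6) -> Image:
    """ASSUMPTION: standard stereo relation depth = f * B / d.
    Near-zero disparities map to infinity instead of dividing by zero."""
    return [[focal_px * baseline_m / d if d > eps else float("inf")
             for d in row] for row in disparity]


# Toy 2x2 example with made-up values.
first_image = [[1.0, 2.0], [3.0, 4.0]]
second_image = [[1.5, 2.5], [4.0, 5.0]]
network_disparity = [[0.5, 0.5], [1.0, 1.0]]  # stands in for the autoencoder output
instance_mask = [[1, 0], [1, 1]]

loss = training_loss(first_image, second_image, network_disparity, instance_mask)
depth = disparity_to_depth([[7.0, 0.0]], focal_px=700.0, baseline_m=0.5)
```

In an actual training loop the disparity map would come from the autoencoder and the loss would be backpropagated until it converges; the sketch only shows how a single loss value is formed from one binocular pair.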