US 12,217,443 B2
Depth image generation method, apparatus, and storage medium and electronic device
Runze Zhang, Shenzhen (CN); Hongwei Yi, Shenzhen (CN); Ying Chen, Shenzhen (CN); Shang Xu, Shenzhen (CN); and Yu Wing Tai, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD, Shenzhen (CN)
Filed by TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD, Guangdong (CN)
Filed on Apr. 6, 2022, as Appl. No. 17/714,654.
Application 17/714,654 is a continuation of application No. PCT/CN2020/127891, filed on Nov. 10, 2020.
Claims priority of application No. 202010119713.5 (CN), filed on Feb. 26, 2020.
Prior Publication US 2022/0230338 A1, Jul. 21, 2022
Int. Cl. G06T 7/50 (2017.01); G06F 18/25 (2023.01); G06T 7/55 (2017.01)
CPC G06T 7/55 (2017.01) [G06F 18/253 (2023.01); G06T 2207/10028 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A depth image generation method, performed by a computer device, comprising:
acquiring a plurality of target images;
performing multi-stage convolution processing on the plurality of target images through a plurality of convolutional layers in a convolution model to obtain feature map sets respectively outputted by the plurality of convolutional layers, each feature map set comprising feature maps corresponding to the plurality of target images;
performing view aggregation on a plurality of feature maps in each feature map set respectively to obtain an aggregated feature corresponding to each feature map set; and
performing fusion processing on the plurality of obtained aggregated features to obtain a depth image,
wherein the performing view aggregation on a plurality of feature maps in each feature map set respectively to obtain an aggregated feature corresponding to each feature map set comprises:
regarding any one of the plurality of target images as a reference image, and regarding each other target image in the plurality of target images as a first image;
performing the following processing on each feature map set:
determining, in the feature map set, a reference feature map corresponding to the reference image and a first feature map corresponding to the first image;
performing, according to a difference between photographing views of the first image and the reference image, view conversion on the first feature map to obtain a second feature map; and
performing fusion processing on the reference feature map and the second feature map to obtain the aggregated feature.
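The claim recites no source code. The following is a minimal PyTorch-style sketch (Python and PyTorch are assumptions; any CNN framework would serve) illustrating the claimed flow: multi-stage convolution producing a feature map set per layer, per-stage view aggregation of a reference feature map with converted first feature maps, and fusion of the aggregated features into a depth image. The class name DepthNet, the per-stage sampling grids that stand in for the homography-based view conversion, and the element-wise mean fusion are all hypothetical illustrations, not the patented implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DepthNet(nn.Module):
        """Hypothetical sketch of claim 1: multi-stage convolution,
        per-stage view aggregation, and fusion into a depth image."""
        def __init__(self, in_ch=3, ch=16, num_stages=3):
            super().__init__()
            # Multi-stage convolution: each layer outputs one feature map set
            # (one feature map per target image) at half the input resolution.
            self.stages = nn.ModuleList()
            for i in range(num_stages):
                self.stages.append(nn.Sequential(
                    nn.Conv2d(in_ch if i == 0 else ch, ch, 3,
                              stride=2, padding=1),
                    nn.ReLU(inplace=True)))
            # Fuses concatenated aggregated features into 1-channel depth.
            self.head = nn.Conv2d(ch * num_stages, 1, 3, padding=1)

        def aggregate(self, feats, grids):
            # feats[0] is the reference feature map; feats[1:] are the first
            # feature maps. Each grid encodes the photographing-view
            # difference as a sampling field (a stand-in for the
            # homography-based view conversion).
            ref = feats[0]
            second = [F.grid_sample(f, g, align_corners=False)
                      for f, g in zip(feats[1:], grids)]
            # Fusion of the reference and converted (second) feature maps;
            # element-wise mean is an illustrative choice.
            return torch.stack([ref] + second, dim=0).mean(dim=0)

        def forward(self, images, grids_per_stage):
            # images: list of target images; images[0] is the reference.
            feats, aggregated = images, []
            for stage, grids in zip(self.stages, grids_per_stage):
                feats = [stage(f) for f in feats]   # feature map set
                aggregated.append(self.aggregate(feats, grids))
            # Fusion processing over all aggregated features: upsample to
            # the finest aggregated resolution, concatenate, regress depth.
            h, w = aggregated[0].shape[-2:]
            fused = torch.cat([F.interpolate(a, (h, w), mode='bilinear',
                                             align_corners=False)
                               for a in aggregated], dim=1)
            return self.head(fused)

    # Usage (identity grids stand in for real view-difference warps):
    imgs = [torch.randn(1, 3, 64, 64) for _ in range(3)]  # 3 target images
    net = DepthNet()
    grids_per_stage, (h, w) = [], (64, 64)
    for _ in range(3):
        h, w = h // 2, w // 2
        theta = torch.eye(2, 3).unsqueeze(0)   # identity warp (assumption)
        g = F.affine_grid(theta, (1, 16, h, w), align_corners=False)
        grids_per_stage.append([g, g])         # one grid per non-reference view
    depth = net(imgs, grids_per_stage)         # -> shape (1, 1, 8, 8)

In a practical multi-view system the sampling grid would be computed from the relative camera pose and a depth hypothesis (a plane-induced homography), and the fusion step is often variance- or cost-volume-based; the mean used here merely keeps the sketch short.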