US 12,406,438 B2
	Indoor scene virtual roaming method based on reflection decomposition
Weiwei Xu, Hangzhou (CN); Jiamin Xu, Hangzhou (CN); Xiuchao Wu, Hangzhou (CN); Zihan Zhu, Hangzhou (CN); and Hujun Bao, Hangzhou (CN)
Assigned to ZHEJIANG UNIVERSITY, Hangzhou (CN)
Filed by ZHEJIANG UNIVERSITY, Zhejiang (CN)
Filed on Oct. 20, 2023, as Appl. No. 18/490,790.
Application 18/490,790 is a continuation of application No. PCT/CN2021/088788, filed on Apr. 21, 2021.
Prior Publication US 2024/0169674 A1, May 23, 2024
Int. Cl. G06V 10/56 (2022.01); G06N 3/04 (2023.01); G06T 5/92 (2024.01); G06T 7/13 (2017.01); G06T 7/40 (2017.01); G06T 7/80 (2017.01); G06T 15/04 (2011.01); G06T 15/50 (2011.01); G06T 17/20 (2006.01)

CPC G06T 17/205 (2013.01) [G06T 5/92 (2024.01); G06T 7/13 (2017.01); G06T 7/40 (2013.01); G06T 7/80 (2017.01); G06V 10/56 (2022.01)]

6 Claims

1. An indoor scene virtual roaming method based on reflection decomposition, comprising:

step S1, capturing enough pictures to cover a target indoor scene, carrying out three-dimensional reconstruction of the target indoor scene based on the captured pictures, and obtaining the internal and external camera parameters and a global triangular mesh model of the target indoor scene;

step S2, for each picture, projecting the global triangular mesh model into a corresponding depth map, aligning depth edges to color edges, converting the aligned depth map into a triangular mesh, and performing mesh simplification on the triangular mesh;
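Step S2 turns each per-picture depth map into a triangular mesh. A minimal sketch of that conversion, assuming a pinhole intrinsic matrix `K`, a depth map in which 0 marks missing depth, and a hypothetical relative depth-jump threshold that keeps triangles from bridging depth discontinuities. The name `depth_map_to_mesh` is illustrative; depth-to-color edge alignment and mesh simplification are not shown.

```python
import numpy as np


def depth_map_to_mesh(depth, K, max_rel_jump=0.05):
    """Back-project a depth map into a camera-space triangle mesh.

    depth: (H, W) array of depths; 0 marks pixels without depth.
    K: 3x3 pinhole intrinsic matrix (assumed camera model).
    max_rel_jump: hypothetical threshold; a triangle is dropped when its corner
        depths differ by more than this fraction of the smallest corner depth.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(float)
    rays = pixels @ np.linalg.inv(K).T          # one ray per pixel, with z = 1
    flat_depth = depth.reshape(-1).astype(float)
    vertices = rays * flat_depth[:, None]        # scale each ray by its depth
    idx = np.arange(h * w).reshape(h, w)
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            a, b = idx[y, x], idx[y, x + 1]
            c, d = idx[y + 1, x], idx[y + 1, x + 1]
            for tri in ((a, c, b), (b, c, d)):   # two triangles per 2x2 pixel quad
                z = flat_depth[list(tri)]
                if np.all(z > 0) and z.max() - z.min() <= max_rel_jump * z.min():
                    faces.append(tri)
    return vertices, np.array(faces, dtype=int).reshape(-1, 3)
```

Each 2×2 pixel quad yields at most two triangles, so the output is dense; the claimed method then simplifies this mesh.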

step S3, detecting a plane in the global triangular mesh model, and detecting whether the plane is a reflection plane by means of color consistency between adjacent images; when the plane is a reflection plane, constructing a double-layer expression on the reflection area for each picture in which the reflection plane is visible, to correctly render the reflection effect of the object surface; wherein the double-layer expression comprises two layers of triangular meshes (foreground and background) and two decomposed pictures (foreground and background), wherein the foreground triangular mesh expresses the object surface geometry, and the background triangular mesh expresses the mirror image of the scene geometry about the reflection plane; the foreground picture expresses the object surface textures after removal of the reflection components, and the background picture expresses the reflection components of the scene on the object surface; step S3 comprises the following sub-steps:
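The double-layer expression is, in effect, a per-picture data structure. A minimal sketch of a container for it, assuming NumPy arrays for vertices, faces, and linear-space images; the class name `DoubleLayerView` and its field names are hypothetical. The `recombine` method reflects the additive model implied by the initialization I_k¹(u) = I_k(u) − I_k⁰(u).

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DoubleLayerView:
    """Hypothetical container for one picture's double-layer expression."""

    fg_vertices: np.ndarray  # (V0, 3) foreground mesh: object surface geometry
    fg_faces: np.ndarray     # (F0, 3) vertex indices
    bg_vertices: np.ndarray  # (V1, 3) background mesh: mirrored scene geometry
    bg_faces: np.ndarray     # (F1, 3) vertex indices
    fg_image: np.ndarray     # (H, W, 3) linear-space texture without reflections
    bg_image: np.ndarray     # (H, W, 3) linear-space reflection component

    def recombine(self) -> np.ndarray:
        """In linear space, the observed color is modeled as the sum of both layers."""
        return self.fg_image + self.bg_image
```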

sub-step S31, detecting planes in the global triangular mesh model, retaining planes with an area larger than an area threshold, projecting the planes onto the pictures in which they are visible, and recording the set of pictures in which a plane is visible as 𝒱; for each picture I_k in 𝒱, calculating a set 𝒩_k of K neighboring pictures thereof, wherein the K neighbors are selected by ordering the pictures according to the overlap rate of the vertices of the global triangular mesh model after reflection about the plane;
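The neighbor selection can be sketched as a ranking over vertex-visibility sets. This assumes each picture has already been tested against the global mesh mirrored about the plane, and that the overlap rate is |V_k ∩ V_j| / |V_k|; the claim does not give the exact formula, and `k_nearest_by_overlap` is an illustrative name.

```python
def k_nearest_by_overlap(visible_vertices, k, num_neighbors):
    """Rank the other pictures by how much of picture k's mirrored-vertex set they also see.

    visible_vertices: dict mapping a picture id to the set of ids of vertices of the
        mirrored global mesh that are visible in that picture.
    The overlap rate |V_k & V_j| / |V_k| is an assumed definition.
    """
    ref = visible_vertices[k]
    if not ref:
        return []
    scored = [
        (len(ref & verts) / len(ref), j)
        for j, verts in visible_vertices.items()
        if j != k
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest overlap first, ties broken by id
    return [j for _, j in scored[:num_neighbors]]
```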

constructing a matching cost volume using 𝒩_k and determining whether the plane has enough reflection components in the picture I_k, wherein the determining method is as follows: for each pixel, after mirroring the global triangular mesh model according to the plane equation, finding the cost corresponding to the mirrored depth value in the matching cost volume, and determining whether that cost position is a local minimum; when the number of pixels whose cost is at a local minimum in the picture is greater than a pixel number threshold, determining that the plane has reflection components in the picture; when the number of visible pictures in which a plane has reflection components is greater than a picture number threshold, determining the plane to be a reflection plane;
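The local-minimum test can be sketched with NumPy as follows. It assumes a cost volume of shape (D, H, W) over ascending depth hypotheses, a mirrored depth map in which 0 marks uncovered pixels, and a nearest-hypothesis lookup; how the matching cost itself is computed from 𝒩_k is not shown, and both function names are hypothetical.

```python
import numpy as np


def has_reflection(cost_volume, depth_hypotheses, mirrored_depth, pixel_threshold):
    """Decide whether a plane shows reflection components in one picture.

    cost_volume: (D, H, W) matching costs; depth_hypotheses: (D,) ascending depths.
    mirrored_depth: (H, W) depth of the mirrored mesh; 0 means not covered.
    """
    num_depths = len(depth_hypotheses)
    valid = mirrored_depth > 0
    # Nearest depth hypothesis for each pixel's mirrored depth.
    idx = np.abs(depth_hypotheses[:, None, None] - mirrored_depth[None]).argmin(axis=0)
    # A local minimum needs a neighbor on both sides along the depth axis.
    interior = valid & (idx > 0) & (idx < num_depths - 1)
    ys, xs = np.nonzero(interior)
    i = idx[ys, xs]
    c = cost_volume[i, ys, xs]
    is_min = (c < cost_volume[i - 1, ys, xs]) & (c < cost_volume[i + 1, ys, xs])
    return int(is_min.sum()) > pixel_threshold


def is_reflection_plane(per_picture_flags, picture_threshold):
    """A plane is a reflection plane when enough visible pictures show reflections."""
    return sum(per_picture_flags) > picture_threshold
```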

sub-step S32, for each reflection plane, calculating its two-dimensional reflection area β_k on each visible picture, wherein sub-step S32 comprises: projecting the reflection plane onto the visible picture to obtain a projected depth map, dilating the projected depth map, and comparing the dilated projected depth map with the aligned depth map to obtain an accurate two-dimensional reflection area; screening each pixel that has a depth value in the projected depth map by the three-dimensional point distance and the included angle between normals, and taking the screened pixel area as the two-dimensional reflection area β_k of the reflection plane on the picture;
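A NumPy sketch of the reflection-area screening follows. It assumes camera-space ray directions with z = 1 for every pixel, a unit plane normal n and offset d (n · x + d = 0) in camera coordinates, per-pixel normals derived from the aligned depth map, a square-window dilation, and hypothetical thresholds `max_dist` and `max_angle_deg`. Recomputing the plane depth inside the dilated region by ray-plane intersection is one possible reading of comparing the dilated depth map with the aligned one, not a detail stated in the claim.

```python
import numpy as np


def dilate(mask, r):
    """Binary dilation of a boolean mask with a (2r+1) x (2r+1) square window."""
    h, w = mask.shape
    padded = np.pad(mask, r)  # pads with False
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def reflection_area(plane_mask, rays, aligned_depth, normals, n, d,
                    radius=2, max_dist=0.05, max_angle_deg=10.0):
    """Return a boolean (H, W) mask of the reflection area of one plane in one picture.

    plane_mask: (H, W) pixels covered by the projected plane.
    rays: (H, W, 3) camera-space ray directions with z = 1.
    aligned_depth: (H, W) aligned depth map; normals: (H, W, 3) unit normals.
    """
    candidate = dilate(plane_mask, radius) & (aligned_depth > 0)
    denom = rays @ n
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(np.abs(denom) > 1e-9, -d / denom, 0.0)  # ray-plane intersection
    plane_pts = rays * t[..., None]
    scene_pts = rays * aligned_depth[..., None]
    close = np.linalg.norm(plane_pts - scene_pts, axis=-1) < max_dist
    aligned = np.abs(normals @ n) > np.cos(np.deg2rad(max_angle_deg))
    return candidate & close & aligned
```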

sub-step S33, constructing the double-layer expression on the reflection area for each picture in which the reflection plane is visible, wherein sub-step S33 comprises: taking the projected depth map as an initial foreground depth map, mirroring the internal and external camera parameters of the picture into a virtual camera according to the plane equation, rendering an initial background depth map in the virtual camera by using the global triangular mesh model, and converting the initial foreground depth map and the initial background depth map into two simplified layers of triangular meshes, M_k⁰ and M_k¹;
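Mirroring the camera about the reflection plane amounts to composing its extrinsic matrix with a reflection of world space. A minimal sketch, assuming 4×4 world-to-camera extrinsics, a plane n · x + d = 0 in world coordinates, and unchanged intrinsics; the function names are hypothetical.

```python
import numpy as np


def reflection_matrix(n, d):
    """4x4 matrix mirroring world points about the plane n . x + d = 0.

    Implements x' = x - 2 (n . x + d) n after normalizing n.
    """
    n = np.asarray(n, dtype=float)
    scale = np.linalg.norm(n)
    n, d = n / scale, d / scale
    m = np.eye(4)
    m[:3, :3] -= 2.0 * np.outer(n, n)
    m[:3, 3] = -2.0 * d * n
    return m


def mirrored_extrinsic(world_to_cam, n, d):
    """Virtual camera that sees the real scene as the real camera sees its mirror image."""
    return world_to_cam @ reflection_matrix(n, d)
```

Because the rotation part of the mirrored extrinsic has determinant −1, triangle winding flips in the virtual camera, so a rasterizer rendering the initial background depth map may need face culling disabled or reversed.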

calculating the two layers of foreground and background pictures, I_k⁰ and I_k¹, by an iterative optimization algorithm, and further optimizing M_k⁰ and M_k¹, wherein all related original pictures are subjected to inverse gamma correction in advance for the subsequent decomposition;

an optimization objective is to minimize the following energy function:

where (R, T)_k¹ in the optimization objective represents a rigid body transformation of the triangular mesh of the reflection layer, whose initial values are the identity matrix and 0, respectively; the optimization of M_k⁰ and M_k¹ only adjusts the three-dimensional positions of the mesh vertices without changing the topological structures; E_d, E_s and E_p are a data term, a smoothing term and a prior term, respectively; λ_s and λ_p are the weights of the respective terms; and u represents a pixel in I_k^0,1; the following relations are satisfied:

where H is a Laplacian matrix; the function ω⁻¹ returns two-dimensional coordinates, projecting the point u in an image I_k′ to an image I_k according to the depth value and the internal and external camera parameters; D_k^0,1 represents the depth map obtained by projecting M_k^0,1; and v represents a vertex in M_k^0,1;
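The claim defines ω⁻¹ only in words. Under a standard pinhole model, a pixel with known depth is back-projected into 3D, moved into the other camera's frame, and reprojected; choosing which view acts as source and which as destination gives ω⁻¹ in the claimed direction. This sketch assumes 3×3 intrinsics and 4×4 world-to-camera extrinsics, and `warp_pixel` is a hypothetical name.

```python
import numpy as np


def warp_pixel(u, depth, K_src, E_src, K_dst, E_dst):
    """Map pixel u = (x, y) with a known depth in the source view to 2D coordinates in the destination view.

    K_*: 3x3 intrinsics; E_*: 4x4 world-to-camera extrinsics (assumed conventions).
    """
    x, y = u
    p_cam = depth * (np.linalg.inv(K_src) @ np.array([x, y, 1.0]))  # back-project
    p_world = np.linalg.inv(E_src) @ np.append(p_cam, 1.0)          # source camera -> world
    q_cam = (E_dst @ p_world)[:3]                                   # world -> destination camera
    q = K_dst @ q_cam                                               # perspective projection
    return q[:2] / q[2]
```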

in order to minimize the energy function, alternating optimization is used: in each round of optimization, (R, T)_k¹ and M_k^0,1 are first fixed and I_k^0,1 is optimized, wherein the initial value of I_k^0,1 is calculated by the following formulas:

I_k⁰(u) = min({I_k′(ω⁻¹(u, D_k⁰)) | k′ ∈ 𝒩_k})

I_k¹(u)=I_k(u)−I_k⁰(u)
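The initialization can be sketched as a per-pixel minimum followed by a residual. This assumes the neighbor pictures I_k′, k′ ∈ 𝒩_k, have already been resampled at ω⁻¹(u, D_k⁰) and stacked into one array, all in linear (inverse-gamma-corrected) space. The rationale in the comment, that reflections add view-dependent light so the darkest observation is the least reflective, is an interpretation rather than a statement of the claim; `init_layers` is a hypothetical name.

```python
import numpy as np


def init_layers(image_k, warped_neighbors):
    """Initial foreground/background layers for picture k.

    image_k: (H, W, 3) linear-space picture I_k.
    warped_neighbors: (N, H, W, 3) neighbors I_k' sampled at w^-1(u, D_k^0).
    """
    # Reflections vary across views while the foreground does not, so the
    # per-pixel minimum over neighbors approximates the reflection-free layer.
    foreground = warped_neighbors.min(axis=0)
    background = image_k - foreground  # I_k^1(u) = I_k(u) - I_k^0(u)
    return foreground, background
```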

given the initial value, I_k^0,1 is optimized with a nonlinear conjugate gradient method; then I_k^0,1 is fixed, and (R, T)_k¹ and M_k^0,1 are optimized, also with the conjugate gradient method; one such alternation is one round of optimization, and two rounds in total are carried out for the whole optimization process; after the first round of optimization, the resulting I_k^0,1 is denoised using the consistency constraint of the foreground pictures among multiple viewpoints, specifically: given I_k′⁰ and I_k′¹ after the first round of optimization, for k′ ∈ 𝒩_k, the denoised images Ĩ_k⁰ and Ĩ_k¹ are obtained by using the following formula:

a second round of optimization is then carried out using Ĩ_k⁰(u) and Ĩ_k¹(u) as the initial values of I_k^0,1, and a prior term is further added to the total energy equation in the second round of optimization:

where λ_g is the weight of the prior term constraining the second round of optimization;
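The alternation and two-round schedule described above can be summarized as control flow. The energy terms, solvers, and denoising formula are not reproduced; the three callables are hypothetical placeholders that would wrap the actual conjugate gradient solves and the multi-view denoising step.

```python
from typing import Any, Callable


def two_round_decomposition(
    state: Any,
    optimize_layers: Callable[[Any, bool], Any],
    optimize_geometry: Callable[[Any, bool], Any],
    denoise_layers: Callable[[Any], Any],
) -> Any:
    """Schedule of the alternating optimization; every callable is a hypothetical placeholder."""
    for round_idx in range(2):
        use_prior = round_idx == 1  # the lambda_g prior term is only added in round 2
        # Fix (R, T)_k^1 and M_k^0,1; optimize I_k^0,1 (nonlinear conjugate gradient).
        state = optimize_layers(state, use_prior)
        # Fix I_k^0,1; optimize (R, T)_k^1 and M_k^0,1 (conjugate gradient).
        state = optimize_geometry(state, use_prior)
        if round_idx == 0:
            # Multi-view consistency denoising; the denoised layers seed round 2.
            state = denoise_layers(state)
    return state
```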

after the two rounds of optimization, the final two layers of simplified triangular meshes M_k^0,1 and the decomposed pictures I_k^0,1 are obtained, with M_k¹ transformed by (R, T)_k¹, for correctly rendering the reflection effect of the object surface; and
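The final conversion of M_k¹ applies the optimized rigid body transformation to every vertex. A minimal NumPy sketch, assuming (R, T)_k¹ is stored as a 3×3 rotation `R` and a translation vector `T`; with the initial values (identity, 0) the mesh is unchanged. The name `apply_rigid` is hypothetical.

```python
import numpy as np


def apply_rigid(vertices, R, T):
    """Transform (V, 3) vertices by v' = R v + T; the face topology is untouched."""
    return vertices @ R.T + T
```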

step S4, given a virtual viewpoint, rendering the picture of the virtual viewpoint by using neighboring pictures and triangular meshes, and for the reflection area, rendering by using the foreground and background pictures and the foreground and background triangular meshes, specifically: rendering the reflection area β_k of each neighboring picture into the current virtual viewpoint to obtain the reflection area β_n of the current virtual viewpoint, wherein for pixels in the reflection area, rendering is performed by using the two layers of foreground and background pictures and simplified triangular meshes, and depth map calculation and color blending are performed for the two layers of images separately; since the two layers of pictures I_k⁰ and I_k¹ are obtained by decomposition after inverse gamma correction, the two blended layer images are added in the rendering stage, and gamma correction is applied once to obtain correct pictures with the reflection effect.
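The reflection-area compositing of step S4 can be sketched as follows. It assumes a simple power-law gamma of 2.2 (the claim does not name the transfer curve), per-pixel normalized blending weights for the warped neighbor layers (the claim does not specify the weighting scheme), and hypothetical function names. The point it illustrates matches the claim: each layer is blended in linear space, the two layers are added, and gamma correction is applied once. `inverse_gamma` is the matching preprocessing applied to the original pictures before decomposition.

```python
import numpy as np

GAMMA = 2.2  # assumed power-law gamma; the claim does not specify the curve


def inverse_gamma(img):
    """Display space -> linear space (applied before decomposition)."""
    return np.power(np.clip(img, 0.0, 1.0), GAMMA)


def gamma(img):
    """Linear space -> display space (applied once after compositing)."""
    return np.power(np.clip(img, 0.0, 1.0), 1.0 / GAMMA)


def blend(layers, weights):
    """Weighted blend of N warped neighbor images: layers (N, H, W, 3), weights (N, H, W)."""
    w = weights / np.maximum(weights.sum(axis=0, keepdims=True), 1e-8)
    return (layers * w[..., None]).sum(axis=0)


def compose_reflection_area(fg_layers, fg_weights, bg_layers, bg_weights):
    fg = blend(fg_layers, fg_weights)  # foreground layer, linear space
    bg = blend(bg_layers, bg_weights)  # reflection layer, linear space
    return gamma(fg + bg)              # add in linear space, then gamma-correct once
```

Adding the layers after gamma correction instead would give a different, incorrect result, because the power curve is not additive.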