US 12,469,213 B2
Method, system and apparatus for generating multi-dimensional scene graph with complex light field
Lu Fang, Beijing (CN); Zequn Chen, Beijing (CN); Haozhe Lin, Beijing (CN); and Jinzhi Zhang, Beijing (CN)
Assigned to TSINGHUA UNIVERSITY, Beijing (CN)
Filed by TSINGHUA UNIVERSITY, Beijing (CN)
Filed on Sep. 7, 2023, as Appl. No. 18/462,563.
Claims priority of application No. 202310227248.0 (CN), filed on Mar. 10, 2023.
Prior Publication US 2024/0303914 A1, Sep. 12, 2024
Int. Cl. G06T 17/00 (2006.01); G06T 15/50 (2011.01); G06V 10/77 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01)
CPC G06T 15/506 (2013.01) [G06V 10/7715 (2022.01); G06V 10/806 (2022.01)] 12 Claims
OG exemplary drawing
 
1. A method for generating a multi-dimensional scene graph with a complex light field, comprising:
obtaining entity features of entities by inputting respective 2-Dimensional (2D) images captured in multiple view directions into an object detection model, and obtaining features of a respective single-view-direction scene graph by predicting, using an object relation prediction model, a semantic relation among the entities contained in the corresponding 2D image captured in each view direction;
determining, based on a multi-view-direction consistency and a feature comparison result of the entity features, an entity correlation for an entity among the multiple view directions, as an entity re-identification result;
establishing a multi-dimensional bounding box for the entity based on the entity re-identification result and a geometric constraint of camera parameters; and
obtaining a feature fusion result by fusing, using a multi-view-direction information fusion algorithm, the features of respective single-view-direction scene graphs in the multiple view directions, and establishing the multi-dimensional scene graph with the complex light field based on the feature fusion result and a multi-dimensional semantic relation among the entities inferred from the multi-dimensional bounding box;
wherein determining, based on the multi-view-direction consistency and the feature comparison result of the entity features, the entity correlation for the entity among the multiple view directions, as the entity re-identification result comprises:
obtaining an internal parameter matrix, represented by K1, of a first camera, obtaining an internal parameter matrix, represented by K2, of a second camera, obtaining external parameters, represented by a rotation R and a translation t, relating a first camera coordinate system to a second camera coordinate system, and obtaining a first image point, represented by P1(u1,v1,1), of a world point P and a second image point, represented by P2(u2,v2,1), of the world point P;
obtaining a polar cone by transforming midpoints of vertical edges of an object bounding box of the first camera into a coordinate system of the second camera, wherein the polar cone is represented by:
[formula not reproduced in the Official Gazette entry; a standard-geometry reconstruction is sketched after the claim]
obtaining a final entity re-identification result by comparing features between the object bounding box in the first camera and an object bounding box in the second camera, wherein the polar cone of the object bounding box in the first camera intersects with the object bounding box in the second camera.
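
The polar-cone formula itself is rendered in the Official Gazette only as a complex work unit and is not reproduced above. Under the standard pinhole two-view model implied by the quantities K1, K2, R, t, P1, and P2 recited in the claim, a plausible reconstruction, an assumption rather than the patent's verbatim formula, maps the midpoint m_i (i = 1, 2) of each vertical edge of the first camera's bounding box, in homogeneous pixel coordinates, to an epipolar line in the second image:

\ell_i = K_2^{-\top} [t]_\times R K_1^{-1} m_i,   i = 1, 2,

where [t]_\times denotes the skew-symmetric cross-product matrix of t. The polar cone is then the region of the second image bounded by \ell_1 and \ell_2, and a true correspondence additionally satisfies the epipolar constraint P_2^\top K_2^{-\top} [t]_\times R K_1^{-1} P_1 = 0.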
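
A minimal numpy sketch of the polar-cone re-identification test in the claim, assuming the reconstruction above; the function names (skew, fundamental, polar_cone, cone_intersects_box, reidentify), the box convention (u_min, v_min, u_max, v_max), and the cosine-similarity feature comparison are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental(K1, K2, R, t):
    """F such that the epipolar line in image 2 of pixel p1 is l = F @ p1."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

def polar_cone(F, box1):
    """Epipolar lines of the midpoints of the vertical edges of box1.

    box1 is (u_min, v_min, u_max, v_max) in image-1 pixels; the two lines
    bound the polar-cone region in image 2.
    """
    u0, v0, u1, v1 = box1
    vm = 0.5 * (v0 + v1)
    left = np.array([u0, vm, 1.0])    # midpoint of left vertical edge
    right = np.array([u1, vm, 1.0])   # midpoint of right vertical edge
    return F @ left, F @ right

def _line_crosses_box(line, corners):
    s = corners @ line                # signed distances (up to scale)
    return s.min() <= 0.0 <= s.max()

def cone_intersects_box(l1, l2, box2):
    """True if the region between epipolar lines l1, l2 meets box2."""
    u0, v0, u1, v1 = box2
    corners = np.array([[u0, v0, 1.0], [u1, v0, 1.0],
                        [u0, v1, 1.0], [u1, v1, 1.0]])
    if _line_crosses_box(l1, corners) or _line_crosses_box(l2, corners):
        return True
    # Box may also sit entirely between the two lines: every corner then has
    # opposite signs with respect to l1 and l2 (lines oriented consistently).
    return bool(np.all((corners @ l1) * (corners @ l2) < 0.0))

def reidentify(F, box1, feat1, boxes2, feats2, sim_thresh=0.8):
    """Match box1 to the image-2 candidate that its polar cone intersects
    and whose appearance feature is most cosine-similar."""
    l1, l2 = polar_cone(F, box1)
    best, best_sim = None, sim_thresh
    for i, (b2, f2) in enumerate(zip(boxes2, feats2)):
        if not cone_intersects_box(l1, l2, b2):
            continue
        sim = float(feat1 @ f2 /
                    (np.linalg.norm(feat1) * np.linalg.norm(f2)))
        if sim > best_sim:
            best, best_sim = i, sim
    return best  # index into boxes2, or None if nothing matches
```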
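
The claim's step of establishing a multi-dimensional bounding box from the re-identification result and the geometric constraint of the camera parameters could, under the same assumptions, be realized by linear (DLT) triangulation. This sketch, again with illustrative names rather than the patent's method, lifts matched 2D box corners to an axis-aligned 3D box:

```python
import numpy as np

def triangulate(Pm1, Pm2, p1, p2):
    """DLT triangulation of one world point from a pixel correspondence.

    Pm1, Pm2 are the 3x4 projection matrices K[R|t] of the two cameras;
    p1, p2 are (u, v) pixels of the same world point in each view.
    """
    A = np.vstack([
        p1[0] * Pm1[2] - Pm1[0],
        p1[1] * Pm1[2] - Pm1[1],
        p2[0] * Pm2[2] - Pm2[0],
        p2[1] * Pm2[2] - Pm2[1],
    ])
    _, _, Vt = np.linalg.svd(A)       # null vector of A is the world point
    X = Vt[-1]
    return X[:3] / X[3]               # inhomogeneous world coordinates

def bbox3d(Pm1, Pm2, corners1, corners2):
    """Axis-aligned 3D box from matched 2D box corners of one entity."""
    pts = np.array([triangulate(Pm1, Pm2, a, b)
                    for a, b in zip(corners1, corners2)])
    return pts.min(axis=0), pts.max(axis=0)
```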
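
Finally, the entry names a multi-view-direction information fusion algorithm without detail. The sketch below assumes two simple stand-ins, mean-pooling of per-view entity features and a majority vote over per-view predicates, to show how re-identified entities and their relations could be merged into one multi-dimensional scene graph:

```python
import numpy as np
from collections import Counter, defaultdict

def fuse_scene_graphs(view_graphs, reid):
    """Fuse per-view scene graphs into one multi-view graph.

    view_graphs: list, one per view direction, of dicts mapping a local
        entity id to (feature_vector, relations), where relations is a list
        of (other_local_entity_id, predicate) pairs within that view.
    reid: dict mapping (view_index, local_entity_id) to a global entity id,
        i.e. the entity re-identification result across views.
    Returns the fused feature per global entity and the majority predicate
    for each global (subject, object) pair.
    """
    feats = defaultdict(list)
    votes = defaultdict(Counter)
    for v, graph in enumerate(view_graphs):
        for eid, (feature, relations) in graph.items():
            g = reid[(v, eid)]
            feats[g].append(np.asarray(feature, dtype=float))
            for other, pred in relations:
                votes[(g, reid[(v, other)])][pred] += 1
    fused_feats = {g: np.mean(fs, axis=0) for g, fs in feats.items()}
    fused_rels = {pair: c.most_common(1)[0][0] for pair, c in votes.items()}
    return fused_feats, fused_rels
```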