US 12,223,667 B2
Joint visual object detection and object mapping to a 3D model
Volodya Grancharov, Solna (SE); Sigurdur Sverrisson, Kungsängen (SE); Alfredo Fanghella, Stockholm (SE); and Manish Sonal, Sollentuna (SE)
Assigned to TELEFONAKTIEBOLAGET LM ERICSSON (PUBL), Stockholm (SE)
Appl. No. 17/783,136
Filed by Telefonaktiebolaget LM Ericsson (publ), Stockholm (SE)
PCT Filed Dec. 9, 2019, PCT No. PCT/EP2019/084245
§ 371(c)(1), (2) Date Jun. 7, 2022,
PCT Pub. No. WO2021/115557, PCT Pub. Date Jun. 17, 2021.
Prior Publication US 2023/0351621 A1, Nov. 2, 2023
Int. Cl. G06T 7/38 (2017.01)
CPC G06T 7/38 (2017.01) [G06T 2207/10016 (2013.01); G06T 2207/30242 (2013.01)]
16 Claims
 
1. A method for joint visual object detection and object mapping to a 3D model, the method being performed by an image processing device, the method comprising:
obtaining a first sequence of digital images of a scene as captured by a first image capturing unit, and obtaining a second sequence of digital images of the scene as captured by a second image capturing unit,
wherein the second sequence of digital images is time-wise synchronized with the first sequence of digital images by being captured time-wise in parallel with the first sequence of digital images, wherein the first image capturing unit has a narrower field of view than the field of view of the second image capturing unit, and wherein the first image capturing unit and the second image capturing unit have a known spatial relation; and
performing joint visual object detection and object mapping to the 3D model by:
extracting a set of objects from both the first sequence of digital images and the second sequence of digital images by performing visual object detection on both the first sequence of digital images and the second sequence of digital images;
mapping the extracted set of objects to the 3D model in accordance with the second sequence of digital images and the known spatial relation, and thereby registering the scene to the 3D model; and
providing, as a result of how many objects are detected either in the first sequence of digital images or in the second sequence of digital images, an indication to move the image capturing units closer towards, or farther from, the scene.
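The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; every name, threshold, and decision rule below is a hypothetical chosen for illustration. It assumes each detection already carries a 3D position in the coordinate frame of the camera that captured it, and that the known spatial relation between the two image capturing units is given as a rigid transform (a rotation matrix and a translation vector). The claim does not state which object counts trigger which movement, so the rule in `movement_indication` (too few detections in the narrow view suggests moving closer, too few in the wide view suggests moving farther) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class Detection:
    """A detected object (hypothetical structure, not from the patent)."""
    label: str
    position: Vec3  # 3D position in the frame of the capturing camera


def narrow_to_wide(point: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Express a narrow-camera point in the wide-camera frame: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def merge_detections(
    narrow: Sequence[Detection],
    wide: Sequence[Detection],
    rotation: Mat3,
    translation: Vec3,
) -> List[Detection]:
    """Collect detections from both cameras in the wide-camera frame.

    Assumption: the wide view is the one registered to the 3D model, so
    narrow-view detections are first moved into it via the known relation.
    """
    moved = [
        Detection(d.label, narrow_to_wide(d.position, rotation, translation))
        for d in narrow
    ]
    return moved + list(wide)


def movement_indication(
    n_narrow: int,
    n_wide: int,
    min_narrow: int = 1,  # hypothetical threshold
    min_wide: int = 3,    # hypothetical threshold
) -> Optional[str]:
    """Return "closer", "farther", or None (assumed rule, see lead-in)."""
    if n_narrow < min_narrow:
        return "closer"
    if n_wide < min_wide:
        return "farther"
    return None


if __name__ == "__main__":
    identity: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    offset: Vec3 = (0.5, 0.0, 0.0)  # assumed baseline between the cameras
    narrow = [Detection("antenna", (1.0, 2.0, 3.0))]
    wide = [Detection("mast", (0.0, 0.0, 10.0))]
    objects = merge_detections(narrow, wide, identity, offset)
    print(objects)
    print(movement_indication(len(narrow), len(wide)))
```

Working in the wide-camera frame reflects the claim language that mapping is done "in accordance with the second sequence of digital images and the known spatial relation"; the actual registration of that frame to the 3D model is outside the scope of this sketch.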