US 12,437,428 B2
Method for training an image depth recognition model, method for recognizing image depth, and electronic device
Chieh Lee, New Taipei (TW); and Chin-Pin Kuo, New Taipei (TW)
Assigned to HON HAI PRECISION INDUSTRY CO., LTD., New Taipei (TW)
Filed by HON HAI PRECISION INDUSTRY CO., LTD., New Taipei (TW)
Filed on Mar. 23, 2023, as Appl. No. 18/125,675.
Claims priority of application No. 202210785650.6 (CN), filed on Jul. 4, 2022.
Prior Publication US 2024/0005535 A1, Jan. 4, 2024
Int. Cl. G06K 9/00 (2022.01); G06T 7/11 (2017.01); G06T 7/50 (2017.01); G06T 7/70 (2017.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01); G06V 10/74 (2022.01); G06V 10/764 (2022.01); G06V 10/771 (2022.01)

CPC G06T 7/50 (2017.01) [G06T 7/11 (2017.01); G06T 7/70 (2017.01); G06V 10/25 (2022.01); G06V 10/44 (2022.01); G06V 10/761 (2022.01); G06V 10/764 (2022.01); G06V 10/771 (2022.01); G06T 2207/10028 (2013.01); G06T 2207/20081 (2013.01)]

19 Claims

1. A method for training an image depth recognition model by using an electronic device, the method comprising:

obtaining a first image and a second image;

obtaining a first static object, a plurality of first dynamic objects and a first dynamic position of each first dynamic object by performing an instance segmentation on the first image, and obtaining a second static object and a plurality of second dynamic objects by performing an instance segmentation on the second image, through an instance segmentation model;

selecting a plurality of target dynamic objects from the plurality of first dynamic objects based on a number of pixel points in each first dynamic object and preset positions, and selecting a plurality of feature dynamic objects from the plurality of second dynamic objects based on the number of pixel points in each second dynamic object and the preset positions;

recognizing whether each target dynamic object has a corresponding feature dynamic object and determining the target dynamic object and corresponding feature dynamic object as recognition objects;

recognizing an object state of the target dynamic object in the recognition objects according to a dynamic posture matrix corresponding to the recognition objects, a static posture matrix corresponding to the first static object, a static posture matrix corresponding to the second static object, and a preset threshold matrix;

generating a target image according to the object state, the first dynamic position and the first image, and generating a target projection image according to the object state, the first dynamic position and the target image;

obtaining an image depth recognition model by training a preset depth recognition network, based on a gradient error between an initial depth image corresponding to the first image and the target image and a photometric error between the target projection image and the target image.
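
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the claim does not fix how the pixel count and preset positions are combined, how the posture matrices are compared with the threshold matrix, or the exact form of the two errors. The sketch assumes that (a) an object is selected when its mask has at least a minimum pixel count and overlaps a preset region, (b) an object is treated as moving when any element of its dynamic posture matrix differs from either static posture matrix by more than the corresponding threshold element, (c) the target image is the first image with moving objects zeroed out, (d) the photometric error is a mean absolute difference, and (e) the gradient error is an edge-aware smoothness term. All function and parameter names below are hypothetical.

```python
import numpy as np


def select_objects(masks, min_pixels, preset_region):
    """Keep masks with at least min_pixels pixels that overlap preset_region.

    Both parameters are hypothetical; the claim only names the pixel count
    and "preset positions" as selection criteria.
    """
    return [m for m in masks
            if m.sum() >= min_pixels and np.any(m & preset_region)]


def is_moving(dynamic_pose, static_pose_1, static_pose_2, threshold):
    """Assumed rule: the object is moving if any element of its pose differs
    from either static pose by more than the matching threshold element."""
    diff_1 = np.abs(dynamic_pose - static_pose_1)
    diff_2 = np.abs(dynamic_pose - static_pose_2)
    return bool(np.any(np.maximum(diff_1, diff_2) > threshold))


def mask_objects(image, moving_masks):
    """Assumed target-image rule: zero out the pixels of moving objects."""
    target = image.copy()
    for m in moving_masks:
        target[m] = 0.0
    return target


def photometric_error(projected, target):
    """Mean absolute intensity difference (one common choice, assumed here)."""
    return float(np.mean(np.abs(projected - target)))


def gradient_error(depth, image):
    """Edge-aware smoothness (assumed form): depth gradients are penalized
    less where the image itself has strong gradients."""
    dx = np.abs(np.diff(depth, axis=1)) * np.exp(-np.abs(np.diff(image, axis=1)))
    dy = np.abs(np.diff(depth, axis=0)) * np.exp(-np.abs(np.diff(image, axis=0)))
    return float(dx.mean() + dy.mean())
```

Under these assumptions, a training loop would sum `gradient_error` and `photometric_error` (possibly with weights) and back-propagate the result through the preset depth recognition network; that step is omitted here because it depends on the network and framework used.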