US 11,756,231 B2
Method and apparatus for scale calibration and optimization of a monocular visual-inertial localization system
Yi Chen, San Jose, CA (US); Ke Huang, San Jose, CA (US); and Wei Xi, San Jose, CA (US)
Assigned to MIDEA GROUP CO., LTD., Foshan (CN)
Filed by Midea Group Co., Ltd., Foshan (CN)
Filed on Jun. 29, 2021, as Appl. No. 17/362,832.
Prior Publication US 2022/0414932 A1, Dec. 29, 2022
Int. Cl. G06T 7/80 (2017.01); H04N 5/247 (2006.01); G01C 21/16 (2006.01); B25J 9/16 (2006.01); B25J 5/00 (2006.01)

CPC G06T 7/80 (2017.01) [B25J 5/007 (2013.01); B25J 9/1692 (2013.01); G01C 21/1656 (2020.08); G06T 2207/10016 (2013.01); G06T 2207/30244 (2013.01)]

17 Claims

1. A method, comprising:

capturing, by a camera disposed on a device moving in an environment, a plurality of image frames recorded in a first coordinate reference frame at respective locations within a portion of the environment in a first time period;

capturing, by an inertial measurement unit disposed on the device, sets of inertial odometry data recorded in a second coordinate reference frame, the sets of inertial odometry data corresponding to the plurality of image frames at the respective locations, in the first time period;

storing in a buffer, a matching pair of an image frame and a set of inertial odometry data that satisfies first criteria;

in accordance with a determination that a threshold number of matching pairs of image frames and inertial odometry data have been stored:

determining a rotational transformation matrix that corresponds to a relative rotation between the first coordinate reference frame and the second coordinate reference frame;

determining a scale factor from the matching pairs of image frames, wherein the rotational transformation matrix defines an orientation of the device, and the scale factor and the rotational transformation matrix calibrate the plurality of image frames captured by the camera, wherein the scale factor is solved as a parameter to a least square problem based on the matching pairs of image frames and inertial odometry data;

determining a translation transformation vector from the matching pairs of image frames, and wherein the rotational transformation matrix and the translation transformation vector define a transformation pose for transforming the first coordinate reference frame into the second coordinate reference frame, wherein the translation transformation vector is referenced to the first coordinate reference frame when a respective image of the matching pairs of image frames is taken to reduce scale ambiguity associated with the respective image,

multiplying the scale factor to the translation transformation vector, and

further multiplying a pose of a body frame of the device referenced to a camera frame when the respective image of the matching pairs of image frames is taken.
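
As a rough illustration of the buffering and scale steps recited in claim 1, the sketch below stores matching pairs, waits for a threshold count, solves for a scale factor by least squares, and multiplies the scale factor into a translation vector. It is not the patented implementation. The claim does not spell out the "first criteria", the threshold number, or the exact form of the least square problem, so everything named here is an assumption made for illustration: the timestamp-gap criterion (MAX_TIME_GAP), the threshold (MIN_PAIRS), the names PairBuffer, MatchedPair, solve_scale and scale_translation, and the particular cost sum ||s * v_i - m_i||^2, where v_i is an up-to-scale visual displacement and m_i is the corresponding metric displacement from inertial odometry. The inertial displacement is also assumed to have already been rotated into the camera frame by the rotational transformation matrix.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

MIN_PAIRS = 3        # hypothetical threshold number of matching pairs
MAX_TIME_GAP = 0.01  # hypothetical "first criteria": timestamps within 10 ms


@dataclass
class MatchedPair:
    visual_disp: Vec3    # camera displacement, known only up to scale
    inertial_disp: Vec3  # metric displacement from inertial odometry,
                         # assumed already rotated into the camera frame


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


class PairBuffer:
    """Holds image/inertial pairs that pass the (assumed) matching criterion."""

    def __init__(self) -> None:
        self.pairs: List[MatchedPair] = []

    def try_add(self, t_image: float, t_imu: float,
                visual_disp: Vec3, inertial_disp: Vec3) -> bool:
        # Reject pairs whose timestamps are too far apart.
        if abs(t_image - t_imu) > MAX_TIME_GAP:
            return False
        self.pairs.append(MatchedPair(visual_disp, inertial_disp))
        return True

    def ready(self) -> bool:
        # True once the threshold number of pairs has been stored.
        return len(self.pairs) >= MIN_PAIRS


def solve_scale(pairs: List[MatchedPair]) -> float:
    """Closed-form minimizer over s of sum ||s * v_i - m_i||^2."""
    num = sum(dot(p.visual_disp, p.inertial_disp) for p in pairs)
    den = sum(dot(p.visual_disp, p.visual_disp) for p in pairs)
    if den == 0.0:
        raise ValueError("all visual displacements are zero; scale is unobservable")
    return num / den


def scale_translation(scale: float, translation: Vec3) -> Vec3:
    """Multiply the scale factor into an up-to-scale translation vector."""
    return tuple(scale * x for x in translation)


if __name__ == "__main__":
    buf = PairBuffer()
    # Synthetic data: the metric displacement is 2.5 times the visual one.
    for i, v in enumerate([(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 1.0)]):
        m = tuple(2.5 * x for x in v)
        buf.try_add(float(i), float(i) + 0.001, v, m)
    if buf.ready():
        s = solve_scale(buf.pairs)
        print("estimated scale:", s)
        print("scaled translation:", scale_translation(s, (0.2, 0.0, -0.4)))
```

The closed form follows from setting the derivative of the cost with respect to s to zero, which gives s = (sum of v_i . m_i) / (sum of v_i . v_i). A production system would more likely solve for the scale jointly with other unknowns and with outlier rejection; the single-parameter version is used here only because it keeps the idea visible.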