US 12,450,859 B2
Model fitting using keypoint regression
Julien Pascal Christophe Valentin, Zurich (CH); Erroll William Wood, Cambridge (GB); Thomas Joseph Cashman, Cambridge (GB); Martin de La Gorce, Cambridge (GB); Tadas Baltrusaitis, Cambridge (GB); Daniel Stephen Wilde, Cambridge (GB); Jingjing Shen, Cambridge (GB); Matthew Alastair Johnson, Cambridge (GB); Charles Thomas Hewitt, Cambridge (GB); Nikola Milosavljevic, Belgrade (RS); Stephan Joachim Garbin, Cambridge (GB); Toby Sharp, Cambridge (GB); and Ivan Stojiljkovic, Cambridge (GB)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Jun. 28, 2022, as Appl. No. 17/852,175.
Claims priority of provisional application 63/317,436, filed on Mar. 7, 2022.
Prior Publication US 2023/0281863 A1, Sep. 7, 2023
Int. Cl. G06V 10/25 (2022.01); G06N 3/08 (2023.01); G06T 7/33 (2017.01); G06T 7/73 (2017.01); G06T 17/00 (2006.01); G06T 19/20 (2011.01); G06V 10/82 (2022.01)

CPC G06V 10/25 (2022.01) [G06N 3/08 (2013.01); G06T 7/344 (2017.01); G06T 7/73 (2017.01); G06T 17/00 (2013.01); G06T 19/20 (2013.01); G06V 10/82 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01); G06T 2219/2004 (2013.01)]

20 Claims

1. A method for predicting keypoints by a computing system, the method comprising:

receiving, by the computing system, data indicative of a plurality of images;

generating, by the computing system, predictions for keypoints of the plurality of images as 2D random variables, normally distributed with location (x, y) and standard deviation sigma;

training, by the computing system, a neural network to maximize a log-likelihood that samples from each of the predicted keypoints equal a ground truth by minimizing a sum of Gaussian negative log likelihoods; wherein the training comprises introducing a conjugate prior of a Gaussian distribution of uncertainty values for the predicted keypoints;

using the trained neural network to predict keypoints of a 3D image without generating a heatmap; and

based on the predicted keypoints of the 3D image, outputting a fitted 3D model for rendering on a display device.
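
For illustration only, the training objective recited in claim 1 can be sketched as follows. Each keypoint is treated as an isotropic 2D Gaussian with predicted location (x, y) and standard deviation sigma, and the loss is the sum of the per-keypoint Gaussian negative log-likelihoods of the ground truth. The claim does not give the form of the conjugate prior, so the inverse-gamma prior on sigma squared used here is an assumption (it is the standard conjugate prior for the variance of a Gaussian with known mean). The function names, array shapes, log-sigma parameterization, and the hyperparameters `alpha`, `beta`, and `prior_weight` are likewise hypothetical and are not taken from the patent.

```python
import numpy as np


def gaussian_nll(pred_xy, log_sigma, gt_xy):
    """Sum of isotropic 2D Gaussian negative log-likelihoods.

    pred_xy:   (N, K, 2) predicted keypoint locations (x, y)
    log_sigma: (N, K)    predicted log standard deviation per keypoint
    gt_xy:     (N, K, 2) ground-truth keypoint locations
    """
    sq_dist = np.sum((pred_xy - gt_xy) ** 2, axis=-1)  # squared error per keypoint
    sigma_sq = np.exp(2.0 * log_sigma)                 # sigma^2, always positive
    # -log N(gt | pred, sigma^2 I) in two dimensions:
    #   2*log(sigma) + ||pred - gt||^2 / (2*sigma^2) + log(2*pi)
    per_keypoint = 2.0 * log_sigma + sq_dist / (2.0 * sigma_sq) + np.log(2.0 * np.pi)
    return float(np.sum(per_keypoint))


def inverse_gamma_neg_log_prior(log_sigma, alpha=1.0, beta=1.0):
    """Negative log of an inverse-gamma prior on sigma^2, constants dropped.

    ASSUMPTION: the inverse-gamma form and the alpha/beta values are
    illustrative choices, not specified by the claim.
    """
    sigma_sq = np.exp(2.0 * log_sigma)
    # -log InvGamma(sigma^2; alpha, beta) = (alpha + 1)*log(sigma^2) + beta/sigma^2 + const
    return float(np.sum((alpha + 1.0) * 2.0 * log_sigma + beta / sigma_sq))


def training_loss(pred_xy, log_sigma, gt_xy, prior_weight=0.0):
    """Gaussian NLL plus an optionally weighted prior term (hypothetical combination)."""
    nll = gaussian_nll(pred_xy, log_sigma, gt_xy)
    return nll + prior_weight * inverse_gamma_neg_log_prior(log_sigma)
```

In this sketch, a network would output `pred_xy` and `log_sigma` directly as regression targets, so no heatmap is produced. Without the prior term, the likelihood term for a keypoint with error distance d is minimized at sigma squared equal to d squared over two, so the predicted sigma tends to track the expected localization error.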