US 12,444,097 B1
2D-3D medical image registration method, device, computer device, and storage medium
Ying Hu, Guangdong (CN); Peijie Jiang, Guangdong (CN); Shaolin Lu, Guangdong (CN); Yefeng Liang, Guangdong (CN); Yuanyuan Yang, Guangdong (CN); and Lihai Zhang, Guangdong (CN)
Assigned to SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES, Shenzhen (CN)
Filed by SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES, Guangdong (CN)
Filed on Dec. 12, 2024, as Appl. No. 18/977,937.
Application 18/977,937 is a continuation of application No. PCT/CN2024/099919, filed on Jun. 18, 2024.
Int. Cl. G06T 11/00 (2006.01); G06T 7/33 (2017.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01)
CPC G06T 11/003 (2013.01) [G06T 7/33 (2017.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06T 2207/10072 (2013.01); G06T 2207/10124 (2013.01)] 6 Claims
OG exemplary drawing
 
1. A 2D-3D medical image registration method, comprising:
obtaining a preoperative CT image and an intraoperative X-ray image of a target bone block; inputting the preoperative CT image and the intraoperative X-ray image into a regression network based on deep learning; and roughly estimating an initial spatial pose of the target bone block through the regression network;
adjusting a projection of the preoperative CT image based on the initial spatial pose to generate a digitally reconstructed radiograph (DRR) image;
inputting the DRR image, the intraoperative X-ray image, and the preoperative CT image into a pre-trained corresponding point relationship estimation network, estimating a feature point corresponding relationship between the DRR image and the X-ray image by using the corresponding point relationship estimation network, and optimizing and updating the initial spatial pose of the target bone block according to the feature point corresponding relationship to obtain an optimized spatial pose;
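The three steps above form a coarse-to-fine pipeline. A minimal sketch follows, assuming a parallel-beam DRR for illustration (clinical DRRs use cone-beam ray casting); the function and argument names (render_drr, regression_net, correspondence_net, refine_pose) are hypothetical stand-ins, not names from the patent.

```python
import numpy as np
from scipy.ndimage import affine_transform

def render_drr(ct_volume, rotation, translation):
    """Resample the CT volume at the given pose and integrate along one
    axis to form a digitally reconstructed radiograph (parallel-beam
    approximation for illustration only)."""
    center = (np.array(ct_volume.shape) - 1) / 2.0
    # affine_transform maps output voxels to input voxels: in = R^T @ out + offset
    offset = center - rotation.T @ (center + translation)
    moved = affine_transform(ct_volume, rotation.T, offset=offset, order=1)
    return moved.sum(axis=0)  # line integrals collapse depth to a 2D image

def register_2d_3d(ct_volume, xray, regression_net, correspondence_net, refine_pose):
    R0, t0 = regression_net(xray, ct_volume)            # step 1: coarse pose
    drr = render_drr(ct_volume, R0, t0)                 # step 2: DRR at coarse pose
    matches = correspondence_net(drr, xray, ct_volume)  # step 3: point correspondences
    return refine_pose(R0, t0, matches)                 # step 3: optimized pose
```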
wherein the regression network comprises a CNN module, a transformer module and a feature fusion module, the CNN module comprises a feature extractor, two multi-layer perceptrons and a singular value decomposition (SVD) module, the feature extractor takes the first six layers of the EfficientNet-B0 network as its backbone, the two multi-layer perceptrons are respectively a rotation regression head and a translation regression head, and both the rotation regression head and the translation regression head are three-layer multi-layer perceptrons;
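A minimal sketch of this CNN module, assuming PyTorch and torchvision. Mapping the claim's "first six layers" to the first six torchvision feature stages of EfficientNet-B0 (112 output channels) and the hidden width of 256 are assumptions; the SVD projection applied to the rotation head's output is sketched after the estimating clause below.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

def three_layer_mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    """Three-layer multi-layer perceptron, as recited for both heads."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

class CnnPoseModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Truncated backbone: first six torchvision stages of EfficientNet-B0.
        self.extractor = efficientnet_b0().features[:6]
        self.pool = nn.AdaptiveAvgPool2d(1)
        feat_dim = 112                                        # channels after stage 5
        self.rotation_head = three_layer_mlp(feat_dim, 9)     # 3x3 matrix M, pre-SVD
        self.translation_head = three_layer_mlp(feat_dim, 3)  # translation component

    def forward(self, xray: torch.Tensor):
        # Expects a 3-channel input; replicate a grayscale X-ray across channels.
        f = self.pool(self.extractor(xray)).flatten(1)        # the "first feature"
        return self.rotation_head(f), self.translation_head(f)
```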
wherein roughly estimating the initial spatial pose of the target bone block through the regression network comprises:
inputting the intraoperative X-ray image into the CNN module to extract a first feature;
inputting the first feature into the two multi-layer perceptrons, which respectively output a rotation component and a translation component of the spatial pose of the target bone block through the rotation regression head and the translation regression head;
converting an output of the rotation regression head into a matrix M of a set size, inputting the matrix M into the SVD module to perform singular value decomposition and obtain M = UΣV^T, and mapping the matrices U and V to the SO(3) space, wherein the SO(3) space refers to the group of three-dimensional rotations in Euclidean space, to obtain a rotation component R of the spatial pose of the target bone block, wherein,
R = UΣ′V^T, where Σ′ = diag(1, …, 1, det(UV^T)) (see the code sketch after this estimating clause),
inputting the intraoperative X-ray image into the transformer module, performing 3D position encoding and image block encoding on the input image through the transformer module, respectively, and adding the 3D position encoding result and the image block encoding result to obtain a second feature of the intraoperative X-ray image;
inputting the first feature and the second feature into the feature fusion module for modulation, and outputting the initial spatial pose of the target bone block;
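A minimal sketch of the SVD projection recited above, assuming PyTorch: the rotation head's 9-D output is reshaped to a 3×3 matrix M, decomposed as UΣV^T, and the singular values are replaced by Σ′ = diag(1, 1, det(UV^T)) so that R = UΣ′V^T is a proper rotation with det(R) = +1.

```python
import torch

def svd_to_rotation(m_flat: torch.Tensor) -> torch.Tensor:
    """Map a batch of 9-D head outputs to rotation matrices in SO(3)."""
    M = m_flat.reshape(-1, 3, 3)                 # matrix M of a set size (3x3)
    U, _, Vt = torch.linalg.svd(M)
    det = torch.linalg.det(U @ Vt)               # +1 or -1 per sample
    ones = torch.ones_like(det)
    Sigma = torch.diag_embed(torch.stack([ones, ones, det], dim=-1))
    return U @ Sigma @ Vt                        # R = U Sigma' V^T, det(R) = +1
```

This is the standard SVD orthogonalization for rotation regression; the determinant term flips the last singular direction whenever UV^T alone would be a reflection rather than a rotation.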
wherein a 3D position encoding method of the transformer module comprises:
encoding a vertical position, a horizontal position and edge information of the intraoperative X-ray image by using sinusoidal position encoding, and
extending a two-dimensional spatial coordinate (x, y) of each pixel into a three-dimensional coordinate (x, y, e) by adding the edge information; wherein the encoding formula is defined as follows:
PE(pos_D, 2i) = sin(pos_D / 10000^(2i/d)), PE(pos_D, 2i+1) = cos(pos_D / 10000^(2i/d)),
wherein PE represents the three-dimensional encoding information, pos_D represents the position of an image block in dimension D, i represents the i-th position encoding dimension, and d represents the total encoding dimension.
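A sketch of this 3D sinusoidal encoding in Python. The sin/cos interleaving follows the formula above; splitting the d channels evenly across the three coordinates (x, y, e) and the example dimension of 192 are assumptions the claim does not fix.

```python
import numpy as np

def sin_cos_encoding(pos: float, dim: int) -> np.ndarray:
    """PE(pos, 2i) = sin(pos / 10000^(2i/dim)); PE(pos, 2i+1) = cos(...)."""
    i = np.arange(dim // 2)
    angle = pos / (10000.0 ** (2 * i / dim))
    pe = np.empty(dim)
    pe[0::2] = np.sin(angle)                     # even channels: sine
    pe[1::2] = np.cos(angle)                     # odd channels: cosine
    return pe

def encode_patch(x: int, y: int, e: int, dim: int = 192) -> np.ndarray:
    """Encode the extended coordinate (x, y, e) of one image block."""
    sub = dim // 3                               # per-axis channel budget (assumed)
    return np.concatenate([sin_cos_encoding(p, sub) for p in (x, y, e)])
```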