CPC G06T 7/0012 (2013.01) [G06V 10/25 (2022.01); G06V 10/26 (2022.01); G06V 10/454 (2022.01); G06V 10/778 (2022.01); G06T 2207/10081 (2013.01); G06T 2207/30096 (2013.01); G06V 2201/032 (2022.01); G06V 2201/07 (2022.01)] | 12 Claims |
1. A processor implemented method for domain knowledge based Universal Lesion Detection (ULD), the method comprising:
receiving and preprocessing, by one or more hardware processors, a slice set from amongst a plurality of slices of a Computed Tomography (CT) scan of a subject, the slice set comprising i) a key slice of a subject's Region of Interest (subRoI) and ii) a superior slice and an inferior slice in the neighborhood of the key slice of the subRoI;
creating, by the one or more hardware processors, a 3-Dimensional context of the subRoI by defining a 3-channel input image based on each preprocessed slice of the slice set;
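The 3-Dimensional context step above can be sketched as a simple stacking of the three neighboring slices into one 3-channel input image. The array names and the 512x512 slice size below are illustrative assumptions, not taken from the claim:

```python
import numpy as np

def make_3channel_input(superior, key_slice, inferior):
    """Stack the superior, key, and inferior CT slices into a (3, H, W) input image."""
    assert superior.shape == key_slice.shape == inferior.shape
    return np.stack([superior, key_slice, inferior], axis=0)

# Three synthetic 512x512 slices standing in for a real preprocessed slice set.
h = w = 512
slices = [np.random.rand(h, w).astype(np.float32) for _ in range(3)]
ctx = make_3channel_input(*slices)
print(ctx.shape)  # (3, 512, 512)
```

Stacking the neighbors as channels gives a 2D detector limited through-plane (3D) context without the cost of a full 3D network.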
windowing, by the one or more hardware processors, each preprocessed slice of the slice set of the subRoI using a windowing technique, in accordance with a plurality of heuristically determined organ agnostic Hounsfield Unit (HU) windows having varying pixel intensities and highlighting different organs, wherein a set of HU windowed images is created for each preprocessed slice;
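A minimal sketch of HU windowing: each slice (in Hounsfield Units) is clipped to a window and rescaled to [0, 1], producing one image per window. The (center, width) values below are common radiology presets used here only for illustration; the claim's heuristically determined organ agnostic windows are not published in this text:

```python
import numpy as np

# Illustrative (center, width) HU windows, roughly lung-, soft-tissue-, and bone-like.
HU_WINDOWS = [(-600, 1500), (40, 400), (300, 2000)]

def apply_hu_window(slice_hu, center, width):
    """Clip a slice in HU to [center - width/2, center + width/2] and scale to [0, 1]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(slice_hu, lo, hi) - lo) / (hi - lo)

# A synthetic slice spanning the usual CT HU range.
slice_hu = np.random.uniform(-1024, 3000, size=(64, 64))
windowed = [apply_hu_window(slice_hu, c, w) for c, w in HU_WINDOWS]
```

Each window suppresses tissue outside its HU range, so the set of windowed images emphasizes different organs from the same slice.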
generating, by the one or more hardware processors, a feature map block corresponding to each of the organ agnostic HU windows by extracting a plurality of feature maps, using a shared feature extractor comprising a feature pyramid network (FPN) applied on HU windowed images, from amongst the set of HU windowed images of each preprocessed slice, that fall under the same window range of an organ agnostic HU window, wherein each feature map block corresponding to each of the organ agnostic HU windows comprises a set of sub-level feature maps at a plurality of FPN levels, with each FPN level having receptive fields of a different resolution to capture features of one or more lesions having varying sizes;
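The multi-resolution structure of a feature map block can be sketched without a learned FPN by repeatedly downsampling one feature map, giving one sub-level map per pyramid level. The channel count, spatial size, and use of average pooling in place of a real FPN are all assumptions for demonstration:

```python
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling on a (C, H, W) map; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def feature_map_block(x, num_levels=4):
    """Return sub-level feature maps, one per pyramid level, each coarser than the last."""
    levels = [x]
    for _ in range(num_levels - 1):
        levels.append(avg_pool2x2(levels[-1]))
    return levels

# Stand-in features extracted from the HU windowed images of one window.
x = np.random.rand(8, 64, 64).astype(np.float32)
block = feature_map_block(x)
print([m.shape for m in block])  # [(8, 64, 64), (8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

The fine levels retain small receptive fields suited to small lesions, while the coarse levels cover large lesions, which is the role the FPN levels play in the claim.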
generating, by the one or more hardware processors, a fused feature map block (F′) using a convolution augmented attention module that applies feature fusion on the feature map block corresponding to each of the organ agnostic HU windows, wherein the convolution augmented attention module:
i) concatenates the set of sub-level feature maps of the feature map block for each of the organ agnostic HU windows to obtain a concatenated multi-view feature map block,
ii) utilizes a combination of a learnable 2D convolution layer for pooling multi-view features and a multi-headed self-attention module providing channel and spatial attention, wherein the learnable 2D convolution layer is augmented in parallel to the multi-headed self-attention module to reduce the computational burden of the convolution augmented attention module, and the number of output channels of the convolution augmented attention module is divided between the learnable 2D convolution layer and the multi-headed self-attention module based on allowed computational memory,
iii) convolves each of the sub-level feature maps down to a lower dimension using the multi-headed self-attention module to provide compressed channel information, and
iv) utilizes the compressed channel information to generate Key, Query and Value matrices to match a predefined number of output channels, wherein outputs from the learnable 2D convolution layer and the multi-headed self-attention module are concatenated; and
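Steps i)-iv) can be sketched for one sub-level of the concatenated multi-view block: the output channels are split between a parallel 1x1-convolution branch and a multi-head self-attention branch whose Key, Query and Value are computed from channel-compressed features, and the two branch outputs are concatenated. All weight shapes, channel counts, and the 50/50 channel split below are illustrative assumptions (random weights stand in for learned ones):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1x1(x, w):
    """Learnable 1x1 convolution as channel mixing: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def conv_augmented_attention(x, c_out=32, attn_ratio=0.5, heads=2, d_head=8):
    """Fuse a multi-view feature map by splitting c_out between a conv and an MHSA branch."""
    c_in, h, w = x.shape
    c_attn = int(c_out * attn_ratio)   # channels given to self-attention
    c_conv = c_out - c_attn            # channels given to the parallel 2D convolution
    # Convolution branch: pools the concatenated multi-view channels directly.
    conv_out = conv1x1(x, rng.standard_normal((c_conv, c_in)) * 0.1)
    # Attention branch: compress channels first, which shrinks the Q/K/V matrices
    # and so reduces the computational burden of the module.
    d = heads * d_head
    tokens = conv1x1(x, rng.standard_normal((d, c_in)) * 0.1).reshape(d, h * w).T  # (HW, d)
    wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    out_heads = []
    for i in range(heads):  # multi-head self-attention over spatial positions
        s = slice(i * d_head, (i + 1) * d_head)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(d_head))
        out_heads.append(attn @ v[:, s])
    attn_tokens = np.concatenate(out_heads, axis=1)                        # (HW, d)
    proj = rng.standard_normal((d, c_attn)) * 0.1                          # match c_attn channels
    attn_out = (attn_tokens @ proj).T.reshape(c_attn, h, w)
    return np.concatenate([conv_out, attn_out], axis=0)                    # (c_out, H, W)

f = rng.standard_normal((24, 8, 8))  # one sub-level of the concatenated multi-view block
f_fused = conv_augmented_attention(f)
print(f_fused.shape)  # (32, 8, 8)
```

Moving part of the output channels onto the cheap 1x1 convolution, and attending only over channel-compressed tokens, is what lets the split be tuned to the allowed computational memory.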
predicting, by the one or more hardware processors, one or more lesions of varying sizes in the preprocessed slice set by analyzing the fused feature map block (F′) using a Region Proposal Network (RPN), wherein the RPN generates bounding boxes and corresponding probability values for the one or more lesions of varying sizes from amongst a set of customized lesion specific anchor sizes.
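The anchor side of the RPN step can be sketched as placing a set of lesion-specific anchor sizes at every cell of the fused feature-map grid and ranking them by an objectness probability. The anchor sizes, stride, and the random scorer below are illustrative assumptions, not the claim's customized values:

```python
import numpy as np

ANCHOR_SIZES = [8, 16, 32, 64]  # assumed side lengths in pixels, small to large lesions

def generate_anchors(fmap_h, fmap_w, stride=8, sizes=ANCHOR_SIZES):
    """Return (N, 4) boxes as (x1, y1, x2, y2), centered on each feature-map cell."""
    boxes = []
    for gy in range(fmap_h):
        for gx in range(fmap_w):
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride
            for s in sizes:
                boxes.append([cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2])
    return np.array(boxes, dtype=np.float32)

anchors = generate_anchors(8, 8)                 # anchors over an 8x8 fused feature map
scores = np.random.rand(len(anchors))            # stand-in RPN objectness probabilities
top = anchors[np.argsort(scores)[::-1][:5]]      # top-5 proposals by probability
print(anchors.shape, top.shape)  # (256, 4) (5, 4)
```

Tailoring the anchor sizes to the lesion size distribution, rather than using generic object-detection defaults, is what makes the anchor set lesion specific.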