US 12,277,687 B2
	Systems, methods, and apparatuses for the use of transferable visual words for AI models through self-supervised learning in the absence of manual labeling for the processing of medical imaging
Fatemeh Haghighi, Tempe, AZ (US); Mohammad Reza Hosseinzadeh Taher, Tempe, AZ (US); Zongwei Zhou, Tempe, AZ (US); and Jianming Liang, Tempe, AZ (US)
Assigned to Arizona Board of Regents on Behalf of Arizona State University, Scottsdale, AZ (US)
Filed by Arizona Board of Regents on Behalf of Arizona State University, Scottsdale, AZ (US)
Filed on Apr. 30, 2021, as Appl. No. 17/246,032.
Claims priority of provisional application 63/110,265, filed on Nov. 5, 2020.
Claims priority of provisional application 63/018,335, filed on Apr. 30, 2020.
Prior Publication US 2021/0343014 A1, Nov. 4, 2021
Int. Cl. G06T 5/77 (2024.01); G06T 3/04 (2024.01); G06T 7/00 (2017.01); G06V 10/25 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)

CPC G06T 5/77 (2024.01) [G06T 3/04 (2024.01); G06T 7/0014 (2013.01); G06V 10/25 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)]

15 Claims

1. A system comprising:

a memory to store instructions;

a processor to execute the instructions stored in the memory;

a receive interface to receive a plurality of unlabeled medical images obtained from a plurality of human patients;

wherein the system is specially configured to perform self-supervised learning for an artificial intelligence (AI) model having a trained encoder-decoder structure with skip connections in between and a classification head at an output of the encoder portion, and preceding a decoder portion, of the trained encoder-decoder structure, by executing the instructions via the processor for:

performing a self-discovery operation that crops two-dimensional (2D) patches or crops three-dimensional (3D) cubes representing a plurality of unique anatomical patterns each reoccurring at a respective one of a plurality of unique fixed coordinates across the plurality of unlabeled medical images, and assigns one of a plurality of pseudo labels to each of the cropped 2D patches or 3D cubes based on their respective unique fixed coordinates across the plurality of unlabeled medical images;

transforming each of the cropped 2D patches or the cropped 3D cubes to generate transformed 2D anatomical patterns or transformed 3D anatomical patterns (hereinafter “the transformed anatomical patterns”);

performing a self-classification operation on the transformed anatomical patterns to learn semantically enriched visual representations of a human body derived from the plurality of unique anatomical patterns each reoccurring at the respective one of the plurality of unique fixed coordinates across the plurality of unlabeled medical images by formulating a multi-class classification task on the plurality of pseudo labels;

performing a self-restoration operation by recovering anatomical patterns from the transformed anatomical patterns to learn different sets of the semantically enriched visual representations of the human body.
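The self-discovery operation of claim 1 crops patches at unique fixed coordinates and labels each patch by the coordinate it came from. The following is a minimal 2D sketch of that labeling rule. It assumes the images are roughly aligned, so that the same coordinate shows the same anatomy in every image. The helper names (`sample_coordinates`, `crop`, `self_discover`) and the random coordinate selection are hypothetical and do not come from the patent. The 3D-cube case would add a depth axis in the same way.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]  # a 2D image stored as rows of pixel intensities


def sample_coordinates(height: int, width: int, size: int, n: int,
                       seed: int = 0) -> List[Tuple[int, int]]:
    """Pick n distinct top-left corners that keep a size x size patch inside the image.

    Hypothetical choice: coordinates are drawn at random once and then held fixed.
    """
    positions = [(r, c) for r in range(height - size + 1) for c in range(width - size + 1)]
    return random.Random(seed).sample(positions, n)


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def self_discover(images: Sequence[Image], coords: Sequence[Tuple[int, int]],
                  size: int) -> List[Tuple[Image, int]]:
    """Crop one patch per fixed coordinate from every image.

    The pseudo label of a patch is the index of the coordinate it came from,
    so patches cropped at the same location share a label across all images.
    """
    samples = []
    for image in images:
        for label, (top, left) in enumerate(coords):
            samples.append((crop(image, top, left, size), label))
    return samples
```

Because the label depends only on the coordinate index, the number of classes in the later classification task equals the number of fixed coordinates chosen.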
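Claim 1 transforms each cropped patch and then recovers the original anatomical pattern from the transformed one. The claim does not name the transformations. This sketch therefore uses two assumed examples: in-painting, which blanks a block, and local pixel shuffling inside a window. It also uses an assumed mean-squared-error restoration loss that compares the decoder output with the untransformed patch. All function names and parameters are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # a 2D patch stored as rows of pixel intensities


def inpaint(patch: Image, top: int, left: int, h: int, w: int, fill: float = 0.0) -> Image:
    """Return a copy of the patch with an h x w block overwritten by `fill`."""
    out = [row[:] for row in patch]
    for r in range(top, min(top + h, len(out))):
        for c in range(left, min(left + w, len(out[r]))):
            out[r][c] = fill
    return out


def local_shuffle(patch: Image, top: int, left: int, size: int, rng: random.Random) -> Image:
    """Return a copy of the patch with pixels inside one size x size window permuted.

    Assumes the caller keeps the window inside the patch.
    """
    out = [row[:] for row in patch]
    cells = [(r, c) for r in range(top, top + size) for c in range(left, left + size)]
    values = [out[r][c] for r, c in cells]
    rng.shuffle(values)
    for (r, c), v in zip(cells, values):
        out[r][c] = v
    return out


def restoration_loss(restored: Image, original: Image) -> float:
    """Mean squared error between the decoder output and the untransformed patch."""
    diffs = [(a - b) ** 2 for ra, rb in zip(restored, original) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)
```

The transformations copy the patch rather than modify it in place, so the original remains available as the restoration target.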
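The self-classification operation of claim 1 is framed as a multi-class classification task over the coordinate pseudo labels. A natural reading is a softmax cross-entropy on the classification head's logits. The sketch below implements that loss. It also shows a weighted sum with a restoration term, which is an assumption: the claim lists both operations but does not say how they are combined. The weights `lambda_cls` and `lambda_rec` are hypothetical.

```python
import math
from typing import Sequence


def log_sum_exp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(z))) over the logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def classification_loss(logits: Sequence[float], pseudo_label: int) -> float:
    """Cross-entropy of one patch's logits against its coordinate pseudo label."""
    return log_sum_exp(logits) - logits[pseudo_label]


def batch_classification_loss(batch_logits: Sequence[Sequence[float]],
                              labels: Sequence[int]) -> float:
    """Mean cross-entropy over a batch of transformed patches."""
    losses = [classification_loss(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def combined_loss(cls_loss: float, rec_loss: float,
                  lambda_cls: float = 1.0, lambda_rec: float = 1.0) -> float:
    """Assumed weighted sum of the self-classification and self-restoration terms."""
    return lambda_cls * cls_loss + lambda_rec * rec_loss
```

For example, two equal logits give a loss of log 2 for either label, which is the loss of a uniform guess between two classes.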
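Claim 1 places the classification head at the output of the encoder, before the decoder, in an encoder-decoder structure with skip connections. The following toy is structural only and cannot be trained. It uses 1D feature maps whose length is assumed to be a power of two. The encoder averages neighbouring values, and the decoder repeats values and adds the matching encoder features. U-Net-style designs usually concatenate skip features, so the addition here is a simplification. All names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]  # a 1D feature map; length assumed to be a power of two


def down(x: Vector) -> Vector:
    """Encoder step: halve the resolution by averaging neighbouring pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def up(x: Vector) -> Vector:
    """Decoder step: double the resolution by repeating each value."""
    return [v for v in x for _ in range(2)]


def encode(x: Vector, depth: int) -> Tuple[Vector, List[Vector]]:
    """Run the encoder, keeping each level's input for the skip connections."""
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = down(x)
    return x, skips


def decode(z: Vector, skips: List[Vector]) -> Vector:
    """Run the decoder, adding the matching encoder features at each level."""
    x = z
    for skip in reversed(skips):
        x = [a + b for a, b in zip(up(x), skip)]
    return x


def classify(z: Vector, weights: Vector, bias: Vector) -> Vector:
    """Head on the encoder output: global average pool, then one logit per pseudo label."""
    pooled = sum(z) / len(z)
    return [w * pooled + b for w, b in zip(weights, bias)]


def forward(x: Vector, depth: int, weights: Vector, bias: Vector) -> Tuple[Vector, Vector]:
    """Return (logits, restoration) computed from one shared encoder pass."""
    z, skips = encode(x, depth)
    return classify(z, weights, bias), decode(z, skips)
```

Both outputs come from the same encoder pass. The classification head and the decoder therefore supervise a shared representation, which is the point of attaching the head at the encoder output.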