CPC G06V 10/761 (2022.01) [G06N 3/0455 (2023.01); G06N 3/08 (2013.01); G06T 7/174 (2017.01); G06V 10/26 (2022.01); G06V 10/28 (2022.01); G06V 10/34 (2022.01); G06V 10/443 (2022.01); G06V 10/751 (2022.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06T 2207/20036 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/30176 (2013.01)]

20 Claims

1. A method of matching different depictions of objects in image data, the method comprising:
receiving first image data comprising a first representation of an object;
receiving second image data comprising a second representation of the object, wherein the second representation of the object includes a different view of the object in the second image data relative to the first image data;
processing, by an encoder network, the first image data and the second image data by interleaving signals from the first image data and the second image data to generate a first feature map representing the first image data and a second feature map representing the second image data;
concatenating the first feature map and the second feature map to generate a combined feature map, wherein the combined feature map spatially overlaps common features from the first feature map and the second feature map;
computing a set of correlation scores for the combined feature map;
determining, using the set of correlation scores, a co-salient region of the combined feature map;
generating, by inputting the combined feature map into a first segmentation head, a first segmentation mask representing foreground image data for the co-salient region detected in the first image data;
generating, by inputting the combined feature map into a second segmentation head, a second segmentation mask representing foreground image data for the co-salient region detected in the second image data;
filtering the first feature map using the first segmentation mask to generate first data comprising the first representation of the object as represented in the first feature map;
filtering the second feature map using the second segmentation mask to generate second data comprising the second representation of the object as represented in the second feature map;
comparing the first data and the second data using a cosine similarity metric to generate a similarity matrix; and
determining, using a first convolutional neural network and the similarity matrix, that the object represented in the first image data matches the object represented in the second image data.
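The pipeline recited in claim 1 (shared encoding of both images, correlation-based detection of a co-salient region, mask filtering of each feature map, and a cosine-similarity comparison feeding a match decision) can be sketched in simplified form. Everything below is illustrative and not the claimed implementation: thresholded max-correlation stands in for the two segmentation heads, a pooled-descriptor cosine score stands in for the per-location similarity matrix, and a fixed threshold stands in for the first convolutional neural network; the array shapes, names, and toy feature maps are assumptions.

```python
import numpy as np

H, W, C = 8, 8, 64  # illustrative feature-map height, width, channel depth
rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def correlation_scores(feat_a, feat_b):
    # Cosine correlation between every spatial location of the first
    # feature map and every spatial location of the second.
    a = l2_normalize(feat_a.reshape(-1, C))   # (H*W, C)
    b = l2_normalize(feat_b.reshape(-1, C))   # (H*W, C)
    return a @ b.T                            # (H*W, H*W) correlation scores

def co_salient_masks(feat_a, feat_b, threshold=0.5):
    # Stand-in for the two segmentation heads: a location is foreground
    # (co-salient) if it correlates strongly with *some* location in the
    # other image's feature map.
    corr = correlation_scores(feat_a, feat_b)
    mask_a = (corr.max(axis=1) > threshold).reshape(H, W)
    mask_b = (corr.max(axis=0) > threshold).reshape(H, W)
    return mask_a, mask_b

def masked_descriptor(feat, mask):
    # Filter the feature map with its segmentation mask, then pool the
    # surviving foreground features into a single unit vector.
    fg = feat[mask]                           # (n_foreground, C)
    return l2_normalize(fg.mean(axis=0)) if fg.size else np.zeros(C)

# Toy stand-in for the encoder outputs: a shared 3x3 "object" pattern
# embedded at different positions (different views) over faint noise.
obj = rng.normal(size=(3, 3, C))
feat_a = rng.normal(scale=0.1, size=(H, W, C))
feat_a[1:4, 1:4] = obj
feat_b = rng.normal(scale=0.1, size=(H, W, C))
feat_b[4:7, 4:7] = obj

mask_a, mask_b = co_salient_masks(feat_a, feat_b)
desc_a = masked_descriptor(feat_a, mask_a)
desc_b = masked_descriptor(feat_b, mask_b)
similarity = float(desc_a @ desc_b)  # cosine similarity of the filtered features
is_match = similarity > 0.9          # fixed threshold in place of the learned CNN
print(mask_a.sum(), mask_b.sum(), round(similarity, 3), is_match)
```

Because the same object pattern appears verbatim in both toy maps, its nine positions correlate at 1.0 across images and survive the masks in both views, so the filtered descriptors agree and the sketch reports a match; in the claimed method those final decisions are instead produced by trained segmentation heads and a convolutional neural network operating on the full similarity matrix.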