US 12,307,748 B2
Category discovery using machine learning
Xuhui Jia, Seattle, WA (US); and Kai Han, Bristol (GB)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Apr. 26, 2022, as Appl. No. 17/729,878.
Prior Publication US 2023/0343073 A1, Oct. 26, 2023
Int. Cl. G06V 10/82 (2022.01); G06V 10/42 (2022.01); G06V 10/44 (2022.01); G06V 10/74 (2022.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01)
CPC G06V 10/774 (2022.01) [G06V 10/42 (2022.01); G06V 10/44 (2022.01); G06V 10/761 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)]
21 Claims
1. A method of training a neural network having a plurality of network parameters to process a network input representing an input image and to generate a network output representing a predicted class, from a set of classes, to which the input image belongs, the method comprising:
in a local feature extraction subnetwork:
generating one or more first local feature tensors from a first training image from a training image set, wherein each first local feature tensor corresponds to a particular spatial region of the first training image;
obtaining, for each of one or more previous training images previously processed by the neural network, one or more previous local feature tensors generated from the previous training image, wherein each previous local feature tensor corresponds to a particular spatial region of the respective previous training image;
generating a first similarity tensor representing a similarity between the first local feature tensors of the first training image and the previous local feature tensors of the previous training images;
obtaining, for a second training image from the training image set, a second similarity tensor representing a similarity between (i) one or more second local feature tensors generated from the second training image and (ii) the previous local feature tensors of the previous training images;
processing, using the neural network, a first network input determined from the first training image to generate a first training output representing a class prediction for the first training image;
obtaining a second training output representing a class prediction for the second training image, the second training output having been generated by the neural network in response to processing a second network input determined from the second training image; and
generating an update to the network parameters of the neural network from (i) a similarity between the first similarity tensor and the second similarity tensor and (ii) a similarity between the first training output and the second training output;
wherein a neural network trained on the local features generates a network output representing a predicted class, and the network output has a first precision that is greater than a second precision of a network output of a neural network trained only on global features.
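The training step recited in claim 1 can be sketched in NumPy as follows. This is a minimal, hypothetical illustration, not the patented implementation. Every concrete choice here is an assumption and does not come from the claim:

- the local feature extractor is a per-pixel linear projection (like a 1x1 convolution) followed by average pooling over a 2x2 grid of spatial regions;
- the stored "previous local feature tensors" are a fixed random memory bank;
- a similarity tensor is a temperature-scaled softmax over cosine similarities to that memory;
- both "similarities" are measured with mean squared error;
- the second image's similarity tensor and class prediction are computed once and held fixed, mirroring the claim's "obtaining" language;
- finite differences stand in for backpropagation.

All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_features(image, w_feat, grid=2):
    """Hypothetical extractor: per-pixel linear projection, then average pooling
    over grid x grid spatial regions. Returns (grid*grid, C) unit-norm features."""
    h, w, _ = image.shape
    fmap = image @ w_feat.T  # (H, W, C)
    feats = []
    for i in range(grid):
        for j in range(grid):
            region = fmap[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
            feats.append(region.mean(axis=(0, 1)))  # one feature per region
    feats = np.stack(feats)
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)

def similarity_tensor(local, memory, temperature=0.1):
    """Cosine similarity of each local feature to each stored previous local
    feature, normalized into a distribution over the memory: shape (R, M)."""
    return softmax(local @ memory.T / temperature, axis=1)

def class_prediction(local, w_cls):
    """Class probabilities from the mean of the local features (assumed head)."""
    return softmax(w_cls @ local.mean(axis=0))

def parameter_update(loss_fn, params, lr=0.01, eps=1e-5):
    """Gradient-descent update -lr * grad, with grad estimated by central finite
    differences (a stand-in for autodiff). Params are perturbed and restored."""
    updates = []
    for w in params:
        grad = np.zeros_like(w)
        for idx in np.ndindex(w.shape):
            original = w[idx]
            w[idx] = original + eps
            up = loss_fn()
            w[idx] = original - eps
            down = loss_fn()
            w[idx] = original
            grad[idx] = (up - down) / (2 * eps)
        updates.append(-lr * grad)
    return updates

rng = np.random.default_rng(0)
D, C, K, M = 3, 4, 3, 8                 # input channels, feature dim, classes, memory size
w_feat = rng.normal(size=(C, D))        # network parameters (extractor)
w_cls = rng.normal(size=(K, C))         # network parameters (class head)
memory = rng.normal(size=(M, C))        # previous local feature tensors (assumed fixed)
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

image1 = rng.normal(size=(8, 8, D))                     # first training image
image2 = image1 + 0.1 * rng.normal(size=image1.shape)   # second image (e.g., another view)

# "Obtained" quantities for the second image: computed once, then held fixed.
local2 = local_features(image2, w_feat)
sim2 = similarity_tensor(local2, memory)
out2 = class_prediction(local2, w_cls)

def training_loss():
    local1 = local_features(image1, w_feat)
    sim1 = similarity_tensor(local1, memory)
    out1 = class_prediction(local1, w_cls)
    # (i) agreement of similarity tensors + (ii) agreement of training outputs
    return np.mean((sim1 - sim2) ** 2) + np.mean((out1 - out2) ** 2)

before = training_loss()
d_feat, d_cls = parameter_update(training_loss, [w_feat, w_cls])
w_feat += d_feat  # apply the update in place
w_cls += d_cls
after = training_loss()
print(before, after)
```

Holding the second image's quantities fixed turns them into targets for the first image, so the update pulls the first image's local-to-memory similarity pattern and its class prediction toward those of the second image. In practice an autodiff framework would replace the finite-difference helper.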