CPC G06V 10/762 (2022.01) [G06N 3/045 (2023.01); G06V 20/58 (2022.01)] | 25 Claims |
1. A processor-implemented method with neural network training, comprising:
determining first backbone feature data corresponding to each input data by applying, to a first neural network model, two or more sets of the input data of the same scene, respectively;
determining second backbone feature data corresponding to each input data by applying, to a second neural network model, the two or more sets of the input data, respectively;
diversifying, using plural projection models and plural drop models, a view of the first backbone feature data output from the first neural network model and a view of the second backbone feature data output from the second neural network model, including:
determining, from the first backbone feature data, projection-based first embedded data using a first projection model and dropout-based first view data using a first drop model;
determining, from the second backbone feature data, projection-based second embedded data using a second projection model and dropout-based second view data using a second drop model; and
training either one or both of the first neural network model and the second neural network model based on a loss determined based on a combination of any two or more of the first embedded data, the first view data, the second embedded data, the second view data, and an embedded data clustering result.
|
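The data flow recited in claim 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the claim: the backbones and projection models are single random linear layers, each "drop model" is modeled as inverted dropout followed by the matching projection, the loss is a mean of cosine-distance terms between outputs for two inputs of the same scene, and the names `diversified_loss`, `init_weights`, and `drop_p` are hypothetical. The embedded data clustering result is omitted, since the claim only requires a combination of any two or more of the listed items. Only the loss computation is shown; a training step would minimize this value with a gradient-based optimizer.

```python
import math
import random


def init_weights(rng, n_out, n_in):
    """Random weight matrix (list of rows); stands in for a trained layer."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def dropout(rng, v, p):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in v]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def diversified_loss(rng, inputs, net1, net2, proj1, proj2, drop_p=0.5):
    """Loss over exactly two inputs of the same scene (assumed setup)."""
    # Backbone feature data from the first and second models, per input.
    feats1 = [relu(linear(net1, x)) for x in inputs]
    feats2 = [relu(linear(net2, x)) for x in inputs]
    # Projection-based embedded data.
    emb1 = [linear(proj1, f) for f in feats1]
    emb2 = [linear(proj2, f) for f in feats2]
    # Dropout-based view data (assumed: dropout, then the same projection).
    view1 = [linear(proj1, dropout(rng, f, drop_p)) for f in feats1]
    view2 = [linear(proj2, dropout(rng, f, drop_p)) for f in feats2]
    # Combine outputs by comparing input 0 of one group with input 1 of another.
    pairs = [(emb1, emb2), (emb1, view1), (emb2, view2), (view1, view2)]
    terms = []
    for group_a, group_b in pairs:
        terms.append(1.0 - cosine(group_a[0], group_b[1]))
        terms.append(1.0 - cosine(group_a[1], group_b[0]))
    return sum(terms) / len(terms)


rng = random.Random(0)
scene = [rng.gauss(0.0, 1.0) for _ in range(8)]
# Two noisy copies of one vector stand in for two inputs of the same scene.
inputs = [[s + rng.gauss(0.0, 0.1) for s in scene] for _ in range(2)]
net1, net2 = init_weights(rng, 16, 8), init_weights(rng, 16, 8)
proj1, proj2 = init_weights(rng, 4, 16), init_weights(rng, 4, 16)
loss = diversified_loss(rng, inputs, net1, net2, proj1, proj2)
print(f"loss = {loss:.4f}")
```

Each term is a cosine distance, so the averaged loss lies between 0 and 2, with lower values meaning the embedded data and view data of the two models agree more closely across the two inputs.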