CPC G06V 20/00 (2022.01) [G06F 18/213 (2023.01); G06F 18/241 (2023.01); G06V 10/751 (2022.01); G06V 40/10 (2022.01)] | 28 Claims |
1. A method, comprising:
receiving an image;
extracting a first set of human-object features from multiple positions of the image based on a learned embedding of fixed positional information;
predicting human-object pairs based on the extracted first set of human-object features; and
determining a human-object interaction based on a set of candidate interactions and the predicted human-object pairs, using a corresponding scoring matrix, the scoring matrix indicating an alignment between a candidate interaction of the set of candidate interactions and a human-object pair of the predicted human-object pairs.
|
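The determining step of claim 1 can be illustrated with a minimal sketch. The claim does not state how the scoring matrix is computed, so the dot-product alignment, the embedding vectors, and the names `scoring_matrix` and `determine_interactions` below are all assumptions chosen for illustration, not the claimed implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def scoring_matrix(interactions: List[Vector], pairs: List[Vector]) -> List[List[float]]:
    """Build a matrix whose entry [i][j] scores how well candidate
    interaction i aligns with predicted human-object pair j.

    Assumption: alignment is a plain dot product between embeddings.
    """
    return [
        [sum(a * b for a, b in zip(interaction, pair)) for pair in pairs]
        for interaction in interactions
    ]


def determine_interactions(
    labels: List[str],
    interactions: List[Vector],
    pairs: List[Vector],
) -> List[Tuple[str, float]]:
    """For each predicted pair, pick the candidate interaction with the
    highest alignment score (hypothetical selection rule)."""
    scores = scoring_matrix(interactions, pairs)
    results = []
    for j in range(len(pairs)):
        best_i = max(range(len(interactions)), key=lambda i: scores[i][j])
        results.append((labels[best_i], scores[best_i][j]))
    return results


# Hypothetical embeddings for two candidate interactions and two predicted pairs.
labels = ["ride", "hold"]
interaction_embeddings = [[1.0, 0.0], [0.0, 1.0]]
pair_embeddings = [[0.9, 0.1], [0.2, 0.8]]

chosen = determine_interactions(labels, interaction_embeddings, pair_embeddings)
```

In this sketch each row of the matrix corresponds to a candidate interaction and each column to a predicted human-object pair, so selecting the highest score in a column assigns one interaction to each pair. A real system might instead threshold the scores so that a pair can carry several interactions or none.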