| CPC G06V 10/7747 (2022.01) [G06T 7/20 (2013.01); G06V 10/764 (2022.01)] | 20 Claims |

|
1. A computer-implemented method of training machine learning (ML) models to classify activity of objects, comprising:
using at least one processor for:
selecting a set of frames depicting at least one object from at least one video sequence comprising a plurality of consecutive frames;
associating the at least one object with each of a plurality of pixels included in a bounding box of the at least one object identified in each of the frames of the set;
computing a motion mask for each frame of the set indicating whether each pixel associated with the at least one object in a respective bounding box in a respective frame is changed or unchanged compared to a corresponding pixel in a preceding frame of the set;
augmenting an image of the at least one object in each frame of a subset of frames selected from the set of frames to depict only the changed pixels within said respective bounding box and associated with the at least one object by cutting out the unchanged pixels associated with the at least one object; and
training at least one ML model, using the set of frames, to classify at least one activity of the at least one object based on identified moving portions of said at least one object according to said depicted changed pixels;
wherein the motion mask computation and pixel augmentation are performed to improve ML model training accuracy by emphasizing temporally changing object features.
|
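The motion-mask and cut-out steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes grayscale frames stored as nested lists, a single bounding box shared by both frames, a hypothetical `threshold` for deciding that a pixel changed, and a hypothetical `fill` value for cut-out pixels. None of these names or values comes from the claim.

```python
from typing import List, Tuple

Frame = List[List[int]]            # grayscale frame: rows of pixel intensities (assumption)
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), bottom/right exclusive (assumption)


def motion_mask(prev: Frame, curr: Frame, box: Box, threshold: int = 0) -> List[List[bool]]:
    """Mark each pixel inside the box as changed (True) or unchanged (False)
    relative to the corresponding pixel in the preceding frame."""
    top, left, bottom, right = box
    return [
        [abs(curr[r][c] - prev[r][c]) > threshold for c in range(left, right)]
        for r in range(top, bottom)
    ]


def cut_out_unchanged(curr: Frame, box: Box, mask: List[List[bool]], fill: int = 0) -> Frame:
    """Return a copy of the frame in which unchanged pixels inside the box
    are replaced by `fill`, so only changed pixels remain depicted there."""
    top, left, bottom, right = box
    out = [row[:] for row in curr]  # copy so the input frame is not modified
    for r in range(top, bottom):
        for c in range(left, right):
            if not mask[r - top][c - left]:
                out[r][c] = fill
    return out


# Two consecutive 4x4 frames; one pixel of the object changes between them.
prev_frame = [[0, 0, 0, 0], [0, 5, 5, 0], [0, 5, 5, 0], [0, 0, 0, 0]]
curr_frame = [[0, 0, 0, 0], [0, 5, 9, 0], [0, 5, 5, 0], [0, 0, 0, 0]]
box = (1, 1, 3, 3)

mask = motion_mask(prev_frame, curr_frame, box)
augmented = cut_out_unchanged(curr_frame, box, mask)
```

In this example only the pixel at row 1, column 2 differs between the two frames, so it is the only object pixel that survives in `augmented`. A training pipeline along the lines of the claim would apply such augmentation to a subset of the selected frames before passing the set to the model. Model training itself is outside the scope of this sketch.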