CPC G06T 7/246 (2017.01) [G06F 16/81 (2019.01); G06F 18/214 (2023.01); G06F 40/143 (2020.01); G06F 40/279 (2020.01); G06N 3/08 (2013.01); G06T 7/11 (2017.01); G06T 7/20 (2013.01); G06T 11/00 (2013.01); G06V 10/25 (2022.01); G06V 10/7753 (2022.01); G10L 15/26 (2013.01); G10L 25/51 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20132 (2013.01); G06T 2210/12 (2013.01)] | 17 Claims
1. A method of generating and labelling reference images, the method comprising:
tracking, by an image generation and labelling device, a plurality of highlighted objects in a set of input images along with audio data associated with the plurality of highlighted objects;
cropping, by the image generation and labelling device, each of the plurality of highlighted objects from each of the set of input images based on the tracking;
contemporaneously capturing, by the image generation and labelling device, an audio clip associated with each of the plurality of highlighted objects from the audio data based on the tracking, wherein the contemporaneous capturing comprises:
recording, as the audio clip, a portion of the audio data captured while an object is highlighted, during traversal from one highlighted object to another; and
associating the audio clip with the object that is highlighted;
labelling, by the image generation and labelling device, each of the plurality of highlighted objects based on text data generated from the audio clip associated with each of the plurality of highlighted objects to generate a labelled reference image; and
generating, by the image generation and labelling device, an Extensible Markup Language (XML) file corresponding to the labelled reference image.
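The claimed steps (crop each highlighted object, split the audio at each highlight transition, label each object from the transcribed clip, emit an XML file) can be sketched as below. This is an illustrative sketch only, not the claimed implementation. It assumes that tracking has already yielded, for each highlighted object, a bounding box and the time at which its highlight began; that an object's clip runs from that time until the next highlight starts; that `transcribe` is a hypothetical caller-supplied speech-to-text function; and that the XML follows a Pascal VOC-style `annotation/object/bndbox` layout. The claim does not specify any of these details. `HighlightEvent`, `crop`, `segment_audio`, `build_annotation_xml` and `label_reference_image` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple
import xml.etree.ElementTree as ET


@dataclass
class HighlightEvent:
    """Hypothetical tracking output: one object and when its highlight began."""
    object_id: str
    bbox: Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), pixels, max exclusive
    start_s: float                   # highlight start time in seconds


def crop(image: List[List[int]], bbox: Tuple[int, int, int, int]) -> List[List[int]]:
    """Crop a row-major image (list of pixel rows) to the bounding box."""
    xmin, ymin, xmax, ymax = bbox
    return [row[xmin:xmax] for row in image[ymin:ymax]]


def segment_audio(events: Sequence[HighlightEvent], samples: Sequence[float],
                  sample_rate: int) -> Dict[str, List[float]]:
    """Give each object the audio recorded from its highlight start until the
    next highlight starts (or the audio ends). Repeat highlights are appended."""
    ordered = sorted(events, key=lambda e: e.start_s)
    clips: Dict[str, List[float]] = {}
    for i, ev in enumerate(ordered):
        start = int(ev.start_s * sample_rate)
        end = (int(ordered[i + 1].start_s * sample_rate)
               if i + 1 < len(ordered) else len(samples))
        clips.setdefault(ev.object_id, []).extend(samples[start:end])
    return clips


def build_annotation_xml(image_name: str, width: int, height: int,
                         events: Sequence[HighlightEvent],
                         labels: Dict[str, str]) -> str:
    """Serialize labels and boxes in an assumed Pascal VOC-style layout."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    seen = set()
    for ev in sorted(events, key=lambda e: e.start_s):
        if ev.object_id in seen:  # write each object only once
            continue
        seen.add(ev.object_id)
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = labels[ev.object_id]
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), ev.bbox):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def label_reference_image(image: List[List[int]], image_name: str,
                          events: Sequence[HighlightEvent],
                          samples: Sequence[float], sample_rate: int,
                          transcribe: Callable[[List[float]], str]):
    """Crop, segment audio, label from transcripts, and build the XML file text."""
    crops = {ev.object_id: crop(image, ev.bbox) for ev in events}
    clips = segment_audio(events, samples, sample_rate)
    labels = {oid: transcribe(clip).strip() for oid, clip in clips.items()}
    xml_text = build_annotation_xml(image_name, len(image[0]), len(image),
                                    events, labels)
    return crops, labels, xml_text
```

In the usage sketch below, a stub stands in for a real speech recognizer. Two objects are highlighted at 0 s and 1 s, so each receives one second of audio.

```python
image = [[0] * 10 for _ in range(8)]
events = [HighlightEvent("a", (0, 0, 4, 4), 0.0),
          HighlightEvent("b", (5, 2, 9, 7), 1.0)]
samples = [0.0] * 4 + [1.0] * 4  # 2 s of audio at 4 Hz
stub = lambda clip: "cat" if clip[0] < 0.5 else "dog"
crops, labels, xml_text = label_reference_image(image, "img.png", events,
                                                samples, 4, stub)
```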