US 11,989,890 B2
Method and system for generating and labelling reference images
Siva Sakthivel, Chennai (IN); Naveen Subramanian, Chennai (IN); Yuvarajan Shanmugasundaram, Chennai (IN); Sankareswari Amudhasidhanandham, Chennai (IN); and Akhilesh Chandra Singh, Noida (IN)
Assigned to HCL Technologies Limited, New Delhi (IN)
Filed by HCL Technologies Limited, New Delhi (IN)
Filed on Feb. 18, 2021, as Appl. No. 17/178,644.
Prior Publication US 2021/0303924 A1, Sep. 30, 2021
Int. Cl. G06T 7/246 (2017.01); G06F 16/81 (2019.01); G06F 18/214 (2023.01); G06F 40/143 (2020.01); G06F 40/279 (2020.01); G06N 3/08 (2023.01); G06T 7/11 (2017.01); G06T 7/20 (2017.01); G06T 11/00 (2006.01); G06V 10/25 (2022.01); G06V 10/774 (2022.01); G10L 15/26 (2006.01); G10L 25/51 (2013.01)
CPC G06T 7/246 (2017.01) [G06F 16/81 (2019.01); G06F 18/214 (2023.01); G06F 40/143 (2020.01); G06F 40/279 (2020.01); G06N 3/08 (2013.01); G06T 7/11 (2017.01); G06T 7/20 (2013.01); G06T 11/00 (2013.01); G06V 10/25 (2022.01); G06V 10/7753 (2022.01); G10L 15/26 (2013.01); G10L 25/51 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20132 (2013.01); G06T 2210/12 (2013.01)] 17 Claims
1. A method of generating and labelling reference images, the method comprising:
tracking, by an image generation and labelling device, a plurality of highlighted objects in a set of input images along with audio data associated with the plurality of highlighted objects;
cropping, by the image generation and labelling device, each of the plurality of highlighted objects from each of the set of images based on tracking;
contemporaneously capturing, by the image generation and labelling device, an audio clip associated with each of the plurality of highlighted objects from the audio data based on tracking, wherein contemporaneously capturing comprises:
recording a portion of the audio data when an object is highlighted, while traversing from one highlighted object to another, as the audio clip; and
associating the audio clip with the object that is highlighted;
labelling, by the image generation and labelling device, each of the plurality of highlighted objects based on a text data generated from the audio clip associated with each of the plurality of objects to generate a labelled reference image; and
generating, by the image generation and labelling device, an Extensible Markup Language (XML) file corresponding to the labelled reference image.
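The steps recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation. Several details are assumed because the claim does not specify them: each highlighted object is taken to arrive as a bounding box plus the time span during which it stayed highlighted, the image is a plain list of pixel rows, the audio is a flat list of samples, and speech-to-text is supplied by the caller as a hypothetical `transcribe` callable. The XML layout (`annotation`, `object`, `bndbox`) is also an assumption, loosely modelled on common object-detection annotation files. The names `Highlight`, `crop`, `clip_audio`, `to_xml`, and `label_image` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Highlight:
    """One highlighted object (assumed representation, not from the claim)."""
    box: tuple      # (xmin, ymin, xmax, ymax) in pixels
    start_s: float  # time at which the object became highlighted
    end_s: float    # time at which the highlight moved to the next object


def crop(image, box):
    """Crop a row-major image (list of pixel rows) to the bounding box."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]


def clip_audio(samples, sample_rate, start_s, end_s):
    """Return the samples recorded while the object was highlighted."""
    return samples[int(start_s * sample_rate):int(end_s * sample_rate)]


def to_xml(filename, labelled):
    """Serialize (label, box) pairs into an XML string (assumed schema)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    for label, box in labelled:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def label_image(filename, image, samples, sample_rate, highlights, transcribe):
    """Crop each highlighted object, pair it with its audio clip, label it
    from the transcribed text, and emit the XML for the labelled image."""
    crops, labelled = [], []
    for h in highlights:
        crops.append(crop(image, h.box))
        clip = clip_audio(samples, sample_rate, h.start_s, h.end_s)
        # `transcribe` is a caller-supplied stand-in for speech-to-text.
        labelled.append((transcribe(clip), h.box))
    return crops, to_xml(filename, labelled)
```

In this sketch the audio clip for an object is simply the slice of samples between the moment the object is highlighted and the moment the highlight moves on, which is one plausible reading of "recording a portion of the audio data when an object is highlighted, while traversing from one highlighted object to another".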