CPC A47L 9/2805 (2013.01) [A47L 11/40 (2013.01); G06F 18/214 (2023.01); G06V 10/768 (2022.01); G06V 20/10 (2022.01); A47L 2201/06 (2013.01)] | 25 Claims |
1. A method comprising:
obtaining, using at least one processor of a robot, an image-query understanding model;
obtaining, using the at least one processor, an image and a user query associated with the image, wherein the image comprises a target image area and the user query comprises a target phrase, and wherein the target image area is marked on the image by a user and the target phrase is identified within the user query by the user during operation of the robot; and
retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model;
wherein retraining the image-query understanding model comprises determining (i) one or more query-level contextual features, (ii) one or more question features, (iii) one or more image-level contextual features, and (iv) one or more post-processed image features based on at least one weighted attention function.
|
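Claim 1 recites that the features are determined "based on at least one weighted attention function" but does not say what form that function takes. The sketch below is illustrative only. It assumes a scaled dot-product attention with softmax weights, which is one common choice and is not stated in the claim. The names `weighted_attention`, `phrase_feature`, and `region_features` are hypothetical. It shows how a feature vector for the target phrase could be used to weight feature vectors of candidate image areas, yielding an attended (post-processed) image feature.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_attention(
    query: Sequence[float], regions: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Assumed form of a weighted attention function (not from the claim).

    Scores each region feature against the query feature, converts the
    scores to weights with softmax, and returns the weights together with
    the weighted sum of the region features.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, region) / scale for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    attended = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return weights, attended


# Hypothetical toy data: one target-phrase feature and three image-area features.
phrase_feature = [1.0, 0.0]
region_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, attended = weighted_attention(phrase_feature, region_features)
```

In this toy data the first image area points in the same direction as the phrase feature, so it receives the largest weight. In a retraining step of the kind the claim describes, the user-marked target image area could supply the supervision signal for such weights, though the claim itself leaves that mechanism open.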