US 12,127,726 B2
System and method for robust image-query understanding based on contextual features
Yu Wang, Bellevue, WA (US); Yilin Shen, Santa Clara, CA (US); and Hongxia Jin, San Jose, CA (US)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Apr. 15, 2021, as Appl. No. 17/231,958.
Claims priority of provisional application 63/017,887, filed on Apr. 30, 2020.
Prior Publication US 2021/0342624 A1, Nov. 4, 2021
Int. Cl. G06V 10/70 (2022.01); A47L 9/28 (2006.01); A47L 11/40 (2006.01); G06F 18/214 (2023.01); G06V 20/10 (2022.01)

CPC A47L 9/2805 (2013.01) [A47L 11/40 (2013.01); G06F 18/214 (2023.01); G06V 10/768 (2022.01); G06V 20/10 (2022.01); A47L 2201/06 (2013.01)]

25 Claims

1. A method comprising:

obtaining, using at least one processor of a robot, an image-query understanding model;

obtaining, using the at least one processor, an image and a user query associated with the image, wherein the image comprises a target image area and the user query comprises a target phrase, and wherein the target image area is marked on the image by a user and the target phrase is identified within the user query by the user during operation of the robot; and

retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model;

wherein retraining the image-query understanding model comprises determining (i) one or more query-level contextual features, (ii) one or more question features, (iii) one or more image-level contextual features, and (iv) one or more post-processed image features based on at least one weighted attention function.
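
For illustration only, the pairing of a user-marked target image area with a user-identified target phrase, and a weighted attention function over image features, can be sketched as follows. The claim does not specify the form of the attention function or any data layout. The scaled dot-product attention, the AnnotatedSample record, and every field and function name below are assumptions introduced for this sketch. They are not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedSample:
    """Hypothetical record of one user correction: a marked area plus a phrase."""
    box: Tuple[int, int, int, int]   # assumed (x0, y0, x1, y1) of the target image area
    query: str                       # full user query
    phrase_span: Tuple[int, int]     # assumed character offsets of the target phrase

    @property
    def target_phrase(self) -> str:
        start, end = self.phrase_span
        return self.query[start:end]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_attention(query_vec: List[float],
                       region_feats: List[List[float]]):
    """Scaled dot-product attention of one query vector over region features.

    This is an assumed stand-in for the claimed "weighted attention function".
    Returns (weights, attended), where `attended` is the weight-averaged
    region feature vector.
    """
    scale = math.sqrt(len(query_vec))
    scores = [sum(q * r for q, r in zip(query_vec, feat)) / scale
              for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    attended = [sum(w * feat[i] for w, feat in zip(weights, region_feats))
                for i in range(dim)]
    return weights, attended
```

In this sketch, a sample such as AnnotatedSample(box=(10, 20, 110, 140), query="what is under the red chair", phrase_span=(18, 27)) exposes "red chair" as its target phrase, and weighted_attention assigns a larger weight to the image regions whose features align more closely with the query vector. How such weights would enter the retraining of the image-query understanding model is left open here, as the claim text does not describe it.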