US 12,482,253 B2
Using grounded rationales to improve visual reasoning
Apratim Bhattacharyya, San Diego, CA (US); Roland Memisevic, Toronto (CA); Sunny Praful Kumar Panchal, Toronto (CA); Reza Pourreza, San Diego, CA (US); Mingu Lee, San Diego, CA (US); and Pulkit Madan, Toronto (CA)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Nov. 2, 2023, as Appl. No. 18/500,986.
Claims priority of provisional application 63/467,159, filed on May 17, 2023.
Prior Publication US 2024/0386712 A1, Nov. 21, 2024
Int. Cl. G06V 10/82 (2022.01); G06F 40/10 (2020.01); G06F 40/284 (2020.01)

CPC G06V 10/82 (2022.01) [G06F 40/10 (2020.01); G06F 40/284 (2020.01)]

30 Claims

9. A processor-implemented method performed by at least one processor, the processor-implemented method comprising:

receiving, by a first artificial neural network (ANN), an interleaved sequence of images and textual information;

extracting, by the first ANN, grid features of the images of the interleaved sequence of the images and the textual information to generate a representation of the interleaved sequence of the images and the textual information based on the grid features;

mapping, by a second ANN, the grid features to a textual domain;

extracting, by the second ANN, visual information of the interleaved sequence of the images and the textual information based on the grid features in the textual domain; and

determining, by the second ANN, a rationale based on the visual information of the interleaved sequence of images and the textual information, the visual information comprising one or more lower-level surrogate tasks.
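
For illustration only, the data flow recited in claim 9 can be sketched in Python with NumPy. This is a toy under stated assumptions, not the claimed implementation: the grid size, feature widths, random weights, and the helper names extract_grid_features, map_to_textual_domain, embed_text, and encode_interleaved are all hypothetical. A patch projection stands in for the first ANN and a linear map stands in for the second ANN's mapping step. The rationale-determining step is omitted because the claim does not specify how it is computed.

```python
import numpy as np

# All sizes, names, and weights here are hypothetical, chosen only for illustration.
GRID = 4        # each image is divided into a GRID x GRID grid of cells
FEAT_DIM = 8    # width of one grid feature
TEXT_DIM = 16   # width of the textual (token-embedding) domain

rng = np.random.default_rng(0)


def extract_grid_features(image, w_enc):
    """Stand-in for the first ANN: one feature vector per grid cell."""
    h, w, c = image.shape
    ph, pw = h // GRID, w // GRID
    # Cut the image into GRID*GRID cells and flatten each cell.
    cells = image[: ph * GRID, : pw * GRID].reshape(GRID, ph, GRID, pw, c)
    cells = cells.transpose(0, 2, 1, 3, 4).reshape(GRID * GRID, ph * pw * c)
    return np.tanh(cells @ w_enc)  # shape (GRID*GRID, FEAT_DIM)


def map_to_textual_domain(grid_feats, w_map):
    """Stand-in for the second ANN's mapping step: a linear projection."""
    return grid_feats @ w_map  # shape (GRID*GRID, TEXT_DIM)


def embed_text(text, vocab):
    """Toy text embedding: one random vector per distinct whitespace token."""
    rows = [vocab.setdefault(tok, rng.standard_normal(TEXT_DIM))
            for tok in text.split()]
    return np.stack(rows)


def encode_interleaved(items, w_enc, w_map, vocab):
    """Turn an interleaved image/text sequence into one textual-domain sequence."""
    parts = []
    for kind, value in items:
        if kind == "image":
            feats = extract_grid_features(value, w_enc)
            parts.append(map_to_textual_domain(feats, w_map))
        else:
            parts.append(embed_text(value, vocab))
    return np.concatenate(parts, axis=0)


image_a = rng.random((32, 32, 3))
image_b = rng.random((32, 32, 3))
w_enc = rng.standard_normal((8 * 8 * 3, FEAT_DIM)) * 0.1
w_map = rng.standard_normal((FEAT_DIM, TEXT_DIM)) * 0.1

items = [("text", "what happens next"), ("image", image_a),
         ("text", "then"), ("image", image_b)]
sequence = encode_interleaved(items, w_enc, w_map, {})
print(sequence.shape)  # (36, 16): 3 + 16 + 1 + 16 rows, each TEXT_DIM wide
```

In this sketch each image contributes GRID*GRID rows, placed in order between the text tokens, so a downstream language model could in principle attend over image cells and words in one sequence.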