US 12,293,577 B2
Systems and methods for image processing using natural language
Seunghyun Yoon, San Jose, CA (US); Trung Huu Bui, San Jose, CA (US); Franck Dernoncourt, San Jose, CA (US); Hyounghun Kim, Chapel Hill, NC (US); and Doo Soon Kim, San Jose, CA (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Feb. 18, 2022, as Appl. No. 17/651,771.
Prior Publication US 2023/0267726 A1, Aug. 24, 2023
Int. Cl. G06V 10/86 (2022.01); G06F 40/284 (2020.01); G06N 3/044 (2023.01); G06N 3/088 (2023.01); G06V 10/77 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01)
CPC G06V 10/86 (2022.01) [G06F 40/284 (2020.01); G06N 3/044 (2023.01); G06N 3/088 (2013.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01)]
20 Claims
 
1. A non-transitory computer-readable medium comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising:
receiving, via a user interface, an utterance indicating a request associated with an image;
generating an utterance feature vector based on utterance features extracted from the utterance;
accessing the image corresponding to the utterance;
generating a visual feature vector by extracting bounding box features and visual features from the image, combining the bounding box features and the visual features, and applying positional encoding to the combined bounding box features and visual features;
generating a concept feature vector based on concept features extracted from the image;
generating a first fused feature vector based on aligning the utterance feature vector and the visual feature vector;
generating a second fused feature vector based on aligning the first fused feature vector and a current command feature vector; and
generating a segment of a predicted executable command corresponding to the request associated with the image based on the second fused feature vector, the current command feature vector, the utterance feature vector, and the concept feature vector.
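
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a positional-encoding scheme, an alignment mechanism, or a decoder, so the sketch assumes sinusoidal encoding, concatenation as the "combining" step, and plain dot-product attention as the "aligning" step. All function names, dimensions, and input values are hypothetical, and the final step only assembles the four inputs that the claim says the command segment is based on; a trained decoder, not shown, would map them to the next command segment.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding (assumed; the claim does not name a scheme)."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def visual_feature_vectors(bbox_feats, visual_feats):
    """Concatenate per-region box and visual features, then add positions."""
    combined = [b + v for b, v in zip(bbox_feats, visual_feats)]
    encoded = []
    for pos, row in enumerate(combined):
        pe = positional_encoding(pos, len(row))
        encoded.append([x + p for x, p in zip(row, pe)])
    return encoded

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def align(queries, keys):
    """Dot-product attention (assumed): each query becomes a weighted mix of keys."""
    fused = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        fused.append([
            sum(w * k[j] for w, k in zip(weights, keys))
            for j in range(len(keys[0]))
        ])
    return fused

def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

# Toy inputs; every value and dimension here is hypothetical.
utterance = [[0.1, 0.3, 0.0, 0.2], [0.4, 0.1, 0.2, 0.0]]  # 2 utterance tokens
bbox_feats = [[0.0, 0.5], [0.5, 1.0], [0.2, 0.2]]         # 3 image regions
visual_feats = [[0.9, 0.1], [0.2, 0.7], [0.3, 0.3]]
concept = [[0.3, 0.3, 0.1, 0.0]]                          # 1 concept feature
command = [[0.2, 0.0, 0.1, 0.5]]                          # command generated so far

visual = visual_feature_vectors(bbox_feats, visual_feats)
first_fused = align(utterance, visual)       # utterance aligned with image regions
second_fused = align(command, first_fused)   # current command aligned with first fusion

# Inputs a decoder would consume to predict the next command segment.
decoder_input = (
    mean_pool(second_fused) + mean_pool(command)
    + mean_pool(utterance) + mean_pool(concept)
)
```

In this sketch the utterance tokens act as queries over the image regions, and the current command then acts as the query over that first fusion. The claim fixes neither direction, so reversing the roles would be an equally valid reading.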