US 12,437,160 B2
Image-to-text large language models (LLM)
Aliaksei Mikhailiuk, London (GB); Tianxiang Gao, Bothell, WA (US); Sergey Smetanin, London (GB); Pavel Savchenkov, London (GB); Hee Hun Kim, Los Angeles, CA (US); Neha Yadav, Seattle, WA (US); and Bingqian Lu, Ontario, CA (US)
Assigned to Snap Inc., Santa Monica, CA (US)
Filed by Snap Inc., Santa Monica, CA (US)
Filed on Sep. 6, 2023, as Appl. No. 18/462,255.
Prior Publication US 2025/0077794 A1, Mar. 6, 2025
Int. Cl. G06F 40/40 (2020.01); G06F 3/0481 (2022.01); G06F 3/04845 (2022.01); G06V 10/774 (2022.01); G06V 20/40 (2022.01); G06V 20/50 (2022.01)

CPC G06F 40/40 (2020.01) [G06F 3/0481 (2013.01); G06F 3/04845 (2013.01); G06V 10/774 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/50 (2022.01)]

20 Claims

1. A system comprising:

at least one processor; and

at least one memory component storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:

determining participation in an interaction function by a first user of an interaction system;

identifying an image associated with the participation;

processing data associated with the image using a first machine learning model to identify one or more features within the image;

generating a prompt based on the identified one or more features;

identifying one or more instructions for a second machine learning model;

processing data associated with a combination of the prompt and the identified one or more instructions using the second machine learning model to generate a textual response to the image, wherein the second machine learning model comprises a Large Language Model (LLM); and

causing display of the textual response within the interaction function to the first user.
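Claim 1 recites a two-model pipeline: a first machine learning model identifies features in an image tied to a user's participation; a prompt is built from those features; the prompt is combined with instructions; and the combination goes to an LLM whose textual response is displayed. The Python sketch below illustrates that data flow under stated assumptions. It is not the patentee's implementation. The names `Participation`, `build_prompt`, `respond_to_image`, the prompt template, the newline-joined combination of instructions and prompt, and the stub "models" in the usage section are all hypothetical. The claim does not specify any of them. Both models are treated as plain callables so that real detectors or LLM clients could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: the claim names no concrete models, so each
# model is represented as a plain callable.
FeatureExtractor = Callable[[bytes], List[str]]  # "first machine learning model"
TextGenerator = Callable[[str], str]             # "second machine learning model" (an LLM)


@dataclass
class Participation:
    """Hypothetical record of a user's participation in an interaction function."""
    user_id: str
    image: bytes  # the image associated with the participation


def build_prompt(features: List[str]) -> str:
    """Generate a prompt from the identified features (assumed template)."""
    return "The image contains: " + ", ".join(features) + "."


def respond_to_image(
    participation: Participation,
    extract_features: FeatureExtractor,
    instructions: List[str],
    llm: TextGenerator,
) -> str:
    # Identify features within the image using the first model.
    features = extract_features(participation.image)
    # Generate a prompt based on those features.
    prompt = build_prompt(features)
    # Combine the instructions with the prompt (assumed: newline-joined).
    llm_input = "\n".join(instructions) + "\n\n" + prompt
    # Generate the textual response with the LLM; displaying it to the
    # user is left to the caller.
    return llm(llm_input)


# Usage with stub models standing in for real ones (hypothetical).
def fake_detector(image: bytes) -> List[str]:
    return ["a dog", "a beach"]


def fake_llm(text: str) -> str:
    # Echo the final line (the prompt) so the data flow is visible.
    return "Reply to: " + text.splitlines()[-1]


participation = Participation(user_id="u1", image=b"<image bytes>")
reply = respond_to_image(
    participation, fake_detector, ["Reply in one short, friendly sentence."], fake_llm
)
print(reply)  # Reply to: The image contains: a dog, a beach.
```

Keeping the feature extractor and the LLM behind callable interfaces mirrors the claim's separation of the "first" and "second" machine learning models. Either one can be replaced without touching the prompt-construction step.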