| CPC G06F 18/256 (2023.01) [G06F 16/24556 (2019.01); G06F 16/24578 (2019.01); G06F 16/248 (2019.01); G06F 16/9032 (2019.01); G06F 16/9038 (2019.01); G06F 18/21355 (2023.01); G06F 18/24147 (2023.01); G06N 20/00 (2019.01)] | 21 Claims |

1. A data processing system comprising:
a processor; and
a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform:
receiving, via a query representation model, a search query for searching for one or more multimodal assets from among a plurality of candidate multimodal assets, wherein the one or more multimodal assets and the search query each include multimodal content containing two or more different types of content including graphic or image content;
parsing, via the query representation model, the search query including the multimodal content;
identifying, based on the parsing, a first content type and a second content type in the search query, the second content type being a graphic or image content type;
transmitting the first content type to a first representation model to generate a first set of vector embeddings;
transmitting the second content type to a second representation model to generate a second set of vector embeddings;
transmitting the first and second sets of vector embeddings to a tensor generation unit to generate tensors based on the first and second sets of vector embeddings and to output a query tensor representation;
comparing, via a matching unit, the query tensor representation to a plurality of multimodal tensor representations, each of the plurality of multimodal tensor representations being a representation of one of the plurality of candidate multimodal assets;
identifying, based on the comparing, at least one of the plurality of candidate multimodal assets as a search result for the search query; and
providing the at least one of the plurality of candidate multimodal assets for display as the search result.
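The claim recites the pipeline (parse the query, embed each content type with its own representation model, combine the embeddings into a query tensor, match against precomputed candidate tensors, return the best matches) but does not say how any of those units work. The sketch below is a minimal illustration under loudly stated assumptions, not the claimed implementation. A hashed bag-of-words encoder and a few hand-picked pixel statistics stand in for trained text and image representation models. The "tensor" is two stacked unit-norm rows, one per modality. Matching is cosine similarity averaged over the two rows with equal weight. All names (`MultimodalQuery`, `embed_text`, `embed_image`, `to_tensor`, `similarity`, `search`, `DIM`) are hypothetical.

```python
import math
from dataclasses import dataclass

DIM = 4  # width of each per-modality embedding (illustrative choice)


@dataclass
class MultimodalQuery:
    """Hypothetical query holding a text part and a graphic/image part."""
    text: str
    image: list[list[float]]  # grayscale pixel grid, values in [0, 1]


def embed_text(text: str) -> list[float]:
    """Stand-in for the first representation model: hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # Deterministic bucket (built-in hash() of str is salted per process).
        vec[sum(map(ord, token)) % DIM] += 1.0
    return vec


def embed_image(image: list[list[float]]) -> list[float]:
    """Stand-in for the second representation model: coarse pixel statistics."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((p - mean) ** 2 for p in flat) / len(flat))
    top = sum(image[0]) / len(image[0])       # mean brightness of first row
    bottom = sum(image[-1]) / len(image[-1])  # mean brightness of last row
    return [mean, std, top, bottom]


def normalize(vec: list[float]) -> list[float]:
    """Scale to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def to_tensor(text_vec: list[float], image_vec: list[float]) -> list[list[float]]:
    """Tensor generation: stack one unit-norm row per modality (2 x DIM)."""
    return [normalize(text_vec), normalize(image_vec)]


def represent(query: MultimodalQuery) -> list[list[float]]:
    """Run both representation models, then build the query tensor."""
    return to_tensor(embed_text(query.text), embed_image(query.image))


def similarity(a: list[list[float]], b: list[list[float]]) -> float:
    """Matching: per-row cosine (rows are unit-norm), averaged with equal weight."""
    cosines = [sum(x * y for x, y in zip(ra, rb)) for ra, rb in zip(a, b)]
    return sum(cosines) / len(cosines)


def search(query: MultimodalQuery, catalog: dict[str, list[list[float]]],
           top_k: int = 1) -> list[str]:
    """Rank candidate assets by similarity to the query tensor; keep the best."""
    q = represent(query)
    ranked = sorted(catalog, key=lambda asset_id: similarity(q, catalog[asset_id]),
                    reverse=True)
    return ranked[:top_k]


# Candidate assets are indexed with the same pipeline used for queries.
catalog = {
    "sunset-poster": represent(
        MultimodalQuery("orange sunset poster", [[0.9, 0.9], [0.2, 0.2]])),
    "night-flyer": represent(
        MultimodalQuery("dark night flyer", [[0.1, 0.1], [0.1, 0.1]])),
}
query = MultimodalQuery("sunset poster", [[0.8, 0.9], [0.3, 0.2]])
print(search(query, catalog))  # → ['sunset-poster']
```

Weighting both modalities equally is only the simplest fusion choice; a practical system might learn the fusion, use trained encoders, and replace the linear scan in `search` with an approximate nearest-neighbor index over the stored candidate tensors.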