CPC H04L 51/10 (2013.01) [G06N 3/08 (2013.01); G06V 10/40 (2022.01); G06V 10/82 (2022.01); G06V 30/19147 (2022.01); G06V 30/19173 (2022.01); H04L 67/10 (2013.01)] | 16 Claims |
1. A method comprising:
identifying a multimodal message comprising an image and a string, the string comprising one or more words;
generating, using an entity neural network, an indication that at least one of the one or more words is a named entity, the entity neural network comprising an attention neural network trained to increase emphasis on one of a plurality of embeddings based on relevance to the multimodal message, the plurality of embeddings comprising an image embedding from the image and a string embedding corresponding to the string in the multimodal message;
storing, using one or more processors of a machine, the named entity as being associated with the multimodal message; and
generating a combined embedding from the image embedding and the string embedding using the attention neural network,
wherein the entity neural network comprises a classification neural network that processes the combined embedding generated by the attention neural network.
|
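The data flow recited in claim 1 can be illustrated with a minimal sketch: an attention step weights the image embedding and the string embedding, the weighted embeddings are merged into a combined embedding, and a classifier processes that combined embedding to indicate whether a word is a named entity. The claim does not specify how the attention weights or the combined embedding are computed. The softmax weighting, the weighted sum, the linear classifier, and every name below (`modality_attention`, `classify`, `tag_message`, `score_weights`, `class_weights`, the `ENTITY`/`O` labels) are assumptions made for illustration only, and the fixed numbers stand in for trained parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def modality_attention(image_emb, string_emb, score_weights):
    """Return (attention weights, combined embedding).

    Assumption: one relevance score per embedding, normalized with softmax,
    then a weighted sum. score_weights is a placeholder for trained parameters.
    """
    scores = [dot(score_weights, image_emb), dot(score_weights, string_emb)]
    weights = softmax(scores)
    combined = [weights[0] * i + weights[1] * s
                for i, s in zip(image_emb, string_emb)]
    return weights, combined


def classify(combined, class_weights, labels):
    """Hypothetical linear classifier applied to the combined embedding."""
    probs = softmax([dot(row, combined) for row in class_weights])
    best = max(range(len(labels)), key=lambda k: probs[k])
    return labels[best], probs


def tag_message(image_emb, word_embs, score_weights, class_weights, labels):
    """Label each (word, embedding) pair of the string in a multimodal message."""
    tags = []
    for word, emb in word_embs:
        _, combined = modality_attention(image_emb, emb, score_weights)
        label, _ = classify(combined, class_weights, labels)
        tags.append((word, label))
    return tags


# Toy inputs: 2-dimensional embeddings and hand-picked placeholder parameters.
image_emb = [1.0, 0.0]
word_embs = [("Paris", [0.0, 1.0])]
score_weights = [2.0, 0.0]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
labels = ["ENTITY", "O"]

weights, combined = modality_attention(image_emb, word_embs[0][1], score_weights)
tags = tag_message(image_emb, word_embs, score_weights, class_weights, labels)

# Storing step: associate the detected named entities with the message.
entity_store = {"message-1": [w for w, label in tags if label == "ENTITY"]}
```

In this toy setup the placeholder scoring parameters favor the image embedding, so the combined embedding leans toward the image and the classifier labels the word as an entity. A trained network would learn these parameters rather than use fixed values.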