US 11,687,720 B2
Named entity recognition visual context and caption data
Di Lu, Troy, NY (US); Leonardo Ribas Machado das Neves, Marina Del Rey, CA (US); Vitor Rocha de Carvalho, San Diego, CA (US); and Ning Zhang, Los Angeles, CA (US)
Assigned to Snap Inc., Santa Monica, CA (US)
Filed by Snap Inc., Santa Monica, CA (US)
Filed on May 3, 2021, as Appl. No. 17/306,010.
Application 17/306,010 is a continuation of application No. 16/230,341, filed on Dec. 21, 2018, granted, now Pat. No. 11,017,173.
Claims priority of provisional application 62/610,051, filed on Dec. 22, 2017.
Prior Publication US 2021/0256213 A1, Aug. 19, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/295 (2020.01); G06N 20/00 (2019.01); G06N 3/08 (2023.01); G06F 40/30 (2020.01)
CPC G06F 40/295 (2020.01) [G06F 40/30 (2020.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
identifying, using one or more processors of a machine, a multimodal message that includes an image and a caption comprising words;
generating, using an attention neural network, a visual context vector from the caption and the image, the visual context vector emphasizing portions of the caption based on objects depicted in the image;
generating, using an entity recognition neural network, an indication that one or more words of the caption correspond to a named entity;
integrating, using a modulation layer, the visual context vector into the entity recognition neural network for each word in the caption; and
storing the one or more words as the named entity of the multimodal message.
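The following is a minimal PyTorch sketch of the flow recited in claim 1, for illustration only. The module names (VisualAttention, ModulatedTagger), the stand-in word embeddings and image region features, the use of a simple multiplicative gate as the modulation layer, and the BIO-style tag set are all assumptions made for this sketch; the claim does not specify these architectural details.

# Illustrative sketch of the claim 1 flow: attention over caption words
# conditioned on image objects produces a visual context vector, which a
# modulation layer integrates into a per-word entity recognition network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualAttention(nn.Module):
    """Builds a visual context vector that emphasizes portions of the
    caption based on object (region) features from the image."""

    def __init__(self, word_dim, region_dim, hidden_dim):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, hidden_dim)
        self.region_proj = nn.Linear(region_dim, hidden_dim)

    def forward(self, word_embs, region_feats):
        # word_embs: (seq_len, word_dim); region_feats: (num_regions, region_dim)
        w = self.word_proj(word_embs)                        # (seq_len, hidden)
        r = self.region_proj(region_feats)                   # (num_regions, hidden)
        scores = w @ r.t()                                   # word-region affinities
        attn = F.softmax(scores.max(dim=1).values, dim=0)    # per-word weights
        # Attention-weighted sum of word projections -> one visual context vector.
        return (attn.unsqueeze(1) * w).sum(dim=0)            # (hidden,)


class ModulatedTagger(nn.Module):
    """Entity recognition network whose per-word states are modulated by
    the visual context vector before tag prediction."""

    def __init__(self, word_dim, hidden_dim, num_tags):
        super().__init__()
        self.encoder = nn.LSTM(word_dim, hidden_dim, bidirectional=True)
        # Modulation layer (assumed form): a gate computed from the visual
        # context vector, applied to every word representation.
        self.gate = nn.Linear(hidden_dim, 2 * hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_embs, visual_context):
        h, _ = self.encoder(word_embs.unsqueeze(1))          # (seq_len, 1, 2*hidden)
        h = h.squeeze(1)
        gate = torch.sigmoid(self.gate(visual_context))      # (2*hidden,)
        h = h * gate                                         # integrate per word
        return self.classifier(h)                            # (seq_len, num_tags)


# Illustrative run: a 6-word caption and 4 detected image regions.
TAGS = ["O", "B-ENT", "I-ENT"]
words = ["heading", "to", "snap", "hq", "in", "venice"]
word_embs = torch.randn(len(words), 100)     # stand-in word embeddings
region_feats = torch.randn(4, 256)           # stand-in object features

attention = VisualAttention(word_dim=100, region_dim=256, hidden_dim=128)
tagger = ModulatedTagger(word_dim=100, hidden_dim=128, num_tags=len(TAGS))

visual_context = attention(word_embs, region_feats)
tag_scores = tagger(word_embs, visual_context)
predicted = [TAGS[i.item()] for i in tag_scores.argmax(dim=1)]

# "Store" the words whose tags mark them as part of a named entity.
named_entity_words = [w for w, t in zip(words, predicted) if t != "O"]
print(list(zip(words, predicted)), named_entity_words)

The multiplicative gate used here is only one possible form of the claimed modulation layer; the claim leaves open how the visual context vector is integrated into the entity recognition network for each word.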