US 11,853,399 B2
Multimodal sentiment classification
Jianfei Yu, Los Angeles, CA (US); Luis Carlos Dos Santos Marujo, Culver City, CA (US); Venkata Satya Pradeep Karuturi, Marina del Rey, CA (US); Leonardo Ribas Machado das Neves, Marina Del Rey, CA (US); Ning Xu, Irvine, CA (US); and William Brendel, Los Angeles, CA (US)
Assigned to Snap Inc., Santa Monica, CA (US)
Filed by Snap Inc., Santa Monica, CA (US)
Filed on Nov. 29, 2022, as Appl. No. 18/059,928.
Application 18/059,928 is a continuation of application No. 16/552,393, filed on Aug. 27, 2019, granted, now 11,551,042.
Claims priority of provisional application 62/723,412, filed on Aug. 27, 2018.
Prior Publication US 2023/0120887 A1, Apr. 20, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/30 (2020.01); G06F 18/2431 (2023.01); G06N 3/08 (2023.01); G06N 20/20 (2019.01); G06F 40/284 (2020.01); G06N 3/045 (2023.01); G10L 25/30 (2013.01); G06F 40/295 (2020.01); G06F 40/279 (2020.01); G06F 40/289 (2020.01); G10L 15/18 (2013.01); G10L 15/06 (2013.01)

CPC G06F 18/2431 (2023.01) [G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 20/20 (2019.01); G06F 40/279 (2020.01); G06F 40/289 (2020.01); G06F 40/295 (2020.01); G10L 15/06 (2013.01); G10L 15/1807 (2013.01); G10L 15/1815 (2013.01); G10L 15/1822 (2013.01); G10L 25/30 (2013.01)]

20 Claims

1. A method comprising:

identifying a multimodal message comprising an image and a plurality of terms, the plurality of terms comprising an entity term and non-entity terms;

generating, using a convolutional neural network (CNN), an image representation from the image in the multimodal message;

generating, using a first recurrent neural network, a left representation from terms to the left of the entity term;

generating, using a second recurrent neural network, a target entity representation from the entity term;

generating, using a third recurrent neural network, a right representation from terms to the right of the entity term; and

generating a sentiment classification by combining the image representation, the left representation, the target entity representation, and the right representation.
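The data flow recited in claim 1 can be illustrated with a minimal, untrained sketch in plain Python. Everything below is an assumption made for illustration and is not taken from the patent: the helper names (`split_around_entity`, `embed`, `TinyRNN`, `conv_features`, `classify`), the three-label set, the random weights, the single convolution layer with global max pooling standing in for the CNN, the simple tanh recurrences standing in for the three recurrent networks, and concatenation followed by a softmax layer as the way of "combining" the four representations.

```python
import math
import random

LABELS = ["negative", "neutral", "positive"]  # assumed label set


def split_around_entity(terms, start, end):
    """Split terms into left context, entity term(s), and right context."""
    return terms[:start], terms[start:end], terms[end:]


def embed(term, dim=4):
    """Hypothetical stand-in for a learned embedding: fixed pseudo-random vector per term."""
    rng = random.Random(term)  # seeding with the string keeps the vector reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def random_matrix(rows, cols, rng):
    """Untrained weights, used only so the sketch runs end to end."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyRNN:
    """Simple recurrence h_t = tanh(W_x x_t + W_h h_{t-1}); returns the last state."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.w_x = random_matrix(hid_dim, in_dim, rng)
        self.w_h = random_matrix(hid_dim, hid_dim, rng)

    def encode(self, vectors):
        h = [0.0] * self.hid_dim  # an empty context yields the zero vector
        for x in vectors:
            h = [
                math.tanh(
                    sum(w * v for w, v in zip(self.w_x[j], x))
                    + sum(w * v for w, v in zip(self.w_h[j], h))
                )
                for j in range(self.hid_dim)
            ]
        return h


def conv_features(image, kernels):
    """One valid 2-D convolution per kernel, ReLU, then global max pooling."""
    feats = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        best = 0.0  # ReLU floor: the pooled activation is never negative
        for i in range(len(image) - kh + 1):
            for j in range(len(image[0]) - kw + 1):
                s = sum(k[a][b] * image[i + a][j + b]
                        for a in range(kh) for b in range(kw))
                best = max(best, s)
        feats.append(best)
    return feats


def classify(terms, entity_span, image, seed=0, emb_dim=4, hid_dim=3, n_kernels=2):
    """Return (label, probabilities) for the entity at terms[start:end]."""
    rng = random.Random(seed)
    kernels = [random_matrix(2, 2, rng) for _ in range(n_kernels)]
    # Three separate recurrent networks, one per text segment.
    left_rnn, entity_rnn, right_rnn = (TinyRNN(emb_dim, hid_dim, rng) for _ in range(3))
    left, entity, right = split_around_entity(terms, *entity_span)
    # Assumed combination step: concatenate the four representations.
    features = (
        conv_features(image, kernels)
        + left_rnn.encode([embed(t, emb_dim) for t in left])
        + entity_rnn.encode([embed(t, emb_dim) for t in entity])
        + right_rnn.encode([embed(t, emb_dim) for t in right])
    )
    w_out = random_matrix(len(LABELS), len(features), rng)
    logits = [sum(w * f for w, f in zip(row, features)) for row in w_out]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs


# Hypothetical message: "Acme" is the entity term at index 3; the image is a 3x3 grayscale grid.
terms = ["loving", "the", "new", "Acme", "phone", "so", "far"]
image = [[0.0, 0.2, 0.4], [0.1, 0.5, 0.9], [0.3, 0.6, 1.0]]
label, probs = classify(terms, (3, 4), image)
print(label, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label carries no meaning. The sketch only shows how the image and the three text segments might each produce a fixed-length vector before a single classification step.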