| CPC G10L 25/84 (2013.01) [G06F 3/167 (2013.01); G06N 3/08 (2013.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01); G10L 2015/226 (2013.01)] | 20 Claims |

1. A method comprising:
receiving, at a server, a first message indicating an image and content data associated with a non-verbal audio interaction;
generating, at the server, a second message comprising the image, the content data associated with the non-verbal audio interaction, and an indication to perform the non-verbal audio interaction; and
sending the second message to a user device, the user device being configured to:
display the image and the indication on a display of the user device;
generate sound data from a microphone of the user device while the second message is being displayed on the user device;
generate a sound classification by applying, to the sound data, a convolutional neural network that is trained to detect non-verbal sounds;
determine, using the sound classification, that the sound data corresponds to the non-verbal audio interaction; and
display the content data on the display in response to determining that the sound data corresponds to the non-verbal audio interaction.
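The claimed server/device message flow can be illustrated with a minimal sketch. All names here (`Message`, `build_second_message`, `handle_on_device`, `toy_classifier`) are hypothetical, and a trivial byte-matching stub stands in for the claimed convolutional neural network; this is not the patented implementation, only a schematic of the flow the claim recites.

```python
# Hypothetical sketch of the claim-1 flow; names and classifier are illustrative only.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    image: str         # image payload (e.g., a URL or encoded bytes)
    content_data: str  # content revealed after the non-verbal interaction
    indication: str    # prompt telling the user which sound to make


def build_second_message(image: str, content_data: str, target_sound: str) -> Message:
    """Server side: wrap the first message's image and content data with an
    indication to perform the non-verbal audio interaction (e.g., a whistle)."""
    return Message(image, content_data, f"Make this sound to reveal: {target_sound}")


def handle_on_device(msg: Message,
                     sound_data: bytes,
                     classify: Callable[[bytes], str],
                     target_sound: str) -> Optional[str]:
    """Device side: classify microphone data; if the sound classification
    matches the expected non-verbal sound, return the content to display."""
    label = classify(sound_data)  # stand-in for CNN inference on the device
    if label == target_sound:
        return msg.content_data   # would be rendered on the device's display
    return None


def toy_classifier(sound_data: bytes) -> str:
    """Trivial stub in place of a CNN trained to detect non-verbal sounds."""
    return "whistle" if sound_data == b"whistle-audio" else "other"


msg = build_second_message("cat.png", "Secret caption", "whistle")
print(handle_on_device(msg, b"whistle-audio", toy_classifier, "whistle"))  # Secret caption
print(handle_on_device(msg, b"speech-audio", toy_classifier, "whistle"))   # None
```

The sketch keeps the claim's division of labor: the server only assembles the second message, while classification and the decision to display the content data happen entirely on the user device.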