US 11,748,057 B2
System and method for personalization in intelligent multi-modal personal assistants
Caleb Ryan Phillips, Toronto (CA)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Oct. 23, 2020, as Appl. No. 17/79,111.
Claims priority of provisional application 62/981,850, filed on Feb. 26, 2020.
Prior Publication US 2021/0264134 A1, Aug. 26, 2021
Int. Cl. G06V 40/00 (2022.01); G06V 40/16 (2022.01); G10L 21/028 (2013.01); G10L 21/0308 (2013.01); G10L 25/30 (2013.01); G06F 3/16 (2006.01); G06F 16/2458 (2019.01); G06F 16/248 (2019.01); G06N 20/00 (2019.01); G10L 17/22 (2013.01); G06V 10/80 (2022.01); G06V 20/10 (2022.01)
CPC G06F 3/167 (2013.01) [G06N 20/00 (2019.01); G06V 10/80 (2022.01); G06V 20/10 (2022.01); G06V 40/172 (2022.01); G10L 17/22 (2013.01)]
20 Claims
1. A method, by a user device, comprising:
receiving an input from a user;
obtaining relation information of the user, audio information of the user obtained via a microphone of the user device, and video information of the user obtained via a camera of the user device;
identifying the user based on the audio information and the video information of the user and a set of facial embeddings and speech embeddings that is correlated with the user, the set of facial embeddings and speech embeddings being generated using a facial embedding model, a speech embedding model, and a sound source localization model; and
performing an action based on the input and the relation information of the user,
wherein the sound source localization model is a model that is configured to determine the video information and the audio information that belongs to a same user.
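
The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the names (`Enrolled`, `Face`, `localize`, `identify`, `act`), the use of azimuth angles to stand in for the sound source localization model, the equal-weight cosine-similarity fusion, and the thresholds. The claim does not specify how embeddings are compared or how localization is performed. The embeddings are toy three-element vectors rather than outputs of real facial or speech embedding models.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Enrolled:
    """Hypothetical enrollment record: stored embeddings plus relation information."""
    name: str
    relation: str
    face: List[float]
    voice: List[float]


@dataclass
class Face:
    """Hypothetical detected face: its embedding and its direction from the camera."""
    embedding: List[float]
    azimuth_deg: float


def localize(faces: List[Face], voice_azimuth_deg: float,
             tolerance_deg: float = 15.0) -> Optional[Face]:
    """Stand-in for sound source localization (assumed approach): pick the face
    whose direction is closest to the estimated direction of the voice, so the
    audio and video are attributed to the same person."""
    if not faces:
        return None
    best = min(faces, key=lambda f: abs(f.azimuth_deg - voice_azimuth_deg))
    if abs(best.azimuth_deg - voice_azimuth_deg) > tolerance_deg:
        return None
    return best


def identify(face_emb: List[float], voice_emb: List[float],
             enrolled: List[Enrolled], threshold: float = 0.8) -> Optional[Enrolled]:
    """Match the paired face and voice embeddings against enrolled users using
    an equal-weight average of cosine similarities (an assumed fusion rule)."""
    best, best_score = None, threshold
    for user in enrolled:
        score = 0.5 * cosine(face_emb, user.face) + 0.5 * cosine(voice_emb, user.voice)
        if score >= best_score:
            best, best_score = user, score
    return best


def act(command: str, user: Optional[Enrolled]) -> str:
    """Perform an action based on the input and the user's relation information."""
    if user is None:
        return f"{command}: generic response"
    return f"{command}: personalized for {user.name} ({user.relation})"


if __name__ == "__main__":
    enrolled = [
        Enrolled("alice", "parent", face=[1.0, 0.0, 0.0], voice=[0.0, 1.0, 0.0]),
        Enrolled("bob", "child", face=[0.0, 1.0, 0.0], voice=[1.0, 0.0, 0.0]),
    ]
    faces = [Face([0.9, 0.1, 0.0], -20.0), Face([0.1, 0.95, 0.0], 25.0)]
    speaker_face = localize(faces, voice_azimuth_deg=22.0)
    user = identify(speaker_face.embedding, [0.98, 0.05, 0.0], enrolled) if speaker_face else None
    print(act("play music", user))
```

In this sketch the localization step runs first so that the face embedding and the speech embedding passed to `identify` come from the same person, which mirrors the role the claim's final clause assigns to the sound source localization model.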