US 12,155,496 B2
Vehicle video conferencing system
Marcin Szelest, Cracow (PL); Kamil Klimowicz, Klaj (PL); Dariusz Marchewka, Cracow (PL); and Pawel Markiewicz, Cracow (PL)
Assigned to APTIV TECHNOLOGIES LIMITED, Schaffhausen (CH)
Filed by APTIV TECHNOLOGIES LIMITED, Dublin (IE)
Filed on Sep. 8, 2022, as Appl. No. 17/940,235.
Prior Publication US 2023/0093198 A1, Mar. 23, 2023
Int. Cl. H04L 12/18 (2006.01); G06V 10/82 (2022.01); G06V 20/59 (2022.01); G06V 40/16 (2022.01)
CPC H04L 12/1822 (2013.01) [G06V 10/82 (2022.01); G06V 20/59 (2022.01); G06V 40/161 (2022.01); G06V 40/171 (2022.01); G06V 40/174 (2022.01)] 10 Claims
OG exemplary drawing
 
1. A system for in-vehicle video conferencing including:
an imaging system including at least one image sensor configured to capture low-resolution images of a face of a user in the vehicle, and configured to determine data of facial characteristic points of the user by processing the captured low-resolution images;
an acquisition module for acquiring the determined data of facial characteristic points of the user in the vehicle during a video conference call in which the user participates;
a video synthesizer for producing artificial high-resolution video images of the face of the user in the vehicle from the acquired data of facial characteristic points;
a communication device for transmitting the artificial high-resolution video images of the face of the user through a communication network in the video conference call, wherein the video synthesizer includes a deepfake machine learning model for generating the artificial high-resolution video images of the face of the user from the acquired facial characteristic point data of the user, said machine learning model being preliminarily trained to learn the connection between facial characteristic points of the user acquired over time and high-resolution video images of the real face of the user;
a display unit for displaying a plurality of facial movements and requesting the user to repeat the displayed facial movements, during a learning process;
a first camera device for capturing images of the user repeating the displayed facial movements and determining facial characteristic points from the captured images;
a second camera device for simultaneously capturing facial videos of the face of the user repeating the displayed facial movements; and
a training data generator, connected to the first camera device and the second camera device, that generates a training dataset including facial characteristic points determined by the first camera device, as input training data, and corresponding facial videos determined by the second camera device, as output training data, and provides the training dataset to the machine learning model of the video synthesizer so as to fit said machine learning model to the user.
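The training data generator in the final clause pairs facial characteristic points from the first camera (input) with simultaneously captured high-resolution frames from the second camera (target output). A minimal sketch of that pairing step is shown below; the function and variable names are illustrative assumptions, not from the patent, and index-based matching assumes the two cameras are frame-synchronized.

```python
from typing import List, Sequence, Tuple

def build_training_dataset(
    landmark_stream: Sequence[Sequence[Tuple[float, float]]],
    video_stream: Sequence[object],
) -> List[Tuple[object, object]]:
    """Pair each landmark frame (model input) with the simultaneously
    captured high-resolution frame (model target).

    Pairing by index assumes the two camera devices are frame-
    synchronized; a production system would match on timestamps.
    """
    if len(landmark_stream) != len(video_stream):
        raise ValueError("camera streams must be synchronized")
    return list(zip(landmark_stream, video_stream))

# Example: three frames of characteristic points from the first
# camera paired with three ground-truth frames from the second.
landmarks = [[(0.1 * i, 0.2 * i)] for i in range(3)]
frames = [f"hires_frame_{i}" for i in range(3)]
dataset = build_training_dataset(landmarks, frames)
```

The resulting (input, output) pairs would then be fed to the video synthesizer's machine learning model to fit it to the individual user.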