CPC G06V 40/28 (2022.01) [G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06T 7/248 (2017.01); G06T 7/73 (2017.01); G06V 20/46 (2022.01); G06V 40/107 (2022.01); G06V 40/168 (2022.01); G09B 21/009 (2013.01); H04N 5/278 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30196 (2013.01)] | 20 Claims |
1. A computer-implemented method comprising:
receiving an input video comprising a representation of one or more sign language gestures;
extracting landmark coordinates associated with a signer represented in the input video;
determining derivative information from the landmark coordinates;
generating a vector representation of the signer based on the landmark coordinates and the derivative information;
processing, by a gesture detection model, the generated vector representation of the signer to identify sign language gestures;
generating sentences based on the identified sign language gestures; and
encoding the generated sentences and timestamps associated with the identified sign language gestures into a subtitle track, wherein the subtitle track is synced to the input video for display of the generated sentences at appropriate times during playback based on the timestamps.
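The pipeline recited in claim 1 (landmark extraction, derivative computation, vectorization, gesture detection, sentence generation, and subtitle encoding) can be illustrated with a minimal sketch. All function names, the stub detector, and the SRT output format below are illustrative assumptions, not the claimed implementation:

```python
# Hedged sketch of the claim-1 pipeline. The stub functions and SRT
# format are illustrative assumptions, not the patented method.

def extract_landmarks(video_frames):
    # Stand-in for a landmark extractor; here each "frame" is already
    # a list of (x, y) landmark coordinates for the signer.
    return video_frames

def derivatives(landmarks):
    # Frame-to-frame differences approximate landmark velocities
    # (the "derivative information" of the claim).
    out = []
    for prev, cur in zip(landmarks, landmarks[1:]):
        out.append([(cx - px, cy - py)
                    for (px, py), (cx, cy) in zip(prev, cur)])
    return out

def vectorize(landmarks, deltas):
    # Concatenate positions and velocities into one flat feature
    # vector per frame (the claimed "vector representation").
    vectors = []
    for pts, vel in zip(landmarks[1:], deltas):
        vec = [c for p in pts for c in p] + [c for v in vel for c in v]
        vectors.append(vec)
    return vectors

def detect_gestures(vectors):
    # Placeholder for the gesture detection model (e.g. a recurrent
    # network per the cited G06N 3/044 class); returns tuples of
    # (gloss, start_frame, end_frame).
    return [("HELLO", 0, len(vectors) - 1)]

def to_srt(sentences, fps=30.0):
    # Encode sentences and frame-based timestamps as an SRT subtitle
    # track, so playback shows each sentence at the right time.
    def ts(frame):
        total = frame / fps
        h, rem = divmod(total, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int((s % 1) * 1000):03d}"
    lines = []
    for i, (text, start, end) in enumerate(sentences, 1):
        lines.append(f"{i}\n{ts(start)} --> {ts(end)}\n{text}\n")
    return "\n".join(lines)

# Toy input: three frames, two landmarks each.
frames = [[(0.10, 0.20), (0.30, 0.40)],
          [(0.15, 0.25), (0.32, 0.41)],
          [(0.20, 0.30), (0.34, 0.42)]]
lms = extract_landmarks(frames)
vecs = vectorize(lms, derivatives(lms))
gestures = detect_gestures(vecs)
sentences = [("Hello.", start, end) for _, start, end in gestures]
print(to_srt(sentences))
```

Note that real systems would replace the stubs with a learned landmark extractor and a trained sequence model; the sketch only shows how the claimed data flows from video frames to a synchronized subtitle track.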