US 12,437,673 B2
System and method for bidirectional automatic sign language translation and production
Daryl Luciano Peralta, Metro Manila (PH); Shakira Arguelles, Rizal (PH); and Williard Joshua Decena Jose, Metro Manila (PH)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Gyeonggi-Do (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Jan. 26, 2023, as Appl. No. 18/101,904.
Application 18/101,904 is a continuation of application No. PCT/KR2023/000115, filed on Jan. 4, 2023.
Claims priority of application No. 12022050141 (PH), filed on Apr. 4, 2022.
Prior Publication US 2023/0316952 A1, Oct. 5, 2023
Int. Cl. G09B 21/00 (2006.01); G06T 9/00 (2006.01); G06T 13/40 (2011.01); G10L 13/00 (2006.01)
CPC G09B 21/009 (2013.01) [G06T 9/00 (2013.01); G06T 13/40 (2013.01); G10L 13/00 (2013.01)] 6 Claims
 
1. A system for bidirectional automatic sign language translation and production, the system comprising:
at least one communication-capable device in communication with another communication-capable device;
at least one visual sensor disposed on the at least one communication-capable device for acquiring input visual feed;
at least one audio sensor disposed on the at least one communication-capable device for acquiring input audio feed;
at least one text interface disposed on the at least one communication-capable device for acquiring input text feed;
the at least one communication-capable device further comprising:
at least one visual display; and
at least one auditory display;
a translation block for processing the input visual feed, the translation block comprising:
an input processing module;
a frame encoder in communication with the input processing module;
a sequence encoder in communication with the frame encoder;
a word-level decoder in communication with the sequence encoder;
a sentence-level decoder in communication with the sequence encoder;
a text-to-speech module in communication with the sentence-level decoder; and
a first output processor in communication with the word-level decoder, the sentence-level decoder, and the text-to-speech module;
a production block for processing the audio feed and text feed, the production block comprising:
a speech recognition module;
an input processor in communication with the speech recognition module;
an input-to-pose generator in communication with the input processor, the input-to-pose generator being configured to generate a sequence of poses;
a pose sequence buffer in communication with the input-to-pose generator, the pose sequence buffer being configured to store the sequence of poses, check when the pose sequence buffer is empty, and generate an end-of-pose signal indicating an end of the sequence of poses in the pose sequence buffer; and
a second output processor in communication with the pose sequence buffer to receive the sequence of poses, the second output processor being configured to receive the end-of-pose signal;
wherein a production model in the production block and a translation model in the translation block are trained simultaneously by machine learning methods.
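
The pose sequence buffer recited in claim 1 stores the sequence of poses, checks when it is empty, and generates an end-of-pose signal that the second output processor receives. The following is a minimal illustrative sketch of that hand-off only, not the patented implementation. The names `Pose`, `END_OF_POSE`, `PoseSequenceBuffer`, and `SecondOutputProcessor`, the tuple pose representation, and the use of a sentinel object as the signal are all assumptions introduced for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple, Union

# Hypothetical pose representation: a flat tuple of keypoint coordinates.
Pose = Tuple[float, ...]

# Hypothetical sentinel standing in for the claimed end-of-pose signal.
END_OF_POSE = object()


class PoseSequenceBuffer:
    """Stores poses in arrival order and signals when none remain."""

    def __init__(self) -> None:
        self._poses: Deque[Pose] = deque()

    def store(self, poses: Iterable[Pose]) -> None:
        # Append the generated poses to the end of the buffer.
        self._poses.extend(poses)

    def is_empty(self) -> bool:
        # Check whether every stored pose has been handed out.
        return not self._poses

    def next(self) -> Union[Pose, object]:
        # Return the oldest pose, or the end-of-pose signal once empty.
        if self.is_empty():
            return END_OF_POSE
        return self._poses.popleft()


class SecondOutputProcessor:
    """Receives poses from the buffer until the end-of-pose signal arrives."""

    def __init__(self) -> None:
        self.rendered: List[Pose] = []
        self.finished = False

    def consume(self, buffer: PoseSequenceBuffer) -> None:
        while True:
            item = buffer.next()
            if item is END_OF_POSE:
                # The signal marks the end of the sequence of poses.
                self.finished = True
                return
            self.rendered.append(item)


# Usage: two placeholder poses pass through the buffer to the output processor.
buffer = PoseSequenceBuffer()
buffer.store([(0.0, 1.0), (0.5, 0.5)])
processor = SecondOutputProcessor()
processor.consume(buffer)
```

A sentinel object is used here so that the end of the sequence can be told apart from any valid pose value; a real system could equally use a flag, an event, or a callback.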