US 12,437,673 B2
System and method for bidirectional automatic sign language translation and production
Daryl Luciano Peralta, Metro Manila (PH); Shakira Arguelles, Rizal (PH); and Williard Joshua Decena Jose, Metro Manila (PH)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Gyeonggi-Do (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Jan. 26, 2023, as Appl. No. 18/101,904.
Application 18/101,904 is a continuation of application No. PCT/KR2023/000115, filed on Jan. 4, 2023.
Claims priority of application No. 12022050141 (PH), filed on Apr. 4, 2022.
Prior Publication US 2023/0316952 A1, Oct. 5, 2023
Int. Cl. G09B 21/00 (2006.01); G06T 9/00 (2006.01); G06T 13/40 (2011.01); G10L 13/00 (2006.01)

CPC G09B 21/009 (2013.01) [G06T 9/00 (2013.01); G06T 13/40 (2013.01); G10L 13/00 (2013.01)]

6 Claims

1. A system for bidirectional automatic sign language translation and production, the system comprising:

at least one communication-capable device in communication with another communication-capable device;

at least one visual sensor disposed on the at least one communication-capable device for acquiring input visual feed;

at least one audio sensor disposed on the at least one communication-capable device for acquiring input audio feed;

at least one text interface disposed on the at least one communication-capable device for acquiring input text feed;

the at least one communication-capable device further comprising:

at least one visual display; and

at least one auditory display;

a translation block for processing the input visual feed, the translation block comprising:

an input processing module;

a frame encoder in communication with the input processing module;

a sequence encoder in communication with the frame encoder;

a word-level decoder in communication with the sequence encoder;

a sentence-level decoder in communication with the sequence encoder;

a text-to-speech module in communication with the sentence-level decoder; and

a first output processor in communication with the word-level decoder, the sentence-level decoder, and the text-to-speech module;

a production block for processing the audio feed and text feed, the production block comprising:

a speech recognition module;

an input processor in communication with the speech recognition module;

an input-to-pose generator in communication with the input processor, the input-to-pose generator being configured to generate a sequence of poses;

a pose sequence buffer in communication with the input-to-pose generator, the pose sequence buffer being configured to store the sequence of poses, check when the pose sequence buffer is empty, and generate an end-of-pose signal indicating an end of the sequence of poses in the pose sequence buffer; and

a second output processor in communication with the pose sequence buffer to receive the sequence of poses, the second output processor being configured to receive the end-of-pose signal;

wherein a production model in the production block and a translation model in the translation block are trained simultaneously by machine learning methods.
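
The pose sequence buffer recited in claim 1 (store the sequence of poses, check when the buffer is empty, generate an end-of-pose signal) can be illustrated with a minimal sketch. This is not the patented implementation. The names PoseSequenceBuffer, END_OF_POSE, and second_output_processor are hypothetical, as is the representation of a pose as a flat list of keypoint coordinates, and the signal is modeled here as a sentinel object delivered after the last stored pose.

```python
from collections import deque
from typing import Deque, Iterator, List, Sequence, Union

# Assumption: a pose is a flat sequence of keypoint coordinates.
Pose = Sequence[float]


class EndOfPose:
    """Hypothetical sentinel standing in for the end-of-pose signal."""


END_OF_POSE = EndOfPose()


class PoseSequenceBuffer:
    """Stores poses in arrival order and signals when none remain."""

    def __init__(self) -> None:
        self._poses: Deque[Pose] = deque()

    def store(self, poses: Sequence[Pose]) -> None:
        # Append the generated sequence of poses to the buffer.
        self._poses.extend(poses)

    def is_empty(self) -> bool:
        # The emptiness check that triggers the end-of-pose signal.
        return not self._poses

    def read(self) -> Iterator[Union[Pose, EndOfPose]]:
        # Hand out poses first-in, first-out; once the buffer is empty,
        # emit the end-of-pose signal exactly once.
        while not self.is_empty():
            yield self._poses.popleft()
        yield END_OF_POSE


def second_output_processor(buffer: PoseSequenceBuffer) -> List[Pose]:
    """Collects poses until the end-of-pose signal is received."""
    frames: List[Pose] = []
    for item in buffer.read():
        if item is END_OF_POSE:
            break
        frames.append(item)
    return frames
```

In this sketch the consumer never polls the buffer itself. It stops only when it receives the signal, which mirrors the claim's separation between the buffer that detects emptiness and the second output processor that reacts to it.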