US 11,741,755 B2
Method and apparatus for recognizing sign language or gesture using 3D EDM
Sang Ki Ko, Suwon-si (KR); Hye Dong Jung, Seoul (KR); Han Mu Park, Seongnam-si (KR); and Chang Jo Kim, Suwon-si (KR)
Assigned to Korea Electronics Technology Institute, Seongnam-si (KR)
Filed by Korea Electronics Technology Institute, Seongnam-si (KR)
Filed on Jul. 30, 2020, as Appl. No. 16/942,985.
Claims priority of application No. 10-2019-0093807 (KR), filed on Aug. 1, 2019.
Prior Publication US 2021/0034846 A1, Feb. 4, 2021
Int. Cl. G06V 40/00 (2022.01); G06V 40/20 (2022.01); G06F 3/01 (2006.01); G06N 3/084 (2023.01); G10L 13/00 (2006.01); G06N 3/045 (2023.01); G10L 15/24 (2013.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 10/44 (2022.01)

CPC G06V 40/28 (2022.01) [G06F 3/011 (2013.01); G06F 3/017 (2013.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01); G06V 10/454 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G10L 13/00 (2013.01); G10L 15/24 (2013.01)]

14 Claims

1. A method of recognizing a sign language or a gesture by using a three-dimensional (3D) Euclidean distance matrix (EDM), the method comprising:

generating a two-dimensional (2D) EDM including information about distances between feature points of a body recognized in image information by a 2D EDM generation processor;

converting the 2D EDM to a 3D EDM to generate the EDM by inputting the 2D EDM to an input layer of a first deep learning neural network, trained with training data in which input data is a 2D EDM and correct answer data is a 3D EDM by a 3D EDM generation processor, configured to provide the 3D EDM, corresponding to the 2D EDM being input, from an output layer of the first deep learning neural network; and

recognizing a sign language or a gesture based on the 3D EDM,

wherein the image information is image data including frame images generated by capturing a user expressing the sign language or gesture with a 2D camera,

wherein the feature points refer to body parts of the user shown in the frame image, are selected for sign language or gesture recognition, and have coordinate values in the form of (X, Y) according to the width (X-axis) and height (Y-axis) of the frame image,

wherein the 2D EDM represents a 2D positional relationship between any feature point and the other feature points in the frame image, which is a two-dimensional space, and

wherein the 3D EDM represents a 3D positional relationship between any feature point and the other feature points in a three-dimensional space, which further includes a depth direction (Z-axis) of the frame image.
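
As an illustration only, and not the patented implementation, the two matrices named in claim 1 can be sketched in Python. The function name euclidean_distance_matrix, the three feature points, and the depth values are all hypothetical. The trained first deep learning neural network is not reproduced here: the 3D EDM below is computed directly from assumed (X, Y, Z) coordinates, which is the kind of correct answer data the claim describes for training.

```python
import math


def euclidean_distance_matrix(points):
    """Return the N x N matrix of pairwise Euclidean distances.

    Each point is a tuple of coordinates: (X, Y) for a 2D EDM,
    (X, Y, Z) for a 3D EDM. Entry [i][j] is the distance between
    feature point i and feature point j.
    """
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical feature points in frame-image coordinates (X, Y).
keypoints_2d = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
edm_2d = euclidean_distance_matrix(keypoints_2d)

# Hypothetical depth (Z) values for the same feature points. In the claim,
# the 3D EDM comes from the trained network, not from known depths.
depths = [0.0, 12.0, 0.0]
keypoints_3d = [(x, y, z) for (x, y), z in zip(keypoints_2d, depths)]
edm_3d = euclidean_distance_matrix(keypoints_3d)

for row_2d, row_3d in zip(edm_2d, edm_3d):
    print(row_2d, row_3d)
```

Both matrices are symmetric with a zero diagonal, and each 3D distance is at least as large as the corresponding 2D distance because the depth difference can only add to it. A distance matrix also does not change when the whole set of points is translated or rotated, which is one plausible reason to use distances rather than raw coordinates as the network's input and output.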