US 12,217,014 B2
Method, apparatus, and system for providing interpretation result using visual information
Jinxia Huang, Daejeon (KR); and Jong Hun Shin, Daejeon (KR)
Assigned to ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, Daejeon (KR)
Filed by Electronics and Telecommunications Research Institute, Daejeon (KR)
Filed on Jan. 7, 2022, as Appl. No. 17/570,879.
Claims priority of application No. 10-2021-0002716 (KR), filed on Jan. 8, 2021.
Prior Publication US 2022/0222448 A1, Jul. 14, 2022
Int. Cl. G06F 40/58 (2020.01); G06F 3/01 (2006.01); G06T 7/11 (2017.01); G06V 20/62 (2022.01); G06V 30/14 (2022.01); G06V 40/20 (2022.01)
CPC G06F 40/58 (2020.01) [G06F 3/013 (2013.01); G06T 7/11 (2017.01); G06V 20/63 (2022.01); G06V 30/1444 (2022.01); G06V 40/20 (2022.01); G06T 2207/20021 (2013.01)]
16 Claims
 
1. A method of providing an interpretation result using visual information, which is performed by an apparatus for providing an interpretation result using visual information, the method comprising:
acquiring a spatial domain image including line-of-sight information of a user and gaze position information in the spatial domain image;
segmenting the acquired spatial domain image into a plurality of images;
detecting text areas including text for each of the segmented images;
generating text blocks, each of which is a text recognition result for each of the detected text areas, and determining the text block corresponding to the gaze position information;
converting a first language included in the determined text block into a second language that is a target language; and
providing the user with a conversion result of the second language,
wherein the generating of the text blocks, each of which is the text recognition result for each of the detected text areas, and the determining of the text block corresponding to the gaze position information includes:
combining text blocks consecutively located within predetermined sections adjacent to each other into one text block; and
assigning a unique number to the text block or the combined text block, and
wherein the generating of the text blocks, each of which is the text recognition result for each of the detected text areas, and the determining of the text block corresponding to the gaze position information includes
recognizing the text block corresponding to the unique number assigned in previous visual information as the same text block for a certain period of time.
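
The block-handling steps of claim 1 (combining adjacent text blocks, assigning a unique number, keeping that number stable for a period of time, and selecting the block at the gaze position) can be illustrated with a minimal Python sketch. This is not the patented implementation. Every name here (`TextBlock`, `merge_adjacent`, `BlockTracker`, `block_at_gaze`, `max_gap`, `ttl`) is hypothetical, and several choices are assumptions: "adjacent" is read as a small vertical gap plus horizontal overlap, a block is matched across frames by its recognized text, and the "certain period of time" is a fixed time-to-live in seconds. Image segmentation, text detection, recognition, and translation are outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TextBlock:
    """A recognized text area: its text and bounding box (hypothetical layout)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    block_id: Optional[int] = None

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def merge_adjacent(blocks: List[TextBlock], max_gap: float) -> List[TextBlock]:
    """Combine consecutive blocks into one block.

    Assumption: two blocks are adjacent when the vertical gap between them is
    at most max_gap and their horizontal extents overlap.
    """
    merged: List[TextBlock] = []
    for b in sorted(blocks, key=lambda blk: (blk.y, blk.x)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and b.y - (last.y + last.h) <= max_gap
            and b.x < last.x + last.w
            and last.x < b.x + b.w
        ):
            # Replace the previous block with the union of both boxes.
            x0, y0 = min(last.x, b.x), min(last.y, b.y)
            x1 = max(last.x + last.w, b.x + b.w)
            y1 = max(last.y + last.h, b.y + b.h)
            merged[-1] = TextBlock(f"{last.text} {b.text}", x0, y0, x1 - x0, y1 - y0)
        else:
            merged.append(TextBlock(b.text, b.x, b.y, b.w, b.h))
    return merged


class BlockTracker:
    """Assigns unique numbers and reuses them while a block was seen recently."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl  # assumed length, in seconds, of the "certain period of time"
        self._next_id = 1
        self._seen: Dict[str, Tuple[int, float]] = {}  # text -> (id, last seen)

    def assign(self, blocks: List[TextBlock], now: float) -> List[TextBlock]:
        # Forget blocks that have not been seen within the time-to-live.
        self._seen = {
            text: (bid, ts)
            for text, (bid, ts) in self._seen.items()
            if now - ts <= self.ttl
        }
        for b in blocks:
            if b.text in self._seen:
                b.block_id = self._seen[b.text][0]  # same block as in a prior frame
            else:
                b.block_id = self._next_id
                self._next_id += 1
            self._seen[b.text] = (b.block_id, now)
        return blocks


def block_at_gaze(blocks: List[TextBlock], gaze: Tuple[float, float]) -> Optional[TextBlock]:
    """Return the first block whose bounding box contains the gaze position."""
    gx, gy = gaze
    for b in blocks:
        if b.contains(gx, gy):
            return b
    return None
```

In use, each frame's recognized blocks would pass through `merge_adjacent`, then `BlockTracker.assign` with the frame timestamp, and the block returned by `block_at_gaze` would be handed to whatever translation component converts the first language into the target language. Matching by text is the simplest possible identity test; a position-based or overlap-based match would be an equally plausible reading of the claim.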