US 12,282,633 B2
Method and system for predicting touch interaction position on large display based on binocular camera
Gangyong Jia, Hangzhou (CN); Yumiao Zhao, Hangzhou (CN); Huanle Rao, Hangzhou (CN); Ziwei Song, Hangzhou (CN); Minghui Yu, Hangzhou (CN); and Hong Xu, Hangzhou (CN)
Assigned to HANGZHOU DIANZI UNIVERSITY (CN)
Filed by Hangzhou Dianzi University, Hangzhou (CN)
Filed on Sep. 5, 2023, as Appl. No. 18/242,040.
Claims priority of application No. 202211095073.4 (CN), filed on Sep. 5, 2022.
Prior Publication US 2024/0077977 A1, Mar. 7, 2024
Int. Cl. G06F 3/042 (2006.01); G06V 10/77 (2022.01); G06V 10/774 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01)
CPC G06F 3/0425 (2013.01) [G06V 10/7715 (2022.01); G06V 10/774 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06V 20/46 (2022.01); G06V 40/28 (2022.01)] 7 Claims
 
1. A method for predicting a touch interaction position on a large display based on a binocular camera, comprising the following steps:
S1, separately acquiring arm movement video frames of a user and facial and eye movement video frames of the user by a binocular camera;
S2, extracting a video clip of each tapping action from the arm movement video frames and the facial and eye movement video frames, and screening the clip to obtain a key frame;
S3, marking the key frame of each tapping action with coordinates indicating the position of a finger on a display screen;
S4, inputting the marked key frame into an Efficient Convolutional Network for Online Video Understanding (ECO)-Lite neural network for training to obtain a predictive network model; and
S5, inputting a video frame of a current operation into the predictive network model and outputting a predicted touch interaction position for the current operation.
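The sketches below illustrate one possible realization of the claimed steps; none of them reproduces the patented implementation. First, step S1: a minimal capture loop, assuming OpenCV and assuming the binocular camera exposes two device indices (the indices 0 and 1 are placeholders for the actual device IDs):

```python
import cv2

def capture_frames(arm_cam_id=0, face_cam_id=1, num_frames=300):
    """Grab paired frames: one stream for the arm, one for the face/eyes (S1)."""
    arm_cap = cv2.VideoCapture(arm_cam_id)
    face_cap = cv2.VideoCapture(face_cam_id)
    arm_frames, face_frames = [], []
    for _ in range(num_frames):
        ok_a, arm_frame = arm_cap.read()
        ok_f, face_frame = face_cap.read()
        if not (ok_a and ok_f):
            break  # stop if either stream drops a frame
        arm_frames.append(arm_frame)
        face_frames.append(face_frame)
    arm_cap.release()
    face_cap.release()
    return arm_frames, face_frames
```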
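For steps S2 and S3, a sketch of key-frame screening and labeling. The motion-peak rule (pick the frame with the largest inter-frame change) is an assumption for illustration; the claim does not specify the screening criterion:

```python
import numpy as np

def select_key_frame(clip):
    """Pick the frame with the largest inter-frame change (hypothetical S2 rule)."""
    grays = [frame.mean(axis=2) for frame in clip]  # rough grayscale per frame
    diffs = [np.abs(grays[i + 1] - grays[i]).sum() for i in range(len(grays) - 1)]
    return clip[int(np.argmax(diffs)) + 1]

def label_key_frame(key_frame, x, y):
    """Pair the key frame with the finger's (x, y) display coordinates (S3)."""
    return {"frame": key_frame, "target": (float(x), float(y))}
```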
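For steps S4 and S5, a PyTorch-style training and inference sketch. `TouchNet` is a simplified stand-in for the ECO-Lite backbone named in the claim, not a reproduction of it; the mean-squared-error objective and the input shape are likewise assumptions:

```python
import torch
import torch.nn as nn

class TouchNet(nn.Module):
    """Simplified stand-in for the ECO-Lite backbone: clip in, (x, y) out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # 3D conv over stacked frames
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # global pooling to one vector
        )
        self.head = nn.Linear(16, 2)                     # regress display coordinates

    def forward(self, clip):               # clip: (batch, 3, frames, H, W)
        return self.head(self.features(clip).flatten(1))

def train_step(model, optimizer, clip, target):
    """One S4 training step: mean-squared error on the (x, y) coordinates."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(clip), target)
    loss.backward()
    optimizer.step()
    return loss.item()

# S5: at inference time, feed the current operation's frames to the trained model.
model = TouchNet().eval()
with torch.no_grad():
    xy = model(torch.randn(1, 3, 8, 112, 112))  # random tensor stands in for live frames
```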