US 12,014,645 B2
Virtual tutorials for musical instruments with finger tracking in augmented reality
Ilteris Canberk, Marina Del Rey, CA (US); and Dmytro Kucher, Odessa (UA)
Assigned to Snap Inc., Santa Monica, CA (US)
Filed by Ilteris Canberk, Marina Del Rey, CA (US); and Dmytro Kucher, Odessa (UA)
Filed on Aug. 3, 2022, as Appl. No. 17/880,425.
Application 17/880,425 is a continuation of application No. 16/865,995, filed on May 4, 2020, granted, now Pat. No. 11,798,429.
Prior Publication US 2022/0375362 A1, Nov. 24, 2022
Int. Cl. G09B 15/02 (2006.01); G06F 3/01 (2006.01); G06T 7/50 (2017.01); G06T 7/70 (2017.01); G06T 11/00 (2006.01); G10H 1/00 (2006.01)
CPC G09B 15/023 (2013.01) [G06F 3/011 (2013.01); G06T 7/50 (2017.01); G06T 7/70 (2017.01); G06T 11/00 (2013.01); G10H 1/0016 (2013.01); G06T 2200/24 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/10028 (2013.01); G06T 2207/30244 (2013.01); G10H 2210/076 (2013.01); G10H 2220/091 (2013.01)] 15 Claims
OG exemplary drawing
 
1. A method of presenting a tutorial using an eyewear device, the eyewear device comprising a processor, a memory, a camera, and a display, the method comprising the steps of:
capturing frames of video data with the camera;
registering, using the processor, a marker location associated with a musical instrument in a physical environment;
estimating a local position of the eyewear device relative to the marker location based on the frames of video data;
retrieving from the memory a song file associated with the musical instrument, wherein the song file comprises a tempo, a sequence of notes and note values, and a series of virtual tutorial objects;
presenting on the display, based on the local position, the series of virtual tutorial objects relative to the marker location in accordance with the song file;
retrieving from the memory a set of sounds associated with the musical instrument, wherein each sound is associated with a set of finger engagements with one or more of a plurality of actuator locations, wherein the set of finger engagements comprises correct fingertip positions, and wherein each correct fingertip position is associated with an expected sound selected from the set of sounds;
correlating the sequence of notes in the song file with the set of finger engagements, such that a first note is correlated with a first correct fingertip position to produce a first expected sound;
detecting a hand shape in a first frame of the frames of video data, wherein the first frame includes first depth information for a first plurality of pixels;
calculating a set of expected fingertip coordinates based on the hand shape;
for each note in the sequence of notes in the song file, calculating a sum of the geodesic distances between the set of expected fingertip coordinates and the correct fingertip positions; and
in response to determining that the sum is greater than a threshold accuracy value, presenting a failure indicator on the display.
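The accuracy-check step at the end of the claim (summing per-finger distances and comparing the total against a threshold) can be sketched as follows. This is an illustrative reading of the claim language, not the patented implementation: the function name `note_accuracy_check`, the coordinate tuples, and the threshold value are all hypothetical, and a straight-line (Euclidean) distance is used as a simple stand-in for the geodesic distances recited in the claim.

```python
import math

def note_accuracy_check(expected_fingertips, correct_positions, threshold):
    """Return True if a failure indicator should be presented for this note.

    expected_fingertips: fingertip coordinates calculated from the detected
        hand shape, as (x, y, z) tuples.
    correct_positions: the correct fingertip positions for the current note,
        in the same coordinate frame.
    threshold: the threshold accuracy value from the claim.

    Euclidean distance is used here as a simplification of the geodesic
    distance described in the claim.
    """
    total = sum(
        math.dist(expected, correct)
        for expected, correct in zip(expected_fingertips, correct_positions)
    )
    # Per the claim: if the summed distance exceeds the threshold,
    # the fingering is judged inaccurate and a failure indicator is shown.
    return total > threshold
```

In use, this check would run once per note in the song file's sequence, with `correct_positions` looked up from the note-to-finger-engagement correlation established earlier in the method.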