US 12,334,096 B2
Directional voice sensing using coherent optical detection
Eran Tal, Petach Tikva (IL); and Ariel Lipson, Tel Aviv (IL)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Oct. 3, 2023, as Appl. No. 18/376,262.
Application 18/376,262 is a continuation of application No. 17/477,382, filed on Sep. 16, 2021, granted, now U.S. Pat. No. 11,854,568.
Prior Publication US 2024/0029752 A1, Jan. 25, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/0232 (2013.01); G01H 9/00 (2006.01); G01S 7/52 (2006.01); G01S 15/89 (2006.01); G10L 25/84 (2013.01); H04R 1/22 (2006.01); H04R 23/00 (2006.01)

CPC G10L 21/0232 (2013.01) [G01H 9/00 (2013.01); G01S 7/52046 (2013.01); G01S 15/8993 (2013.01); G10L 25/84 (2013.01); H04R 1/222 (2013.01); H04R 23/008 (2013.01)]

15 Claims

1. An electronic device, comprising:

a microphone;

an array of coherent optical emitters;

an array of balanced coherent optical vibration sensors, each balanced coherent optical vibration sensor in the array of balanced coherent optical vibration sensors paired with a coherent optical emitter in the array of coherent optical emitters;

a camera positioned to capture an image of a field of view; and

a processor configured to,

analyze a set of waveforms acquired by the array of balanced coherent optical vibration sensors;

identify, using the analysis of the set of waveforms, a set of one or more voices in the field of view;

identify a set of one or more voice sources in the image;

map the set of one or more voices to the set of one or more voice sources; and

adjust an output of the microphone to accentuate a particular voice in the set of one or more voices, wherein:

the processor is configured to map the set of one or more voices to the set of one or more voice sources by:

determining, based at least partly on the image, a first direction to a voice source in the set of one or more voice sources, the voice source producing the particular voice in the set of one or more voices;

determining, based at least partly on a subset of waveforms including the particular voice, and based at least partly on a directionality of a subset of balanced coherent optical vibration sensors that generated the subset of waveforms, a second direction to the voice source; and

correlating the first direction with the second direction to map the particular voice to the voice source.
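
The mapping steps recited in claim 1 (a first direction from the image, a second direction from sensor directionality, and a correlation of the two) can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a pinhole camera model, azimuth-only directions, an energy-weighted mean over sensor pointing angles, and nearest-angle matching within a tolerance. All function names, parameters, and numeric values are hypothetical and do not come from the patent.

```python
import math


def pixel_to_azimuth(x_px, image_width, hfov_deg):
    """First direction: azimuth in degrees of an image column.

    Assumes a pinhole camera whose optical axis passes through the image center.
    """
    half_width = image_width / 2.0
    focal_px = half_width / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((x_px - half_width) / focal_px))


def sensor_azimuth(sensor_azimuths_deg, voice_energy):
    """Second direction: energy-weighted mean of sensor pointing angles.

    voice_energy[i] is the (assumed) energy of the voice in the waveform
    from sensor i; sensors that did not pick up the voice contribute 0.
    """
    total = sum(voice_energy)
    if total <= 0:
        raise ValueError("no sensor detected the voice")
    weighted = sum(a * e for a, e in zip(sensor_azimuths_deg, voice_energy))
    return weighted / total


def map_voices_to_sources(voice_dirs, source_dirs, tolerance_deg=10.0):
    """Correlate directions: pair each voice with the nearest source.

    A voice is left unmapped if no source lies within tolerance_deg.
    """
    mapping = {}
    for voice, v_dir in voice_dirs.items():
        best = min(source_dirs,
                   key=lambda s: abs(source_dirs[s] - v_dir),
                   default=None)
        if best is not None and abs(source_dirs[best] - v_dir) <= tolerance_deg:
            mapping[voice] = best
    return mapping


# Hypothetical scene: two faces detected in a 1920-pixel-wide image, 90 degree FOV.
source_dirs = {
    "face_left": pixel_to_azimuth(480, 1920, 90.0),
    "face_right": pixel_to_azimuth(1440, 1920, 90.0),
}
# Four hypothetical sensors pointing at -30, -10, 10 and 30 degrees.
pointing = [-30.0, -10.0, 10.0, 30.0]
voice_dirs = {
    "voice_a": sensor_azimuth(pointing, [0.0, 0.0, 0.2, 0.8]),
    "voice_b": sensor_azimuth(pointing, [0.9, 0.1, 0.0, 0.0]),
}
mapping = map_voices_to_sources(voice_dirs, source_dirs)
```

Once a voice is mapped to a source, a device could use that mapping to decide which voice to accentuate in the microphone output. The claim leaves the method of adjustment open, so the sketch does not cover it.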