US 11,810,578 B2
Device arbitration for digital assistant-based intercom systems
Benjamin S. Phipps, San Francisco, CA (US); Sachin Kajarekar, Sunnyvale, CA (US); Eugene Ray, Cupertino, CA (US); Mahesh Ramaray Shanbhag, Santa Clara, CA (US); Kisun You, Campbell, CA (US); and Patrick L. Coffman, San Francisco, CA (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Oct. 16, 2020, as Appl. No. 17/073,092.
Claims priority of provisional application 63/023,188, filed on May 11, 2020.
Prior Publication US 2021/0350810 A1, Nov. 11, 2021
Int. Cl. G10L 17/22 (2013.01); G10L 17/18 (2013.01)

CPC G10L 17/22 (2013.01) [G10L 17/18 (2013.01)]

54 Claims

1. A method for providing an intercom service, via a first electronic device, a second electronic device, and a third electronic device, wherein each of the first, second, and third electronic devices includes one or more processors and memory, the method comprising:

receiving, at the first electronic device, a first speech input including a message and a trigger phrase, wherein the message was spoken by a first speaker and the trigger phrase indicates a request to provide the intercom service;

in response to receiving the trigger phrase, causing each of the second electronic device and the third electronic device to provide an audible representation of the message;

receiving, from the second electronic device, a second speech input including a first reply to the message and a first acoustic fingerprint including a first acoustic transmission metric associated with a first spatial relationship between the second electronic device and a speaker of the first reply to the message and a first set of one or more embeddings that emphasize speaker specific characteristics of the speaker of the first reply;

receiving, from the third electronic device, a third speech input including a second reply to the message and a second acoustic fingerprint including a second acoustic transmission metric associated with a second spatial relationship between the third electronic device and a speaker of the second reply to the message and a second set of one or more embeddings that emphasize speaker specific characteristics of the speaker of the second reply;

comparing the first acoustic fingerprint to the second acoustic fingerprint to determine whether the speaker of the first reply and the speaker of the second reply are the same;

in response to determining that the speaker of the first reply to the message and the speaker of the second reply to the message are the same speaker, comparing the first acoustic transmission metric to the second acoustic transmission metric to determine whether an acoustic quality of the first reply message is greater than an acoustic quality of the second reply message; and

in response to determining that the acoustic quality of the first reply message is greater than the acoustic quality of the second reply message, generating a device-user mapping wherein the first speaker is mapped to the first electronic device and the speaker of the second speech input and the third speech input is mapped to the second electronic device.
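
The arbitration steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: the names `AcousticFingerprint`, `same_speaker`, and `arbitrate` are hypothetical, speaker identity is assumed to be decided by cosine similarity of the embeddings against an arbitrary threshold of 0.8, and a larger acoustic transmission metric is assumed to mean higher acoustic quality.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class AcousticFingerprint:
    """Hypothetical container for the two parts of a fingerprint in claim 1."""
    transmission_metric: float   # assumption: larger value = better acoustic quality
    embedding: Sequence[float]   # speaker-specific embedding vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_speaker(fp_a: AcousticFingerprint, fp_b: AcousticFingerprint,
                 threshold: float = 0.8) -> bool:
    """Assumed test: embeddings similar enough to be treated as one speaker."""
    return cosine_similarity(fp_a.embedding, fp_b.embedding) >= threshold


def arbitrate(first_speaker: str, replier: str,
              fp_second: AcousticFingerprint, fp_third: AcousticFingerprint,
              threshold: float = 0.8) -> Optional[Dict[str, str]]:
    """Return a device-user mapping for the case recited in claim 1.

    Returns None for the cases claim 1 does not recite (different speakers,
    or the second device's reply not having the greater acoustic quality).
    """
    # Step 1: compare fingerprints to decide whether both replies share a speaker.
    if not same_speaker(fp_second, fp_third, threshold):
        return None
    # Step 2: compare transmission metrics to decide which device heard it better.
    if fp_second.transmission_metric > fp_third.transmission_metric:
        # Step 3: map the first speaker to the first device, the replier to the second.
        return {first_speaker: "first_device", replier: "second_device"}
    return None


# Usage: both devices hear the same replier, and the second device hears them better.
fp_from_second = AcousticFingerprint(transmission_metric=0.9, embedding=[1.0, 0.0])
fp_from_third = AcousticFingerprint(transmission_metric=0.4, embedding=[0.9, 0.1])
mapping = arbitrate("speaker_a", "speaker_b", fp_from_second, fp_from_third)
```

In this sketch the function returns `None` whenever the conditions of claim 1 are not met, since the claim itself only recites the outcome in which the second electronic device receives the higher-quality reply.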