US 12,451,138 B2
Cross-lingual speaker recognition
Elie Khoury, Atlanta, GA (US); Tianxiang Chen, Atlanta, GA (US); Avrosh Kumar, Atlanta, GA (US); Ganesh Sivaraman, Atlanta, GA (US); and Kedar Phatak, Atlanta, GA (US)
Assigned to Pindrop Security, Inc., Atlanta, GA (US)
Filed by Pindrop Security, Inc., Atlanta, GA (US)
Filed on Oct. 31, 2022, as Appl. No. 17/977,521.
Claims priority of provisional application 63/274,909, filed on Nov. 2, 2021.
Claims priority of provisional application 63/274,460, filed on Nov. 1, 2021.
Prior Publication US 2023/0137652 A1, May 4, 2023
Int. Cl. G10L 15/00 (2013.01); G10L 17/00 (2013.01); G10L 17/04 (2013.01); G10L 17/10 (2013.01)

CPC G10L 17/04 (2013.01) [G10L 17/10 (2013.01)]

15 Claims

1. A computer-implemented method comprising: extracting, by a computer, an enrolled voiceprint for an enrolled speaker by applying an embedding extraction engine on one or more enrollment signals of the enrolled speaker, the enrolled voiceprint representing a plurality of enrollment acoustic features of the one or more enrollment signals; extracting, by the computer, an inbound voiceprint for an inbound speaker by applying the embedding extraction engine on one or more inbound signals of the inbound speaker, the inbound voiceprint representing a plurality of inbound acoustic features of the one or more inbound signals; generating, by the computer, one or more language likelihood scores by applying a language classifier on the enrolled voiceprint and the inbound voiceprint indicating a likelihood that an enrollment signal and a paired inbound signal include one or more languages; and generating, by the computer, a cross-lingual quality measure based upon one or more differences of the one or more language likelihood scores generated for the one or more enrollment signals and the one or more inbound signals, the cross-lingual quality measure indicating whether the enrollment signal and the paired inbound signal include a same language of the one or more languages.
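The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every concrete choice below is a hypothetical assumption that the claim does not specify:

- The embedding extraction engine is reduced to averaging per-frame acoustic feature vectors and L2-normalizing the result.
- The language classifier is a linear layer with fixed, hypothetical weights followed by a softmax.
- The cross-lingual quality measure is taken as one minus the total variation distance between the enrollment and inbound language likelihood scores, so it is built from their element-wise differences as the claim requires.
- The same-language decision uses an arbitrary 0.5 threshold.

The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def extract_voiceprint(signals: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Hypothetical stand-in for the embedding extraction engine.

    Averages the per-frame acoustic feature vectors of all signals and
    L2-normalizes the mean to produce a fixed-length voiceprint.
    """
    frames = [frame for signal in signals for frame in signal]
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def language_likelihoods(voiceprint: Vector, weights: Sequence[Vector],
                         languages: Sequence[str]) -> Dict[str, float]:
    """Hypothetical linear language classifier applied to a voiceprint.

    Each row of `weights` scores one language; a numerically stable
    softmax turns the logits into likelihood scores that sum to 1.
    """
    logits = [sum(w * x for w, x in zip(row, voiceprint)) for row in weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {lang: e / total for lang, e in zip(languages, exps)}


def cross_lingual_quality(enroll: Dict[str, float],
                          inbound: Dict[str, float]) -> float:
    """Assumed quality measure: 1 - total variation distance.

    Built from the per-language differences of the two score sets:
    1.0 means identical language profiles, 0.0 means disjoint ones.
    """
    diff = sum(abs(enroll[lang] - inbound[lang]) for lang in enroll)
    return 1.0 - 0.5 * diff


def same_language(quality: float, threshold: float = 0.5) -> bool:
    """Assumed decision rule: same language if quality meets the threshold."""
    return quality >= threshold


if __name__ == "__main__":
    langs = ["en", "es"]
    weights = [[5.0, 0.0], [0.0, 5.0]]  # hypothetical classifier weights
    enrolled = extract_voiceprint([[[1.0, 0.0], [0.9, 0.1]]])
    inbound = extract_voiceprint([[[0.0, 1.0], [0.1, 0.9]]])
    q = cross_lingual_quality(language_likelihoods(enrolled, weights, langs),
                              language_likelihoods(inbound, weights, langs))
    print(f"quality={q:.3f} same_language={same_language(q)}")
```

In this toy setup the enrollment features lean toward the first language and the inbound features toward the second. The quality measure is therefore near zero, and the pair is flagged as cross-lingual. A deployed system could instead use the measure to adjust or gate the speaker-verification score.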