US 12,437,554 B2
Detecting at least one emergency vehicle using a perception algorithm
Arindam Das, Mumbai (IN); Sudarshan Paul, Mumbai (IN); Sanjoy Das, Mumbai (IN); and Deep Doshi, Troy, MI (US)
Assigned to Connaught Electronics Ltd., Tuam (IE)
Filed by Connaught Electronics Ltd., Tuam (IE)
Filed on Feb. 17, 2023, as Appl. No. 18/170,642.
Prior Publication US 2024/0282116 A1, Aug. 22, 2024
Int. Cl. G06V 20/58 (2022.01); G06N 3/0464 (2023.01)
CPC G06V 20/58 (2022.01) [G06N 3/0464 (2023.01); B60W 2420/403 (2013.01); B60W 2420/54 (2013.01); B60W 2422/95 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method for training a perception algorithm of a vehicle, the computer-implemented method comprising:
providing a convolutional recurrent neural network (CRNN) and at least one further artificial neural network (ANN) to detect at least one emergency vehicle;
receiving at least two time-dependent audio datasets from two microphones mounted at different positions on the vehicle;
generating at least two spectrograms based on the at least two time-dependent audio datasets;
generating at least one interaural difference map based on the at least two spectrograms, wherein the at least one interaural difference map contains at least one of an interaural phase difference map, an interaural time difference map, or an interaural level difference map;
generating audio source localization data for at least one grid cell of a predefined spatial grid in an environment of the vehicle;
specifying a number of audio sources in the at least one grid cell;
applying the CRNN to first input data containing the at least two spectrograms and the at least one interaural difference map;
receiving at least one camera image from at least one camera mounted to the vehicle;
predicting output data of at least one bounding box for the at least one emergency vehicle by applying the at least one further ANN to second input data containing the at least one camera image and the at least two spectrograms; and
adapting network parameters of the CRNN and the at least one further ANN depending on the output data and the audio source localization data.
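The audio front end of the claimed method (two microphone signals turned into spectrograms and interaural difference maps) can be sketched as below. This is a minimal illustration only: the STFT parameters, the `stft` and `interaural_maps` helper names, and the use of NumPy are assumptions for exposition, not details recited in the claim.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform: returns a complex spectrogram
    of shape (n_frames, n_bins). Parameters are illustrative."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def interaural_maps(left, right, eps=1e-10):
    """From two time-dependent audio signals, compute the two magnitude
    spectrograms plus an interaural level difference (ILD, in dB) map
    and an interaural phase difference (IPD, in radians) map."""
    S_l, S_r = stft(left), stft(right)
    ild = 20.0 * np.log10((np.abs(S_l) + eps) / (np.abs(S_r) + eps))
    ipd = np.angle(S_l * np.conj(S_r))  # wrapped per-bin phase difference
    return np.abs(S_l), np.abs(S_r), ild, ipd
```

In the claimed pipeline, the two magnitude spectrograms and at least one such difference map would be stacked into the first input data for the CRNN; an interaural time difference map could be derived analogously from per-bin phase slopes.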