CPC B60W 60/0027 (2020.02) [B60W 10/18 (2013.01); B60W 10/20 (2013.01); B60W 30/18163 (2013.01); G05B 13/027 (2013.01); G06F 18/21 (2023.01); G06N 3/045 (2023.01); G06V 10/22 (2022.01); G06V 20/41 (2022.01); G06V 20/584 (2022.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01); H04N 23/90 (2023.01); H04R 1/08 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01); B60W 2420/40 (2013.01); B60W 2420/54 (2013.01); B60W 2554/402 (2020.02); B60W 2554/4041 (2020.02); B60W 2554/4044 (2020.02); H04R 2499/13 (2013.01)]

18 Claims

1. A computer-implemented method of operating an autonomous driving vehicle (ADV), the method comprising:
receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured from a surrounding environment of the ADV using one or more audio capturing devices and a sequence of image frames captured from the surrounding environment using one or more image capturing devices mounted on the ADV;
determining, by the ADS using a first neural network model, a first detection result including a first probability that at least a portion of the stream of captured audio signals is from a siren sound, and a moving direction of the siren sound including a moving direction indicator indicating whether a source of the siren sound is moving towards the ADV or moving away from the ADV;
determining, by the ADS using a second neural network model, a second detection result including a second probability that at least one image frame of the sequence of image frames is from an emergency vehicle, and a distance between the ADV and the emergency vehicle, wherein the distance between the ADV and the emergency vehicle is determined based on a size of a bounding box surrounding the emergency vehicle in the at least one image frame and one or more extrinsic parameters of an image capturing device used to capture the at least one image frame, wherein the size of the bounding box and the one or more extrinsic parameters of the image capturing device are used as part of labeling data of the at least one image frame, and wherein the one or more extrinsic parameters include a relative rotation and translation between cameras in a multi-camera arrangement;
determining that an emergency vehicle is present in the surrounding environment in response to at least one of the first probability of the first neural network model exceeding a first predefined threshold or the second probability of the second neural network model exceeding a second predefined threshold;
determining a position of the emergency vehicle and a moving direction of the emergency vehicle by fusing the first detection result of the first neural network model and the second detection result of the second neural network model;
determining whether the emergency vehicle is moving towards the ADV based on the position of the emergency vehicle and the moving direction of the emergency vehicle; and
controlling the ADV by steering out of a current driving lane, braking to decelerate, or steering to a side of a road, in response to determining that the emergency vehicle is moving towards the ADV.
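The claimed detection-and-fusion pipeline can be sketched in code. The sketch below is illustrative only: the detection data structures, the threshold values, the pinhole-model distance estimate (the claim instead ties distance to bounding-box size plus camera extrinsics), and the confidence-weighted fusion rule are all assumptions standing in for whatever the ADS actually implements.

```python
# Illustrative sketch of the claimed pipeline. All names, threshold
# values, and the fusion rule are hypothetical, not the patent's method.
from dataclasses import dataclass

@dataclass
class AudioDetection:
    siren_probability: float  # first probability: captured audio is a siren
    bearing_deg: float        # estimated direction of the siren source
    approaching: bool         # moving-direction indicator (towards/away)

@dataclass
class VisualDetection:
    ev_probability: float     # second probability: frame shows an emergency vehicle
    bearing_deg: float        # direction inferred from bounding-box position
    distance_m: float         # distance estimated from bounding-box size

AUDIO_THRESHOLD = 0.8   # first predefined threshold (assumed value)
VISUAL_THRESHOLD = 0.7  # second predefined threshold (assumed value)

def estimate_distance(box_height_px: float, vehicle_height_m: float = 2.5,
                      focal_px: float = 1000.0) -> float:
    """Pinhole-camera distance estimate from bounding-box height.
    A simplification: the claim derives distance from box size plus
    camera extrinsics; an assumed focal length is used here instead."""
    return vehicle_height_m * focal_px / box_height_px

def emergency_vehicle_present(a: AudioDetection, v: VisualDetection) -> bool:
    """EV is present if either detector clears its predefined threshold."""
    return a.siren_probability > AUDIO_THRESHOLD or v.ev_probability > VISUAL_THRESHOLD

def fuse(a: AudioDetection, v: VisualDetection) -> tuple[float, float, bool]:
    """Fuse both detection results into (bearing, distance, approaching).
    A confidence-weighted average of the bearings stands in for the
    unspecified fusion performed by the ADS."""
    w_a, w_v = a.siren_probability, v.ev_probability
    bearing = (w_a * a.bearing_deg + w_v * v.bearing_deg) / (w_a + w_v)
    return bearing, v.distance_m, a.approaching

def plan_action(a: AudioDetection, v: VisualDetection) -> str:
    """Pick a control action: yield only when an EV is detected and approaching."""
    if not emergency_vehicle_present(a, v):
        return "continue"
    _bearing, distance, approaching = fuse(a, v)
    if approaching:
        # Yield: pull over when close, otherwise steer out of the lane.
        return "pull_over" if distance < 50.0 else "change_lane"
    return "continue"

audio = AudioDetection(siren_probability=0.9, bearing_deg=10.0, approaching=True)
visual = VisualDetection(ev_probability=0.6, bearing_deg=14.0, distance_m=35.0)
print(plan_action(audio, visual))  # siren approaching at close range -> pull_over
```

Note the "at least one of" logic in the claim: the audio branch alone (0.9 > 0.8) triggers presence here even though the visual probability (0.6) falls below its own threshold.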