US 12,243,328 B2
Scalable road sign interpretation system for autonomous driving
Jiejun Xu, Diamond Bar, CA (US); Kenji Yamada, Los Angeles, CA (US); Michael J. Daily, Thousand Oaks, CA (US); Alireza Esna Ashari Esfahani, Novi, MI (US); Hyukseong Kwon, Thousand Oaks, CA (US); Darren Michael Chan, Thousand Oaks, CA (US); Alan Perry, Canoga Park, CA (US); and Joshua Lampkins, Los Angeles, CA (US)
Assigned to GM GLOBAL TECHNOLOGY OPERATIONS LLC, Detroit, MI (US)
Filed by GM Global Technology Operations LLC, Detroit, MI (US)
Filed on Aug. 17, 2022, as Appl. No. 17/820,317.
Prior Publication US 2024/0062555 A1, Feb. 22, 2024
Int. Cl. G06V 20/58 (2022.01); G06N 3/045 (2023.01); G06V 10/82 (2022.01); G06V 20/62 (2022.01); G06V 30/262 (2022.01)

CPC G06V 20/582 (2022.01) [G06N 3/045 (2023.01); G06V 10/82 (2022.01); G06V 20/63 (2022.01); G06V 30/274 (2022.01)]

13 Claims

1. A road sign interpretation system, comprising:

a camera mounted on or in a vehicle, the camera collecting a data set having image data of multiple road signs;

a first convolutional neural network (CNN) receiving the image data from the camera and yielding a set of sign predictions including multiple sign instances;

a second CNN defining a text extractor receiving the image data from the camera and extracting text candidates including multiple text instances;

a text location, computed in the second CNN from the multiple text instances, to provide sign and sign data localization;

a sign text synthesizer module receiving individual sign instances of the multiple sign instances from the first CNN and individual ones of the multiple text instances in digitized form from an optical character recognizer (OCR), wherein the sign text synthesizer:

evaluates a text-sign membership including whether or not sign text lies within a bounding region of one of the sign instances;

rearranges and configures the sign text instances into a logical reading order, including left-to-right and top-to-bottom, as individual text instances or as synthesized text instances, by determining two-dimensional eigenvectors of individual ones of the text instances using the <x,y> coordinates that form segment contours, wherein x-directional eigenvectors are then extended to form line segments whose endpoints lie within the corresponding sign bounding box, wherein, when any two or more line segments intersect, the corresponding text is appended to a list and reordered by increasing <x> position, which determines left-to-right text ordering, and wherein, when multiple text lines exist within a sign instance, the text lines are ordered by increasing <y> position, which determines top-to-bottom text ordering; and

determines text-sign membership by computing an overlapping region between each text instance and the bounding region of the sign instance, wherein, when the text instance is fully encapsulated by the bounding region, the text instance is assigned as a member of the corresponding sign instance; and

a semantic encoding and interpretation module receiving the multiple text instances and identifying semantics of the multiple road signs by feeding each text instance into a Universal Sentence Encoder (USE) to generate a fixed-length feature vector for each text instance based on semantic features, converting the fixed-length feature vector into a data point in a fixed-size space, and calculating how close or distant two data points representing two different text instances are in order to measure the semantic relatedness between the two different text instances.
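The text-sign membership test recited in claim 1 (a text instance belongs to a sign when the overlap with the sign's bounding region shows it is fully encapsulated) can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as hypothetical `Box(x1, y1, x2, y2)` values in pixel coordinates and assigns a text to the first enclosing sign when signs are nested; both choices are assumptions, since the claim does not fix a box representation or a tie-breaking rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def assign_text_to_signs(signs: list[Box], texts: list[Box]) -> dict[int, list[int]]:
    """Map each sign index to the indices of the text boxes it fully encapsulates.

    A text box counts as a member when its overlap with the sign equals the
    text box's own area, i.e. the text lies entirely inside the sign region.
    """
    members: dict[int, list[int]] = {i: [] for i in range(len(signs))}
    for t_idx, t in enumerate(texts):
        t_area = t.area()
        if t_area == 0:
            continue  # degenerate detection; skip it
        for s_idx, s in enumerate(signs):
            if overlap_area(t, s) >= t_area:
                members[s_idx].append(t_idx)
                break  # assumption: first enclosing sign wins
    return members
```

For example, with `signs = [Box(0, 0, 100, 50), Box(200, 0, 300, 50)]` and a text box `Box(90, 10, 120, 30)` that straddles the first sign's right edge, the straddling box is assigned to neither sign, which matches the "fully encapsulated" condition rather than a partial-overlap threshold.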
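The reading-order step in claim 1 (eigenvectors of each text contour, x-directional eigenvectors extended into line segments inside the sign box, intersecting segments grouped and sorted by <x>, lines sorted by <y>) can be sketched with NumPy. Several details are assumptions, not taken from the claim: the "x-directional" eigenvector is taken to be the one with the larger |x| component; segments are extended across the whole sign box by Liang-Barsky style clipping; intersecting segments are grouped transitively with a union-find; and lines are ordered by the mean centroid <y>. All function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def x_direction(points: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (centroid, unit eigenvector) for the contour's more x-aligned axis."""
    centroid = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)  # 2x2 covariance of the <x, y> coordinates
    _, vecs = np.linalg.eigh(cov)       # columns are orthonormal eigenvectors
    idx = int(np.argmax(np.abs(vecs[0, :])))  # eigenvector with dominant x component
    return centroid, vecs[:, idx]


def clip_line_to_box(c: np.ndarray, d: np.ndarray, box: tuple[float, float, float, float]):
    """Extend the line c + t*d until it meets the box edges; c is assumed inside the box."""
    x1, y1, x2, y2 = box
    t_lo, t_hi = -np.inf, np.inf
    for p, q in ((-d[0], c[0] - x1), (d[0], x2 - c[0]), (-d[1], c[1] - y1), (d[1], y2 - c[1])):
        if p == 0:
            continue  # line parallel to this pair of edges; no bound from it
        t = q / p
        if p < 0:
            t_lo = max(t_lo, t)
        else:
            t_hi = min(t_hi, t)
    return c + t_lo * d, c + t_hi * d


def _orient(a, b, c) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a, b, c) -> bool:
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2) -> bool:
    """Orientation-based 2-D segment intersection test, including touching/collinear cases."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and _on_segment(q1, q2, p1)) or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1)) or (d4 == 0 and _on_segment(p1, p2, q2)))


def reading_order(contours: list[np.ndarray], sign_box) -> list[list[int]]:
    """Group text contours into lines and return indices in reading order."""
    n = len(contours)
    axes = [x_direction(pts) for pts in contours]
    segs = [clip_line_to_box(c, d, sign_box) for c, d in axes]
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if segments_intersect(*segs[i], *segs[j]):
                parent[find(i)] = find(j)  # intersecting segments share a text line
    lines: dict[int, list[int]] = {}
    for i in range(n):
        lines.setdefault(find(i), []).append(i)
    # Left-to-right within a line: increasing centroid x.
    ordered = [sorted(m, key=lambda k: axes[k][0][0]) for m in lines.values()]
    # Top-to-bottom across lines: increasing mean centroid y (image y grows downward).
    ordered.sort(key=lambda m: float(np.mean([axes[k][0][1] for k in m])))
    return ordered
```

One design caveat: exactly parallel segments never intersect under this test, so two words on the same row with identical tilt at slightly different heights would land in separate lines. A practical version would likely need a tolerance, which the claim does not specify.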
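Claim 1 measures semantic relatedness by how close or distant two fixed-length feature vectors are, but it does not name the distance measure. The sketch below assumes cosine similarity and a normalized angular distance, which are common choices for sentence embeddings. The vectors are passed in as arrays: producing them with the Universal Sentence Encoder is outside the sketch, and the small 2-D vectors in the tests are toy values, not USE outputs. The function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        raise ValueError("zero-length embedding")
    return float(np.dot(u, v) / denom)


def angular_distance(u, v) -> float:
    """Angle between vectors scaled to [0, 1]: 0 = same direction, 1 = opposite."""
    cos = np.clip(cosine_similarity(u, v), -1.0, 1.0)  # guard against rounding past +/-1
    return float(np.arccos(cos) / np.pi)


def relatedness_matrix(embeddings) -> np.ndarray:
    """Pairwise cosine similarities for a (num_texts, dim) array of embeddings."""
    X = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    if np.any(norms == 0.0):
        raise ValueError("zero-length embedding")
    X = X / norms
    return X @ X.T
```

Each text instance becomes one row of the input array; entry `[i, j]` of the returned matrix is then a relatedness score between text instances `i` and `j`, with higher values meaning the two data points are closer in the embedding space.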