US 12,406,583 B2
Methods for spatio-temporal scene-graph embedding for autonomous vehicle applications
Mohammad Abdullah Al Faruque, Irvine, CA (US); Shih-Yuan Yu, Irvine, CA (US); Arnav Vaibhav Malawade, Irvine, CA (US); Deepan Muthirayan, Irvine, CA (US); and Pramod P. Khargonekar, Irvine, CA (US)
Assigned to THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, Oakland, CA (US)
Filed by THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, Oakland, CA (US)
Filed on Jan. 16, 2023, as Appl. No. 18/154,987.
Claims priority of provisional application 63/300,386, filed on Jan. 18, 2022.
Prior Publication US 2023/0230484 A1, Jul. 20, 2023
Int. Cl. G08G 1/16 (2006.01); B60W 30/095 (2012.01); B60W 60/00 (2020.01)
CPC G08G 1/166 (2013.01) [B60W 30/0956 (2013.01); B60W 60/0015 (2020.02); G08G 1/167 (2013.01); B60W 2420/403 (2013.01); B60W 2552/00 (2020.02); B60W 2554/402 (2020.02); B60W 2554/4041 (2020.02)] 20 Claims
OG exemplary drawing
 
1. A computing system (100) implemented in an autonomous vehicle for generating one or more scene-graphs based on one or more images and calculating a probability of collision based on the one or more scene-graphs, the system (100) comprising:
a. a processor (110) capable of executing computer-readable instructions; and
b. a memory component (120) communicatively coupled to the processor (110), the memory component (120) comprising:
i. a data generation module (121) comprising computer-readable instructions for:
A. accepting the one or more images;
B. identifying one or more objects in each image;
C. identifying an ego-object in each image; and
D. extracting the one or more objects and the ego-object from the one or more images as an object dataset for each image;
ii. a scene-graph extraction module (122) comprising computer-readable instructions for:
A. calculating a bounding box for each object of each object dataset;
B. computing an inverse-perspective mapping transformation of the image to generate a bird's-eye view (BEV) representation of each image;
C. projecting the one or more bounding boxes onto the BEV representation of each corresponding image;
D. estimating a position of the one or more objects relative to the ego-object based on the one or more bounding boxes of each BEV representation;
E. identifying a proximity relation between the ego-object and each object of the object dataset for each image by measuring a distance between the ego-object and each object;
F. identifying a directional relation between the ego-object and each object of the object dataset for each image by determining a relative orientation of the ego-object and each object; and
G. generating a scene-graph for each image based on the BEV representation, the one or more proximity relations, and the one or more directional relations;
wherein each scene-graph comprises one or more nodes representing the corresponding ego-object and the corresponding object dataset; and
iii. a collision prediction module (123) comprising computer-readable instructions for:
A. embedding each node of each scene-graph through use of a machine learning model;
B. assigning each node a score based on a potential for collision with the ego-object;
C. condensing the one or more scene-graphs into a spatial graph embedding;
D. generating a spatio-temporal graph embedding from the spatial graph embedding through use of a sequence model;
E. calculating a confidence value for whether or not a collision will occur; and
F. actuating, based on the confidence value, the autonomous vehicle to execute an evasive maneuver such that the autonomous vehicle avoids the collision;
wherein all modules of the memory component (120) are communicatively coupled to each other.
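
The data generation module (element i) amounts to per-frame object detection plus collection of the detections into an object dataset. A minimal sketch follows, assuming a pretrained torchvision Faster R-CNN as a stand-in detector; the patent does not name a particular model, and the ego-object handling shown here is illustrative.

```python
# Sketch of the data generation module (element i): detect objects in a
# frame and collect them into a per-image object dataset. The detector
# choice is an assumption; any object detector fits the claim.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def build_object_dataset(image_path, score_threshold=0.5):
    """Return detected objects (boxes, labels, scores) for one image."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        preds = detector([image])[0]
    keep = preds["scores"] >= score_threshold
    objects = [
        {"box": box.tolist(), "label": int(label), "score": float(score)}
        for box, label, score in zip(
            preds["boxes"][keep], preds["labels"][keep], preds["scores"][keep]
        )
    ]
    # The ego-object (the vehicle itself) does not appear in its own camera
    # frame; here it is added as a fixed node (an assumed convention).
    objects.append({"box": None, "label": "ego", "score": 1.0})
    return objects
```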
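Elements ii.B through ii.D describe inverse-perspective mapping: warping the camera image to a bird's-eye view and projecting bounding boxes into it. A minimal sketch using OpenCV follows; the four point correspondences are hypothetical calibration values, as real systems derive them from camera extrinsics.

```python
# Sketch of the inverse-perspective mapping step (elements ii.B-ii.D):
# warp the camera image to a bird's-eye view (BEV) and project each
# bounding box's ground-contact point into that view.
import cv2
import numpy as np

# Four road-plane points in the camera image and their BEV destinations
# (illustrative values only; not from the patent).
SRC = np.float32([[520, 460], [760, 460], [1180, 700], [100, 700]])
DST = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])
IPM = cv2.getPerspectiveTransform(SRC, DST)

def to_bev(image):
    """Warp a camera frame into an 800x800 bird's-eye-view image."""
    return cv2.warpPerspective(image, IPM, (800, 800))

def project_box(box):
    """Map a bounding box's bottom-center (its ground-contact point)
    into BEV coordinates for position estimation (element ii.D)."""
    x1, y1, x2, y2 = box
    ground_pt = np.float32([[[(x1 + x2) / 2.0, y2]]])  # shape (1, 1, 2)
    bev_pt = cv2.perspectiveTransform(ground_pt, IPM)
    return bev_pt[0, 0]  # (x, y) in the BEV image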
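Elements ii.E through ii.G derive proximity and directional relations from BEV positions and assemble a scene-graph per image. The sketch below uses networkx; the distance bins, direction labels, and meters-per-pixel scale are illustrative assumptions, since the claim only requires that the relations be derived from distance and relative orientation to the ego-object.

```python
# Sketch of relation extraction and scene-graph assembly (elements
# ii.E-ii.G). Each object is assumed to carry a "bev_xy" position from
# the projection step.
import math
import networkx as nx

def proximity_relation(dist_m):
    """Bin ego-to-object distance into a proximity label (assumed bins)."""
    if dist_m < 4.0:
        return "near_collision"
    if dist_m < 10.0:
        return "near"
    return "visible"

def directional_relation(dx, dy):
    """Label an object's bearing relative to the ego heading (+y forward)."""
    angle = math.degrees(math.atan2(dx, dy))
    if abs(angle) < 45.0:
        return "in_front_of"
    if abs(angle) > 135.0:
        return "behind"
    return "right_of" if angle > 0 else "left_of"

def build_scene_graph(objects, ego_pos, meters_per_pixel=0.05):
    """One graph per image: nodes are the ego-object and detected
    objects; edges carry proximity and directional relations."""
    g = nx.MultiDiGraph()
    g.add_node("ego")
    for i, obj in enumerate(objects):
        g.add_node(i, label=obj["label"])
        dx = (obj["bev_xy"][0] - ego_pos[0]) * meters_per_pixel
        dy = (ego_pos[1] - obj["bev_xy"][1]) * meters_per_pixel  # image y grows downward
        dist = math.hypot(dx, dy)
        g.add_edge("ego", i, relation=proximity_relation(dist))
        g.add_edge("ego", i, relation=directional_relation(dx, dy))
    return g
```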
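Elements iii.A through iii.C describe embedding each node with a machine learning model, scoring nodes by collision relevance, and condensing the graph into a single spatial embedding. A minimal PyTorch sketch follows, using one mean-aggregation message-passing layer and a learned attention head for the per-node scores; layer sizes and the specific architecture are assumptions, not the patent's model.

```python
# Sketch of elements iii.A-iii.C: embed scene-graph nodes with one
# message-passing layer, score each node's collision relevance, and
# condense the graph by attention-weighted pooling.
import torch
import torch.nn as nn

class SpatialGraphEncoder(nn.Module):
    def __init__(self, in_dim=16, hid_dim=32):
        super().__init__()
        self.message = nn.Linear(in_dim, hid_dim)   # node embedding (iii.A)
        self.attn = nn.Linear(hid_dim, 1)           # per-node score (iii.B)

    def forward(self, x, adj):
        """x: (N, in_dim) node features; adj: (N, N) adjacency with
        self-loops. Returns one spatial graph embedding of size hid_dim."""
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = torch.relu(self.message((adj @ x) / deg))   # mean aggregation
        scores = torch.softmax(self.attn(h), dim=0)     # collision relevance
        return (scores * h).sum(dim=0)                  # condensed embedding (iii.C)
```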
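Elements iii.D through iii.F run a sequence model over the per-frame spatial embeddings, produce a collision confidence, and actuate an evasive maneuver when that confidence is high. The sketch below uses an LSTM as the sequence model; the 0.5 threshold and the `execute_evasive_maneuver` control interface are hypothetical, introduced only to illustrate the actuation step.

```python
# Sketch of elements iii.D-iii.F: sequence model over spatial embeddings,
# collision confidence, and threshold-gated actuation.
import torch
import torch.nn as nn

class CollisionPredictor(nn.Module):
    def __init__(self, embed_dim=32, hid_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, spatial_embeddings):
        """spatial_embeddings: (T, embed_dim), one per frame.
        Returns a scalar collision confidence in [0, 1]."""
        out, _ = self.lstm(spatial_embeddings.unsqueeze(0))     # (1, T, hid)
        return torch.sigmoid(self.head(out[:, -1])).squeeze()   # confidence (iii.E)

def maybe_evade(vehicle, confidence, threshold=0.5):
    """Actuate the vehicle if predicted collision risk is high (iii.F)."""
    if confidence.item() >= threshold:
        vehicle.execute_evasive_maneuver()  # hypothetical control interface
```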