US 12,236,673 B2
System and method for training a model to perform semantic segmentation on low visibility images using high visibility images having a close camera view
Wim Abbeloos, Brussels (BE); Christos Sakaridis, Zurich (CH); Luc Van Gool, Zurich (CH); and Dengxin Dai, Zurich (CH)
Assigned to TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP); and ETH ZURICH, Zurich (CH)
Appl. No. 17/625,590
Filed by TOYOTA MOTOR EUROPE, Brussels (BE); and ETH ZURICH, Zurich (CH)
PCT Filed Jul. 10, 2019, PCT No. PCT/EP2019/068603
§ 371(c)(1), (2) Date Jan. 7, 2022,
PCT Pub. No. WO2021/004633, PCT Pub. Date Jan. 14, 2021.
Prior Publication US 2022/0284696 A1, Sep. 8, 2022
Int. Cl. G06V 10/774 (2022.01); G06T 7/11 (2017.01); G06V 20/56 (2022.01)
CPC G06V 10/774 (2022.01) [G06T 7/11 (2017.01); G06V 20/56 (2022.01); G06T 2207/20028 (2013.01); G06T 2207/20081 (2013.01)] 7 Claims
OG exemplary drawing
 
1. A method for training a model to be used for semantic segmentation of images taken under low visibility conditions, comprising:
obtaining a plurality of sets of images, each set of images being associated with an index z ranging from 1 to Z, the index z indicating a level of visibility of the images of the set of images, 1 corresponding to the highest level of visibility and Z corresponding to the lowest level of visibility,
wherein the model is initially trained to perform semantic segmentation using an annotated set of images having a level of visibility of 1 and the associated semantic segmentation labels, and
for z being greater than or equal to 2, iteratively training the model comprising:
a—obtaining, for at least a first image of the set of images of index z, a preliminary semantic segmentation label by applying the model trained on the set of images of index z−1 to the first image,
b—obtaining, for the at least the first image, a processed preliminary semantic segmentation label using the preliminary semantic segmentation label and a semantic segmentation label obtained by applying the initially trained model on a selected image of the set of images of index 1, the selected image being selected as the image having a camera view which is closest to a camera view of the first image by:
using a cross bilateral filter between the semantic segmentation label associated with the selected image of the set of images of index 1 and the semantic segmentation label associated with the first image, and
performing a fusion of an output of the cross bilateral filter with the semantic segmentation label associated with the first image,
 wherein the cross bilateral filter performs:

\hat{S}_1(p) = \frac{\sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\lVert q - p \rVert)\, G_{\sigma_r}(\lVert I_z(q) - I_z(p) \rVert)\, S_1(q)}{\sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\lVert q - p \rVert)\, G_{\sigma_r}(\lVert I_z(q) - I_z(p) \rVert)}
 in which:
 p and q are pixel positions,
 Ŝ1(p) is the output of the cross bilateral filter for a pixel p,
 𝒩(p) is a neighborhood of p,
 Gσs is a spatial-domain Gaussian kernel,
 Gσr is a color-domain kernel,
 Iz(q) and Iz(p) respectively designate a color value at pixel q and pixel p in the first image from the set of index z, and
 S1(q) is a semantic segmentation label at pixel q for the selected image of the set of images of index 1,
c—training the model using:
 the set of images of index z and the associated processed semantic segmentation labels, and
 a synthetic set of images of index z and the associated semantic segmentation labels both generated from the annotated set of images having a level of visibility of 1 and the associated semantic segmentation labels; and
d—performing steps a to c for z+1.
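
For illustration only, the following is a minimal Python/NumPy sketch of the cross bilateral filter recited in claim 1, step b. It assumes S1 is supplied as a one-hot (or soft) per-class score map so the normalized weighted sum is well defined, that the neighborhood 𝒩(p) is a square window of radius "radius", and that sigma_s, sigma_r and radius are free parameters; none of these specifics, nor the function name, come from the patent text.

    # Hedged sketch of the cross bilateral label filter of claim 1(b).
    # Assumptions (not from the patent): labels as one-hot score maps,
    # square neighborhood window, arbitrary default sigma values.
    import numpy as np

    def cross_bilateral_label_filter(S1, Iz, sigma_s=10.0, sigma_r=0.1, radius=10):
        """Filter the index-1 label map S1, guided by the colors of the index-z image Iz.

        S1: (H, W, C) one-hot or soft semantic scores of the selected clear-visibility image.
        Iz: (H, W, 3) color image of index z, values in [0, 1].
        Returns S1_hat: (H, W, C) filtered scores; argmax over C gives refined labels.
        """
        H, W, C = S1.shape
        S1_hat = np.zeros_like(S1, dtype=np.float64)
        weights = np.zeros((H, W), dtype=np.float64)

        # Loop over window offsets q = p + (dy, dx) within the neighborhood.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                # Spatial-domain Gaussian kernel G_sigma_s on ||q - p||.
                g_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
                # np.roll wraps at image borders; a real implementation would pad instead.
                Iq = np.roll(Iz, shift=(dy, dx), axis=(0, 1))
                Sq = np.roll(S1, shift=(dy, dx), axis=(0, 1))
                # Color-domain kernel G_sigma_r on ||Iz(q) - Iz(p)||.
                d_col = np.linalg.norm(Iq - Iz, axis=2)
                g_r = np.exp(-(d_col ** 2) / (2.0 * sigma_r ** 2))
                w = g_s * g_r
                S1_hat += w[..., None] * Sq
                weights += w

        # Normalized weighted sum, matching the formula in claim 1.
        return S1_hat / weights[..., None]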
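
The loop below likewise sketches steps a to d of claim 1. Every helper passed in (train, predict, fuse_labels, generate_synthetic, closest_clear_view) is a hypothetical placeholder for whatever model, fusion rule and data pipeline an implementer chooses; the patent text does not prescribe these interfaces.

    # Hedged outline of the iterative curriculum training of claim 1, steps a-d.
    # All callables are hypothetical placeholders, not the patent's API.
    def curriculum_train(model, image_sets, clear_labels, Z,
                         train, predict, fuse_labels, generate_synthetic,
                         closest_clear_view):
        """image_sets[z] is the list of images of visibility level z (1 = clearest, Z = lowest)."""
        # Initial training on the annotated clear-visibility set (index 1).
        model = train(model, image_sets[1], clear_labels)
        model_1 = model  # keep the initially trained model for guidance

        for z in range(2, Z + 1):
            processed_labels = []
            for img in image_sets[z]:
                # a) preliminary label from the model trained at level z-1
                prelim = predict(model, img)
                # b) guidance from the clear image whose camera view is closest,
                #    refined by the cross bilateral filter sketched above
                clear_img = closest_clear_view(img, image_sets[1])
                guide = predict(model_1, clear_img)
                refined = cross_bilateral_label_filter(guide, img)
                processed_labels.append(fuse_labels(refined, prelim))
            # c) train on real level-z images with processed labels plus synthetic
            #    level-z images generated from the annotated clear-visibility set
            synth_imgs, synth_labels = generate_synthetic(image_sets[1], clear_labels, z)
            model = train(model, image_sets[z] + synth_imgs,
                          processed_labels + synth_labels)
        # d) the loop repeats steps a-c for the next visibility level z+1
        return model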