US 12,136,185 B2
Multi-scale distillation for low-resolution detection
Jason Kuen, Santa Clara, CA (US); Jiuxiang Gu, Baltimore, MD (US); and Zhe Lin, Clyde Hill, WA (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Nov. 16, 2021, as Appl. No. 17/455,134.
Prior Publication US 2023/0153943 A1, May 18, 2023
Int. Cl. G06T 3/4046 (2024.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06V 10/75 (2022.01)

CPC G06T 3/4046 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06V 10/751 (2022.01)]

19 Claims

1. A method of training a neural network, comprising:

receiving a high-resolution version of a student training image and a low-resolution version of the student training image;

generating a first feature map based on the high-resolution version of the student training image using a high-resolution encoder of a teacher network;

generating a second feature map based on the low-resolution version of the student training image using a low-resolution encoder of the teacher network;

generating a fused feature map based on the first feature map and the second feature map using a crossing feature-level fusion module of the teacher network;

generating a third feature map based on the low-resolution version of the student training image using an encoder of a student network;

computing a knowledge distillation (KD) loss based on a comparison of the third feature map from the student network and the fused feature map from the teacher network; and

updating parameters of the student network based on the KD loss.
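The training steps recited in claim 1 can be illustrated with a toy sketch. It is not the patented implementation. It assumes several things the claim does not specify. "Images" are 1-D lists of floats, and each encoder is a single linear layer. The low-resolution version is made by average pooling. The crossing feature-level fusion module is replaced by a fixed convex combination of the two teacher feature maps. The KD loss is taken to be mean-squared error, and the student is updated with plain gradient descent. All function names (`downsample`, `linear_encoder`, `fuse`, `kd_loss`, `train_step`) are hypothetical.

```python
import random

# Hypothetical toy setup: "images" are 1-D lists of floats and every encoder is a
# single linear layer. None of these names come from the patent.

def downsample(image, factor=2):
    """Average-pool a 1-D image by `factor` to form its low-resolution version."""
    return [sum(image[i:i + factor]) / factor for i in range(0, len(image), factor)]

def linear_encoder(weights, x):
    """Map an input vector to a feature vector with one weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def fuse(f_high, f_low, alpha=0.5):
    """Stand-in for the crossing feature-level fusion module (assumed here to be a
    fixed convex combination of the two teacher feature maps)."""
    return [alpha * h + (1 - alpha) * l for h, l in zip(f_high, f_low)]

def kd_loss(student_feat, teacher_feat):
    """Assumed KD loss: mean-squared error between student and fused teacher features."""
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n

def train_step(student_w, teacher_high_w, teacher_low_w, image_high, lr=0.1):
    """One distillation step; only the student weights are updated."""
    image_low = downsample(image_high)
    f1 = linear_encoder(teacher_high_w, image_high)  # first feature map (teacher, high-res)
    f2 = linear_encoder(teacher_low_w, image_low)    # second feature map (teacher, low-res)
    fused = fuse(f1, f2)                             # fused feature map
    f3 = linear_encoder(student_w, image_low)        # third feature map (student, low-res)
    loss = kd_loss(f3, fused)
    # d(loss)/dW[i][j] = 2/n * (f3[i] - fused[i]) * image_low[j]; plain gradient descent.
    n = len(f3)
    new_w = [[w - lr * 2.0 / n * (f3[i] - fused[i]) * image_low[j]
              for j, w in enumerate(row)]
             for i, row in enumerate(student_w)]
    return new_w, loss

def rand_matrix(rows, cols):
    # Random weights in [-1, 1] for a toy linear encoder.
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
teacher_high = rand_matrix(3, 8)  # frozen teacher high-resolution encoder
teacher_low = rand_matrix(3, 4)   # frozen teacher low-resolution encoder
student = rand_matrix(3, 4)       # trainable student encoder
image = [random.uniform(0.0, 1.0) for _ in range(8)]

losses = []
for _ in range(100):
    student, loss = train_step(student, teacher_high, teacher_low, image)
    losses.append(loss)
print(f"KD loss: first step {losses[0]:.4f}, last step {losses[-1]:.6f}")
```

The teacher weights are passed in but never modified, which matches the final step of the claim: only the student network's parameters are updated from the KD loss. A practical system would more likely use convolutional encoders, a learned fusion module, and an autodiff framework. The closed-form gradient here only keeps the sketch dependency-free.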