US 12,112,514 B2
Device for generating prediction image on basis of generator including concentration layer, and control method therefor
Junik Jang, Gyeonggi-do (KR); Jaeil Jung, Gyeonggi-do (KR); and Jonghee Hong, Gyeonggi-do (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Gyeonggi-do (KR)
Filed on Jun. 29, 2021, as Appl. No. 17/361,556.
Application 17/361,556 is a continuation of application No. PCT/KR2020/006356, filed on May 14, 2020.
Claims priority of application No. 10-2019-0058189 (KR), filed on May 17, 2019; and application No. 10-2020-0020271 (KR), filed on Feb. 19, 2020.
Prior Publication US 2021/0326650 A1, Oct. 21, 2021
Int. Cl. G06V 10/40 (2022.01); G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06T 9/00 (2006.01); G06V 10/75 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)

CPC G06V 10/40 (2022.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06T 9/002 (2013.01); G06V 10/75 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)]

9 Claims

1. An electronic apparatus comprising:

a memory storing a generator previously trained to generate a prediction image based on one or more input images;

wherein the generator is configured to include a first neural network for performing encoding with respect to a plurality of inputted image frames, and a second neural network which is connected with the first neural network and configured to perform decoding with respect to data encoded through the first neural network,

wherein the first neural network includes a first attention layer and the second neural network includes a second attention layer, and

a processor configured to:

acquire feature data from a plurality of image frames input through at least one layer included in the generator,

extract feature data corresponding to change over time from the feature data acquired through an attention layer included in the generator,

acquire a first image data block by performing max-pooling and deconvolution through the first attention layer,

acquire a second image data block by performing max-pooling and deconvolution through the second attention layer, and

acquire a third image data block by concatenating the first image data block with the second image data block, the concatenation being performed by inputting the first image data block to the second attention layer,

wherein the third image data block includes feature data for a resolution which is smaller than the existing number of pixels and image data corresponding to a change in motion over time,

acquire a first prediction image frame by inputting the extracted feature data and the third image data block to at least one other layer included in the generator,

wherein each of the plurality of image frames consists of a plurality of pixels, and

based on a result of comparing an image frame inputted after the plurality of image frames and the first prediction image frame, train the generator to extract feature data of pixels predicted to change over time from feature data for each of the plurality of pixels outputted from the at least one layer, and

acquire a second prediction image frame based on the trained generator.
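
The attention-layer steps recited in claim 1 (max-pooling, deconvolution, concatenation, and comparison of a predicted frame with the following frame) can be illustrated with a minimal sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the function names, the use of single-channel 2-D feature maps, the 2x2 window and all-ones kernel, the mean absolute error as the comparison, and the reading in which the first image data block is fed into the second attention layer to produce the second block.

```python
from typing import List

Frame = List[List[float]]  # one single-channel 2-D feature map (assumption)


def max_pool_2x2(x: Frame) -> Frame:
    """Downsample by taking the maximum of each non-overlapping 2x2 window."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def deconv_2x2(x: Frame, kernel: Frame) -> Frame:
    """Stride-2 transposed convolution with a 2x2 kernel (upsamples by 2)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            for kr in range(2):
                for kc in range(2):
                    out[2 * r + kr][2 * c + kc] += x[r][c] * kernel[kr][kc]
    return out


def attention_block(x: Frame, kernel: Frame) -> Frame:
    """Hypothetical attention layer: max-pooling followed by deconvolution."""
    return deconv_2x2(max_pool_2x2(x), kernel)


def concatenate(a: Frame, b: Frame) -> List[Frame]:
    """Stack two equally sized blocks along a new leading (channel) axis."""
    assert len(a) == len(b) and len(a[0]) == len(b[0])
    return [a, b]


def mean_abs_error(pred: Frame, target: Frame) -> float:
    """Per-pixel comparison of a predicted frame with the frame that followed."""
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


# Toy 4x4 feature map and an all-ones kernel (both illustrative assumptions).
features: Frame = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 6.0],
    [0.0, 0.0, 7.0, 8.0],
]
ones: Frame = [[1.0, 1.0], [1.0, 1.0]]

first = attention_block(features, ones)   # first image data block
second = attention_block(first, ones)     # first block fed to the second layer
third = concatenate(first, second)        # third image data block

# Comparison step on a toy 2x2 prediction and the frame that came after it.
predicted: Frame = [[0.0, 1.0], [1.0, 0.0]]
next_frame: Frame = [[0.0, 1.0], [0.0, 0.0]]
loss = mean_abs_error(predicted, next_frame)
```

In this sketch the max-pooling keeps only the strongest response in each window and the deconvolution restores the original spatial size, so the two blocks can be stacked directly. In a trained generator the kernel weights would be learned, and a comparison value such as `loss` would drive the update that teaches the generator which pixels are predicted to change over time.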