US 12,340,580 B2
Method and electronic device for recognizing image context
Kiran Nanjunda Iyer, Bengaluru (IN); Biplab Ch Das, Bengaluru (IN); and Sathish Chalasani, Bengaluru (IN)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Nov. 8, 2022, as Appl. No. 17/983,135.
Claims priority of application No. 202141051048 (IN), filed on Nov. 8, 2021; and application No. 202141051048 (IN), filed on Aug. 24, 2022.
Prior Publication US 2023/0147843 A1, May 11, 2023
Int. Cl. G06V 20/40 (2022.01); G06V 10/44 (2022.01); G06V 10/62 (2022.01); G06V 20/50 (2022.01); G06V 20/70 (2022.01)
CPC G06V 20/41 (2022.01) [G06V 10/44 (2022.01); G06V 10/62 (2022.01); G06V 20/50 (2022.01); G06V 20/70 (2022.01); G06V 2201/07 (2022.01)] 12 Claims
OG exemplary drawing
 
1. A method for recognizing image context by an electronic device, the method comprising:
capturing a first image frame from a preview of an imaging sensor of the electronic device;
recognizing a first scene that is captured in the first image frame;
recognizing at least one second scene in a plurality of image frames that is not captured in the first image frame; and
determining contextual information of the first image frame based on the first scene and the at least one second scene,
wherein the determining the contextual information of the first image frame based on the first scene and the at least one second scene comprises:
identifying objects in the first image frame,
extracting visual features from the first scene and the at least one second scene,
performing bidirectional temporal shifting of the visual features in a temporal dimension,
determining attention weights for each visual feature of the at least one second scene corresponding to each visual feature of the first scene by applying a contextual attention on the temporally shifted features,
determining context of the first scene and the at least one second scene by averaging the temporally shifted visual features using the attention weights,
determining contextual stable visual features by concatenating the context of the first scene and the at least one second scene with each visual feature of the first scene and the at least one second scene,
reducing a dimension of the contextual stable visual features,
updating the dimensionally reduced contextual stable features and the objects in the first image frame,
performing an assignment of objects in the first image frame with reference to the objects in the plurality of image frames for identifying the objects that disappeared in the first image frame with reference to the plurality of image frames,
recovering the objects that disappeared in the first image frame with reference to the plurality of image frames using a heuristics based linear constraint and a linear cost function, and
determining the contextual information of the first image frame based on the objects in the first image frame and the recovered objects.
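The claimed pipeline can be illustrated with a minimal sketch. The code below is not the patented implementation; it assumes a TSM-style channel shift for the "bidirectional temporal shifting" step, a dot-product softmax for the "contextual attention" step, and a greedy nearest-neighbour match under a Euclidean cost as a stand-in for the claim's heuristics-based linear constraint and linear cost function. All function and variable names are illustrative.

```python
import math

def bidirectional_temporal_shift(feats, shift_frac=0.25):
    # TSM-style shift: move one fraction of channels a step forward in time,
    # another fraction a step backward; zero-pad at the temporal borders.
    T, C = len(feats), len(feats[0])
    n = max(1, int(C * shift_frac))
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < n:                      # channels shifted forward in time
                out[t][c] = feats[t - 1][c] if t > 0 else 0.0
            elif c < 2 * n:                # channels shifted backward in time
                out[t][c] = feats[t + 1][c] if t < T - 1 else 0.0
            else:                          # remaining channels unchanged
                out[t][c] = feats[t][c]
    return out

def contextual_attention(shifted, query_idx=0):
    # Softmax over similarity of each frame's shifted feature to the
    # query (first-image) frame's feature.
    q = shifted[query_idx]
    d = math.sqrt(len(q))
    scores = [sum(x * y for x, y in zip(f, q)) / d for f in shifted]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def contextual_stable_features(feats, query_idx=0):
    # Shift, attend, average into a shared context vector, then concatenate
    # that context with every frame's shifted feature.
    shifted = bidirectional_temporal_shift(feats)
    w = contextual_attention(shifted, query_idx)
    C = len(shifted[0])
    context = [sum(w[t] * shifted[t][c] for t in range(len(shifted)))
               for c in range(C)]
    return [context + f for f in shifted]

def assign_and_recover(prev_objs, curr_objs, max_cost=1.0):
    # Assign current-frame detections to previous-frame objects under a
    # linear (Euclidean) cost, greedily; previous objects left unmatched
    # are treated as "disappeared" and recovered into the current frame.
    pairs = sorted(
        (math.dist(p, c), pid, cid)
        for pid, p in prev_objs.items() for cid, c in curr_objs.items())
    matched, used_prev, used_curr = {}, set(), set()
    for cost, pid, cid in pairs:
        if cost <= max_cost and pid not in used_prev and cid not in used_curr:
            matched[cid] = pid
            used_prev.add(pid)
            used_curr.add(cid)
    recovered = {pid: p for pid, p in prev_objs.items() if pid not in used_prev}
    return matched, recovered
```

A short usage run: for 3 frames of 8-dimensional features, `contextual_stable_features` yields 3 vectors of 16 values (context concatenated with each shifted feature), and an object present in the previous frame but absent from the current detections is returned in `recovered`. A real dimensionality-reduction step (the claim's reducing/updating steps) would typically be a learned projection and is omitted here.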