US 11,790,644 B2
Techniques for dense video descriptions
Yurong Chen, Beijing (CN); Jianguo Li, Beijing (CN); Zhou Su, Beijing (CN); and Zhiqiang Shen, Beijing (CN)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by INTEL CORPORATION, Santa Clara, CA (US)
Filed on Jan. 6, 2022, as Appl. No. 17/569,725.
Application 17/569,725 is a continuation of application No. 16/616,533, granted, now 11,263,489, previously published as PCT/CN2017/090686, filed on Jun. 29, 2017.
Prior Publication US 2022/0180127 A1, Jun. 9, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 10/00 (2022.01); G06V 10/82 (2022.01); G06F 40/169 (2020.01); G06N 3/08 (2023.01); G06V 20/40 (2022.01); G06F 18/214 (2023.01); G06V 30/19 (2022.01); G06V 30/194 (2022.01); G06V 20/70 (2022.01); G06V 20/10 (2022.01)

CPC G06V 10/82 (2022.01) [G06F 18/2155 (2023.01); G06F 40/169 (2020.01); G06N 3/08 (2013.01); G06V 20/10 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/47 (2022.01); G06V 20/70 (2022.01); G06V 30/194 (2022.01); G06V 30/19173 (2022.01)]

20 Claims

1. An apparatus, comprising:

at least one memory; and

logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to:

receive a source video comprising a plurality of frames;

determine a plurality of regions for the plurality of frames;

generate at least one region-sequence connecting the determined plurality of regions based on at least one selection criterion, the at least one selection criterion comprises a coherency selection criterion configured to maximize a cosine similarity between the plurality of regions of the at least one-region sequence; and

apply a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video.
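
The coherency selection criterion recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how the cosine similarity is maximized, so the code below assumes a simple greedy strategy: starting from a chosen region in the first frame, it picks in each later frame the region whose feature vector has the highest cosine similarity to the previously selected region. The function names (`cosine_similarity`, `build_region_sequence`), the `start` parameter, and the toy two-dimensional feature vectors are hypothetical and introduced only for illustration. Region detection and the language model are outside the scope of this sketch.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_region_sequence(frames: List[List[Vector]], start: int = 0) -> List[int]:
    """Greedily connect one region per frame into a region-sequence.

    frames[t][j] is the feature vector of region j in frame t.
    Returns the index of the selected region for each frame.
    Assumption: greedy frame-to-frame matching; the claim only requires
    that the criterion maximize cosine similarity between the regions.
    """
    if not frames or any(len(regions) == 0 for regions in frames):
        raise ValueError("every frame must contain at least one region")
    if not 0 <= start < len(frames[0]):
        raise ValueError("start must index a region of the first frame")

    sequence = [start]
    for t in range(1, len(frames)):
        previous = frames[t - 1][sequence[-1]]
        # Choose the region in frame t most similar to the previous selection.
        best = max(
            range(len(frames[t])),
            key=lambda j: cosine_similarity(previous, frames[t][j]),
        )
        sequence.append(best)
    return sequence


# Toy example: three frames, two candidate regions per frame.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.8, 0.2], [0.0, 1.0]],
]
print(build_region_sequence(frames, start=0))  # [0, 1, 0]
```

Running the sketch once per starting region yields several candidate region-sequences, each of which could then be passed to a language model to produce a description. A global search such as dynamic programming over all frames would be another way to satisfy the same criterion.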