US 12,008,794 B2
Systems and methods for intelligent video surveillance
Zhong Zhang, Great Falls, VA (US)
Assigned to SHANGHAI TRUTHVISION INFORMATION TECHNOLOGY CO., LTD., Shanghai (CN)
Filed by SHANGHAI TRUTHVISION INFORMATION TECHNOLOGY CO., LTD., Shanghai (CN)
Filed on Apr. 23, 2021, as Appl. No. 17/239,536.
Application 17/239,536 is a continuation of application No. PCT/CN2019/113176, filed on Oct. 25, 2019.
Claims priority of provisional application 62/750,795, filed on Oct. 25, 2018.
Claims priority of provisional application 62/750,797, filed on Oct. 25, 2018.
Prior Publication US 2021/0241468 A1, Aug. 5, 2021
Int. Cl. G06T 7/292 (2017.01); G06F 18/214 (2023.01); G06K 9/00 (2022.01); G06K 9/62 (2022.01); G06N 3/08 (2023.01); G06T 7/246 (2017.01); G06T 7/80 (2017.01); G06V 10/20 (2022.01); G06V 20/10 (2022.01); G06V 20/52 (2022.01)

CPC G06V 10/255 (2022.01) [G06F 18/214 (2023.01); G06N 3/08 (2013.01); G06T 7/246 (2017.01); G06T 7/292 (2017.01); G06T 7/80 (2017.01); G06V 20/10 (2022.01); G06V 20/52 (2022.01)]

20 Claims

1. A system, comprising:

a storage device storing a set of instructions; and

at least one processor configured to communicate with the storage device, wherein when executing the set of instructions, the at least one processor is directed to cause the system to perform operations including:

obtaining a video collected by a visual sensor, the video including a plurality of frames;

detecting, in at least a portion of the plurality of frames, one or more objects from the video;

determining, with a trained self-learning model, a first detection result associated with the one or more objects;

determining, based on the at least a portion of the plurality of frames, one or more behavior features associated with each of the one or more objects;

determining, based on the one or more behavior features associated with each of the one or more objects, a second detection result associated with each of the one or more objects; and

determining, based on the first detection result and the second detection result, a target moving object of interest from the one or more objects, wherein the trained self-learning model is provided based on a plurality of training samples collected by the visual sensor.
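
For illustration only, the operations recited in claim 1 can be sketched as a two-stage filter. The claim does not specify the behavior features, the scoring, or the fusion rule, so everything below is an assumption: the names `Track`, `behavior_features`, `behavior_score`, and `select_targets` are hypothetical, the first detection result is taken to be a precomputed score in [0, 1] standing in for the output of the trained self-learning model, the behavior features are net displacement and path straightness, and the two results are combined by requiring both to pass a threshold.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Track:
    """One detected object followed across frames (hypothetical structure)."""
    object_id: int
    centroids: List[Tuple[float, float]]  # one (x, y) position per frame
    model_score: float  # assumed first detection result, in [0, 1]


def behavior_features(track: Track) -> Dict[str, float]:
    """Compute illustrative behavior features from the per-frame positions."""
    pts = track.centroids
    # Total distance travelled along the trajectory.
    path = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    )
    # Straight-line distance between the first and last positions.
    net = math.hypot(pts[-1][0] - pts[0][0], pts[-1][1] - pts[0][1]) if pts else 0.0
    straightness = net / path if path > 0 else 0.0
    return {"net_displacement": net, "straightness": straightness}


def behavior_score(features: Dict[str, float], min_displacement: float = 20.0) -> float:
    """Assumed second detection result: zero for objects that barely move."""
    if features["net_displacement"] < min_displacement:
        return 0.0
    return features["straightness"]


def select_targets(
    tracks: List[Track],
    model_threshold: float = 0.5,
    behavior_threshold: float = 0.5,
) -> List[int]:
    """Keep objects for which both detection results pass (assumed fusion rule)."""
    targets = []
    for track in tracks:
        second = behavior_score(behavior_features(track))
        if track.model_score >= model_threshold and second >= behavior_threshold:
            targets.append(track.object_id)
    return targets


tracks = [
    # Moves steadily in one direction and is accepted by the model.
    Track(1, [(0, 0), (10, 0), (20, 0), (30, 0)], model_score=0.9),
    # Oscillates in place (for example, foliage), so the behavior score is zero.
    Track(2, [(0, 0), (3, 0), (0, 0), (3, 0), (0, 0)], model_score=0.8),
    # Moves steadily but is rejected by the model.
    Track(3, [(0, 0), (15, 0), (30, 0)], model_score=0.2),
]
selected = select_targets(tracks)
```

In this sketch only the object with both a high model score and directed motion is selected; an object that oscillates in place or that the model rejects is filtered out. Other feature sets and fusion rules (for example, a weighted sum of the two results) would fit the claim language equally well.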