US 12,482,114 B2
Method for detecting and/or tracking moving objects within a certain zone and sports video production system in which such a method is implemented
Michael Stuurman, Arnhem (NL); and Michel Alexander Bais, Enkhuizen (NL)
Assigned to MOBILE VIEWPOINT B.V., Alkmaar (NL)
Appl. No. 18/003,322
Filed by MOBILE VIEWPOINT B.V., Alkmaar (NL)
PCT Filed Jun. 22, 2021, PCT No. PCT/NL2021/050393
§ 371(c)(1), (2) Date Dec. 26, 2022,
PCT Pub. No. WO2021/261997, PCT Pub. Date Dec. 30, 2021.
Claims priority of application No. 2025923 (NL), filed on Jun. 26, 2020.
Prior Publication US 2023/0252653 A1, Aug. 10, 2023
Int. Cl. G06T 7/292 (2017.01); G06T 7/215 (2017.01)
CPC G06T 7/292 (2017.01) [G06T 7/215 (2017.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01)] 12 Claims
OG exemplary drawing
 
1. A method for detecting and/or tracking moving objects within a certain zone, such as a ball and/or one or more players on a sports pitch, comprising:
providing multiple physical cameras around the zone which are synchronised for, successively at regular instances after one another, taking simultaneously at every such instance a set of video frame images of the zone which form input video streams when put in a video sequence after one another, wherein the totality of video frame images of a set made at such an instance jointly cover at least the total area of the zone;
composing a sequence of panoramic views of the zone so to form a panoramic video stream by cutting away overlapping parts of the video frame images of each set and stitching together remaining parts of the video frame images of each set;
defining one or more virtual camera view(s) by selecting for each virtual camera view a corresponding partial or entire view of the panoramic views and by de-warping the selected partial or entire views into square views or views with another geometry, which form a projection of a corresponding part of the zone;
feeding each of the square, de-warped views or de-warped views with another geometry to a deep learning neural network or an AI-network so to form a corresponding virtual detector; and,
performing a detection with the virtual detectors so to determine the presence or absence of objects in the corresponding part of the zone and possibly their type or class and their location in that part,
wherein a total number (N), bigger than one, of virtual camera views is chosen for covering the complete zone, and
wherein detection of objects in the entire zone is realised by feeding at each of the instances a restricted number (M), including one or a number (M) smaller than the total number (N), of virtual camera views to the deep learning neural network or AI-network, hereby ensuring that after a certain number of instances the total number (N) of virtual camera views has been fed to the deep learning neural network or AI-network, so that detection is performed on the total zone, while the feed to the deep learning neural network or AI-network at every instance is reduced.
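As a minimal illustration of the stitching and de-warping steps recited in the claim, the Python sketch below composes a panoramic frame from a synchronised set of camera frames and extracts one rectangular (de-warped) virtual camera view from it. It assumes pre-calibrated homographies mapping each physical camera into the panorama plane and uses OpenCV's warpPerspective; the function names stitch_panorama and extract_virtual_view, and the overwrite-based handling of overlapping parts, are illustrative choices rather than the patented method.

```python
import numpy as np
import cv2  # OpenCV, used here only for perspective warping


def stitch_panorama(frames, homographies, pano_size):
    """Compose one panoramic frame from a synchronised set of camera frames.

    frames       -- list of HxWx3 uint8 images taken at the same instance
    homographies -- list of 3x3 matrices mapping each frame into the
                    panorama plane (assumed pre-calibrated)
    pano_size    -- (width, height) of the panoramic canvas
    """
    panorama = np.zeros((pano_size[1], pano_size[0], 3), dtype=np.uint8)
    for frame, H in zip(frames, homographies):
        warped = cv2.warpPerspective(frame, H, pano_size)
        # Simple overlap handling: later cameras overwrite overlapping pixels,
        # so the overlapping parts of earlier frames are effectively cut away.
        mask = warped.any(axis=2)
        panorama[mask] = warped[mask]
    return panorama


def extract_virtual_view(panorama, src_quad, out_size):
    """De-warp a selected part of the panorama into a rectangular virtual view.

    src_quad -- 4x2 array of panorama corner points delimiting the virtual
                camera view (clockwise from top-left)
    out_size -- (width, height) of the rectangular de-warped output
    """
    w, h = out_size
    dst_quad = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(np.float32(src_quad), dst_quad)
    return cv2.warpPerspective(panorama, H, out_size)
```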
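The final clause, feeding only a restricted number (M) of the N virtual camera views to the network at each instance, amounts to a round-robin schedule over the virtual views. The sketch below shows one possible reading of that schedule; the detector is an opaque callable standing in for the deep learning neural network or AI-network, and carrying forward the last detections for views not refreshed in a given instance is an added assumption, not something the claim requires.

```python
from itertools import cycle


class RoundRobinDetector:
    """Feed only M of the N virtual camera views to the detector per instance.

    After ceil(N / M) instances every virtual view has been processed at
    least once, so the whole zone is covered while the per-instance load on
    the detector stays at M views instead of N.
    """

    def __init__(self, detector, num_views, views_per_instance):
        self.detector = detector               # callable: image -> detections
        self.n = num_views                     # total number of virtual views (N)
        self.m = views_per_instance            # views fed per instance (M), M <= N
        self._order = cycle(range(num_views))  # round-robin index generator
        self.latest = {i: [] for i in range(num_views)}  # last detections per view

    def step(self, virtual_views):
        """Process one instance; virtual_views is a dict {view_index: image}."""
        selected = [next(self._order) for _ in range(self.m)]
        for idx in selected:
            self.latest[idx] = self.detector(virtual_views[idx])
        # Views not refreshed this instance keep their previous detections
        # (an assumption made for this sketch, not stated in the claim).
        return {idx: self.latest[idx] for idx in range(self.n)}
```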