US 12,333,807 B2
Object data generation for remote image processing
Moshe David, Giv'atayim (IL); Aviv Hurvitz, Tel Aviv (IL); Eyal Krupka, Shimshit (IL); Qingfen Lin, Redmond, WA (US); and Arash Ghanaie-Sichanie, Woodinville, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on May 24, 2021, as Appl. No. 17/328,592.
Prior Publication US 2022/0374636 A1, Nov. 24, 2022
Int. Cl. G06V 20/40 (2022.01); G06T 7/20 (2017.01); G06T 7/70 (2017.01); G06V 10/22 (2022.01); G06V 40/16 (2022.01); G10L 25/57 (2013.01); H04N 7/15 (2006.01)
CPC G06V 20/41 (2022.01) [G06T 7/20 (2013.01); G06T 7/70 (2017.01); G06V 10/22 (2022.01); G06V 40/161 (2022.01); G10L 25/57 (2013.01); H04N 7/15 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/30201 (2013.01)]
20 Claims
1. A system for reducing an amount of data transmitted to a remote system for remote image processing to identify a plurality of objects in a scene, the system comprising:
a processor; and
a non-transitory computer-readable medium in communication with the processor, the computer-readable medium comprising instructions that, when executed by the processor, cause the processor to control the system to perform functions of:
receiving a video stream capturing the scene including the plurality of objects that are independently movable;
identifying, within the scene captured in the received video stream, a plurality of object areas respectively corresponding to the plurality of objects, each object area capturing a visual feature of the corresponding object;
tracking the plurality of object areas within the scene captured in the received video stream over a time;
generating, based on the tracking of the plurality of object areas, a plurality of visual data sets respectively representing visual characteristics of the plurality of object areas in the scene captured, wherein generating the plurality of visual data sets is repeated at a plurality of different times such that the plurality of visual data sets is newly generated at each different time to respectively represent the visual characteristics of the plurality of object areas in the scene captured at each different time; and
in response to the plurality of visual data sets being newly generated at each different time, performing functions of:
determining a transmission priority of each newly generated visual data set based on at least one of:
a confidence value of each visual data set previously transmitted to the remote system via a communication network, the confidence value determined by the remote system and indicating a confidence level of an identity of the object corresponding to the object area represented by each newly generated visual data set;
a most recent time that the visual data set representing each object area has been transmitted to the remote system for the remote image processing; and
an occurrence of a new object area due to a new object appearing in the scene captured in the received video stream;
determining, based on the transmission priority of each newly generated visual data set, whether each newly generated visual data set needs to be included in a subset for transmission to the remote system for the remote image processing, the subset including less than all of the plurality of newly generated visual data sets; and
transmitting, to the remote system via the communication network, only the subset for the remote image processing to remotely identify, at the remote system, the object corresponding to each visual data set included in the transmitted subset.
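
The prioritization and subset-selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `VisualDataSet`, `SendRecord`, `transmission_priority`, and `select_subset`, the scoring formula, and the `stale_after` and `max_items` parameters are all assumptions made for illustration. The sketch scores each newly generated visual data set using the three factors named in the claim (remote confidence, time since last transmission, and whether the object area is new), then keeps only the highest-scoring entries, always fewer than all of them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VisualDataSet:
    object_id: int   # hypothetical tracker-assigned ID of the object area
    payload: bytes   # hypothetical encoded visual data for that area


@dataclass
class SendRecord:
    last_sent: float             # most recent transmission time, in seconds
    confidence: Optional[float]  # identity confidence returned by the remote system


def transmission_priority(ds: VisualDataSet,
                          history: Dict[int, SendRecord],
                          now: float,
                          stale_after: float = 10.0) -> float:
    """Assumed scoring rule: higher means 'send sooner'."""
    record = history.get(ds.object_id)
    if record is None:
        # Never transmitted: treat as a new object area, top priority.
        return 2.0
    # Low remote confidence raises the priority.
    uncertainty = 1.0 if record.confidence is None else 1.0 - record.confidence
    # Time since the last transmission raises the priority, capped at 1.0.
    staleness = min((now - record.last_sent) / stale_after, 1.0)
    return max(uncertainty, staleness)


def select_subset(data_sets: List[VisualDataSet],
                  history: Dict[int, SendRecord],
                  now: float,
                  max_items: int) -> List[VisualDataSet]:
    """Return the highest-priority data sets, always fewer than all of them."""
    limit = max(0, min(max_items, len(data_sets) - 1))
    ranked = sorted(data_sets,
                    key=lambda ds: transmission_priority(ds, history, now),
                    reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    history = {
        1: SendRecord(last_sent=9.0, confidence=0.95),  # recently sent, confident
        2: SendRecord(last_sent=8.0, confidence=0.40),  # identity still uncertain
    }
    frame_sets = [VisualDataSet(i, b"") for i in (1, 2, 3)]  # object 3 is new
    chosen = select_subset(frame_sets, history, now=10.0, max_items=2)
    print([ds.object_id for ds in chosen])  # prints [3, 2]
```

In this sketch the new object area (ID 3) and the low-confidence area (ID 2) are selected for transmission, while the recently sent, high-confidence area (ID 1) is withheld, which is how the amount of data sent to the remote system is reduced.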