US 11,671,561 B1
Video conference background cleanup using reference image
Shihwei Chang, Sammamish, WA (US); Robert Aaron Klegon, Chicago, IL (US); Cynthia Eshiuan Lee, Austin, TX (US); Nicholas Mueller, Fitchburg, WI (US); and Shane Paul Springer, Manchester, MI (US)
Assigned to Zoom Video Communications, Inc., San Jose, CA (US)
Filed by Zoom Video Communications, Inc., San Jose, CA (US)
Filed on Jul. 29, 2022, as Appl. No. 17/877,789.
Int. Cl. H04N 7/14 (2006.01); G06T 7/194 (2017.01); G06V 40/16 (2022.01)
CPC H04N 7/147 (2013.01) [G06T 7/194 (2017.01); G06V 40/161 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/30196 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method, comprising:
storing a reference image representing a physical background within a field of view of a camera of a client device;
receiving, via the camera and during a video conference to which the client device is connected, camera-generated visual data for output to at least one remote device connected to the video conference;
identifying, based on facial recognition applied to the camera-generated visual data, foreground imagery representing at least one person and background imagery representing content of the camera-generated visual data other than the foreground imagery;
identifying a difference between the background imagery and the reference image by identifying an item in the background imagery and determining that the item is not present at a co-located part of the reference image, the co-located part of the reference image being identified based on non-movable fixtures depicted in the background imagery and the reference image;
generating a composite image by replacing, within the background imagery of the camera-generated visual data, the item represented within the background imagery and within the identified difference with the co-located part of the reference image; and
transmitting the composite image to the at least one remote device during the video conference.
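The replacement step recited above can be illustrated with a minimal sketch. This is not the patented implementation; it assumes the foreground (person) mask has already been produced by a facial-recognition-seeded segmentation step, and it uses a simple per-pixel threshold (the hypothetical `threshold` parameter) to flag background content that departs from the stored reference:

```python
import numpy as np

def clean_background(frame, reference, foreground_mask, threshold=30):
    """Replace background regions that differ from the reference image.

    frame, reference: HxWx3 uint8 arrays (live camera frame and the
    stored reference image of the physical background).
    foreground_mask: HxW bool array, True where a person was detected
    (assumed to come from an upstream facial-recognition step; it is
    supplied directly here for illustration).
    """
    # Per-pixel absolute difference between live frame and reference.
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16)).sum(axis=2)
    # Background pixels whose content departs from the reference: an
    # "item" present in the live background but absent from the
    # co-located part of the reference image.
    changed_bg = (diff > threshold) & ~foreground_mask
    # Composite: copy the co-located reference pixels over those items,
    # leaving the foreground (the person) untouched.
    composite = frame.copy()
    composite[changed_bg] = reference[changed_bg]
    return composite
```

In practice the claim also recites aligning the two images via non-movable fixtures (e.g. walls, door frames) before comparing them; that registration step is omitted here, so the sketch assumes the frame and reference are already co-registered.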
 
10. A non-transitory computer readable medium storing instructions operable to cause one or more processors to perform operations comprising:
storing a reference image representing a physical background within a field of view of a camera of a client device;
receiving, via the camera and during a video conference to which the client device is connected, camera-generated visual data for output to at least one remote device connected to the video conference;
identifying, based on facial recognition applied to the camera-generated visual data, foreground imagery representing at least one person and background imagery representing content of the camera-generated visual data other than the foreground imagery;
identifying a difference between the background imagery and the reference image by identifying an item in the background imagery and determining that the item is not present at a co-located part of the reference image, the co-located part of the reference image being identified based on non-movable fixtures depicted in the background imagery and the reference image;
generating a composite image by replacing, within the background imagery of the camera-generated visual data, the item represented within the background imagery and within the identified difference with the co-located part of the reference image; and
transmitting the composite image to the at least one remote device during the video conference.
 
19. An apparatus, comprising:
a memory; and
a processor configured to execute instructions stored in the memory to:
store a reference image representing a physical background within a field of view of a camera of a client device;
receive, via the camera and during a video conference to which the client device is connected, camera-generated visual data for output to at least one remote device connected to the video conference;
identify, based on facial recognition applied to the camera-generated visual data, foreground imagery representing at least one person and background imagery representing content of the camera-generated visual data other than the foreground imagery;
identify a difference between the background imagery and the reference image by identifying an item in the background imagery and determining that the item is not present at a co-located part of the reference image, the co-located part of the reference image being identified based on non-movable fixtures depicted in the background imagery and the reference image;
generate a composite image by replacing, within the background imagery of the camera-generated visual data, the item represented within the background imagery and within the identified difference with the co-located part of the reference image; and
transmit the composite image to the at least one remote device during the video conference.