US 12,262,117 B2
Sensor cropped video image stabilization (VIS)
Patrick A. Carroll, Scotts Valley, CA (US); Ajay Ramesh, San Jose, CA (US); Ashwini Dwarakanath, Cupertino, CA (US); David A. Silverstein, Palo Alto, CA (US); David R. Pope, Campbell, CA (US); Michael W. Tao, San Jose, CA (US); Terence N. Tam, Santa Clara, CA (US); and Vitanshu Sharma, Cupertino, CA (US)
Assigned to Apple Inc., Cupertino, CA (US)
Filed by Apple Inc., Cupertino, CA (US)
Filed on Sep. 21, 2022, as Appl. No. 17/933,941.
Prior Publication US 2024/0098368 A1, Mar. 21, 2024
Int. Cl. H04N 23/68 (2023.01); G06T 7/38 (2017.01)

CPC H04N 23/6845 (2023.01) [G06T 7/38 (2017.01)]

18 Claims

1. A device, comprising:

a memory;

a first image capture device having a first image sensor with a first field of view (FOV) and a first resolution;

a positional sensor;

a second image capture device having a second image sensor with a second FOV and a second resolution, wherein the second FOV is different than the first FOV, and wherein the second resolution is different than the first resolution; and

one or more processors operatively coupled to the memory, wherein the one or more processors are configured to execute instructions causing the one or more processors to:

receive a first request to begin capturing a first video;

cause the first image capture device to begin to capture a first video image stream, wherein the first video image stream comprises a first plurality of images captured with the first resolution;

for each of one or more respective images of the first plurality of images:

obtain image information corresponding to one or more images in the first plurality of images captured prior to the respective image;

predict, for the respective image, and based, at least in part, on the obtained image information, an image sensor cropping region to be read out from the first image sensor; and

read out, into the memory, a first cropped version of the respective image, wherein the first cropped version of the respective image comprises only the predicted image sensor cropping region for the respective image; and

cause the second image capture device to begin to capture a second video image stream, wherein the second video image stream comprises a second plurality of images captured with the second resolution;

for at least a first image of the second plurality of images:

obtain positional information from the positional sensor corresponding to a last image captured in the first plurality of images;

predict, based, at least in part, on the obtained positional information and a projection operation of the first FOV into the second FOV, an image sensor cropping region for the first image of the second plurality of images; and

read out, into the memory, a second cropped version of the first image of the second plurality of images, wherein the second cropped version of the first image of the second plurality of images comprises only the predicted image sensor cropping region for the first image of the second plurality of images; and

produce the first video based, at least in part, on the first cropped versions of the one or more respective images of the first plurality of images and the second cropped version of the first image of the second plurality of images.
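
The two prediction steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say what the "image information" is or how the projection is computed. The sketch assumes the image information is a list of crop origins chosen for earlier frames (extrapolated at constant velocity), and that the projection is an ideal pinhole mapping between two cameras that share an optical axis and have square pixels and no lens distortion. The names `Crop`, `predict_crop`, `focal_px`, and `project_crop`, and the sign convention for `yaw` and `pitch`, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    x: int  # left edge in sensor pixels
    y: int  # top edge in sensor pixels
    w: int
    h: int


def clamp_crop(x, y, w, h, sensor_w, sensor_h):
    """Round to whole pixels and keep the window fully on the sensor."""
    w, h = min(w, sensor_w), min(h, sensor_h)
    x = max(0, min(round(x), sensor_w - w))
    y = max(0, min(round(y), sensor_h - h))
    return Crop(x, y, w, h)


def predict_crop(prior_origins, w, h, sensor_w, sensor_h):
    """Predict the next readout window from origins used for earlier frames.

    Assumption: constant-velocity extrapolation of the last two origins.
    """
    if not prior_origins:
        # No history yet: read out a centered window.
        return clamp_crop((sensor_w - w) / 2, (sensor_h - h) / 2,
                          w, h, sensor_w, sensor_h)
    x1, y1 = prior_origins[-1]
    if len(prior_origins) == 1:
        return clamp_crop(x1, y1, w, h, sensor_w, sensor_h)
    x0, y0 = prior_origins[-2]
    return clamp_crop(2 * x1 - x0, 2 * y1 - y0, w, h, sensor_w, sensor_h)


def focal_px(sensor_w, hfov_rad):
    """Pinhole focal length in pixels for a given horizontal FOV."""
    return (sensor_w / 2) / math.tan(hfov_rad / 2)


def project_crop(crop, size1, hfov1, size2, hfov2, yaw=0.0, pitch=0.0):
    """Map a crop on sensor 1 to sensor 2, shifted by a rotation (radians).

    yaw/pitch stand in for the positional-sensor reading associated with
    the last frame of the first stream; their sign convention is assumed.
    """
    (w1, h1), (w2, h2) = size1, size2
    f1, f2 = focal_px(w1, hfov1), focal_px(w2, hfov2)
    s = f2 / f1  # pixel scale between the two sensors
    # Crop center as an offset from the first sensor's optical center.
    dx = crop.x + crop.w / 2 - w1 / 2
    dy = crop.y + crop.h / 2 - h1 / 2
    cx = w2 / 2 + s * dx + f2 * math.tan(yaw)
    cy = h2 / 2 + s * dy + f2 * math.tan(pitch)
    w = min(round(crop.w * s), w2)
    h = min(round(crop.h * s), h2)
    return clamp_crop(cx - w / 2, cy - h / 2, w, h, w2, h2)
```

In this sketch only the returned window would be read out of the sensor, which mirrors the claim's requirement that each cropped version comprise only the predicted region. A real device would also need per-camera calibration (principal point, distortion, and the baseline between the cameras), none of which is modeled here.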

14. An image processing method, comprising:

receiving a first request to begin capturing a first video with a first image capture device having a first image sensor with a first field of view (FOV) and a first resolution, wherein the first image capture device is electronically coupled to a memory;

causing the first image capture device to begin to capture a first video image stream, wherein the first video image stream comprises a first plurality of images captured with the first resolution;

for each of one or more respective images of the first plurality of images:

obtaining image information corresponding to one or more images in the first plurality of images captured prior to the respective image;

predicting, for the respective image, and based, at least in part, on the obtained image information, an image sensor cropping region to be read out from the first image sensor; and

reading out, into the memory, a first cropped version of the respective image, wherein the first cropped version of the respective image comprises only the predicted image sensor cropping region for the respective image;

predicting, for at least a first image of the first plurality of images, a second cropped version of the first image, wherein the second cropped version of the first image is based, at least in part, on image information obtained from one or more images in the first plurality of images captured subsequently to the first image; and

producing the first video based, at least in part, on the first cropped versions of the one or more respective images of the first plurality of images and the second cropped version of the first image.
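
The look-ahead step of claim 14, where a second crop for an image is chosen using images captured after it, can be sketched as follows. This is an illustration under stated assumptions, not the method disclosed in the patent: it assumes each frame has a stabilization target center in sensor coordinates, smooths those targets with a centered moving average that includes later frames, and places the final crop inside the region already read out for that frame. The function name `lookahead_crop` and all of its parameters are hypothetical, and the output size is assumed to be no larger than the readout region.

```python
def lookahead_crop(centers, i, radius, readout, out_w, out_h):
    """Choose the final crop for frame i inside its read-out region.

    centers: per-frame target centers (x, y) in sensor pixels (assumed input)
    readout: (x, y, w, h) of the first cropped version of frame i
    Returns (x, y, out_w, out_h) in sensor pixels.
    """
    lo = max(0, i - radius)
    hi = min(len(centers), i + radius + 1)
    window = centers[lo:hi]  # includes frames captured after frame i
    cx = sum(c[0] for c in window) / len(window)
    cy = sum(c[1] for c in window) / len(window)
    rx, ry, rw, rh = readout
    # The second crop can only use pixels that were actually read out.
    x = max(rx, min(round(cx - out_w / 2), rx + rw - out_w))
    y = max(ry, min(round(cy - out_h / 2), ry + rh - out_h))
    return (x, y, out_w, out_h)
```

Because the readout window is predicted before later frames exist, it must be somewhat larger than the output size; the margin bounds how far the look-ahead pass can move the final crop.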