US 12,462,334 B2
Neural radiance fields for orthographic imagery
Yuanming Shu, Toronto (CA); and Weiguang Ding, Toronto (CA)
Assigned to 1000786269 ONTARIO INC., Toronto (CA)
Filed by 1000786269 ONTARIO INC., Toronto (CA)
Filed on May 5, 2025, as Appl. No. 19/198,205.
Application 19/198,205 is a continuation of application No. 19/052,473, filed on Feb. 13, 2025.
Claims priority of provisional application 63/746,311, filed on Jan. 17, 2025.
Claims priority of provisional application 63/554,620, filed on Feb. 16, 2024.
Prior Publication US 2025/0322486 A1, Oct. 16, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 3/06 (2024.01); G06T 3/04 (2024.01); G06T 3/4038 (2024.01); G06T 7/50 (2017.01)
CPC G06T 3/06 (2024.01) [G06T 3/04 (2024.01); G06T 3/4038 (2013.01); G06T 7/50 (2017.01); G06T 2200/32 (2013.01); G06T 2207/10032 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20221 (2013.01)]
18 Claims
 
1. A method comprising:
accessing a plurality of source images depicting a scene from multiple points of view;
encoding at least some of the plurality of source images each into a corresponding series of multiscale feature maps;
receiving a definition of an orthographic projection of the scene; and
generating an orthographic image of the scene corresponding to the orthographic projection, wherein the generating involves:
applying global attention, based on the definition of the orthographic projection, across a set of higher-level features of the series of multiscale feature maps of at least some of the encoded source images, to produce a first set of decoded features;
applying a convolutional and upsampling layer to the first set of decoded features to produce a second set of decoded features;
generating a depth map for the scene corresponding to the orthographic projection of the scene based on the second set of decoded features;
for each point on the depth map corresponding to a pixel that should be rendered in the orthographic image, back-projecting the point through the orthographic projection to determine a set of lower-level features of the series of multiscale feature maps to be used to decode the pixel; and
applying local attention across the set of lower-level features to produce a third set of decoded features from which pixel information can be determined.
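
The last two steps of claim 1 (back-projecting each depth-map point through the orthographic projection, then applying local attention across the selected lower-level features) can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a top-down orthographic projection, pinhole source cameras, depth measured downward from the projection plane, and plain scaled dot-product attention as a stand-in for the local attention. The names `OrthoProjection`, `PinholeCamera`, `back_project`, `project_to_source`, and `local_attention` are hypothetical and do not appear in the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class OrthoProjection:
    """Hypothetical top-down orthographic projection definition."""
    origin_x: float  # world x at the centre of pixel column 0
    origin_y: float  # world y at the centre of pixel row 0
    gsd: float       # world units per pixel
    plane_z: float   # height of the projection plane; rays travel along -z


@dataclass
class PinholeCamera:
    """Hypothetical source camera: x_cam = R * x_world + t, then pinhole intrinsics."""
    R: Tuple[Vec3, Vec3, Vec3]
    t: Vec3
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(ortho: OrthoProjection, row: int, col: int, depth: float) -> Vec3:
    """Lift an orthographic pixel and its depth value to a 3D world point."""
    x = ortho.origin_x + col * ortho.gsd
    y = ortho.origin_y - row * ortho.gsd  # assumption: rows increase toward -y
    z = ortho.plane_z - depth             # assumption: depth is measured down from the plane
    return (x, y, z)


def project_to_source(cam: PinholeCamera, p: Vec3) -> Optional[Tuple[float, float]]:
    """Return the (u, v) source-image location of p, or None if p is behind the camera."""
    xc, yc, zc = (sum(r[i] * p[i] for i in range(3)) + t for r, t in zip(cam.R, cam.t))
    if zc <= 0.0:
        return None
    return (cam.fx * xc / zc + cam.cx, cam.fy * yc / zc + cam.cy)


def local_attention(query: Sequence[float],
                    keys: Sequence[Sequence[float]],
                    values: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of one query over a small set of sampled features."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


# Example: a camera 100 units above the world origin, looking straight down.
nadir_cam = PinholeCamera(R=((1, 0, 0), (0, -1, 0), (0, 0, -1)), t=(0, 0, 100),
                          fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
ortho = OrthoProjection(origin_x=-10.0, origin_y=10.0, gsd=1.0, plane_z=100.0)
point = back_project(ortho, row=10, col=10, depth=100.0)
uv = project_to_source(nadir_cam, point)
```

In this sketch the returned `(u, v)` location is where a lower-level feature would be sampled from each source view (rescaled to the resolution of the feature map in question). The per-view samples would then serve as the `keys` and `values` passed to `local_attention`. How the query is formed from the earlier decoded features is not specified here and is left open.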