US 12,380,569 B1
Layout extraction system for regional annotation of images
Taesung Park, Palo Alto, CA (US); and Michaël Yanis Gharbi, Palo Alto, CA (US)
Assigned to REVE AI, INC., Palo Alto, CA (US)
Filed by Reve AI, Inc., Palo Mill Rd., CA (US)
Filed on Nov. 19, 2024, as Appl. No. 18/952,835.
Int. Cl. G06T 7/11 (2017.01); G06F 40/40 (2020.01); G06T 5/70 (2024.01); G06T 7/50 (2017.01)

CPC G06T 7/11 (2017.01) [G06F 40/40 (2020.01); G06T 5/70 (2024.01); G06T 7/50 (2017.01); G06T 2210/12 (2013.01)]

23 Claims

21. A system, comprising:

a processor programmed to:

access an input image;

generate a plurality of segments based on one or more segmentation models and the input image, each segment from among the plurality of segments representing a corresponding salient object;

generate a depth map based on a depth estimation model;

layer the plurality of segments, based on the depth map and border regions between pairs of segments, to generate a plurality of ordered segments, wherein to layer the plurality of segments, the processor is programmed to:

generate a pairwise depth ordering of the plurality of segments that considers only the border region between each pair of segments and provides a relative ordering of segments in each pair with respect to one another; and

perform global topological sorting based on the pairwise depth ordering, wherein the respective depth value of each is based on the global topological sorting; and

execute a vision-language model to generate a text annotation of the image based on the plurality of ordered segments.
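
The layering step recited in claim 21 (pairwise depth ordering restricted to border regions, followed by global topological sorting) can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (border_pixels, pairwise_order, global_order) is hypothetical. It further assumes that segments are sets of (row, col) pixels, that the depth map is a 2D list in which smaller values are closer to the camera, that the border region is the set of 4-connected pixels where two segments touch, and that a pair is ordered by comparing mean depth on each side of that border. Cyclic pairwise orderings are not resolved here; graphlib raises CycleError in that case.

```python
from graphlib import TopologicalSorter

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def border_pixels(seg_a, seg_b):
    """Pixels of seg_a that touch seg_b (4-neighborhood); an assumed border definition."""
    return {(r, c) for (r, c) in seg_a
            if any((r + dr, c + dc) in seg_b for dr, dc in NEIGHBORS)}

def mean_depth(pixels, depth):
    return sum(depth[r][c] for r, c in pixels) / len(pixels)

def pairwise_order(segments, depth):
    """Return (front, back) index pairs, looking only at each pair's shared border."""
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            bi = border_pixels(segments[i], segments[j])
            bj = border_pixels(segments[j], segments[i])
            if not bi or not bj:
                continue  # the segments do not touch, so no pairwise constraint
            di, dj = mean_depth(bi, depth), mean_depth(bj, depth)
            if di != dj:
                # Assumption: the smaller mean border depth is in front.
                pairs.append((i, j) if di < dj else (j, i))
    return pairs

def global_order(num_segments, pairs):
    """Topologically sort segment indices front to back from the pairwise constraints."""
    sorter = TopologicalSorter({k: set() for k in range(num_segments)})
    for front, back in pairs:
        sorter.add(back, front)  # front must come before back
    return list(sorter.static_order())

def strip(col_start, col_end, rows=2):
    """Hypothetical helper: a vertical strip of pixels used as a toy segment."""
    return {(r, c) for r in range(rows) for c in range(col_start, col_end)}

# Toy 2x6 image: depth decreases left to right, so the right strip is closest.
depth = [[5 - c for c in range(6)] for _ in range(2)]
segments = [strip(0, 2), strip(2, 4), strip(4, 6)]

pairs = pairwise_order(segments, depth)
order = global_order(len(segments), pairs)
# One possible reading of "respective depth value": the rank in the global order.
layer = {seg: rank for rank, seg in enumerate(order)}
```

In this toy input, segments 0 and 2 share no border, so they are never compared directly; their relative order comes only from the global sort through segment 1.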