US 12,008,057 B2
Determining a visual theme in a collection of media items
Kristina Bohl, Boulder, CO (US); Ivan Oropeza, Santa Clara, CA (US); Lily Berg, Los Angeles, CA (US); Tracy Gu, Mountain View, CA (US); Ethan Schreiber, New York City, NY (US); Shanfeng Zhang, Mountain View, CA (US); Howard Zhou, Mountain View, CA (US); David Hendon, San Francisco, CA (US); Zhen Li, Mountain View, CA (US); Futang Peng, Mountain View, CA (US); Teresa Ko, Los Angeles, CA (US); and Jason Chang, Cupertino, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 25, 2021, as Appl. No. 17/509,767.
Claims priority of provisional application 63/189,658, filed on May 17, 2021.
Claims priority of provisional application 63/187,390, filed on May 11, 2021.
Prior Publication US 2022/0365990 A1, Nov. 17, 2022
Int. Cl. G06F 16/9535 (2019.01); G06F 16/906 (2019.01); G06F 16/9538 (2019.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)

CPC G06F 16/9535 (2019.01) [G06F 16/906 (2019.01); G06F 16/9538 (2019.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)]

20 Claims

1. A computer-implemented method comprising:

determining, based on pixels of images or videos from a collection of media items, image embeddings for clusters of media items such that the media items in each cluster have a visual similarity, wherein:

each media item is associated with a location and an associated timestamp,

media items captured within a predetermined time period are associated with an episode, and

the collection of media items is associated with a user account;

selecting a subset of the clusters of media items based on:

corresponding media items in each cluster having the visual similarity within a range of threshold visual similarity values; and

corresponding associated timestamps such that the corresponding media items in the subset of the clusters of media items meet a temporal diversity criteria that excludes more than a first predetermined number of the corresponding media items from the episode;

responsive to a number of the corresponding media items in the subset of clusters including more than a second predetermined number of media items, removing one or more of the corresponding media items based on location such that the subset of the clusters of media items meets a location diversity criteria;

causing a user interface to be displayed that includes the subset of the clusters of media items;

receiving aggregated feedback from users for aggregated subsets of clusters of media items;

providing the aggregated feedback to a machine-learning model, wherein parameters of the machine-learning model are updated based on the aggregated feedback; and

modifying the image embeddings for the clusters of media items using the parameters of the machine-learning model with the updated parameters.
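
By way of illustrative, non-limiting example, the selection steps recited in claim 1 (the visual similarity range, the temporal diversity criteria, and the location diversity criteria) could be sketched as follows. The claim does not specify a similarity measure, how episodes are delimited, or which items are removed, so everything named here is an assumption made for illustration: the `MediaItem` type, the `select_cluster` function, cosine similarity averaged over item pairs, fixed-width time windows as episodes, and removal of the latest item from the most common location. The feedback and model-update steps are not sketched.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class MediaItem:
    """Hypothetical record for one media item in a cluster."""
    item_id: str
    embedding: Tuple[float, ...]  # image embedding derived from pixels
    timestamp: float              # capture time in seconds
    location: str                 # coarse place label


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (an assumed measure of visual similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_cluster(
    cluster: Sequence[MediaItem],
    low: float,
    high: float,
    period: float,
    max_per_episode: int,
    max_items: int,
) -> Optional[List[MediaItem]]:
    """Return the retained items of a cluster, or None if it is rejected."""
    if len(cluster) < 2:
        return None

    # Visual similarity within a range of threshold values: the mean
    # pairwise similarity must fall inside [low, high] (assumed reading).
    sims = [cosine(a.embedding, b.embedding) for a, b in combinations(cluster, 2)]
    mean_sim = sum(sims) / len(sims)
    if not (low <= mean_sim <= high):
        return None

    # Temporal diversity: keep at most max_per_episode items per episode.
    # An episode is assumed to be a fixed window of `period` seconds.
    kept: List[MediaItem] = []
    per_episode: Counter = Counter()
    for item in sorted(cluster, key=lambda i: i.timestamp):
        episode = int(item.timestamp // period)
        if per_episode[episode] < max_per_episode:
            per_episode[episode] += 1
            kept.append(item)

    # Location diversity: while the cluster holds more than max_items,
    # drop the latest item from the most frequently occurring location.
    while len(kept) > max_items:
        top_location, _ = Counter(i.location for i in kept).most_common(1)[0]
        for idx in range(len(kept) - 1, -1, -1):
            if kept[idx].location == top_location:
                del kept[idx]
                break

    return kept
```

In this sketch the upper bound `high` lets a caller reject clusters of near-duplicates, while `low` rejects clusters with no common visual theme; both bounds, like the other parameters, are placeholders rather than values taken from the patent.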