US 12,488,034 B2
Searching editing components based on text using a machine learning model
Sijie Zhu, Los Angeles, CA (US); Lu Xu, Los Angeles, CA (US); Fan Chen, Los Angeles, CA (US); and Longyin Wen, Los Angeles, CA (US)
Assigned to Lemon Inc., Grand Cayman (KY)
Filed by Lemon Inc., Grand Cayman (KY)
Filed on Mar. 7, 2024, as Appl. No. 18/599,097.
Prior Publication US 2025/0284723 A1, Sep. 11, 2025
Int. Cl. G06F 17/00 (2019.01); G06F 16/334 (2025.01); G06F 16/338 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/338 (2019.01) [G06F 16/334 (2019.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A method for searching editing components based on text using a machine learning model, comprising:
acquiring a plurality of visual embeddings indicative of a plurality of visual editing components by the machine learning model, wherein the machine learning model is trained to align visual embeddings with text embeddings by projecting the visual embeddings and the text embeddings into a common space, and wherein the plurality of visual editing components comprise effects configured to be applied to videos;
projecting the plurality of visual embeddings indicative of the plurality of visual editing components into the common space by a first sub-model of the machine learning model;
receiving a text query input by a user;
generating a text embedding indicative of the text query;
projecting the text embedding into the common space by a second sub-model of the machine learning model;
determining at least one visual editing component among the plurality of visual editing components based on the projected text embedding and the plurality of projected visual embeddings in the common space;
displaying information indicative of the at least one visual editing component via a user interface; and
applying the at least one visual editing component to a video in response to user input selecting the at least one visual editing component.
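The claimed retrieval pipeline can be illustrated with a minimal sketch. The two sub-models are stood in for by random linear projections (`W_visual`, `W_text`), and the component names, embedding dimensions, and query embedding are all hypothetical placeholders; in the claimed method the projections would be jointly trained so that matching text and visual pairs align in the common space.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 64    # dimensionality of the raw embeddings (assumed)
COMMON_DIM = 32   # dimensionality of the shared common space (assumed)

# Hypothetical stand-ins for the two sub-models: each is a linear
# projection into the common space (untrained here, random weights).
W_visual = rng.normal(size=(EMBED_DIM, COMMON_DIM))  # first sub-model
W_text = rng.normal(size=(EMBED_DIM, COMMON_DIM))    # second sub-model

def project(embeddings, W):
    """Project embeddings into the common space and L2-normalize."""
    z = embeddings @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Visual embeddings for a small library of editing components (effects);
# the names and vectors are illustrative placeholders.
component_names = ["glitch", "slow_zoom", "film_grain", "neon_glow"]
visual_embeddings = rng.normal(size=(len(component_names), EMBED_DIM))
visual_common = project(visual_embeddings, W_visual)  # precomputed index

# A text query would be embedded by a text encoder; a random vector
# stands in for that embedding here.
query_embedding = rng.normal(size=(1, EMBED_DIM))
query_common = project(query_embedding, W_text)

# Cosine similarity in the common space (dot product of unit vectors)
# selects the visual editing component that best matches the query.
scores = (query_common @ visual_common.T).ravel()
best = component_names[int(np.argmax(scores))]
print(best)
```

With trained projections, the highest-scoring component (or the top few) would be displayed to the user and, on selection, applied to the video as in the final claim limitation.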