| CPC G06F 16/338 (2019.01) [G06F 16/334 (2019.01); G06N 20/00 (2019.01)] | 20 Claims |

|
1. A method for searching editing components based on text using a machine learning model, comprising:
acquiring a plurality of visual embeddings indicative of a plurality of visual editing components by the machine learning model, wherein the machine learning model is trained to align visual embeddings with text embeddings by projecting the visual embeddings and the text embeddings into a common space, and wherein the plurality of visual editing components comprise effects configured to be applied to videos;
projecting the plurality of visual embeddings indicative of the plurality of visual editing components into the common space by a first sub-model of the machine learning model;
receiving a text query input by a user;
generating a text embedding indicative of the text query;
projecting the text embedding into the common space by a second sub-model of the machine learning model;
determining at least one visual editing component among the plurality of visual editing components based on the projected text embedding and the plurality of projected visual embeddings in the common space;
displaying information indicative of the at least one visual editing component via a user interface; and
applying the at least one visual editing component to a video in response to user input selecting the at least one visual editing component.
|
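
As a rough illustration of the retrieval flow recited in claim 1, the following Python sketch ranks effects against a query in a shared space. Everything specific in it is an assumption made for illustration. The claim does not say how the sub-models are built or how closeness in the common space is measured. Here the two sub-models are stood in for by fixed linear projections (`VISUAL_PROJ`, `TEXT_PROJ`), closeness is measured by cosine similarity, the embeddings are toy vectors, and names such as `search_components` and `EFFECTS` are hypothetical.

```python
import math

# Hypothetical stand-ins for the two trained sub-models: fixed linear maps
# into a 2-D common space. A real model would learn these weights.
VISUAL_PROJ = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]]   # 3-D visual embedding -> 2-D common space
TEXT_PROJ = [[1.0, 0.0],
             [0.0, 1.0]]          # 2-D text embedding -> 2-D common space


def project(matrix, vector):
    """Apply a linear projection (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def search_components(text_embedding, visual_embeddings, top_k=1):
    """Rank components by similarity to the query in the common space."""
    query = project(TEXT_PROJ, text_embedding)
    scored = [
        (cosine(query, project(VISUAL_PROJ, emb)), name)
        for name, emb in visual_embeddings.items()
    ]
    # Highest similarity first (assumed ranking rule, not from the claim).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy visual embeddings for three hypothetical video effects.
EFFECTS = {
    "glitch": [0.9, 0.1, 0.3],
    "sparkle": [0.1, 0.9, 0.2],
    "blur": [0.5, 0.5, 0.7],
}

# Toy text embedding standing in for an encoded user query.
print(search_components([0.2, 1.0], EFFECTS, top_k=2))  # ['sparkle', 'blur']
```

The sketch covers only the projecting and determining steps. Generating the text embedding from a raw query, displaying results in a user interface, and applying the selected effect to a video are outside its scope.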