US 12,277,504 B1
Resource-level classification using language models
Andrey Nikitin, Kiryat Ono (IL); Guye Vered, Rishon Letzion (IL); Netta Simhi, Tel Aviv (IL); Inbar Polad, Ramat Gan (IL); Hadas Daniel, Tel Aviv (IL); Yuval Goldberg, Ramat Gan (IL); Dvir Horovitz, Givaatayim (IL); Michal Shaked, Nofit (IL); Itay Rutman, Tel Aviv (IL); Shiran Bareli, Tel Aviv (IL); Yotam Segev, New York, NY (US); Itamar Bar-Ilan, New York, NY (US); and Yonatan Itai, Tel Aviv (IL)
Assigned to Cyera, Ltd., Tel Aviv (IL)
Filed by Cyera, Ltd., Tel Aviv (IL)
Filed on Nov. 22, 2024, as Appl. No. 18/956,386.
Int. Cl. G06N 3/02 (2006.01); G06N 3/091 (2023.01)
CPC G06N 3/091 (2023.01)
8 Claims
 
5. A system for machine learning classifier training, comprising:
a processing circuitry; and
a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to:
refine outputs of a language model by providing a prompt and a set of first resources to the language model over a plurality of iterations, wherein outputs of the language model generated by the language model at each iteration of the plurality of iterations include a plurality of classifications for the set of first resources input to the language model at the iteration, wherein each iteration of the plurality of iterations further includes determining an accuracy for the plurality of classifications output by the language model at the iteration based on a semantic similarity between the plurality of classifications output by the language model at the iteration for the set of first resources and a plurality of corresponding reference classifications for the set of first resources;
apply the language model to a set of second resources when the outputs of the language model have been refined, wherein the language model outputs a set of classifications for the set of second resources;
label training data with respect to the set of second resources based on the set of classifications output by the language model for the set of second resources in order to create a set of labeled training data; and
train a classifier machine learning model via supervised machine learning based on the set of labeled training data in order to produce a trained classifier machine learning model.
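The four steps recited in claim 5 (refine, apply, label, train) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The claim does not say how refinement is carried out or how semantic similarity is computed, so the sketch assumes that each iteration tries a different candidate prompt and that similarity is a token-level Jaccard overlap. All names (toy_model, semantic_similarity, refine, train_classifier) and the 0.5 and 0.9 thresholds are hypothetical, and the token-count classifier is only a stand-in for a classifier machine learning model.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

# Hypothetical type: any callable mapping (prompt, resources) to one label per resource.
LanguageModel = Callable[[str, List[str]], List[str]]


def semantic_similarity(a: str, b: str) -> float:
    """Assumed stand-in for semantic similarity: Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def accuracy(outputs: List[str], references: List[str], threshold: float = 0.5) -> float:
    """Fraction of outputs that are similar enough to their reference classification."""
    hits = sum(semantic_similarity(o, r) >= threshold for o, r in zip(outputs, references))
    return hits / len(references)


def refine(model: LanguageModel, prompts: List[str], first_resources: List[str],
           references: List[str], target: float = 0.9) -> Tuple[str, float]:
    """One iteration per candidate prompt (an assumption); keep the most accurate one."""
    best_prompt, best_acc = prompts[0], -1.0
    for prompt in prompts:
        acc = accuracy(model(prompt, first_resources), references)
        if acc > best_acc:
            best_prompt, best_acc = prompt, acc
        if best_acc >= target:
            break  # outputs are considered refined
    return best_prompt, best_acc


def train_classifier(labeled: List[Tuple[str, str]]) -> Callable[[str], str]:
    """Supervised training: count token/label co-occurrences, then predict by token votes."""
    token_label_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled:
        label_counts[label] += 1
        for tok in text.lower().split():
            token_label_counts[tok][label] += 1
    default = label_counts.most_common(1)[0][0]

    def predict(text: str) -> str:
        votes = Counter()
        for tok in text.lower().split():
            votes.update(token_label_counts.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else default

    return predict


def toy_model(prompt: str, resources: List[str]) -> List[str]:
    """Toy stand-in for a language model; only the 'detailed' prompt yields useful labels."""
    if "detailed" not in prompt:
        return ["unknown" for _ in resources]
    return ["financial record" if "invoice" in r else "personal data" for r in resources]


# Step 1: refine on first resources that have reference classifications.
first_resources = ["invoice 1001 total due", "name and home address"]
references = ["financial record", "personal data"]
prompt, acc = refine(toy_model, ["classify", "classify in detailed terms"],
                     first_resources, references)

# Step 2: apply the refined setup to second (unlabeled) resources.
second_resources = ["invoice 2002 amount paid", "customer name and phone"]
labels = toy_model(prompt, second_resources)

# Step 3: label training data. Step 4: train the classifier on it.
labeled_training_data = list(zip(second_resources, labels))
classifier = train_classifier(labeled_training_data)
```

In this sketch the language model is used only to produce labels; the trained classifier can then be called on new resources, for example classifier("invoice 3003 overdue"), without further calls to the language model.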