US 12,265,890 B2
Extracted model adversaries for improved black box attacks
Naveen Jafer Nizar, Chennai (IN); and Ariel Gedaliah Kobren, Cambridge, MA (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Dec. 9, 2020, as Appl. No. 17/115,941.
Prior Publication US 2022/0051134 A1, Feb. 17, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 18/2113 (2023.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 18/2113 (2023.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01)]
22 Claims
 
1. One or more non-transitory machine-readable media storing instructions which, when executed by one or more processors, cause:
executing a first plurality of queries on a first model to obtain a first set of results corresponding to the first plurality of queries;
generating training data comprising the first plurality of queries and the first set of results corresponding to the first plurality of queries;
applying the training data to train a second model to generate a second set of results in response to a second plurality of queries, the second set of results meeting one or more similarity criteria to a third set of results for the second plurality of queries generated by the first model;
modifying informational content for query execution to include a first set of one or more adversarial perturbations;
executing a query on the second model to generate a fourth set of results based on the modified informational content comprising the first set of one or more adversarial perturbations;
determining that k highest ranked results in the fourth set of results are incorrect; and
responsive to determining that the k highest ranked results in the fourth set of results are incorrect, identifying the query for analysis or improvement of the first model.
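
The steps recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy `first_model`, the nearest-centroid learner standing in for the second model, the agreement rate used as the similarity criterion, the `perturb` rule, and the choice to treat the first model's answer on the unmodified content as the "correct" result.

```python
import random

def first_model(x):
    """Hypothetical black-box first model: returns labels ranked best-first."""
    scores = {"a": x[0], "b": x[1], "c": -x[0] - x[1]}
    return sorted(scores, key=scores.get, reverse=True)

def extract_training_data(queries):
    """Execute queries on the first model; keep (query, top result) pairs."""
    return [(q, first_model(q)[0]) for q in queries]

def train_second_model(training_data):
    """Fit a nearest-centroid surrogate (an assumed, deliberately simple learner)."""
    sums, counts = {}, {}
    for (x0, x1), label in training_data:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x0
        s[1] += x1
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: (s[0] / counts[lab], s[1] / counts[lab])
                 for lab, s in sums.items()}

    def second_model(x):
        dist = {lab: (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
                for lab, c in centroids.items()}
        return sorted(dist, key=dist.get)  # nearest centroid ranked first

    return second_model, centroids

def perturb(x, target_centroid, eps):
    """Assumed perturbation: move x a fraction eps toward a wrong label's centroid."""
    return tuple(xi + eps * (ci - xi) for xi, ci in zip(x, target_centroid))

def top_k_incorrect(ranked, correct, k):
    """True when the correct result is absent from the k highest ranked results."""
    return correct not in ranked[:k]

def find_flagged_queries(inputs, second_model, centroids, k=1, eps=0.6):
    """Perturb each input, query the second model, flag top-k failures."""
    flagged = []
    for x in inputs:
        correct = first_model(x)[0]  # assumption: clean answer is the reference
        wrong = [lab for lab in second_model(x) if lab != correct]
        if not wrong:
            continue
        x_adv = perturb(x, centroids[wrong[0]], eps)
        if top_k_incorrect(second_model(x_adv), correct, k):
            flagged.append((x, x_adv, correct))  # candidate for analysis
    return flagged

random.seed(0)

def rand_point():
    return (random.uniform(-1, 1), random.uniform(-1, 1))

# First plurality of queries: build training data and train the second model.
second_model, centroids = train_second_model(
    extract_training_data([rand_point() for _ in range(300)]))

# Second plurality of queries: check the (assumed) similarity criterion.
check_queries = [rand_point() for _ in range(200)]
agreement = sum(first_model(q)[0] == second_model(q)[0]
                for q in check_queries) / len(check_queries)

# Adversarial step: perturbed queries whose top-k results are all incorrect.
flagged = find_flagged_queries(check_queries[:20], second_model, centroids, k=1)
print(f"agreement={agreement:.2f}, flagged={len(flagged)} of 20")
```

The sketch crafts perturbations using the second model's internals (its centroids), so no further queries to the first model are needed during the search. In a real setting, flagged queries would then be replayed against the first model or reviewed by hand.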