US 11,748,350 B2
System and method for machine learning for system deployments without performance regressions
Irene Rogan Shaffer, Cambridge, MA (US); Remmelt Herbert Lieve Ammerlaan, Cambridge, MA (US); Gilbert Antonius, Cambridge, MA (US); Marc T. Friedman, Seattle, WA (US); Abhishek Roy, Bellevue, WA (US); Lucas Rosenblatt, Somerville, MA (US); Vijay Kumar Ramani, Boston, MA (US); Shi Qiao, Mercer Island, WA (US); Alekh Jindal, Sammamish, WA (US); Peter Orenberg, Braintree, MA (US); H M Sajjad Hossain, Waltham, MA (US); Soundararajan Srinivasan, Cambridge, MA (US); Hiren Shantilal Patel, Bothell, WA (US); and Markus Weimer, Kirkland, WA (US)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Apr. 3, 2020, as Appl. No. 16/840,205.
Claims priority of provisional application 62/979,808, filed on Feb. 21, 2020.
Prior Publication US 2021/0263932 A1, Aug. 26, 2021
Int. Cl. G06F 11/34 (2006.01); G06F 16/2453 (2019.01); G06F 16/901 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/24542 (2019.01) [G06F 11/3466 (2013.01); G06F 16/9024 (2019.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A method performed by a computing system, the method comprising:
determining, for each of a plurality of query sets in a plurality of query jobs, a query set feature representation, based on a first intermediate representation of features corresponding to each of a plurality of default query plan graphs and a second intermediate representation of features corresponding to each of a plurality of respective optimized query plan graphs;
executing a first subset of the query sets;
generating execution data of each query set of the first subset, the first subset selected based on a difference, determined by a difference model, between a default value of a performance metric and an optimized value of the performance metric associated with each of the query sets with respective ones of the query set feature representations; and
deploying, to a query optimizer in a database host, optimized query models corresponding to a second subset of the query sets that is selected from the first subset based on a retrained version of the difference model that is retrained according to the execution data.
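
The flow recited in claim 1 can be sketched in a few lines of Python. This is only an illustrative reading of the claim, not the patented implementation. Every name here (`feature_representation`, `DifferenceModel`, `run_pipeline`, the `execute` callback) is hypothetical. The element-wise feature delta and the linear model trained by stochastic gradient descent are stand-ins chosen for brevity, since the claim does not specify either. The sketch assumes a lower metric value is better (for example, runtime), so a positive predicted difference means the optimized plan is expected to win.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def feature_representation(default_feats: Sequence[float],
                           optimized_feats: Sequence[float]) -> Vector:
    """Combine the two intermediate representations into one vector.

    Assumption: an element-wise delta (default minus optimized).
    """
    return [d - o for d, o in zip(default_feats, optimized_feats)]


class DifferenceModel:
    """Toy linear model of (default metric - optimized metric)."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def retrain(self, xs: Sequence[Vector], ys: Sequence[float],
                lr: float = 0.01, epochs: int = 1000) -> None:
        # Plain per-sample gradient descent on squared error.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.predict(x) - y
                self.weights = [w - lr * err * xi
                                for w, xi in zip(self.weights, x)]


def run_pipeline(plans: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                 execute: Callable[[str], Tuple[float, float]],
                 model: DifferenceModel,
                 threshold: float = 0.0) -> Tuple[List[str], List[str]]:
    # Step 1: one feature representation per query set.
    reps = {name: feature_representation(d, o)
            for name, (d, o) in plans.items()}

    # Step 2: first subset = query sets the model predicts will improve.
    first = [n for n, x in reps.items() if model.predict(x) > threshold]

    # Step 3: execute the first subset and record the observed
    # (default, optimized) metric values as execution data.
    execution_data = {n: execute(n) for n in first}

    # Step 4: retrain the difference model on the observed differences.
    xs = [reps[n] for n in first]
    ys = [d - o for d, o in execution_data.values()]
    model.retrain(xs, ys)

    # Step 5: second subset = members of the first subset that the
    # retrained model still predicts as improvements. Only these would
    # be deployed to the query optimizer.
    second = [n for n in first if model.predict(reps[n]) > threshold]
    return first, second


if __name__ == "__main__":
    # Made-up plan features and runtimes, for illustration only.
    demo_plans = {
        "q1": ([10.0, 1.0], [4.0, 1.0]),
        "q2": ([8.0, 2.0], [7.0, 0.0]),
        "q3": ([5.0, 1.0], [6.0, 1.0]),
    }
    demo_runtimes = {"q1": (100.0, 40.0), "q2": (50.0, 70.0)}
    first, second = run_pipeline(demo_plans, lambda n: demo_runtimes[n],
                                 DifferenceModel([1.0, 1.0]))
    print("executed:", first, "deployed:", second)
```

In this sketch a query set reaches deployment only if it passes the model twice: once before execution, to justify the cost of running it, and once after the model has been corrected by the measured results. A query set whose optimized plan ran slower than its default plan therefore pulls the retrained prediction below the threshold and is left out of the second subset.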