US 11,914,993 B1
Example-based synthesis of rules for detecting violations of software coding practices
Pranav Garg, Secaucus, NJ (US); Sengamedu Hanumantha Rao Srinivasan, Seattle, WA (US); Benjamin Robert Liblit, Leesburg, VA (US); Rajdeep Mukherjee, San Jose, CA (US); Omer Tripp, San Jose, CA (US); and Neela Sawant, Bangalore (IN)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2021, as Appl. No. 17/364,768.
Int. Cl. G06F 8/77 (2018.01); G06N 20/00 (2019.01)
CPC G06F 8/77 (2013.01) [G06N 20/00 (2019.01)] 20 Claims
 
1. A system, comprising:
one or more computing devices;
wherein the one or more computing devices include instructions that upon execution on or across the one or more computing devices cause the one or more computing devices to:
obtain a first collection of code example pairs, wherein a particular code example pair of the first collection comprises a positive source code example and a negative source code example, wherein the positive source code example utilizes a first recommended coding technique to achieve a programming objective, and wherein the negative source code example is directed to the programming objective and does not utilize the first recommended coding technique;
generate respective per-example transformed representations of individual ones of the source code examples of the first collection, wherein an individual per-example transformed representation indicates (a) at least a portion of a data flow within a source code example and (b) at least a portion of a control flow within the source code example;
construct, using at least the respective per-example transformed representations, an aggregate representation of the first collection, wherein the aggregate representation includes a plurality of nodes and a plurality of edges, wherein individual ones of the nodes correspond to respective source code elements present in at least some code example pairs of the first collection of code example pairs, wherein an edge linking a first node of the plurality of nodes to a second node of the plurality of nodes represents a dependency detected between a source code element corresponding to the first node and a source code element corresponding to the second node;
determine, using one or more machine learning models to which at least a portion of the aggregate representation is provided as input, a rule to automatically detect whether a target set of source code includes one or more code examples which do not utilize the first recommended coding technique, wherein the rule comprises a plurality of predicates associated with respective nodes of the aggregate representation, including at least one predicate identified after a split is introduced into a decision tree for the rule, and wherein the split is introduced based at least in part on a determination that a proposed version of the rule is insufficient to distinguish at least some positive source code examples from corresponding negative source code examples; and
provide, based at least in part on a result of applying the rule to a particular set of source code, an indication via one or more programmatic interfaces that the particular set of source code does not utilize the first recommended coding technique.
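The first two steps of claim 1 describe obtaining positive/negative code example pairs and producing per-example representations that capture both data flow and control flow. The sketch below shows one way to model those steps; it is not the patented implementation. It assumes the examples are Python snippets parsed with the standard `ast` module. The names `CodeExamplePair`, `ExampleGraph` and `build_graph`, and the `call:`/`with:` label scheme, are hypothetical. Control flow is simplified to statement order, and data flow is simplified to def-use links between variable names.

```python
import ast
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class CodeExamplePair:  # hypothetical container for one positive/negative pair
    objective: str
    positive: str  # achieves the objective with the recommended technique
    negative: str  # same objective, recommended technique not used


@dataclass
class ExampleGraph:  # hypothetical per-example transformed representation
    labels: list[str] = field(default_factory=list)                   # one node per statement
    control_edges: set[tuple[int, int]] = field(default_factory=set)  # statement i -> statement j
    data_edges: set[tuple[int, int]] = field(default_factory=set)     # definition -> use


def flatten(stmts: list[ast.stmt]) -> Iterator[ast.stmt]:
    """Visit statements in source order, descending into `with` bodies."""
    for stmt in stmts:
        yield stmt
        if isinstance(stmt, ast.With):
            yield from flatten(stmt.body)


def header_nodes(stmt: ast.stmt) -> list[ast.AST]:
    """AST nodes belonging to the statement itself (for `with`, only its header)."""
    if isinstance(stmt, ast.With):
        return [n for item in stmt.items for n in ast.walk(item)]
    return list(ast.walk(stmt))


def label_of(stmt: ast.stmt) -> str:
    """Coarse node label such as 'with:open' or 'call:read' (assumed labeling scheme)."""
    for node in header_nodes(stmt):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "?")
            return f"with:{name}" if isinstance(stmt, ast.With) else f"call:{name}"
    return type(stmt).__name__.lower()


def build_graph(source: str) -> ExampleGraph:
    stmts = list(flatten(ast.parse(source).body))
    graph = ExampleGraph(labels=[label_of(s) for s in stmts])
    last_def: dict[str, int] = {}  # variable name -> index of its most recent definition
    for i, stmt in enumerate(stmts):
        if i > 0:
            graph.control_edges.add((i - 1, i))  # simplification: straight-line code only
        names = [n for n in header_nodes(stmt) if isinstance(n, ast.Name)]
        for n in names:  # process uses first, so `x = x + 1` links to the earlier definition
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                graph.data_edges.add((last_def[n.id], i))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = i
    return graph


pair = CodeExamplePair(
    objective="read a configuration file",
    positive='with open("cfg.txt") as f:\n    data = f.read()\nprint(data)\n',
    negative='f = open("cfg.txt")\ndata = f.read()\nprint(data)\n',
)
for kind, source in (("positive", pair.positive), ("negative", pair.negative)):
    g = build_graph(source)
    print(kind, g.labels, sorted(g.data_edges))
```

Both snippets produce the same data-flow shape. The only difference is the `with:open` node versus the `call:open` node, and that is the kind of distinction the later rule-learning step has to pick up.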
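The aggregation step of claim 1 merges the per-example representations into a single graph. Its nodes are source code elements seen across the collection, and its edges are dependencies detected between those elements. The sketch below is hedged in two ways. First, it assumes each per-example graph has already been reduced to a list of node labels plus `(src, dst, kind)` edges. Second, keying nodes by label text is a simplification of whatever node matching the claimed system performs. `AggregateGraph`, `discriminative_nodes` and the toy labels are illustrative inventions.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumed per-example format: node labels plus (src_index, dst_index, kind) edges,
# where kind is "data" (def-use) or "ctrl" (control flow).
PerExample = tuple[list[str], list[tuple[int, int, str]]]


@dataclass
class AggregateGraph:  # hypothetical aggregate representation of a collection
    pos_nodes: Counter = field(default_factory=Counter)  # label -> # positive examples containing it
    neg_nodes: Counter = field(default_factory=Counter)  # label -> # negative examples containing it
    edges: Counter = field(default_factory=Counter)      # (src_label, dst_label, kind) -> # examples

    def add_example(self, graph: PerExample, positive: bool) -> None:
        labels, edges = graph
        # Count each label and each labeled edge at most once per example.
        (self.pos_nodes if positive else self.neg_nodes).update(set(labels))
        self.edges.update({(labels[s], labels[d], kind) for s, d, kind in edges})

    def nodes(self) -> set[str]:
        return set(self.pos_nodes) | set(self.neg_nodes)

    def discriminative_nodes(self) -> set[str]:
        """Labels seen in only one polarity: natural candidates for rule predicates."""
        return {n for n in self.nodes() if (self.pos_nodes[n] > 0) != (self.neg_nodes[n] > 0)}


pairs: list[tuple[PerExample, PerExample]] = [
    (
        (["with:open", "call:read", "call:print"], [(0, 1, "data"), (1, 2, "data"), (0, 1, "ctrl"), (1, 2, "ctrl")]),
        (["call:open", "call:read", "call:print"], [(0, 1, "data"), (1, 2, "data"), (0, 1, "ctrl"), (1, 2, "ctrl")]),
    ),
    (
        (["with:open", "call:write"], [(0, 1, "data"), (0, 1, "ctrl")]),
        (["call:open", "call:write", "call:close"], [(0, 1, "data"), (0, 2, "data"), (0, 1, "ctrl"), (1, 2, "ctrl")]),
    ),
]

agg = AggregateGraph()
for positive_graph, negative_graph in pairs:
    agg.add_example(positive_graph, positive=True)
    agg.add_example(negative_graph, positive=False)

print(sorted(agg.discriminative_nodes()))
print(agg.edges[("call:read", "call:print", "data")])
```

Keeping separate counts for positive and negative examples makes it cheap to see which nodes occur in only one polarity. Here those nodes are `with:open`, `call:open` and `call:close`, which is useful input for choosing rule predicates.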
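The rule-determination and reporting steps can be illustrated with a small decision-tree learner over node-presence predicates. The claim only requires one or more machine learning models and a split that is introduced when a proposed rule fails to separate positive from negative examples. The greedy misclassification-count criterion below is an assumed stand-in for that model, and `learn`, `is_violation`, `to_rules`, the node labels and the target file names are all hypothetical. Each example is a set of aggregate-graph node labels, marked `True` when it does not use the recommended technique. The technique is assumed here to be guaranteed release of the file handle, either through `with` or through `close` in a `finally` clause.

```python
from dataclasses import dataclass
from typing import Iterator, Union

# (labels of aggregate-graph nodes present in an example, True if the example is a violation)
Example = tuple[frozenset[str], bool]


@dataclass
class Leaf:
    violation: bool


@dataclass
class Split:
    predicate: str  # tests whether a node with this label is present
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Split]


def errors(examples: list[Example], violation: bool) -> int:
    """Count examples that a leaf predicting `violation` would misclassify."""
    return sum(1 for _, v in examples if v != violation)


def learn(examples: list[Example], predicates: set[str]) -> Tree:
    # Proposed rule: a single leaf predicting the majority label (ties -> violation).
    majority = 2 * sum(v for _, v in examples) >= len(examples)
    if errors(examples, majority) == 0:
        return Leaf(majority)

    # The proposed rule cannot separate positives from negatives, so introduce a split.
    def partition(p: str) -> tuple[list[Example], list[Example]]:
        return [e for e in examples if p in e[0]], [e for e in examples if p not in e[0]]

    def cost(p: str) -> int:
        yes, no = partition(p)
        return min(errors(yes, True), errors(yes, False)) + min(errors(no, True), errors(no, False))

    # Only predicates that leave both branches non-empty can refine the rule.
    candidates = [p for p in sorted(predicates) if all(partition(p))]
    if not candidates:
        return Leaf(majority)
    best = min(candidates, key=cost)
    yes, no = partition(best)
    rest = predicates - {best}
    return Split(best, learn(yes, rest), learn(no, rest))


def is_violation(tree: Tree, labels: frozenset[str]) -> bool:
    while isinstance(tree, Split):
        tree = tree.if_true if tree.predicate in labels else tree.if_false
    return tree.violation


def to_rules(tree: Tree, path: tuple[str, ...] = ()) -> Iterator[str]:
    """Yield each root-to-leaf conjunction of predicates that ends in a violation."""
    if isinstance(tree, Leaf):
        if tree.violation:
            yield " and ".join(path) or "always"
        return
    yield from to_rules(tree.if_true, path + (f"has({tree.predicate})",))
    yield from to_rules(tree.if_false, path + (f"not has({tree.predicate})",))


TRAINING: list[Example] = [
    (frozenset({"with:open", "call:read"}), False),                                # pair 1, positive
    (frozenset({"call:open", "call:read"}), True),                                 # pair 1, negative
    (frozenset({"call:open", "call:write", "try:finally", "call:close"}), False),  # pair 2, positive
    (frozenset({"call:open", "call:write", "call:close"}), True),                  # pair 2, negative
]
rule = learn(TRAINING, set().union(*(labels for labels, _ in TRAINING)))
print(list(to_rules(rule)))

targets = {  # hypothetical target code, already reduced to node labels
    "load_config.py": frozenset({"call:open", "call:read"}),
    "save_cache.py": frozenset({"with:open", "call:write"}),
}
for name, labels in targets.items():
    if is_violation(rule, labels):
        print(f"{name}: does not use the recommended file-handling technique")
```

The learner starts from a single-leaf proposal and first splits on `call:open`. That version still misclassifies the `try`/`finally` positive example, so it introduces a second split on `try:finally`. The result is the two-predicate rule `has(call:open) and not has(try:finally)`. The printed message for the flagged file stands in for the claim's indication via a programmatic interface.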