US 11,861,331 B1
Scaling high-level statistical languages to large, distributed datasets
Murray M. Stokely, Mountain View, CA (US); and Karl Millar, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 1, 2019, as Appl. No. 16/264,867.
Application 16/264,867 is a continuation of application No. 15/367,326, filed on Dec. 2, 2016, granted, now 10,203,936.
Application 15/367,326 is a continuation of application No. 13/918,615, filed on Jun. 14, 2013, granted, now 9,542,462, issued on Jan. 10, 2017.
Claims priority of provisional application 61/659,731, filed on Jun. 14, 2012.
Int. Cl. G06F 8/30 (2018.01); G06F 16/25 (2019.01); G06F 8/41 (2018.01)
CPC G06F 8/314 (2013.01) [G06F 8/443 (2013.01); G06F 8/4435 (2013.01); G06F 16/254 (2019.01); G06F 16/258 (2019.01)]
17 Claims
 
1. A computer-implemented method abstracting an implementation of parallel operations, the method comprising:
accessing, by one or more computers in communication with a distributed computing system, a parallel data collection class, wherein the parallel data collection class is configured to abstract away (i) details of how data is represented for the parallel data collection class and (ii) an implementation strategy for a set of parallel operations;
determining, by the one or more computers, an internal execution plan based on the set of parallel operations;
generating, by the one or more computers, a revised execution plan by integrating two or more parallel operations of the set of parallel operations together by:
applying one or more graph combination transformations that combine parallel operations of the set of parallel operations together into a smaller number of combined operations; or
applying one or more graph fusing transformations that fuse sequential parallel operations of the set of parallel operations together into a smaller number of fused operations;
determining, by the one or more computers, a total size of the set of parallel operations;
determining, by the one or more computers, whether the total size of the set of parallel operations satisfies a threshold;
when the total size of the set of parallel operations satisfies a threshold, executing, by the one or more computers, each parallel operation of the internal execution plan based on the implementation strategy; and
when the total size of the set of parallel operations fails to satisfy the threshold, causing, by the one or more computers, the distributed computing system to execute each parallel operation of the revised execution plan based on the implementation strategy.
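
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (MapOp, fuse_sequential, run_plan, execute, run_distributed) is hypothetical, the per-operation size estimate is an invented stand-in for the "total size" of the claim, and the sketch assumes that a total size "satisfies" the threshold when it is at or below a limit. Only the graph fusing transformation is shown; the graph combination transformation is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class MapOp:
    """One element-wise parallel operation (hypothetical representation)."""
    name: str
    fn: Callable[[float], float]
    size: int  # hypothetical size estimate for this operation


def fuse_sequential(plan: Sequence[MapOp]) -> List[MapOp]:
    """Fuse a chain of sequential map operations into one fused operation."""
    if not plan:
        return []
    fns = tuple(op.fn for op in plan)

    def fused(x):
        # Apply the original functions in plan order within a single pass.
        for f in fns:
            x = f(x)
        return x

    name = "+".join(op.name for op in plan)
    return [MapOp(name, fused, sum(op.size for op in plan))]


def run_plan(plan: Sequence[MapOp], data: Sequence[float]) -> List[float]:
    """Execute each operation of a plan in order (one pass per operation)."""
    result = list(data)
    for op in plan:
        result = [op.fn(x) for x in result]
    return result


def execute(ops, data, threshold, run_distributed):
    """Choose between the internal plan and the revised (fused) plan."""
    internal_plan = list(ops)
    revised_plan = fuse_sequential(internal_plan)
    total_size = sum(op.size for op in internal_plan)
    # Assumption: "satisfies the threshold" means total_size <= threshold.
    if total_size <= threshold:
        return "local", run_plan(internal_plan, data)
    # run_distributed is a hypothetical callback standing in for the
    # distributed computing system.
    return "distributed", run_distributed(revised_plan, data)
```

In this sketch, passing run_plan itself as run_distributed is enough to exercise both branches on one machine; a real system would substitute a call into its own distributed backend.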