US 12,455,810 B2
Systems and methods for emulating and testing data flows in distributed computing systems
Winston Wencheng Liu, Woodland Hills, CA (US); Konstantin Belov, Thousand Oaks, CA (US); Ankur Pankaj Sheth, Agoura Hills, CA (US); and Dan Mihailescu, Mogosoaia (RO)
Assigned to KEYSIGHT TECHNOLOGIES, INC., Santa Rosa, CA (US)
Filed by Keysight Technologies, Inc., Santa Rosa, CA (US)
Filed on Apr. 5, 2023, as Appl. No. 18/131,276.
Claims priority of application No. a 2023 00166 (RO), filed on Apr. 4, 2023.
Prior Publication US 2024/0338307 A1, Oct. 10, 2024
Int. Cl. G06F 11/34 (2006.01); G06F 9/50 (2006.01); G06F 11/3668 (2025.01)
CPC G06F 11/3684 (2013.01) [G06F 11/3688 (2013.01); G06F 11/3692 (2013.01); G06F 9/5077 (2013.01); G06F 9/5083 (2013.01)]
17 Claims
OG exemplary drawing
 
1. A system comprising:
at least one hardware processor;
a workload abstractor implemented using the at least one hardware processor and configured for:
receiving monitored traffic in a distributed computing system performing a machine learning task, wherein the machine learning task includes using the distributed computing system to train a machine learning model or perform inferencing using the machine learning model;
generating, using the monitored traffic, a test environment-agnostic workload model for the machine learning task, wherein generating, using the monitored traffic, the test environment-agnostic workload model for the machine learning task comprises removing one or more deployment-specific dependencies and attributes from the monitored traffic, wherein removing the one or more deployment-specific dependencies includes removing attributes relating to network configuration used by the distributed computing system; and
storing the test environment-agnostic workload model in a workload model repository with one or more other workload models; and
a test controller implemented using the at least one hardware processor and configured for:
selecting a test case for the machine learning task and a testbed mode for the test case;
executing the test case by translating the test environment-agnostic workload model into a testbed-specific workload model for the testbed mode, including generating an input feed stream and providing the input feed stream to a testbed corresponding to the testbed mode to test at least one aspect of a machine learning cluster that executes the machine learning task and uses a different transport or network topology than the distributed computing system; and
reporting, based on executing the test case, one or more performance metrics for the machine learning task.
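Claim 1 recites the workload abstractor and test controller only as functions. The Python sketch below is a minimal, hypothetical illustration of that data flow, and it is not Keysight's implementation. It traces the path from monitored traffic to an environment-agnostic model, then through a repository to a testbed-specific model. From there it generates an input feed stream, runs a testbed, and reports metrics. Several things are assumptions, not taken from the patent. These include every name (`DEPLOYMENT_SPECIFIC_KEYS`, `WorkloadModel`, `abstract_workload`, `translate`, `emulated_testbed`), the record fields (`src_rank`, `dst_rank`, `bytes`, `start_us`), and the choice to model flows between logical ranks. The "testbed" is a toy bandwidth model with no contention. The sketch binds logical ranks to new endpoints, which shows how a model stripped of network configuration can be replayed on a cluster with a different topology.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical attribute names treated as deployment-specific network configuration.
DEPLOYMENT_SPECIFIC_KEYS = {"src_ip", "dst_ip", "vlan", "mtu", "switch_port"}


@dataclass
class WorkloadModel:
    """Environment-agnostic model: flows between logical ranks, no addresses."""
    task: str
    flows: List[Dict]


class WorkloadModelRepository:
    """In-memory stand-in for the workload model repository (hypothetical)."""

    def __init__(self) -> None:
        self._models: Dict[str, WorkloadModel] = {}

    def store(self, model: WorkloadModel) -> None:
        self._models[model.task] = model

    def get(self, task: str) -> WorkloadModel:
        return self._models[task]


def abstract_workload(task: str, monitored: List[Dict]) -> WorkloadModel:
    # Drop network-configuration attributes; keep logical ranks, sizes, timing.
    flows = [{k: v for k, v in rec.items() if k not in DEPLOYMENT_SPECIFIC_KEYS}
             for rec in monitored]
    return WorkloadModel(task, flows)


def translate(model: WorkloadModel, rank_to_endpoint: Dict[int, str]) -> List[Dict]:
    # Testbed-specific model: bind each logical rank to an endpoint of the testbed.
    return [dict(f, src=rank_to_endpoint[f["src_rank"]], dst=rank_to_endpoint[f["dst_rank"]])
            for f in model.flows]


def input_feed(specific: List[Dict]) -> Iterator[Tuple[int, str, str, int]]:
    # Input feed stream: yield (start_us, src, dst, bytes) in start-time order.
    for f in sorted(specific, key=lambda f: f["start_us"]):
        yield f["start_us"], f["src"], f["dst"], f["bytes"]


def emulated_testbed(feed: Iterator[Tuple[int, str, str, int]],
                     link_gbps: float) -> List[Tuple[int, float]]:
    # Toy testbed: each flow independently gets the full link rate (no contention).
    results = []
    for start_us, _src, _dst, nbytes in feed:
        duration_us = nbytes * 8 / (link_gbps * 1e3)  # 1 Gbps = 1e3 bits per microsecond
        results.append((start_us, duration_us))
    return results


def report(results: List[Tuple[int, float]]) -> Dict[str, float]:
    # Performance metrics: flow count, mean flow completion time, overall completion time.
    fcts = [d for _, d in results]
    return {
        "flows": len(results),
        "mean_fct_us": sum(fcts) / len(fcts),
        "completion_us": max(s + d for s, d in results),
    }


# Usage: two monitored flows between ranks 0 and 1 of a training job.
monitored = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_rank": 0, "dst_rank": 1,
     "bytes": 1_000_000, "start_us": 0},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_rank": 1, "dst_rank": 0,
     "bytes": 500_000, "start_us": 100},
]
repo = WorkloadModelRepository()
repo.store(abstract_workload("allreduce-train", monitored))
model = repo.get("allreduce-train")
specific = translate(model, {0: "tb-port-1", 1: "tb-port-2"})
metrics = report(emulated_testbed(input_feed(specific), link_gbps=100.0))
print(metrics)  # {'flows': 2, 'mean_fct_us': 60.0, 'completion_us': 140.0}
```

The rank-to-endpoint mapping is the only point where testbed details enter. As a result, the same stored model could be replayed against a different testbed mode by supplying a different mapping and a different testbed function.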