US 12,353,921 B2
Massively parallel in-network compute
William Brad Matthews, Los Gatos, CA (US); Puneet Agarwal, Santa Clara, CA (US); and Bruce Hui Kwan, Santa Clara, CA (US)
Assigned to Innovium, Inc., Santa Clara, CA (US)
Filed by Innovium, Inc., Santa Clara, CA (US)
Filed on Dec. 11, 2023, as Appl. No. 18/535,810.
Application 18/535,810 is a continuation of application No. 17/742,354, filed on May 11, 2022, granted, now 11,888,931.
Application 17/742,354 is a continuation of application No. 17/200,463, filed on Mar. 12, 2021, granted, now 11,425,195, issued on Aug. 23, 2022.
Prior Publication US 2025/0190273 A1, Jun. 12, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/50 (2006.01); G06N 20/00 (2019.01); H04L 49/15 (2022.01); H04L 67/10 (2022.01)
CPC G06F 9/5072 (2013.01) [H04L 67/10 (2013.01); G06N 20/00 (2019.01); H04L 49/15 (2013.01)]
20 Claims
1. A method comprising:
generating, by a plurality of compute nodes implemented at least in part with one or more hardware processors, vector chunks having values for a common set of vector elements, the common set of vector elements being common to all vectors generated for a common distributed application, wherein the common set of vector elements includes a plurality of subsets of vector elements, wherein each vector chunk of the vector chunks has a subset of the values for a respective subset in the plurality of subsets of vector elements;
sending, to a network switch, the vector chunks over a plurality of switch ports, wherein each vector chunk of the vector chunks is sent to a respective switch port of the plurality of switch ports by a respective compute node in the plurality of compute nodes; and
receiving, from the network switch by each compute node of the plurality of compute nodes, a single result chunk, the single result chunk being formed at the network switch from a subset of vector chunks that were generated by the plurality of compute nodes and sent to the network switch.
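The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation. The claim does not name a reduction operation or say which result chunk goes to which node, so the sketch assumes an element-wise sum, as many subsets as nodes, and delivery of result chunk k to node k. The names `split_into_chunks`, `ToySwitch`, and `run_round` are hypothetical and do not appear in the patent.

```python
from typing import Dict, List


def split_into_chunks(vector: List[float], num_subsets: int) -> List[List[float]]:
    """Partition one node's vector into chunks, one per subset of vector elements."""
    if len(vector) % num_subsets != 0:
        raise ValueError("sketch assumes the vector divides evenly into subsets")
    size = len(vector) // num_subsets
    return [vector[k * size:(k + 1) * size] for k in range(num_subsets)]


class ToySwitch:
    """Hypothetical stand-in for the network switch, with one port per compute node."""

    def __init__(self) -> None:
        # subset index -> {ingress port -> chunk values received on that port}
        self.received: Dict[int, Dict[int, List[float]]] = {}

    def ingest(self, port: int, subset: int, values: List[float]) -> None:
        """Record a vector chunk arriving on a given switch port."""
        self.received.setdefault(subset, {})[port] = values

    def result_chunk(self, subset: int) -> List[float]:
        """Form one result chunk from the chunks that cover the same element subset.

        Element-wise sum is an assumption; the claim leaves the operation open.
        """
        per_port = self.received[subset]
        return [sum(column) for column in zip(*per_port.values())]


def run_round(vectors: List[List[float]]) -> List[List[float]]:
    """Each node sends its chunks on its own port, then receives a single result chunk."""
    num_nodes = len(vectors)
    switch = ToySwitch()
    for node_id, vector in enumerate(vectors):
        for subset, values in enumerate(split_into_chunks(vector, num_nodes)):
            switch.ingest(port=node_id, subset=subset, values=values)
    # Assumed delivery rule: node k receives the result chunk for subset k.
    return [switch.result_chunk(k) for k in range(num_nodes)]


if __name__ == "__main__":
    print(run_round([[1, 2, 3, 4], [10, 20, 30, 40]]))  # [[11, 22], [33, 44]]
```

With two nodes, each vector is split into two chunks. The switch combines the first chunks from both ports into the result chunk returned to node 0 and the second chunks into the result chunk returned to node 1, so each node receives exactly one result chunk.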