US 12,153,539 B1
GPU-optimized append operation with latch-free write combining on shared memory
Kangnyeon Kim, Dublin, CA (US); Weiwei Gong, Belmont, CA (US); James Kearney, Bolton, MA (US); and Harshada Chavan, Redwood Shores, CA (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on May 23, 2023, as Appl. No. 18/201,073.
Int. Cl. G06F 15/16 (2006.01); G06F 15/167 (2006.01)

CPC G06F 15/167 (2013.01)

20 Claims

1. A computer-implemented method comprising:

executing an append operation using a plurality of threads on a plurality of streaming multiprocessors of a graphical processing unit, wherein:

each streaming multiprocessor within the plurality of streaming multiprocessors has an associated shared memory;

each shared memory is partitioned into a plurality of write combine buffers (WCBs);

a global memory is accessible by the plurality of streaming multiprocessors;

the append operation writes results into a result buffer in the global memory;

executing the append operation comprises:

claiming, by a given thread within the plurality of threads having a result to write, a portion of a selected WCB in shared memory;

writing, by the given thread, the result to the portion of the selected WCB; and

in response to a flush condition being met for the selected WCB, copying contents of the selected WCB to the result buffer in global memory.
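
The claim-write-flush sequence recited above can be illustrated with a minimal host-side sketch in C++. It is not the patented implementation: CPU threads stand in for GPU threads, a `Wcb` struct stands in for one partition of a streaming multiprocessor's shared memory, and a `ResultBuffer` stands in for the result buffer in global memory. All names (`Wcb`, `ResultBuffer`, `append`, `flush`, `run_append`, `kWcbCapacity`) are hypothetical, and the flush condition is assumed to be "the WCB is full", with a final drain of partially filled buffers; the claim itself does not fix the condition.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kWcbCapacity = 8;  // hypothetical number of slots per WCB

// One write combine buffer; stands in for a partition of shared memory.
struct Wcb {
    std::array<int, kWcbCapacity> slots{};
    std::atomic<std::size_t> claimed{0};  // slots handed out so far
    std::atomic<std::size_t> written{0};  // slots whose write has completed
};

// Stands in for the result buffer in global memory.
struct ResultBuffer {
    std::vector<int> data;
    std::atomic<std::size_t> tail{0};  // next free position
    explicit ResultBuffer(std::size_t n) : data(n) {}
};

// Copy the first `count` slots of the WCB to the result buffer, then reopen it.
void flush(Wcb& wcb, std::size_t count, ResultBuffer& out) {
    assert(count <= kWcbCapacity);
    std::size_t base = out.tail.fetch_add(count);  // reserve a contiguous range
    std::copy_n(wcb.slots.begin(), count, out.data.begin() + base);
    wcb.written.store(0);
    wcb.claimed.store(0);  // reset last: only now can new claims succeed
}

// Claim a slot with an atomic increment (no lock), write the result, and
// flush if this write was the one that filled the buffer.
void append(Wcb& wcb, int value, ResultBuffer& out) {
    for (;;) {
        if (wcb.claimed.load() >= kWcbCapacity) {  // full: wait for the flush
            std::this_thread::yield();
            continue;
        }
        std::size_t slot = wcb.claimed.fetch_add(1);  // claim a portion
        if (slot >= kWcbCapacity) {                   // lost the race; retry
            std::this_thread::yield();
            continue;
        }
        wcb.slots[slot] = value;                      // write the result
        if (wcb.written.fetch_add(1) + 1 == kWcbCapacity) {
            flush(wcb, kWcbCapacity, out);            // assumed flush condition
        }
        return;
    }
}

// Run `num_threads` workers, each appending `per_thread` distinct values,
// with worker t mapped to WCB t % num_wcbs.
std::vector<int> run_append(int num_threads, int per_thread, int num_wcbs) {
    ResultBuffer out(static_cast<std::size_t>(num_threads) * per_thread);
    std::vector<Wcb> wcbs(num_wcbs);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i) {
                append(wcbs[t % num_wcbs], t * per_thread + i, out);
            }
        });
    }
    for (auto& w : workers) w.join();
    // Drain whatever is left in partially filled buffers.
    for (auto& wcb : wcbs) flush(wcb, wcb.written.load(), out);
    return out.data;
}
```

In this sketch no mutex is taken: a slot is claimed with a single atomic increment, and the thread whose write completes the buffer performs the flush. Threads that find the buffer full briefly yield until it is reopened, which is a simplification of this illustration rather than something stated in the claim. Calling, for example, `run_append(4, 1000, 2)` should return all 4000 appended values in an unspecified order.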