US 11,989,580 B2
System and method to accelerate reduce operations in graphics processor
Yong Jiang, Shanghai (CN); Yuanyuan Li, Shanghai (CN); Jianghong Du, Shanghai (CN); Kuilin Chen, Hillsboro, OR (US); and Thomas A. Tetzlaff, Portland, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Mar. 10, 2021, as Appl. No. 17/197,304.
Application 17/197,304 is a continuation of application No. 16/066,652, granted, now 10,949,251, previously published as PCT/CN2016/078265, filed on Apr. 1, 2016.
Prior Publication US 2021/0334127 A1, Oct. 28, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/48 (2006.01); G06F 9/30 (2018.01); G06F 9/52 (2006.01); G06T 1/20 (2006.01); G06F 8/41 (2018.01)

CPC G06F 9/4843 (2013.01) [G06F 9/3009 (2013.01); G06F 9/522 (2013.01); G06T 1/20 (2013.01); G06F 8/458 (2013.01); G06F 9/30087 (2013.01)]

20 Claims

1. An apparatus comprising:

one or more processors to:

receive a barrier message associated with a barrier synchronization request, the barrier message including one or more source operands and an identifier of a specified operation; and

perform an operation including a merged write-barrier-read operation in response to the barrier synchronization request from a set of threads in a work group and synchronize the set of threads, wherein the merged write-barrier-read operation is performed to enable accelerated reduction operations associated with a set of map operations to write local data for global calculation, and the merged write-barrier-read operation is further performed to facilitate completion of atomic operations and ready the global calculations to determine one or more convergence conditions.
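The claim does not say how the merged write-barrier-read operation is implemented. As a rough software analogy only, the Python sketch below emulates one work group with `threading.Barrier`. Each thread receives a barrier message carrying source operands and an operation identifier, writes its partial result to shared local memory, waits at the barrier, and then reads and reduces the whole local array. One thread per work group then contributes the group result to a global total under a lock, which stands in for a hardware atomic, and a simple tolerance check plays the role of a convergence condition. Every name here (`BarrierMessage`, `REDUCE_OPS`, `WorkGroup`, `merged_write_barrier_read`, `GlobalAccumulator`, `has_converged`) is hypothetical. The sketch runs the write, barrier, and read as three separate software steps, which is exactly the sequence the claimed hardware operation merges into a single barrier request.

```python
import operator
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical operation identifiers; the claim only says the barrier
# message carries "an identifier of a specified operation".
REDUCE_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "max": max,
    "min": min,
}


@dataclass
class BarrierMessage:
    """Hypothetical barrier message: source operands plus an operation id."""
    sources: List[float]
    op_id: str


class WorkGroup:
    """Software emulation of one work group sharing a local-memory array."""

    def __init__(self, size: int) -> None:
        self.local = [0.0] * size                 # one local-memory slot per thread
        self.barrier = threading.Barrier(size)    # work-group barrier

    def merged_write_barrier_read(self, tid: int, msg: BarrierMessage) -> float:
        op = REDUCE_OPS[msg.op_id]
        # Write: fold this thread's "map" outputs and store the partial result.
        partial = msg.sources[0]
        for s in msg.sources[1:]:
            partial = op(partial, s)
        self.local[tid] = partial
        # Barrier: no thread reads until every thread has written.
        self.barrier.wait()
        # Read: reduce the complete local array into the work-group result.
        result = self.local[0]
        for v in self.local[1:]:
            result = op(result, v)
        return result


class GlobalAccumulator:
    """Global total; the lock stands in for a hardware atomic add."""

    def __init__(self) -> None:
        self.total = 0.0
        self._lock = threading.Lock()

    def atomic_add(self, value: float) -> None:
        with self._lock:
            self.total += value


def run_work_group(data: List[List[float]], op_id: str,
                   acc: GlobalAccumulator) -> List[float]:
    """Run one work group; data[tid] holds the sources for thread tid."""
    wg = WorkGroup(len(data))
    results = [0.0] * len(data)

    def thread_body(tid: int) -> None:
        results[tid] = wg.merged_write_barrier_read(tid, BarrierMessage(data[tid], op_id))
        if tid == 0:
            acc.atomic_add(results[tid])          # one global update per work group

    threads = [threading.Thread(target=thread_body, args=(t,)) for t in range(len(data))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def has_converged(prev_total: float, new_total: float, tol: float) -> bool:
    """Hypothetical convergence condition on the global result."""
    return abs(new_total - prev_total) <= tol


if __name__ == "__main__":
    acc = GlobalAccumulator()
    # Two work groups of four threads; each thread holds two map outputs.
    groups = [
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
        [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    ]
    for g in groups:
        run_work_group(g, "add", acc)
    print(acc.total)  # 40.0
```

Running the sketch emulates two work groups. Their sums (36.0 and 4.0) are combined in the global accumulator. In an iterative algorithm, `has_converged` would compare this global total against the previous iteration's value to decide whether another round of map and reduce operations is needed.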