US 12,111,778 B2
Image processing accelerator
Mihir Mody, Bengaluru (IN); Niraj Nandan, Plano, TX (US); Hetul Sanghvi, Richardson, TX (US); Brian Chae, Duluth, GA (US); Rajasekhar Reddy Allu, Plano, TX (US); Jason A. T. Jones, Richmond, TX (US); Anthony Lell, San Antonio, TX (US); and Anish Reghunath, Plano, TX (US)
Assigned to TEXAS INSTRUMENTS INCORPORATED, Dallas, TX (US)
Filed by TEXAS INSTRUMENTS INCORPORATED, Dallas, TX (US)
Filed on Dec. 21, 2021, as Appl. No. 17/558,252.
Application 17/558,252 is a continuation of application No. 16/995,364, filed on Aug. 17, 2020, granted, now 11,237,991.
Application 16/995,364 is a continuation of application No. 16/234,508, filed on Dec. 27, 2018, granted, now 10,747,692, issued on Aug. 18, 2020.
Prior Publication US 2022/0114120 A1, Apr. 14, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 13/16 (2006.01); G06F 13/28 (2006.01); G06T 1/20 (2006.01); H04N 5/765 (2006.01)
CPC G06F 13/1668 (2013.01) [G06F 13/28 (2013.01); G06T 1/20 (2013.01); H04N 5/765 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A processing accelerator, comprising:
a thread scheduler;
a stream accelerator coupled to the thread scheduler;
a shared memory; and
a memory controller coupled to the shared memory;
wherein the stream accelerator includes processing circuitry, and a load/store engine coupled to the processing circuitry, the load/store engine including a buffer, and shared memory access circuitry.
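The coupling recited in claim 1 (a stream accelerator whose load/store engine holds a local buffer and shared-memory access circuitry) can be modeled as a toy sketch. Every class, method, and name below is an illustrative assumption; the patent claims hardware structure, not a software API.

```python
class LoadStoreEngine:
    """Toy model of claim 1's load/store engine: a local buffer plus
    'shared memory access circuitry' that copies data between the
    buffer and a shared memory (modeled here as a plain list)."""
    def __init__(self, shared_memory):
        self.buffer = []
        self.shared_memory = shared_memory

    def load(self, n):
        """Move up to n items from shared memory into the local buffer."""
        for _ in range(min(n, len(self.shared_memory))):
            self.buffer.append(self.shared_memory.pop(0))

    def store(self):
        """Flush the local buffer back out to shared memory."""
        self.shared_memory.extend(self.buffer)
        self.buffer.clear()


class StreamAccelerator:
    """Claim 1's stream accelerator: processing circuitry coupled to
    the load/store engine (processing is a caller-supplied function)."""
    def __init__(self, engine):
        self.engine = engine

    def run(self, fn, n):
        self.engine.load(n)                                  # fetch inputs
        self.engine.buffer = [fn(x) for x in self.engine.buffer]
        self.engine.store()                                  # write results back
```

A short usage pass: load two items, scale them, and store the results after the untouched remainder.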
 
17. A processing accelerator, comprising:
a thread scheduler;
a hardware accelerator coupled to the thread scheduler;
a shared memory having a depth that is configurable; and
a memory controller coupled to the shared memory;
wherein the depth of the shared memory is configurable based on one of a size of data, a format of data, and a transfer latency.
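Claim 17's configurable depth can be illustrated with a small sizing helper. The function name, parameters, and the sizing rule (enough lines to absorb the transfer latency, capped by the data size and the memory capacity) are assumptions for illustration only, not taken from the patent.

```python
def configure_depth(data_lines: int, bytes_per_pixel: int,
                    latency_lines: int, line_width: int,
                    capacity_bytes: int) -> int:
    """Pick a shared-memory depth (in lines) from the three factors the
    claim recites: size of the data (data_lines), format of the data
    (line_width * bytes_per_pixel), and transfer latency (latency_lines
    of slack so the producer is not stalled)."""
    needed = min(data_lines, latency_lines + 1)  # latency slack, capped by data size
    line_bytes = line_width * bytes_per_pixel    # format fixes the per-line footprint
    if needed * line_bytes > capacity_bytes:
        raise ValueError("requested depth exceeds shared-memory capacity")
    return needed
```

For a 1920-wide, 2-bytes-per-pixel frame with 3 lines of latency, the helper sizes the buffer at 4 lines.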
 
18. A method, comprising:
receiving, by a first hardware accelerator, a first set of data;
processing, by the first hardware accelerator, the first set of data to generate a first output data;
storing, by the first hardware accelerator, the first output data in a shared memory coupled to a second hardware accelerator;
retrieving, by the second hardware accelerator, the first output data from the shared memory;
processing, by the second hardware accelerator, the first output data to produce a second output data;
storing, by the second hardware accelerator, the second output data in the shared memory; and
synchronizing, by a scheduler coupled to the first hardware accelerator and the second hardware accelerator, the retrieving of the first output data stored in the shared memory by the second hardware accelerator based on availability of the first output data in the shared memory.
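The method of claim 18 (two accelerators pipelined through a shared memory, with a scheduler gating the consumer on data availability) can be sketched with threads. The bounded buffer, the condition variable standing in for the scheduler, and the `None` end-of-stream marker are all illustrative assumptions.

```python
import threading
from collections import deque

class SharedMemory:
    """Toy stand-in for the shared memory: a bounded deque plus a
    condition variable playing the scheduler's role, blocking the
    consumer until the producer's output is available."""
    def __init__(self, depth):
        self.buf = deque()
        self.depth = depth
        self.cond = threading.Condition()

    def store(self, item):
        with self.cond:
            while len(self.buf) >= self.depth:   # back-pressure when full
                self.cond.wait()
            self.buf.append(item)
            self.cond.notify_all()

    def retrieve(self):
        with self.cond:
            while not self.buf:                  # synchronize on availability
                self.cond.wait()
            item = self.buf.popleft()
            self.cond.notify_all()
            return item

def pipeline(data, mem_ab, mem_out, stage1, stage2):
    """Run two 'accelerator' stages concurrently, handing first-stage
    output to the second stage through shared memory."""
    def first():
        for x in data:
            mem_ab.store(stage1(x))
        mem_ab.store(None)                       # end-of-stream marker (illustrative)

    def second():
        while (x := mem_ab.retrieve()) is not None:
            mem_out.store(stage2(x))

    t1 = threading.Thread(target=first)
    t2 = threading.Thread(target=second)
    t1.start(); t2.start(); t1.join(); t2.join()
    return [mem_out.retrieve() for _ in range(len(data))]
```

Even with a first-stage buffer shallower than the data set, the blocking store/retrieve pair keeps the two stages correctly ordered.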
 
20. A method, comprising:
receiving, by a first hardware accelerator, a first set of data;
processing, by the first hardware accelerator, the first set of data to generate first output data;
storing, by the first hardware accelerator, the first output data in a shared memory coupled to a second hardware accelerator;
retrieving, by the second hardware accelerator, the first output data from the shared memory;
processing, by the second hardware accelerator, the first output data to produce second output data; and
storing, by the second hardware accelerator, the second output data in the shared memory;
wherein the shared memory includes a first variable depth circular buffer accessible by the first hardware accelerator; and
wherein the shared memory includes a second variable depth circular buffer accessible by the second hardware accelerator.
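A minimal sketch of the variable-depth circular buffer that claim 20 places in the shared memory: head and tail indices wrap modulo a depth chosen (and changeable) at run time. The class and its method names are assumptions for illustration.

```python
class CircularBuffer:
    """Variable-depth circular buffer: fixed-size backing store with
    wrapping head/tail indices; resize() re-provisions the depth."""
    def __init__(self, depth):
        self.resize(depth)

    def resize(self, depth):
        """Re-provision the buffer at a new depth (contents discarded)."""
        self.data = [None] * depth
        self.depth = depth
        self.head = self.tail = self.count = 0

    def put(self, item):
        if self.count == self.depth:
            raise BufferError("circular buffer full")
        self.data[self.tail] = item
        self.tail = (self.tail + 1) % self.depth  # wrap write index
        self.count += 1

    def get(self):
        if self.count == 0:
            raise BufferError("circular buffer empty")
        item = self.data[self.head]
        self.head = (self.head + 1) % self.depth  # wrap read index
        self.count -= 1
        return item
```

With depth 2, a third `put` after one `get` wraps the write index back to slot 0 while preserving FIFO order.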