US 12,020,001 B2
Vector operation acceleration with convolution computation unit
Xiaoqian Zhang, San Jose, CA (US); Zhibin Xiao, Los Altos, CA (US); Changxu Zhang, Santa Clara, CA (US); and Renjie Chen, Mountain View, CA (US)
Assigned to Moffett International Co., Limited, Hong Kong (HK)
Filed by MOFFETT INTERNATIONAL CO., LIMITED, Hong Kong (HK)
Filed on Apr. 3, 2023, as Appl. No. 18/130,311.
Application 18/130,311 is a continuation of application No. 17/944,772, filed on Sep. 14, 2022, granted, now Pat. No. 11,726,746.
Prior Publication US 2024/0086151 A1, Mar. 14, 2024
Int. Cl. G06F 7/544 (2006.01); G06F 7/50 (2006.01)
CPC G06F 7/5443 (2013.01) [G06F 7/50 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A neural network accelerator, comprising:
an instruction decoder configured to decode a neural network computation instruction from a processor into a weight load control signal, an activation load control signal, and a compute control signal;
a plurality of weight selectors configured to obtain weights according to the weight load control signal, wherein the weight load control signal indicates whether to obtain the weights from a weight cache or from a weight generator;
a plurality of activation selectors configured to obtain activations or vectors from a memory according to the activation load control signal, wherein the activation load control signal indicates whether to obtain the activations or the vectors; and
a plurality of lanes of circuits, each lane of circuits being configured to:
receive the weights obtained by the plurality of weight selectors and the activations or the vectors obtained by the plurality of activation selectors,
determine whether to perform convolution operations or vector operations according to the compute control signal, and
perform the convolution operations or the vector operations based on the weights and the activations or the vectors to generate output data; and
wherein the instruction decoder is further configured to, in response to the weights having a pattern, instruct the plurality of weight selectors to obtain the weights from the weight generator rather than from the weight cache, to reduce memory access.
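
The following is a minimal behavioral sketch, in Python, of the datapath recited in claim 1, added for illustration only. The class and field names (InstructionDecoder, ControlSignals, Lane, weights_have_pattern, and so on) are hypothetical and do not appear in the patent. The sketch models the three decoded control signals, the choice between the weight cache and the weight generator when the weights have a pattern, and a single lane that performs either a convolution multiply-accumulate or a vector operation.

    # Illustrative sketch only; all names here are hypothetical, not from the patent.
    from dataclasses import dataclass
    from typing import List, Union

    @dataclass
    class ControlSignals:
        weights_from_generator: bool  # weight load control: weight generator vs. weight cache
        load_vectors: bool            # activation load control: vectors vs. activations
        vector_mode: bool             # compute control: vector operations vs. convolution

    class InstructionDecoder:
        """Decodes a neural network computation instruction into three control signals."""

        def decode(self, instruction: dict) -> ControlSignals:
            is_vector_op = instruction.get("op") == "vector"
            return ControlSignals(
                # If the weights have a pattern, source them from the weight generator
                # instead of the weight cache to reduce memory access (claim 1, last clause).
                weights_from_generator=instruction.get("weights_have_pattern", False),
                load_vectors=is_vector_op,
                vector_mode=is_vector_op,
            )

    class Lane:
        """One lane of circuits: performs convolution or vector operations on its operands."""

        def compute(self, ctrl: ControlSignals, weights: List[float],
                    operands: List[float]) -> Union[float, List[float]]:
            if ctrl.vector_mode:
                # Vector operation: element-wise add, shown as one possible vector op.
                return [w + x for w, x in zip(weights, operands)]
            # Convolution step: multiply-accumulate of weights against activations.
            return sum(w * x for w, x in zip(weights, operands))

    # Usage: decode an instruction, pick the weight source, and run one lane.
    decoder = InstructionDecoder()
    ctrl = decoder.decode({"op": "conv", "weights_have_pattern": True})
    weights = [1.0, 0.5] if ctrl.weights_from_generator else [0.25, 0.75]  # generator vs. cache
    print(Lane().compute(ctrl, weights, [2.0, 4.0]))  # 1.0*2.0 + 0.5*4.0 = 4.0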