

A neural processing unit (NPU) is a well-partitioned circuit that comprises all the control and arithmetic logic components necessary to execute machine learning algorithms. Executing deep neural networks such as convolutional neural networks means performing a very large number of multiply-accumulate (MAC) operations, typically billions to trillions of iterations. The large number of iterations comes from the fact that, for each given input (e.g., an image), a single convolution iterates over every channel and then every pixel, performing a very large number of MAC operations at each step. Many such convolutions are found in a single model, and the model itself must be executed on each new input (e.g., every camera frame capture). Unlike traditional central processing units, which excel at processing highly serialized instruction streams, machine learning workloads tend to be highly parallelizable, much like those of a graphics processing unit. Moreover, unlike a GPU, NPUs can benefit from vastly simpler logic because their workloads tend to exhibit high regularity in the computational patterns of deep neural networks. NPUs are designed to accelerate the performance of common machine learning tasks such as image classification, machine translation, object detection, and various other predictive models. For those reasons, many custom-designed dedicated neural processors have been developed.

Generally speaking, NPUs are classified as either training or inference processors. For chips that are capable of performing both operations, the two phases are still generally performed independently. NPUs may be part of a large SoC, multiple NPUs may be instantiated on a single chip, or they may be part of a dedicated neural-network accelerator.
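The nested iteration behind a convolution — over output channels, output pixels, input channels, and kernel positions — can be sketched as a naive loop that counts every MAC it performs. This is a minimal illustrative sketch, not any particular NPU's design; the function name, shapes, and layout here are assumptions chosen for clarity.

```python
# Illustrative sketch (hypothetical): a naive 2D convolution written as
# nested multiply-accumulate (MAC) loops, tallying the total MAC count.
def conv2d_mac_count(inp, kernels):
    """inp: [C_in][H][W] nested lists; kernels: [C_out][C_in][KH][KW]."""
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, kh, kw = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = h - kh + 1, w - kw + 1  # "valid" convolution, stride 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    macs = 0
    for oc in range(c_out):                # every output channel
        for y in range(out_h):             # every output pixel...
            for x in range(out_w):
                acc = 0.0
                for ic in range(c_in):     # ...iterates over every input channel
                    for ky in range(kh):   # ...and every kernel position
                        for kx in range(kw):
                            acc += inp[ic][y + ky][x + kx] * kernels[oc][ic][ky][kx]
                            macs += 1      # one multiply-accumulate
                out[oc][y][x] = acc
    return out, macs
```

Even this toy loop makes the scaling obvious: the MAC count is the product C_out × out_H × out_W × C_in × KH × KW, so realistic image sizes, channel counts, and layer depths quickly push a single forward pass into the billions of MACs.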
