[Embedded] Edge AI chip forgoes multiply-accumulate array to reach 55 TOPS/W

By Sally Ward-Foxton 2020-04-02

Source: https://www.embedded.com/edge-ai-chip-forgoes-multiply-accumulate-array-to-reach-55-tops-w/

A Silicon Valley startup claims it has reinvented the mathematics of neural networks and has produced an edge AI chip to match, already sampling, which does not use the usual large array of multiply-accumulate units. The chip can run the equivalent of 4 TOPS with a claimed power efficiency of 55 TOPS/W and, according to the company, achieves data-center class inference in under 20mW (YOLOv3 at 30fps).

San Jose-based Perceive has been in super-stealth mode until now — as a spin-out from Xperi, it has been funded entirely by its parent since officially forming two years ago. The team is 41 people, with a similar number within Xperi working on apps for the chip. Founding CEO Steve Teig is also CTO of Xperi; he was previously founder and CTO of Tabula, the 3D programmable logic startup that closed its doors five years ago, and prior to that, CTO of Cadence.

Teig explained that the initial idea was to combine Xperi’s expertise in classical image and audio processing with machine learning. Xperi owns brands such as DTS, IMAX Enhanced and HD Radio. Its technology portfolio includes image processing software for features such as red-eye removal and image stabilization, which are widely used in digital cameras, plus audio processing software for Blu-ray disc players.
Steve Teig (Image: Perceive)

“We started with a clean sheet of paper, and used information theory to ask: what computations are neural networks actually doing? And is there a different way of approaching that computation that could change what is possible [at the edge]?” Teig said. “After a couple of years of doing this work, we discovered it was, and then decided… we should make a chip that embodies these ideas.”

The idea Teig presented to the Xperi board was to spin out a company to make a chip that could do meaningful inference in edge devices within a power budget of 20mW. The result, a 7mm x 7mm chip named Ergo, can run the equivalent of 4 TOPS without external RAM (Teig explained that it matches what a GPU rated at 4 TOPS can achieve). Ergo supports many styles of neural network, including convolutional networks (CNNs) and recurrent networks (RNNs), in contrast to many solutions on the market that are tailored to CNNs. Ergo can even run several heterogeneous networks simultaneously.

“The only thing that limits how many networks we can run is the total memory that’s required for the combination,” Teig said. He added that Perceive has demonstrated running YOLOv3 or M2Det (with 60 or 70 million parameters), plus ResNet 28 with several million parameters, plus an LSTM or RNN for speech and audio processing, all at the same time. In a product, this might mean running image and audio inference simultaneously.

Perceive also claims its Ergo chip is extraordinarily power efficient, achieving 55 TOPS/W. This figure is an order of magnitude above what some competitors are claiming. Perceive’s figures have it running YOLOv3, a large network with 64 million parameters, at 30 frames per second while consuming just 20mW.

Perceive claims its Ergo chip’s efficiency is up to 55 TOPS/W, running YOLOv3 at 30fps with just 20mW (Image: Perceive)
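The headline numbers can be related with simple arithmetic. The sketch below uses only the figures Perceive quotes (20mW, 30fps, 55 TOPS/W, 4 TOPS); the function names are hypothetical, and the results are back-of-envelope estimates derived from company claims, not measurements.

```python
# Back-of-envelope arithmetic using only the figures Perceive quotes.
# These are company claims, not independent measurements.

POWER_W = 0.020               # claimed power while running YOLOv3 (20mW)
FPS = 30                      # claimed YOLOv3 frame rate
EFFICIENCY_TOPS_PER_W = 55.0  # claimed (peak) efficiency
PEAK_TOPS = 4.0               # claimed GPU-equivalent throughput


def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Energy per inference in millijoules: (J/s) / (frames/s) = J/frame."""
    return power_w / fps * 1e3


def power_at_throughput_mw(tops: float, tops_per_w: float) -> float:
    """Power needed to sustain `tops` at a given efficiency, in milliwatts."""
    return tops / tops_per_w * 1e3


def throughput_at_power_tops(power_w: float, tops_per_w: float) -> float:
    """Throughput available from a power budget at a given efficiency, in TOPS."""
    return power_w * tops_per_w


if __name__ == "__main__":
    print(f"{energy_per_frame_mj(POWER_W, FPS):.2f} mJ per YOLOv3 frame")
    print(f"{power_at_throughput_mw(PEAK_TOPS, EFFICIENCY_TOPS_PER_W):.0f} mW to sustain 4 TOPS")
    print(f"{throughput_at_power_tops(POWER_W, EFFICIENCY_TOPS_PER_W):.1f} TOPS within 20mW")
```

On these claims, each YOLOv3 frame costs roughly 0.67 mJ. The same arithmetic suggests that 20mW at 55 TOPS/W corresponds to about 1.1 TOPS, and that sustaining the full 4 TOPS at that efficiency would take roughly 73mW, so the peak-throughput and 20mW figures presumably describe different operating points.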

Part of Ergo’s power efficiency comes from aggressive power gating and clock gating techniques, which exploit the deterministic nature of neural network processing. Unlike other types of code, neural network inference has no branches that depend on the input data, so timings are known at compile time. This allows Perceive to be precise about what needs to be switched on and when.
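Because start times and durations are fixed before the network runs, a compiler can work out offline when each hardware block must be powered. The sketch below illustrates that idea for a generic static schedule. Perceive has not described its actual mechanism, and the `Op` type, the unit names and the `wake_latency` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One statically scheduled operation (hypothetical model)."""
    name: str
    unit: str    # hardware block that executes it
    start: int   # start cycle, fixed at compile time
    cycles: int  # duration in cycles, also fixed at compile time


def gating_windows(ops, wake_latency=0):
    """Return, per unit, the merged (on, off) cycle intervals it must be powered.

    Every start time and duration is known before the program runs, so the
    windows can be computed offline; outside them the unit can be gated off.
    wake_latency: assumed cycles a unit needs to power up before an op starts.
    """
    per_unit = {}
    for op in sorted(ops, key=lambda o: o.start):
        on = max(0, op.start - wake_latency)
        off = op.start + op.cycles
        windows = per_unit.setdefault(op.unit, [])
        if windows and on <= windows[-1][1]:
            # Overlaps or touches the previous window: extend it rather than
            # switching the unit off and straight back on again.
            windows[-1] = (windows[-1][0], max(windows[-1][1], off))
        else:
            windows.append((on, off))
    return per_unit


def duty_cycle(windows, total_cycles):
    """Fraction of the schedule during which a unit is powered."""
    return sum(off - on for on, off in windows) / total_cycles
```

For example, with three hypothetical layers, `conv1` on `cluster0` at cycles 0 to 100, `conv2` on `cluster1` at 100 to 180 and `conv3` on `cluster0` at 180 to 240, and a 10-cycle wake latency, `cluster0` is powered for 170 of the 240 cycles and `cluster1` for 90.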

“In a battery powered setting, [the chip] can be literally off — zero milliwatts — and have some kind of microwatt motion sensor or analog microphone to detect something that might be of interest,” Teig said. “We can wake up from off, load a giant neural network of data center class, and be running it in about 50 milliseconds, including decryption. So we leave only about two frames of video on the floor.”
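Teig’s “about two frames” figure follows from the frame period: at 30fps a new frame arrives every 33.3ms, so a 50ms wake-up spans 1.5 frame periods. The small helper below (hypothetical, for illustration) rounds this up to the worst-case number of frames that arrive before the network is ready.

```python
import math


def frames_dropped(wake_ms: float, fps: float) -> int:
    """Worst-case number of frames that arrive during a wake-up of wake_ms.

    wake_ms * fps / 1000 is the wake-up time measured in frame periods;
    rounding up gives the most frames that can land inside that window.
    """
    return math.ceil(wake_ms * fps / 1000.0)


if __name__ == "__main__":
    print(frames_dropped(50, 30))  # Teig's figures: 50ms wake-up at 30fps
```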

But careful hardware design is only part of the picture.

Information Theory

“We’ve come up with a different way of representing the underlying computation itself and the arithmetic that goes with it,” Teig said. “We are representing the network itself in a new way, and that’s where our advantage comes from.”

Perceive started with information theory — a branch of science that includes mathematical ways to distinguish signal from noise — and used its concepts to look at how much computation is required to pull the signal from the noise. Teig uses an object detection network as an example.

“You hand the network millions of pixels and all you want to know is, is there a dog in this picture or not?” he said. “Everything else in the picture is noise, except dog-ness [the signal]. Information theory makes it quantifiable — how much do you have to know [to tell whether there is a dog in the picture]? You can actually make it precise, mathematically.”
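As a generic illustration of what “quantifiable” means here (textbook Shannon definitions, not Perceive’s proprietary method), a yes/no answer such as “dog or no dog” carries at most one bit, however many pixels the input has, and mutual information measures how much of that bit a given input feature actually carries. The joint probability tables used with it are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) in bits from a joint table joint[x][y] whose entries sum to 1."""
    px = [sum(row) for row in joint]        # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi
```

For a balanced dog/no-dog label, `entropy([0.5, 0.5])` is exactly 1 bit. A feature that agrees with the label 80% of the time, `mutual_information([[0.4, 0.1], [0.1, 0.4]])`, carries about 0.28 bits of it, while a feature independent of the label carries none.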

As Teig describes it, mainstream neural networks are able to generalise based on seeing many pictures of dogs because they have found at least some of the signal in the noise, but this has been done in an empirical way rather than with a mathematically rigorous approach. This means noise is carried with the signal, making mainstream neural networks very large, and making them susceptible to adversarial examples and other tricks.

“The more you can be mathematical about figuring out which parts need to be kept and which parts are just noise, the better job you can do at generalization, and the less other overhead you have to carry with you,” Teig said. “I would claim even current neural networks are extracting signal from noise, they’re just not doing it in as rigorous a way and as a result they’re carrying extra weight with them.”

This information-theoretic point of view is the basis for Perceive’s machine learning strategy, which represents neural networks in a new way.

“Really this is a marriage between an information theoretic perspective on how to do machine learning and a chip that embodies those ideas,” Teig said.

Chip Architecture

With Teig’s background as CTO of Tabula, you might expect hardware based on programmable logic, but that’s not the case here.

“I’ve been strongly influenced by thinking about programmable logic for a decade and how to build rich interconnect architectures to enable high-performance, very parallel computation, because much of what happens on an FPGA is also massively parallel, and very intensive in its interaction between computation and memory,” Teig said. “That work has definitely influenced my work at Perceive, but what we have is not programmable logic per se. It’s been influenced by that way of thinking, but the architecture itself is around neural networks.”

Perceive’s neural network fabric is scalable; the initial chip, Ergo, has four compute clusters, each with its own memory. While exact details are still under wraps, Teig did say that these clusters are significantly different from anything found in other AI accelerators, which typically use arrays of multiply-accumulate units (MACs) to compute dot products of vectors and matrices.

Perceive’s technology is based on reinventing neural network maths using techniques from information theory (Image: Perceive)

“We are not doing that,” Teig said. “We do not have an array of MACs. As a result… we are 20 to 100x as power efficient as anything else on the market. The reason for that is that everybody else is doing the same thing and we’re not. Our representation of the networks is quite new, and that’s what’s allowed us to achieve such great efficiency. That, plus the machine learning technology that’s able to find this representation of the networks, and to train the networks in a way that makes them compatible with what the chip wants to see.”
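For context, the conventional approach that Perceive says it avoids can be sketched in a few lines: a matrix-vector product computed as explicit multiply-accumulate steps, plus the standard MAC count for a convolutional layer. This is a generic illustration, not any vendor’s hardware, and the function names are hypothetical.

```python
def matvec_mac(weights, x):
    """Compute y = W @ x as explicit multiply-accumulate (MAC) steps.

    Returns (y, mac_count). A hardware MAC array performs many of these
    inner-loop steps in parallel on every clock cycle.
    """
    y = []
    macs = 0
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match vector length")
        acc = 0
        for w, xi in zip(row, x):
            acc += w * xi  # one multiply-accumulate
            macs += 1
        y.append(acc)
    return y, macs


def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MACs for an ungrouped k x k convolution: one dot product of length
    c_in * k * k for every output pixel and every output channel."""
    return h_out * w_out * c_out * c_in * k * k
```

TOPS ratings commonly count one MAC as two operations (a multiply and an add), though conventions vary between vendors.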

Image and Audio

Ergo can support two cameras and includes an image processing unit that acts as a pre-processor, handling tasks such as dewarping fisheye-lens images, gamma correction, white balancing and cropping.
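Two of these pre-processing steps can be sketched in software. This sketch assumes 8-bit RGB pixels stored as lists and uses a simple power-law gamma and the common gray-world white-balance heuristic; the article does not say which algorithms Ergo’s image processing unit implements, and the function names are hypothetical.

```python
def gamma_correct(pixels, gamma=2.2, max_val=255):
    """Power-law gamma encoding: out = max * (in / max) ** (1 / gamma)."""
    return [[round(max_val * (c / max_val) ** (1.0 / gamma)) for c in px]
            for px in pixels]


def gray_world_white_balance(pixels, max_val=255):
    """Gray-world heuristic: scale R, G and B so that each channel's mean
    equals the mean of all three channels, clipping at max_val."""
    n = len(pixels)
    means = [sum(px[ch] for px in pixels) / n for ch in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [[min(max_val, round(px[ch] * gains[ch])) for ch in range(3)]
            for px in pixels]
```

Gamma encoding leaves black and full-scale values unchanged and brightens the midtones; gray-world balancing removes an overall color cast, such as a reddish tint, by equalizing the channel averages.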

“It’s not fancy, but the pre-processing that’s obviously useful to do in hardware, we do in hardware,” Teig said. “And we have the audio equivalent too — we can take multiple stereo microphones and do beam forming, for example.”
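Beamforming combines several microphone signals so that sound from one direction adds up coherently while sound from other directions partly cancels. The classic delay-and-sum method is sketched below for a uniform linear microphone array with whole-sample delays. It is a textbook illustration under those assumptions, not a description of Perceive’s audio block; the function names and the 343 m/s speed-of-sound default are assumptions.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate_hz,
                    speed_of_sound=343.0):
    """Whole-sample delays that align a plane wave arriving at angle_deg
    (measured from broadside) across a uniform linear array."""
    s = math.sin(math.radians(angle_deg))
    arrivals = [i * spacing_m * s / speed_of_sound for i in range(num_mics)]
    latest = max(arrivals)
    # Delay each earlier-arriving channel until it lines up with the latest.
    return [round((latest - a) * sample_rate_hz) for a in arrivals]


def delay_and_sum(channels, delays):
    """Delay each channel by its (non-negative) sample delay and average.

    Samples shifted in from before t = 0 are treated as zero.
    """
    n = len(channels[0])
    if any(len(ch) != n for ch in channels) or len(delays) != len(channels):
        raise ValueError("channels must be equal length, one delay each")
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += samples[t - d]
    return [v / len(channels) for v in out]
```

If a click reaches the second microphone one sample after the first, `delay_and_sum([[0, 0, 1, 0, 0], [0, 0, 0, 1, 0]], [1, 0])` aligns the two copies into a single full-height peak, whereas summing without delays smears it into two half-height samples.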

There is also a Synopsys ARC microprocessor with a DSP block that can be used for pre-processing as well, plus a security block, also from Synopsys.

“One of the things we’ve done is to encrypt absolutely everything in order to maintain a level of security in an IoT setting. We encrypt the networks, encrypt the code that runs on the microprocessor, encrypt the interfaces, encrypt everything,” Teig said.

The chip also provides I/Os for sensors other than cameras and microphones, and it supports external flash memory and/or an external microprocessor, which enables over-the-air updates. This could be used to update the neural networks loaded on the chip, or to load different networks as required.

Ergo is sampling now along with an accompanying reference board. Mass production is expected in Q2 2020.