By Karl Freund 2020-03-31
Millions of smart surveillance cameras and other edge devices collect massive amounts of data. After performing some local processing, such as compression, the devices stream the data to data centers for tasks such as object detection and facial recognition. While this approach typically consumes a ton of bandwidth, adds latency and creates security risks, it was historically necessitated by the massive amount of computation required for AI. Now, startup Perceive Corporation, a spinout from audio and image chip company Xperi, claims to have put data center class processing into a low cost, 20 milliwatt 7×7 mm chip, delivering 55 TOPS/Watt for edge device AI. Since 2-5 TOPS/Watt is now considered world-class for edge devices, I was initially skeptical. Upon further examination though, it looks like these guys may be on to something interesting here.
What could one do with such a chip?
The company’s CEO, Steve Teig, is a veteran serial entrepreneur. He talks about a wide range of applications that just aren’t possible or affordable today; my favorite revolves around the microwave oven. We are all familiar with today’s complex interface of buttons, most of which we never use, if we even know what they do in the first place. Imagine walking up to your microwave with a plate of spaghetti. After you slide it in, the oven’s camera would see the image, identify the food and the approximate weight, and voila—perfectly cooked pasta! Refrigerators could build your shopping list without having to scan bar codes. Home security sensors could identify the sound of broken glass, alerting authorities to a break-in. Further, the opportunities for wearable devices just boggle the mind.
How good is the chip?
Basically, it is potentially off the charts. Check out the graph in Figure 1, where Perceive plotted the chip’s performance on the Y axis (logarithmic) and the efficiency (in trillions of operations per second per watt) on the X axis. As you can see, the Ergo chip is in a class by itself. But look at how all the competitors are clustered well under 10 TOPS/Watt, differentiating on absolute performance. That’s because they all take a similar approach, matrix multiplication, to the math that underlies AI deep neural networks today.
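To put the efficiency claim in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted in this article (55 TOPS/Watt, 20 milliwatts, and the 2-5 TOPS/Watt range considered world-class). Treating 20 mW as the power at which the chip reaches 55 TOPS/Watt is my assumption, not a published spec, and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope check of the efficiency claim.
# All figures come from the article. Assumption: the 20 mW power figure
# and the 55 TOPS/Watt efficiency figure describe the same operating point.

CLAIMED_EFFICIENCY_TOPS_PER_WATT = 55.0
CHIP_POWER_WATTS = 0.020  # 20 milliwatts
WORLD_CLASS_RANGE_TOPS_PER_WATT = (2.0, 5.0)


def implied_throughput_tops(efficiency_tops_per_watt, power_watts):
    """Throughput (TOPS) = efficiency (TOPS/Watt) x power (Watts)."""
    return efficiency_tops_per_watt * power_watts


def efficiency_advantage(claimed, baseline):
    """How many times more efficient the claimed figure is than a baseline."""
    return claimed / baseline


implied_tops = implied_throughput_tops(
    CLAIMED_EFFICIENCY_TOPS_PER_WATT, CHIP_POWER_WATTS
)
low, high = WORLD_CLASS_RANGE_TOPS_PER_WATT
advantage_vs_best = efficiency_advantage(CLAIMED_EFFICIENCY_TOPS_PER_WATT, high)
advantage_vs_low = efficiency_advantage(CLAIMED_EFFICIENCY_TOPS_PER_WATT, low)

print(f"Implied throughput at 20 mW: {implied_tops:.2f} TOPS")
print(f"Advantage over world-class range: {advantage_vs_best:.1f}x to {advantage_vs_low:.1f}x")
```

Under that assumption, the chip would deliver roughly one TOPS of throughput in a 20 mW envelope, at somewhere between 11 and 27.5 times the efficiency of today's best edge devices.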
How did they do that?
From Figure 1, it’s clear that Perceive must not be playing by the same rules as the rest of the players on the field. Everyone else (with the exception of Movidius) is using some sort of Multiply-Accumulate core to perform the dot-product matrix operations used in training and inference processing of deep neural networks. Mr. Teig acknowledged that Perceive took a completely different approach to DNN processing, although he understandably demurred when asked to describe that secret sauce.
Beyond simple TOPS, though, can you really run a neural network at speed with this chip? The only benchmark the company quoted was its ability to run YOLOv3 (You Only Look Once) object detection at up to 246 frames per second. YOLOv3 requires some 63 million weights, and the chip runs that without any external memory devices. Since a standard display refreshes at 30-60 Hz, the extra throughput could be used for a variety of other, simultaneous tasks, such as voice recognition and production. Furthermore, it could do all of this on the edge device without cloud connectivity.
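The headroom is easy to quantify. The sketch below, again in Python, works out how many YOLOv3 inferences fit into each displayed frame and how much storage 63 million weights would need. The frame rates and weight count come from the article; the bits-per-weight values are hypothetical, since Perceive has not disclosed the precision it uses, and the helper names are my own.

```python
# Rough headroom estimate from the quoted YOLOv3 benchmark.
# Frame rates and weight count are from the article. The bits-per-weight
# values are hypothetical: Perceive has not disclosed its weight precision.

YOLOV3_FPS = 246.0
YOLOV3_WEIGHTS = 63_000_000
DISPLAY_RATES_HZ = (30.0, 60.0)
HYPOTHETICAL_BITS_PER_WEIGHT = (8, 4)


def inferences_per_displayed_frame(inference_fps, display_hz):
    """Number of inferences that fit in the time one frame is displayed."""
    return inference_fps / display_hz


def time_per_inference_ms(inference_fps):
    """Average time spent on one inference, in milliseconds."""
    return 1000.0 / inference_fps


def weight_storage_megabytes(num_weights, bits_per_weight):
    """Storage for the weights in megabytes (1 MB = 10**6 bytes)."""
    return num_weights * bits_per_weight / 8 / 1e6


per_frame = {
    hz: inferences_per_displayed_frame(YOLOV3_FPS, hz) for hz in DISPLAY_RATES_HZ
}
inference_ms = time_per_inference_ms(YOLOV3_FPS)
storage_mb = {
    bits: weight_storage_megabytes(YOLOV3_WEIGHTS, bits)
    for bits in HYPOTHETICAL_BITS_PER_WEIGHT
}

for hz, count in per_frame.items():
    print(f"{hz:.0f} Hz display: {count:.1f} inferences per frame")
print(f"Time per inference: {inference_ms:.2f} ms")
for bits, mb in storage_mb.items():
    print(f"{bits}-bit weights (hypothetical): {mb:.1f} MB")
```

At 246 frames per second, the chip could run about eight inferences for every frame of 30 Hz video, or about four at 60 Hz, which is where the capacity for simultaneous tasks comes from. The storage figures show why running without external memory is notable: even at a hypothetical 8 bits per weight, 63 million weights would occupy 63 MB.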
So, where’s the catch? Since there is no such thing as magic, there must be a catch, right? Well, since the Ergo chip is not executing a sea of MACs, the enabling software must be very different. This starts with the training effort: since training must be aware of the nature of the processing to be done at inference time, you can’t just take standard TensorFlow training output, plug it into the chip and go. You need to use a Perceive-enhanced tool chain to train the network (using GPUs) to produce the output needed by the Ergo chip. Users will depend on Perceive to do that work and to hand over the tools to run the models.
All this does not mean that the chip is a one-trick pony, like some of the CNN chips on the market now for smart cameras and automotive sensors. Perceive shared some of the target network models it has been testing, and, as one would expect, it is initially going after the big markets of imaging and audio. Its ability to run multiple networks concurrently will enable newer applications such as conversational queries.
As I suggested in the beginning, this approach looks very interesting. The “catch” of course is the software angle, requiring a custom training workflow. But for many consumer applications, this additional complexity is not a show-stopper. Moreover, I suspect Perceive’s novel approach to DNN processing is only the beginning. Other companies are trying to solve this problem as well, including companies using analog computing approaches and other out-of-the-box thinking.
AI at the edge demands a different approach if it is going to realize its potential. It looks like we are beginning to see some avenues that could lead to a pretty large pot of gold.