Neural Networks Take on Embedded Vision


Bernard Cole

EE Times

Synopsys convolutional neural network coprocessor lowers power for vision processing.

The growth in embedded vision systems – systems that extract meaning from visual inputs – is driving demand for more performance- and power-efficient vision-processing capabilities. Many companies have responded to this demand: AMD, CEVA, Imagination, Intel, Nvidia, and various ARM licensees. They use a variety of hardware: FPGAs, FPGA/MPU combinations, graphics processing units, and specialized heterogeneous multicore designs optimized for the task.

Now Synopsys Inc. (Mountain View, CA) has released its alternative solution, the DesignWare EV processor core family (shown below), designed to be integrated into an SoC alongside any of a number of host CPUs, including those from ARM, Intel, Imagination MIPS, PowerPC and others. The family currently includes two members, the EV52 and EV54, both optimized for vision computing applications. Fabricated in a 28-nanometer process, the EV52 features a dual-core RISC processor based on the company's ARC instruction set and operating at up to 1 GHz; the EV54 is a quad-core implementation offering higher performance. Both incorporate from two to eight programmer-configurable object-detection-engine processing elements (PEs).

[Figure: Synopsys vision processor combines ARC-based RISC cores with convolutional neural network detection engine processing elements. Source: Synopsys]

The EV52 and EV54 are optimized for vision computing applications using convolutional neural network (CNN) algorithms, which draw their inspiration from the way humans process visual information. CNNs make use of feed-forward artificial neural networks in which individual neurons are tiled in such a way that they respond to overlapping regions in the visual field. Such overlap is key to the way the human eye tracks movement, recognizes changes in the environment, discriminates between objects, and responds to subtle changes in facial expressions.
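To make the idea concrete, here is a minimal sketch of one such layer: a single 3x3 kernel slid across an image with stride 1, so that neighboring output neurons read overlapping input windows. (This is a generic illustration of a CNN convolution layer, not Synopsys code; the function name and the ReLU activation are our choices.)

```cpp
#include <cstddef>
#include <vector>

// Minimal 2D "valid" convolution: one feature map, one 3x3 kernel.
// Each output element is one neuron; neighboring neurons read
// overlapping 3x3 windows of the input (stride 1), which is the
// "tiled, overlapping receptive field" idea behind a CNN layer.
std::vector<float> conv2d_valid(const std::vector<float>& img,
                                std::size_t w, std::size_t h,
                                const float k[3][3]) {
    std::vector<float> out((w - 2) * (h - 2), 0.0f);
    for (std::size_t y = 0; y + 2 < h; ++y)
        for (std::size_t x = 0; x + 2 < w; ++x) {
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < 3; ++ky)
                for (std::size_t kx = 0; kx < 3; ++kx)
                    acc += k[ky][kx] * img[(y + ky) * w + (x + kx)];
            out[y * (w - 2) + x] = acc > 0.0f ? acc : 0.0f;  // ReLU
        }
    return out;
}
```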

In an interview with EE Times, Mike Thompson, Synopsys's senior manager of product marketing for the DesignWare ARC processors, said the EV processor family is designed to perform CNN calculations at more than 1,000 GOPS/W (billions of operations per second per watt), enabling fast and accurate detection of a wide range of objects at a fraction of the power consumption of competing vision solutions.

"While there are a number of vision recognition algorithms competing for attention, CNN in our view has been the one that has been most substantially improved and is currently best at the kind of object recognition we see in target applications such as cameras, wearables, home automation, DTV, virtual reality, games, robotics, digital signage, medical, and auto infotainment," said Thompson.

In that conclusion, Synopsys is in agreement with most other players in the embedded vision market, including Nvidia, CEVA, Microsoft, and others. But while CNNs can deliver recognition accuracy exceeding 95 percent, the problem is achieving that accuracy within a power/performance envelope the market will accept. Thompson said general-purpose processors (GPPs) can be used for vision processing, but they are very slow because they lack complex math resources, while graphics processing units (GPUs) have the requisite math resources but cannot move vision data efficiently. So their vision performance is relatively low and their power consumption is very high.

[Figure: Measured in billions of operations per watt, the performance efficiency of the EV processor (far right) outstrips alternatives for visual object detection and analysis. Source: Synopsys]

"The coprocessor approach we have come up will bring CNN down to the affordable range and at power consumption levels that are manageable in a range of consumer applications," said Thompson, pointing to comparisons the company has made using a suite of typical object and gesture recognition applications (shown in chart above). In their comparisons the EV processors performed roughly equivalent vision tasks at power consumption levels that were 5X lower than other vision solutions. Performing a face detection task on a 30-frames-per-second video requires only 175 milliwatts with an EV processor enabled SoC, versus at least 8 to 10 times that required by a GPU, he said.

How Synopsys does visionary CNN

Designed to integrate into a system-on-chip, the Synopsys solution makes use of one or more EV processors operating in parallel with the host and synchronizing with it. It does this via an efficient set of message-passing and interrupt mechanisms that allow communications between the various convolution object-detection-engine processing elements and the other processor cores (see chart below). The ARC EV processor can be programmed to operate autonomously from the host processor, or the developer can choose to exercise as much control and function sharing between the EV processor and the host as the application needs to meet specific power/performance constraints.
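Synopsys has not published the message formats, but the general pattern such host/coprocessor synchronization follows is a memory-mapped mailbox: the host writes a command descriptor, rings a doorbell register, and takes an interrupt when the coprocessor posts a completion. A minimal sketch of that pattern, with an entirely hypothetical register layout and opcodes:

```cpp
#include <cstdint>

// Hypothetical mailbox layout for host <-> EV communication.
// Offsets, fields, and doorbell semantics are illustrative only.
struct Mailbox {
    volatile uint32_t command;    // opcode written by the host
    volatile uint32_t arg_addr;   // physical address of input frame
    volatile uint32_t doorbell;   // write 1 to signal the coprocessor
    volatile uint32_t status;     // coprocessor posts completion here
};

enum : uint32_t { CMD_RUN_CNN_GRAPH = 0x01, STATUS_DONE = 0x01 };

void submit_frame(Mailbox* mb, uint32_t frame_phys_addr) {
    mb->command  = CMD_RUN_CNN_GRAPH;
    mb->arg_addr = frame_phys_addr;
    mb->doorbell = 1;             // raises an interrupt on the EV side
}

// Host-side interrupt handler: the EV processor posts its result and
// raises a host interrupt; real code would read a result descriptor too.
void ev_irq_handler(Mailbox* mb) {
    if (mb->status == STATUS_DONE) {
        // consume detection results, re-arm for the next frame...
        mb->status = 0;
    }
}
```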

[Figure: At the heart of an EV processor are object detection engines containing two to eight specialized processing elements. Source: Synopsys]

"The number of PEs is configured by the user at build time, as is a streaming interconnect network between the PEs, which feature flexible point-to-point connections between all of the PEs. Each can be changed dynamically, depending on the CNN graph being executed, on the object detection engine", said Thompson.

The architecture has been implemented such that the EV processor memory maps are completely accessible to the host. This allows the host to maintain control while allowing all vision processing to be offloaded to the EV units both to reduce power and to accelerate key vision tasks.

Just as important, said Thompson, this approach makes it possible for the various vision processing elements to communicate results back to the host in real time. To make communications with the host processor, and among the EV units themselves, even more efficient, each EV processor can also access image data stored in a memory-mapped area of the SoC, or from off-chip sources, independently via a built-in AMBA AXI standard system interface.
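On a Linux host, reading a result buffer out of such a memory-mapped region can be as simple as an mmap of the physical window. A minimal sketch, assuming a hypothetical base address and buffer layout:

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map the EV processor's result region into the host address space.
// EV_RESULT_BASE and the region size are hypothetical placeholders.
constexpr off_t  EV_RESULT_BASE = 0x40000000;
constexpr size_t EV_RESULT_SIZE = 4096;

volatile uint32_t* map_ev_results() {
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, EV_RESULT_SIZE, PROT_READ, MAP_SHARED,
                   fd, EV_RESULT_BASE);
    close(fd);  // the mapping stays valid after the fd is closed
    return p == MAP_FAILED ? nullptr
                           : static_cast<volatile uint32_t*>(p);
}
```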

Providing software for CNN development

Given the complex nature of the CNN approach, coming up with the right algorithmic mix for a specific vision processing application is still a difficult task even with the EV hardware, said Thompson. To relieve developers of some of that responsibility, Synopsys has provided a full suite of tools and libraries, along with reference designs, to allow developers to efficiently build, debug, profile and optimize their embedded vision systems using the two industry-standard, open-source embedded vision tool chains: OpenCV and OpenVX.

Included in the package a developer gets with the ARC EV processor is an optimized library of more than 2,500 OpenCV functions for real-time computer vision. Also supplied is an OpenVX framework with 43 standard computer-vision kernels for such things as edge detection, image pyramid creation, and optical flow estimation, all optimized to run on the EV processors.
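OpenVX expresses a vision pipeline as a graph of kernels that the runtime is then free to schedule across the available execution units. A minimal example using standard OpenVX C-API kernels (the image size is arbitrary, and this is generic OpenVX, not EV-specific code):

```cpp
#include <VX/vx.h>

// Build and run a two-node OpenVX graph: Gaussian blur -> Sobel edges.
// These are standard kernels; an EV-targeted OpenVX runtime would be
// free to schedule them across the detection-engine PEs.
int main() {
    vx_context ctx   = vxCreateContext();
    vx_graph   graph = vxCreateGraph(ctx);

    vx_image in   = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vx_image blur = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vx_image gx   = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_S16);
    vx_image gy   = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_S16);

    vxGaussian3x3Node(graph, in, blur);   // smoothing stage
    vxSobel3x3Node(graph, blur, gx, gy);  // edge-detection stage

    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);            // execute the whole pipeline

    vxReleaseGraph(&graph);
    vxReleaseContext(&ctx);
    return 0;
}
```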

Thompson also said that because the EV processors are programmable, they can be programmed to support any object detection graph as well as allow the definition of new OpenVX kernels. An OpenVX runtime distributes tiled kernel execution across the EV processors' multiple execution units, simplifying the programming of the processor.

The ARC EV processors are delivered with the ARChitect tool, which is used to configure the EV cores. The tool generates RTL that can be incorporated into any SoC design in support of any host processor, including ARM, Intel, Imagination MIPS, PowerPC and others. To further accelerate software development, virtual prototyping models will be available for the EV processors, as will support for FPGA-based prototyping, allowing hardware/software co-design well ahead of chip fabrication.

"Embedded vision is a fast-changing environment," Thompson said. "Right now, it seems as if CNN is the best way to go. But that could change tomorrow. In addition to getting a solution that meets the cost and power requirements of today's applications, we wanted to provide developers a way to change their designs in midstream, without going back to zero."
