
Ambarella targets AV domain controllers with next-generation AI engine


Illustrating the trend towards domain controllers in autonomous vehicles, Ambarella has launched its CV3 family of AV domain controllers, designed to process up to 20 streams of image data at a time. The new SoC family is based on Ambarella’s third-generation CVFlow AI engine IP, designed for perception, multi-sensor fusion and path planning in L2+ to L4 vehicles.

As vehicle architectures move away from a single electronic control unit per feature towards zonal and larger centralized domain controllers, and as more vehicle features rely on intensive AI processing, vehicle processors are developing rapidly. The flagship SoC of Ambarella’s new CV3 family includes an AI accelerator that the company rates at 500 eTOPS (meaning its performance is equivalent to a 500-TOPS GPU). It also includes a vision processor, 16 Arm cores, a GPU and other hardware.

The CV3 can connect to and fuse information from multiple long-range rooftop cameras, multiple close-range surround-view cameras, and multiple radar sensors, with processing headroom left for other vision tasks such as driver monitoring.

CV3-High supports up to 20 high resolution camera inputs. (Source: Ambarella)
Ambarella CV3 Demo
CV3-High can also process multiple large neural networks simultaneously, including object detection, segmentation, and path planning. (Source: Ambarella)

Ambarella calls its design philosophy “algorithm first”. CTO Les Kohn told EE Times that the company studied hundreds of open source networks, its own internal networks, and the customer algorithms used with its previous platforms to design the latest generation.

“We have looked at hundreds of networks across all types of architectures, and in doing so, we make sure that the architecture is flexible enough to handle all of these different networks while operating very efficiently,” Kohn said. “Of course, the challenge is how to reconcile flexibility and efficiency. But I think the key is to really study in detail how these networks work.”

Overall, customers’ algorithms were similar enough to be accelerated by the same engine, he said.

Ambarella CV3 functional diagram
Alongside the CVFlow engine, the CV3-High SoC features 16 Arm Cortex-A78 cores, a stereo and dense optical flow processor, an image signal processor, video codecs and a GPU. (Source: Ambarella)

Ambarella’s CV3-High SoC has an image signal processor capable of operating in difficult lighting and driving conditions. Also included are a stereo and dense optical flow accelerator for processing stereo camera input, a 16-core Arm A78AE cluster with a safety island, and video codecs. Finally, a GPU is used mainly for rendering visual representations of the sensor output for parking assistance.

The third generation of the CVFlow accelerator engine is implemented in this family for the first time. Unlike previous generations of the CVFlow engine, it consists of two blocks: a Neural Vector Processor (NVP) to handle AI workloads, and a General Vector Processor (GVP) with floating-point support. The GVP offloads classical computer vision workloads from the NVP and floating-point workloads from the Arm processors. For example, radar processing is handled by the GVP; perception is then carried out by the NVP. Both blocks are based on in-house IP.
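The division of labour described above can be pictured as a simple routing table. The following is only an illustrative sketch: the `Block` and `Workload` enums and the `route` function are hypothetical names invented here, not part of any Ambarella SDK, and the mapping reflects only what the article states.

```python
from enum import Enum


class Block(Enum):
    """Processing blocks named in the article (labels are illustrative)."""
    NVP = "Neural Vector Processor"
    GVP = "General Vector Processor"


class Workload(Enum):
    """Hypothetical workload categories used only for this sketch."""
    NEURAL_NETWORK = "neural network inference (perception)"
    CLASSICAL_CV = "classical computer vision"
    FLOATING_POINT = "floating-point math"
    RADAR = "radar processing"


# Mapping as described in the text: the GVP takes classical CV work off the
# NVP and floating-point work off the Arm cores; radar runs on the GVP;
# neural-network perception stays on the NVP.
ROUTING = {
    Workload.NEURAL_NETWORK: Block.NVP,
    Workload.CLASSICAL_CV: Block.GVP,
    Workload.FLOATING_POINT: Block.GVP,
    Workload.RADAR: Block.GVP,
}


def route(workload: Workload) -> Block:
    """Return the block this sketch assigns to a workload category."""
    return ROUTING[workload]


if __name__ == "__main__":
    # Radar is pre-processed on the GVP, then perception runs on the NVP.
    for stage in (Workload.RADAR, Workload.NEURAL_NETWORK):
        print(f"{stage.value} -> {route(stage).value}")
```

In a real system the scheduling would be done by the vendor's toolchain and runtime; the table above only restates the split in a compact form.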

The distribution of workloads between the NVP and the new GVP allows the former to be further optimized for convolution and matrix processing.

Les Kohn (Source: Ambarella)

“We have optimized the internal memory system and the interconnect between these systems to eliminate bottlenecks and improve efficiency,” Kohn said. “We have also re-optimized all the data paths inside. So it’s not so much a fundamental change in architecture as it is reworking the details to remove bottlenecks and optimize core network processing.”

The new NVP also adds operations common to advanced networks that are just beginning to be used in real-time applications, including graph networks and transformers.

The NVP delivers 500 eTOPS at 8-bit precision or 1,000 eTOPS at 4-bit precision (a more realistic scenario is a mix of precisions across different network layers, Kohn said). This represents a 42-fold increase in performance compared to Ambarella’s second-generation SoC.
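To get a feel for the mixed-precision scenario Kohn mentions, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that layers run one after another and that each runs at the quoted peak rate for its precision; the function name `effective_etops` and its parameters are hypothetical, and real throughput depends on the network and on memory behaviour.

```python
def effective_etops(frac_4bit: float,
                    etops_8bit: float = 500.0,
                    etops_4bit: float = 1000.0) -> float:
    """Estimate blended throughput when a fraction of operations run in 4-bit.

    frac_4bit is the share of total operations executed at 4-bit precision;
    the remainder runs at 8-bit. Because the time per operation adds up,
    the blend is a weighted harmonic mean of the two peak rates.
    """
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be between 0 and 1")
    time_per_op = frac_4bit / etops_4bit + (1.0 - frac_4bit) / etops_8bit
    return 1.0 / time_per_op


if __name__ == "__main__":
    for frac in (0.0, 0.5, 1.0):
        print(f"{frac:.0%} of ops in 4-bit -> {effective_etops(frac):.1f} eTOPS")
```

Under these assumptions, a network with half of its operations in 4-bit lands at roughly 667 eTOPS, not the 750 a simple average would suggest.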

Future devices in the family will scale the size of the CVFlow engine, the image pipeline encoding, and the mix of peripherals. Software will be portable across the CV3 family for use in entry-level, mid-range and high-end vehicles.

Overall, CV3-High consumes around 50W of power, which Ambarella says works out to four times the performance per watt of previous generations. These gains were achieved in part through a transition to 5nm process technology.
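The headline numbers can be combined into a rough efficiency figure. The sketch below assumes that the 500 eTOPS rating and the roughly 50W power figure describe the same operating point, and that the 42-fold and four-fold comparisons share the same second-generation baseline; neither assumption is confirmed by Ambarella, and all names in the code are illustrative.

```python
# Figures quoted in the article.
CV3_HIGH_ETOPS = 500.0      # 8-bit eTOPS rating of the NVP
CV3_HIGH_WATTS = 50.0       # approximate SoC power consumption
SPEEDUP_VS_GEN2 = 42.0      # stated performance increase over the 2nd-gen SoC
PERF_PER_WATT_GAIN = 4.0    # stated performance-per-watt improvement


def implied_figures() -> dict:
    """Derive rough implied numbers from the quoted figures (assumptions above)."""
    cv3_eff = CV3_HIGH_ETOPS / CV3_HIGH_WATTS
    return {
        "cv3_etops_per_watt": cv3_eff,
        "prev_etops_per_watt": cv3_eff / PERF_PER_WATT_GAIN,
        "prev_etops": CV3_HIGH_ETOPS / SPEEDUP_VS_GEN2,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:.1f}")
```

On these assumptions CV3-High comes out at about 10 eTOPS per watt, implying roughly 2.5 eTOPS per watt and about 12 eTOPS for the earlier part.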

The first SoCs of the Ambarella CV3 family are expected to be available for sampling in the first half of 2022.