In the past two years, artificial intelligence has morphed from academic marvel to global megatrend. Machine learning in some form is set to revolutionize almost every area of electronics, from consumer and automotive to industrial, and beyond that to affect society and our lives in ways we cannot yet foresee.
As a result, practically every processor vendor has identified machine learning as a goose that will lay golden eggs. The race is on for each vendor to position its approach as the right way to accelerate workloads in the area that holds the most potential: machine learning outside the data center, or AI at the edge.
AI at the edge holds so much promise because it can be applied to practically every electronic device, from self-driving vehicles that see pedestrians in the road to coffee makers that respond to voice commands. Applications that require any combination of low latency, data privacy, low power, and low cost will eventually migrate to AI inference at the edge. (Notably, “Edge AI” is the only AI entry on Gartner’s Hype Cycle that is less than five years out.)
The workloads for AI inference are specific: they require massively parallel processing of huge amounts of low-precision data, and memory access becomes the bottleneck. Most, if not all, processor types are being bent to fit these requirements.
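“Low-precision” commonly means 8-bit integers: floating-point weights and activations are quantized to int8, multiplied and summed in a wider integer accumulator, and the result is rescaled once at the end. The minimal Python sketch below assumes a simplified symmetric, per-tensor scheme (production toolchains add details such as zero points and per-channel scales), and the names quantize_int8 and dot_int8 are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch of low-precision inference arithmetic:
# symmetric per-tensor int8 quantization plus an integer dot product.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def quantize_int8(values):
    """Map floats to int8 in [-127, 127] with one scale: v is about scale * q."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp guards against floating-point results just outside the range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dot_int8(qa, qb):
    """Integer dot product; each int8 x int8 product fits in 16 bits,
    and the running sum is checked against a 32-bit accumulator."""
    if len(qa) != len(qb):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(qa, qb):
        acc += x * y
    # Python ints never overflow, so check the int32 limit explicitly.
    if not INT32_MIN <= acc <= INT32_MAX:
        raise OverflowError("accumulator exceeded int32 range")
    return acc


weights = [0.4, -1.0, 0.3, 0.0]
activations = [0.6, 0.25, -0.75, 1.0]
qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(activations)
# A single float multiply rescales the integer result back to real units.
approx = sw * sx * dot_int8(qw, qx)
exact = sum(w * x for w, x in zip(weights, activations))
print(exact, approx)
```

The integer loop is where the massive parallelism lives: every output of a layer is an independent dot product like this one. Storing each operand in one byte instead of the four a 32-bit float needs also means four times as many operands per unit of memory bandwidth, which matters when memory access is the bottleneck.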
The incumbent technology is the GPU — effectively a one-company segment today. By pure luck, the GPU’s single-instruction, multiple-data (SIMD) architecture, developed to accelerate computer graphics, turned out to be a decent fit for AI workloads. Nvidia is riding this wave as far as it can, developing AI supercomputers for data centers and autonomous driving, plus scaled-down versions for edge devices.
Everyone else wants a piece of this market, too. Vendors of FPGAs, which have long been used to accelerate mathematical algorithms, are refining their offerings to suit edge AI processing. Xilinx is taking the concept of the domain-specific architecture and running with it, combining programmable logic with other compute types so that the data flow can be customized for new workloads. Lattice, meanwhile, is targeting image processing in low-power devices.
There are also scores of startups pitching their novel architectures as the next big thing. They range from processor-in-memory techniques (Mythic, Syntiant, Gyrfalcon) to processor-near-memory (Hailo); from programmable logic (Flex Logix) to RISC-V cores (Esperanto, GreenWaves); and from the very tiny (Eta Compute) to the hyperscale (Cerebras, Graphcore). The majority target AI at the edge. Are there enough niches to support them all when they are up against the likes of Nvidia and Intel? Time will tell.
There is also a breed of startups approaching the problem from the other direction: adapting AI workloads to run more efficiently on traditional hardware such as microcontrollers. Companies like Picovoice and Xnor are finding new ways to use the instruction sets of existing devices to perform matrix multiplication.
Combined with Google’s work on TensorFlow Lite, a framework whose tools shrink machine-learning models to the point where they can fit onto microcontrollers, this will no doubt open the floodgates for things like voice-activated appliances that don’t need an internet connection to run inference.
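The instruction-set approach mentioned above can be illustrated with one well-known technique: binarized neural networks (the XNOR-Net line of research, from which Xnor takes its name). When weights and activations are constrained to +1 or -1, a dot product reduces to a bitwise XNOR followed by a population count, operations that ordinary integer hardware already supports. The Python sketch below shows only that arithmetic; it is not code from either company, and pack_signs and binary_dot are hypothetical names.

```python
# Hypothetical sketch of the XNOR-and-popcount trick used by binarized
# neural networks: +1/-1 values are packed one per bit, so a single XOR
# on a machine word compares many weight/activation pairs at once.


def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int (bit i = 1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError("binarized values must be +1 or -1")
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    # XNOR sets a bit wherever the two signs agree (product = +1).
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # popcount
    # Agreements contribute +1, disagreements -1: matches - (n - matches).
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_signs(a), pack_signs(b), 4))  # 0, same as sum of a[i]*b[i]
```

On a 32-bit microcontroller, one XOR, one NOT, and one popcount (a hardware instruction where available, otherwise a short bit-twiddling routine) would stand in for 32 multiply-accumulates; in Python the packing merely illustrates the arithmetic.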
Software is a huge piece of this puzzle. Embedded developers faced with novel accelerator chipsets will have to learn how to program them, which gives the more traditional CPUs, MPUs, and MCUs, with their established toolchains, a clear head start.
While it’s not impossible to build a community of developers around a new software platform, it’s not easy. Nvidia spent a decade building its GPU software platform, CUDA, into the success it is today. Any new entrant will need to build libraries and toolkits and educate developers through conferences and forums, all of which can strain a startup’s limited resources.
Flexibility is another important ingredient in the recipe for success. Today’s image-processing models rely on convolutional neural networks (CNNs), but other types of neural networks suit other applications, such as speech recognition, and the academic community is constantly coming up with new network concepts. Advanced networks may require more complex data-flow schemes, so hardware built to accelerate today’s CNNs risks being too specialized for future network architectures. AI workloads should therefore be treated as a moving target, with the right balance of flexibility and performance essential for future-proofing.
The battle for this space is just beginning. The winners will be companies that choose the right niche and go after it hard, invest in software stacks and in educating the industry, and retain enough flexibility to keep pace with this rapidly evolving sector.
And do it all at the right price, of course. ■