Back in May 2017, Google announced the second generation of its Tensor Processing Unit (TPU), now called the Cloud TPU. Unlike Google’s first TPU ASIC (application-specific integrated circuit), this new chip is designed to support the training of neural networks for AI as well as the use of trained networks, or inference. Strategically, the technology provides a computation platform tailored to enable the company’s AI-centric global business. It was the first ASIC of its kind, surpassing the technological advancements of the other Super 7 companies.
The four-chip Cloud TPU board forms the building-block node for interconnecting thousands of TPUs in a cluster for research and cloud services.
There were no visible signs of active cooling, and the company did not disclose power consumption details. (Source: Google)
Current landscape of silicon for AI
There are four major types of technology that can be used to accelerate the training and use of deep neural networks: CPUs, GPUs, FPGAs, and ASICs. The good old standby CPU has the advantage of being infinitely programmable, with decent but not stellar performance. It is used primarily in inference workloads, where the trained neural network guides the computation to make accurate predictions about each input data item. FPGAs from Intel and Xilinx, on the other hand, offer excellent performance at very low power, and they add flexibility by allowing the designer to change the underlying hardware to best support changing software. FPGAs are used primarily in machine learning inference, video algorithms, and thousands of small-volume specialized applications. However, the skills needed to program FPGA hardware are fairly hard to come by, and the performance of an FPGA will not approach that of a high-end GPU for certain workloads.
There are many types of hardware accelerators that are used in Machine Learning today, in training and inference, and in the cloud and at the edge. (Source: Moor Insights & Strategy)
Technically, a GPU is an ASIC used for processing graphics algorithms. The difference is that a GPU offers an instruction set and libraries that allow it to be programmed to operate on locally stored data, as an accelerator for many parallel algorithms. GPUs excel at performing matrix operations (primarily matrix multiplications, if you remember your high school math) that underlie graphics, AI, and many scientific algorithms. Basically, GPUs are very fast and relatively flexible.
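To make the role of matrix multiplication concrete, here is a minimal sketch of one neural network layer computed as a matrix multiply followed by a simple activation. It is written in plain Python for readability, not speed, and the function names, shapes, and numbers are illustrative assumptions rather than anything taken from a real model or from Google's hardware.

```python
# Minimal sketch: one dense neural-network layer as a matrix multiplication.
# All names and values are illustrative assumptions, not from a real model.

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); both are lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)]
            for row in a]

def relu(m):
    """Replace every negative entry with zero (a common activation function)."""
    return [[max(0.0, x) for x in row] for row in m]

# A batch of 2 input examples with 2 features each.
inputs = [[1.0, 2.0],
          [0.5, -1.0]]

# Hypothetical layer weights: 2 input features -> 3 output units.
weights = [[0.5, -1.0, 2.0],
           [1.0, 0.0, -0.5]]

outputs = relu(matmul(inputs, weights))
print(outputs)  # [[2.5, 0.0, 1.0], [0.0, 0.0, 1.5]]
```

A real network stacks many such layers with far larger matrices, which is why hardware that multiplies matrices quickly matters so much for both training and inference.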
The alternative is to design a custom ASIC dedicated to performing fixed operations extremely fast, since the entire chip’s logic area can be dedicated to a narrow set of functions. In the case of the Google TPU, the operations it targets lend themselves well to a high degree of parallelism, and processing neural networks is an “embarrassingly parallel” workload.
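What "embarrassingly parallel" means can be shown with a small sketch: each row of a matrix product depends only on one input row and the shared second matrix, so the rows can be handed to separate workers with no coordination between them. The example below uses Python's standard thread pool purely to illustrate that independence. The helper names are hypothetical, and threads in standard Python will not make this arithmetic faster; dedicated hardware exploits the same independence with many multiply units working at once.

```python
# Minimal sketch of an embarrassingly parallel computation.
# Helper names are hypothetical; this shows independence, not performance.
from concurrent.futures import ThreadPoolExecutor

def row_times_matrix(row, b):
    """Compute one output row: depends only on `row` and the shared matrix b."""
    return [sum(row[i] * b[i][j] for i in range(len(b)))
            for j in range(len(b[0]))]

def matmul_sequential(a, b):
    """Compute the output rows one after another."""
    return [row_times_matrix(row, b) for row in a]

def matmul_parallel(a, b, workers=4):
    """Compute the output rows as independent tasks on a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so rows stay aligned.
        return list(pool.map(lambda row: row_times_matrix(row, b), a))

a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]

seq_result = matmul_sequential(a, b)
par_result = matmul_parallel(a, b)
print(seq_result == par_result)  # True
```

Because no row needs the result of any other row, the work divides cleanly across however many workers are available.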
Here’s the catch: designing an ASIC can be an expensive endeavor, costing many hundreds of thousands of dollars and requiring a team of competent engineers. However, an ASIC can provide significant unit price savings over the same function implemented with discrete components on a PCB. These savings reflect only the package and silicon inefficiencies of implementing a function with several discrete components.
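The trade-off between up-front design cost and per-unit savings can be framed as a simple break-even calculation. The sketch below is only an illustration: the function name is hypothetical, and every dollar figure is a made-up placeholder, not a number from Google or any vendor.

```python
# Minimal break-even sketch for a custom ASIC versus discrete components.
# All dollar figures below are hypothetical placeholders for illustration.
import math

def break_even_units(design_cost, discrete_unit_cost, asic_unit_cost):
    """Units needed before per-unit savings repay the one-time design cost."""
    savings_per_unit = discrete_unit_cost - asic_unit_cost
    if savings_per_unit <= 0:
        raise ValueError("the ASIC must be cheaper per unit to ever break even")
    return math.ceil(design_cost / savings_per_unit)

# Assumed values: $500,000 design cost, $40 per unit with discrete parts,
# $15 per unit with the ASIC.
units = break_even_units(500_000, 40.0, 15.0)
print(units)  # 20000
```

Under these assumed figures the design cost is repaid after 20,000 units, which is why a custom chip makes more sense at high volumes than at low ones.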
Much more innovation to come
Though we have been hearing about almost daily breakthroughs in AI, it is important to remember that the science is still in its infancy and that new developments will likely continue at a rapid pace. These advances make systems more efficient and lay the foundation for future progress in the field, but their value is difficult to quantify in dollars and cents, as are their potential effects on future revenue and profitability. In the future, we may see more and more companies develop their own ASICs in order to adapt to the rapidly changing AI environment. The AI development boom will continue to foster healthy competition among companies around the world that hope to build the chips that will power the AI devices of the future.