Machine learning breakthroughs from Google’s TPU

7 April 2017

Google is nothing if not ambitious about its machine learning plans. Around this time last year, it unveiled its custom Tensor Processing Unit (TPU) hardware accelerator designed to run its TensorFlow machine learning framework at world-beating speeds.

Now, the company is providing details of exactly how much juice a TPU can provide for machine learning, courtesy of a paper that delves into the technical aspects. The info shows how Google’s approach will influence future development of machine learning powered by custom silicon.

Google’s TPUs, like GPUs, address division of machine learning labour
Machine learning generally happens in a few phases. First you gather data, then you train a model with that data, and eventually you make predictions with that model. The first phase does not typically require specialised hardware. Phase two is where GPUs come into play; in theory you can use a GPU for phase three as well.

With Google’s TPU, phase three is handled by an application-specific integrated circuit (ASIC), a custom piece of silicon designed to run one specific kind of workload. Google’s ASIC is built for integer calculations, which are sufficient when making predictions from models, while GPUs are better at floating-point math, which is vital when training models. The idea is to have specialised silicon for each aspect of the machine learning process, so each specific step can go as fast as possible.
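To make the integer-versus-floating-point split concrete, here is a minimal Python sketch of one prediction computed twice: once in floating point, as a trained model would store its weights, and once with the values squeezed into 8-bit integers. It is an illustration only, not code from Google’s paper; the function names, the scaling scheme, and the sample numbers are all assumptions made for the example.

```python
# Illustrative only: compare a floating-point prediction with an
# 8-bit integer approximation of the same calculation.
# Every name and number here is hypothetical, not from Google's paper.

def scale_for(values):
    """Pick a scale so the largest magnitude maps to 127 (assumed scheme)."""
    return max(abs(v) for v in values) / 127 or 1.0

def quantize(values, scale):
    """Map floats to signed 8-bit integers using a shared scale factor."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def predict_float(weights, inputs):
    """Dot product in floating point, as used during training."""
    return sum(w * x for w, x in zip(weights, inputs))

def predict_int8(weights, inputs):
    """Same dot product, but the multiply-accumulate uses integers only."""
    w_scale = scale_for(weights)
    x_scale = scale_for(inputs)
    qw = quantize(weights, w_scale)
    qx = quantize(inputs, x_scale)
    acc = sum(a * b for a, b in zip(qw, qx))  # integer arithmetic
    return acc * w_scale * x_scale            # convert back to a float once

weights = [0.42, -1.30, 0.07, 0.88]   # stand-in for trained weights
inputs = [1.5, 0.25, -2.0, 0.6]       # stand-in for one input example

exact = predict_float(weights, inputs)
approx = predict_int8(weights, inputs)
print("float prediction:", exact)
print("int8 prediction: ", approx)
```

The two results differ slightly, which is the trade-off at the heart of this approach: a small loss of precision at prediction time in exchange for arithmetic that is far cheaper to build into silicon.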

Strictly speaking, this is not a new approach; it is an extension of the pattern developed when GPUs were brought into the mix to speed up training. Google demonstrates a way to take the next steps with that paradigm, especially as hardware becomes more flexible and redefinable.

Google’s TPU hardware is secret, for now
For an operation with Google’s scale and finances, custom hardware provides three advantages: it is faster, it solves the right problem at the right level, and it provides a competitive edge the company can share, albeit on its own terms.

Right now, Google is using this custom TPU hardware to accelerate its internal systems. The feature is not yet available through any of its cloud services, and do not expect to be able to buy the ASICs and deploy them in your boxes.

The reasons are straightforward enough. Firstly, anything that provides Google with a distinct competitive advantage is going to be kept as close to the vest as possible. TPUs allow machine learning models to run orders of magnitude faster and more efficiently, so why give away or even sell the secret sauce?

Secondly, Google offers items to the public only after they have been given a rigorous internal shakedown. Kubernetes and TensorFlow, both of which Google had used extensively inside the company (though in somewhat different forms), took years to become publicly available.

If anything from the TPU efforts makes it to public use, it will be through the rent-in-the-cloud model, and odds are it will be a generation behind whatever the company is working on internally.

Google’s custom-silicon approach isn’t the only one
Google elected to create its own ASICs, but there’s another possible approach to custom silicon for running machine learning models: field-programmable gate arrays (FPGAs), processors that can be reprogrammed on the fly.

FPGAs can perform math at high speed and with high levels of parallelism, both of which machine learning needs at almost any stage of its execution. FPGAs are also cheaper and faster to work with than ASICs out of the box, since ASICs have to be custom-manufactured to a spec.

Microsoft, too, has realised the possibilities provided by FPGAs and unveiled server designs that employ them. Machine learning acceleration is one of the many duties that hardware could take on.

That said, FPGAs are not a one-to-one replacement for ASICs, and they cannot be dropped into a machine learning pipeline as-is. Also, there are not as many programming tools for FPGAs in a machine learning context as there are for GPUs.

It is likely that the best steps in this direction will not be toolkits that enable machine learning FPGA programming specifically, but general frameworks that can perform code generation for FPGAs, GPUs, CPUs, or custom silicon alike. Such frameworks would have more to target if Google offers its TPUs as a cloud resource, but there are already plenty of targets to be addressed immediately.

Barely scratching the surface with custom machine learning silicon
Google claims in its paper that the speedups possible with its ASIC could be further bolstered by using GPU-grade memory and memory systems, with results anywhere from 30 to 200 times faster than a conventional CPU/GPU mix. That is without addressing what could be achieved by, say, melding CPUs with FPGAs, or any of the other tricks being hatched outside of Google.

It ought to be clear by now that custom silicon for machine learning will drive the development of both the hardware and software sides of the equation. It’s also clear Google and others have barely begun exploring what’s possible.

IDG News Service
