High performance computing: the time is now

As costs drop and use cases multiply, HPC is attracting new adopters of all types and sizes, with options ranging from supercomputer- and cluster-based systems to cloud services.


The Irish supercomputer at NUI Galway, 'Kay'. (Image: NUI Galway)

8 October 2019

In today’s data-driven world, high performance computing (HPC) is emerging as the go-to platform for enterprises looking to gain deep insights into areas as diverse as genomics, computational chemistry, financial risk modelling and seismic imaging. Initially embraced by research scientists who needed to perform complex mathematical calculations, HPC is now attracting the attention of a wider range of enterprises across many fields.

“Environments that thrive on the collection, analysis and distribution of data – and depend on reliable systems to support streamlined workflow with immense computational power – need HPC,” says Dale Brantly, director of systems engineering at Panasas, an HPC data storage systems provider.

Although adoption by small and medium-sized enterprises remains relatively scarce, the technology holds great potential for organisations willing to invest in both the technology and staff expertise.

Typically, HPC use cases are focused on some type of simulation. “The simulation of airflow over a wing, combustion in an engine, planetary weather systems, a nuclear reaction or the valuation of an investment portfolio,” says Kevin Kissell, technical director for HPC and quantum computing in the office of the CTO at Google Cloud. Other use cases target analytical goals, such as measuring advertising ROI or evaluating a business unit’s performance. Still others can be categorised as translational or transformational. “Like film and video rendering,” he notes.

HPC sans supercomputer

A misconception held by many business and IT leaders is that all HPC systems are supercomputer-based. In fact, while supercomputers produced by firms such as Atos, IBM, HPE/Cray and Fujitsu lie at the heart of numerous specialised HPC systems, a more widely used approach is to link multiple smaller computers into an interconnected cluster. Under such an arrangement, each computer within the cluster serves as a node. Each node is typically equipped with one or more processors, each containing multiple compute cores that handle computation tasks, and often with graphics processing units (GPUs) as well. The nodes, with their processors, GPUs and memory, are linked by a high-speed network to form a single HPC system.

Since the cost of acquiring and operating a supercomputer and its custom software can easily run into the millions of dollars, the technology remains far beyond the financial reach of most enterprises. Cluster-based HPC systems, built from relatively inexpensive interconnected computers running off-the-shelf software, are generally more affordable to deploy and operate. Still, even a modestly sized cluster can represent a significant investment for most enterprises, particularly those with only limited HPC needs.

This situation is now changing. Enterprises looking to gain HPC access without breaking their IT budgets now have the option of turning to public cloud services, such as Google Cloud, Microsoft Azure, Amazon Web Services (AWS) and IBM Cloud.

“These services enable businesses to have access to HPC capabilities to serve their business needs without investing heavily in the hardware infrastructure of an HPC cluster,” says Maksym Pavlov, .NET technical lead at Ciklum, a digital services and software engineering company. “The emergence of the cloud has sort of leveled the playing field to a certain extent between small companies and big companies,” adds David Turek, IBM’s vice president of exascale computing.

Moving from HPC cluster to cloud HPC

The University of North Carolina at Chapel Hill (UNC-Chapel Hill) has long relied on its on-premises HPC cluster to support research activities in multiple scientific, engineering and medical areas. Yet as research computing needs continue growing, user demand is beginning to outstrip the current system’s compute resources and capacity. Rather than expanding its existing HPC investment, the university decided to turn to the cloud to provide users with an on-demand HPC environment.

The approach proved to be both cost-effective and highly flexible. “With the cloud, we can provision the compute that’s necessary to do the work that’s requested and have that compute for exactly as long as the jobs are required,” says Michael Barker, UNC-Chapel Hill’s interim CIO. “It’s a very effective way to deliver the requirements to run computational work.”

The move to the cloud was both necessary and welcome, says Jeff Roach, a UNC-Chapel Hill senior research associate. “We have a very traditional on-premises cluster,” he says. Yet it was becoming apparent over time that the system was gradually failing to keep pace with a growing number of users requiring leading-edge computing power and faster performance. “We’re finding that our on-premises cluster works really well for the people it was designed for, but some of their edge cases are becoming less edge case,” he says.

With compute-demanding use cases rapidly becoming the norm, UNC-Chapel Hill began working with Google Cloud and simulation and analysis software provider Techila Technologies to map out its journey into cloud HPC. The first step after planning was a proof-of-concept evaluation. “We took one of the researchers on campus who was doing just a ton of high memory, interactive compute, and we tried to test out his workload,” Roach says. The result was an unqualified success, he notes. “The researcher really enjoyed it; he got his work done.” The same task could have taken up to a week to run on the university’s on-premises HPC cluster. “He was able to get a lot of his run done in just a few hours,” Roach says.

On this side of the Atlantic, the University of York also decided to take a cloud-based HPC approach. James Chong, a Royal Society Industry Fellow and a professor in the University of York’s Department of Biology, notes that HPC is widely used by faculty and students in science departments such as biology, physics, chemistry and computer science, as well as in linguistics and several other disciplines.

Chong’s department is currently using Google Cloud to analyse DNA sequence data. “Specifically, my group is interested in microbiomes, mixed microbial communities that are involved in converting waste material – in our case, sewage sludge – into bio-gas,” he explains. “We use HPC for jig-sawing short DNA sequences back together into a metagenome and then separating out the genomes of the different microbes so that we can understand how these organisms respond to changes in their growth conditions.”

Like his UNC-Chapel Hill counterparts, Chong appreciates the power and flexibility an HPC cloud service can provide. “Our HPC needs to be able to cope with a range of requirements – some users want lots of processors, others need high memory machines,” he says. “As biologists, some of the applications we use become I/O bound very quickly, so ultra-fast disk access is also useful.”

The cloud HPC service the university uses can also adapt to evolving needs. “A number of us are starting to use machine learning techniques and want to be able to leverage different architectures,” Chong notes. “The [university’s] wide range of users means that we also require access to a range of different packages,” he adds. Like most cloud HPC services, the one York uses allows researchers of various kinds to move between software tools quickly and easily, without wasting time on acquisition, deployment or configuration issues.

HPC with supercomputers

While cloud HPC services offer certain advantages, they are not always the best or most logical choice for enterprises concerned about security and privacy. “There’s a great sensitivity about where data sits,” Turek observes. “Especially when you look at the GDPR constraints in Europe, for example.”

Addressing both privacy and the need for massive computing power, the University of Miami recently opted to invest in a new, on-premises, supercomputer-based HPC system. Most critically, the university believes that research projects with massive multi-dimensional datasets can run much faster on specially designed high-performance supercomputers.

Last August, the school unveiled its new IBM Triton supercomputer, based on Power Systems AC922 servers. More than 2,000 students and faculty are already using the system to work on projects such as climate prediction, genomics, bioinformatics, computer vision and AI work, notes Nicholas Tsinoremas, director of the University of Miami’s Center for Computational Science and vice provost for data and research computing.

The deployment, while successful, ran into some initial bumps in the road that almost any HPC adopter can expect, regardless of its size, field or computing needs. “Migration issues are always a problem,” Tsinoremas says. User training and retraining also had to be addressed. “Integration of the new system with legacy storage systems was another challenge,” he notes.

All of these concerns highlight the fact that whether an HPC system is based on-premises or in the cloud, pain-free adoption requires a great deal of planning and preparation. “In-house expertise is necessary, and the institution must have a plan,” Tsinoremas warns. Understanding the nature and requirements of workloads is also important. “In other words, [adopters] need to understand what problems they are trying to solve and how they expect HPC to help solve them,” he says.

Getting started with HPC workloads

Another takeaway is the importance of selecting the right resource management tools, which enable an organisation to access and optimise HPC environments. “Whether you’re purchasing a traditional HPC hardware environment, leveraging HPC in the cloud, or a mix of both, selecting the right HPC workload manager for your job types and throughput requirements is paramount,” says Jérémie Bourdoncle, a senior product management director at Altair, a provider of simulation software and other HPC-related tools and services. A workload manager can automate job scheduling, as well as management, monitoring and reporting functions.
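Commercial workload managers handle far more than this (priorities, fair-share policies, reservations, monitoring and reporting), but the core scheduling idea can be illustrated in a few lines. The following Python sketch, which is hypothetical and not based on any vendor's product or API, walks a job queue in submission order and places each job on the first cluster node with enough free compute cores. The `Node`, `Job` and `schedule` names are invented for illustration; a job that does not fit keeps waiting, while smaller jobs behind it may still start.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node with a fixed number of compute cores (illustrative model)."""
    name: str
    cores: int
    free: int = field(init=False)

    def __post_init__(self) -> None:
        self.free = self.cores  # every core starts out idle


@dataclass
class Job:
    """A queued job requesting a number of cores (hypothetical fields)."""
    job_id: str
    cores: int


def schedule(queue: list[Job], nodes: list[Node]) -> tuple[dict[str, str], list[Job]]:
    """Place jobs in submission order using first-fit node selection.

    Returns a mapping of job_id -> node name for placed jobs,
    plus the jobs still waiting for enough free cores.
    """
    placed: dict[str, str] = {}
    waiting: list[Job] = []
    for job in queue:
        # First node with enough idle cores, or None if none qualifies.
        node = next((n for n in nodes if n.free >= job.cores), None)
        if node is None:
            waiting.append(job)  # no capacity right now; job stays queued
            continue
        node.free -= job.cores  # reserve the cores on the chosen node
        placed[job.job_id] = node.name
    return placed, waiting


nodes = [Node("node01", 32), Node("node02", 32)]
queue = [Job("a", 24), Job("b", 16), Job("c", 16), Job("d", 8), Job("e", 4)]
placed, waiting = schedule(queue, nodes)
print(placed)                          # {'a': 'node01', 'b': 'node02', 'c': 'node02', 'd': 'node01'}
print([j.job_id for j in waiting])     # ['e']
```

In this toy run the two 32-core nodes are completely filled, so job "e" waits. A production scheduler would also release cores as jobs finish and re-run placement, which this sketch deliberately omits.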

Kissell suggests an adoption strategy that focuses on knowledge, simplicity, options and caution. “It can be a long journey, so plan your trip but give yourself opportunities for course correction,” he advises. Pick a test case that’s simple but representative, and where the knowledge and insights gained from HPC simulations or analysis can be clearly identified. “Then select a short list of software packages designed for your class of problem and try them.”

IDG News Service