Cray brings Hadoop to supercomputing


20 November 2013

Helping scientific supercomputing take advantage of emerging big-data technologies, high-performance computing manufacturer Cray is releasing a set of packages promising to optimise the process of running Hadoop on the company’s XC30 machines.

The Cray Framework for Hadoop, along with the Cray Performance Pack for Hadoop, provides a set of tools and best practices for configuring and optimising an XC30 to run Hadoop for scientific big-data-style projects, according to the company.

Hadoop’s Java-based MapReduce model of data analysis could bring a number of benefits to supercomputing. It has yet to find widespread acceptance in that community, however, even though both rely on parallel processing and extremely large data sets.

Cray has seen some interest in Hadoop from its users, though the open-source data-processing platform was not designed for most scientific supercomputing use cases, said Bill Blake, Cray’s chief technical architect, in a statement.

Hadoop’s approach of bringing the computation to the data differs from the traditional supercomputing approach of moving the data to the processors.

Traditional scientific number-crunching on supercomputers tends to rely on large hierarchical file formats and on libraries that boost input/output (I/O) rates, neither of which Hadoop was designed to handle well. Scientific computing also relies on parallel file systems and fast interconnects that are not typically found in Hadoop deployments.

Scientific workloads also tend to have more complex workflows that combine scientific compute and analytics jobs. In scientific computing, data models are also commingled with mathematical models, which is not the norm for Hadoop.

The Cray Framework for Hadoop and the Cray Performance Pack for Hadoop will address these issues, allowing users to get the most computational power out of the XC30s for Hadoop jobs, according to the company.

An update to the performance pack, to be made available in early 2014, will also include additional system code to optimise the XC30’s use of the Lustre file system library and the Aries system interconnect used on Cray machines.

The XC30 is Cray’s premier supercomputer, featuring integrated servers and switches, the Lustre parallel file system, Aries high-speed interconnects, an innovative cooling system, and the Dragonfly network topology for minimising locality constraints.

Cray announced the packages at the SC2013 supercomputing conference, being held this week in Denver.

Cray also announced that it is upgrading the University of Stuttgart’s XC30, nicknamed “Hornet,” so that it will offer more than 7 petaflops (quadrillion floating-point operations per second) of processing power.

 

Techworld.com


TechCentral.ie