SQL-powered MapD 3.0 woos enterprise developers
27 April 2017
MapD, the SQL database and analytics platform that uses GPU acceleration to deliver performance orders of magnitude ahead of CPU-based solutions, has been updated to version 3.0.
The update provides a mix of high-end and mundane additions. The high-end goodies consist of deep architectural changes that enable even greater performance gains in clustered environments. But the mundane items are no less important, as they are aimed at making life easier for enterprise database developers—those most likely to use MapD.
Previous versions of MapD (not to be confused with Hadoop/Spark vendor MapR) were able to scale vertically but not horizontally. Users could add more GPUs to a box, but they could not scale MapD across multiple GPU-equipped servers. An online demo of version 3 lets users explore, in real time, an 11-billion-row database of ship movements across the continental United States using MapD’s web-based graphical dashboard app.
Version 3 adds a native shared-nothing distributed architecture to the database—a natural extension of the existing shared-nothing architecture MapD used to split processing across GPUs. Data is automatically sharded in round-robin fashion between physical nodes. MapD founder Todd Mostak noted in a phone call that it ought to be possible in the future to manually adjust sharding based on a given database key.
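The difference between round-robin placement and the key-based sharding Mostak described as a possible future option can be illustrated with a minimal Python sketch. This is not MapD's code. The function names `round_robin_shard` and `key_shard`, the representation of rows as dicts, and the use of SHA-256 to map a key to a node are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Hashable, Iterable, List


def round_robin_shard(rows: Iterable[dict], num_nodes: int) -> Dict[int, List[dict]]:
    """Hypothetical round-robin placement: row i goes to node i mod num_nodes."""
    shards: Dict[int, List[dict]] = {n: [] for n in range(num_nodes)}
    for i, row in enumerate(rows):
        shards[i % num_nodes].append(row)
    return shards


def key_shard(rows: Iterable[dict], key: Hashable, num_nodes: int) -> Dict[int, List[dict]]:
    """Hypothetical key-based placement: rows sharing a key value land on the same node.

    A stable hash (SHA-256 of the key's string form) is used instead of Python's
    built-in hash(), which is randomized per process for strings.
    """
    shards: Dict[int, List[dict]] = {n: [] for n in range(num_nodes)}
    for row in rows:
        digest = hashlib.sha256(str(row[key]).encode("utf-8")).digest()
        node = int.from_bytes(digest[:8], "big") % num_nodes
        shards[node].append(row)
    return shards


if __name__ == "__main__":
    # Ten illustrative "ship position" rows for four ships, spread over three nodes.
    rows = [{"ship_id": i % 4, "seq": i} for i in range(10)]
    for node, shard in round_robin_shard(rows, 3).items():
        print("round-robin", node, [r["seq"] for r in shard])
    for node, shard in key_shard(rows, "ship_id", 3).items():
        print("by ship_id", node, sorted({r["ship_id"] for r in shard}))
```

Round-robin placement ignores row contents, so every node receives a near-equal share of the data and can load and scan its share in parallel. Key-based placement keeps related rows together, which can help queries that group or join on that key, at the cost of possible imbalance when some key values are far more common than others.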
According to Mostak, the big advantage of using multiple shared-nothing nodes is not only a linear speed-up in processing, although that does happen. Data ingest into the cluster also speeds up linearly, which lowers the barrier to entry for database developers who want to try out their data on MapD.
Other features in MapD 3.0, chief among them high availability, are what would be expected from a database aimed at enterprise customers. Nodes can be clustered into HA groups, with data synchronised between them via a distributed file system (typically GlusterFS) and a distributed log (through an Apache Kafka record stream or “topic”).
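The general idea behind keeping an HA group in sync through a distributed log is that every member replays the same ordered stream of records, so members that have applied the same records hold the same state. The Python sketch below illustrates that idea only. It is not MapD's implementation: an in-memory list stands in for a Kafka topic, and the `ReplicatedLog` and `Replica` classes and the `(table, value)` record format are hypothetical. The distributed file system side (GlusterFS) and failover are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical record format: (table name, value to append to that table).
Record = Tuple[str, int]


@dataclass
class ReplicatedLog:
    """Append-only, ordered log standing in for a Kafka topic (illustrative only)."""
    records: List[Record] = field(default_factory=list)

    def append(self, record: Record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just written


@dataclass
class Replica:
    """One hypothetical HA group member that tracks how far into the log it has read."""
    name: str
    offset: int = 0  # next log offset this replica has not yet applied
    tables: Dict[str, List[int]] = field(default_factory=dict)

    def catch_up(self, log: ReplicatedLog) -> None:
        """Apply every unread record in log order, advancing the offset as it goes."""
        while self.offset < len(log.records):
            table, value = log.records[self.offset]
            self.tables.setdefault(table, []).append(value)
            self.offset += 1


if __name__ == "__main__":
    log = ReplicatedLog()
    a, b = Replica("a"), Replica("b")
    log.append(("ships", 1))
    log.append(("ships", 2))
    a.catch_up(log)           # a is current; b has fallen behind
    log.append(("ships", 3))
    a.catch_up(log)
    b.catch_up(log)           # b replays all three records in the same order
    print(a.tables, b.tables, a.offset, b.offset)
```

Because each replica records its own offset, a member that lags or restarts can resume from where it stopped and converge on the same state as its peers.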
Another addition aimed at attracting a general database audience is a native ODBC driver. Third-party tools such as Tableau or Qlik Sense can now plug into MapD without the overhead of the previous JDBC-to-ODBC solution.
A hybrid architecture is not yet possible with MapD’s scale-out system. MapD has cloud instances available on Amazon Web Services, IBM SoftLayer, and Google Cloud, but Mostak pointed out that MapD does not currently support mixing nodes from an on-premises installation with nodes from a cloud instance.
Most of MapD’s customers, he explained, have “either-or” setups—either entirely on-premises or entirely in-cloud—with little to no demand to mix the two yet.
IDG News Service