Beyond the mainstream database offerings
26 March 2020 | 0
By and large, if you need a database, you can reach for one of the big names – MySQL/MariaDB, PostgreSQL, SQLite, MongoDB – and get to work. But sometimes the one-size-fits-all approach doesn’t fit all. Every now and then your use case falls down between barstools, and you need to reach for something more specialised. Here are nine offbeat databases that run the gamut from in-memory analytics to key-value stores and time-series systems.
The phrase “SQL OLAP system” generally conjures images of data-crunching monoliths or sprawling data warehouse clusters. DuckDB is to analytical databases what SQLlite is to MySQL and PostgreSQL. It isn’t designed to run at the same scale as full-blown OLAP solutions, but to provide fast, in-memory analytical processing for local datasets.
Many of DuckDB’s features are counterparts to what’s found in bigger OLAP products, even if smaller in scale. Data is stored as columns rather than rows, and query processing is vectorised to make the best use of CPU caching. You won’t find much in the way of native connectivity to reporting solutions like Tableau, but it shouldn’t be difficult to roll such a solution manually. Aside from bindings for C++, DuckDB also connects natively to two of the most common programming environments for analytics, Python and R.
“Edge” is a term used in graph databases to refer to the connection or relationship between two entities or nodes (such as between a customer and an order, or between an order and a product, etc.) of a highly connected dataset. EdgeDB uses the PostgreSQL core and all the properties it provides (like ACID transactions and industrial-strength reliability) to build what its makers call an “object-relational database” with strong field types and a SQL-like query language.
Thus, EdgeDB combines NoSQL-like ease of use and immediacy, the relational modelling power of a graph database, and the guarantees and consistency of SQL. Even though EdgeDB is not formally a document database, you can use it to store data that way. And you can use the GraphQL query language to easily retrieve data from EdgeDB, just as you can with native graph databases such as Neo4j.
An open source project spearheaded by Apple, FoundationDB is a “multi-model” database that stores data internally as key-value pairs (essentially the NoSQL model), but can be organised into relational tables, graphs, documents, and many other data structures. ACID transactions guarantee data integrity, and horizontal scaling and replication are both available out of the box. FoundationDB’s design comes with some stiff restrictions, though: keys, values, and transactions all have hard size limits, and transactions have hard time limits as well.
The goal behind HarperDB is to provide a single database for handling structured and unstructured data in an enterprise – somewhere between a multi-model database like FoundationDB and a data warehouse or OLAP solution. Ingested data is deduplicated and made available for queries through the interface of your choice: SQL, NoSQL, Excel, etc. BI solutions like Tableau or Power BI can integrate directly with HarperDB without the data needing to be extracted or processed. Both enterprise and community editions are available.
As popular and powerful as Redis is, the in-memory key-value store has been criticised for falling short in threaded performance and ease of use. KeyDB is protocol-compatible with Redis, so can be used as a drop-in replacement. But KeyDB adds some nifty under-the-hood improvements, chiefly multithreading for network I/O operations and query parsing. Plans for the next edition of Redis, Redis 6, include threaded I/O as well, but KeyDB is available now.
A product of Uber’s internal engineering team, M3DB is a distributed time-series database that is used in Uber’s metrics platform (essentially as a data store for Prometheus). Borrowing ideas from Apache Cassandra and a Facebook project named Gorilla, M3DB allows arbitrary time precision, out-of-order insertions, and configurable levels of replication and read consistency. However, the creators note that M3DB might not be suitable for all time-series database use cases. For instance, M3DB can’t insert data out of order beyond a given time window (the default is two hours), and it is mainly optimised for storing and retrieving 64-bit floats rather than other kinds of data.
The name implies a fusion of the Redis in-memory key-value store and SQL query capabilities, and that’s exactly what RediSQL is – specifically, a Redis module that embeds a SQLite database. Data is stored transparently in Redis, so Redis handles persistence and in-memory processing. Each database is associated with a Redis key, so you can have multiple SQL databases on a single Redis instance. Queries to those databases are standard SQL, passed via the standard Redis API. You can also create and pre-compile statements (essentially stored procedures) in RediSQL to speed up query execution. Both commercial and open source editions are available.
SQLite is a little miracle: an embeddable open source database that is lightning-fast and ultra-reliable. SQLite makes a great default choice whenever you need a database in a single-user application, but SQLite instances are limited to a single node.
RQLite builds on SQLite to create a distributed database system. Setting up multiple nodes is easy, and data automatically replicates across those nodes using the Raft consensus algorithm. RQLite also provides encryption between nodes and a discovery service that makes it easy to add nodes automatically. But RQLite also has a few drawbacks: Write speeds are slower than in SQLite, and only deterministic SQL functions – ie, those guaranteed to produce the same result on every node – are safe to use.
Most high-end databases these days have some kind of in-memory functionality, even if it involves something like table pinning (eg, SQL Server). UmbraDB, an analytics database that can run as a drop-in replacement for PostgreSQL, is designed to use in-memory processing whenever it can. When it can’t, it uses a novel variable-size page mechanism for paging data from storage. Long-running queries are optimised for execution with LLVM.
IDG News Service