Wednesday, July 21, 2010

Why Use a NoSQL Database?

NoSQL databases were born mostly at web-centric companies that needed to solve problems beyond the reach of modern relational databases. For those of us who are not running web-centric companies, are these NoSQL databases useful? I think they could be. Although the problems that motivated the NoSQL movement may be quite different from ours, these databases offer benefits that may make them a good fit in several situations. From my perspective, three features of NoSQL databases are attractive for internal and even commercial applications:

  • Cost: Most NoSQL databases are open source and can be used freely. Moreover, they are designed to run on commodity x86 hardware with locally attached disks. Scaling a modern commercial relational database typically requires "big iron" servers, SAN storage, and other expensive components; by comparison, NoSQL databases can manage a large data set at a fraction of the hardware and software cost.
  • Performance: Because they are special-purpose, NoSQL databases tend to be "lean and mean", often supporting read/write rates an order of magnitude or more beyond those of relational databases. Many NoSQL databases can be classified as "persistent hash tables", optimized for performance and availability at the expense of other features.
  • Availability: NoSQL databases use replication and sophisticated fault-tolerance algorithms such as Phi Accrual Failure Detection to maximize availability when a node fails. Many NoSQL databases allow new nodes to be added dynamically, after which data and work are automatically redistributed. Because field/column semantics are enforced by applications, NoSQL databases don't require table-level schema changes. Even upgrades to new software versions are typically supported via "rolling restarts", which keep the overall database available.
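To make the failure-detection idea concrete, here is a minimal Python sketch of a phi accrual detector, assuming heartbeat inter-arrival times are roughly normally distributed (the formulation from Hayashibara et al.'s original phi accrual paper). Instead of a yes/no answer, it reports a suspicion level, phi, that grows the longer a node stays silent. The class and parameter names (`PhiAccrualDetector`, `window_size`, `min_std`) are illustrative, not any particular database's API.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Tracks one remote node's heartbeats and reports a suspicion level (phi).

    Sketch only: assumes normally distributed heartbeat inter-arrival times.
    """

    def __init__(self, window_size: int = 100, min_std: float = 0.1) -> None:
        # Sliding window of recent heartbeat inter-arrival times, in seconds
        self.intervals: Deque[float] = deque(maxlen=window_size)
        self.last_heartbeat: Optional[float] = None
        # Floor on the standard deviation so perfectly regular heartbeats
        # don't cause a division by zero
        self.min_std = min_std

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Return phi = -log10(P(the next heartbeat arrives later than `now`))."""
        if self.last_heartbeat is None or not self.intervals:
            return 0.0  # no history yet, so no basis for suspicion
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        variance = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(variance), self.min_std)
        elapsed = now - self.last_heartbeat
        # Upper tail of a normal distribution with the observed mean and std
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        if p_later <= 0.0:
            return math.inf  # underflow: the node is overwhelmingly likely down
        return -math.log10(p_later)


detector = PhiAccrualDetector()
for t in range(10):
    detector.heartbeat(float(t))  # one heartbeat per second
print(round(detector.phi(10.0), 3))  # 0.301: silent for exactly the mean interval
print(detector.phi(12.0) > detector.phi(11.0))  # True: suspicion grows with silence
```

A caller would declare a node down once phi crosses a configurable threshold. Because phi is on a log10 scale, raising the threshold by one makes a false suspicion (under this model) about ten times less likely, at the cost of slower detection.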
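The automatic redistribution described under Availability is commonly built on consistent hashing. The Python sketch below assumes a hash ring with virtual nodes and MD5 as the ring hash; the node names and the `ConsistentHashRing` class are hypothetical. Its point is that when a node joins, only the keys that land in the new node's ring segments move, rather than nearly every key as with a naive `hash(key) % n` scheme.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(value: str) -> int:
    """Hash a string to a position on the ring (a 128-bit integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical minimal ring; real systems add replication, weights, etc."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes                # virtual nodes per physical node, for balance
        self._positions: List[int] = []     # sorted ring positions
        self._owners: Dict[int, str] = {}   # ring position -> physical node

    def add_node(self, node: str) -> None:
        """Place `node` on the ring at `vnodes` pseudo-random positions."""
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def node_for(self, key: str) -> str:
        """A key belongs to the first node position clockwise from the key's hash."""
        if not self._positions:
            raise LookupError("the ring has no nodes")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # wrap around past the highest position
        return self._owners[self._positions[idx]]


ring = ConsistentHashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Every key that moved now lives on the new node; all other keys stay put
print(all(ring.node_for(k) == "node-d" for k in moved))  # True
print(len(moved) / len(keys))  # fraction reassigned (about 1/4 in expectation)
```

Virtual nodes are the design choice that keeps the load roughly even: with a single position per node, the ring segments (and therefore the data each node holds) can be very unequal.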
Before you get too excited about a new class of database that is cheap, fast, and highly available, be aware that, compared to relational databases, you're going to give up a lot. Keeping in mind that features vary from one NoSQL database to another and that this new genre is evolving quickly, consider some things we take for granted in relational databases but will mostly find absent in NoSQL databases:

  • SQL: Some NoSQL databases provide a subset of SQL or an SQL-like query language, but these languages are very limited compared to full ANSI SQL. NoSQL databases favor small update transactions and quick queries, with little or no support for ACID transactions (see below). Many NoSQL databases provide a way to use MapReduce-style processing for data analysis, but this bulk-scan approach suits long-running, read-only jobs rather than interactive queries.
  • Data integrity: NoSQL databases don't have a detailed schema; you declare only the major data pools (tables) that will be used. Each row can have a large, variable number of columns, each storing any type of value. NoSQL databases have no foreign key support or referential integrity, and they don't have triggers or similar features that help enforce data rules. Instead, you must enforce all data integrity rules within your applications.
  • ACID transactions: Because of their eventual consistency model, NoSQL databases have little or no locking, which means you can't perform atomic updates across multiple records. (Some NoSQL databases allow a single transaction to atomically update a limited number of records in the same partition, but not across partitions.) As a result, updates require strategies such as application-level locking, "soft" transactions, and idempotent algorithms (a subject for another post).
  • Stability: Most of these NoSQL databases are shipping a 0.x release, meaning they haven't yet reached 1.0. They are essentially experimental: they aren't as stable as production databases, upgrade compatibility isn't guaranteed, and, well, they crash more often than commercial databases. In short, they're not yet ready to serve as mission-critical production databases.
  • An ecosystem: Similarly, because NoSQL databases are so new, you won't find a rich array of tools, add-ons, or even consulting expertise for them, although companies such as Riptano are starting to emerge to offer professional support. As with all new technologies, it will take time for this field to mature.
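The MapReduce-style processing mentioned under SQL can be illustrated without a cluster. The following single-process Python sketch assumes records are plain dicts with a hypothetical `page` field, and the helper names (`map_reduce`, `count_views`, `sum_counts`) are illustrative. It shows the map, shuffle, and reduce phases that a real framework would spread across nodes, and why the approach fits bulk, read-only analysis: every record is scanned.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]


def map_reduce(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterable[Tuple[Hashable, Any]]],
    reducer: Callable[[Hashable, List[Any]], Any],
) -> Dict[Hashable, Any]:
    """Run map, shuffle, and reduce in one process (a sketch, not a framework)."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:                    # Map: scan every record
        for key, value in mapper(record):
            groups[key].append(value)         # Shuffle: group values by key
    # Reduce: collapse each key's values into a single result
    return {key: reducer(key, values) for key, values in groups.items()}


def count_views(record: Record) -> Iterator[Tuple[Hashable, int]]:
    """Hypothetical mapper: emit (page, 1) for each logged page view."""
    yield record["page"], 1


def sum_counts(key: Hashable, values: List[int]) -> int:
    """Hypothetical reducer: total the counts emitted for one page."""
    return sum(values)


log = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
print(map_reduce(log, count_views, sum_counts))  # {'/home': 2, '/about': 1}
```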
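As one possible example of the idempotent strategies mentioned under ACID transactions, here is a Python sketch of a single-record update that combines a conditional ("compare-and-set") write with a record of already-applied operation IDs, so that retrying an update after a timeout cannot apply it twice. It assumes a store that can check a per-record version atomically; `ToyStore`, `put_if_version`, and `add_to_balance` are hypothetical stand-ins, not a real NoSQL client API, and the sketch does not cover application-level locking or multi-record "soft" transactions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class Versioned:
    value: Any
    version: int = 0
    applied_ops: Set[str] = field(default_factory=set)  # IDs of ops already applied


class ToyStore:
    """Hypothetical in-memory stand-in for a store with a conditional single-record put."""

    def __init__(self) -> None:
        self._data: Dict[str, Versioned] = {}

    def get(self, key: str) -> Optional[Versioned]:
        rec = self._data.get(key)
        # Return a copy so callers can only change stored state through a put
        return None if rec is None else Versioned(rec.value, rec.version, set(rec.applied_ops))

    def put_if_version(self, key: str, new: Versioned, expected_version: int) -> bool:
        """Write only if the stored version still matches; otherwise reject the write."""
        current = self._data.get(key)
        current_version = current.version if current else 0
        if current_version != expected_version:
            return False
        self._data[key] = Versioned(new.value, expected_version + 1, set(new.applied_ops))
        return True


def add_to_balance(store: ToyStore, key: str, amount: int, op_id: str, retries: int = 5) -> bool:
    """Add `amount` idempotently: re-running with the same op_id changes nothing."""
    for _ in range(retries):
        rec = store.get(key) or Versioned(value=0)
        if op_id in rec.applied_ops:
            return True  # an earlier attempt already applied this operation
        rec.value += amount
        rec.applied_ops.add(op_id)
        if store.put_if_version(key, rec, rec.version):
            return True
        # Another writer updated the record first: re-read and try again
    return False


store = ToyStore()
add_to_balance(store, "acct:42", 100, op_id="deposit-1")
add_to_balance(store, "acct:42", 100, op_id="deposit-1")  # a retry: no double credit
print(store.get("acct:42").value)  # 100
```

A real design would also need to prune old operation IDs, since the `applied_ops` set in this sketch grows without bound.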
Despite their current limitations, NoSQL databases may be a suitable match for certain applications. Especially with the continued decline in storage costs and our end users' growing appetite for Google-like access to information, we are increasingly compelled to make more data available at sub-second speed. This makes the speed and availability of NoSQL databases enticing. Here are some example applications that I think might be good candidates:

  1. Capture and report: Consider applications that capture data from their environment, such as events or log records, and then provide various forms of reporting. For these applications, records are typically stored once and not updated thereafter. Their challenge is making an ever-expanding data pool accessible for online searching and data analysis. Using a NoSQL database for storage may enable these apps to store more data, more quickly, and more cheaply than with a relational database.
  2. Distributed storage: Today, we would typically use a file system to manage a large number of unstructured objects: documents, images, logs, etc. A large SAN with RAID is one option, but as an alternative, a NoSQL database's sharding and replication features may provide dynamically expandable storage at lower cost.
  3. Single updater/copious readers: NoSQL databases were born from social web applications such as Facebook and LinkedIn. These applications experience high update volumes (and even higher fetch volumes), but any given object is typically updated by a single user. In other words, ACID transactions are not a strong requirement, while massive scalability and availability are. Likewise, collaboration and similar applications that store a large number of objects, each seldom updated by more than one user at a time, are also good fits for NoSQL databases.
  4. Hybrid scenarios: Some relational database applications manage a large number of tables, but only a few tables experience large growth. Often, the number of records in the high-growth tables is limited by the performance that can reasonably be provided: records are purged or off-loaded above a certain threshold to avoid response degradation. NoSQL databases offer a new place to store those off-loaded records. In some cases, it may make sense to move the high-volume tables entirely to a NoSQL database, allowing the application to use a hybrid storage approach.
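The off-loading step in the hybrid scenario can be sketched in a few lines. The Python below assumes a relational table (here an in-memory SQLite database from the standard library) that should keep only its newest rows, and a hypothetical key-value store, represented by a plain dict, that receives the older rows keyed by ID. The `events` table, its columns, the `event:<id>` key format, and the `keep` threshold are all illustrative assumptions.

```python
import json
import sqlite3
from typing import MutableMapping


def offload_old_rows(conn: sqlite3.Connection, archive: MutableMapping[str, str], keep: int) -> int:
    """Move all but the newest `keep` rows of `events` into `archive`; return the count moved."""
    (total,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    excess = total - keep
    if excess <= 0:
        return 0
    rows = conn.execute(
        "SELECT id, ts, payload FROM events ORDER BY ts, id LIMIT ?", (excess,)
    ).fetchall()
    # The two stores share no transaction, so copy first and delete second:
    # a crash in between leaves duplicates (re-puts are idempotent), not lost rows.
    for row_id, ts, payload in rows:
        archive[f"event:{row_id}"] = json.dumps({"ts": ts, "payload": payload})
    with conn:  # commit all the deletes as one relational transaction
        conn.executemany("DELETE FROM events WHERE id = ?", [(row[0],) for row in rows])
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO events (ts, payload) VALUES (?, ?)",
        [(t, f"event {t}") for t in range(10)],
    )
archive: dict = {}  # hypothetical stand-in for the NoSQL store
moved = offload_old_rows(conn, archive, keep=3)
print(moved, conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 7 3
```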
Most companies have a wide range of applications that warrant a database for persistent storage. For the vast majority of these applications, relational databases will continue to be the right solution. Any application whose data fits on a single computer or that requires complex transaction processing has little reason to consider a NoSQL database. But for many applications, such as the examples above, a NoSQL database may be a good fit. And as the NoSQL field continues to mature, the range of suitable applications will undoubtedly grow.
