As we saw in the second installment of the Big Data Series (Big data Technology Series – Part 2), the database management system market continues to evolve with the falling cost of hardware, the rising need to process massive distributed data sets, and the emergence of cloud-based service models. Relational database management systems used to be the be-all and end-all of database management architectures. The limitations of the relational model in handling internet-scale data and computing requirements gave rise to NoSQL and other non-relational database management systems, which are now used to handle specialized cases where the relational model falls short. Database management architectures have thus evolved from a “one size fits all” state to an “assorted mix” of best-of-breed tools and techniques that are fit for purpose. Given the plethora of database management tools and technologies, how does one begin to create such a “fit for purpose” architecture? What key trade-offs does a database architect need to make when selecting the tools to manage data? To recap what we discussed in the first, introductory installment of the Big Data Series, database management systems fall in the “Operational Environment” (see the graphic below). We will delve a bit deeper into this operational environment in this post.
When selecting an appropriate database management system for an operational distributed data environment, several dimensions come into play. Data consistency is obviously one key dimension (and one at which the relational model excels), but in a distributed environment, other dimensions such as availability and partitioning become just as important. Described below is a list of key dimensions, grouped in three buckets, that are critical when evaluating a database management architecture. These dimensions need to be traded off against each other based on specific requirements to arrive at a solution that is fit for purpose. For example, relational databases provide good consistency and performance for OLTP-like workloads, but may not be well suited to multi-join queries that span multiple entities and nodes (that is, workloads with high data processing complexity and scope).
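To make the last point concrete, here is a minimal sketch of why a query spanning several entities can be costly in a distributed setting. It assumes simple hash partitioning of record keys across nodes; the cluster size, the key names, and the helper functions `node_for` and `nodes_touched` are all hypothetical and chosen only for illustration, not taken from any particular product.

```python
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a record key to a node index with simple hash partitioning."""
    return crc32(key.encode("utf-8")) % num_nodes


def nodes_touched(keys, num_nodes: int = NUM_NODES) -> set:
    """Return the set of nodes that must be contacted to read all keys."""
    return {node_for(key, num_nodes) for key in keys}


# An OLTP-style lookup reads one record, so it always lands on one node.
single_lookup = nodes_touched(["customer:42"])

# A join across entities reads records with unrelated keys, which the
# hash may scatter over several nodes (hypothetical keys).
join_keys = ["customer:42", "order:1001", "order:1002", "product:7", "invoice:311"]
join_lookup = nodes_touched(join_keys)

print("single lookup touches", len(single_lookup), "node(s)")
print("join touches", len(join_lookup), "node(s)")
```

A single-record lookup is served by exactly one node, while the join may need to gather data from several nodes over the network before it can combine the rows. Real systems use more sophisticated placement schemes, but the basic trade-off is the same.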
In addition to traditional RDBMS database clusters and appliances, several classes of database management products are now available in a database architect’s arsenal: NewSQL databases, Document Stores, and Column Stores, to name a few. How these different solutions compare and contrast can best be seen through the lens of the aforementioned dimensions. The following classes of database management systems are described below using this framework: 1) NewSQL databases, 2) Key-Value Stores, 3) Document Stores, 4) Column Family Stores, and 5) Graph Databases.