In the last three installments of the big data technology series, we looked at the historical evolution of, and key developments in, databases, BI/DW platforms and statistical computing software. That discussion gives us a good foundation for understanding the key trends, developments, solution landscape and architectures that are taking shape today in response to big data challenges. This installment outlines the major trends in the data management and analytics platform space.
As we saw in Part 2, recent innovations have greatly changed database management platforms, with the emergence of data stores for unstructured data and distributed, large-scale data management architectures. Part 3 focused on how traditional BI/DW platforms and appliances have emerged as a critical component of a corporation’s enterprise architecture, supporting management decision-making and reporting needs. Finally, Part 4 discussed how statistical computing platforms have evolved on their own to support the advanced analytics and data mining needs of the business. Technical developments in these three areas are becoming increasingly intertwined, with those in one area affecting and reinforcing those in another. The falling cost of hardware, the increasing sophistication of software, and the rise of big data sets are driving new paradigms and new thinking about how data should be managed and analyzed, and about the technologies and architectures used to do it. This new thinking is challenging and extending the way things have traditionally been done.
The graphic below describes some of the key trends in the data management and analytics tools landscape.
Enterprise data management architecture is changing and evolving in various ways with the emergence of big data processing and supporting tools. However, a few key takeaways about big data architectures stand out:
1) Open Scale-out Shared Nothing Infrastructure
As the demands for data storage and processing grew with the advent of the modern-day Internet, vertical scaling began to be used to manage the higher storage requirements. In vertical scaling, resources such as processing power or disk are added to a single machine to match the higher processing requirements. New architectures were also adopted, such as database clustering, in which data is spread out among a cluster of servers. MPP appliances provided the scalability to process massive data sets across a cluster of high-end proprietary servers. Hardware improvements over the past several decades, however, brought down the price/performance ratio of x86 servers to the point where companies started using these machines to store and process data for day-to-day operations. The use of cheap x86 machines for data processing was pioneered by new-age information companies such as Google and Amazon to store and manage their massive data sets. Modern-day scale-out architectures leverage x86 servers in standard, open source configurations, using industry-standard networking and communication protocols. In fact, many modern-day data analytics platforms are essentially software platforms that are certified to run on a cluster of commodity servers with a given configuration.
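To make the scale-out idea concrete, here is a minimal Python sketch of how records might be spread across a shared-nothing cluster and then aggregated. It is an illustration only: the function names (`node_for_key`, `partition`, `scatter_gather_sum`), the four-node cluster and the sample records are all assumptions made for this example, and real platforms use more sophisticated partitioning and fault-tolerance schemes.

```python
import hashlib
from collections import defaultdict

def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def partition(records, num_nodes):
    """Spread (key, value) records across nodes; each node owns only its own slice."""
    nodes = defaultdict(list)
    for key, value in records:
        nodes[node_for_key(key, num_nodes)].append((key, value))
    return nodes

def scatter_gather_sum(nodes):
    """Each node sums its local values; a coordinator then adds the partial results."""
    partials = [sum(value for _, value in rows) for rows in nodes.values()]
    return sum(partials)

# Illustrative data: 1,000 records spread over an assumed 4-node cluster.
records = [(f"user-{i}", i) for i in range(1000)]
cluster = partition(records, num_nodes=4)
total = scatter_gather_sum(cluster)
```

Because no node shares memory or disk with another in this sketch, capacity grows by adding nodes and repartitioning rather than by buying a bigger machine, which is the contrast with vertical scaling drawn above.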
2) Tailored Data Management Architecture
The hugely successful relational model forms the basis of the majority of enterprise data computing environments today. In spite of the variety of use cases to which the relational model has been applied, it has its shortcomings. Database innovation in recent years has focused on tools and techniques to store unstructured data using non-relational techniques, and a raft of database management tools for such data has emerged in the past decade. Alternative forms of data storage are increasingly being used, for example columnar databases that store data by column rather than by row. Similarly, a number of innovative data storage solutions, such as SSD-based storage, have come out. These innovations have created a plethora of data management system options, each of which is optimized to handle a specific set of use cases and applications. Enterprise data management architectures are moving from “one size fits all” relational database systems to a “tailored” combination of relational/non-relational, row-oriented/column-oriented, disk-based/memory-based and other solutions, as guided by the characteristics and processing needs of the data workloads.
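The difference between row-oriented and column-oriented storage can be sketched in a few lines of Python. This is a toy illustration, not how any particular database lays out data on disk; the sample orders and the `to_columns` helper are assumptions made for the example.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]

def to_columns(records):
    """Pivot a list of row dictionaries into one list per column (hypothetical helper)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

# Column-oriented layout: all values of one attribute are stored together.
columns = to_columns(rows)

# Fetching one complete record suits the row layout: a single lookup.
second_order = rows[1]

# Aggregating one attribute suits the column layout: only "amount" is scanned,
# whereas the row layout has to walk every full record.
total_from_rows = sum(record["amount"] for record in rows)
total_from_columns = sum(columns["amount"])
```

Both layouts give the same answer; what differs is how much data has to be read to get it, which is why transactional workloads tend to favor rows and analytical scans tend to favor columns.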
3) Logical Enterprise Data Warehouse
Traditional BI and DW platforms have been successful at delivering decision support and reporting capabilities with structured data to answer pre-defined questions. Advanced analytics solutions have traditionally been delivered using proprietary software and high-end hardware platforms. Relational databases have typically been used to manage transactional data. This picture is slowly changing due to falling hardware costs and the rise of big data needs, and the consequent emergence of unstructured data processing solutions and new big data analytic platforms. Unstructured data stores such as document stores are slowly making their way into the enterprise to manage unstructured data needs. The new analytic platforms provide a powerful suite of tools and libraries based on open source technologies to run advanced analytics, supported by a processing layer and a query optimizer that leverage scale-out distributed architectures to process data. The enterprise data architecture is thus slowly evolving and increasing in complexity as companies leverage myriad data storage and processing options to manage their data needs. In response to these developments, Gartner coined the concept of the “logical data warehouse”, essentially an architecture in which the concept and scope of the traditional warehouse have been expanded to include the new data processing tools and technologies, all abstracted by a data virtualization layer.
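As a rough illustration of what "abstracted by a data virtualization layer" can mean, the Python sketch below puts a single query interface in front of two different stores. Everything here is hypothetical: the `VirtualizationLayer` class, the reader functions and the sample data are invented for this example and do not represent Gartner's definition or any vendor's product.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

class VirtualizationLayer:
    """Hypothetical facade that hides which underlying store holds each dataset."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, dataset: str, reader: Callable[[], Iterable[Row]]) -> None:
        """Attach a reader function for a named dataset."""
        self._sources[dataset] = reader

    def query(self, dataset: str, predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        """Return the rows of a dataset that satisfy the predicate."""
        if dataset not in self._sources:
            raise KeyError(f"unknown dataset: {dataset}")
        return [row for row in self._sources[dataset]() if predicate(row)]

def read_sales_from_warehouse() -> List[Row]:
    # Stand-in for structured data held in a traditional warehouse.
    return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 80.0}]

def read_reviews_from_document_store() -> List[Row]:
    # Stand-in for unstructured data held in a document store.
    return [{"order_id": 1, "text": "Arrived on time"}]

layer = VirtualizationLayer()
layer.register("sales", read_sales_from_warehouse)
layer.register("reviews", read_reviews_from_document_store)

# The caller asks for a dataset by name and never sees where it physically lives.
large_orders = layer.query("sales", lambda row: row["amount"] > 100)
all_reviews = layer.query("reviews")
```

The design point is that consumers depend on the logical dataset name, so a store can be swapped or added behind the layer without changing the queries that use it.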
The database and analytic platform market continues to evolve, and successful enterprise data architecture patterns for managing big data needs are only just emerging. In the next installment of the big data series, we will look at some of the key capabilities of a big data analytics platform and some of the major players in the market.