Understanding The Building Blocks of a Distributed Ledger System

Introduction to DLTs

Distributed ledger technology (DLT) is being hailed as a transformative technology, with comparisons drawn to the Internet in its potential to transform and disrupt industries.  As a "platform" technology for decentralized, trust-based peer-to-peer computing, DLT helps shape new "domain" capabilities, just as computer networking enabled the Internet and the creation of capabilities across communication, collaboration and commerce.  Like the Internet, it will have far-reaching consequences for enterprise architectures of the future.  Not only will DLT transform the technology stack of established domains (witness how Blockchain is transforming identity management infrastructure in the enterprise), but it will also give rise to new architecture paradigms as computing moves to decentralized trust-based networks, for example in how an enterprise interacts with its business partners, suppliers and buyers.  The Internet took 30 years to have disruptive effects in the enterprise, and DLT's full impact is expected to play out over similar time frames.

DLT represents a generic class of technologies (Blockchain is a prominent example), but all DLTs share the concept of the distributed ledger: a shared, immutable database that is the system of record for all transactions, current and historic, maintained by a community of participating nodes that have some incentive (usually a token or a cryptocurrency) to keep the ledger in good standing.  The emergence of DLTs can be traced back to the original blockchain applications, Bitcoin and Ethereum.  Various other distributed ledger applications have emerged to solve specific industry/domain issues: R3's Corda in financial services, Ripple for payments, etc.  Innovation in the DLT space is proceeding at a feverish pace.  Well-established DLT-based networks can essentially be segmented along two dimensions: how ledger integrity is guaranteed through validation, and whether the ledger is private or public.

DLT and Enterprise Architecture

As participants in DLT-based networks developed by industry utilities or consortiums, organizations may not have a strong need to master the internal architecture design and trade-offs associated with such a platform.  However, the architecture community in those organizations will still be required to understand how the networks they participate in work, at least to the extent required to understand the implications for their organizations.  Furthermore, as intra-company applications of DLT become mainstream, enterprise architects will increasingly be called upon to provide perspectives on the optimal design of the underlying technology.  As DLT moves from innovation labs into the mainstream enterprise, architects will need to start preparing their organizations to accept DLT-based applications into the organizational landscape.  A good place for enterprise architects to start is understanding just what the DLT technical architecture encompasses: what building blocks comprise a DLT system, and what architectural decisions need to be made.

The Building Blocks of a DLT System

To understand a complex technology such as DLT, it may be helpful to draw parallels to the TCP/IP stack for computer networking, to which Blockchain has been compared in the past (The Truth About Blockchain).  While there may not be a strict one-to-one correspondence between the layered networking model and the DLT architecture, drawing the parallel helps one understand conceptually how the building blocks fit together.  The OSI model is a generic architecture that represents the several flavors of networking that exist today, ranging from closed, proprietary networks to open, standards-based ones.  The DLT building blocks likewise provide a generic architecture that represents the several flavors of DLTs that exist today, and ones yet to be born.

In theory, it should be possible to design each building block independently, with well-defined interfaces, so that the whole DLT system comes together as one whole, with higher-level building blocks abstracted from the lower-level ones.  In reality, architectural choices in one building block influence those in other building blocks, e.g., the choice of a DLT's data structure influences the consensus protocol most suitable for the system.  As common industry standards for DLT architecture and design develop (Hyperledger is an early development spearheaded by The Linux Foundation) and new technology is proven out in the marketplace, a more standardized DLT architecture stack will perhaps emerge, again following how computer networking standards emerged.  There is value, nevertheless, in being able to conceptually view a DLT system as an assembly of these building blocks to understand the key architecture decisions that need to be made.

Key Architectural Tradeoffs in DLT Systems

Architecting a DLT system involves making a series of decisions and tradeoffs across key dimensions.  These decisions optimize the DLT for specific business requirements: for some DLT applications, performance and scalability may be key, while for others, ensuring fundamental DLT properties (e.g., immutability and transparency) may be paramount.  Inherent in these decisions are architectural tradeoffs, since the dimensions represent ideal states seldom realized in practice.  These tradeoffs essentially involve traversing the triple constraint of Decentralization, Scalability, and Security.

Decentralization reflects the fundamental egalitarian philosophy of the original Bitcoin/Blockchain vision, i.e., that the distributed ledger should be accessible, available and transparent to all at all times, and that all participating nodes in the network should validate the ledger and thus hold the full ledger data.  Decentralization enables trustless parties to participate in the network without the need for central authorization.  Scalability refers to the goal of having an appropriate level of transaction throughput, sufficient storage capacity to record transaction data, and acceptable latency between a transaction being submitted and its being validated and recorded.  Scalability ensures that appropriate performance levels are maintained as the size of the network grows.  Finally, Security is the ability to maintain the integrity of the ledger by warding off attacks and making it prohibitively difficult to maliciously change the ledger for one's benefit.  Fundamentally, this dimension reflects a security design that is built into the fabric of how the ledger operates, rather than relying on external 'checking' to ensure safety.

Bringing It Together: DLT Building Block Decisions and Architectural Tradeoffs

Applying the architectural decisions to the DLT system allows one to arrive at different flavors of DLT systems, each making tradeoffs to navigate the triple constraint described above.  Traversing the sides of the triangle moves one between different DLT architecture styles, with the vertices of the triangle denoting pure architectural states seldom realized in practice.  For example, systems like Bitcoin and Ethereum tend toward Vertex A, maximizing Decentralization through their decentralized P2P trustless model, and Security through consensus-building and validation methods that prevent malicious attacks (although both Bitcoin and Ethereum have been shown to have other security vulnerabilities), but they sacrifice much in terms of Scalability (Bitcoin's scalability woes are well known, and Ethereum is only slightly better).  On the other hand, permissioned DLTs, such as Corda, tend toward Vertex C, maximizing Scalability and guaranteeing Security, but they sacrifice Decentralization (by definition, permissioned DLTs are not transparent since they restrict access, and validation is performed only by a set of pre-authorized validating nodes) and may suffer other security issues (both the trusted nodes and the central authority in a permissioned DLT system can be attacked by a nefarious party).  DLT variations such as Bitcoin's Lightning Network and Ethereum's Raiden tend toward Vertex B, aiming to use off-chain capabilities to improve the Scalability of traditional Bitcoin and Ethereum networks while preserving Decentralization (despite some recent concerns that these networks tend to become centralized in the long run), although their off-chain capabilities may require additional Security capabilities (they also partially move away from the Blockchain's decentralized security apparatus).  Let's examine how these tradeoffs come into play at the level of DLT building blocks.

Layer 3: Ledger Data Structure

Ledger Data Structure encapsulates decisions around how the distributed ledger is actually structured and linked at a physical level, e.g., a chain of blocks, a graph, etc.  Additionally, it captures decisions around how many ledger chains there are, and whether nodes carry the entire ledger or just a part of it.  In traditional Blockchain, the ledger is structured as a global sequential linked list of blocks, instances of which are replicated across all participating nodes.  This design goes hand in hand with traditional Blockchain's Proof of Work consensus protocol in ensuring high levels of Decentralization and Security, since each node holds a current instance of the global ledger chain and there is decentralized consensus building for block validation (although a few security vulnerabilities with Blockchain have come to the forefront, and Proof of Work is susceptible to centralization due to economies of scale in mining).  As we know, this design takes a toll on Scalability: Bitcoin's Blockchain can process only a few transactions per second, and the time required for processing a block is high (Bitcoin generates a new block roughly every 10 minutes).
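To make the hash-linked chain structure concrete, here is a minimal, illustrative Python sketch (not the implementation of any particular DLT); the block fields and helper names are simplifications assumed for this example.  Tampering with any historical block changes its hash and breaks every subsequent link, which is what makes the ledger effectively immutable.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Deterministically hash a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def new_block(prev_hash: str, transactions: list) -> dict:
    """Create a block that points at its predecessor via prev_hash."""
    return {"timestamp": time.time(), "transactions": transactions, "prev_hash": prev_hash}

# Build a tiny chain: a genesis block, then blocks linked by hashes.
chain = [new_block(prev_hash="0" * 64, transactions=[])]
chain.append(new_block(block_hash(chain[-1]), [{"from": "alice", "to": "bob", "amount": 5}]))
chain.append(new_block(block_hash(chain[-1]), [{"from": "bob", "to": "carol", "amount": 2}]))

def verify_chain(chain: list) -> bool:
    """Altering any historical block breaks every subsequent prev_hash link."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

print(verify_chain(chain))                     # True
chain[1]["transactions"][0]["amount"] = 500    # tamper with history...
print(verify_chain(chain))                     # ...and the chain no longer verifies: False
```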

Some newer designs use alternative data structures that improve Scalability and performance, such as NXT's and SPECTRE's DAG (directed acyclic graph) of blocks, which mine DAG blocks in parallel to allow more throughput and lower transaction times, and IOTA's Tangle, a so-called "blockless" DLT that gets rid of block mining altogether and relies on a DAG of transactions to maintain system state and integrity.  These new designs have yet to be implemented and used at scale, and many of them have their own set of challenges (some claim they will continue to rely on some form of centralization to gain scale, and they also have security-related challenges).  However, the DLT community's interest has been high: IOTA's Tangle has been creating a buzz in DLT circles as a possible serious contender in the IoT world (since its data structure and protocol are well suited to handling large, continual streams of data), and several blockless DLT startups have been born lately.
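As a rough illustration of the "blockless" idea, the sketch below models a DAG of transactions in the spirit of IOTA's Tangle, where each new transaction approves earlier ones; the random tip selection and the simple approval count used here are deliberate simplifications, not IOTA's actual algorithm.

```python
import hashlib
import random

class Transaction:
    """A node in a transaction DAG: it directly approves (validates) earlier transactions."""
    def __init__(self, data: str, approves: list):
        self.data = data
        self.approves = approves  # parent transactions this one approves
        self.tx_id = hashlib.sha256(
            (data + "".join(parent.tx_id for parent in approves)).encode()
        ).hexdigest()

genesis = Transaction("genesis", [])   # the genesis transaction approves nothing
dag = [genesis]

# Each new transaction approves (up to) two existing transactions -- a crude stand-in
# for the Tangle's tip-selection algorithm.
for i in range(10):
    parents = random.sample(dag, k=min(2, len(dag)))
    dag.append(Transaction(f"tx-{i}", parents))

def direct_approvers(tx: Transaction, dag: list) -> list:
    """Transactions that directly approve tx; confidence grows as approvals accumulate."""
    return [t for t in dag if tx in t.approves]

print(f"genesis is directly approved by {len(direct_approvers(genesis, dag))} transactions")
```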

Tinkering with how ledger data is stored across nodes represents another opportunity for gains in Scalability.  For example, sharding, a concept fairly well established in the distributed database world, is coming to DLTs.  Applied to DLTs, sharding enables the overall Blockchain state to be split into shards which are then stored and processed by different nodes in the network in parallel, allowing higher transaction throughput (Ethereum's scaling roadmap pairs sharding with its Casper proof-of-stake work to drive scalability and speed).  Similarly, Scalability can be improved by having multiple chains, possibly private, to enable separation of concerns: "side chains" enable processing to happen on a separate chain without overloading the original main chain.  While such designs improve Scalability, they move away from the DLT vision of enabling democratic access and availability to all participants at all times, and they also present Security-related challenges, part of the reason why widespread adoption of sidechains has been slow.
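The toy sketch below shows the core idea of sharding applied to ledger state: accounts are mapped to shards by hashing their identifiers so that shards can be stored and processed in parallel.  Real designs also have to handle cross-shard transactions and validator assignment, which are omitted here; all names and values are hypothetical.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(account_id: str) -> int:
    """Map an account to a shard by hashing its identifier."""
    digest = hashlib.sha256(account_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each shard holds (and validates) only its slice of the global state,
# so shards can process transactions in parallel.
shards = {i: {} for i in range(NUM_SHARDS)}

for account, balance in [("alice", 50), ("bob", 20), ("carol", 75)]:
    shards[shard_for(account)][account] = balance

def apply_transfer(sender: str, receiver: str, amount: int):
    s_from, s_to = shard_for(sender), shard_for(receiver)
    # A same-shard transfer is straightforward; a cross-shard transfer needs extra
    # protocol machinery (receipts, locking, or asynchronous messaging) not shown here.
    shards[s_from][sender] = shards[s_from].get(sender, 0) - amount
    shards[s_to][receiver] = shards[s_to].get(receiver, 0) + amount

apply_transfer("alice", "bob", 10)
print(shards)
```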

Layer 2: Consensus Protocol

The consensus protocol determines how transactions are validated and added to the ledger, and the decision-making in this building block involves choosing a specific protocol based on the underlying data structure and objectives related to the triple constraint.  Proof of Work, the traditional Blockchain consensus protocol, requires transactions to be validated by all participating nodes, and enables a high degree of Decentralization and Security, but suffers on Scalability.  Alternative protocols, such as Proof of Stake, provide somewhat better Scalability by changing the incentive mechanism to align more closely with the good operation of the ledger.  Protocols based on Byzantine Fault Tolerance (BFT), which have been successfully applied to other distributed systems, are applicable to private ledgers and depend upon a collection of pre-trusted nodes.  Such protocols sacrifice Decentralization to gain Scalability.
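The following sketch illustrates, in highly simplified form, why these protocols sit differently on the triple constraint: a Proof of Work miner must search for a nonce (expensive to produce, cheap to verify), whereas a Proof of Stake scheme can simply select a validator with probability proportional to stake.  The difficulty level, stake values, and selection rule are illustrative assumptions, not the parameters of any real network.

```python
import hashlib
import random

def mine(block_header: str, difficulty: int = 4) -> int:
    """Proof of Work sketch: search for a nonce whose hash has `difficulty` leading zeros.

    The work is expensive to produce but cheap to verify (one hash), which is what
    secures the ledger -- and also what limits throughput.
    """
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_header}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def choose_validator(stakes: dict) -> str:
    """Proof of Stake sketch: pick the next block proposer with probability proportional to stake."""
    validators = list(stakes)
    return random.choices(validators, weights=[stakes[v] for v in validators], k=1)[0]

print("PoW nonce:", mine("block-42|prev=abc123|txroot=def456"))
print("PoS proposer:", choose_validator({"node-a": 100, "node-b": 300, "node-c": 600}))
```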

Ethereum's Raiden and Bitcoin's Lightning Network are innovations that add scalability to Ethereum and Bitcoin respectively by securely moving transactions off the main chain to a separate transacting channel, and then moving back to the main chain for settlement purposes – the so-called "Layer 2" innovations.  This design moves load off the main ledger; however, since transactions occurring on the channel are not recorded on the ledger, it sacrifices Security (the transacting channels need additional security apparatus that is not part of the original chain) as well as Decentralization (channel transactions are not accessible to all participants).
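A toy model of such a channel is sketched below: the opening deposit and the final settlement are the only states that would touch the main chain, while intermediate balance updates are exchanged (and in reality co-signed) off-chain.  Signatures, routing, and dispute windows are omitted, so this is a conceptual sketch rather than how Lightning or Raiden are actually implemented.

```python
class PaymentChannel:
    """Toy model of a Layer 2 payment channel.

    Only the opening deposit and the final settlement would be recorded on the main
    chain; the intermediate balance updates happen off-chain.  Real channels rely on
    cryptographic signatures and dispute windows, which are omitted here.
    """
    def __init__(self, deposit_a: int, deposit_b: int):
        self.balances = {"A": deposit_a, "B": deposit_b}  # on-chain opening state
        self.sequence = 0  # the latest co-signed state wins at settlement

    def pay(self, sender: str, receiver: str, amount: int):
        if self.balances[sender] < amount:
            raise ValueError("insufficient channel balance")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.sequence += 1  # both parties would co-sign this new state off-chain

    def settle(self) -> dict:
        """Broadcast the latest co-signed state to the main chain."""
        return {"final_balances": dict(self.balances), "sequence": self.sequence}

channel = PaymentChannel(deposit_a=100, deposit_b=50)
channel.pay("A", "B", 30)
channel.pay("B", "A", 10)
print(channel.settle())  # only this final state is recorded on-chain
```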

A number of other protocols and schemes to improve scalability and security are in the works, many of which are variations of the basic PoW and PoS, and which envision a future comprising not one single ledger chain but a collection of chains.  Examples include Kadena, which uses PoW on a braid of chains; EOS, which uses delegated PoS; and Cosmos's Tendermint, which uses BFT-based PoS across a universe of chains.

Layer 1:  Computation and App Data

DLT resources such as storage and computation come at a premium, and it costs real money to submit transactions in a DLT system.  In this topmost layer, therefore, the architectural decisions deal with providing flexibility and functionality related to data storage and computation – essentially how much of each should reside on-chain, and how much off-chain.  Additionally, this layer deals with decisions around how to integrate the DLT with events from the real world.

For computation, the Bitcoin Blockchain and Ethereum provide constructs for putting data and business logic on-chain, and Ethereum is far more advanced than Bitcoin in this regard since it offers "smart contracts", essentially code that is executed on the chain when certain conditions are met.  There are obvious advantages to doing all computation on-chain: interoperability between parties and immutability of code, which facilitates trust building.  There is, however, a practical limit to how complex smart contracts can be, a limit that is easily reached.  Offloading complex calculations to off-chain capabilities allows one to leverage DLT capabilities in a cost-effective and high-performing manner.  TrueBit, an online marketplace for computation, enables a pattern in which complex, resource-intensive computation is offloaded to a community of miners who compete to complete the computation for a reward and provide results that can be verified on-chain for authenticity.  While this provides upside in terms of Scalability and Decentralization, there are Security-related implications of using off-chain computation, an area of active research and development.
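The offload-and-verify pattern can be sketched roughly as follows: an off-chain worker performs the heavy computation and commits to the result, and on-chain logic (or a challenger) checks the claim.  Note that TrueBit's real protocol resolves disputes through an interactive verification game rather than the full recomputation used here; the function names are hypothetical.

```python
import hashlib

def heavy_computation(n: int) -> int:
    """Stand-in for work too expensive to run inside a smart contract."""
    return sum(i * i for i in range(n))

def offchain_worker(n: int) -> dict:
    """An off-chain solver computes the result and commits to it."""
    result = heavy_computation(n)
    commitment = hashlib.sha256(f"{n}:{result}".encode()).hexdigest()
    return {"input": n, "result": result, "commitment": commitment}

def onchain_verify(claim: dict, challenge_recompute=None) -> bool:
    """Check the solver's commitment; a challenger can recompute and dispute.

    Full recomputation is used here only to keep the sketch short -- the point of
    schemes like TrueBit is to avoid exactly that via an interactive verification game.
    """
    expected = hashlib.sha256(f"{claim['input']}:{claim['result']}".encode()).hexdigest()
    if expected != claim["commitment"]:
        return False
    if challenge_recompute is not None:
        return challenge_recompute(claim["input"]) == claim["result"]
    return True

claim = offchain_worker(1_000_000)
print(onchain_verify(claim, challenge_recompute=heavy_computation))
```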

What applies to computation also applies to data storage in the DLT world.  While Blockchain and Ethereum provide basic capabilities for storing data elements, a more suitable design for managing large data sets in DLT transactions is to use off-chain data infrastructure providers or cloud storage providers while maintaining hashed pointers to these data sets on-chain.  Solutions like Storj, Sia, and IPFS aim to provide P2P decentralized, secure data management infrastructure that can hook into DLTs through tokens and smart contracts, and that manages data and computation securely through technologies such as secure multi-party computation (MPC).  Similar to off-chain computation, off-chain storage has upside in terms of Scalability and Decentralization; however, there are security- and durability-related implications.
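A minimal sketch of the hashed-pointer pattern, assuming a hypothetical off-chain store standing in for IPFS, Storj, Sia, or a cloud provider: the ledger records only a content hash, which lets anyone verify the integrity of data retrieved off-chain.

```python
import hashlib

# Hypothetical off-chain store (stand-in for IPFS, Storj, Sia, or cloud storage).
offchain_store = {}

# The ledger keeps only a compact hash pointer per record.
onchain_pointers = {}

def put_document(doc_id: str, payload: bytes):
    digest = hashlib.sha256(payload).hexdigest()
    offchain_store[digest] = payload       # the large data set lives off-chain
    onchain_pointers[doc_id] = digest      # only the hash is anchored on-chain

def get_document(doc_id: str) -> bytes:
    digest = onchain_pointers[doc_id]
    payload = offchain_store[digest]
    # Integrity check: the retrieved bytes must hash to the on-chain pointer.
    assert hashlib.sha256(payload).hexdigest() == digest, "off-chain data was tampered with"
    return payload

put_document("invoice-123", b"large PDF or data set goes here")
print(get_document("invoice-123"))
```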

What provides immutability to the distributed ledger (its deterministic method of recording transactions) is also its Achilles' heel: it is difficult for the ledger to communicate with and interpret data it gets from the outside, non-deterministic world.  Oracles, services which act as middlemen between the distributed ledger and the non-DLT world, bridge that gap and make it possible for smart contracts to be put to real-world use.  Various DLT oracle infrastructures with varying features are in development (ChainLink, Zap, Oraclize, etc.); choosing the right oracle architecture for the specific use case under consideration is thus crucial.  Similar to off-chain data, oracles provide upside in terms of Scalability and Decentralization; however, there are security and data verifiability concerns.
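Conceptually, an oracle delivers a signed report that on-chain logic verifies before consuming.  The sketch below uses an HMAC with a shared key as a stand-in for real digital signatures, and the feed name and functions are hypothetical; production oracle networks such as ChainLink add decentralization and aggregation across multiple reporters.

```python
import hashlib
import hmac
import json

ORACLE_KEY = b"shared-secret"  # stand-in for the oracle's signing key

def oracle_report(feed: str, value: float) -> dict:
    """The oracle fetches a real-world value and signs its report."""
    message = json.dumps({"feed": feed, "value": value}, sort_keys=True).encode()
    signature = hmac.new(ORACLE_KEY, message, hashlib.sha256).hexdigest()
    return {"feed": feed, "value": value, "signature": signature}

def contract_consume(report: dict) -> float:
    """A smart contract only accepts data carrying a valid oracle signature."""
    message = json.dumps(
        {"feed": report["feed"], "value": report["value"]}, sort_keys=True
    ).encode()
    expected = hmac.new(ORACLE_KEY, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, report["signature"]):
        raise ValueError("untrusted oracle report")
    return report["value"]

report = oracle_report("USD/EUR", 0.91)
print(contract_consume(report))  # the contract can now settle against this price
```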


Conclusion

These are still early days for DLT, and many of the improvements needed to make it commercially implementable are yet to come.  Beyond scalability and security, DLTs face a number of hurdles in enterprise adoption, such as interoperability, complexity, and the lack of developer-friendly toolkits.  The future will probably hold not just one ledger technology, but a multitude, each optimized for a specific use case within an organization, and even superstructures such as chains of chains connected with oracles, middleware and the like.  These structures will not replace existing technology architecture either; they will exist alongside it and will need to be integrated with legacy technologies.  Like networking, DLTs will give rise to new processes, teams, and management structures.  Enterprise architects will play a central role in facilitating the development of DLT as a true enterprise technology.

Big Data Technology Series – Part 7

As we saw in the second installment of the Big Data Series (Big Data Technology Series – Part 2), the database management system market continues to evolve with the falling cost of hardware, the rising need to process distributed massive data sets, and the emergence of cloud-based service models.  It used to be that relational database management systems were the be-all and end-all of database management architectures.  The limitations of the relational model in handling internet-scale data and computing requirements gave rise to NoSQL and other non-relational database management systems, which are now used to handle specialized cases where the relational model fails.  Database management architectures have thus evolved from a "one size fits all" state to one with an "assorted mix" of tools and techniques that are best of breed and fit for purpose.  Given the plethora of database management tools and technologies, how does one begin to create such a "fit for purpose" architecture?  What key trade-offs does a database architect need to make while selecting the tools to manage data?  To refresh what we discussed in the first introductory installment of the Big Data Series, database management systems fall in the "Operational Environment".  We will delve a bit deeper into this operational environment in this post.


When selecting an appropriate database management system in an operational distributed data environment, several dimensions come into play.  Data consistency is obviously one key dimension (and one at which the relational model excels), but in a distributed environment, other dimensions such as availability and partition tolerance become key.  Described below is a list of key dimensions, grouped into three buckets, that are critical when evaluating a database management architecture.  These dimensions need to be traded off against specific requirements to arrive at a solution that is fit for purpose.  For example, relational databases provide good consistency and performance for OLTP-like workloads, but may not be well suited to handle multi-join queries that span multiple entities and nodes (i.e., workloads with high data processing complexity and scope).
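As a toy illustration of trading consistency against availability in a distributed store (one of the dimensions above), the sketch below uses quorum replication: with N replicas, requiring W write acknowledgements and R read acknowledgements such that R + W > N gives stronger consistency, at the price of tolerating fewer unreachable replicas.  The parameters and in-memory "replicas" are purely illustrative.

```python
# Toy quorum-replicated key-value store: N replicas, W write acks, R read acks.
N = 3
replicas = [{} for _ in range(N)]

def write(key, value, W=2) -> bool:
    """Write succeeds once W replicas acknowledge (real systems issue network calls that can fail)."""
    acks = 0
    for replica in replicas:
        replica[key] = value
        acks += 1
        if acks >= W:
            break
    return acks >= W

def read(key, R=2):
    """Read from an R-replica quorum; here the first R replicas stand in for that quorum."""
    values = [replica.get(key) for replica in replicas[:R]]
    return values[0]

write("customer:42", {"name": "Acme"}, W=2)
print(read("customer:42", R=2))  # R + W = 4 > N = 3, so the read observes the write
```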

Dimensions

In addition to traditional RDBMS database clusters and appliances, there are now several classes of database management products in a database architect's arsenal: NewSQL databases, document stores, and column stores, to name a few.  How these different solutions compare and contrast can best be seen through the lens of the aforementioned dimensions.  Described below are the following classes of database management systems seen through the lens of this framework: 1) NewSQL databases, 2) Key-Value Stores, 3) Document Stores, 4) Column Family Stores, and 5) Graph Databases.

NewSQL

Key Value

Document Stores

Column Family

Graph

Big Data Technology Series – Part 5

In the last three installments of the big data technology series, we looked at the historical evolution and key developments in databases, BI/DW platforms, and statistical computing software.  This discussion provides a good foundation for understanding some of the key trends, developments, solution landscapes, and architectures taking shape today in response to big data challenges.  This installment in the series focuses on outlining major trends in the data management and analytics platform space.

As we saw in Part 2, innovations and recent advances have greatly changed database management technology platforms, with the emergence of data stores for unstructured data and distributed large-scale data management architectures.  Part 3 focused on how traditional BI/DW technology platforms and appliances have emerged as a critical component of a corporation's enterprise architecture, supporting management decision-making and reporting needs.  Finally, Part 4 discussed how statistical computing platforms have evolved on their own to support the advanced analytics and data mining needs of the business.  Technical developments in these three areas are increasingly intertwined, with those in one area affecting and reinforcing those in another.  The falling cost of hardware, the increasing sophistication of software, and the rise of big data sets are driving new paradigms and thinking on technologies and architectures for how data should be managed and analyzed.  This new thinking is challenging and extending the way things have been done traditionally.

The graphic below describes some of the key trends in the data management and analytics tools landscape.

Big Data Analytics Platform Trends

Enterprise data management architecture is changing and evolving in various ways due to the emergence of big data processing and supporting tools; however, there are a few key takeaways about big data architectures:

1) Open Scale-out Shared Nothing Infrastructure

As the demands for data storage and processing grew with the advent of the modern-day Internet, vertical scaling was initially used to manage the higher storage requirements.  In vertical scaling, resources such as processing power or disk are added to a single machine to match higher processing requirements.  New architectures, such as database clustering, in which data is spread out amongst a cluster of servers, were then adopted, and MPP appliances provided the scalability to process massive data sets across a cluster of high-end proprietary servers.  Hardware improvements over the past few decades, however, brought down the price/performance ratio of x86 servers to the point where companies started using these machines to store and process data for day-to-day operations.  The use of cheap x86 machines for data processing was pioneered by new-age information companies such as Google and Amazon to store and manage their massive data sets.  Modern scale-out architectures leverage x86 servers with open, standard configurations using industry-standard networking and communication protocols.  In fact, many modern data analytics platforms are essentially software platforms certified to run on a cluster of commodity servers with a given configuration.

2) Tailored Data Management Architecture

The hugely successful relational model forms the basis of a majority of enterprise data computing environments today.  In spite of the variety of use cases the relational model has been applied to, it has its shortcomings.  Database innovation in recent years has focused on tools and techniques to store unstructured data using non-relational techniques, and a raft of database management tools for such data has emerged in the past decade.  Alternative forms of data storage are increasingly used, e.g., columnar databases that store data organized by columns rather than rows.  Similarly, a number of innovative data storage solutions, such as SSD-based storage, have come out.  These innovations have created a plethora of data management system options, each of which is optimized to handle a specific set of use cases and applications.  Enterprise data management architectures are moving from using "one size fits all" relational database systems to using a "tailored" combination of relational/non-relational, row-oriented/column-oriented, disk-based/memory-based, etc. solutions, as guided by data workloads' characteristics and processing needs.
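The row-versus-column trade-off at the heart of this "tailored" approach can be illustrated with a small sketch: the same table is laid out record-by-record (favoring transactional lookups) and column-by-column (favoring analytic scans and compression).  The data and names are invented for the example.

```python
# The same table laid out two ways.  Row orientation keeps whole records together
# (good for OLTP-style "fetch this order"); column orientation keeps each column
# contiguous (good for analytic scans and aggregations over one or two columns).

row_store = [
    {"order_id": 1, "customer": "alice", "amount": 120.0},
    {"order_id": 2, "customer": "bob",   "amount": 75.5},
    {"order_id": 3, "customer": "carol", "amount": 230.0},
]

column_store = {
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "carol"],
    "amount":   [120.0, 75.5, 230.0],
}

# OLTP-style lookup: a row store touches one complete record.
order_2 = next(r for r in row_store if r["order_id"] == 2)

# Analytic aggregate: a column store scans just the one column it needs
# (and can compress it well, since values within a column are homogeneous).
total_revenue = sum(column_store["amount"])

print(order_2, total_revenue)
```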

3) Logical Enterprise Data Warehouse

Traditional BI and DW platforms have been successful at delivering decision support and reporting capabilities with structured data to answer pre-defined questions.  Advanced analytics solutions have traditionally been delivered using proprietary software and high-end hardware platforms, and relational databases have typically been used to manage transactional data.  This picture is slowly evolving due to falling hardware costs and the rise of big data needs, and the consequent emergence of unstructured data processing solutions and new big data analytics platforms.  Unstructured data stores, such as document stores, are slowly making their way into the enterprise to manage unstructured data needs.  The new analytics platforms provide a powerful suite of tools and libraries based on open source technologies to run advanced analytics, supported by a processing layer and query optimizer that leverage scale-out distributed architectures to process data.  The enterprise data architecture is thus slowly evolving and increasing in complexity as companies leverage myriad data storage and processing options to manage their data needs.  In response to these developments, Gartner coined the concept of the "logical data warehouse", essentially an architecture in which the concept and scope of the traditional warehouse are expanded to include the new data processing tools and technologies, all abstracted by a data virtualization layer.

The database and analytics platform market continues to evolve, and successful enterprise data architecture patterns for managing big data needs are just emerging.  In the next installment of the big data series, we will look at some of the key capabilities of a big data analytics platform and some major players in the market.