The Five D’s of Fintech: Debiasing

Fintech: Creating a Level Playing Field

Much of the modern financial services edifice has been built on structures and arrangements that rely on certain participants having exclusive access to information, on third-party human involvement owing to the complex and risky nature of financial services, and on an opacity of transactions that preserves the business models of certain parties.  Such structures and arrangements have worked well enough until now, but the balance is tilting in favor of a more open environment in which trust and transparency are improved, and bias and conflicts of interest are mitigated.  Toward this end, a variety of regulations are being proposed and enforced, and technology is increasingly creating opportunities as well. There is also a focus on simplification and enablement to empower traditionally disadvantaged parties, and on automation to minimize the principal-agent problem.  Fintechs are at the forefront of driving a large part of this.  This last installment in the Fintech series (The Five D’s of Fintech) focuses on how opacity of information, bias, and conflicts of interest in financial services are being targeted by fintechs that want to create an open, level playing field through technology and new business models. 

Opacity of Information, Lack of Trust & Transparency

Trade execution is one area where fintechs are targeting issues of trust and transparency.  The information asymmetry between large institutional players such as market makers and small retail investors inevitably leads to inefficient outcomes for the latter.  On certain exchanges, for example, high-speed traders receive market information sooner and can jump in front of retail investors during order execution, giving themselves a price advantage.  IEX Exchange launched in 2012 with a new model for trade execution aimed at leveling the playing field and providing transparency.  IEX’s “speed bump,” for example, introduced a small delay into trades with the goal of negating high-frequency traders’ speed advantages.  IEX also does not engage in “payment for order flow” – an arrangement in which HFTs and market makers pay brokers for retail orders – which can create conflicts of interest and ultimately shortchange retail investors.  And it actively publishes a comprehensive set of metrics and KPIs to demonstrate execution quality.

In other cases, there is a lack of transparency around asset prices, which gives the seller an unfair advantage – as in the digitally archaic corporate bond market, where in many if not most instances there is no visibility into order books across markets, and dealers unilaterally set bond prices (since they may be the only ones carrying those securities on their books), leaving buyers as meek price takers. Electronic bond trading platforms such as MarketAxess and TradeWeb have recently introduced “all-to-all” trading in corporate bond markets, whereby buyers get broad market visibility and price transparency and can trade with any dealer or asset manager in the network.  New outfits such as LiquidityBook are jumping in with solutions centered on pre-trade and post-trade transparency.

Pre-trade and post-trade transparency and ‘best execution’ are becoming the norm, thanks to regulations like MiFID II.  Investors, retail and institutional asset managers alike, want to know more about where brokers are sending their orders. For many years, full broker transparency on order routing has been difficult to attain.  These issues have increasingly become the focus of regulators and fintech players alike.  As full order-routing transparency continues to top the ever-growing list of concerns for asset managers, trading technology companies are slowly but surely emerging to provide such solutions to the buy side.  Dash Financial Technologies is a technology and services company that provides order routing and customized trading solutions for the buy side, enabling full order-routing transparency.  The firm has seen considerable growth in recent years, in large part due to its transparency services.  Another example is Luminex, a consortium formed by major buy-side players that provides order execution transparency. 


Portfolio manager and financial advisor bias in asset and wealth management has been the focus of study and research for some time now, with well-known biases and subtle preferences factoring into investment decisions.  For example, portfolio managers are prone to confirmation and overconfidence bias, which may ultimately lead to investment underperformance.  Financial advisors, despite their fiduciary focus, can be susceptible to a range of cognitive biases.  After all, portfolio managers and financial advisors are only human, and as behavioral economics has shown, no matter how rational advisors may think they are, subtle biases often creep into decision making.

The application of behavioral economics to personal finance – dubbed “behavioral finance” – is the latest development in the wealth management industry purported to help drive investor outcomes.  Behavioral finance techniques can help reduce bias in investment decisions.  While bias in the context of wealth management has traditionally meant end investor/client bias, the biases of financial advisors themselves are equally important – indeed, biases on either side can reinforce and influence each other.  Time-tested, automated, quantitative strategies can help take human emotion out of investment decision-making, both for financial advisors and their clients.  As financial advisors increasingly focus on financial planning and on driving investor satisfaction and outcomes, they are realizing the importance of behavioral finance techniques that help them consistently apply the wisdom of proven investment strategies across clients and across market cycles.  However, per recent surveys (e.g., by Charles Schwab), advisors cite difficulty translating theory into implementation and a lack of software and tools as the primary reasons preventing adoption – until now.

Many technology-savvy advisors and wealthtech firms are focused squarely on enabling behavioral finance capabilities through technology. United Capital, for example, recently unveiled FinLife Partners, a tool for advisors that taps into clients’ views on money and spending and how they prioritize their decision-making (United Capital, a well-known financial advisor, was recently acquired by Goldman Sachs).  Advisor Software has released Behavioral IQ, which aims to build a clearer picture of six behavioral traits influencing clients’ approach to risk and decision-making through a series of questions and follow-ups. The tool, which essentially weighs biases by analyzing factors such as confidence and loss aversion, lets advisors make more appropriate recommendations.  Lirio’s Finworx also applies insight into clients’ risk tolerance and decision-making approaches derived from questionnaires.

As technology commoditizes parts of wealth management (tax loss harvesting, rebalancing, notifications, to name a few), it is moving increasingly upstream to functions such as asset allocation and financial planning that have been the sole preserve of human financial advisors, thanks to machine learning and more generally, AI.  As technology revolutionizes wealth management, it reduces the risk of mistakes and bias, intentional or not.  Although AI can introduce bias and discrimination of its own, new techniques and methods to manage and mitigate such issues are already paving the way to widespread AI adoption.

Conflicts of Interest

The retirement industry is increasingly delivering on its fiduciary responsibility toward investors, thanks to a string of successful class action lawsuits that has forced large plan sponsors and providers in the 401(k) space to clearly disclose fees and conflicts of interest.  However, there are segments of the industry where the problem is still rife.  For example, the small 401(k) plan market is not only underserved and overpriced, it also pays high compliance fines due to failed DOL audit checks. 403(b) plans, offered to schoolteachers and employees of tax-exempt organizations, are particularly notorious for shirking fiduciary responsibility.  Unlike 401(k) plans, 403(b) plans are not subject to ERISA mandates, which have been key to ensuring fiduciary treatment of plan design and operation.  Indeed, many 403(b) plans suffer from higher costs, lack of transparency and conflicts of interest where, for example, administrators who receive kickbacks from asset managers promote high-priced, complex investment products (such as annuities) to sponsors and participants.  No wonder New York’s Department of Financial Services recently launched an investigation into the 403(b) practices of brokers and investment providers such as insurers, the well-known purveyors of annuity products.

It is these segments that retiretech firms are beginning to target, bringing new value propositions and models that provide not just ease of use and fiduciary support, but also transparency and the elimination of conflicts of interest.  Backed by private equity, these firms are targeting the RIA channel, leveraging cloud, APIs, machine learning, low-cost passive investments and the latest in onboarding practices to make plans cheap, easy, and riskless enough for small-company employers to sponsor.  Fintech firms like Vestwell offer RIAs a full white-label retirement platform including investment services, trading, administration, and recordkeeping, and take on fiduciary responsibility through so-called 3(38) and 3(16) arrangements that relieve advisors of the financial cost and legal liability associated with providing retirement services. Outfits such as Dream Forward and ForUsAll provide advisor-focused retirement platform services and solutions by partnering with low-cost asset managers and third-party recordkeepers, thus eliminating the conflicts of interest that typically arise with traditional bundled retirement plan service providers.  BidMoni offers a retirement plan marketplace that allows sponsors and advisors to manage provider bids and RFPs, and provides comparisons and ongoing fee analysis – all with the purpose of reducing the fiduciary burden on sponsors and advisors.  Furthermore, many of these innovators approach retirement planning within the context of a broader financial needs assessment and plan, which helps align retirement products and services with the participant’s true financial picture.

As the wealth management and retirement industries take on the fiduciary challenge, industry structures favoring arrangements such as commissions and revenue sharing are slowly giving way to ones where investors are in charge of devising their financial future in full freedom and transparency, and fintech, starting in small corners of a vast industry, is leading the way toward a broader transformation.


It is still early days, but financial services are slowly but surely becoming more accessible, more equitable, and more end-consumer focused, and credit goes in large part to fintechs that are demonstrating what is possible and motivating the staid incumbents to change.  Driving transparency, reducing bias, and mitigating conflicts of interest are laudable goals; however, achieving perfection is neither practical nor desirable.  Industry structures and business models do need to evolve from what and where they are today, and fintechs are demonstrating what is feasible while still preserving the economics of the industry.

The Five D’s of Fintech: Decentralization

The recent rout in the cryptocurrency markets has not been kind to crypto enthusiasts, but it has hardly put a damper on the underlying technology, the Blockchain.  Blockchain is moving out of the labs and into real-world applications for capital markets and securities firms. Over the past four years, global banks have successfully piloted Blockchain, and are now planning to scale those pilots into production. Government jurisdictions and regulators have been critical of cryptocurrencies, but are embracing Blockchain, a type of distributed ledger technology, as a way to streamline the financial markets. Infrastructure service providers such as exchanges and data vendors have teamed up with startups and banks to develop consortia-based solutions. Various other parts of financial services are beginning to look at Blockchain.  Decentralization is the topic of this post, the third installment in the fintech series (see The Five D’s of Fintech: Introduction), in which we will look at where and how Blockchain is transforming financial services, in particular the retirement industry.

Modern developed financial markets are an intricate patchwork of intermediaries connecting transacting parties: middlemen that connect entities, assist in drawing up contracts, and provide monitoring and verification of transactions (centralized clearing and settlement agencies are examples of such intermediaries).  While this arrangement has worked well so far, it suffers from various issues: an example is information disclosure that the intermediary requires of the transacting parties, which opens up potential for conflicts of interest and privacy risk.   Distributed ledger technology such as Blockchain, which reduces the cost of networking and cost of transaction verification, has the potential to fundamentally alter this intermediary based structure of modern financial markets.

Why Blockchain in Financial Services

At the heart of financial markets operations is how data is used to exchange information, store records of ownership and obligation, and manage the transfer of assets.  Complexity of products and markets, a legacy of old IT infrastructures, and organizational factors make managing such data a hugely complex undertaking.  This is evident in the number of reconciliations that typically happen in a given transaction across the value chain, the time it takes to settle transactions, and the preponderance of trade and settlement breaks.  Financial institutions spend a large amount of resources on managing such data, and yet take on significant operational and regulatory risk.  Blockchain can help alleviate these operational issues, provide transparency and enhance security.

Blockchain is a type of distributed ledger technology: a shared, immutable database that is the system of record for all transactions, current and historic.  Based on advanced cryptography, Blockchain provides a set of foundational capabilities.
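As a rough illustration of what “shared, immutable database” means in practice, here is a toy Python sketch (not any production Blockchain implementation) showing how hash-linking blocks makes a ledger tamper-evident: each block commits to its predecessor’s hash, so editing history invalidates every later block.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Deterministically hash a block's contents (illustrative, not a real wire format)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

class Ledger:
    """A toy append-only ledger: each block commits to its predecessor's hash."""
    def __init__(self):
        self.chain = [{"index": 0, "prev_hash": "0" * 64, "txs": [], "ts": 0}]

    def append(self, txs):
        prev = self.chain[-1]
        self.chain.append({
            "index": prev["index"] + 1,
            "prev_hash": block_hash(prev),  # the link that makes history tamper-evident
            "txs": txs,
            "ts": time.time(),
        })

    def verify(self) -> bool:
        """Recompute every link; any retroactive edit breaks all later links."""
        return all(
            self.chain[i]["prev_hash"] == block_hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = Ledger()
ledger.append([{"from": "alice", "to": "bob", "amount": 10}])
ledger.append([{"from": "bob", "to": "carol", "amount": 5}])
assert ledger.verify()
ledger.chain[1]["txs"][0]["amount"] = 1000  # tamper with recorded history...
assert not ledger.verify()                  # ...and the hash links no longer check out
```

A real distributed ledger adds consensus among many nodes on top of this structure; the sketch only shows the immutability mechanism itself.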


Blockchain has big applicability in financial services precisely because it can address the industry challenges mentioned above all at once. Blockchain provides data efficiency: the immutable ledger serves as a golden source of historical information, its decentralized nature ensures availability of data, and shared access enables accurate data provenance and enrichment.  The decentralized peer-to-peer model allows trustless parties to engage in secure, streamlined transactions, obviating the need for centralized processing.  Blockchain’s smart contract functionality enables a network of value for managing the transfer of assets, e.g., currency and securities.
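To illustrate the smart contract idea in the simplest possible terms, here is a hedged Python sketch of a self-enforcing asset registry: the transfer rules are encoded in the contract itself rather than checked after the fact by a central operator. The TokenContract class and its names are purely hypothetical, not any real smart contract API.

```python
class TokenContract:
    """Toy smart-contract-style asset registry: transfers execute only if the
    contract's own rules hold, mimicking how on-chain code enforces conditions."""

    def __init__(self, initial_balances):
        self.balances = dict(initial_balances)

    def transfer(self, sender, recipient, amount):
        # The "contract" enforces its rules itself; no central operator intervenes.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

contract = TokenContract({"alice": 100})
contract.transfer("alice", "bob", 40)
# balances are now {"alice": 60, "bob": 40}
```

On an actual smart contract platform such as Ethereum, this logic would run on every validating node, with the ledger recording the resulting state transitions.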

Retirement: An Industry in Transformation 

At $28.2 trillion, the US retirement industry, comprising advisors, asset managers and retirement providers, is a large part of the financial services ecosystem. Closely associated with asset and wealth management, retirement is becoming increasingly important due to broader trends in financial markets.  In the aftermath of the financial crisis, asset management has taken on new market significance, and given the evolving dynamics, it will take on a leadership role in setting the technology agenda and driving change.  The retirement industry is encountering additional pressures.  There is an excess supply of retirement providers in the market, leading to consolidation.  Participants are expecting full financial well-being solutions.  There is emerging competition from “wealthtech” firms now squarely focusing on retirement solutions.  The retirement industry is facing the triple challenge of improving cost efficiency, providing transparency, and improving services to drive engagement.

Efforts undertaken by asset managers and retirement providers to drive cost efficiency and improve transparency may not be enough, given consumer expectations, regulatory evolution and strengthening competition.  Barriers to realizing fundamental cost efficiencies and improving services stem from the lack of a trusted, shared data environment to facilitate the creation and management of the retirement plan – essentially a contract guiding the investment and distribution of retirement assets. A big chunk of the industry’s cost structure results from having to make do without such an environment.  The lack of a true marketplace where sponsors and participants can choose advisors and provider services limits innovation, choice, and scale.  Further, regulators lack true on-demand, granular transparency and the data access needed to understand systemic risk, which can limit the efficacy of regulatory enforcement and policy making.  Beyond driving local optimizations, the retirement industry has an opportunity to fundamentally remake itself.

How Blockchain Can Revolutionize Retirement

The retirement industry needs to acknowledge the Blockchain movement not just because the operational backbone of financial services will one day move to Blockchain, but because Blockchain can help the industry tackle its issues here and now.  It can help drive cost efficiency by streamlining operational processes and reducing manual work effort; it can improve transparency by enabling all stakeholders to securely view and report on data; it can drive innovation and engagement by enabling new products, improving processes, and empowering users.

There are value propositions for all players in the retirement marketplace.  For participants, it can provide a comprehensive view of assets, analytics and advisory; for plan sponsors, it can reduce overhead and provide transparency into fees and performance; for providers, it can bring operational efficiency and enable value-added solutions; and for regulators, it can enable easier enforcement of regulations and access to data for future policy making.  Blockchain can enable all this because it facilitates the creation of an open data environment for industry players on which additional services and innovation can be enabled.

Forward-thinking retirement industry players have already started experimenting with Blockchain.  Asset managers such as Vanguard, BlackRock and Fidelity have Blockchain-based research projects, pilots and product roadmaps for data sourcing, custody operations and crypto investment products.  Insurance outfits such as Prudential and John Hancock are partnering with Blockchain startups to run POCs for trading and recordkeeping platforms.  Even payroll providers such as ADP and Kronos are actively evaluating Blockchain for their payroll solutions and services.

A Strategic Imperative

A typical retirement provider has operational pain points and challenges in each part of the functional value chain.   As a general-purpose technology, Blockchain has application areas across all retirement provider functions. User identity and fraud management is a key area: Blockchain-based solutions for digital identities, KYC/AML, and fraud detection are a proven use case.  Further, empowering users with control of their data will become increasingly critical due to regulation and customer expectations, which Blockchain can help address.


Since Blockchain is a rapidly evolving technology, providers should take a long-term view of adoption, going through stages of adoption to learn the technology and prove business value.  Organizations should start by testing basic Blockchain-based data sharing, gradually moving transactions and whole processes to the new platform. There are various use cases the industry can focus on in the immediate term on a proofing/piloting basis. Such use cases can help providers evaluate “ways to play” and develop institutional expertise with the technology.


Blockchain is Here To Stay

Blockchain is taking the financial services world by storm. Financial services firms are spending about $1.7 billion annually on Blockchain, a figure that grew 67% in 2017, and one in ten financial institutions now reports Blockchain budgets in excess of $10 million. Now well beyond initial experimentation, Blockchain-related interest within financial services has reached critical mass. In one recent survey, 14% of banks and other companies claim to have successfully deployed a production Blockchain solution. Headcount dedicated to Blockchain initiatives doubled in 2017, and a typical top-tier bank now has 18 employees dedicated to Blockchain development. Financial institutions are coming together in Blockchain-based industry utilities and mutualizing commercial interests by working with Blockchain startups.

While initial applications have focused on cross-border payments and trade finance, it is capital markets where Blockchain has the most disruptive potential, promising to revamp the industry’s operational backbone. Not surprisingly, capital markets infrastructure providers are at the forefront of Blockchain innovation, closely followed by commercial and investment banks, with asset management a distant third. As infrastructure providers, banks, and broker-dealers prove out use cases, uptake is expected to increase in asset management.

All financial services organizations should pay close attention to Blockchain because of its transformative potential. Since Blockchain reduces the need for financial intermediation, traditional intermediaries such as banks, counterparties and distributors will need to develop new value propositions. Blockchain is the enabling technology behind cryptocurrencies and the crypto finance markets now emerging to challenge traditional ones. Furthermore, Blockchain enables the creation of new digital assets and services, which will unleash a wave of financial innovation and a corresponding support ecosystem.

Time for Action

With a game-changing technology such as Blockchain, the temptation for most industry players will be to adopt a wait-and-watch attitude. However, such a stance may come at a price: innovative incumbents or new fintech players may prove hard to beat, or a Blockchain infrastructure solution may simply become unavoidable due to broad industry adoption. More fundamental, however, is the prospect of other industry players setting the agenda for the future Blockchain architecture, which may prove strategically disadvantageous for laggards. The retirement industry should get ahead of the game with a proactive stance toward innovating with Blockchain.

Providers need to develop a sense of urgency and action, creating a case for change and highlighting the opportunity costs of doing nothing. Providers should take the following actionable steps:

  • Define How To Innovate: Identify the scale and scope of adoption, understand capabilities required, and define the roadmap
  • Build The Foundation: Undertake “no regret” preparation e.g., technology modernization, data quality and governance, API/cloud architecture
  • Test The Waters: Test Blockchain applications and evaluate “ways to play” in the future through partnerships and M&A opportunities
  • Be A Part Of The Movement: Participate in industry forums, consortiums, and innovation efforts on Blockchain and develop institutional expertise

The industry is expected to go through the growing pains of Blockchain adoption: new economic opportunities will emerge, regulatory overhaul and industry participation will need to happen, and rules of play and governance will need to be defined. For retirement, as for many in the financial services world, it is not a question of “if” but “when” for Blockchain, and providers should start preparing for the eventual transition today.


Understanding The Building Blocks of a Distributed Ledger System

Introduction to DLTs

Distributed ledger technology (DLT) is being hailed as a transformative technology, with comparisons drawn to the Internet in its potential to transform and disrupt industries.  As a “platform” technology for decentralized, trust-based peer-to-peer computing, DLT helps shape new “domain” capabilities, just as computer networking enabled the Internet and the creation of capabilities across communication, collaboration and commerce. Like the Internet, it will have far-reaching consequences for the enterprise architectures of the future.  Not only will DLT transform the technology stack of established domains (witness how Blockchain is transforming identity management infrastructure in the enterprise), but it will also give rise to new architecture paradigms as computing moves to decentralized trust-based networks – for example, in how an enterprise interacts with its business partners, suppliers and buyers.  The Internet took 30 years to have disruptive effects in the enterprise, and DLT’s full impact is expected to play out over similar time frames.

DLT represents a generic class of technologies (Blockchain is a prominent example), but all DLTs share the concept of the distributed ledger: a shared, immutable database that is the system of record for all transactions, current and historic, maintained by a community of participating nodes that have some sort of incentive (usually a token or a cryptocurrency) to maintain the ledger in good standing.  The emergence of DLTs can be traced back to the original blockchain applications, Bitcoin and Ethereum.  Various other distributed ledger applications have emerged to solve specific industry/domain issues: R3’s Corda in financial services, Ripple for payments, etc.  Innovation in the DLT space is proceeding at a feverish pace.  The well-established DLT-based networks can essentially be segmented along two dimensions: how ledger integrity is guaranteed through validation, and whether the ledger is private or public.

DLT and Enterprise Architecture

As participants in DLT-based networks developed by industry utilities or consortiums, organizations may not have a strong need to master the internal architecture design and trade-offs associated with such a platform.  However, the architecture community in those organizations will still be required to understand how the networks they participate in work, to the extent required to understand the implications for their organizations.  Furthermore, as intra-company applications of DLT become mainstream, enterprise architects will increasingly be called on to provide perspectives on the optimal design of the underlying technology.  As DLT moves from innovation labs into the mainstream enterprise, architects will need to start preparing their organizations to accept DLT-based applications into the organizational landscape.  A good place to start for enterprise architects is to understand just what the DLT technical architecture encompasses: what building blocks comprise a DLT system, and what architectural decisions need to be made.

The Building Blocks of a DLT System

To understand a complex technology such as DLT, it may be helpful to draw parallels to the TCP/IP stack for computer networking, to which Blockchain has been compared in the past (The Truth About Blockchain).  While there may not be a strict one-to-one correspondence between the Internet’s OSI model and the DLT architecture, drawing the parallel helps one understand conceptually how the building blocks fit together.  The OSI model is a generic architecture that represents the several flavors of networking that exist today, ranging from closed, proprietary networks to open, standards-based ones. The DLT building blocks likewise provide a generic architecture that represents the several flavors of DLTs that exist today, and ones yet to be born.

In theory, it should be possible to design each building block independently, with well-defined interfaces, so that the whole DLT system comes together as one, with higher-level building blocks abstracted from the lower-level ones. In reality, architectural choices in one building block influence those in others; e.g., the choice of a DLT’s data structure influences the consensus protocol most suitable for the system.  As common industry standards for DLT architecture and design develop (Hyperledger is an early effort spearheaded by The Linux Foundation) and new technology is proved out in the marketplace, a more standardized DLT architecture stack will perhaps emerge, again following how computer networking standards emerged.  There is value, nevertheless, in being able to conceptually view a DLT system as an assembly of these building blocks to understand the key architecture decisions that need to be made.

Key Architectural Tradeoffs in DLT Systems

Architecting a DLT system involves making a series of decisions and tradeoffs across key dimensions.  These decisions optimize the DLT for the specific business requirement: for some DLT applications, performance and scalability may be key, while for some others, ensuring fundamental DLT properties (e.g., immutability and transparency) may be paramount.   Inherent in these decisions are architectural tradeoffs, since the dimensions represent ideal states seldom realized in practice.  These tradeoffs essentially involve traversing the triple constraint of Decentralization, Scalability, and Security.

Decentralization reflects the fundamental egalitarian philosophy of the original Bitcoin/Blockchain vision, i.e., that the distributed ledger should be accessible, available and transparent to all at all times, and that all participating nodes in the network should validate the ledger and thus hold the full ledger data.  Decentralization enables trustless parties to participate in the network without central authorization.  Scalability refers to the goal of achieving an appropriate level of transaction throughput, sufficient storage capacity for the DLT to record transaction data, and acceptable latency between a transaction’s submission and its validation and recording.  Scalability ensures that appropriate performance levels are maintained as the network grows.  Finally, Security is the ability to maintain the integrity of the ledger by warding off attacks and making it impossible to maliciously change the ledger for one’s benefit. Fundamentally, this dimension reflects a security design built into the fabric of how the ledger operates, rather than relying on external ‘checking’ to ensure safety.

Bringing It Together: DLT Building Block Decisions and Architectural Tradeoffs

Applying the architectural decisions to a DLT system yields different flavors of DLT, each making tradeoffs to navigate the triple constraint described above.  Traversing the sides of the triangle allows one to move between different DLT architecture styles, with the vertices denoting pure architectural states seldom realized in practice.  For example, systems like Bitcoin and Ethereum tend toward Vertex A, maximizing Decentralization through their decentralized P2P trustless model and Security through consensus building and validation methods that prevent malicious attacks (although both have been shown to have other security vulnerabilities), but they sacrifice much in terms of Scalability (Bitcoin’s scalability woes are well known, and Ethereum is only slightly better).  Permissioned DLTs such as Corda, on the other hand, tend toward Vertex C, maximizing Scalability and guaranteeing Security, but sacrifice Decentralization (by definition, permissioned DLTs are not transparent, since they restrict access and validation is performed only by a set of pre-authorized validating nodes), and they may suffer other security issues (both the trusted nodes and the central authority in a permissioned DLT system can be attacked by a nefarious party).  DLT variations such as the Bitcoin Lightning Network and Ethereum’s Raiden tend toward Vertex B, using off-chain capabilities to improve the Scalability of traditional Blockchain and Ethereum networks while preserving Decentralization (despite recent concerns that these networks tend to become centralized in the long run), although their off-chain capabilities may require additional Security capabilities (they also partially move away from Blockchain’s decentralized security apparatus).   Let’s examine how these tradeoffs come into play at the level of the DLT building blocks.

Layer 3: Ledger Data Structure

Ledger Data Structure encapsulates decisions around how the distributed ledger is actually structured and linked at a physical level, e.g., a chain of blocks, a graph, etc.  Additionally, it captures decisions around how many ledger chains there are, and whether nodes carry the entire ledger or just a part of it.  In traditional Blockchain, the ledger is structured as a global sequential linked list of blocks, instances of which are replicated across all participating nodes.  This design goes hand in hand with traditional Blockchain’s Proof of Work consensus protocol in ensuring high levels of Decentralization and Security, since each node holds the current instance of the global ledger chain and there is decentralized consensus building for block validation (although a few security vulnerabilities with Blockchain have come to the forefront, and Proof of Work is susceptible to centralization due to economies of scale in mining).  As we know, this design takes a toll on Scalability: Bitcoin can process only a handful of transactions per second, and the time required for processing a block is high (Bitcoin generates a new block roughly every 10 minutes).
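To make the hash-linked structure concrete, here is a minimal Python sketch (the names and simplifications are mine, not any particular implementation’s) of a ledger in which each block commits to its predecessor’s hash:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 digest of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_hash):
    return {"transactions": transactions, "prev_hash": prev_hash}

# Build a three-block chain: each block commits to its predecessor's hash.
genesis = make_block(["coinbase"], prev_hash="0" * 64)
chain = [genesis]
for txs in (["a->b:5"], ["b->c:2"]):
    chain.append(make_block(txs, prev_hash=block_hash(chain[-1])))

def verify(chain) -> bool:
    """Tampering with any block breaks every later prev_hash link."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Because each `prev_hash` covers the full contents of the prior block, altering any historical transaction invalidates every subsequent link, which is what makes the replicated ledger tamper-evident.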

Some new designs are emerging with alternate data structures that improve Scalability and performance, such as NXT’s and SPECTRE’s DAG (directed acyclic graph) of blocks, which mine DAG blocks in parallel to allow for more throughput and lower transaction time, and IOTA’s Tangle, a so-called “blockless” DLT that gets rid of block mining altogether and relies on a DAG of transactions to maintain system state and integrity.  These new designs have yet to be implemented and used at scale, and many of them have their own set of challenges (some claim they will continue to rely on some form of centralization to gain scale, and also have security-related challenges).  However, the DLT community’s interest has been high: IOTA’s Tangle has been creating a buzz in DLT circles as a possible serious contender in the IoT world (since its data structure and protocol are well suited to handling continual streams of data at volume), and several blockless DLT startups have launched lately.

Tinkering with how the ledger data is stored across nodes represents another opportunity for gains in Scalability.  For example, sharding, a concept fairly well established in the distributed database world, is coming to DLTs.  Applied to DLTs, sharding enables the overall Blockchain state to be split into shards which are then stored and processed by different nodes in the network in parallel, allowing higher transaction throughput (Ethereum’s roadmap pairs sharding with Casper to drive scalability and speed).  Similarly, Scalability can be improved by having multiple chains, possibly private, to enable separation of concerns: “side chains” enable processing to happen on a separate chain without overloading the original main chain.  While such designs improve Scalability, they move away from DLT’s vision of enabling democratic access and availability to all participants at all times, and also present Security-related challenges, part of the reason why widespread adoption of sidechains has been slow.
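As a rough illustration of the idea (the shard count and account names here are hypothetical), sharding can be thought of as deterministically partitioning state so that different groups of nodes store and validate disjoint slices in parallel:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(account: str) -> int:
    """Deterministically map an account to a shard by hashing its id."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Each shard's nodes store and validate only their slice of global state,
# so the shards can process disjoint transactions in parallel.
shards = {i: [] for i in range(NUM_SHARDS)}
for acct in ("alice", "bob", "carol", "dave", "erin"):
    shards[shard_for(acct)].append(acct)
```

The hard parts in real systems, and the reason sharded DLTs are still maturing, are cross-shard transactions and preventing an attacker from concentrating power in a single shard; this sketch shows only the partitioning step.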

Layer 2: Consensus Protocol

Consensus protocol determines how transactions are validated and added to the ledger, and decision-making in this building block involves choosing a specific protocol based on the underlying data structure and objectives related to the triple constraint. Proof of Work, the traditional Blockchain consensus protocol, requires transactions to be validated by all participating nodes; it enables a high degree of Decentralization and Security, but suffers on Scalability.  Alternative protocols, such as Proof of Stake, provide somewhat better Scalability by changing the incentive mechanism to align more closely with the good operation of the ledger.  Protocols such as those based on Byzantine Fault Tolerance (BFT), which have been successfully applied to other distributed systems, are applicable to private ledgers and depend upon a collection of pre-trusted nodes.  Such protocols sacrifice Decentralization to gain in Scalability.
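The asymmetry at the heart of Proof of Work, expensive to produce but trivial to verify, can be sketched in a few lines of Python (the difficulty and encoding here are illustrative, not Bitcoin’s actual parameters):

```python
import hashlib

def mine(block_data: str, difficulty: int = 4):
    """Search for a nonce whose hash has `difficulty` leading zero hex digits.
    Finding the nonce takes many hash attempts; verifying it takes one."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block-42", difficulty=4)
```

Raising `difficulty` by one multiplies the expected work by sixteen, which is how the network throttles block production, and also why PoW pays for its Security and Decentralization with Scalability.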

Ethereum’s Raiden and Bitcoin’s Lightning Network are innovations that bring scalability to Ethereum and Bitcoin respectively by securely moving transactions off the main chain to a separate transacting channel, and then moving back to the main chain for settlement purposes – the so-called “Layer 2” innovations.  This design moves load off of the main ledger; however, since transactions occurring on the channel are not recorded on the ledger, it sacrifices Security (the transacting channels need an additional security apparatus that is not part of the original chain) as well as Decentralization (since channel transactions are not accessible to all participants).
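A toy sketch of the payment-channel idea (greatly simplified: no signatures, dispute periods, or timeouts) shows how many off-chain transfers collapse into a single on-chain settlement:

```python
# A two-party payment channel: open with on-chain deposits, exchange
# many off-chain balance updates, settle only the final state on-chain.
class PaymentChannel:
    def __init__(self, deposit_a: int, deposit_b: int):
        self.balances = {"a": deposit_a, "b": deposit_b}
        self.updates = 0  # off-chain transfers; these never touch the main chain

    def pay(self, frm: str, to: str, amount: int):
        assert self.balances[frm] >= amount, "insufficient channel balance"
        self.balances[frm] -= amount
        self.balances[to] += amount
        self.updates += 1

    def settle(self) -> dict:
        """Only this final state is written back to the main ledger."""
        return dict(self.balances)

ch = PaymentChannel(deposit_a=100, deposit_b=50)
ch.pay("a", "b", 30)
ch.pay("b", "a", 10)
final = ch.settle()
```

However many intermediate payments occur, the main chain records only the opening deposits and the closing balances, which is where the Scalability gain comes from.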

A number of other protocols and schemes to improve scalability and security are in the works, many of which are variations on the basic PoW and PoS, and which envision a future comprising not one single ledger chain but a collection of chains.  Examples include Kadena, which uses PoW on a braid of chains; EOS, which uses delegated PoS; and Cosmos’s Tendermint, which uses BFT-based PoS across a universe of chains.

Layer 1:  Computation and App Data

DLT resources such as storage and computation come at a premium, and it costs real money to submit transactions in a DLT system.  In the topmost layer, therefore, the architectural decisions deal with providing flexibility and functionality related to data storage and computation: essentially, how much of each should reside on-chain, and how much off-chain.  Additionally, this layer deals with decisions around how to integrate the DLT with events from the real world.

For computation, the Bitcoin Blockchain and Ethereum provide constructs for putting data and business logic on-chain, and Ethereum is far more advanced than Bitcoin here since it offers “smart contracts”, essentially code that is executed on the chain when certain conditions are met.  There are obvious advantages to doing all computation on-chain: interoperability between parties and immutability of code, which facilitates trust building.  There is, however, a practical limit to how complex smart contracts can be, a limit that is easily reached.  Offloading complex calculations to off-chain capabilities allows one to leverage DLT capabilities in a cost-effective and high-performing manner.  TrueBit, an online marketplace for computation, enables a pattern in which complex, resource-intensive computation can be offloaded to a community of miners who compete to complete the computation for a reward and provide results that can be verified on-chain for authenticity.  While this provides upside in terms of Scalability and Decentralization, there are Security-related implications of using off-chain computation, an area of active research and development.
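The general pattern, greatly simplified here (TrueBit’s actual protocol involves an interactive verification game between solvers and challengers, not a bare hash check), is to do the heavy computation off-chain and keep only a cheap check on-chain:

```python
import hashlib

# Off-chain: an untrusted worker performs the expensive computation and
# publishes a commitment to the result.
def offchain_compute(data):
    result = sorted(data)          # stand-in for a resource-intensive job
    commitment = hashlib.sha256(repr(result).encode()).hexdigest()
    return result, commitment

# On-chain: the contract stores only the commitment and cheaply checks any
# claimed result against it instead of redoing the work itself.
def onchain_verify(claimed_result, commitment) -> bool:
    return hashlib.sha256(repr(claimed_result).encode()).hexdigest() == commitment

result, commitment = offchain_compute([9, 3, 7, 1])
```

The on-chain side pays one hash per verification regardless of how expensive the off-chain job was; the open problem, which protocols like TrueBit address with economic incentives, is establishing that the committed result is actually correct.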

What applies to computation also applies to data storage in the DLT world.  While Blockchain and Ethereum provide basic capabilities for storing data elements, a more suitable design for managing large data sets in DLT transactions is through off-chain data infrastructure providers or cloud storage providers, while maintaining hashed pointers to these data sets on-chain.  Solutions like Storj, Sia, and IPFS aim to provide a P2P decentralized secure data management infrastructure that can hook into DLTs through tokens and smart contracts, and manage data and computation securely through technologies such as secure MPC (multi-party computation).  Similar to off-chain computation, off-chain storage has upside in terms of Scalability and Decentralization; however, there are security- and durability-related implications.
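A minimal sketch of the hashed-pointer pattern (the in-memory dictionaries below stand in for an off-chain store in the style of Storj/Sia/IPFS and for the on-chain record):

```python
import hashlib

offchain_store = {}  # stand-in for an off-chain storage network
ledger = []          # stand-in for the on-chain record

def put(data: bytes) -> str:
    """Store the payload off-chain; record only its hash pointer on-chain."""
    pointer = hashlib.sha256(data).hexdigest()
    offchain_store[pointer] = data
    ledger.append(pointer)
    return pointer

def get(pointer: str) -> bytes:
    """Fetch off-chain data and prove its integrity against the on-chain hash."""
    data = offchain_store[pointer]
    if hashlib.sha256(data).hexdigest() != pointer:
        raise ValueError("off-chain data does not match on-chain pointer")
    return data

ptr = put(b"large dataset blob")
```

The chain never holds the bulk data, yet any reader can detect tampering; what the pointer cannot guarantee is durability, i.e., that the off-chain copy still exists when it is needed, which is exactly the durability concern noted above.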

What provides immutability to the distributed ledger (its deterministic method of recording transactions) is also its Achilles’ heel: it is difficult for the ledger to communicate with and interpret data from the outside, non-deterministic world.  Oracles, services that act as middlemen between the distributed ledger and the non-DLT world, bridge that gap and make it possible for smart contracts to be put to real-world use.  Various DLT oracle infrastructures are in development (ChainLink, Zap, Oraclize, etc.), each providing varying features; choosing the right oracle architecture is thus crucial for the specific use case under consideration.  Similar to off-chain data, oracles provide upside in terms of Scalability and Decentralization; however, there are security- and data verifiability-related concerns.
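A toy flight-delay insurance example (entirely hypothetical; real oracle networks such as ChainLink add signing, staking, and aggregation across reporters) illustrates the role an oracle plays between the deterministic contract and the outside world:

```python
# The oracle observes the real world off-chain and posts values the
# contract can read; the contract itself never fetches external data.
class Oracle:
    def __init__(self):
        self.feed = {}

    def report(self, key: str, value):
        """Off-chain service pushes an observation onto the chain."""
        self.feed[key] = value

class FlightInsuranceContract:
    """Pays out automatically when the oracle reports a flight delay."""
    def __init__(self, oracle: Oracle, payout: int):
        self.oracle = oracle
        self.payout = payout

    def claim(self, flight: str) -> int:
        delayed = self.oracle.feed.get(flight) == "delayed"
        return self.payout if delayed else 0

oracle = Oracle()
contract = FlightInsuranceContract(oracle, payout=500)
oracle.report("BA123", "delayed")
```

The contract’s logic stays deterministic; all the trust questions (who runs the oracle, can it lie, can it be bribed) move into the `report` step, which is precisely the verifiability concern raised above.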



These are still early days for DLT technology, and many of the improvements needed to make DLT commercially implementable are yet to come.  Beyond scalability and security, DLTs face a number of hurdles to enterprise adoption, such as interoperability, complexity, and a lack of developer-friendly toolkits.  The future will probably bring not just one ledger technology here or there, but a multitude, each optimized for a specific use case within an organization, and even superstructures such as chains of chains connected with oracles, middleware, and the like.  Nor will these structures replace existing technology architecture; they will exist alongside legacy technologies and will need to be integrated with them.  Like networking, DLTs will give rise to new processes, teams, and management structures.  Enterprise architects will play a central role in facilitating the development of DLT as a true enterprise technology.

The Five D’s of Fintech: Disintermediation

Finance, as the adage goes, is the art of passing money from hand to hand until it finally disappears. Like giant spiders in the middle of a system of webs, banks have played a key role in intermediating money flows across the vast capitalistic system, and doing so in a highly profitable manner. Banking of yore – taking deposits from customers and making loans to borrowers – has given way to a lending system dominated by non-bank financial institutions and capital markets. This “banking disintermediation” – banks no longer holding the loans they originated on their balance sheets but selling them off; borrowers going directly to the capital markets rather than to banks to obtain credit; savers investing directly in securities – refers to credit disintermediation and has been in the making for a number of decades: banks moved from the boring, low-profitability business of credit provisioning to high-margin fee-based businesses (investment banking, M&A, insurance, etc.). “Disintermediation” – the third theme in The Five D’s of Fintech – has taken on renewed significance in the wake of the rising tide of fintech: this time around, it refers to how banks may be disintermediated from their customers in their fee-based businesses. Customer relationship, the bedrock of the origination and sales that dominate such fee-based businesses, is now under attack, and banks have started to take note.

Fintech-led disintermediation has been palpable in areas where venture investment has traditionally poured in: peer-to-peer lending, remittances, payments, equity crowdfunding, and even robo-advisory. Fintech use cases on disintermediation of traditional payment players are old news. By comparison, the impact of fintech-led disintermediation of banks in capital markets appears relatively small. By some estimates, only $4 billion of the $96 billion (around 4%) of fintech investment since the beginning of the millennium has been directed to capital markets, and of 8000+ fintechs globally, only 570 (around 7%) operate in capital markets (Fintech in Capital Markets: A Land of Opportunity).

This may be beginning to change, not least because of the juicy returns that can potentially be plucked from the investment banks in origination and sales (the key activity in fee-based businesses such as investment banking, M&A, payments, and asset management), which at an estimated 22% ROE (Cutting through the noise around financial technology) is a much more profitable business than credit provisioning (the key activity in core banking: lending, deposits, etc.), with an estimated ROE of only 6%. Entry barriers to capital markets and investment banking are significant: the highly regulated nature of the business, the oligopolistic nature of the industry, vicious competition from incumbent banks, and a complex operating environment. However, regulatory pressures, compressed margins, and technology-enabled opportunities are forcing incumbent banks to acknowledge fintech-led opportunities.

The investment banking world is being buffeted by the forces of digitization, regulation, and competition. Continued electronification of OTC markets, for example, has meant greater transparency around the price discovery process for investors. Dealers have pulled back from markets where regulatory requirements have rendered them unprofitable for those banks, thus providing an opening wedge for non-bank players such as hedge funds and institutional investors to step in. The financial crisis has tarnished banks’ reputation. It is against this backdrop that fintech outfits are evolving from just providing solutions to automate and streamline bank transactions to being serious threats across asset classes (equity/debt/FX), markets (primary/secondary), and the value chain (origination through distribution and dealing). In the world of supply chain financing, technology-led disintermediation has begun to make inroads. Regulatory pressures, decreasing margins, and KYC/AML challenges have made it difficult for commercial banks to scale their supply chain finance business: the market is underserved and rapidly growing. This is changing with the emergence of such platforms as PrimeRevenue, an American supply chain financing platform, that connects 70 lenders, including 50-odd banks, to 25,000 suppliers with $7bn-worth of invoices a month. While such platforms are not overthrowing banks’ supply chain financing business completely, they are slowly but surely intermediating themselves into direct relationships with end customers: witness how C2FO has offered Citi more lending opportunities in return for access to its supply chain financing customers.

The origination market for new securities issuance is an area where incumbents are still strong but evolving conditions may favor fintech-led innovation. Some large asset managers want to ensure that corporations structure securities that those asset managers want to buy, not necessarily those that a bank may structure: this may be just the kind of solution a fintech develops. Platforms that connect investors directly with issuers are few and far between in the equity and debt primary markets, and even those that exist have dealers in the mix. But slowly and surely there are outfits such as Origin (a private placement platform aiming to simplify and automate the issuance of Medium Term Notes) and Overbond (a Canadian platform provider aiming to streamline bond issuance for both private placements and public offerings) that are going to market with automation and auditing solutions today, but which may in the future choose to intermediate themselves more prominently between investors and issuers by offering data assets and related value-added services.

There are, however, more worrisome developments for banks in the securities origination market. Investors are discovering new avenues for investing, for example equity crowdfunding, which may affect banks’ traditional business in private placements and investment banking. There are indications that institutional investors have already started to use crowdfunding websites to gain stakes in new businesses, such as in the real estate sector. One website, for example, enables crowdfunders to pool their capital and compete with institutional investors or co-invest with venture capital funds. There is a perceived threat that more buy-side firms will tap crowdfunding sites. Most worrisome for banks is perhaps the specter of alternate fund-raising models whereby the conventional IPO gives way to alternate mechanisms such as electronic auctions, crowdfunding, and initial coin offerings (ICOs).

New capital-raising models represent the latest in how disintermediation is playing out at the very heart of the capitalistic system. Equity crowdfunding has the potential to disintermediate early-stage financing and democratize access to capital and investment opportunities (in spite of the regulatory and reporting burdens imposed by the JOBS Act): witness the rise of platforms such as Kickstarter and Indiegogo. Equity crowdfunding has been thriving for a few years now, but it is Initial Coin Offerings (ICOs) that have attracted investor attention of late.

In contrast to traditional venture funding or IPOs, where investors typically get a slice of the equity and control in the business, ICOs involve raising money by selling a cryptocurrency – called a “token” – at a discount to investors, who can then trade the tokens on a crypto exchange. ICOs are different from traditional fundraising and even equity crowdfunding: they are unregulated, and startups do not have to fork out a slice of their equity. Indeed, ICOs have been on a tear: a quarter of blockchain investment over the past couple of years (approx. $250 million) has come from ICOs, two recent ICOs (Qtum and OMG) passed the unicorn mark in a matter of mere months, and storied VCs are investing in ICO-funded startups (Cryptocurrency Hedge Fund Polychain Raises $10 million). ICOs offer yet another fund-raising avenue for startups that are skittish about handing over control to outsiders, those that do not want to go through the regulatory hassle of taking their company public, or those that have been neglected by the market. For investors, ICOs provide the lure of massive returns (although they are extremely high risk and many are scams).

Capital markets’ “Uber moment” may not be around the corner, but the capitalistic middlemen are increasingly going to have to define what value they bring in light of the disintermediated banking models being championed by fintechs. It is not just in capital raising that fintech startups are mushrooming, but in other areas of the capital markets as well, for example clearing and settlement, where blockchain-based startups are providing capabilities for fully automated settlement without the need for a central counterparty. Symbiont, a New York-based blockchain startup, has demonstrated how “smart securities” can streamline post-trade processing and settlement. This has huge implications for middle/back office functions in banks and the wider capital market ecosystem, including custodians, clearing houses, and depository institutions. This redundancy of centralized processing in the fintech utopia will be the theme of the next installment of the Five D’s of Fintech series.

The Five D’s of Fintech: Disaggregation


“Death by a thousand cuts”, “sandblasting”, “mastodons attacked by ants”, and similar metaphors have been used to describe the scourge of fintech and insurtech insurgents and their impact on incumbent banks and insurance companies. “Disaggregation”, or “unbundling”, of products and services lies behind fintechs’ poaching of customers from banks, not single-handedly but collectively, one product or service at a time. Stories and studies abound on how technology is unbundling not just products and services but entire value chains. This disaggregation is now well established in traditional banking (see CB Insights’ blog post as an example), but only now emerging in insurance. Disaggregation is the topic of this post, the second installment in the fintech series (see The Five D’s of Fintech: Introduction), in which we will look at where and how such disaggregation is taking place in the insurance industry.

The insurance industry is experiencing a technological double whammy of sorts: not only does technology enable the creation and usage of new behavioral context that fuels new competition, but it also threatens the underlying business economics of the industry. Insurance experts talk about “behavior disaggregation” to describe how consumer behaviors can be tracked and analyzed to engage with consumers directly in real time, price risk accurately and discretely, and provide frictionless services. For example, it is not hard to imagine a “connected home” where various safety measures one might take, e.g., installing a burglar alarm, are instantly used to recalibrate risk and thus adjust the homeowners or renters insurance. Tech-savvy startups are already leveraging such behavior disaggregation: pay-as-you-go car insurance from Metromile is an example where driver behavior can be tracked to provide fit-for-purpose policies. Oscar, a health insurer in New York, gives all its policyholders a fitness tracker; whenever they hit a set goal (walking 10,000 steps in a day, say) they get a refund of a dollar. Where incumbent insurers put their consumers in a policy straitjacket based on blunt risk indicators such as age and occupation, insurtech players leverage technology to study behaviors down to the “last mile” to offer highly flexible solutions, customized not just by the asset type insured but also by duration, for example through “micro duration” policies. These companies have the focus and the agility to build this specialization: as Metromile has done for car insurance, Trov has done for general merchandise, Ladder for life, and Hippo for home. Increasingly, traditional insurers may find that insurtech’s win-win value propositions are hard to beat with traditional product bundles – hence the case for unbundling. But unbundling existing products is not enough.
Unfortunately for traditional insurers, insurtech’s micro targeting of risk reduces incumbents’ existing risk and profit pools: they are thus forced to not just compete with the upstarts, but also seek out new revenue sources.

Insurtech is forcing disaggregation at an industry-level scale. Incumbents have traditionally managed the entire value chain, spanning product/service distribution, underwriting, claims management, and investment/risk management. Using independent agency networks, conventional underwriting models, and re-insurance for risk management, carriers have thrived in the marketplace. This value chain is now coming apart as each link in the chain is impacted by technology. New digital sources of distribution are threatening to disintermediate carriers from end consumers: witness the rise of CoverHound for auto, PolicyGenius for life and disability, Abaris for annuities, and various others. Traditional risk pools are shrinking, risk is migrating from consumers to products, and the nature of risk is evolving thanks to self-driving cars, IoT technologies, and the sharing economy; all this has led to the emergence of new competitors offering alternate underwriting models, such as Lemonade, Guevara, and Friendsurance. Upstarts such as Claimable seek to intermediate to provide an easy claims settlement experience to end consumers. New arrangements such as Managing General Agents, catastrophe bonds, and collateralized reinsurance are disaggregating the carrier/re-insurer relationship: now carriers can go directly to capital markets, and re-insurers can strike up business arrangements with startups focused on customer acquisition.  The neatly linked insurance value chain is slowly moving to a horizontal, stack-based structure (see BCG’s Philip Evans’ idea of stacks here).

Insurance is different from traditional financial services in that players in insurance, unless they are pure brokers, have to take on an element of risk and hold associated capital, all of which comes with a heavy load of regulatory requirements. For these reasons, insurtech has been slow to penetrate the $6 trillion goliath that is the insurance industry. The pace, however, may accelerate. According to McKinsey, automation could leave up to 25 percent of the insurance industry’s current full-time positions consolidated or replaced over the next decade (see Automating the Insurance Industry). If nothing else, carriers should do everything in their power to prevent disintermediation, which will be the topic of the next installment in the fintech series.

The DevOps Movement

The DevOps movement has been resurgent in the past few years as companies look to improve their delivery capabilities to meet rapidly shifting market needs and business priorities.  Many have been preaching that companies should become not just robust and agile, but in fact “anti-fragile”, with the ability to expect failures and adapt to them.  The likes of Google, Amazon, and Netflix embody this agile and anti-fragile philosophy, and traditional business houses facing increasingly uncertain and competitive markets want to borrow a chapter from their books; DevOps is high on their list as a means to achieve that.


DevOps is a loose constellation of philosophies, approaches, work practices, technologies, and tactics to enable anti-fragility in the development and delivery of software and business systems.  In the DevOps world, traditional software development and delivery, with its craft and cottage-industry approaches, is turned on its head.  Software development is fraught with inherent risks and challenges, which DevOps confronts and embraces.  The concept seems exciting, a lot of companies are talking about it, and some claim to do it, but nobody really understands how to do it!

Much of the available literature on DevOps talks about everything being continuous in the DevOps world: Continuous Integration, Continuous Delivery, and Continuous Feedback.  Not only does this literature fail to address how the concept translates into reality, but it also takes an overly simplistic view of the change involved: use Chef to automate your deployment, or use the Jenkins continuous integration server to do “continuous integration”. To be fair, the concept of DevOps is still evolving.  However, much can be done to educate the common folk on the conceptual underpinnings of DevOps before jumping to the more mundane and mechanistic aspects.

DevOps is much more a methodology, process, and cultural change than anything else. The concept borrows heavily from existing manufacturing methodologies and practices such as Lean and Kanban, and extends existing thinking around lean software development to the enterprise.  Whereas the traditional software development approach is based on a “push” model, DevOps focuses on building a continuous delivery pipeline in which things are “pulled” actively by different teams as required to keep the pipeline going at all times.   It takes agile development and delivery methodologies such as Scrum and XP and extends them into operations so as to enable not just agile development, but agile delivery as well.  And it attempts to transform the frequently cantankerous relationship between the traditionally separated groups of development and operations into a synergistic, mutually supportive one. Even within the development sphere, DevOps aims to bring various players, including development, testing/QA, and build management, together by encouraging teams to take on responsibilities beyond their immediate role (e.g., development taking on more of testing) and empowering traditionally relegated roles to positions of influence (e.g., the build manager taking developers to task for fixing broken builds).

We are still in the early days of the DevOps movement, and until we witness real-life references and case studies of how DevOps has been implemented end-to-end, learning about DevOps will be a bit of an academic exercise.  Having said that, some literature does come close to articulating what it means to put into practice such concepts as Continuous Delivery and Continuous Integration.  To the curious, I would recommend the Martin Fowler Signature Series of books on the two topics.  Although agonizingly technical, the two books do a good job of getting down to brass tacks. My future posts on DevOps will attempt to synthesize some of the teachings from those books into management summaries.

Big Data Technology Series – Part 5

In the last three installments of the big data technology series, we looked at the historical evolution and key developments in databases, BI/DW platforms, and statistical computing software.  This discussion provides us with a good foundation for understanding some of the key trends, developments, solution landscapes, and architectures taking shape today in response to big data challenges.  This installment in the series focuses on outlining the major trends in the data management and analytics platform space.

As we saw in Part 2, innovations and recent advances have greatly changed database management technology platforms, with the emergence of data stores for unstructured data and distributed large-scale data management architectures.  Part 3 focused on how traditional BI/DW platforms and appliances have emerged as a critical component of a corporation’s enterprise architecture, supporting management decision-making and reporting needs.  Finally, Part 4 discussed how statistical computing platforms have evolved on their own to support the advanced analytics and data mining needs of the business.  Technical developments in these three areas are increasingly intertwined, with those in one area affecting and reinforcing those in another.  The falling cost of hardware, the increasing sophistication of software, and the rise of big data sets are driving new paradigms and thinking on technologies and architectures for how data should be managed and analyzed.  This new thinking is challenging and extending the way things have been done traditionally.

The graphic below describes some of the key trends in the data management and analytics tools landscape.

Big Data Analytics Platform Trends

Enterprise data management architecture is changing and evolving in various ways due to the emergence of big data processing and supporting tools; however, there are a few key takeaways about big data architectures:

1) Open Scale-out Shared Nothing Infrastructure

As demands for data storage and processing grew with the advent of the modern-day Internet, vertical scaling began to be used to manage the higher storage requirements.  In vertical scaling, resources such as processing power or disk are added to a single machine to match the higher processing requirements.  New architectures, such as database clustering, in which data is spread out among a cluster of servers, were adopted.  MPP appliances provided the scalability to process massive data sets across a cluster of high-end proprietary servers.  Hardware improvements over the past few decades, however, brought down the price/performance ratio of x86 servers to the point where companies started using these machines to store and process data for day-to-day operations.  The use of cheap x86 machines for data processing was pioneered by new-age information companies such as Google and Amazon to store and manage their massive data sets. Modern scale-out architectures leverage x86 servers in standard configurations with open source software, using industry-standard networking and communication protocols.  In fact, many modern data analytics platforms are essentially software platforms certified to run on a cluster of commodity servers with a given configuration.

2) Tailored Data Management Architecture

The hugely successful relational model forms the basis of the majority of enterprise data computing environments today.  In spite of the variety of use cases the relational model has served, it has its shortcomings.  Database innovation in recent years has focused on tools and techniques to store unstructured data using non-relational techniques, and a raft of database management tools for such data has emerged in the past decade.  Alternative forms of data storage are increasingly being used, e.g., columnar databases that store data indexed by columns rather than rows.  Similarly, a number of innovative data storage solutions, such as SSD-based storage, have come out.  These innovations have created a plethora of data management system options, each optimized to handle a specific set of use cases and applications.  Enterprise data management architectures are moving from using “one size fits all” relational database systems to using a “tailored” combination of relational/non-relational, row-oriented/column-oriented, disk-based/memory-based, etc. solutions, as guided by data workloads’ characteristics and processing needs.
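A small Python sketch makes the row-versus-column tradeoff tangible: in a columnar layout, an aggregate over one column needs to touch only that column, not every field of every record (the data here is invented for illustration):

```python
# Row store: each record kept together; good for transactional lookups
# that read or write a whole record at a time.
rows = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 175},
]

# Column store: each column kept contiguously; an analytic aggregate
# over one column scans only that column's values.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [100, 250, 175],
}

total_row_scan = sum(r["sales"] for r in rows)   # touches whole records
total_col_scan = sum(columns["sales"])           # touches one column
```

Both layouts yield the same answer; the difference is in what must be read from disk, which is why analytic workloads favor columnar storage while transactional workloads favor row storage.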

3) Logical Enterprise Data Warehouse

Traditional BI and DW platforms have been successful at delivering decision support and reporting capabilities with structured data to answer pre-defined questions.  Advanced analytics solutions have traditionally been delivered using proprietary software and high-end hardware platforms, while relational databases have typically been used to manage transactional data.  This picture is slowly evolving due to falling hardware costs, the rise of big data needs, and the consequent emergence of unstructured data processing solutions and new big data analytic platforms.  Unstructured data stores, such as document stores, are slowly making their way into the enterprise to manage unstructured data needs.  The new analytic platforms provide a powerful suite of tools and libraries based on open source technologies to run advanced analytics, supported by a processing layer and query optimizer that leverage scale-out distributed architectures to process data.  The enterprise data architecture is thus slowly evolving and increasing in complexity as companies leverage myriad data storage and processing options to manage their data needs.  In response to these developments, Gartner coined the concept of the “logical data warehouse”: an architecture in which the concept and scope of the traditional warehouse are expanded to include the new data processing tools and technologies, all abstracted behind a data virtualization layer.
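The virtualization layer at the heart of the logical data warehouse can be sketched as a thin facade that routes each query to whichever underlying store owns the data.  The class, store, and data set names below are hypothetical illustrations, not part of Gartner’s definition.

```python
class LogicalDataWarehouse:
    """Toy data virtualization layer: callers use one query
    interface while data physically lives in different,
    specialized back-end stores."""

    def __init__(self):
        # A relational-style store for structured records ...
        self.relational = {"customers": [{"id": 1, "name": "Acme"}]}
        # ... and a document store for semi-/unstructured data.
        self.documents = {"reviews": [{"text": "great service", "stars": 5}]}

    def query(self, source: str):
        # Route to the store that owns the data set; the caller
        # never needs to know which backend answered.
        for store in (self.relational, self.documents):
            if source in store:
                return store[source]
        raise KeyError(source)

ldw = LogicalDataWarehouse()
```

A caller asking `ldw.query("reviews")` is served by the document store, but through the same interface it would use for relational data — the abstraction the virtualization layer provides.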

The database and analytic platform market continues to evolve, and successful enterprise data architecture patterns for managing big data needs are just emerging.  In the next installment of the big data series, we will look at some of the key capabilities of a big data analytics platform and some major players in the market.


Big Data Technology Series – Part 4

In the last installment of the Big Data Technology Series, we looked at the second thread in the story of big data technology evolution: the origins, evolution and adoption of systems/solutions for managerial analysis and decision-making.  In this installment, we will look at the third and last thread in the story: the origins, evolution and adoption of systems/solutions for statistical processing.  Statistical processing solutions have been evolving independently since the 1950s to support analyses and applications in social sciences and agribusiness.  Recently, however, such solutions are increasingly being applied in commercial settings in tandem with traditional business intelligence and decision-making solutions, especially in the context of large unstructured datasets.  This post is an attempt to understand the key evolutionary points in the history of statistical computing, with our overarching goal of better understanding today’s big data technology trends and landscape.

See the graphic below that summarizes the key points in our discussion.

Statistical Packages History

The use of computers for statistical analysis began in the 1950s when FORTRAN was invented, making it possible for mathematicians and statisticians to leverage the power of computers.  Statisticians appreciated this new-found opportunity to run analyses on computers; however, most of the programs were developed in a labor-intensive, heavily customized, one-off fashion.  In the 1960s, the scientific and research community commenced work to use languages such as FORTRAN and ALGOL to formulate high-level statistical computing libraries and modules.  This work resulted in the emergence of the following popular statistical packages:

  • Statistical Package for the Social Sciences (SPSS) for social sciences research
  • Biomedical Package (BMD) for medical and clinical data analysis
  • Statistical Analysis System (SAS) for agricultural research

These packages rapidly caught on with the rest of the scientific and research community.  The increasing adoption prompted the authors of these packages to incorporate companies to support commercial development of their creations; SAS and SPSS as companies were thus born in the 1970s.  These statistical processing solutions were developed and adopted widely in academia and also in industries such as pharmaceuticals.  The rapid adoption of software packages for statistical processing thus gave rise to the “statistical computing” industry in the 1970s, and various societies, symposia, conferences and journals focusing on statistical computing emerged during that time.

Statistical processing packages expanded and developed greatly through the 1970s; however, they were still difficult to use, as well as limited in their application due to their batch-oriented nature.  Efforts were undertaken in the 1970s to provide a more real-time and easy-to-use programming paradigm for statistical analysis.  These efforts gave rise to the S programming language, which provided a more interactive alternative to traditional FORTRAN-based statistical subroutines.  The emergence of personal computing and sophisticated graphical functionality in the 1980s was a welcome development that enabled real-time interactive statistical processing.  Statistical package vendors such as SAS and SPSS extended their product suites to provide this interactive real-time functionality; for example, SAS came out with its JMP suite of software in the 1980s.

Another major related development in the 1980s was the emergence of expert systems and other artificial intelligence (AI) based techniques.  AI had been in development for some time, and in the 1980s received much hype as a set of new techniques to solve problems and create new opportunities.  A field of AI, machine learning, emerged and developed in the 1980s as a way to predict outcomes and results based on prior datasets that a computer could analyze and “learn from”.  The application of such machine learning techniques to data gave rise to the new disciplines of “knowledge discovery in databases” (KDD) and ultimately “data mining”.

AI did not live up to its hype into the 1990s and experienced much criticism and a funding drawdown.  However, some AI/machine learning techniques, such as decision trees and neural networks, found useful application.  These techniques were developed and productized by several database/data mining product vendors in the 1990s, and data mining solutions started appearing in the marketplace alongside traditional business intelligence and data warehousing solutions.
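To give a flavor of how such techniques “learn from” prior data, here is a minimal decision-stump sketch in Python — a one-split decision tree that picks the threshold on a single numeric feature that best separates two classes.  The data set is invented for illustration and real decision-tree products use richer splitting criteria (e.g. entropy or Gini impurity).

```python
def fit_stump(xs, ys):
    """Learn a one-split 'decision tree': try a threshold between
    each adjacent pair of sorted points and keep the one that
    misclassifies the fewest prior observations."""
    best = (None, len(ys))  # (threshold, error count)
    for i in range(len(xs) - 1):
        t = (xs[i] + xs[i + 1]) / 2
        # Rule: predict class 1 when x > t, class 0 otherwise.
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if errors < best[1]:
            best = (t, errors)
    return best[0]

# Prior observations: sorted feature values and binary labels.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(xs, ys)  # → 6.5
```

Full decision-tree learners apply this idea recursively, splitting each resulting subset again until the leaves are sufficiently pure.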

The open-source movement that appeared in the 1990s, as well as the rapid advancement of the Web, impacted the world of statistical computing.  The R programming language, an open source framework for statistical analysis modeled after the S programming language, emerged in the 1990s and has become wildly successful since, giving rise to a plethora of open source projects for R-based data analysis.  The increasingly large and unstructured datasets that started emerging in the 1990s and 2000s prompted the rise of natural language processing and text analytics.  The modern analytic platforms that emerged in the 2000s incorporated these new developments as well as new and advanced machine learning and data classification techniques such as support vector machines.

The statistical processing platforms and solutions continue to evolve today.  As computers have become cheaper and increasingly more powerful, several product vendors have adapted niche traditional statistical processing techniques and tools to increasingly varied and large datasets. Through open source libraries, development environments and powerful execution engines running across massively parallel databases, the modern analytic platform of today provides capabilities to meld traditional data analysis with statistical computing tools and techniques.  We will witness more of this convergence and integration as these analytic platforms and supporting technologies continue to evolve.

Having examined in detail the three threads in the story of big data technology, we are now in a position to better understand the current trends and makeup of the modern analytic platforms.  In the next installment of the Big Data Technology Series, we will shift gears and focus on current trends in the big data analytic technology marketplace and the core capabilities of a typical big data analytic platform.