The Five D’s of Fintech: Introduction

“Fintech” (a portmanteau of “financial technology” that refers to the disruptive application of technology to processes, products, and business models in the financial services industry) is coming of age: two of the most prominent Fintechers, OnDeck and Lending Club, have gone public, many more are processing transactions on the order of billions of dollars, and outfits providing market intelligence in fintech are cropping up – there is even a newly minted index to track activity in marketplace lending. Banks are increasingly taking note of the Fintech movement, partnering with startups, investing in them, or even acquiring them outright. Venture funding in fintech grew by 300% in one year to $12 billion in 2014.  According to Goldman Sachs’s “Future of Finance” report, the total value of the market that can potentially be disrupted by Fintechers is an estimated $4.3 trillion.

Fintech is a complex market, spanning a broad swath of finance across individual and institutional markets and including market infrastructure providers as well. It is a broadly defined category for upstarts who have a different philosophy around how finance should function and how it should serve individuals and institutions. While some Fintechers seek to reduce transaction fees and improve customer experience, others exist to provide more visibility into the inner workings of finance. In spite of this diversity, there are some common threads and recurring themes around why Fintech firms exist and what their market philosophy is. The 5 D’s of Fintech – Democratization, Disaggregation, Disintermediation, Decentralization and De-biasing – represent common themes around the mission, business models, values, and goals of many of these firms. In this series of posts on Fintech, we will look at each of the 5 D’s, starting with Democratization – the mission of many a Fintech firm.

The Five D’s of Fintech

Fintech Slides


Technology has long enabled democratized access to financial services; Fintech, however, is taking the movement to another level by targeting specific market niches with customized value propositions. A central appeal of many Fintechers is their promise to bring to the masses resources and capabilities that have heretofore been the preserve of the wealthy, the elite, or the privileged. This has been made possible both by market opportunity and by internal capability: the opportunity of serving a market whitespace, and the ability to do so economically through the use of data and advanced technologies.

The financial inclusion that Fintechers are now enabling is driven by their ability to clear obstacles, remove barriers, and enable access where none existed before, whether it is serving the unserved or underserved SMBs that have typically been shunned by traditional banks (Funding Circle), providing credit to the underbanked segment lacking traditional credit scores (Kreditech), enabling investment advice without the need to rely on expensive financial advisors (Nutmeg or Betterment), or facilitating access to the capital markets by offering low-cost brokerage services (Robinhood). Financial services are now “for the people” and “by the people” as well: Quantiacs, a fintech startup that aims to revolutionize the hedge fund industry, is essentially a marketplace for quantitative trading strategies that enables anyone to market their quantitative skills and trading strategies. OpenFolio, to take another example, is an online community that allows users to link their portfolios and measure investment performance against their communities and relevant benchmarks. Wealth management is perhaps the market ripest for democratization, as shown by the rapid emergence of a raft of outfits such as HedgeCoVest and iBillionaire (platforms that allow investors to mirror the trades of hedge funds and billionaires, respectively), Loyal3 (which offers no-fee access to IPOs), and Algomi and True Potential (which undo trading obstacles for investors).

As Vikas Raj of Accion Venture Lab notes, the real potential of fintech lies in democratizing access to finance for the billions of low-income, unbanked people in emerging markets. The high-complexity, low-scale nature of this market is exactly the kind Fintechers are good at capitalizing on, as is evident from the long list of companies emerging in this market beyond Silicon Valley and New York. Where traditional finance and government agencies have failed, Fintech has the promise and the potential to excel.

Other industries can learn a lot by observing how Fintech is driving democratization in finance. Whether it is healthcare, education, media, or government services, there is potential value in market segments that are currently unserved or underserved which a Fintech-like movement can unlock. Adopting the technologies underlying Fintech is part of the story; what is needed first is the recognition of the potential for change, the support of the markets, and an entrepreneurial spirit to lead the movement.



A New Kid on the Blockchain

Fintech, the application of information technology to the world of finance, is the topic of discussion in The Economist’s latest special report on banking (Special Report on International Banking).  Bitcoin was featured in one of the articles, but this time the focus is not on bitcoin the currency per se, but on the blockchain, Bitcoin’s underlying protocol, which enables distributed ledger management using cryptography and powerful computers spread across the world’s data centers.
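The essence of that "distributed ledger management using cryptography" is a chain of blocks, each committing to the one before it via a hash. A toy sketch in Python illustrates the idea (this is a simplification, not the actual Bitcoin protocol; the block structure and field names here are invented for clarity):

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's canonical JSON form (a stand-in for Bitcoin's header hashing)
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    # Each new block records the hash of the previous block, linking the chain
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})
    return chain

def verify(chain):
    # Tampering with any historical block invalidates every later link
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
add_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
assert verify(chain)
chain[0]["transactions"][0]["amount"] = 500   # rewrite history...
assert not verify(chain)                      # ...and the chain no longer verifies
```

This tamper-evidence, combined with a consensus mechanism for deciding which chain is authoritative, is what lets many mutually distrustful computers maintain one shared ledger.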


The blockchain, since its invention by Satoshi Nakamoto (the pseudonymous inventor behind Bitcoin and its protocol), has taken the world of fintech by storm.  The blockchain is being touted as the next big thing, not unlike the Internet and its underlying communication protocols, with the potential to revolutionize everything from money transfer to real estate transactions and the Internet of Things. Blockchain, as a concept, is being adapted to serve multiple applications, including communication, agreements, asset transfers, and record tracking.  Numerous startups are cropping up to provide value-added services on top of the original Bitcoin blockchain, such as CoinSpark, an Israeli startup that has devised a technology to add information and metadata to the blockchain, one application of which is providing “notary” services for agreements and documents recorded on the blockchain.  Other outfits, however, are fundamentally trying to re-architect the original blockchain to make it better or to make it work for specific purposes.

Colored Coins, for instance, enables the storage and transaction of “smart property” on top of the blockchain. Smart property is property whose ownership is controlled via the blockchain using “smart contracts,” which are contracts enforced by computer algorithms that can automatically execute the stipulations of an agreement once predetermined conditions are activated. Examples of smart property could include stocks, bonds, houses, cars, boats, and commodities. By harnessing blockchain technology as both a ledger and trading instrument, the Colored Coins protocol functions as a distributed asset management platform, facilitating issuance across different asset categories by individuals as well as businesses. This could have a significant impact on the global economy as the technology permits property ownership to be transferred in a safe, quick, and transparent manner without an intermediary. Visionaries see many other exciting opportunities too, including linking telecommunications with blockchain technology. This could, for example, provide car-leasing companies the ability to automatically deactivate the digital keys needed to operate a leased vehicle if a loan payment is missed.
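The car-leasing example can be sketched as a toy smart contract in Python. This is only a plain-language illustration of the idea of a stipulation executing automatically when a condition is met; real smart contracts run on-chain, and every name and number below is invented:

```python
from datetime import date, timedelta

class LeaseContract:
    """Toy smart contract: deactivates a leased car's digital key on a missed payment."""

    def __init__(self, payment_due, grace_days=10):
        self.payment_due = payment_due
        self.grace_days = grace_days
        self.paid = False
        self.key_active = True

    def record_payment(self):
        self.paid = True

    def evaluate(self, today):
        # The stipulation executes automatically once the condition is activated:
        # past the grace period with no payment, the digital key is switched off
        overdue = today > self.payment_due + timedelta(days=self.grace_days)
        if overdue and not self.paid:
            self.key_active = False
        return self.key_active

contract = LeaseContract(payment_due=date(2015, 3, 1))
assert contract.evaluate(date(2015, 3, 5))        # within grace period: key stays active
assert not contract.evaluate(date(2015, 3, 20))   # payment missed: key deactivated
```

On a blockchain, the `evaluate` logic would be enforced by the network rather than by any one party, which is what removes the need for an intermediary.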

Ethereum is another outfit that has created technology for developing blockchain-based smart contracts.  Ethereum—an open-source development project that provides a platform for developers to create and publish next-generation distributed applications—uses blockchain technology to facilitate the trading of binding smart contracts that can act as a substitute for conventional business documents. The technology allows the contracts to be traced and used to confirm business deals without the need to turn to the legal system.  Then there are outfits such as Ripple Labs that are devising their own blockchain-like protocols to facilitate quick and secure money transfer.

Other blockchain innovation involves combining blockchain technology with conventional technologies.  IBM and Samsung are developing ADEPT, a blockchain-powered backbone for Internet of Things products that combines three protocols: BitTorrent (file sharing), Ethereum (smart contracts) and TeleHash (peer-to-peer messaging).  ADEPT is a blockchain-powered secure communication and transaction protocol for devices.  When a consumer buys a washing machine, for example, ADEPT will allow it to be automatically registered in the home network of things, not just sending messages to and receiving messages from other registered devices, but also automatically initiating and fulfilling transactions on its own – say, replenishing the washing powder by placing an order with the local grocery store.

These innovations are at the leading edge of blockchain technology, and it will be several years before their use becomes widespread, if it ever does.  In the meantime, more mundane applications of the blockchain have great potential to flourish. Future fintech entrepreneurs should not discount the blockchain as grounds for their creative pursuits.  All that is needed is a “killer app” that niftily applies the concept to solve a present-day problem.  Just as Marc Andreessen’s Netscape Navigator unleashed a wave of innovation in the history of the Internet, so too will a blockchain startup in the world of distributed ledgers and asset registers.


For most business executives, the term “economics” conjures images of either the simplistic supply-demand graphs they may have come across in Economics 101, or theoreticians devising arcane macroeconomic models to study the impact of interest rates and money supply on national economies.  Although businesses such as banks and financial institutions have maintained armies of economists on their payrolls, the economist’s stature and standing even in such institutions has been limited to providing advice on general market trends and developments, as opposed to actionable recommendations that directly impact the business bottom line.  Even the most successful business executive would be stumped when asked how exactly economics is applied to improving their day-to-day business.  All this may now be changing, thanks to a more front-and-center role for economics in new age businesses that now routinely employ economists to sift through all kinds of data to fine-tune their product offerings, pricing, and other business strategies.  Textbook economists of yore are descending from their ivory towers and taking on a new role, one that is increasingly being shaped by the availability of new analytic tools and raw market data.


Economists, especially of the macro kind, are a dispraised bunch, with a large part of the criticism stemming from their inability to predict major economic events (economists famously failed to anticipate the 2008 market crash).  For this and other reasons (not least the Lucas Critique), macroeconomic modeling focused on building large-scale econometric models has been losing its allure for some time.  Microeconomic modeling, enabled by powerful data-driven microeconometric models focused on individual entities, has meanwhile been transforming and expanding over the past few decades.  The ever-expanding use of sophisticated micro-models on large datasets has led some to see this as laying the foundation for “real-time econometrics”.  Econometrics, the interdisciplinary study of empirical economics combining economics, statistics and computer science, has continually evolved over the past several decades thanks to advances in computing and statistics, and is yet again ready for disruption – this time due to the availability of massive data sets and easy-to-procure computing power to run econometric analyses.  The likes of Google, Yahoo and Facebook are already applying advanced microeconometric models to understand the causal statistics surrounding advertising, displays and impressions and their impact on key business variables such as clicks and searches. Applied econometrics is but one feather in the modern economist’s cap: economists are also at the forefront of the sharing economy and “market design”.
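At the core of much of this applied work sits something as unglamorous as a regression. A minimal sketch of one-variable least squares, with invented data relating ad impressions to clicks (and with the usual caveat that a fitted slope alone does not establish causality, which is why firms pair such models with experiments):

```python
def ols(x, y):
    """Simple one-variable OLS: fit y = a + b*x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope = covariance of x and y divided by variance of x
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx      # intercept makes the line pass through the means
    return a, b

# Invented data: daily ad impressions (thousands) vs. resulting clicks
impressions = [10, 20, 30, 40, 50]
clicks = [120, 205, 310, 398, 510]
a, b = ols(impressions, clicks)
print(round(a, 1), round(b, 2))   # → 16.7 9.73, i.e. roughly 9.7 clicks per 1,000 impressions
```

The modern twist is less the model than the setting: the same arithmetic run continuously over millions of fresh observations rather than once over a quarterly series.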

A celebrated area of economic modeling and research that has found successful application in business is “market design” and “matching theory”, pioneered by Nobel prize-winning economists Al Roth and Lloyd Shapley.  Market design and matching theory are concerned with optimizing the pairing of providers and consumers in a marketplace based on “fit” along dimensions that go beyond price alone. Al Roth successfully applied game-theory-based market design and matching algorithms to improve a number of marketplaces, including the placement of New York City’s high school students, the matching of medical students with residency programs, and kidney donation programs.  The fundamentals of matching theory are being widely applied by economists today: many modern online markets and sharing platforms such as eBay and Lyft are in the business of matching suppliers/providers with consumers, and economists employed by these outfits have successfully applied those fundamentals to improving their businesses, increasingly with the aid of multi-dimensional data that is available in real time. Other marketplaces, including LinkedIn (workers and employers) and Accretive Health (doctors and patients), have applied similar learnings to improve their matching quality and effectiveness.  Airbnb economists analyzed data to figure out why certain hosts were more successful than others in sharing their space with guests, and successfully applied their learnings to help struggling hosts and to better balance supply and demand in many of Airbnb’s markets (their analysis showed that successful hosts shared high-quality pictures of their homes, which led Airbnb to offer a complimentary photography service to its hosts).
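The deferred-acceptance algorithm at the heart of this matching theory is short enough to sketch. Below is a textbook Gale-Shapley implementation for one-to-one matching (think students and residency programs); the preference data is invented:

```python
def deferred_acceptance(proposer_prefs, reviewer_prefs):
    """Gale-Shapley: proposers propose in preference order; reviewers
    tentatively hold their best offer so far, trading up when possible."""
    # Precompute each reviewer's ranking of proposers for O(1) comparisons
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    free = list(proposer_prefs)           # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}
    match = {}                            # reviewer -> proposer (tentative)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in match:
            match[r] = p                  # reviewer holds the first offer received
        elif rank[r][p] < rank[r][match[r]]:
            free.append(match[r])         # reviewer trades up; old proposer is free again
            match[r] = p
        else:
            free.append(p)                # rejected; p proposes to the next choice later
    return {p: r for r, p in match.items()}

students = {"ann": ["x", "y"], "bob": ["x", "y"]}
programs = {"x": ["bob", "ann"], "y": ["ann", "bob"]}
print(deferred_acceptance(students, programs))   # → {'bob': 'x', 'ann': 'y'}
```

The resulting matching is stable: no student and program would both prefer each other over their assigned partners, which is precisely the property Roth exploited in the residency and kidney-exchange designs.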

Beyond market design, economics research is changing in a number of areas thanks to the availability of large data sets and analytic tools, as Liran Einav and Jonathan Levin of Stanford University outline in “The Data Revolution and Economic Analysis”.  One such area is the measurement of the state of the economy and economic activity, and the generation of economic statistics to inform policy making. The issue with macroeconomic measurement is that the raw data produced by the official statistical agencies comes with a lag and is subject to revision.  Gross domestic product (GDP), for example, is a quarterly series that is published with a two-month lag and revised over the next four years.  Contrast this with the ability to collect real-time economic data, as is being done by the Billion Prices Project, which collects vast amounts of retail transaction data in near real time to develop a retail price inflation index.  What’s more, new data sets may allow economists to shine a light on areas of economic activity that have heretofore been dark.  Small businesses’ contribution to national economic output, for example, is routinely underestimated because certain businesses are excluded. Companies such as Intuit, which does business with many small outfits, now have payroll transaction data that can potentially be analyzed to gauge the economic contribution of such small businesses.  Moody’s Analytics has partnered with ADP, the payroll software and services vendor, to enhance official private sector employment statistics based on ADP’s payroll data.
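The daily price index idea can be illustrated with a minimal sketch. The methodology shown (chaining an unweighted average of daily price relatives over the products observed on consecutive days) is a deliberate simplification of what a project like the Billion Prices Project actually does, and the price data is invented:

```python
def daily_index(price_history):
    """Chain a price index from daily price relatives averaged across products.

    price_history: list of {product: price} dicts, one per day.
    Returns the index level for each day, with day 0 normalized to 100.
    """
    index = [100.0]
    for prev, curr in zip(price_history, price_history[1:]):
        common = set(prev) & set(curr)                  # products priced on both days
        relatives = [curr[p] / prev[p] for p in common]
        avg_relative = sum(relatives) / len(relatives)  # unweighted mean of price relatives
        index.append(index[-1] * avg_relative)
    return index

prices = [
    {"milk": 2.00, "bread": 1.50, "eggs": 3.00},
    {"milk": 2.10, "bread": 1.50, "eggs": 3.00},   # milk up 5%
    {"milk": 2.10, "bread": 1.65, "eggs": 3.00},   # bread up 10%
]
print([round(x, 2) for x in daily_index(prices)])  # → [100.0, 101.67, 105.06]
```

The point is the cadence: because the inputs are scraped daily, the index updates daily, versus a CPI release that arrives weeks after the fact.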

Conservatives and the old guard may downplay the role of data in applied economics, reveling in their grand macroeconomic models and theories.  To be fair, empirical modeling would be lost without theory.  However, data’s “invisible hand” in shaping today’s online markets and business models is perceptible, if not openly visible.  Economists of all stripes would be well advised to pay attention to the increasing role of data in their field.  Next time you see an economist, pass on Google Chief Economist Hal Varian’s counsel and ask them to take a course on Machine Learning in the Computer Science department – it will be time well spent.

The Promise of Geospatial and Satellite Data

Stories of traders vying to compete and finding new ways to beat the market and eke out profits are nothing new.  What has been interesting of late, though, is the lengths to which trading houses can now take the competition, thanks to real-time data analytics supporting trading indicators and signals, purveyors of which include companies like Genscape and Orbital Insight.  Orbital Insight, a startup specializing in analytics and real-time intelligence solutions, was featured in a recent Wall Street Journal writeup (Startups Mine Market-Moving Data From Fields, Parking Lots, WSJ, Nov 20, 2014).  Genscape, a more established player, employs sophisticated surveillance and data-crunching technology to supply traders with nonpublic information on topics including oil supplies, electric-power production, retail traffic, and crop yields. Genscape and Orbital are but two players in a broad developing market of “situational intelligence” solutions that provide the infrastructure and the intelligence for rapid real-time data-driven decision-making.  These two companies, however, are particularly interesting because they provide a view into the promise of geospatial and satellite imagery data and how it can be exploited to disrupt traditional operational and tactical decision-making processes.


Geospatial data is simply data about things and related events indexed in three-dimensional geographic space on earth (with temporal data collected as well for events taking place across time).  Geospatial data sources fall into two types: GPS data gathered through satellites and ground-based navigation systems, and remote sensing data collected and transmitted in digital form by specialized devices (sensors, radars and drones fall into this type).  Geospatial data is of interest to private corporations and public entities alike.  When triangulated with traditional data sources, personal data, and social media feeds, it can provide valuable insight into real-time sales and logistics activities, enabling real-time optimization.  On the public side, geospatial data can provide valuable information for detecting and tracking epidemics, following the migration of refugees in a conflict zone, or gathering intelligence of geopolitical significance.  These are but a handful of the use cases that such data makes possible.
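Much practical work on geospatial data starts with something very basic: computing the distance between coordinate pairs, for example to associate a GPS reading with the nearest facility or sensor. The standard haversine formula is a small, self-contained example (the coordinates below are illustrative):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))   # 6371 km: mean Earth radius

# Approximate distance from New York (40.71, -74.01) to London (51.51, -0.13)
print(round(haversine_km(40.71, -74.01, 51.51, -0.13)), "km")
```

Real geospatial platforms layer spatial indexes and map projections on top of this, but distance-on-a-sphere is the primitive underneath.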

Once the preserve of secretive governments and intelligence agencies worldwide, geospatial and satellite imagery data is slowly but surely entering the commercial and public domains, spawning an entire industry that spans outfits that build and manage the satellite and sensor infrastructure, manufacturers and suppliers of the parts and components that make up the satellites, and not least entities such as Orbital Insight that add value to the raw data by providing real-time actionable information to businesses.  Orbital Insight, for example, leverages sophisticated machine learning algorithms and analysis against huge volumes of satellite imagery made available by DigitalGlobe’s Geospatial Big Data™ platform, allowing accurate, verifiable information to be extracted.  DigitalGlobe, Planet Labs, and BlackBridge Geomatics are examples of companies investing to launch and manage the satellite and sensor infrastructure needed to collect detailed real-time geospatial data.  Google, not to be left behind in the space race, jumped into the market with its acquisition of Skybox Imaging earlier this year.  Skybox intends to build a constellation of twenty-four satellites that will collect anything and everything across the globe.  What’s more, Skybox, unlike companies such as DigitalGlobe, intends to make all the data collected through its satellite constellation available for public and commercial use.  But even companies such as Skybox are not alone in blazing the trail in the satellite business – numerous other start-ups are vying to put into orbit low-cost, disposable nano-satellites that will be much smaller and cheaper to launch and manage.  These developments will only create and open up an even wider range of applications for private and public use than has been possible heretofore.

These are still very early days for the commercial application of geospatial and satellite imagery data, and exciting developments lie ahead.  For one, the number and kinds of data sources that such applications may need to handle will be exponentially higher: imagine a fleet of satellites, aerial drones, quadcopters and ground-based sensors all providing various kinds of data that could potentially be collated and fused together.  New algorithms and new ways of storing and manipulating streaming data at mind-boggling scales will also be needed, which may require a level of thinking beyond what we currently have.


Data Platforms Set Off a Cambrian Explosion

The Jan. 18, 2014, edition of The Economist featured “A Cambrian Moment,” a special report on tech start-ups. The report discusses how the digital marketplace is experiencing a Cambrian explosion of products and services brought to market by an ever-increasing plethora of new start-ups. Precipitating this explosion is the modern “digital platform” – an assembly of basic building blocks of open-source software, cloud computing, and social networks that is revolutionizing the IT industry. This digital platform has enabled technology start-ups to rapidly design and bring to market a raft of new products and services.


A similar evolution is taking form in the world of data. Annabelle Gawer, a researcher at Imperial College Business School, has argued that “platforms” are a common feature of highly evolved complex systems, whether economic or biological. The emergence of such platforms is the ultimate result of evolving exogenous conditions that force a recombination and rearrangement of the building blocks of such systems. In the data world, this rearrangement is taking place thanks to the falling cost of information processing, the standardization of data formats, and the maturation of large-scale connectivity protocols. The emergence of such data platforms has significant implications for a range of industries, not least data-intensive industries like healthcare and financial services. Like the digital platform, the data platform will give rise to an explosion of data-enabled services and products.

Early evidence already can be seen in the pharmaceutical and drug manufacturing industries. Drug manufacturers need thousands of patients with certain disease states for late-stage clinical trials of drugs under development. This patient recruitment process can take years. The Wall Street Journal reported that to speed up the recruitment process, drug manufacturers have turned to entities such as Blue Chip Marketing Worldwide, a drug-industry contractor that provides data-enabled solutions based on consumer data sets obtained from Experian, a data broker and provider of consumer data. Blue Chip uses sophisticated big data analyses and algorithms to identify individuals with potential disease states. Blue Chip’s services have already enabled a drug manufacturer to cut patient recruitment time from years to months.

Trading is another industry showing early signs of this movement. As also reported in The Wall Street Journal, traders, in their never-ending quest to gather market-moving information more quickly, have turned to outfits such as Genscape, a player in the growing industry that employs sophisticated surveillance and data-crunching technology to supply traders with nonpublic information on topics including oil supplies, electric-power production, retail traffic, and crop yields. Founded by two former power traders, Genscape crunches vast amounts of sensor data, video camera feeds, and satellite imagery to draw patterns and make predictions on potential movements in the supply and demand of commodities such as oil and electricity.

We are in the early days of this evolutionary process. The evolution of the data platform will most likely mirror the evolution of the digital platform, although it is expected to proceed at a faster pace. As competing technologies and solutions evolve and mature, we will see consolidation and emergence of just a few data platforms that will benefit from the tremendous horizontal economies of scale. Information industry incumbents such as Google will be in the pole position to be leaders in the data platform space. For example, Google already has a massive base of web and social interaction data thanks to Google Search, Android, and Google Maps, and it is making aggressive moves to expand into the “Internet of things,” the next frontier of big data.

Concurrently, we will witness a broad emergence of a long tail of small vendors of industry-oriented data products and services. Blue Chip and Genscape are first movers in a nascent market. As the economics of harvesting and using data become more attractive, an increasing number of industry players will want to leverage third-party data services, which in turn will create opportunities for data entrepreneurs. The Cambrian explosion will then be complete and the start-up garden of the data world will be in full bloom.


Lab as a Service

The Wall Street Journal this week featured an article on Silicon Valley startups that are employing software and robotics to bring to market new models for managing discovery and pre-clinical research (Research Labs Jump to the Cloud).  It was interesting to read about companies such as Emerald Therapeutics that offer cloud-based services providing end-to-end, precise design and execution of common pre-clinical experiments, analyses and assays (Emerald recently closed a Series B funding round with Peter Thiel’s Founders Fund).  Investor interest in companies such as Emerald indicates that new technologies hold serious promise of disrupting even staid industries and functions otherwise thought to be impervious to technological advances.


Discovery and pre-clinical research is the phase of the drug development process that precedes clinical trials.  Pre-clinical research is concerned with understanding the feasibility, toxicology and side effects of drug molecules, with the ultimate goal of building a feasibility and safety profile of potential molecules for further development in the clinical testing phase, which typically involves conducting experiments on human subjects.  The pre-clinical phase involves running finely controlled and detailed experiments, both in vitro (in which specific cells or tissue in test tubes and petri dishes are used to study the effects of drug molecules) and in vivo (in which experiments are conducted on entire living organisms).  As such, these experiments involve a lot of iterative testing with common, routine setup, execution and analysis of results.  Outfits such as Emerald hope to offer outsourced services that automate these repetitive tasks, improving both the turnaround time of pre-clinical experiments and researcher productivity by providing higher-order services such as data analysis and reporting.

The potential of such “lab as a service” offerings is promising due to the confluence of three technology trends in the life sciences and pharma industries: robotics, lab automation software and data analytics.  In the lab, machines such as shakers have been in use for many years; robots, however, can take on increasingly complex and precise tasks traditionally performed by lab technicians, such as sample preparation and liquid handling.  Sophisticated robotic systems can now provide end-to-end automation for a complete procedure such as performing and analyzing a polymerase chain reaction.  Thanks to the falling cost of hardware and sensors, the rise of technologies such as 3D manufacturing, and smart software, robots have become a much more central piece of lab automation systems.  The second key trend is the increasing sophistication of lab software.  Lab software has long been used to automate lab management processes such as specimen setup, data collection and analysis.  For example, solutions such as Electronic Lab Notebooks provide a way for researchers and technicians to easily capture handwritten notes and analyses in digital form.  These systems have traditionally been developed in a standalone fashion, and it is only now that efforts are being made to integrate them so as to enable end-to-end processing.  Increasing automation does not just produce productivity benefits; it provides the ability to precisely capture and analyze data on the various variables that go into an experiment and the outputs that are produced.  Sophisticated analyses of the data thus collected can provide insight into how those variables affect the results and reproducibility of experiments.  This, coupled with advanced simulation and predictive technologies, can greatly inform the planning of subsequent iterations, cutting down the time to completion of the research phase.

Such lab-as-a-service offerings have the potential to democratize access to expensive lab resources; in the future, anyone with a credit card and an Internet connection will be able to source such resources to conduct experiments and get results.  In a world struggling to tame the scourge of ever-evolving diseases and infections, this would be a welcome development.

Big Data Technology Series – Part 4

In the last installment of the Big Data Technology Series, we looked at the second thread in the story of big data technology evolution: the origins, evolution and adoption of systems/solutions for managerial analysis and decision-making.  In this installment, we will look at the third and last thread in the story: the origins, evolution and adoption of systems/solutions for statistical processing.  Statistical processing solutions have been evolving independently since the 1950s to support analyses and applications in the social sciences and agribusiness.  Recently, however, such solutions are increasingly being applied commercially in tandem with traditional business intelligence and decision-making solutions, especially in the context of large unstructured datasets.  This post is an attempt to understand the key evolutionary points in the history of statistical computing, with our overarching goal of better understanding today’s big data technology trends and landscape.

See the graphic below that summarizes the key points in our discussion.

Statistical Packages History

The use of computers for statistical analysis began in the 1950s when FORTRAN was invented, making it possible for mathematicians and statisticians to leverage the power of computers.  Statisticians appreciated this new-found opportunity to run analyses on computers; however, most programs were developed in a labor-intensive, heavily customized, one-off fashion.  In the 1960s, work commenced within the scientific and research community to use languages such as FORTRAN and ALGOL to build high-level statistical computing libraries and modules.  This work resulted in the emergence of the following popular statistical packages in the 1960s:

  • Statistical Package for Social Sciences (SPSS) for social sciences research
  • Biomedical Package (BMD) for medical and clinical data analysis
  • Statistical Analysis System (SAS) for agricultural research

These packages rapidly caught on with the rest of the scientific and research community.  The increasing adoption prompted the authors of these packages to incorporate companies to support commercial development of their creations; SAS and SPSS were thus born in the early 1970s.  These statistical processing solutions were developed and adopted widely in academia as well as in industries such as pharmaceuticals.  The rapid adoption of software packages for statistical processing gave rise to the “statistical computing” industry in the 1970s, and various societies, symposia, conferences and journals focusing on statistical computing emerged during that time.

Statistical processing packages expanded and developed greatly through the 1970s; however, they were still difficult to use, and limited in their application due to their batch-oriented nature.  Efforts were undertaken in the 1970s to provide a more real-time, easy-to-use programming paradigm for statistical analysis.  These efforts gave rise to the S programming language, which provided a more interactive alternative to traditional FORTRAN-based statistical subroutines.  The emergence of personal computing and sophisticated graphical functionality in the 1980s further enabled real-time, interactive statistical processing.  Statistical package vendors such as SAS and SPSS extended their product suites to provide this interactive functionality; SAS, for example, introduced its JMP suite of software in the 1980s.

Another major related development in the 1980s was the emergence of expert systems and other artificial intelligence (AI) techniques.  AI had been in development for some time, and in the 1980s it received much hype as a set of new techniques to solve problems and create new opportunities.  Machine learning, a field of AI, emerged and developed in the 1980s as a way to predict outcomes based on prior datasets that a computer could analyze and “learn” from.  The application of such machine learning techniques to data gave rise to the new disciplines of “knowledge discovery in databases” (KDD) and, ultimately, “data mining”.
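The core idea of “learning from prior datasets” can be illustrated with a minimal sketch: a nearest-neighbor classifier that predicts a label for a new data point by finding the most similar example in its training data. The data and labels below are invented for illustration; the data mining products of the era, of course, used far more elaborate techniques such as decision trees and neural networks.

```python
import math

# Toy training set of (feature vector, label) pairs the program "learns" from.
# Features are hypothetical (age, income index); labels are "buy"/"pass".
training_data = [
    ((25, 3.0), "pass"),
    ((40, 8.0), "buy"),
    ((35, 7.5), "buy"),
    ((22, 2.0), "pass"),
]

def predict(point):
    """Predict a label for `point` using its single nearest neighbor."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], point))
    return nearest[1]

print(predict((38, 7.0)))  # nearest neighbor is (40, 8.0) -> "buy"
```

However simple, this captures the essence of the discipline: past observations, not hand-written rules, drive the prediction.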

AI did not live up to its hype going into the 1990s and experienced much criticism and a drawdown in funding.  However, some AI and machine learning techniques, such as decision trees and neural networks, found useful application.  These techniques were developed and productized by several database and data mining vendors in the 1990s, and data mining solutions started appearing in the marketplace alongside traditional business intelligence and data warehousing solutions.

The open-source movement of the 1990s, together with the rapid advancement of the Web, reshaped the world of statistical computing.  The R programming language, an open-source framework for statistical analysis modeled after the S programming language, emerged in the 1990s and has become wildly successful since, giving rise to a plethora of open-source projects for R-based data analysis.  The increasingly large and unstructured datasets that emerged in the 1990s and 2000s prompted the rise of natural language processing and text analytics.  The modern analytic platforms that emerged in the 2000s incorporated these developments, along with new and advanced machine learning and data classification techniques such as support vector machines.

Statistical processing platforms and solutions continue to evolve today.  As computers have become cheaper and increasingly more powerful, several product vendors have adapted niche, traditional statistical processing techniques and tools to increasingly varied and large datasets.  Through open-source libraries, development environments and powerful execution engines running across massively parallel databases, the modern analytic platform provides capabilities to meld traditional data analysis with statistical computing tools and techniques.  We will witness more of this convergence and integration as these analytic platforms and supporting technologies continue to evolve.
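As a small illustration of melding everyday data analysis with statistical techniques, the sketch below uses Python’s standard statistics module to flag outliers in a hypothetical set of transaction amounts (the figures are invented for illustration):

```python
import statistics

# Hypothetical daily transaction amounts from a business dataset.
amounts = [120.0, 135.5, 128.0, 450.0, 131.2, 127.8, 133.9]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)  # sample standard deviation

# Flag values more than two standard deviations from the mean --
# a classic statistical test applied to an ordinary business dataset.
outliers = [x for x in amounts if abs(x - mean) > 2 * stdev]
print(outliers)  # [450.0]
```

The same pattern, scaled up to massively parallel engines and far richer models, is what modern analytic platforms deliver.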

Having examined in detail the three threads in the story of big data technology, we are now in a position to better understand the current trends and makeup of modern analytic platforms.  In the next installment of the Big Data Technology Series, we will shift gears and focus on current trends in the big data analytic technology marketplace and the core capabilities of a typical big data analytic platform.

The Platformization of Robotics


It was a treat to read the latest Economist special report on advances in robotics (Immigrants From the Future, March 29, 2014).  Robotics is one of those technologies that suffer from “high visibility of its promises and near-invisibility of its successes”.  When people think about robots, they invariably picture something incomplete and bug-prone.  Yet, as the report discusses, robots are slowly but surely making their way into businesses and households.  The potential of robotic technology is highlighted by a recent string of acquisitions of robotics companies by Google, and by Amazon’s announcement in November 2013 of its intention to use robotic drones for household package delivery.  Drones are already used extensively by the US military for ISR (Intelligence, Surveillance, and Reconnaissance) operations.

Robotics is getting industrialized: robots are evolving from quaint one-off creations into standardized products of industrial technology.  This big push into robotics will be driven to a great extent by the falling cost of sensors, processors and other hardware that goes into making a robot.  What will accelerate the process, however, is the platformization of robotic technology, which will form the foundation of cheap mass production in the future.  There are many evolutionary parallels between robotics and other modern technologies.  For example, the modern digital platform enabled by technologies such as cloud computing, open-source software, and social networks has evolved to give proliferating technology startups a quick and easy way to bring a variety of solutions to market through plug-and-play, Lego-like assembly of basic technology blocks.  Until recently, robot development was a cottage industry demanding expertise in a number of fields such as artificial intelligence, sensor and engineering technology, and electronics.  Increasingly, however, robots can be designed, assembled and tested in a standardized and automated manner.

New developments across the entire cycle of robotic development, including design, prototyping and operation, will enable this push into platformization.  Robots are, at heart, assemblies of various hardware and electronic modules controlled and coordinated by software.  Standardized robotic design will increasingly be aided by developments such as the emergence of the Robot Operating System (ROS), which provides a uniform way to enable software-based control and coordination.  The Open Source Robotics Foundation, a not-for-profit that manages ROS, provides a forum for open-source collaboration and development that will further drive the standardization and adoption of such building blocks in robotic design.

Prototyping and testing is an important step in robot development, since this is where the rubber meets the road.  Increasingly, teams use sophisticated simulation software to predict the actual performance of their designs, often circumventing the need to build a physical prototype at all.  Where a prototype is needed, teams have used 3D printing techniques to quickly manufacture and assemble robotic parts, greatly improving the lead times and cost of building and testing robotic assemblies.

Finally, the operation of robots will increasingly be standardized and automated.  The “Internet of Things” and cloud-based technologies will enable collaboration and the externalization of functionality, leading to leaner and simpler operation: robots will collaborate and “learn” from each other and from other connected devices, and will be able to tap into the vast trove of online data and knowledge for all aspects of their functioning, such as object recognition and decision-making.  Indeed, “cloud robotics” is an emerging field that envisions this convergence between robotics and cloud computing technologies.

Robotics is a fascinating field in that it offers unique insights into the human psyche and consciousness.  We are still far from seeing a real-life C-3PO among us.  The developments in the field so far, however, are promising: robots will continue to delight and surprise us in new ways for years to come.

The Real Buzz Behind Bitcoin

Bitcoin is in the news again.  The cryptocurrency, after making a splash in 2012, has of late earned the ire of investors and governments.  The digital currency has been one of the worst-performing assets year to date, per a recent article published on LinkedIn.  A popular Bitcoin trading exchange based in Japan abruptly went bust earlier this year, taking with it millions in investor money.  Investors, as some like to say, have been “bitconned”.

While Bitcoin the currency has run into a raft of regulatory, fiscal and technical challenges, the enthusiasm around the potential of Bitcoin the platform remains unabated.  The Bitcoin platform that underpins the digital currency is essentially an automated, distributed, self-policing platform for managing ownership (see this series of YouTube videos on Bitcoin by Campbell Harvey, Professor of Finance at Duke’s Fuqua School of Business).  Through public key cryptography and distributed computing, the platform provides foundational infrastructural services (such as encryption, non-repudiation and reconciliation) for managing transactions related to the ownership of digital assets in a distributed and automated manner.  It offers a distributed, consensus-driven model, so no central coordinating authority or overseer is required to validate and track transactions.  And it is automated, managed transparently by a distributed network of machines coordinating the work with each other.  If you are still confused as to what Bitcoin is and why people are so gung-ho about it, I don’t blame you; the concept is nifty, with wide-ranging and complex implications.  Perhaps an analogy will help.
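To make the self-policing idea concrete, here is a minimal, hypothetical sketch (not Bitcoin’s actual protocol, which also involves digital signatures, proof-of-work and peer-to-peer consensus) of the core trick: blocks of ownership records chained together by cryptographic hashes, so that tampering with past transactions is detectable by anyone.

```python
import hashlib
import json

def block_hash(transactions, prev_hash):
    """Seal a block's contents, and its link to the past, with SHA-256."""
    body = json.dumps({"tx": transactions, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def make_block(transactions, prev_hash):
    return {"tx": transactions, "prev": prev_hash,
            "hash": block_hash(transactions, prev_hash)}

def chain_is_valid(chain):
    """Recompute every hash: any rewrite of past ownership breaks the chain."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["tx"], block["prev"]):
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

# A toy asset changes hands: each block records the new owner.
genesis = make_block([{"asset": "coin-1", "owner": "alice"}], prev_hash="0")
transfer = make_block([{"asset": "coin-1", "owner": "bob"}], genesis["hash"])
chain = [genesis, transfer]

print(chain_is_valid(chain))           # True: the untampered chain validates
genesis["tx"][0]["owner"] = "mallory"  # attempt to rewrite history...
print(chain_is_valid(chain))           # False: the tampering is detectable
```

Because each block’s hash covers its predecessor’s hash, rewriting any historical record invalidates everything that follows it; in the real network, many independent nodes perform this validation, which is what removes the need for a central overseer.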

The emergence and evolution of the Bitcoin platform is not unlike that of the modern Internet.  The early days of computing were characterized by giant computing machines (mainframes and minicomputers) that packed all the computing power and the resources for networking and storage.  Networking in those days was closed and proprietary, with each vendor managing its own stack of hardware and related software.  Getting computers to connect and talk to each other required a great deal of intermediation, manual setup and ongoing management, owing to a centralized approach to computing and the lack of clear, common communication standards.  The emergence of the Internet changed all that.  The evolution of the TCP/IP protocols and the standardization of other networking mechanisms provided a common platform for standardized communication.  The Internet took the complexity out of creating distributed systems by providing foundational infrastructural services and guarantees, allowing us to transition from a centralized ‘command and control’ computing paradigm to one of secure distributed computing.

Like the Internet, the Bitcoin platform aims to provide infrastructural services and guarantees for managing ownership transactions in an automated and distributed manner.  The institutions, contracts and arrangements needed today to manage ownership of assets such as stocks or bonds (think of custodians and clearing agencies) are reminiscent of the mainframe era in our analogy.  The Bitcoin platform aims to fundamentally change this picture from the ground up, providing an Internet-like distributed platform for managing digital asset ownership.  As the platform and its protocols evolve and go through their growing pains, we will increasingly have a stable and solid platform for managing ownership of digital assets in an automated and decentralized manner.  Just as the Internet allowed the market to focus on creating higher-order, value-added stacks, products and services (think of the World Wide Web or email protocols such as SMTP), the Bitcoin platform has the potential to provide the infrastructure not just for digital currency but for anything that can be digitized as an asset.  Imagine the range of possibilities that a Bitcoin-like platform could enable in the “Internet of Things”, in which all things physical will have digital identities interconnected in a networked world.

The Bitcoin platform has already unleashed a wave of innovation, as demonstrated, for example, by the rise of other virtual currencies such as Colored Coins, which provide an abstraction layer to encode ownership information for real-world physical assets such as property, stocks, or bonds.  The Bitcoin platform is still in its infancy, and a number of technical kinks related to security and scalability need to be worked out.  Regardless, being a platform technology, the Bitcoin phenomenon is a potentially major disruptive force that, like the Internet, could have far-reaching consequences for entire industry structures and value chains.

“The Second Machine Age”

The Economist recently featured a review of The Second Machine Age by Erik Brynjolfsson and Andrew McAfee, two academics at MIT’s Center for Digital Business.  The book is a wonderful treatment of the impact of past technological revolutions, including the steam engine and electrification, and more recently the so-called “Second Machine Age” that began with the introduction of electronic computing in the 1960s.  Reading the first few introductory chapters reminded me just how important technology has been to economic growth and the betterment of humanity as a whole.  If one were to plot world GDP across time and overlay the major technological revolutions of the past (see graphic below), one cannot help but see the strong correlation between the two.

World GDP and Technology


What is interesting about the data that Brynjolfsson and McAfee present, however, is that we are just getting started with the second machine age.  Advances in electric power generation and transmission continue to improve growth and productivity to this day, and there is no reason to believe that we have seen everything electronic computing has to offer.  As the graphic from the book comparing the timelines of the electrification era and the second machine age shows (see below), we are currently at the same level of productivity gains from electronic computing as we were in the 1930s with electrification.



The other important point Brynjolfsson and McAfee make is that there is a lag between the time a new technology is introduced and the time its productivity benefits are actually realized.  When initially deployed, a technology may help automate and improve operations to a certain extent, but the true benefits accrue when managerial innovation fully exploits it.  So it was with electrification: simply replacing steam engines with electric motors did little to improve productivity at first, but great benefits followed as overall workflows and operations were redesigned to leverage the new technology.