Testing in a Continuous Delivery Environment

In his book Out of the Crisis, W. Edwards Deming cautioned: “Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place.” Ever since, ‘Building Quality In’ has been a central tenet of quality-focused lean initiatives, including lean software development. Testing in software development is an example of inspection: inspection to find bugs and faults in the developed software. Static code analysis is another example. Quality matters in software development because bugs cost both users and software providers dearly: a study conducted on behalf of the US National Institute of Standards and Technology estimated the cost of software bugs to the US economy at around $60 billion. Perhaps the extent of this scourge is not surprising, since in many organizations software testing is simply not effective: testing/QA teams run “quality gates” as an afterthought, and even then testing does not necessarily translate into quality.

When Agile came around, practitioners developed new approaches to testing, aptly described under the banner of “Agile Testing”, that brought some improvement by driving more collaboration across teams and moving testing earlier in the development cycle. Now, with the advent of DevOps, testing has taken on a new level of significance: continuous delivery is not just about delivering software rapidly, but about delivering software that works. A few have even coined a term for this new discipline: continuous testing. All that is well, but what does testing mean in a continuous integration/delivery environment?


In a continuous delivery (CD) environment, quality becomes everyone’s responsibility. This does not mean that QA and testing teams have no role to play in a CD environment. On the contrary, the QA and testing function moves into a strategic role, providing oversight, direction and leadership for driving overall quality. For example, instead of spending countless hours running manual tests, QA teams will invest time and resources in developing and implementing a comprehensive test automation strategy, or in putting in place governance processes, metrics and incentives to drive quality at every step.

An example of how quality becomes everybody’s responsibility is what the development staff does in such an environment. Development teams in a CD environment are empowered to take on quite a bit of testing themselves. In addition to a ‘test first’ approach, developers may also be required to run pre-commit testing that exercises a suite of unit, component and integration tests. Indeed, many CI servers provide the capability for ‘private builds’, which lets an individual developer check whether their code changes can be integrated into the main trunk for a successful build. Pre-commit testing should enable developers to conduct a quick ‘smoke test’ to ensure that their work will not break the code in the main trunk, so it may contain a selection of integration and acceptance tests. Once the developer checks code in after pre-commit testing, the CI server runs the commit stage tests, which include any required static code analysis, component and integration testing, followed by system testing. Commit stage results are immediately fed back to the development team so that any errors or bugs can be addressed. Successful commit stage testing increases confidence that the build is a candidate for acceptance testing; builds that fail commit stage testing do not progress to the next stage: acceptance testing.
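The staged gating described above can be sketched in a few lines of Python. The suite names and runner functions below are hypothetical stand-ins, not a real CI server API; the point is simply that suites run in order and a failure stops the build from progressing.

```python
# Sketch of commit-stage gating: each suite runs in order, and a failure
# stops the build from becoming an acceptance-testing candidate.
# Suite names and runners are illustrative placeholders.

def run_static_analysis() -> bool:
    return True  # e.g. shell out to a linter and check its exit code

def run_unit_tests() -> bool:
    return True  # e.g. invoke the unit test runner

def run_integration_tests() -> bool:
    return True  # e.g. invoke component/integration suites

COMMIT_STAGE = [
    ("static analysis", run_static_analysis),
    ("unit tests", run_unit_tests),
    ("integration tests", run_integration_tests),
]

def commit_stage_passes() -> bool:
    for name, suite in COMMIT_STAGE:
        if not suite():
            # immediate feedback to the development team
            print(f"Commit stage failed at: {name}")
            return False
    return True

if commit_stage_passes():
    print("Build is a candidate for acceptance testing")
```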

Acceptance testing is the domain of business analysts and business representatives assigned to the project team. That should not mean, however, that development staff have no involvement in acceptance testing. Successful testing in a CD environment gives developers more ownership in driving quality by allowing them to run automated acceptance tests in their own development environments. Common obstacles to enabling this, such as insufficient licenses and/or manual deployment and setup processes, need to be removed. Acceptance testing is a critical step in the deployment pipeline: a release is deemed acceptable for deployment only if it passes the acceptance test stage, and the entire team should focus on fixing acceptance testing issues for a given release. A CD environment requires acceptance testing to be automated as much as possible: a fully automated acceptance testing suite enables the tests to be run for any build as and when needed, which speeds up the development process and also yields a powerful suite of regression tests that can be run over and over again. Some tools even offer capabilities to encode acceptance criteria and programmatically drive the creation of acceptance tests from those criteria: testing, and hence ultimately the delivered software, need never be out of sync with evolving acceptance criteria and requirements.
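As a minimal illustration of encoding an acceptance criterion as an automated, repeatable test, consider the sketch below. The application function `place_order` and the criterion itself are invented for the example; real suites would target the actual system under test.

```python
# Hypothetical application code under test.
def place_order(item: str, qty: int) -> dict:
    return {"item": item, "qty": qty, "status": "confirmed"}

def test_order_is_confirmed_immediately():
    """Acceptance criterion (illustrative): a valid order is
    confirmed at the moment it is placed."""
    order = place_order("widget", qty=2)
    assert order["status"] == "confirmed"

# Run as part of the automated acceptance suite; rerunning the same
# suite against every build doubles as a regression suite.
test_order_is_confirmed_immediately()
print("acceptance suite passed")
```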

If the system under development is a high-performance system, some capacity and performance testing may become part of acceptance testing as well. Usually, however, capacity testing and testing for other ‘non-functional requirements’ form a separate stage in a CD deployment pipeline. Although a CD environment requires such tests to be as automated as possible, e.g. through the use of Recorded Interaction Templates and similar devices, the success criteria for such tests are somewhat subjective: a release that technically fails automated capacity testing may still be greenlighted based on human judgment. Once the release clears the non-functional testing stage gate, it may then be put through more traditional manual testing. This is where human testers excel, applying their expertise in UI testing, exploratory testing, and in creating unique testing conditions that automated testing may not have covered. Manual testing is thus one of the last stages in the testing pipeline in a CD environment.
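A rough sketch of such a stage gate: an automated capacity check against a hard threshold, with a human override for the judgment call described above. The latency budget and the measured value are invented for illustration; a real check would run an actual load test.

```python
# Illustrative capacity stage gate. The budget and the measurement
# below are made-up numbers, not real benchmarks.
LATENCY_BUDGET_MS = 200.0

def measure_p95_latency_ms() -> float:
    # Stand-in for a real load-test run against the release candidate.
    return 240.0

def capacity_stage(override: bool = False) -> bool:
    latency = measure_p95_latency_ms()
    passed = latency <= LATENCY_BUDGET_MS
    if not passed and override:
        # The automated check failed, but a human reviewer may still
        # greenlight the release based on judgment.
        print(f"p95 latency {latency}ms over budget; greenlighted by reviewer")
        return True
    return passed
```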

If testing is indeed to become ‘continuous’ in nature, several critical factors need to be in place. Perhaps the most critical is test automation, which some practitioners dismiss as too difficult or as non-value-added. Whatever the reservations, testing in a CD environment cannot possibly be efficient and effective without automation, especially given the sheer volume and frequency of testing. Automation is just one of several test design and execution strategies needed to make testing efficient, and thus successful, in a CD environment. For example, CD practitioners recommend a commit testing stage lasting no more than 10 minutes, a hurdle that can be met only by adopting such strategies. Automation also applies to the provisioning of, and deployment to, environments: ‘push button’ deployments and provisioning of test environments are critical if developers are to run quick smoke and acceptance tests on their work. Similarly, test data needs to be managed effectively. Test design and test isolation should keep data requirements fit for purpose and parsimonious: wholesale replication of production data is neither feasible nor recommended in a CD environment. Data management, like environment management, needs to be fully automated, with configurable design and push-button techniques.
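One common way to keep test data ‘fit for purpose and parsimonious’ is to generate only the records each test needs, rather than copying production data. The sketch below assumes a hypothetical customer record shape; the field names are invented for the example.

```python
import random

# Parsimonious test data: a small factory that builds just the record a
# test needs, with sensible defaults and per-test overrides.
# The 'customer' fields here are hypothetical.

def make_customer(overrides=None) -> dict:
    base = {"id": random.randint(1, 10_000), "country": "US", "active": True}
    base.update(overrides or {})
    return base

# A test that only cares about inactive customers asks for exactly that,
# instead of trawling a replicated production database for one.
inactive = make_customer({"active": False})
assert inactive["active"] is False
```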

Testing has the opportunity to move from being a reactive, after-the-fact inspection function to being a proactive, quality-focused initiative. Achieving this requires a number of tough decisions about processes, division of responsibilities and the organization of the testing effort. In a traditional environment, making these tough decisions is often optional. Moving to a CD environment, however, mandates them, which should be reason enough for organizations to start examining today how they can evolve and improve their testing efforts toward that ultimate model.

Transparency in Pharma and Drug Industries

With the passage of the Affordable Care Act and the US Supreme Court’s recent ruling in King v. Burwell upholding federal tax subsidies, there is now a strengthening movement toward more transparency and accountability in the American health care industry. Pharma and drug companies, an important part of the health care value chain, suffer from a number of transparency issues. While the ACA made some progress by mandating that pharma companies make public drug-related payments made to doctors, much remains to be done on other fronts, such as publication of clinical trials data and full disclosure of drug side effects. A few, such as Dr. Ben Goldacre, the founder of AllTrials, have launched public movements to campaign for more data transparency in the pharma and drug industry. More data transparency, however, can be a double-edged sword: the benefits come with real practical considerations.


Providing transparency around clinical data can be valuable. When clinical data on a class of antidepressants called selective serotonin-reuptake inhibitors (SSRIs) was analyzed, an increased risk of suicide among adolescents using SSRIs was discovered. Similarly, when the raw clinical data on Tamiflu was analyzed, Tamiflu’s efficacy in fighting viral infections and reducing hospital admission rates was called into question. Like any large-scale statistical analysis, the clinical data analysis upon which drug companies, regulators and government agencies depend for risk evaluation and approvals can contain anything from egregious mistakes to subtle biases. These can stem from a number of factors, including selection bias in the controlled trials or mistakes in interpreting statistical significance. The latter, in which the statistical model either lacks statistical power (increasing the likelihood of false negatives) and/or sets the significance threshold at too few standard deviations (increasing the likelihood of false positives), is fairly common in the scientific research community. Couple this with other exacerbating factors, such as research scientists lacking appropriate skills in advanced statistical analysis, a prevalent tendency to publish positive findings rather than negative ones, and ineffective peer review of clinical research findings, and one has a perfect storm in which such mistakes can be fairly common. Greater transparency of clinical data allows any external third party to review and validate research findings, bringing to light issues and insights that may have escaped the research team’s due diligence or the government agency’s regulatory scrutiny. Thousands of clinical trials have never been registered with oversight agencies, and results from around half of all clinical trials remain unpublished; “making that data available to statisticians would almost certainly lead to new discoveries and clinically useful findings” (as The Economist put it).
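To make the point about statistical power concrete, here is a small illustration, using only the Python standard library, of how an underpowered trial raises the false-negative rate: the same standardized effect is far more likely to be detected with a larger sample. The effect size and sample sizes are invented for the example.

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(effect: float, n: int) -> float:
    """Approximate power of a two-sided z-test at alpha = 0.05 for a
    standardized effect size 'effect' with n observations."""
    z_alpha = 1.96                # critical value for alpha = 0.05
    shift = effect * sqrt(n)      # non-centrality under the alternative
    return 1.0 - norm_cdf(z_alpha - shift) + norm_cdf(-z_alpha - shift)

# A small trial has low power: most true effects go undetected
# (false negatives), while a larger trial detects them reliably.
print(round(power(0.2, 50), 2))   # small trial: low power
print(round(power(0.2, 500), 2))  # larger trial: high power
```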

The noble intention behind the push for greater transparency, however, may not translate into the desired effects and, worse, may have unintended consequences. One of the biggest fears is inappropriate analysis and interpretation of the clinical datasets. In a litigious environment, with pharma and drug companies already battling an image of greedy corporations indifferent to whether people can afford exorbitant drug prices, this can spell disaster. It may also serve as a strong disincentive to innovation in the long term, with the opportunity cost of not experimenting with novel treatment techniques ultimately borne by consumers in the form of shortened life spans and/or degraded quality of life. Even where there is little room for misinterpretation, practical challenges may prevent third parties from exactly reproducing the results of clinical trials: it is well established that experimenters employ tacit knowledge and improvisations that are not always captured in the experimental setup and process steps. Furthermore, many research teams use proprietary models to analyze and interpret raw clinical data, models they may be averse to sharing with the public but which are nonetheless critical to arriving at the published conclusions. Finally, the cost of full data disclosure is rarely discussed, but there is a non-trivial cost to retaining and disclosing such data to the public at large.

So mandating full disclosure of raw clinical data is just one item on an entire menu that needs to be put in place if the objective is indeed to improve the safety, efficacy and efficiency of the pharma and drug industry. The field of biostatistics can go a long way in educating researchers on correctly employing and interpreting clinical datasets. Independent data monitoring committees could be put in place to oversee the construction and execution of clinical trials, ensuring appropriate application of analytic techniques and providing guidance as the experiments are conducted. Big data and modern statistical techniques could be developed further to give researchers the means to analyze data more effectively. If, in the process of doing all this, we can prevent even minor mistakes or incorrect interpretations of drug data, we will have made medicine that much safer for mankind.