Thursday, July 11, 2013

Why Do Software Defects Exist? (Part 2)

(Originally published at www.servicevirtualization.com.)
In Part 1, I proposed that application release decisions are not actually time-based but are instead risk-based.  To summarize, when the Lines of Business demand a specific time to release (measured from project inception), the Project Lead considers the risk to the business of releasing the application at that time.  This is illustrated to the right.

Application Development Constraints

Let's take a quick look at the "risks to the risk."  These are also known as constraints of the application development process.  I'll start by describing them as they are defined in the book Reality is Overrated (link goes to Amazon).
  • Incomplete Development refers to the fact that software a developer needs in order to validate their own code is unavailable, requiring that developer to stub downstream systems.  The net result is that early test coverage is very sparse, which puts the onus on the Quality Assurance team to find every defect, something that rarely, if ever, happens.
  • Infrastructure Unavailability refers to the fact that the hardware needed to run the software a developer requires to validate their own code is unavailable, with the same result.  An additional side effect is that sometimes the unavailable infrastructure is required to run the developer's own code, meaning they are "dead in the water."
  • Third Party Access Fees refers to the fact that integration with other internal applications (in situations where charge back policies are in effect) or with external applications incurs fees, assessed for test accounts or, worse, per test transaction.  What I've personally seen happen is that a self-imposed availability constraint is put into effect to avoid "funding bleed" and the inevitable question, "why didn't you hire more consultants?"
  • Finally, Test Data and Scenario Management refers to the long turnaround times when production quality test data is requested.  The slow responses are due to two things:  the DBAs typically have other, higher priority activities to attend to, and the effort to scrub production data to remain in conformance with regulatory requirements (PII, PHI, etc.) is no small feat.
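
To make the stubbing mentioned under Incomplete Development concrete, here is a minimal sketch in Python.  It is an illustration only and is not taken from the book or from any product:  the names PaymentGateway, StubPaymentGateway, and place_order are all hypothetical, and the downstream "payment gateway" is simply an assumed example of a system the developer does not control.

```python
from dataclasses import dataclass
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical interface to a downstream system that is not yet available."""

    def charge(self, account_id: str, cents: int) -> bool: ...


@dataclass
class StubPaymentGateway:
    """Hand-written stub standing in for the unavailable downstream system."""

    approve: bool = True  # canned answer; the real system may behave differently

    def charge(self, account_id: str, cents: int) -> bool:
        # No real call is made; the stub only returns its canned answer.
        return self.approve


def place_order(gateway: PaymentGateway, account_id: str, cents: int) -> str:
    """The developer's own code, exercised here without the real gateway."""
    if cents <= 0:
        return "rejected: invalid amount"
    return "confirmed" if gateway.charge(account_id, cents) else "declined"


if __name__ == "__main__":
    print(place_order(StubPaymentGateway(), "acct-1", 1999))
    print(place_order(StubPaymentGateway(approve=False), "acct-1", 1999))
```

Note that the stub can only answer with whatever the developer thought to hard-code, which is one way the sparse early coverage described above comes about.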
Availability is "Risk to the Risk"

I've defined these four constraints in these specific ways with the intention of highlighting that every one of them, at their core, is a problem of availability:  unavailable software components or applications; unavailable infrastructure; and unavailable test data.  It is this availability problem that is the "risk to the risk" for the following reason:  every time an availability problem manifests itself, a delay in the SDLC is introduced.

What does this mean?  This means that the original assessment of being able to, in our example, deliver the application at 6 months with 75% correctness is no longer valid.  Instead, the inability of the development team to complete their normal activities prevents them from ensuring correctness, so the point in time where 75% correctness is achieved may be 7, 8, 9 months or more from project inception.
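
The arithmetic can be sketched in a few lines of Python.  This is a deliberately simplistic illustration:  it assumes every availability delay pushes the target date back one-for-one, and the individual delay figures are made up for the example rather than measured.

```python
from typing import Sequence


def projected_release_month(planned_month: int, delays_in_months: Sequence[int]) -> int:
    """Month (from project inception) at which the originally targeted
    correctness is reached, assuming each delay shifts it one-for-one."""
    if any(delay < 0 for delay in delays_in_months):
        raise ValueError("delays must be non-negative")
    return planned_month + sum(delays_in_months)


if __name__ == "__main__":
    planned = 6  # months to reach 75% correctness, from the example in Part 1
    # Hypothetical delays: downstream software, infrastructure, test data.
    delays = [1, 1, 1]
    print(projected_release_month(planned, delays))  # prints 9
```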

The Net Impact

This puts the Project Lead in a bad position because they already had their chance to negotiate timelines at the outset.  Now either the delivery date gets pushed in the name of risk, making the Project Lead and their management (who committed the date to the Lines of Business) look unreliable, or the project is released "on time" with quality that's further reduced.

Regardless of which road is taken, quality is not improving and frequently declines in applications released to production.  This adds to the risk that the business will suffer a production outage and ultimately has the possibility of materially impacting revenue; causing a decline in brand equity; or even resulting in shareholder lawsuits in severe enough instances.

Obviously, alleviating the availability constraint in any or all of its four forms has a snowball effect on the quality of the result.  It is the removal of this constraint that the discipline of Service Virtualization effects.  You'll find more excellent material on the industry-leading Service Virtualization solution provided by CA Technologies at www.servicevirtualization.com as well as in the few discussion groups on LinkedIn.

Monday, July 1, 2013

Why Do Software Defects Exist? (Part 1)

(Originally published at www.servicevirtualization.com.)

After my recent webinar (entitled Agile is Dead; the replay is here, and registration is required but free) I was having a follow-up discussion with someone about it when the conversation turned to the nuances between what Agile actually promised and what people perceived it was supposed to deliver.  From my perspective, the simplest explanation is that Agile promised to help ensure that business requirements were being met, while people thought it meant that applications would be produced with far fewer defects.  In the webinar, I described how Agile, if anything, increased the total number of defects due to its attempt to be more adjustable to the needs of the business mid-implementation.

The question was then asked:  are software defects inevitable?  If not then why do they exist?  We're not talking about an insignificant problem.  As I've often quoted, NIST produced a study in 2002 that illustrated a cost multiple of 30 to fix a defect discovered in production and another study that same year showing the net impact on the US Economy of production defects to be $60 billion.

To answer those first two questions, let's look at a few things. 

Businesses Exist to Generate Revenue

This seems obvious, but it's worth stating the obvious here since we're going to be chaining a few items like this together into a cohesive whole.  "But what about government agencies whose sole purpose is to provide free services?" you ask.  Let's redefine (slightly) the phrase "generate revenue":  by this I mean they are trying to increase the amount of cash flowing into the entity.  I hesitate to say "cash flow positive" because that has a specific accounting definition that isn't met here.  For our purposes, the ability to convince the Federal, State, and/or Local government that more money would allow them to produce better or more services is considered "generating revenue." 

Technology Isn't a Luxury

This also seems obvious, so let me explain why I'm pointing this out using a question:  could an accounting firm offer a legitimate service to potential customers using paper-based ledger books only?  The answer is yes - they would be fulfilling the definition of "accountant" - but they would probably have no customers.

The reason they would have no customers:  manual data entry and calculations are slow, error prone, and prohibit value added services like quick financial analysis, etc.  Even I, the most accounting-challenged individual in the entire world, stopped using my checkbook register (the personal version of a ledger) years ago in favor of an Excel sheet that I created because the latter lets me see, at a glance, where all of my money is going; perform cash flow analysis; and defend my argument that groceries are now so expensive that they are the modern day equivalent of highway robbery.

The net result of this is that technology is at the very least indirectly responsible for the influx of cash to a business, profit, non-profit, or government entity alike.

When Would You Release Your Software?

The next question I asked my conversation partner was, "If you were the only Amazon, when would you release the next version of the website?"  The answer was quick:  they said they would release it when the following two conditions are met:
  1. When the user's new needs are met
  2. When no defects exist 
The latter point needs some clarification.  I'm not describing the situation where no defects are found at UAT.  Instead, I'm talking about a theoretical point in time when the code could be mathematically proven to be 100% correct.

Obviously, that last point is no easy task (if it's achievable at all) and would take significant amounts of time to achieve.  And, unfortunately, you aren't the only "Amazon," i.e. you have competition. Therefore, when software is being developed a decision has to be made:
  • If I can't achieve "nerd-vana" by waiting until the code is 100% perfect, what is the highest probability that a critical function will work incorrectly that I am willing to accept, i.e. what's my risk threshold that a production defect will cause material harm to the business?
To illustrate that decision, "nerd-vana" for me may be 1 year for a new release, but I'm willing to release it after 6 months with 75% correctness.

In Part 2, we'll continue by examining how all of this builds into a tidal wave of process problems that result in software defects being released to production in spite of all of the nasty side effects their presence causes.