Thursday, December 10, 2015

Stagnation Leads to Death

"Pró-gress must pro-gréss" - Unknown

Nowhere is the ability of velocity and acceleration to contribute to the success of a business more evident than in the world of technology.  It is imperative for companies to evolve if they intend to stay in business.  In fact, Gartner said in 2014:

"In 2014, CEOs must focus on leading their organizations to think like and become more like 'tech' companies, because within a few years, digital business capabilities will dominate every industry. Urgent action is needed because first-mover advantage is common in digital business, and fast followers must be very fast." (CEO Resolutions for 2014 - Time to Act on Digital Business, published by Gartner in March 2014.)

Contrast this with the Fortune 500.  Established in 1955, the list has chronicled the evolution of the 500 largest American companies.  But in spite of the implied size of the companies on that list, fewer than 20% of the companies on the original list are still in existence.  And although some of that attrition must be attributed to events such as mergers and acquisitions, it is reasonable to assume that the majority of those companies are no longer around due to their own inability to adapt to the drumbeat of change in the marketplace.

Is this the inevitable fate of most companies?  I would argue that it isn't.  What is needed to avoid it is the ability to recognize the need to evolve.  As a great example, the vast majority of industry pundits were ready to shepherd the decline, and ultimately the demise, of the mainframe after the Y2K efforts had concluded.  Not only has the mainframe survived, though; its use has increased since that time.  (See the 2014 Mainframe Study published by BMC in January 2015.)

If the oldest computing platform can continue to evolve, this raises the question:  why haven't all legacy, general-purpose computing applications done the same?  A great example is workload automation.  Originally designed as a simple job scheduling system, its purpose was to coordinate the execution of tasks, or jobs, that were developed by other IT operations or application development teams.  The problem that has developed, however, is that the number of systems with which workload processes may interact has increased exponentially over the years.  And so organizations are faced with a Sophie's choice:  either continue to retain staff with specialized knowledge of systems that are now considered ancient, or undertake a costly modernization effort to replace those systems with more modern equivalents that, assuming such replacements exist, place a lower staffing burden on the organization.

Even where such modernization potential exists, companies are frequently loath to undertake it due to the heightened operational risk incurred during the initiative's overt phases and immediately afterward, until an absence of production errors has proven the modernization successful.  While data on this is scarce at best, we can see such reluctance exhibited within the realm of IT's newest darling, DevOps.  In 2013, IDG published a survey in which 45% of respondents said that application release automation was a key enabler for DevOps, yet only 11% had implemented such automation.  Considering that DevOps was born in 2008, the rate of adoption has been pitiful, to say the least.

All is not lost, however.  Even though some workload automation systems are still antiquated in terms of capabilities, they are antiquated only in the sense that they have peers that have evolved over time.  Integration with existing infrastructure, monitoring capabilities, and operationally focused capabilities (like SAP system copy or processing of big data warehouses) already exist in solutions produced by companies exhibiting thought leadership.

In summary, there is no need to settle for less.  While the tendency exists for companies to mark time with their existing operational systems footprint, there is typically no requirement to do so.  If your systems are not providing the capabilities you need in a way that allows you to optimize your run rate, then perhaps you should be looking elsewhere for systems that do meet those requirements.

Tuesday, November 3, 2015

Provisioning For What Purpose? (Part 2)

In part 1, we discussed how provisioning is part of the overall process of releasing an application, and how application release is a specific case of process automation.  In this part, we are going to look at the general capabilities of an automation platform with the overall view of applying those capabilities to application release, service orchestration / provisioning, and workload / job scheduling.

The earliest use of automation can be traced back to IT Operations, where Run Books were used heavily to "begin, stop, supervise and debug the system(s)" (from Wikipedia) in the Network Operations Center (NOC).  Run Books were initially index cards containing a set of instructions to accomplish a certain task; these were later listed on 8.5" x 11" paper, and ultimately moved to huge three-ring binders due to the complexity of the underlying systems with which the operator interacted.

At some point, companies such as Opsware, RealOps, and Opalis recognized that well defined, mature processes were simply a set of repeatable steps with simple branch logic built-in.  They built products that were then sold to HP (in 2007), BMC (in 2007), and Microsoft (in 2009), respectively, to allow Run Book Scripts to be defined in a computerized form and then initiated via an operator console and, later, via integration with ITSM solutions.

From a capabilities standpoint, all automation (Run Book Automation [RBA] now often referred to as Process Automation, Workload Automation, or Release Automation) requires similar capabilities.  These are listed below:

Support for a broad number of platforms.  Distributed systems are widely used, of course, but the mainframe also didn't die as many "industry experts" predicted it would in the last decade.  Along with the many Unix variants and even lesser-known platforms like IBM's iSeries, all of these operating systems have enough market share that they cannot be ignored and should be supported.

Built-in integration with the surrounding ecosystem.  While being able to enter commands manually via some script window is no worse than writing BASH or WSH scripts, having built-in support for commonly used IT infrastructure (e.g. monitoring systems, typical system metrics such as CPU usage or free disk space available, web / application / database servers, etc.) allows the workflow designer to simply enter a few values in a data form and the underlying system takes care of translating the action to the underlying commands.

Parse and react.  Taking the output of executed commands or their result codes and either extracting values to be used in subsequent steps or branching based on those values is critical. 
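To make this concrete, below is a minimal parse-and-react sketch in Python; the job script path and its "rows loaded" output format are hypothetical stand-ins for whatever your jobs actually emit:

    import re
    import subprocess

    # Run a (hypothetical) batch job and capture its output and return code.
    result = subprocess.run(["/opt/jobs/nightly_load.sh"],
                            capture_output=True, text=True)

    if result.returncode != 0:
        # React to failure: a real engine would route to a remediation branch.
        raise SystemExit(f"job failed with rc={result.returncode}: {result.stderr}")

    # Extract a value from the output, e.g. a line reading "rows loaded: 12345".
    match = re.search(r"rows loaded:\s*(\d+)", result.stdout)
    rows = int(match.group(1)) if match else 0

    # Branch on the extracted value, feeding it to the appropriate next step.
    if rows == 0:
        print("branch: alert operations, no data was loaded")
    else:
        print(f"branch: continue downstream processing of {rows} rows")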

Complex scheduling.  Email systems like Outlook standardized the use of calendars for scheduling meetings or tasks to be completed.  The use of scheduling in an automation platform, however, needs to be much more capable since automated IT processes are often run according to very complex scheduling rules.

Integration and scheduling capabilities cannot be emphasized enough.  To illustrate the former, the need to query free disk space (for example) exists no matter what the ecosystem is, so using high-level, abstract commands frees the author from having to explicitly add support for new platforms as they are adopted by their IT department.  Instead, they can simply drag and drop a step called "query disk space" into their workflow and not worry about whether the workflow will be running on Windows, Unix, OS/400, etc.
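As a small illustration of what such an abstract step does under the covers, Python's standard library can answer the "query disk space" question identically on Windows, Unix, and other platforms with a single call:

    import shutil

    # shutil.disk_usage returns (total, used, free) in bytes on any supported
    # OS, so one abstract "query disk space" step can hide per-platform commands.
    def free_disk_gb(path="/"):
        return shutil.disk_usage(path).free / 2**30

    print(f"free space: {free_disk_gb():.1f} GiB")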

Similarly, the ability to support very complex scheduling rules is also a "must-have."  For example, end of month financial reporting may need to run on the last business day of each month (which varies in length) unless it is a holiday, in which case it would run on the next business day after that.  Rules like this cannot easily be expressed using "Monday, Tuesday, ..." or "Every n weeks" types of criteria that end users are typically familiar with.
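Rules like this are nonetheless straightforward to encode, which is essentially what a capable scheduler does internally.  Here is a minimal sketch in Python, with a hard-coded holiday set standing in for a real holiday calendar:

    from datetime import date, timedelta

    # Hypothetical holiday set; a real scheduler would consult a holiday calendar.
    HOLIDAYS = {date(2015, 12, 25), date(2016, 1, 1)}

    def last_weekday(year, month):
        """Return the last Monday-to-Friday date of the given month."""
        d = date(year, month, 28)                # day 28 exists in every month
        while (d + timedelta(days=1)).month == month:
            d += timedelta(days=1)               # walk to the month's last day
        while d.weekday() >= 5:                  # 5 = Saturday, 6 = Sunday
            d -= timedelta(days=1)
        return d

    def month_end_run_date(year, month):
        """Last business day of the month; if it is a holiday, run on the
        next business day after it, per the rule described above."""
        d = last_weekday(year, month)
        while d in HOLIDAYS or d.weekday() >= 5:
            d += timedelta(days=1)
        return d

    print(month_end_run_date(2015, 12))          # 2015-12-31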

Other core capabilities that are not automation-specific are multi-tenancy, Role Based Access Control (RBAC), High Availability (HA), and auditing features.  These will not be discussed here because they appear in several other types of IT Operations systems with which you are undoubtedly familiar.

None of these capabilities belongs to one type of automation system or another.  Instead, they belong in a core platform that can be utilized by several types of solutions to meet various business needs.  Whether it is something general (e.g. job scheduling or application release) or something specific (e.g. processing large Hadoop datasets or copying your SAP system from one instance to another), having a feature-rich, automation-centric foundation ensures that all of your operations systems will not only meet your current needs but will also grow as your needs do.

Tuesday, September 22, 2015

Provisioning For What Purpose? (Part 1)

In the early part of the last decade, I ran a global education program at a mid-sized company called Softwatch.  In this role, my responsibilities required that I provide education to our partners around the world.  One such occasion had me traveling to Paris to teach a class with 20 students from various parts of Europe, and so my partner Matt Grob and I flew out on Saturday to spend the weekend creating a complete education lab from scratch.

Granted, it's not incredibly taxing when you're building a lab in Paris of all places.  The process, which was fairly straightforward, took us both days, and then we had the nights to explore the Champs-Élysées, La Concorde, etc., not to mention eat and drink amazing culinary delights.

However, I digress.  The point I am trying to make is that it took us an entire weekend to build 20 computers in an identical manner.  It cost the company several hundred dollars to provide accommodations (hotel and meals) that would have been unnecessary if it were 2012 instead of 2002.

But it must be said that provisioning for provisioning's sake is never done - there is always a purpose.  This task is performed to provide an environment that can run applications.  Maybe the applications are COTS (Commercial, Off The Shelf) or internally developed, but applications are the raison d'être.  Even in my story, we were building the machines so that my company's software could be installed and run by the students during the class.

Provisioning, then, should be considered part of application release.

Many companies have missed the boat:  BMC's BladeLogic and CA's Automation Suite for Clouds are provisioning with no purpose.  They require other solutions to consume the produced machine, real or virtual, so that it may be incorporated into a bigger picture.  But customers no longer want building blocks - they want turn-key solutions - so these formerly bedrock solutions are now passé.  This is evidenced by the fact that both of these solutions are now buried deep within the websites of their respective owners.  I suspect similarly deep locations will be found on the websites of other vendors in this space.

Looking at the bigger picture, however, we see that provisioning is simply a sequence of steps to be followed:  
  1. Install the OS
  2. Install the application server, DBMS, or ESB
  3. Configure the application server, DBMS, or ESB
  4. Connect it to the rest of the infrastructure
  5. Deploy the application components on the box.  
The above steps define a process to be automated.
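As a minimal sketch of that idea, the five steps can be expressed as an ordered, repeatable sequence; the step functions below are hypothetical placeholders for real provisioning tooling (OS imaging, package managers, configuration management, and so on):

    def install_os(host):             print(f"[{host}] install the OS")
    def install_middleware(host):     print(f"[{host}] install app server / DBMS / ESB")
    def configure_middleware(host):   print(f"[{host}] configure app server / DBMS / ESB")
    def connect_infrastructure(host): print(f"[{host}] register with DNS, monitoring, LB")
    def deploy_application(host):     print(f"[{host}] deploy application components")

    # The process is just an ordered list of steps, i.e. something automatable.
    PROVISIONING_PROCESS = [install_os, install_middleware, configure_middleware,
                            connect_infrastructure, deploy_application]

    for step in PROVISIONING_PROCESS:
        step("lab-host-01")   # hypothetical host name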

Automation is automation, regardless of how it's being used.  And all automation, regardless of its design, requires core capabilities that have withstood the test of time in terms of stability and scalability.

In part 2 we will look at some of these core capabilities that should be required features of any automation platform.

Monday, June 22, 2015

Application Defects are a Real Problem

This past weekend was my high school's 30-year reunion in my hometown of Beaufort, SC, which is home to Parris Island and the awesome Marine Corps Air Station across town.  My companion purchased the tickets for us both using reward miles on her airline of choice via that airline's website.  She is a frequent flier on that airline for professional reasons, so her profile contained her rewards program number and her TSA PreCheck number.  I, on the other hand, don't fly on this particular airline much, so I don't belong to their rewards program.  In fact, I don't fly often at all, since my job typically keeps me in the greater NYC metro area where I can drive everywhere I need to go, so I never applied for a TSA PreCheck number either.

Imagine my dismay, then, when reviewing our boarding passes revealed that mine said "TSA Pre" on it.  "That's odd," we thought, sure that some other system would flag it as a mistake.  That didn't happen, though, and I was allowed to waltz through the high-speed line without taking my shoes or belt off and to walk through the much more lenient metal detector rather than the full-body scanner.

Still, we chalked it up to a fluke.  After all, my companion had printed the boarding passes at home.  We wouldn't have that luxury for the return trip and would be required to use the kiosk at the ticketing counter, and we were sure this wouldn't happen again.  It did, however, and once again I was able to make it through security with the greatest of ease.

I can't say with 100% certainty what the cause is, but it is most likely an application defect that resides somewhere in the backend systems of either the airline or Sabre (the largest Global Distribution System provider for air bookings in North America, according to Wikipedia).  Regardless, you can understand why this is a real problem:  any person with malicious intent could exploit it easily, with very harmful results.

We've all read various research reports on the impact of application defects, but here are a few numbers for you:

  • 29% of an application developer's time is spent in some part of the problem resolution process, whether that is root cause determination or remediation of the actual defect.[1]

  • 6.7 days elapse from the time a defect is first observed until it is fixed.[1]

  • The annual impact of application defects on the U.S. economy is $60 billion.[2]

  • It costs 30 times as much to fix a defect if it is detected in production.[2]

[1] Forrester Research
[2] NIST

These numbers, while interesting, really hit home when you consider that organizations are already under intense pressure to release new features and functionality more quickly than ever before.  This is compounded by the fact that many organizations have still not fully made the transition from Waterfall to an Agile SDLC methodology, so the process of releasing these new features is still quite cumbersome.  When a lot of time is spent remediating application defects, less time is available for developing code or writing more unit tests.

What can be done?  One of the obvious solutions is to switch to an Agile SDLC methodology.  Other viable solutions that address significant parts of the problem are:

Virtualize the backend systems.  Using a solution like CA Service Virtualization (or a similar solution from IBM or HP), you can isolate the code under test from live downstream systems so that defects uncovered during testing are found more quickly.  Furthermore, the virtual services developed by such solutions eliminate the need to have access to downstream systems all of the time, allowing developers to work in parallel and yielding savings of 30% or more of the time from inception to release.

Accelerate testing.  This means more than just developing test scripts that can be automated.  Automatic generation of test scripts - "automating the automation" - as well as synthetic generation of test data can eliminate the multi-week wait between regression suite runs that is otherwise required so that the DBAs can pull a new copy of the production data and mask it to meet regulatory standards.  CA Continuous Application Insight, CA Data Finder, BMC AppSight, and IBM Optim provide various capabilities in these areas.

Get insight into your application.  Applications are complex, and long gone are the days when people understood what happens after you issue a request to a downstream component or another application with which you integrate.  Being able to see the various components being invoked along with the data used in each invocation is invaluable to a developer, especially if the tester can generate a document containing this information when the defect is first observed.

Contrast that with the effort to reset the environment, re-execute the test case, and get the defect to happen for the exact same reason, all while documenting every step taken.  Forrester once reported that 25% of all defects are rejected as irreproducible, and that nearly half of their respondents said they spend over an hour, cumulatively, per defect documenting what happened.  CA Continuous Application Insight, CA APM, and BMC AppSight (and others, undoubtedly) provide various capabilities in this area.

The point is that application defects will always exist.  But there really is no legitimate reason why something as simple as the defect that I personally encountered has to exist.  And when the stakes are as high as they are with a plane full of passengers, it is the responsibility of every organization to ensure that the applications it produces are of the highest quality.

Thursday, March 26, 2015

Is Docker the End of Traditional Application Release Management?

Ever since its release in 2013, Docker has quickly catapulted into the enviable position of being the darling of every (operations manager's) eye.  If you've been vacationing on Mars since then, or simply haven't been staying on top of news releases such as the one heralding Microsoft's intention to support Docker on Windows (a cause célèbre for sure, since Docker was originally a Linux-specific platform), here is what you've missed.

Docker is a partitioning capability within the address space of an operating environment.  By allowing the partition (known as a container) to use the host OS directly, even though that OS resides outside of the partition, the startup time is substantially reduced, as are the resource requirements for managing the container.  (I hope my z/OS readers find this concept somewhat familiar.)

[Image: Typical Virtual Machine layout, from www.docker.com]
Financial people love this because the cost of acquiring licenses for the operating environment can be substantially reduced since, theoretically, every component that is not part of the application itself can reside outside of the container.  This means only one Windows license needs to be procured versus one per VM, which is traditionally the modus operandi.

The concept is simple, but how does it work?

Essentially, a special file (called a dockerfile) contains one or more instructions on how a container is to be created.  The dockerfile is used as part of a (presumably) automated process to generate the container on the file system, which can contain as little as a single application and its associated binaries.  This container (a subdirectory in the file system) is transferred to the target environment as any set of files would be and is started there using the docker run time, which can be invoked via command line interface or an API (typically REST based but there are other implementations).  
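As a rough sketch of that build-and-run flow, the standard docker command line can be driven from any automation script; the image tag, container name, and build directory below are hypothetical:

    import subprocess

    # Build an image from the dockerfile in context_dir, then start a container.
    def build_and_run(image_tag, context_dir):
        # "docker build" turns the dockerfile's instructions into an image.
        subprocess.run(["docker", "build", "-t", image_tag, context_dir], check=True)
        # "docker run -d" starts a detached container from that image; in practice
        # the image would be pushed to a registry and pulled on the target host.
        subprocess.run(["docker", "run", "-d", "--name", "myapp", image_tag],
                       check=True)

    build_and_run("myapp:1.0", ".")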

System Administrators love this because containers are easy to deploy (XCOPY anyone?) and maintain (REST interfaces can be easily integrated into any modern Infrastructure Management platform).

So that's it, isn't it?  End of story?  We can now throw away all of our Application Release Management platforms like IBM's UrbanCode Deploy, BMC's Varalogix, or CA Technologies' Release Automation, right?  

[Image: Typical Docker layout, from www.docker.com]
"Not so fast, pardner."  This concept falls down when people eye it as a substitute for true application release management.  More specifically, we can describe the latter using five of the six question words that everyone learned in high school English:

Who.  Not just anyone in an organization should be allowed to deploy an application to an environment.  In fact, even for those who are allowed to do so, there are frequently others who have to approve the deployment decision.

What.  For organizations that truly embrace the concept of business agility, deploying a complete application every time is unacceptable.  Artifacts deemed low risk (e.g. content updates) may be deployed immediately, while higher-risk artifacts will be queued up to be released after much testing and other validation.  Docker falls into this category but has limitations, which will be touched on below.

Where.  The target environment of a deployment is frequently different from every other possible target environment that an application will touch during its journey from development to production.  These differences are typically addressed by making changes to the configuration of the application after it has been deployed (see the sketch after this list).

When.  Release windows are not a new concept.  Even in non-production environments, a case for establishing a release window could be made since environments are often shared among multiple teams within the same function or even across functions (i.e. testing and development may use the same environment).

How.  Probably the most problematic to fully integrate into an organization's operational capabilities, the process of deploying an application is far more than simply understanding how to install and configure it.  Integration with ITSM applications - to ensure that change requests have been entered and are in the correct state, that approval gates have been successfully completed, and so on - has to be incorporated into the process of deployment so that the state of the operating environment is always well understood.
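Returning to the "Where" point above, here is a minimal sketch of post-deployment configuration: the same application package is deployed everywhere, and environment-specific values are merged in afterward (all names and values are hypothetical):

    # Values shipped with the application package.
    BASE_CONFIG = {"db_host": "localhost", "pool_size": 10, "log_level": "DEBUG"}

    # Per-environment differences applied after deployment.
    ENV_OVERRIDES = {
        "qa":   {"db_host": "qa-db.example.com"},
        "prod": {"db_host": "prod-db.example.com", "pool_size": 50, "log_level": "WARN"},
    }

    def config_for(environment):
        """Merge the base configuration with the target environment's overrides."""
        return {**BASE_CONFIG, **ENV_OVERRIDES.get(environment, {})}

    print(config_for("prod"))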

Of the five question words above, Docker addresses only one, and not in the most effective manner possible.  Consider the scenario of a well-known bank based in Europe.  It currently has in excess of a thousand production releases every month.  This was accomplished by recognizing that not all production releases are high risk.  In the example under What, it was noted that certain types of artifacts had minimal impact.  As a result, the release of those artifact types could be expedited, which helped ensure that the bank's customer-facing assets were always meeting the needs of its clientele.

If the bank were using Docker, however, the entire application would need to be rebuilt regardless of which types of artifacts were actually approved for production release.  The risk that unapproved binaries could be released into production is simply unacceptable for most companies.  And this covers only one of the five items above - Docker does nothing to address the other four.

To summarize, Docker is an exciting and (relatively) new technology that should be viewed as simply another mechanism within the greater whole of application release.  But it should not be viewed as a replacement for a well-defined methodology that includes not only the "what," but also the "who," "where," "when," and "how."