Software Consulting

Too Broken to Fix

How Appreciative Inquiry can be Applied to Resolving Software Problems

When you work in the software business, it is inevitable that you will ship a product, product version, or feature that should never have gone out the door. We, as software consumers, have watched as large, successful companies such as Microsoft, Apple, and Google scramble to patch a flaw mere days or weeks after introducing it in products released with much anticipation and fanfare. If companies with this kind of scale can slip up this way, then the chances are even greater for us little folk. Indeed, all software companies run this risk. There are rarer instances, though, when a product or feature is rushed, released prematurely to meet a date or deadline, or otherwise improperly prepared, and what ships is so defective that a quick follow-on patch alone does not resolve the issue completely. Such problems can be particularly hard to get ahead of when you ship enterprise software to be deployed on-premises at the customer site, but even software deployed as a service can be a challenge.

At one enterprise software company I worked with, I often used an Apollo 13 analogy when describing potential immediate-term solutions to very visible customer problems. Recall that Apollo 13 was damaged by an explosion en route to the moon, and through the amazing efforts of engineers on Earth working with the astronauts, all three men returned safely. After the explosion, the astronauts used the Lunar Module as a 'life raft' as it had its own supply of oxygen. The Lunar Module's lithium hydroxide canisters, which remove CO2 from the air, were not sized to support three men for the whole return trip, and the Command Module's canisters did not fit the Lunar Module's system. NASA controllers devised a way to adapt the Command Module canisters using materials on board, including spacesuit hoses, plastic bags, and tape; the astronauts then built the device.


At this company, the analogy was most appropriate for several product components that suffered both from poor architecture and premature release. Though management was hyper-focused on finding and fixing the specific defects in the systems involved, I advocated taking stock of the available materials and focusing on immediate relief of the most dire symptoms of the problem. Sure, I figured, such solutions may not be elegant; they may, in fact, be kludgy, but they do work. Imagine the result if, after the ground controllers had radioed the Apollo 13 crew with their procedure for constructing a new CO2 scrubber from parts available on the spacecraft, the astronauts had balked, saying the plastic bag and duct tape made the solution too inelegant to deploy.

What is Appreciative Inquiry?

Until a little while ago, I thought I had come up with a unique new analogy for this problem, but as with so many things, some smart people had already been hard at work analyzing and classifying the type of thinking that led to the Apollo 13 solution. I refer specifically to the field of Appreciative Inquiry.

A common human tendency, especially in the face of large problems or catastrophic failures, is to ask "What's wrong?" or "What needs to be fixed?" In fact, in the minutes immediately following the Apollo 13 explosion, pandemonium broke out on Earth as Houston ground crews attempted to find out what went wrong, what was broken, and even whose fault it was that it broke. It wasn't until they stopped trying to quantify and "fix" what was wrong, stepped back, and took an assessment of what still worked on the spacecraft, that solutions began to emerge. Although the term had not yet been coined, this is a classic example of Appreciative Inquiry, an alternative to traditional problem solving.

Today, the term is most frequently used to describe a technique or philosophy for organizational development and improvement, but at its heart, the concept can be applied anywhere problematic, complex systems are in place. The differences between traditional problem solving and Appreciative Inquiry can be summarized as follows:

Traditional Problem Solving | Appreciative Inquiry
Identification of problems (i.e., What is broken?) | Identify the best elements of "what is" (i.e., What is working?)
Analysis of root causes | Envision: what might be done
Analysis of possible solutions | Dialogue: what should be done
Action planning (treatment) | Innovate: decide what will be done and do it

Proponents of Appreciative Inquiry believe that focusing on what is working is a more motivational approach that leads to more positive outcomes. They argue that excessive focus on dysfunctions can actually cause the problems to get worse or fail to become better.

Traditional Problem Solving

Software developers tend to apply traditional problem solving to the most dysfunctional components of their software products. Over the long term, resolving defects in code does stabilize the product, but in the most severe cases, this stabilization happens over many months. In addition, hyper-focus on defects in the software does have side-effects. These include:

Despite the side-effects, traditional problem solving works fine in most cases, but what if there are components of the product that were poorly designed and implemented, but released anyway? Can some software components be too broken to fix? And, if so, what can be done to resolve customer issues? Could Appreciative Inquiry be used to resolve customer issues by re-composing systems that work into a new solution that avoids the problems of the old?

An Example

Recall the enterprise software company previously mentioned. This company had shipped a reporting system based on a data warehouse developed in-house. This component suffered from poor design, severe scaling problems, data inaccuracy, and data corruption and loss. As at most other companies with products that are known to be flawed, management's strategy was to invest as many resources as were feasible in identifying and resolving defects according to a traditional problem solving approach. While minor improvements were made at the cost of many months (actually years in this case), overall the system failed to become significantly better.

Meanwhile, the Customer Support department was left to handle the inevitable customer complaints. Since this was a reporting system, this often meant trying to come up with alternative methods for providing customers the information they needed. The Customer Support team decided to work around the flawed reporting system entirely as a temporary, band-aid fix. They were able to satisfy customers with data retrieved from the system prior to it being loaded into the data warehouse. Of course, executing reporting queries on an active OLTP database is non-optimal. There are performance impacts in making such choices, but in this case testing determined that the benefits outweighed the potential negative impact.
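To make that kind of workaround concrete, here is a minimal sketch in Python of a report answered straight from the transactional data rather than from the warehouse. Everything in it is invented for illustration: the `orders` table, its columns, the sample rows, and the function names are hypothetical, and an in-memory SQLite database stands in for the production OLTP system. It is not the query the Support team actually ran.

```python
import sqlite3


def build_demo_oltp() -> sqlite3.Connection:
    """Create a stand-in for the live transactional database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
        "amount_cents INTEGER, placed_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount_cents, placed_on) VALUES (?, ?, ?)",
        [
            ("acme", 1200, "2024-01-03"),
            ("acme", 800, "2024-01-15"),
            ("globex", 500, "2024-01-20"),
            ("acme", 300, "2024-02-02"),  # falls outside the January window
        ],
    )
    conn.commit()
    return conn


def revenue_by_customer(conn: sqlite3.Connection, start: str, end: str) -> dict:
    """Total order amount per customer for start <= placed_on < end.

    The date window keeps the query bounded, which limits the load a
    reporting query places on a database that is also serving transactions.
    ISO-formatted dates compare correctly as plain strings.
    """
    rows = conn.execute(
        "SELECT customer, SUM(amount_cents) FROM orders "
        "WHERE placed_on >= ? AND placed_on < ? "
        "GROUP BY customer ORDER BY customer",
        (start, end),
    ).fetchall()
    return {customer: total for customer, total in rows}


if __name__ == "__main__":
    oltp = build_demo_oltp()
    print(revenue_by_customer(oltp, "2024-01-01", "2024-02-01"))
```

The point of the sketch is the shape of the workaround, not the SQL itself: the report is built from a part of the system that already works, and the query is kept narrow so the cost to the live database stays small enough to test and accept.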

Although they probably didn't know it at the time, there is an interesting parallel between what the Customer Support team did and Appreciative Inquiry.

Ultimately, the development team formalized the Support team's approach and used it to buy time. They then scrapped the old system and replaced it with a new one.

Conclusions

Returning to our Apollo 13 analogy, it is important to note that neither the ground controllers nor the astronauts actually fixed the severely damaged Command Module. Indeed, that wasn't even the goal. The goal was to bring the men home, and the tools used were whatever was on the spacecraft that still worked.

I firmly believe there is a role for this alternative to traditional problem-solving in software development organizations. Software developers free to pursue such solutions can move towards deprecating the more problematic systems by resolving the immediate customer need with more short-term solutions that build upon what already works. This provides the breathing room to innovate over the longer term, including the ability to re-develop a problem component from the ground up.

Of course, there must be a balance between traditional approaches and this process. Sometimes a bug is just a bug, and the best solution is to find and fix it. But if we can all agree that it is possible that a software system is just too broken to fix, then it is comforting to know that we have a methodology in place to address the situation rapidly and positively.

