"Is this some sort of collective insanity that has somehow woven its way into our society?"

30.6.08

 

Why Software Projects Fail

OK, enough analysis and example. Whether you agree with my approach, my points, or even my page layout (yes, I know some of you hate it, but then again, it is my page, right?), I think nearly all of us can agree on one point: software has plenty of room for improvement.

In the big scheme of all things technical, programming is pretty new. Not only that, but since languages change over the course of a few years and new ones crop up regularly (Ruby and .NET, for example), it stays pretty new. While traditional engineering disciplines have decades of trial and error, research, and incremental improvement behind them, software is re-invented about once every decade. Think about the environment you were working in ten years ago (that would be 1998). How different was it from what you are working in today? If you are on the web side of software development, chances are that your language of choice has changed, along with many, many techniques and methods. In contrast, how is a civil engineer's job different today than it was in 1998? There are probably more computer-based tools involved, but I would think that the actual core tasks have remained the same.

I bring this comparison up not to further antagonize those who disagree with the "Software vs. Physical" comparison, but to point out the one constant in software development that works against it having a solid "engineering" standing: change.

In many ways, change is the strength of software. When the typical application user was on a dial-up connection and actually connected to the internet to check their email (as opposed to "always connected" broadband), web development tools were designed to deliver lightweight, low-bandwidth user interfaces. As broadband seeps out of city cores into the suburbs and then rural areas, bandwidth becomes much less of an issue and rich user interfaces are now the goal. As the requirements for applications shifted, the software development world reacted, and new, richer web development languages were created and became commonplace. This kind of constant change to the core fabric of a discipline would not be possible in more "structured" disciplines such as mechanical, electrical, or civil engineering. Even if it were possible, it would not be accepted by the majority of the engineers who would be doing the work. For engineers, there is safety in standards and the time-proven techniques of the past. For software developers, there is safety in the new, "sky is the limit" world of constantly changing standards, protocols and techniques.

Given the undeniable trend towards all-encompassing change in software development, the case can be made that general purpose software is doomed to always be unreliable and buggy. I do not hold that view, though. My view is that the reason software is unreliable is not that its techniques and languages are always "first generation".

Several readers of this blog have pointed out that reliable software does exist. They cite traffic control systems, military equipment, navigation systems, and nuclear power controls. So what is the difference between software that guides massive tankers through the ocean in the dark of night and that new desktop application that you just shelled out $59.95 for that keeps crashing every time you try to "undo"? It is obviously possible to create reliable software, so why is most consumer-level software so bad while most critical infrastructure software is so good?

This question could be answered from an economic standpoint. It could be a question of how much money a company is willing to spend to get a $60 piece of software completely error free. After all, if the worst thing that happens when it crashes is that you lose the edits you were making to a photo, how much will the consumer (you) be willing to pay for reliability? If the software were absolutely reliable but cost $6000 instead of $60, would you be willing to pay the extra $5940 for that reliability? In most cases the answer is a resounding "NO!". It would be silly to claim that cost is not a factor in the level of reliability of any piece of software.

This would almost serve as a satisfying answer to the question of software reliability: software is as reliable as you are willing to pay for. Almost. But what about the exceptions to the "more money = more reliability" axiom? There are many projects that I could use to illustrate my point, and many that could be used to illustrate the opposite. I chose the following example for a reason that I hope will become obvious.

There was a project undertaken by SAIC beginning in 2001 which qualifies as one of these exceptions and will serve to illustrate my point. The project was part of an FBI modernization effort called "Project Trilogy", and later became known as the "Virtual Case File". This was a very well funded project whose prime contractor was a very experienced enterprise development company. Despite adequate funding, though, the project fell further and further behind until it was finally scrapped altogether.

The factors that were cited as the prime causes of failure are well known by most developers who have experience on large projects:

  • Lack of a strong blueprint from the outset led to poor architectural decisions.
  • Repeated changes in specification.
  • Repeated turnover of management, which contributed to the specification problem.
  • Micromanagement of software developers.
  • The inclusion of many FBI personnel who had little or no formal training in computer science as managers and even engineers on the project.
  • Scope creep as requirements were continually added to the system even as it was falling behind schedule.
  • Code bloat due to changing specifications and scope creep. At one point it was estimated the software had over 700,000 lines of code.
  • Addition of more people and resources to the project as it was falling behind, which made it later still (Brooks's law).
  • Planned use of a flash cutover deployment, which made it difficult to adopt the system until it was perfected.
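
As a side note on the Brooks's law item above: one commonly cited reason that adding people makes a late project later is that the number of person-to-person communication paths grows as n(n-1)/2, much faster than the headcount. Here is a minimal sketch of that arithmetic. The function name and team sizes are my own illustration, not figures from the Virtual Case File project.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs among team_size people: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2


# Hypothetical team sizes, chosen only to show the growth rate.
for size in (5, 10, 20, 40):
    print(f"{size:>3} people -> {communication_paths(size):>4} communication paths")
```

Doubling a team from 20 to 40 people roughly quadruples the paths that have to be kept in sync (190 to 780), which is coordination overhead on top of the time it takes to train the newcomers.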

Of the causes cited, there are two real root causes:

1. Lack of planning.
2. Lack of change management.

Another project that met a similar, although not quite as fatal, end is the Chandler project, which was documented in the book Dreaming in Code. Envisioned as the ultimate personal information management application, it suffered from perpetually missed deadlines, runaway cost, and a serious tendency to wander in the code wilderness.

Many things can be blamed for projects like these, but it is clear from a careful review of the facts that a lack of funds was not the primary factor. Both of these projects were well funded from the start. There must have been other factors that doomed them. It is apparent that a big part of the problem is rooted in bad management decisions, but I think there is something more basic and more well-defined than bumbling management that, had it been present, would have made the difference between colossal failures and at least modest successes.

I believe the thing that is missing in these projects, and in software development in general, and that makes all the difference is:

Change Management

So there you have it. You have been lured into a discussion that ends up being about a typically VERY boring topic. Boring as it is perceived to be, though, I believe that Change Management, and its big brother Configuration Management, are absolutely critical to producing reliable software.

I don't think anybody starts out trying to build a failed project. Project designers, architects, and planners plan on many things: hardware, software, staffing, funding, schedule and many other aspects of a typical project. What many of them miss, however, is planning on change. A plan is only good if it is followed. When circumstances require that the original plan change, it becomes crippled. That is because the architecture, budget, schedule, staffing, and all of the other things that go into a project are all based on the original view of what is needed. A good software project knows that change is inevitable and plans for it. That is why NASA's Mars rover software is reliable. As new requirements were realized, they had a process that forced all of the assumptions that were made based on the original plans to be reconsidered.
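
To make the idea of "forcing assumptions to be reconsidered" a little more concrete, here is a minimal sketch in Python (3.9+). It is not NASA's process or any real tool. The `Decision` and `ProjectPlan` classes, the `record_change` method, and the sample requirements are all hypothetical names I made up for illustration. The only point is that each planning decision records the requirements it was based on, so a change to one requirement automatically flags every decision that depended on it.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """A planning decision (architecture, budget, schedule, ...) and the
    requirements it was based on. Hypothetical structure for illustration."""
    name: str
    assumes: set[str]
    needs_review: bool = False


@dataclass
class ProjectPlan:
    decisions: list[Decision] = field(default_factory=list)

    def record_change(self, requirement: str) -> list[str]:
        """Flag every decision that assumed the changed requirement.

        Returns the names of the decisions that must be reconsidered.
        """
        affected = []
        for decision in self.decisions:
            if requirement in decision.assumes:
                decision.needs_review = True
                affected.append(decision.name)
        return affected


# Made-up requirements, purely as an example.
plan = ProjectPlan([
    Decision("architecture", {"single-site deployment", "10k users"}),
    Decision("schedule", {"10k users"}),
    Decision("staffing", {"fixed feature list"}),
])

affected = plan.record_change("10k users")
print(affected)  # ['architecture', 'schedule']
```

A real change management process involves far more than a lookup table, of course, but the principle is the same: a change request is not finished until everything that was built on the old assumption has been looked at again.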

Taken to the extreme, does this mean that all planning is pointless and that all software should just be designed by creating a series of changes until the desired result is achieved? Maybe. I think that is called "Agile".

Side note: This reminds me of a new methodology I created after working in an "Agile" environment for a year:

1. Create something. Anything. "Hello World", CD organizer, whatever.
2. Submit to customer.
3. Make changes based on their feedback, giving them exactly what they asked for.
4. Repeat until the budget runs out.

While Agile methods have their place, I don't think that large software development projects are ready for them.

My point is not that planning is futile, but that a successful project must plan for change as part of the development process. The simple fact is that all of the requirements and details cannot be known up front. This is the fatal flaw of the "waterfall" methodology with large projects. The goal in planning should be to plan for the things that are known, and also plan for the things that are not known. This is where most large project failures happen. Change must be part of the marrow of the project and part of its natural flow. Project team members should not have to work against change, but rather view it as a way to make the end result better.

So now that you know what my secret topic is, it is up to you whether or not you wish to continue to read my posts. It is my hope that I have drawn you in enough to at least consider that no large project is likely to succeed without adequate planning for change management, and through that avenue I can keep you interested enough to continue to explore how this approach can make software better at all levels.


M@

[1]  Comments:

Unknown said...

I would like to add another cause, directly related to change management: failure to identify the problem to be solved. Although (as F. Brooks stated in "No Silver Bullet") software is noticeably hard to "envision", there's a big difference between what the user wants and what he needs... this problem usually gets masked as change requests.

30/6/08 18:40