"Given the undeniable trend towards all-encompassing change in software development, the case can be made that general purpose software is doomed to always be unreliable and buggy."




"Is this some sort of collective insanity that has somehow woven its way into our society?"

27.6.08

 

Software Engineering Revisited

My previous article about Software Engineering (or the lack thereof) drew an impressive number of responses. This is apparently a topic that a lot of people care about. Someone posted a link to the article on Reddit.com, and I have to say, those folks are brutal critics. But as with most critics, they have a very annoying way of being right.

I chose to introduce the topic of software problems by comparing software engineering with other engineering disciplines. It was pointed out a few times in comments on Reddit (OK, probably 30 times) that this was probably not a valid comparison, particularly the way I went about it. While some people may be offended by the sometimes blunt criticism, I am not. It shows that I am being read by some very intelligent and clear-thinking people. Accountability is good, and if I propose a concept, I should be prepared to back it up. Your comments (good or bad, eloquent or blunt) are always welcome and will always be posted on this blog...as long as they aren't just abuse or spam. Yes, I get to decide what falls into those categories.

First, I want to agree with those of you who said that it was a thin argument to compare engineering disasters with software disasters. As one person put it, "Nobody dies if your computer has to be rebooted." True enough. So in this article I will bring the discussion to a more balanced comparison.

It was pointed out that, like all engineering tasks, software engineering costs money. For most software systems, there is a cost/reliability trade-off. For example, NASA spent years developing the software that controls the flight of the space shuttle. Nobody on the shuttle software team wakes up one day with a cool new idea for rewriting the nozzle control modules and just throws it in there to see how well it works. I seriously doubt the air traffic control system is open to the latest AJAX tricks or cool new interfaces. These are the types of systems that CAN kill people if they don't work every time, and a great deal of energy and expense goes into getting them right. Even so, these systems do sometimes fail, and like other engineering disasters, those failures make the news. This very thing happened in 2003, when a software error contributed to the shutdown of a major portion of the power grid in the northeastern United States and Canada. It also happened when Britain's air traffic control system experienced a software glitch that grounded hundreds of flights while the system was restored.

These are not the systems I am interested in. I am interested in the everyday software that you, I, my bank teller, my insurance agent, and your doctor use. Not the life support systems, mind you. The record-keeping systems, the appointment schedulers, and the handheld computers.

I can hear the would-be lawyers and philosophers out there already raising an objection. "He is sidestepping the issue...he started on a premise and then changed it on us." OK. You got me. But I did it for a reason. Stay with me and I think you will agree.

Before I go on, I want to put the argument into perspective. I agree that the risk to human life associated with a particular application of ANY engineered system, hardware or software, plays a strong role in determining how reliable it will be. Obviously a glitch in a tic-tac-toe game is not going to require the level of reliability that the software running a bullet train will. The same can be said about physical devices: the reliability of a lint roller is not going to be as high as the reliability of a suspension bridge. As far as I know, nobody has ever died as the direct result of having cat hair on their pants (although some have occasionally felt like they would when said cat hair was discovered during a job interview). I agree with this, and I stand soundly reprimanded for my less-than-accurate comparisons of long ago...OK, yesterday.

Now, ladies and gentlemen, watch closely as I perform an impossible feat before your very eyes. I will, without any smoke or mirrors, transport myself from a place of ridicule and doubt to a platform of infallible truth...(I hope)...

At this point I think we all agree that not all software is created equal. I have never had the pleasure of working on a software project where quality was top priority but I know such projects exist. My question is (watch carefully): Why is some software really good while the majority of it is really, really bad? What do the developers who produce reliable software do differently than the ones who produce really flaky software?

It has been lovingly pointed out by many people responding to my last article that I am a dolt for comparing software to real-world engineering projects. Why? Because software does not have the limits of the physical world. The only limits are the limits of the developers' imagination. That, and what the operating system is capable of supporting...and what the chosen language is capable of implementing...and what the hardware is capable of running. So it looks to me like there are, in fact, limits to what software can do. Most development projects (especially in the gaming industry) push those limits as hard as possible without breaking them. So although the limitations of a project are not necessarily physical, they do exist.

Before I dig much deeper (dig myself much deeper?), I would like to point out that, as boring as it is, I am really only talking about business software. It isn't that I have anything against gaming software or social networks or cool media players; it is just that I don't have any experience with those types of applications. My work has been almost exclusively in the realm of database-driven business applications. Boring, I know. But it pays the bills. I am sure that what I am saying applies to many other types of development, but I just don't have the expertise to judge that. Actually, I expect that we business developers could learn a lot from the game developers out there.

So the question now is: Why doesn't most business software work very well? Now that we have lost all of the gamers, social networkers, and "software artisans", maybe we can make some headway.

Judging from many of the comments I received about the Software Engineering article, it is a widely held belief that the main factor determining software quality is the amount of money a company is willing to throw at it. This, apparently, can be programmatically determined by considering how many people the product is likely to kill or maim. This seems like a sound, logical argument, but it isn't. I have used some software that is very reliable and is not likely to maim anyone. And if the amount of money thrown at a project dictates its reliability, why are some of the most stable applications available for free?

I believe that there is an answer to these questions, and it is something that has been right before our eyes since the 1980s...maybe earlier. I intend to show in upcoming articles that what is missing in software, what is making us all hate our computers, is not the likelihood of consequences (kill count), is not the amount of money spent producing it, and is not the number or credentials of the developers creating it. It is much simpler than all of that. Stick around...I am pretty sure you will find it worth following along.

M@

1 Comment:

Blogger rektide said...

As an engineer, one of my classes in college had an ethics component to it. This was a long lecture about how, as engineers, we had an ethical responsibility to make sure our work product worked OK. At the end, we spent a while discussing the point that software is often far more buggy than the products of most other engineering disciplines. It's then that I realized software engineering is different from all other engineering disciplines because there are no tolerances. In every other engineering discipline, there are safety margins; everything is overbuilt by design for safety. In software and computer engineering, you can add additional safety routines, but with or without these checks and guards, all of your code must work as a faultless system.

27/6/08 12:54
