原创 Some persistent ideas (Part 1)

 2011-3-13 18:33  1717 12 12 分类: 消费电子

It won't come as a surprise to any practitioner of the software art that we tend to be an ornery and opinionated sort. That's been true throughout the history of the discipline and was especially true in the early days. We early software folk tended to be "rugged individualists" with very strong, albeit wildly differing, notions about how software should be written. There were few if any established methodologies or ground rules at the time. Even Kernighan's and Plauger's seminal book, Elements of Programming Style, was still in the future.

Today, academic disciplines and studies, textbooks, formal methodologies, company training courses, programming guidelines, and peer attitudes may have dampened the wildest excursions of our individualism but not eliminated them.

Like all creative people, software folk often have ideas of their own as to how things should be done. But ideas come in all flavors: some are good, some are brilliant, and some are crushingly, embarrassingly bad. In the end, the trick is not to have only good ideas (that's not possible) and definitely not to blindly follow the ideas of some vaunted authority. The trick, rather, is to be able to discern the good ideas from the bad ones, reject the bad ones, and adopt the good ones as your own.

For reasons I don't fully understand and therefore can't explain, I've often found my own ideas to be out of step with those of my colleagues. These differences led to debates, ranging from polite discussion to out-and-out food fights. In time, I came to accept the battles as part of the profession and contented myself with the observation that "my side" often carried the day.

But I never anticipated that I'd be having to fight the same tedious battles, generation after generation, ad infinitum. Just when I think I've gotten out of the debates, they pull me back in. Just when I think a given issue has been settled for all time, along comes a new generation of programmers, sporting the same tired old idea. Some of the worst ideas seem to enjoy eternal life, enjoying rebirth and returning to plague me, like Count Dracula rising from his coffin.

Today, I'd like to tell you about two of the more persistent and pernicious of the Bad Ideas.

"It's Too Inefficient"

Most programmers are perfectly willing to agree, in principle, that notions like modularity, encapsulation, information hiding, and software reuse are good ideas, leading to more reliable and maintainable software. Yet all too often, these ideas are honored mostly in the breach. Old Fortran programmers like me well remember the days when most Fortran programs were written in bowl-of-spaghetti (BOS) fashion, with little if any modularity. Although even the earliest compilers, and assemblers before them, supported callable subroutines with passed parameters, many programmers chose not to use them.

I got this one right. From the get-go, I was writing small subroutines with passed parameters. David Parnas had nothing on me. Don't get me wrong: I claim no 20/20 prescience here. I used the modular style for two good reasons: First, it's the way I was taught. The fellow who taught me Fortran didn't show me any other way. He had me writing small subroutines and functions for him to use in a large simulation, and he showed me how to write in the style that he wanted. Black boxes he wanted, black boxes he got, and my programming style was set forever.

When I looked at software created by others, I was pretty dismayed to find that the BOS model predominated. Their Fortran programs would go on for page after page, with nary a single CALL or RETURN, and therefore no parameter lists (for that matter, no comments, either). Just large numbers of GOTO's.

We were doing scientific programming at the time, simulating space trajectories, and therefore making heavy use of vector and matrix math. I used my subroutines, which weren't all that different from the ones you've seen in my C++ vector/math package.

Most of my colleagues used the same algorithms, but not the idea of modularity. Instead, they coded them in line. Where I might write:

Call Cross(a, b, c)

They'd write:

c(1) = a(2)*b(3)—a(3)*b(2);

c(2) = a(3)*b(1)—a(1)*b(3);

c(3) = a(1)*b(2)—a(2)*b(3);

At each place where they needed another cross product, they'd code the same three structures, only with different variables names. If they needed multiple cross products, they'd code the same three lines again, carefully duplicating the index patterns.

Which brings me to the second reason I liked modularity: It kept me from making stupid errors. Each time you copy that three-line algorithm for the cross product, you risk getting either one of the indices, or one of the argument names, wrong. The error can often be hard to track down (did you spot the error in the lines above?). Using a black-box subroutine, I could be pretty certain that if it worked right on the first cross product, it would also work right for the next one.

More than once, I'd mention to a colleague, "You know, you could just call a subroutine for that. I have one you can use, if you like."

The answer was always the same:

"No, I couldn't do that. It would be too inefficient."

Edsger Dijkstra pointed out that code efficiency depends far more on using the right algorithm than on tricky code. But if it's speed you want, and you don't care if the answer is right, I can give you speed out the gazoo. For me, at least, correctness and ease of programming trumps raw performance any day.

Once, a programmer was assigned to adapt one of my simulations to a new computer. After working on it for awhile, he came to me almost livid in anger. My coding, he asserted, was far too inefficient. I had subroutine calls nested four or five layers deep, and he thought this was unconscionable. "Every time you call a subroutine," he warned, "you waste 180µs."

I looked at my watch for a time, then said, "I can wait."

He was not amused.

Ironically enough, he later wrote an attitude simulation of his own. He used the "efficient," BOS model. But he made a mistake, inverting a matrix in the highest-speed loop, even though it was a constant. For this and other reasons, my "inefficient" program was more than twice as fast. Now, one could argue that the two issues aren't connected—that the math mistake didn't have anything to with his coding style. I tend to suspect, though, that if he'd stuck to a modular style, his coding and testing tasks have been easier, perhaps easy enough so he'd have spotted the mistake.

Over time, people learned that modularity was good, spaghetti bad. And as computer mainframes got blazing speed, execution time was not such an issue. But the efficiency notion came back to life in spades, with the advent of microprocessors. With their (initially) slow clock speeds and software floating point, efficiency was again an issue. Even (or especially) when using assembly language, programmers tended to the BOS model. So did (and do) many programmers of embedded systems.

The notion persists to this day. Only a year ago a colleague, whose intelligence I value, showed me a Matlab program he had written. It used many long modules and had many of the "in-line cross product" sorts of constructs. I ran a McCabe metric on it, and got a cyclomatic number around 27 (something like 3-5 is better). I gently asked my friend why he hadn't used a more modular style. He replied,

"I thought about it, but that would have been too inefficient."

Now, you have to take the time to get your arms around the situation. Matlab is an interpreted language. It's not going to use the computer efficiently, no matter what you do. We don't write programs in Matlab when we want speed; we use it when we want the convenience, ease-of-use, and GUI environment of the interpreted language. Granted, Matlab programs can be compiled into faster executables for production, but my friend wasn't using that feature. So efficiency should have been the last thing on his list of needs.

[Continued at Some persistent ideas (Part 2)]