原创 Skip bugging to speed delivery (Part 1)

 2011-3-7 17:36  2170 17 17 分类: 消费电子

I once wrote an introduction to the concept of code inspections, a technique that can hugely accelerate schedules by eliminating bugs before they percolate into testing. I drew an analogy to the quality movement that revolutionized manufacturing. Others made a similar comparison, notably Mary and Tom Poppendieck in their recommended book, Lean Software Development.¹ In manufacturing one strives to eliminate waste, which, for firmware, includes bugs injected into the code.

Code inspections are nothing new; they were first formally described by Michael Fagan in his seminal 1976 paper "Design and Code Inspections to Reduce Errors in Program Development."² But even in 1976 engineers had long employed design reviews to find errors before investing serious money in building a PCB, so the idea of having other developers check one's work before "building" or testing it was hardly novel.

Fagan's approach is in common use, though many groups tune it for their particular needs. I think it's a bit too "heavy" for most of us, though those working on safety-critical devices often use it pretty much unmodified. I'll describe a practical approach used by quite a few embedded developers.

First, the objective of an inspection is to find bugs. Comments like "man, this code sucks!" are really attacks on the author and are inappropriate. Similarly, metrics one collects are not to be used to evaluate developers.

Secondly, all new code gets inspected. There are no exceptions. Teams that exclude certain types of routines because either they're hard ("no one understands DMA here") or because only one person really understands what's going on will find excuses to skip inspections on other sorts of code, and will eventually give up inspections altogether. If only Joe understands the domain or the operation of a particular function, the process of inspection spreads that wisdom and lessens risk if Joe were to get ill or quit.

We only inspect new code because there just isn't time to pour over countless lines of stuff inherited in an acquisition.

In an ideal world an inspection team has four members plus maybe a newbie or two that attend just to learn how the products work. In reality it might be hard to find four people so fewer folks will participate. But there are four roles, all of which must be filled, even if it means one person serves in two capacities:

A moderator runs the inspection process. He finds a conference room, distributes listings to the participants, and runs the meeting. That person is one of us, not a manager, and smart teams insure everyone takes turns being moderator so one person isn't viewed as the perennial bad guy.

A reader looks at the code and translates it into an English-language description of the statement's intent. The reader doesn't say: "if (tank_press>max _press)dump();" After all, the program is nothing more than a translation of an English-language spec into computerese. The reader converts the C back into English, in this case saying something like: "Here we check the tank pressure to see if it exceeds the maximum allowed, which is a constant defined in max_limits.h. If it's too high we call dump, which releases the safety valve to drop the pressure to a safe value." Everyone else is reading along to see if they agree with the translation, and to see if this is indeed what the code should do.

The author is present to provide insight into his intentions when those are not clear (which is a sign there's a problem with either the code or the documentation). If a people shortage means you've doubled up roles, the author may not also be the reader. In writing prose it has long been known that editing your own work is fraught with risk: you see what you thought you wrote, not what's on the paper. The same is true for code.

A recorder logs the problems found.

During the meeting we don't invent fixes for problems. The author is a smart person we respect who will come up with solutions. Why tie up all of these expensive people in endless debates about the best fix?

We do look for testability issues. If a function looks difficult to test then either it, or the test, needs to be rethought.

Are all of the participants familiar enough with the code to do a thorough inspection? Yes, since before the meeting each inspects the code, alone, in the privacy of his or her own office, making notes as necessary. Sometimes it's awfully hard to get people other than the author to take the preparation phase seriously, so log each inspector's name in the function's header block. In other words, we all own this code, and each of us stamped our imprimatur on it.

The inspection takes place after the code compiles cleanly. Let the tools find the easy problems. But do the inspection before any debugging or testing takes place. Of course everyone wants to do a bit of testing first just to ensure our colleagues aren't pointing out stupid mistakes. But since inspections are some 20 times cheaper than test, it's just a waste of company resources to test first. Writing code is a business endeavor; we're not being paid to have fun--though I sure hope we do--and such inefficient use of our time is dumb.

Besides, in my experience when test precedes the inspection the author waltzes into the meeting and tosses off a cheery "but it works!" Few busy people can resist the temptation to accept "works", whatever that means, and the inspection gets dropped.

My data shows that an ideal inspection rate is about 150 lines of code per hour. Others suggest slightly different numbers, but it's clear that working above a few hundred lines of code per hour is too fast to find most errors. 150 LOC/hr is an interesting number: if you've followed the rules and kept functions to no more than 50 LOC or so, you'll be inspecting perhaps a half-dozen functions per hour. That's 2.5 LOC/minute, slow enough to really put some thought into the issues. Of course, the truth is that inspections save time so have a negative cost.

It's impossible to inspect large quantities of code. After an hour or two one's brain turns to pudding. Ideally, limit inspections to an hour a day.

The best teams measure the effectiveness of their inspection process, and, in fact, measure how well they nab bugs pre-shipping in every phase of the development project. Two numbers are important: the defect potential, which is the total number of bugs found in the project between starting development and 90 days after shipping to customers, and the defect removal efficiency. The latter is a fancy phrase that is simply the percentage of bugs found pre-shipping.

[continued at Skip bugging to speed delivery (Part 2)]