How I test software (Part 2)


[Continued from How I test software (Part 1)]

 

 

Is this wise? What happens if the hardware doesn't support the algorithm? Well, I suppose that could happen but only if I completely misunderstood the nature and function of the hardware. Fortunately, that's never happened to me.


But wait, you say. I thought you were advocating doing device drivers first. I was, and am. That's where the outside-in, as opposed to top-down or bottom-up, development comes in. I wasn't lying about avoiding the hardware as long as possible. I also wasn't lying about wanting to be sure the I/O devices are working and that I have good, robust device drivers to interface with them.


I need both. I need to know the hardware—particularly the sensors and actuators—works. I'll be testing them using standalone test software, the moment the devices are available. But I don't test the hardware by running my entire application on top of it. Instead, I write simple little test programs to verify that (a) the hardware works and that (b) I know how to talk to it. Then I can get back to developing my other software, confident that I can marry it to the real hardware when the time comes.
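To make that concrete, here is the kind of throwaway check I mean, sketched in C: poll a sensor until it says it has data, read the value, print it. The register addresses and the "data ready" bit below are hypothetical placeholders, not from any real part; substitute whatever your datasheet says.

/* Minimal standalone sensor check -- a sketch only.
   SENSOR_STATUS and SENSOR_DATA are hypothetical memory-mapped
   registers; use the real addresses and bit layout from your datasheet. */
#include <stdio.h>
#include <stdint.h>

#define SENSOR_STATUS (*(volatile uint8_t  *)0x40000000u)   /* hypothetical */
#define SENSOR_DATA   (*(volatile uint16_t *)0x40000002u)   /* hypothetical */

int main(void)
{
    for (int i = 0; i < 10; i++) {
        while ((SENSOR_STATUS & 0x01u) == 0)   /* wait for the "data ready" bit */
            ;                                  /* polling is fine for a test rig */
        printf("sample %d: %u\n", i, (unsigned)SENSOR_DATA);
    }
    return 0;
}

If that little program prints sensible numbers, I know the hardware works and I know how to talk to it; everything else can wait.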


Code and unit test
Ok, let's assume I've found an environment that allows frequent testing, with an edit-compile-test cycle measured in seconds. I've got my software architecture designed, I've identified a unit (subroutine, function, module, or whatever you choose to call it), to start with, and I'm ready to begin coding. When do I start testing?


If you've gotten the point of this series at all, you already know the answer: I start now.


In the classical, waterfall-diagram approach to software, there's a phase called "code and unit test." In the original concept, you're supposed to build a given unit until it's complete. Then you test it and move on to the next unit.


That's not the way we do it anymore. I've told you that I tend to test as I go, testing each time I've added a handful of lines of code. What I haven't said is, test it how? Answer: Using a test driver. As far as I'm concerned, there is no other option.


I worked with one group developing a huge application, so big and slow it took multiple processors, each having multiple cores, to run it. And yet, the only way programmers had to test their functions was to load them into the big Mother of All Programs, run that big mother, and set a breakpoint where their function got called the first time.


That's insane. End of discussion.


What's that you say? You can't test each unit individually, because they all depend on each other and pass data items around among them?


You're actually admitting that you write spaghetti code? Sounds to me like time to do some major refactoring. I submit that if you can't separate one module from its neighbors, you also can't tell if any of them are actually working.


So I'm going to write a test driver that I develop along with the unit under test (UUT). I'll do this for every unit, and every collection of units, right up through the final, main program. Each test driver is going to invoke the UUT, pass some data to it, and test it. What should the first version look like?


Since my initial function has no executable lines of code in it, the test is going to be pretty doggone simple. It's a stub of both the UUT and its test driver—which is why I'm perfectly comfortable with starting with the null program void main(void){}.


The test driver calls no function at all. If you feel uncomfortable about this, you can always write the universal test-driver/UUT pair:



void foo(void) {}    // stub of the UUT
void main(void)      // test driver (the main program)
{
    foo();
}


My next step, of course, is to flesh out the UUT, sending data to it from the test driver, and checking the data it returns. The next obvious question is, what data do I use, and how do I know if the UUT got the right result?




The test case(s)
Answer: I must generate test case(s). Good ones.


I can't tell you how many times I've seen otherwise good programmers try to slide on this point. Generating a good test case is not always easy. Some folks seem to just pull some input values out of the air, look at the results, and say, "Yep, that looks about right."


Give me a break! You can't do that! What does "about right" even mean, anyhow? In the worst case, it simply means your module compiled and ran without crashing, and gave a number that wasn't ridiculous. But come on, you'll have to do better than that.


My pal Jim Adams, one of the best software engineers I know, puts it this way: When testing, never ask a computer a question unless you already know the answer.


The test case should be reasonable, in the sense that the inputs should be in the same range as the values expected in the real world. More to the point, the test case should include every computation, not just those that lead to output values.


In the old days, we used to call this a "hand check." I still do, because this is exactly how I use the test. As I'm testing, I check the value generated by every single instruction, and compare it to my hand check. They have to agree.


Now, when I say "agree," I don't just mean "looks reasonable." I mean agree, as in, the numbers should match exactly, within the limits of floating point arithmetic. Anything greater than this, and we need to go track down the source of the discrepancy. Do not pass "go" until the questions are resolved.
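In code, "agree within the limits of floating point arithmetic" usually comes down to comparing against a small tolerance rather than testing for bit-exact equality. Something like the helper below, where the tolerance figure is my own illustrative choice, not a universal constant:

#include <math.h>

/* True when two doubles agree to within a small relative tolerance.
   1.0e-12 is an illustrative choice; pick whatever your algorithm
   and word length actually justify. */
static int close_enough(double actual, double expected)
{
    double tol = 1.0e-12 * fmax(1.0, fabs(expected));
    return fabs(actual - expected) <= tol;
}

/* Typical use in a test driver: assert(close_enough(result, hand_check)); */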


In olden times, I was sometimes pretty cavalier in the lower-level testing. I wouldn't write things down; I'd just think up some input data on the fly, then I'd check the output. If I were writing, say, a vector add routine, I might add the two vectors [1,2,3] and [4,5,6]. Not very imaginative, but if the answer was [5,7,9], I'd declare success. It's hard to imagine how any function could produce that answer without actually adding the elements correctly.
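Captured in a test driver instead of done by eye, that same check might look like the sketch below; vec_add3 and the driver around it are stand-ins for whatever the real UUT is.

#include <assert.h>

/* UUT: add two 3-vectors, element by element. */
static void vec_add3(const double a[3], const double b[3], double sum[3])
{
    for (int i = 0; i < 3; i++)
        sum[i] = a[i] + b[i];
}

/* Test driver: the hand-checked case [1,2,3] + [4,5,6] = [5,7,9]. */
int main(void)
{
    double a[3]        = {1.0, 2.0, 3.0};
    double b[3]        = {4.0, 5.0, 6.0};
    double expected[3] = {5.0, 7.0, 9.0};
    double sum[3];

    vec_add3(a, b, sum);
    for (int i = 0; i < 3; i++)
        assert(sum[i] == expected[i]);   /* exact: these values are exactly representable */
    return 0;                            /* silence means success */
}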


For the record, you needn't worry about making your input numbers seem more "realistic." You can trust your math processor to get the floating point arithmetic right. If it can add 2.0 and 3.0, we have to believe it can add 1.41421356 and 1.61803399.


The one exception occurs when your algorithm is limited to a range of values. Sometimes algorithms fail at edge conditions. The square root function, for example, can't accept negative numbers. The arcsine function can't accept inputs outside the range –1..+1. The operation 1/x fails when x = 0. When testing, it's absolutely essential that you test these edge conditions, with values at the edge, and also just inside and just outside it. Especially in embedded software, bouncing out on an unhandled exception is not an option.
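As a sketch of what that means in practice, here are edge tests for a hypothetical arcsine wrapper (safe_arcsine is my invention, not a library function) that is supposed to reject out-of-range inputs instead of faulting:

#include <assert.h>
#include <math.h>

/* Hypothetical UUT: arcsine that reports a range error instead of
   faulting on inputs outside [-1, +1].  Returns 0 on success. */
static int safe_arcsine(double x, double *result)
{
    if (x < -1.0 || x > 1.0)
        return -1;              /* out of range: caller must handle it */
    *result = asin(x);
    return 0;
}

/* Edge-condition tests: at the edges, just inside, and just outside. */
int main(void)
{
    double y;
    assert(safe_arcsine(-1.0,      &y) == 0);   /* at the lower edge */
    assert(safe_arcsine( 1.0,      &y) == 0);   /* at the upper edge */
    assert(safe_arcsine( 0.999999, &y) == 0);   /* just inside */
    assert(safe_arcsine( 1.000001, &y) != 0);   /* just outside: must be rejected */
    assert(safe_arcsine(-1.000001, &y) != 0);   /* just outside, low side */
    return 0;
}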


Sometimes people tell me that they can't really test such edges in their program, because the edges are down inside the UUT, and they can't compute what values of the inputs correspond to an internal edge condition.


In that case, someone has some refactoring to do. We still have to test the edges—all of them.


When I was testing, I at least should have written down the test case inputs I used. I didn't. Now I do. As Jim Adams has pointed out, if I don't document the test case, either in my databook or a test report, I'm doomed to re-invent it. And since by that time, I'm likely to have forgotten the subtleties of the algorithm, I'm more likely to get the test driver wrong.


On the other hand, I don't want to make the test drivers my life's work. For the lowest-level UUTs—as low as a square root function—I can't afford to waste the time writing a test plan, a test report, and everything that goes in between.


Between Jim and me, we've worked out an approach that documents the test cases and their results, without causing undue paperwork. We'll write a test driver that passes test case data (which includes edge-condition values) to the UUT. It evaluates the results, and reports any failures. It does not report successes. It's the very opposite of "verbose." Under normal conditions, you don't want to see line after line of output that goes:



Testing case 1
Case 1 passed
Testing case 2
Case 2 passed
etc.


In this case, no news is good news. A null output means that the UUT passed its tests. This approach also allows one level of test driver to call others, with no verbosity in between. How does the test driver report failure? I usually take the coward's way out and use the C/C++ mechanism, assert( ). You can choose a more sophisticated mechanism. You might choose to have the test driver report each failure, then move on to the next test. This approach may seem more sophisticated, but I'm not convinced it's better. A failure is a failure is a failure, and the first one calls for it to be fixed before continuing.
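If you do prefer the report-and-continue style, a minimal version of it might look like the sketch below; the check() helper and its failure counter are my own scaffolding, not any standard mechanism.

#include <stdio.h>

static int failures = 0;

/* Report-and-continue: log a failure and keep going, instead of
   stopping at the first assert.  Entirely silent when everything passes. */
static void check(int condition, const char *what)
{
    if (!condition) {
        printf("FAIL: %s\n", what);
        failures++;
    }
}

int main(void)
{
    check(2 + 2 == 4, "arithmetic sanity");        /* stand-in for real UUT calls */
    check(1 + 1 == 3, "deliberately wrong case");  /* demonstrates the failure path */
    return failures;   /* nonzero exit status flags the run as failed */
}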


In the end, the test drivers go into the project library. They get archived and kept under version control right along with the UUTs. You can collect all the test drivers into a parent driver, which becomes a permanent regression test driver.
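A parent regression driver can be as simple as a main() that calls each unit's test entry point in turn; the names below are illustrative, and the placeholder bodies stand in for the real per-unit drivers.

#include <assert.h>

/* Per-unit test drivers, renamed from main() so they can be linked together.
   In practice each lives in its own file next to its UUT; the trivial
   bodies here are placeholders so the sketch stands alone. */
static void test_vector_ops(void) { assert(1 + 1 == 2); }
static void test_arcsine(void)    { assert(1 + 1 == 2); }

/* Parent regression driver: silent unless something fails. */
int main(void)
{
    test_vector_ops();
    test_arcsine();
    return 0;   /* reaching here means the whole regression suite passed */
}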


Generating the test cases
We still haven't discussed the nature of the test cases, and where the test case data comes from. We've only established that it should exist, and be comprehensive. This is where the discussion starts to really get interesting. Hold that thought for next time.

 
