原创 Comparing techniques for storing Morse code in C (Part 2)

 2015-8-28 22:13  1611 24 24 分类: 消费电子

Continued from Comparing techniques for storing Morse code in C (Part 1)

Storing symbols as bytes
The overall architecture and flow of this program is very similar to the string version (you can access the entire program by clicking here). The first difference occurs when we declare our dotsNdashes[] array. In this case, we're using an array of bytes that looks something like the following:

byte dotsNdashes[] = {B01000010, // 0 = A ".-" B10000001, // 1 = B "-..." B10000101, // 2 = C "-.-." B01100001, // 3 = D "-.." : etc.

As we discussed in the overview, the three MSBs contain the number of dot-dash symbols (the two MSBs in the case of the punctuation characters), while the LSBs contain the symbols themselves; 0 = dot and 1 = dash, with the symbols organized in reverse order such that the first symbol appears in the LSB.

Our loop() function is identical to the previous example:

void loop() { displayThis("Hello World"); }

Our displayThis() function appears to be identical to the previous example -- it's word-for-word identical -- but there is a subtle difference regarding the way in which the compiler handles things. Consider the following statement from the displayThis() function:

flashLED(dotsNdashes[36]);

In the case of our previous "store as strings" program, this statement calls our flashLED() function and passes it a pointer to a string of characters as an argument. By comparison, in the case of our "store as bytes" program, the argument is a single char (which we end up treating as a byte):

void flashLED(byte dNdByte) { : // Lots of clever stuff : }

The next big difference between our two programs occurs in the body of our flashLED() function. First of all, we extract the number of symbols forming this character (the three MSBs), shift them five bits to the right, and store this value in a variable called "count":

byte count; count = (dNdByte >> 5) & B00000111; if (count >= 6) count &= B00000110;

Since byte is an unsigned data type, shifting it to the right should theoretically cause 0s to be shifted into the MSBs, but I don’t like to take any chances, so -- following the shift -- the "& B00000111" (bitwise AND) operation forces the five MSBs to be 0s.

Now, remember that the coding scheme we're using looks like the following:

// 000 xxxxx 0 symbols (not used) // 001 xxxx? 1 symbol // 010 xxx?? 2 symbols // 011 xx??? 3 symbols // 100 x???? 4 symbols // 101 ????? 5 symbols // 11 ?????? 6 symbols

Generally speaking, we have three count bits and five data bits. If the count indicates six symbols, however, then we have only two count bits and six data bits. In this case, the original bit 5, a copy of which now finds itself in the LSB of count, could be a 0 or a 1. This explains the statement " if (count >= 6) count &= B00000110;", which ensures that a count of six or seven is forced to be six, because a value of seven means that the LS bit of the count contains an unwanted data bit of 1.

Now, I have to admit that I ran into a bit of a problem with the following, which is the part where we actually extract and display the dots and dashes:

for (int i = 0; i < count; i++) { digitalWrite(pinLED,HIGH); // LED On if (dNdByte & B00000001 == 0) // It's a dot delay(elementDelay / WPM); // Dot delay else // It's a dash delay((elementDelay * 3) / WPM); // Dash delay dNdByte = dNdByte >> 1; digitalWrite(pinLED,LOW); // LED Off delay(elementDelay / WPM); // Inter-symbol delay }

When I executed this and observed my flashing LED, I realized that I only appeared to be seeing dashes. This struck me as being a bit strange, so I stuffed a lot of Serial.print() statements into my program to determine what was going on. Here's what I saw in the Serial I/O window:

As you see, all I'm generating is dashes. Eventually I focused my beady eye on the following statement:

if (dNdByte & B00000001 == 0) // It's a dot delay(elementDelay / WPM); // Dot delay else // It's a dash delay((elementDelay * 3) / WPM); // Dash delay
In particular, I considered the "== 0" logical comparison. I couldn’t see how this could be wrong, but when I changed it to read "== 1", I then saw the following results.

"Well, strike me with a kipper," I thought, "that's strange and no mistake." Now we are seeing dots and dashes, but they are inverted; i.e., a dot is displayed as a dash and vice versa. In fact, this is what we'd expect because we've inverted our test from "== 0 " to "== 1". But, in that case, why didn’t the "== 0" do what we expected in the first place?

I tell you, I stared and stared at this, but nothing came to mind, so eventually I called my chum Ivan Cowie over from the next bay to take a look. Ivan suggested taking my statement:

if (dNdByte & B00000001 == 0) // It's a dot

And adding parentheses as follows:

if ((dNdByte & B00000001) == 0) // It's a dot

So we did so, and everything worked as expected. "Well, blow me down with a feather," I thought, "who would have thunk?" I personally would have assumed that a bitwise operator like &, |, or ^ would have a higher precedence than a logical comparison (relational) operator like == or !=, but it appears that this is not the case (see also this list C++ operator preferences).

It took me a while to wrap my brain around what was happening here. This thing is that when we use an "if ( {condition is true} ) {do this}" type statement, then "true" equates to 1 and "false" equates to 0.

If the "==" comparison is performed first, then "1 == 0" returns 0, and "dNdByte & 0" also returns zero, which is seen by the "if" statement as "false", so it appears that we only ever have dashes.

But what happens if we change our comparison to "1 == 1"? In this case "1 == 1" returns 1, so "dNdByte & 1" will return 0 or 1 depending on the contents of the LSB, but a 1 will be seen as meaning "true", which equates to a dot, all of which explains why "== 1" does end up giving us dots and dashes, but swapped over. Phew!

I tell you, I learn something new every day. So, we now have two programs offering two different ways of storing and manipulating our Morse code symbols -- how do they compare?

Comparison (code size and execution speed)
The easiest comparison to make is that of code size and the amount of memory required to store any global variables, because this data is automatically reported at compile time.

Not surprisingly, storing our dot-and-dash symbol data as bytes consumed less space for the global variables as illustrated in the table below.

The interesting point to me is that the "storing as bytes" approach also resulted in a significant reduction in code size, even though it involves more processing when it comes to extracting the count (number of symbols) value and then extracting the dots and dashes one at a time. I'm not sure, but I'm assuming this is due to the fact that the "storing as strings" technique involves a lot of pointer de-referencing. Having said this, I really don’t have a clue, and I would welcome your thoughts on this matter.

Now, as we previously noted, transmitting Morse code a human-readable speeds results in the processor sitting around twiddling its metaphorical thumbs for a large portion of the time (observe the delay() function calls scattered throughout the two programs like confetti).

On this basis, we really don’t care how efficient our two approaches are. On the other hand, it's always interesting to know the nitty-gritty facts, so I decided to investigate. First of all, I commented out all of the delay() function calls, because these would totally overwhelm the amount of time spent doing the real work.

Next, I declared two unsigned long variables called "time1" and "time2", and then I modified the loop() function in both programs to read as follows:

void loop() { time1 = micros(); displayThis("Hello World"); time2 = micros();

Serial.print("Time = "); Serial.println(time2 - time1); }

FYI, I'm using an Arduino Uno for this project and micros() is a built-in function provided by the Arduino IDE. This function returns the number of microseconds since the Arduino began running the current program. This number will overflow and return to zero after approximately 70 minutes, which is more than enough time for my proposes. Also, in 16 MHz Arduinos like the Uno, this function has a resolution of four microseconds (i.e., the value returned is always a multiple of four).

I started by comparing my original "Hello World" strings, but then I thought that this really wasn't fair because some of the time was being spent converting lowercase characters to their uppercase equivalents, which isn’t really the point of the exercise. Thus, I also ran the comparison using strings comprising all of the alphanumeric characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

And, just to make sure, I performed a third comparison using a longer string comprising two copies of all the alphanumeric characters. The results of all these comparisons are shown in the table below.

As we see, the "storing as bytes" approach ends up being faster than its "storing as strings" equivalent, even though it involves more processing when it comes to unpacking the count and symbol data. Once again, I'm assuming that this is due to the fact that the "storing as strings" technique involves a lot of pointer de-referencing. And, once again, I really don’t have a clue, and I would welcome your thoughts on this matter.

Now, with regard to the other techniques folks suggested -- like the binary tree and the run-length encoding -- it would be great if you wanted to take my original "store as strings" program as the basis and modified it to use these other techniques. If you then emailed me your code at max.maxfield@ubm.com, I could run it on my Arduino and compare it against the two schemes discussed here.

In the meantime, should I go with "storing as strings" approach because this is easier to understand and maintain, or should I opt for the "storing as bytes" technique on the basis that the end user doesn’t need to know what's going on under the hood, and this method is slightly faster and -- more importantly -- uses less memory?