原创 Comparing techniques for storing Morse code in C (Part 1)

 2015-8-17 17:22  2242 15 15 分类: 消费电子

Several days ago I posted this blog describing how I wish to use a light-emitting diode LED to flash messages in Morse code. In these columns I inquired as to the best way to represent, store, and access the Morse code dots and dashes used to represent letters and numbers.

As you can see from the image above, the alphanumeric characters 'A' to 'Z' and '0' to '9' each require between one and five dot-dash symbols. What I neglected to mention was that I also wish to have two punctuation symbols at my disposal, each of which contains six symbols: ',' (comma = "--..--") and '.' (period = ".-.-.-").

Of course, you might say: "Does anyone still know Morse code these days?" Well, my chum Ivan from the next bay just wandered into my office, looked at a little proto-shield I threw together last weekend flashing away merrily on my desk and said "Oh, it's flashing 'Hello World'" (take a look at this video of the proto-shield doing its funky stuff).

Alternatively, you might ask: "Does anyone actually need to know Morse code these days." To these sad souls I would respond: "Hello! Didn’t you see the film Independence Day?" Then, as they sheepishly sidled away, I would call out: "Go away, or I shall taunt you a second time!"

But we digress... My original columns unleashed a maelstrom of comments. Community member Cdhmanning suggested using a binary decision tree, in which each node has a character and a link to the left for dot and one to the right for dash. I'd love to try this, but I fear it's beyond my limited programming skills. In fact, in one of his comments, Betajet presented the code for this; however, I'm still trying to wrap my bumbling brain around this little scamp (the code, not Betajet).

Someone else emailed me proposing a form of run-length encoding. Sad to say, however, I couldn’t wrap my brain around that one either ("I'm a bear of little brain," as Winnie-the-Pooh would say).

Several people advocated using two bits per symbol. My chum David Ashton, for example, suggested 01=dot, 10=dash, 11=space, and 00=end. David also noted that, using this scheme, a 5-symbol code would require 20 bits (5 dots/dashes x 2 bits, plus 4 inter-symbol spaces x 2 bits, plus 1 end character x 2 bits).

Actually, just a few minutes ago whilst I was in the process of writing this column, I received an email from Djones A. Boni in Brazil, who suggested the following scheme, which is very similar to David's:

We actually have four different symbols:
The dot (+1 implicit space between parts)
The dash (+1 implicit space between parts)
The space between letters (-1 implicit space between parts)
The space between words (-1 implicit space between parts)

We can encode them as 0b00, 0b01, 0b10 and 0b11, thereby getting four symbols into a byte.

Another community member, Jacob Larsen, suggested using 0 and 1 bits to represent units of light and darkness, so the character 'A' (".-") would be represented as 10111; i.e., dot (1) = LED On = one unit delay; inter-symbol space (0) = LED Off = one unit delay; and dash (111) = LED On = three unit delays. On the one hand, this has the advantage of being relatively simple; on the other hand, the numeric character '0' with five dashes ("-----") would occupy 5 x (3 + 1) = 20 bits, which seems a tad enthusiastic, as it were.

Another proposal from Cdhmanning was to store each Morse code character in an 8-bit byte. The three MSBs (most-significant bits) would be used to represent the number of bits in the character, while the five LSBs (least-significant bits) would be used to represent the dots and dashes.

Consider the character 'F' ("..-."), for example. This could be represented as 100x0010, where the 100 indicates that this character contains four dot-dash symbols and the 0010 indicates dot-dot-dash-dot. Meanwhile, the 'x' indicates that we don’t care about this character (we would set it to 0 in the code).

In fact, Cdhmanning, actually recommended that the dot-dash symbol bits be "swapped over" such that the first to be transmitted is stored in the LSB, because this will make any downstream processing easier. In this case, our 'F' character would actually be stored as 100x0100.

As I mentioned above, I also wish to have two punctuation symbols at my disposal, each of which contains six symbols: ',' (comma = "--..--") and '.' (period = ".-.-.-"). Happily, Cdhmanning's scheme can be adapted to accommodate this as shown below, where a 'x' character indicates "don't care" (we'll set these to 0 in the code) and a '?' character indicates a symbol (0 = dot, 1 = dash):

// 000 xxxxx 0 symbols (not used) // 001 xxxx? 1 symbol // 010 xxx?? 2 symbols // 011 xx??? 3 symbols // 100 x???? 4 symbols // 101 ????? 5 symbols // 11 ?????? 6 symbols

Jacob Larsen also commented: "You don't mention what you are optimizing for -- code size, code maintainability, data size, or instruction speed? I am going to assume maintainability, since Morse code generally doesn't go fast enough to need speed optimization."

Jacob makes a very good point. Transmitting Morse code at human-readable speeds means that -- most of the time -- the processor is sitting around twiddling its metaphorical thumbs, so speed optimization isn’t really an issue. On the other hand, it's always a good idea to minimize your code and data size where possible, and it's also gratifying to make your code run as efficiently and as quickly as possible, even if no one else really gives a hoot.

For this particular project, I feel that understandability and maintainability are the two key elements. The reason for this will become clear when I rent the veils asunder and explain the true purpose behind this capriciously cunning contraption, but we'll leave that for a future column.

For the purposes of this exercise, I've decided to compare two different techniques: (a) Storing the Morse code symbols as strings, and (b) packing the symbols for each character into a single byte. Now read on...

Storing symbols as strings
Before we plunge into the fray with gusto and abandon, let’s first discuss a few "housekeeping" issues. The thing about Morse code is that everything is relative, because some people can transmit it faster than others. As we know from our original Morse code chart, a dot equates to one element (unit) of delay (time) and a dash equates to three elements of delay.

Meanwhile, the inter-symbol (dot-dash) spacing is one element of delay; the inter-character spacing (i.e., the spacing between adjacent letters in a string like "Hello") is three elements of delay; and the inter-word spacing (i.e., the spacing between two words in a string like "Hello World") is seven elements of delay.

Baring this in mind, what does it actually mean when you hear people say that they can transmit 'n' words per minute in Morse code? This is obviously somewhat subjective, because different words have different numbers of characters; there's a heck of a difference between the number of characters in "A" and in "Antidisestablishmentarianism," for example.

For this reason, one of the "standard words" Morse code aficionados use to compare themselves against each other is PARIS. Consider the following breakdown, where {} represents the delay elements associated with a dot or a dash; [] represents the delay elements associated with inter-symbol spacing; and () represents the delay elements associated with inter-character and inter-word spacing:

P ".--." {1} [1] {3} [1] {3} [1] {1} (3) = 14 A ".-" {1} [1] {3} (3) = 8 R ".-." {1} [1] {3} [1] {1} (3) = 10 I ".." {1} [1] {1} (3) = 3 S "..." {1} [1] {1} [1] {1} (7) = 12 Total = 50

One minute equals 60 seconds equals 60,000 milliseconds (ms). Thus, if we were to transmit at one word per minute -- and assuming 50 elements per word based on PARIS as discussed above -- then each element would take 1,200ms. This explains why you will see the following definition in my code:

#define elementDelay 1200

By trial and error, I've discovered that 15 words per minute is pleasing to my eye, which is why you will also see the following definition in my code:

#define WPM 15

Thus, the way in which I've implemented my code means that you only have to tweak the WPM definition in order to modify the transmission rate. And, speaking of my code, you can access the entire program by clicking here.

The interesting part of this program starts with our definition of the dots and dashes, which appears as follows:

char* dotsNdashes[] = {".-", // 0 = A "-...", // 1 = B "-.-.", // 2 = C "-..", // 3 = D : etc.

This basically says that we're going to store an array of pointers to strings of characters, so dotsNdashes[0] contains a pointer to a string containing ".-" (plus a hidden terminating '\0' null character); dotsNdashes[1] contains a pointer to a string containing "-..."; and so forth.

Now, I want my final program to be able to display any ASCII string that comes its way. For the purpose of this example, however, my main loop is pretty simple as shown below:

void loop() { displayThis("Hello World"); }

The idea is that I can pass any ASCII text string to my displayThis() function. As you'll see, the first thing we do in the displayThis() function is to convert any lowercase alpha characters 'a' to 'z' into their uppercase counterparts 'A' to 'Z'. We could just tell people that they can only use uppercase letters, but we all know that they aren’t going to listen (LOL).

You'll also see that we check for any unexpected characters and -- if we find any -- we call a simple eeekError() function that flashes the LED in a certain way (we vary the LED's on-off times to reflect different errors).

The body of the displayThis() function is used to determine which entry (string) we wish to use in our dotsNdashes[] array. Once we've determined this, we call a flashLED() function and pass this string in as a parameter (whilst pondering this function, it may help to remember that a string is an array of characters, and that any type of array is automatically passed into a function as a pointer; of course, this may not help, in which case I'm sorry I mentioned it).

Finally, we use the flashLED() function to actually output the dots and dashes by flashing the LED. This first thing we do for each dot-dash symbol is to turn the LED on because the only difference between a dot and a dash is the duration it remains active.

OK. That pretty much covers the technique of storing our dot-dash Morse code symbols as strings. The advantage is that this is relatively easy to understand and maintain; the disadvantage is that our dotsNdashes[] array consumes a byte (char) for each dot and dash, which is a lot of memory (relatively speaking). Next, we'll consider packing all of the symbols forming a character into a single byte.

Continued at Comparing techniques for storing Morse code in C (Part 2)