Device drivers typically communicate with hardware devices through device registers. Many processors use memory-mapped I/O, which maps device registers to fixed addresses in the conventional memory space. A typical device employs a small collection of registers with closely-spaced memory addresses.
In my previous column, I presented some common alternatives for representing and manipulating memory-mapped devices in C. I recommended using a structure to represent each device's collection of registers as a distinct type.1 In another piece, I explained why C++ classes are even better than C structures for representing memory-mapped devices.2
Many readers responded to both columns. A few alleged that using a pointer to access a class object representing a memory-mapped device incurs a performance penalty by somehow adding unnecessary indirect addressing.
Interestingly, no one complained that using C structures to represent memory-mapped devices incurs a similar performance penalty. This left me wondering: is the allegation that using a C++ class is more expensive than using a C structure? Or is it that using pointers to access memory-mapped class objects is more costly than using some other means? My impression is that the readers were more concerned about the latter: the alleged cost of using pointers. However, evaluating the cost of using pointers involves comparing classes with structures, so I might as well consider both questions.
If using pointers or references is slow, then what might be faster? I described the available alternatives for placing objects into memory-mapped locations in my previous column.3 I'll consider alternative implementations that eliminate the need to use pointers to access memory-mapped devices.
Classes vs. structures
As in my earlier columns on memory-mapped devices, I'll use as my example device a programmable timer that employs three device registers (TMOD, TDATA, and TCNT) in adjacent locations starting at 0xFFFF6000.
Each device register is a four-byte word aligned to an address that's a multiple of four, so you can manipulate each device register as a uint32_t. Device registers are volatile objects, so I recommend declaring each register with a type defined as:
typedef uint32_t volatile device_register;
The timer_registers C++ class encapsulates the entire collection of timer registers as a single abstract type. The class definition appears in Listing 1.
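A minimal sketch of what such a class might look like, using only the names mentioned in this article, follows. The bodies of disable, enable, and get, and the values chosen for TE and TICKS_PER_SEC, are assumptions made for illustration; only the body of set is taken from the text.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;

class timer_registers
{
public:
    // Both values below are assumptions, not taken from real hardware.
    enum { TICKS_PER_SEC = 50000000 };  // hypothetical timer clock rate
    enum { TE = 0x01 };                 // hypothetical "timer enable" bit in TMOD

    typedef std::uint32_t count_type;

    // Assumed behavior: clear or set the enable bit in the mode register.
    void disable() { TMOD = TMOD & ~static_cast<std::uint32_t>(TE); }
    void enable()  { TMOD = TMOD | static_cast<std::uint32_t>(TE); }

    void set(count_type c);

    // Assumed behavior: report the current count.
    count_type get() const { return TCNT; }

private:
    device_register TMOD;   // mode register
    device_register TDATA;  // data register
    device_register TCNT;   // count register
};

// This body matches the definition of set shown later in this article.
inline void timer_registers::set(count_type c)
{
    TDATA = c;
    TCNT = 0;
}
```

Nothing in the sketch depends on where the object lives, so you can exercise the member functions on an ordinary object in RAM before pointing the class at real hardware.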
As I showed in an earlier column, you can define a pointer whose value is the memory-mapped address of the actual device registers, initialized using a reinterpret_cast, as:
timer_registers *const the_timer
    = reinterpret_cast<timer_registers *>(0xFFFF6000);
You can then use the pointer to designate the timer object in member function calls such as:
the_timer->disable();
the_timer->set(timer_registers::TICKS_PER_SEC);
the_timer->enable();
In the class definition, the access specifiers (public and private), the enumeration constants (TICKS_PER_SEC and TE), the type name (count_type), and the member functions (disable, enable, set, and get) don't occupy any storage in a timer_registers class object. Only the data members (TMOD, TDATA, and TCNT) do. Thus, for a given target platform, a timer_registers class object in C++ has the same layout as a timer_registers structure defined in C as:
typedef struct timer_registers timer_registers;
struct timer_registers
{
device_register TMOD;
device_register TDATA;
device_register TCNT;
};
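If you want the compiler to confirm the layout you're counting on, a few compile-time checks will do it. The following is a sketch under stated assumptions: a C++11 or later compiler, 8-bit bytes, and no padding between the registers. The static_assert checks are an addition for illustration, not part of the original design.

```cpp
#include <cstddef>      // offsetof
#include <cstdint>      // std::uint32_t
#include <type_traits>  // std::is_standard_layout

typedef std::uint32_t volatile device_register;

struct timer_registers
{
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
};

// A standard-layout type places its members in declaration order,
// with the first member at offset zero.
static_assert(std::is_standard_layout<timer_registers>::value,
              "timer_registers must be standard-layout");

// Assumption: each register is four bytes and the compiler adds no padding.
static_assert(sizeof(timer_registers) == 12, "unexpected size");
static_assert(offsetof(timer_registers, TMOD) == 0, "TMOD misplaced");
static_assert(offsetof(timer_registers, TDATA) == 4, "TDATA misplaced");
static_assert(offsetof(timer_registers, TCNT) == 8, "TCNT misplaced");
```

If any assumption fails on your target, the build stops with a message rather than letting the code quietly address the wrong register.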
All of the member functions in the timer_registers class are "ordinary"—neither static nor virtual. If we ignore access control and assume everything is public, every ordinary C++ class member function is conceptually equivalent to a C (nonmember) function with an additional parameter. That additional parameter is a pointer to the object (in this case, the device) upon which the member function acts.
For example, the timer_registers class member function:
void timer_registers::set(count_type c)
{
TDATA = c;
TCNT = 0;
}
translates into code that's nearly the same as, if not identical to, the code generated for a C function defined as:
void timer_registers_set
(timer_registers *this, count_type c)
{
this->TDATA = c;
this->TCNT = 0;
}
With a given compiler and target, the instructions generated for each function might appear in a slightly different order or use different CPU registers, but both functions should produce the same results and execute in roughly the same time. You can verify claims like this experimentally—something I'll do in an upcoming column.
In C++, each nonstatic class member function has an implicitly-declared pointer parameter whose actual name is this. Thus, you can write the body of a member function using this exactly as you would in the body of its equivalent nonmember function.
A member function call such as:
the_timer->set(timer_registers::TICKS_PER_SEC);
should translate into code that's nearly the same as, if not identical to, the code generated for the nonmember function call:
timer_registers_set(the_timer, TICKS_PER_SEC);
Again, for a given target, the instruction ordering or register usage for each call might be slightly different, but the computed results should be identical and the timing darn close.
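To make the correspondence concrete, here is a sketch that places the two forms side by side so you can compile both and compare the generated code. The names timer_registers_c, self, data, and count are hypothetical and were introduced only for this comparison. The pointer parameter is called self because this is a keyword in C++ and can't name a parameter there.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;
typedef std::uint32_t count_type;

// C-style form: a plain structure plus a nonmember function.
struct timer_registers_c
{
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
};

// The explicit pointer parameter plays the role of 'this'.
inline void timer_registers_set(timer_registers_c *self, count_type c)
{
    self->TDATA = c;
    self->TCNT = 0;
}

// C++ form: the same registers, with set as a member function.
class timer_registers
{
public:
    void set(count_type c)
    {
        this->TDATA = c;  // 'this' is the implicit pointer parameter
        this->TCNT = 0;
    }

    // Hypothetical accessors, present only so the results can be compared.
    count_type data() const { return TDATA; }
    count_type count() const { return TCNT; }

private:
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
};
```

Calling timer_registers_set(&a, n) on a timer_registers_c object and b.set(n) on a timer_registers object should leave the same values in TDATA and TCNT. Comparing your compiler's assembly output for the two functions is one way to check the claim about timing on your own target.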
Unnecessary indirection?
Using either a C++ class or a C structure to represent a memory-mapped device scales up easily on platforms with multiple instances of the device. For example, if the hardware supports two timers, then you can simply define one pointer to the base address of each device, as in:
timer_registers *const timer0
    = reinterpret_cast<timer_registers *>(0xFFFF6000);
timer_registers *const timer1
    = reinterpret_cast<timer_registers *>(0xFFFF6800);
Then a call such as:
timer0->disable();
disables one timer, while:
timer1->disable();
disables the other. Both expressions call the same function, but pass the address of a different device. Aside from the declaration of additional pointers, using even more timers adds nothing to the cost of using a timer. (Of course, you can't add more timers by just declaring more pointers. They have to be in the hardware.)
On the other hand, if your embedded system has only one timer, does this design (which allows for more than one timer) cost more than if it were written for only one timer? A few readers alleged that it does, and that an implementation using extern declarations would be faster. No one gave any specifics of what such an implementation might look like, but it's easy to conjure them up.
As I explained in a previous article, both C and C++ will let you declare a memory-mapped object using a standard extern declaration such as:
extern timer_registers the_timer;
and then use linker command options or linker scripts to place the_timer at the desired address. If you use either a C++ class, or a C structure and functions that accept a pointer to that structure, then using extern declarations will support more than one timer just as well as using pointers does.
For example, if your hardware has two timers, you can declare:
extern timer_registers timer0;
extern timer_registers timer1;
and then disable timer1 by calling:
timer_registers_disable(&timer1);
in C, or by calling:
timer1.disable();
in C++. The code generated for these calls should be pretty close to what you get when you use pointers instead.
The possible advantage of using extern declarations instead of pointers might be that, when you have only one instance of a particular device, you can write the functions to access the registers directly within the lone object and eliminate the pointer indirection. Let's see how this might work.
[Continued at Accessing memory-mapped classes directly (Part 2)]