One of the easiest ways to misuse a structure object in C is to fail to initialise it properly. In C++, you can reduce the incidence of uninitialized objects by using classes with constructors. A constructor is a special class member function that provides guaranteed initialisation for objects of its class type.
In my previous blog, I tried to take some of the mystery out of constructors by explaining what it is that constructors do and don't do.1 In essence, a constructor's job is to place appropriate initial values into an object's shallow part and, if there is a deep part, acquire and initialise it, too. (The shallow part of an object is the storage that contains the object's data members, as well as its base class sub-objects, vptr and padding, if any. The deep part of an object is any storage used to represent the object's state beyond the shallow part. An object usually accesses its deep part via pointers or other resource handles residing in the shallow part.)
I concluded the article by listing the places that you're likely to see constructors called. I'll now focus on two of those places: object definitions at local scope and at global (namespace) scope.
For my examples, I'll use a ring_buffer class similar to the one I used previously. The class definition looks, in part, like:
class ring_buffer
{
public:
    ring_buffer();
    ring_buffer(size_t n);
    ~~~
private:
    char *base;
    size_t size;
    size_t head, tail;
};
The class has two constructors. The first constructor, the default constructor, has an empty parameter list. It initializes a ring_buffer whose capacity is some default number of characters. The second constructor has a parameter n of type size_t. It initializes a ring_buffer whose capacity is n characters.
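As a rough sketch of what the two constructor definitions might look like, consider the following. The value of default_capacity, the capacity accessor, the destructor, and the demo function are assumptions made for illustration. The article shows only the declarations.

```cpp
#include <cassert>
#include <cstddef>

class ring_buffer
{
public:
    ring_buffer();
    ring_buffer(std::size_t n);
    ~ring_buffer();
    // Copying is disabled to keep the sketch short.
    ring_buffer(ring_buffer const &) = delete;
    ring_buffer &operator=(ring_buffer const &) = delete;
    // Hypothetical accessor, added so the capacity can be inspected.
    std::size_t capacity() const { return size; }
private:
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

// Assumed value: the text says only "some default number of characters".
std::size_t const default_capacity = 32;

// Default constructor: acquires the deep part (the character array)
// and sets the shallow part to describe an empty buffer.
ring_buffer::ring_buffer():
    base(new char [default_capacity]),
    size(default_capacity),
    head(0),
    tail(0)
{
}

// Same work, but the caller chooses the capacity.
ring_buffer::ring_buffer(std::size_t n):
    base(new char [n]), size(n), head(0), tail(0)
{
}

// Releases the deep part acquired by either constructor.
ring_buffer::~ring_buffer()
{
    delete [] base;
}

// Hypothetical usage: each definition selects one of the constructors.
void demo()
{
    ring_buffer a;
    ring_buffer b (64);
    assert(a.capacity() == default_capacity);
    assert(b.capacity() == 64);
}
```

Calling demo from main exercises both constructors. The member initialisers appear in the same order as the data member declarations, which is the order in which the members are actually initialised.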
In C, you can emulate the behaviour of the ring_buffer constructors using functions declared as:
void rb_construct(ring_buffer *this);
void rb_construct_size(ring_buffer *this, size_t n);
In C++, each constructor has an implicitly-declared parameter named this, which points to an object of the constructor's class. In C, you must declare that parameter explicitly. C doesn't support function overloading, so each C function must have a unique name.
Constructors and local objects
The most obvious place where constructor calls occur is in definitions for class objects at block scope (within function bodies). For example, given the following C++ code:
void f()
{
    int i = 0;
    size_t n = 64;
    ring_buffer rb;
    ~~~
the compiler will generate a constructor call to initialise rb.
With most C++ compilers, the entry code for function f will allocate storage for all the local objects (i, n, and rb) at once, and then perform the initialisations in the order that the declarations appear. The generated code should be essentially the same as what you'd get from the following C code:
void f()
{
    int i;
    size_t n;
    ring_buffer rb;
    i = 0;
    n = 64;
    rb_construct(&rb);
    ~~~
In the C++ code, rb's definition doesn't specify any constructor arguments, so the definition invokes ring_buffer's default constructor—the one with the empty parameter list:
ring_buffer();
If rb's definition had been written instead as:
ring_buffer rb (n);
then the definition would invoke the constructor declared as:
ring_buffer(size_t n);
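To see which constructor each form of definition selects, a small experiment along the following lines can help. This is only a sketch: the trace_log string, the capacity accessor, the default capacity of 32, and the demo function are hypothetical additions, and the class is stripped down to a single data member.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical log that records which constructors ran, in order.
static std::string trace_log;

class ring_buffer
{
public:
    ring_buffer(): size(32) { trace_log += "default;"; }
    ring_buffer(std::size_t n): size(n) { trace_log += "sized;"; }
    std::size_t capacity() const { return size; }
private:
    std::size_t size;
};

std::size_t f()
{
    std::size_t n = 64;
    ring_buffer rb1;        // no arguments: ring_buffer()
    ring_buffer rb2 (n);    // one size_t argument: ring_buffer(size_t)
    return rb1.capacity() + rb2.capacity();
}

// Hypothetical usage: the log shows the constructors ran in
// the order that the definitions appear in f.
void demo()
{
    trace_log.clear();
    assert(f() == 96);
    assert(trace_log == "default;sized;");
}
```

The log also reflects the point made earlier: the initialisations run in the order that the declarations appear.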
Each element of an array is itself an object. If the array element type has a constructor, then each element must be constructed. Thus, an array definition local to a C++ function, as in:
void f()
{
    ring_buffer rba [N]; // for some constant N
    ~~~
typically generates a loop that applies a constructor to each element, much as the following C code does:
void f()
{
    ring_buffer rba [N]; // for some constant N
    ring_buffer *p;
    for (p = rba; p != rba + N; ++p)
        rb_construct(p);
    ~~~
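One way to confirm that an array definition really does construct every element is to count constructor calls. The sketch below assumes a stripped-down ring_buffer with a hypothetical static counter, an arbitrary N of 4, and a hypothetical demo function.

```cpp
#include <cassert>
#include <cstddef>

class ring_buffer
{
public:
    ring_buffer() { ++constructed; }
    static std::size_t constructed;   // hypothetical counter
};

std::size_t ring_buffer::constructed = 0;

std::size_t const N = 4;   // arbitrary value chosen for the sketch

// Returns the number of constructor calls made by defining the array.
std::size_t f()
{
    std::size_t before = ring_buffer::constructed;
    ring_buffer rba [N];
    return ring_buffer::constructed - before;
}

// Hypothetical usage: the array has automatic storage duration, so
// every call to f constructs all N elements again.
void demo()
{
    assert(f() == N);
    assert(f() == N);
}
```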
Constructors and local static objects
A local object normally has automatic storage duration. The program creates the object upon function entry. That is, the program allocates storage and applies a constructor to that object each time it enters the function containing the object's definition. However, a local object can have static storage duration, as in:
void f()
{
    size_t n = 64;
    static ring_buffer rb (n);
    ~~~
In this case, the object's storage is allocated statically, before the program starts running, and the program constructs the object only once, the first time execution passes through the object's definition. C++ compilers typically introduce a "first time through switch": a statically allocated Boolean object that tracks whether the local object has been initialized. The equivalent C code looks something like:
#include <stdbool.h>

static bool rb_initialized = false;
static ring_buffer rb;
void f()
{
    size_t n;
    n = 64;
    if (!rb_initialized)
    {
        rb_construct_size(&rb, n);
        rb_initialized = true;
    }
    ~~~
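This technique, as written, is not thread-safe. A C++ compiler that supports multiple threads has to do something a little fancier to prevent two threads from accessing the switch concurrently. (C++11 and later require that the initialisation of a local static object be thread-safe, so conforming compilers generate the necessary synchronisation.)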
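The once-only behaviour of a local static object is easy to observe. In the sketch below, the constructions counter, the capacity accessor, the parameter on f, and the demo function are hypothetical additions made so that the effect is visible from outside the function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter of constructor calls.
static int constructions = 0;

class ring_buffer
{
public:
    ring_buffer(std::size_t n): size(n) { ++constructions; }
    std::size_t capacity() const { return size; }
private:
    std::size_t size;
};

std::size_t f(std::size_t n)
{
    static ring_buffer rb (n);   // constructed on the first pass only
    return rb.capacity();
}

// Hypothetical usage: the second call does not construct rb again,
// so its argument has no effect on the capacity.
void demo()
{
    assert(f(64) == 64);
    assert(f(128) == 64);
    assert(constructions == 1);
}
```

Note that the constructor argument is evaluated only on the first pass. Later calls skip the initialisation entirely, so the value passed to them is ignored.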
Constructors and non-local static objects
Definitions for local objects appear as statements inside function bodies. As shown in the previous examples, the initialisation for such an object executes as part of the function containing the object definition. In contrast, definitions for non-local objects appear outside function bodies. The initialisation of such objects takes place at program start-up or shortly thereafter.
Some C++ compilers create a function for each translation unit that invokes the constructors for non-local static objects defined in that unit. For example, suppose a module contains the non-local definitions:
// xyz.cpp
~~~
ring_buffer rb1;
widget w;
ring_buffer rb2 (128);
A typical implementation technique is to generate an initialisation function that calls constructors for rb1, w, and rb2. The C equivalent of that function looks something like:
void __sti__xyz()
{
    rb_construct(&rb1);
    widget_construct(&w);
    rb_construct_size(&rb2, 128);
}
Using the prefix __sti__ (for "static initialisation") to name these functions was a convention that began with the earliest C++ compilers. I believe some compilers still use it.
The C++ Standard requires that the program initialise all the non-local static objects in a given translation unit before the program uses any function or object defined in that unit.2 When the implementation uses a static initialisation function such as __sti__xyz, the program must call that function before it uses rb1, w, or rb2. Many compilers satisfy this requirement by planting a call to each static initialisation function somewhere in the program's start-up code.
The C++ Standard mandates that the constructors for non-local static objects in a given translation unit execute in the order that the objects are declared. Unfortunately, the Standard doesn't specify the initialisation order for objects in different translation units. A program written with the expectation that the modules will be initialized in a certain order could easily yield disappointing results. Steve Dewhurst describes this problem in some detail, and offers some workarounds.3 So does Scott Meyers.4 Some compilers offer #pragma directives, compiler and linker options, or other extensions to give you better control over initialisation order.
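The ordering guarantee within a single translation unit can be checked with a sketch like the one below. The tracer class is a hypothetical stand-in for ring_buffer and widget, and init_log and demo are likewise invented for illustration. The log lives in a function-local static object so that it is constructed the first time anything uses it. This is one widely used way to sidestep the cross-unit ordering problem, though not necessarily the only workaround the cited books describe.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The log is a local static object, so it is constructed on first use,
// no matter which non-local object's constructor asks for it first.
std::vector<std::string> &init_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for ring_buffer and widget: the constructor
// records the name of the object being constructed.
class tracer
{
public:
    explicit tracer(char const *name) { init_log().push_back(name); }
};

// Non-local objects in one translation unit, constructed in
// the order that they are declared.
tracer rb1 ("rb1");
tracer w ("w");
tracer rb2 ("rb2");

// Hypothetical usage: by the time any function in this translation
// unit runs, all three objects have been constructed, in order.
void demo()
{
    std::vector<std::string> const &log = init_log();
    assert(log.size() == 3);
    assert(log[0] == "rb1");
    assert(log[1] == "w");
    assert(log[2] == "rb2");
}
```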
Constructors in other places
C++ compilers inject constructor calls into a number of other places in programs. I'll explain where those places are, and why they make sense, in upcoming columns.
Endnotes:
1. Saks, Dan, "Demystifying constructors," Embeddeddesignindia.com, April 2011, http://forum.embeddeddesignindia.co.in/BLOG_ARTICLE_7113.HTM
2. ISO/IEC Standard 14882:2003(E), Programming languages – C++.
3. Dewhurst, Stephen C., C++ Gotchas. Addison-Wesley, 2003.
4. Meyers, Scott, Effective C++, 3rd ed. Addison-Wesley, 2005.