In my previous article, I explained a technique that C++ run-time systems may use to dynamically allocate and deallocate arrays whose elements are class objects with constructors and destructors.1 I also showed how you can implement essentially the same behavior in C by writing your own array allocation and deallocation functions.
I ended that column by explaining that my implementation of those C functions will work just fine on any processor in which no type has an alignment stricter than that of size_t. However, it could produce undefined behavior on any machine with more strictly aligned types. This month, I'll explain why, and what you can do about it.
Rewinding just a little
As in my previous column, I'll illustrate the problem by using arrays of widgets. In C++, widget is a class defined with a default constructor, a destructor, and an assortment of other members (both functions and data). In C, a widget is a struct:
typedef struct widget widget;
struct widget
{
// ...
};
along with functions that serve as the constructor and destructor:
void widget_construct(widget *pw);
void widget_destroy(widget *pw);
as well as other "member" functions.
In C++, you use an array new-expressions such as:
pw = new widget [n];
to allocate an array with n properly initialized widgets. In C, you can emulate the behavior of this array new-expression by implementing a function named new_widget_array and calling it in an expression such as:
pw = new_widget_array(n);
In C++, you use an array delete-expression such as:
delete [] pw;
to deallocate an array previously allocated by an array new-expression. In C, you can mimic the behavior of this array delete-expression by implementing a function named delete_widget_array and calling it in an expression such as:
delete_widget_array(pw);
Whereas the array dimension appears explicitly in each array new-expression, it never appears in an array delete-expression. Each array delete-expression has to obtain the array dimension another way. A common technique for passing the array dimension to the delete-expression is for the array new-expression to stash the array dimension in a location just before the array itself. Last month, I explained this technique by showing implementations of new_widget_array and delete_widget_ array that use it.
Inasmuch as the additional allocated storage holds an array dimension, it should be declared as type size_t, the same as the parameter to new_widget_array. Thus, new_widget_array should allocate storage for the array dimension along with the array itself using:
size_t size = sizeof(size_t) + n * sizeof(widget);
size_t *ps = (size_t *)malloc(size);
If ps is non-null (because malloc has returned a pointer to the requested storage), then new_widget_array can proceed to place the array dimension at the beginning of that storage:
*ps = n;
and then compute the address of the first element in the array itself:
++ps;
The allocated array is an array of widgets, but ps is a pointer to a size_t, so new_widget_array needs a cast (as part of an assignment) to obtain a pointer it can use to access the initial array element:
pw = (widget *)ps;
In my implementation of new_widget_array, I combined the increment and the assignment:
++ps;
pw = (widget *)ps;
into one expression:
pw = (widget *)++ps;
In the following discussion, it'll be easier to discuss the behavior if I keep the increment and assignment as two separate expressions.
So, what's the problem?
Either way you compute pw, the problem with the code is that casting a "pointer to size_t" into a "pointer to widget" will have undefined behavior if it yields a pointer with an improperly aligned value. Undefined behavior is the catch-all term that both the C and C++ standards use to describe a wide range a bad things that can happen when your program executes an erroneous operation.2
A multibyte objects often have an alignment.3 For example, a processor might require that a 4B integer or pointer referenced as a single object be word aligned—at an address that's a multiple of four. It also might require that an 8B floating point number be word aligned, or maybe even double-word aligned—at an address that's a multiple of eight.
An object whose address requirement is a higher multiple than another is said to have a stricter alignment. For example, an object that must be double-word aligned has a stricter alignment than an object that must be only word aligned. Each structure object (in C) or class object (in C++) must be at least as a strictly aligned as its most strictly aligned data member. Each call to the standard malloc function (in C) or the global operator new (in C++) must return a pointer that's aligned to the most strictly-aligned object that could be placed in the allocated storage.
As I mentioned earlier, my implementation of new_widget_array will work just fine on any processor in which no type has an alignment stricter than that of size_t. However, it could produce undefined behavior on any machine with more strictly aligned types. Here's how.
Suppose that the function is compiled for a processor in which integer types such as size_t must be word aligned (on a boundary that's a multiple of four), but larger types, such as long long int or double, must be double-word aligned (to a multiple of eight). Suppose also that widget contains at least one double-word aligned data member, so that widget objects must be double-word aligned as well.
Inside the new_widget_array function, the call to malloc in:
size_t *ps = (size_t *)malloc(size);
allocates storage to hold a size_t followed by an array of widgets. The size of the requested storage is big enough to hold at least a double-word aligned object, so this call to malloc must return a pointer whose value is at least double-word aligned. Thus, the value in ps will also be at least double-word aligned. This is more strictly aligned than necessary for referring to a size_t (which need only be word aligned), but poses no problem. The assignment:
*ps = n;
places the array dimension at the beginning of the allocated storage, as expected.
Again, I'm assuming for this example that sizeof(size_t) is four, so when new_widget_array executes:
++ps;
the increment has the same effect as:
ps = (size_t *)((char *)ps + 4);
Prior to the increment, the value of ps was double-word aligned. Afterwards, it's merely word-aligned. Nothing bad has happened yet, but it's about to.
The real trouble starts with next statement:
pw = (widget *)ps;
which is supposed to place into pw the address of the initial element in the array of widgets. Inasmuch as widget is a double-word aligned type, any value placed in pw should be double-word aligned as well. However, the value in ps is merely word-aligned at this point. Thus, the value in pw will be misaligned. According to the C Standard, "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined."4
I believe that, in practice, most processors will actually execute a pointer assignment, even if it leaves the pointer with a misaligned value. Nothing bad will happen until the program actually dereferences the misaligned pointer. Even then, whether anything bad happens, and how bad it turns out to be, depends very much on the target platform. Such is the nature of undefined behavior.
Using MAX_ALIGN_T
One way to keep the value of pw properly aligned is to ensure that the first widget in the array is at least as strictly aligned as the entire block of storage allocated by malloc. You can do this by using a union defined as:
typedef union widget_array_header widget_array_header;
union widget_array_header
{
size_t size;
max_align_t unused;
};
Here, max_align_t is a type that's at least as strictly aligned as every other type. (I'll explain how to obtain max_align_t shortly.) The size of a union is as big as its biggest member, so this union is big enough to hold a size_t. The alignment of a union is as strict as its strictest member, so this union is as strictly aligned as a max_align_t.
Instead of allocating storage for a size_t along with the array of widgets, the new_widget_array function should allocate storage for a widget_array_header along with the array (figure 1).
The size of the padding after the size_t member of the widget_array_header is the difference, if any, between sizeof(max_align_t) and sizeof(size_t).
The new_widget_array function now allocates storage for the array dimension along with the array itself using a slightly different formula:
size_t size
= sizeof(widget_array_header) + n * sizeof(widget);
widget_array_header *ph = (widget_array_header *)malloc(size);
If ph is non-null (because malloc has returned a pointer to the requested storage), then new_widget_array can proceed to place the array dimension at the beginning of that storage:
ph->size = n;
and then compute the address of the first element in the array itself:
++ph;
pw = (widget *)ph;
Using this approach, the value in pw will always be properly aligned.
So, how do you define max_align_t? If you're using a very recent implementation of C++, you may not need to—it may be defined for you. The draft standard for C++ specifies that the standard header defines std::max_align_t as a type whose alignment requirement is at least as great as that of every scalar type.5 If you're using an older C++ implementation, or C, you can define max_align_t as a typedef that's an alias for the most strictly aligned type in your target environment, such as:
typedef long double max_align_t;
You can write a somewhat more portable version of max_align_t as a union of likely candidates for the most strictly aligned type:
typedef union max_align_t max_align_t;
union max_align_t
{
long long int lli;
long double ld;
void *pv;
void (*pfvv)(void);
};
In any event, be prepared to tweak your definition for max_align_t when you port your code.
Using the widget_array_header type, the new version of new_widget_array looks like Listing 1.
More to come
Using a type such as max_align_t is an effective way to avoid alignment errors. However, under some conditions, it might waste a little storage. In an upcoming column, I'll show you yet another, arguably better, approach to solving alignment problems.
Endnotes:
1. Saks, Dan. "Allocating and deallocating arrays, in greater detail," Embeddeddesignindia.com, March 2011. http://forum.embeddeddesignindia.co.in/BLOG_ARTICLE_6889.HTM
2. Saks, Dan. "As Precise As Possible," Embedded Systems Programming, April, 2002, p. 43.
3. Saks, Dan. "Padding and rearranging structure members," Embedded Systems Design, May 2009, p. 11.
4. ISO/IEC Standard 9899:1999, Programming languages—C. ISO/IEC, 1999.
5. Becker, Pete, Working Draft, Standard for Programming Language C++. ISO/IEC, June 2009.
文章评论(0条评论)
登录后参与讨论