Some Thoughts about Aliasing in C++
Eric Lengyel • September 18, 2017
[Edit: A detailed specification about the disjoint qualifier is available here.]
The C++ standard explicitly lists the cases in which pointers can alias at the end of Section 3.10. It does not allow simple things like this:
float f;
...
int i = *reinterpret_cast<int *>(&f);
Pointers to int and float are not related in a way that supports aliasing, and this is technically undefined behavior.
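The following is a minimal sketch of why compilers care about this rule. Because int and float are unrelated types, an optimizing compiler may assume that a store through a float * cannot modify an int, and so it may skip reloading the int. The function name store_and_reload is hypothetical and was chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical illustration of type-based alias analysis ("strict aliasing").
// Since int and float are unrelated types, the compiler may assume that the
// store through fp cannot modify *ip, and it may fold the final load into 1.
int store_and_reload(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   // assumed not to touch *ip
    return *ip;   // may be compiled as "return 1"
}
```

With two distinct objects, the function simply returns 1. Calling it as store_and_reload(&i, reinterpret_cast<float *>(&i)) is undefined behavior, so the compiler is not required to produce any particular result for that call.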
At the end of Section 5.2.10, the standard talks about type puns and says you can do this:
int i = reinterpret_cast<int&>(f);
I interpret this as the intention to add enough expressiveness to the language to allow a programmer to reinterpret the bits of a float as an int.
However, the spec also states that this type pun is equivalent to the pointer reinterpretation above, which many view as implying that it’s still undefined behavior.
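For comparison, here is a sketch of the approach that is well-defined under the current rules. It copies the object representation with std::memcpy instead of aliasing the float through an int. The helper names float_bits and bits_to_float are hypothetical. The sketch uses std::uint32_t rather than int so that the sign bit does not complicate things, and it assumes a 32-bit float, which the static_assert checks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit unsigned integer without
// accessing the float through an integer pointer. std::memcpy copies the
// object representation, which is well-defined for trivially copyable types.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// The inverse operation: build a float from a 32-bit pattern.
float bits_to_float(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Optimizing compilers typically reduce such a memcpy to a single register move. C++20 later added std::bit_cast in the <bit> header, so std::bit_cast<std::uint32_t>(f) expresses the same operation directly.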
In my opinion, the spec is being too restrictive about what can alias and what must not alias. I don’t think it’s a good idea for any compiler to assume that two pointers don’t alias the same storage based solely on the types of objects they point to. There are plenty of good reasons for a programmer to interpret the bits in some chunk of memory in more than one way.
We would still like to be able to tell the compiler that some things don’t alias, though, in order to enable various optimizations. The C99 standard introduced the restrict keyword to C to address this a long time ago, but it has not been officially added to C++. Most compilers support it anyway through implementation-dependent decorations such as __restrict.
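As a hedged illustration of what the restrict promise buys, the sketch below uses the __restrict spelling, which GCC, Clang, and MSVC accept as an extension. The function scaled_add is hypothetical. Because the pointers are marked __restrict, the compiler may assume that stores to dst never change *scale or src. It can therefore load *scale once and vectorize the loop. Without the qualifiers, it would have to reload *scale after every store in case dst aliases it.

```cpp
#include <cassert>
#include <cstddef>

// dst[i] += src[i] * (*scale), with every pointer promised to be non-aliased.
// The promise is not checked: passing overlapping pointers is undefined behavior.
void scaled_add(int *__restrict dst, const int *__restrict src,
                const int *__restrict scale, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += src[i] * *scale;
}
```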
The restrict keyword is applied to a pointer as in the following example:
int *restrict ptr;
This tells the compiler that the storage ptr points to is not accessed through any other pointer. In my opinion, applying restrict to the pointer is the wrong approach. I submit that things would work out better if the storage itself could be marked as non-aliased, and this post contains some notes about how that would work.
I will use the keyword disjoint below, but it should be regarded as a stand-in for some potentially better choice to be determined later.
A new qualifier
The disjoint keyword could be applied as a new qualifier to a type, similar to how const and volatile are currently applied.
disjoint int buffer[64]; // Declare non-aliased storage
Unlike the const and volatile qualifiers, the disjoint qualifier can be implicitly removed. Suppose we want to pass the above buffer to a function with the following signature:
void foo(int *data);
Making the call foo(buffer) would be perfectly fine. Inside the function foo, the compiler has to assume that the storage pointed to by data could be aliased, so it takes the safe route. If the function foo were instead declared as follows, then the compiler would be able to make extra optimizations under the assumption that the storage pointed to by data is not aliased:
void foo(disjoint int *data);
If buffer had not been declared with the disjoint qualifier, then it could not be passed to this version of foo. This is the opposite of const and volatile. The disjoint qualifier cannot be implicitly added to a type.
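Today's C++ has no disjoint qualifier. However, the conversion rules described above can be approximated with a wrapper type. In the sketch below, disjoint_ptr is a hypothetical class. It converts implicitly to a plain pointer, which models implicit removal of the qualifier. A plain pointer can become a disjoint_ptr only through an explicit constructor, which models the rule that the qualifier cannot be added implicitly. The wrapper models only these conversion rules. It gives the optimizer no extra information.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a disjoint-qualified pointer.
template <typename T>
class disjoint_ptr
{
public:
    // Adding the "qualifier" must be spelled out explicitly.
    explicit disjoint_ptr(T *p) : ptr_(p) {}

    // Removing the "qualifier" happens implicitly.
    operator T *() const { return ptr_; }

    T *get() const { return ptr_; }

private:
    T *ptr_;
};

// Compile-time checks of the two conversion directions.
static_assert(std::is_convertible<disjoint_ptr<int>, int *>::value,
              "disjoint -> plain is implicit");
static_assert(!std::is_convertible<int *, disjoint_ptr<int>>::value,
              "plain -> disjoint is not implicit");
```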
The disjoint qualifier changes the type of a pointer just like const and volatile. Any disjoint qualifiers applied to function parameters are included in the function’s signature, so the above two declarations of foo represent two distinct functions.
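Using the same kind of hypothetical wrapper, the sketch below shows how two foo overloads can coexist and how overload resolution picks between them. An ordinary array can only reach the plain-pointer overload, because the wrapper's constructor is explicit. A wrapped pointer prefers the disjoint overload, since that match needs no conversion. The return values exist only to identify which overload was chosen.

```cpp
#include <cassert>

// Hypothetical stand-in for a disjoint-qualified pointer.
template <typename T>
class disjoint_ptr
{
public:
    explicit disjoint_ptr(T *p) : ptr_(p) {}
    operator T *() const { return ptr_; }   // implicit removal is allowed

private:
    T *ptr_;
};

// Overload for storage that might be aliased.
int foo(int *data)
{
    data[0] += 1;
    return 0;
}

// Overload for disjoint storage. A compiler with a real disjoint qualifier
// could optimize this body under a no-aliasing assumption.
int foo(disjoint_ptr<int> data)
{
    int *p = data;
    p[0] += 1;
    return 1;
}
```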
A non-static member function could be declared disjoint as follows to indicate that the storage pointed to by this is not aliased. Such a member function could be called only for an object that was itself declared disjoint.
struct Bar
{
    void f() disjoint;    // *this has non-aliased storage
};

Bar A;
disjoint Bar B;

A.f();    // error: can't call disjoint function
B.f();    // OK
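The closest existing analogue is a GCC extension that Clang also accepts: __restrict__ can appear as a member-function qualifier, and inside such a function this is treated as a restrict pointer. The sketch below uses it. The struct Counter and its accumulate member are hypothetical. The qualifier lets the compiler assume that src does not point into *this, so it can keep total in a register for the whole loop. Unlike the proposed disjoint, this does not limit which objects the function may be called on. It is an unchecked promise made by whoever writes the function.

```cpp
#include <cassert>

struct Counter
{
    int total = 0;

    // __restrict__ after the parameter list marks `this` as restrict
    // (a GCC/Clang extension, not standard C++).
    void accumulate(const int *src, int n) __restrict__
    {
        for (int i = 0; i < n; ++i)
            total += src[i];   // total need not be reloaded after each add
    }
};
```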
Since the disjoint qualifier cannot be implicitly added to a type, we need a way to allocate storage on the heap as disjoint. It would not hurt anything to change the behavior of the new operator so that it always returns a disjoint-qualified type:
disjoint Bar *C = new Bar;
In existing code, the disjoint qualifier would simply be implicitly removed on assignment without any consequences.
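GCC and Clang already offer a way to state that a freshly allocated result is unaliased for user-written allocation functions: the malloc attribute. This attribute tells the compiler that the returned pointer does not alias any other pointer that is valid when the function returns. It is roughly the guarantee that a disjoint-qualified result from new would carry in the type system. The helper make_buffer is hypothetical, and the attribute is a compiler extension, not standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical allocation helper marked malloc-like (GCC/Clang extension):
// the returned pointer is promised not to alias any other live pointer.
__attribute__((malloc)) int *make_buffer(std::size_t n)
{
    return new int[n]();   // value-initialized, so every element starts at 0
}
```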
The disjoint qualifier could appear in multi-level pointers anywhere that the const or volatile qualifiers could appear. For example:
int *disjoint *ptr;
Here, ptr is a pointer to disjoint storage containing a pointer to (possibly aliased) int.
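Because disjoint is meant to go wherever const can go, const can stand in for it to check how the position of the qualifier is read. In the sketch below, the alias Ptr is hypothetical. The type int *const * corresponds to int *disjoint *. The standard type traits confirm that only the middle level, the pointer that ptr points to, carries the qualifier.

```cpp
#include <type_traits>

// const used as a stand-in for disjoint: a pointer to qualified storage
// that holds an (unqualified) int *.
using Ptr = int *const *;

// ptr itself is not qualified ...
static_assert(!std::is_const<Ptr>::value, "ptr itself is unqualified");
// ... the object it points to (an int *) is qualified ...
static_assert(std::is_const<std::remove_pointer<Ptr>::type>::value,
              "the pointed-to pointer is qualified");
// ... and the innermost int is not.
static_assert(!std::is_const<std::remove_pointer<std::remove_pointer<Ptr>::type>::type>::value,
              "the int is unqualified");
```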