Special Thanks

Special thanks to BHang, KFang, and EZhang for providing the much needed motivation and encouragement for this post.

Motivation

If you ever want to intimidate non-cpp programmers, just start a discussion on r-values and l-values. That’ll surely scare them.

However, this puts us a precarious position if we ever encounter someone who knows what a funny r-value is or if we ever try to explain what an r-value is, so we better learn how they work.

Vector Anatomy

In cpp, the standard container representing a dynamically sized contiguous array of objects is std::vector. For those unfamiliar, vector’s usage and ubiquity matches that of python lists or java arraylists.

vector<int> v;
for(int i=0; i<1'000'000; i++) 
    v.push_back(i);

The above code snippet depicts a vector of integers being created and populated with the first 1 million integers. With this many integers, how big should we expect the vector to be?

sizeof(v) // returns 24

This is a bit surprising at first given that we just populated the vector with a million integers, however a quick look at the underlying vector implementation should clarify everything.

what is a vector

The vector container holds just 3 pointers: start, end, capacity. When we computed sizeof(v), we only obtain the combined sizes of these pointers: 24.

The Stack and the Heap

Now let’s see how this works in conjunction with the process memory layout. Consider the code snippet below.

int main(){
    // Construct vector
    vector<int> v;
    for(int i=0; i<1'000'000; i++) 
        v.push_back(i);
}

this is a stack and heap

The space allocated for the vector data exists on the heap instead of the stack. This implementation allows for dynamic resizing without altering the amount of space needed on the stack. How intelligent.

Passing Arguments

Let’s build upon our example with a new function Compute. We would like to look into how the vector is passed as a parameter.

void Compute(vector<int> in) {
    // computation with in
}

int main(){
    // Construct vector
    vector<int> v;
    for(int i=0; i<1'000'000; i++) {
        v.push_back(i);
    }

    // perform computation
    Compute(v);
}

copying is bad

When we call Compute, every value within the vector v is copied to create vector in. Whether this is optimal depends on the context.

However, suppose we meet certain conditions:

  • Compute requires its own copy of the vector and may modify it. This disallows changing the method signature to use const reference.
  • The vector constructed in main is solely used for Compute. In this case, having two copies of the same vector in the heap is unnecessary is expensive and should be avoided if possible.
// perform computation with `std::move`
Compute(std::move(v));

The above change incorporating std::move does exactly what we want. No copies at all. The internal pointers inside v are copied to those for parameter in, so we have passed the vector as an argument without copying its entire data. Compute now owns the vector and can modify it however it sees fit :)

copying is bad

Notice that the pointers within vector v in main are invalidated. Using this value, or any value after moving it, is not recommended and may cause undefinied behavior1.

R-values

We have seen how we can optimize copies of data by coping pointers instead of copying all the data, but we still have not tackled the crucial question: What exactly is an r-value?

Historically, r-values are named as these values tend to be on the right side of the equal sign, whereas l-values are on the left.
i.e. auto lvalue = rvalue{}
Ok, now that we understand everything, we can all go home.
If only it were that easy.

Value Category Taxonomy

Each cpp expression has a value category. On the finest level, they are: lvalues, xvalues, and prvalues2.

funny binary tree

From the diagram, r-values are a superset of both x-values and pr-values.
gl-values are a superset of l-values and x-values.

  • gl-value (AKA: Generalized l-value) - determines the identity of an object.
    One general guideline: if you can take the address of it, it’s a gl-value.
  • pr-value (AKA: Pure r-value) - initializes an object. Typically a temporary. e.g. The following initializer list is a pr-value
    vector<int> v = {1,2,3} // {1,2,3} is the pr-value
    
  • x-value (AKA: eXpiring-value) - a gl-value that can be reused.
    e.g. std::move(v) in our earlier example is an xvalue
  • l-value - a gl-value that is not an r-value

std::move

Contrary to its name, std::move performs no computation. std::move sole purpose is to cast the value to an r-value. This can also be done manually with static_cast; std::move is just more convenient and ergonomic. Once an object has been cast to an r-value, we can optimize computation with the appropriate function overloads.

Function Overloads

// relevant vector constructors
vector(const vector& other); // const l-value ref
vector(vector&& other); // r-value ref

The && signifies that the argument is an r-value reference, so the function vector(vector&& other); can only be called by r-values. Attempting to construct a vector with a non-rvalue would invoke the vector(const vector& other); constructor. Even though vector(const vector& other); can theoretically also take r-values, r-values will always invoke the r-value reference overload.

In the example with vectors, we use the move constructor vector(vector&& other); When we construct the parameter vector in, this move constructor is being invoked to copy the internal pointers and not the values on the heap.

Limitations and Caveats

Moving is just an optimization and is not free.

  • Not all types support moving.
    For instance, a std::array in local scope will always live on the stack, so there are no pointers to data to copy. We can always cast an array to an r-value with std::move, but a copy will still be performed as there is no r-value ref overload for the array constructor.
  • Moving may not be cheap.
    Consider a small std::string. With small string optimization(SSO), the entire string will live on the stack, so in this case there are also no pointers to copy. Even though we may be invoking the correct move constuctor, we do not actually save on computation as the entire string will still have to be copied

References

  1. https://clang.llvm.org/extra/clang-tidy/checks/bugprone/use-after-move.html
  2. https://en.cppreference.com/w/cpp/language/value_category
  3. https://en.cppreference.com/w/cpp/container/vector/vector