Modernizing your code with C++20

by Phil Nash

    C++20 features, including the spaceship operator

    C++20 is here! In fact, as we head towards 2022, it’s been here a while. It may surprise some, but we’re only a few months from a freeze on new proposals for C++23! But let’s not get ahead of ourselves. C++20 is a big release: at least the biggest since C++11, and some have said it's the biggest since the first standard in 1998!

    Another possible surprise is that support for C++20 is currently better in GCC and MSVC++ than in Clang. Nonetheless, significant chunks of the new language and library features are already widely available across the three major compilers. Many of them, including some less well-known features, are there to make common things safer and easier. So we’ve been hard at work implementing analyzer rules to help us all take full advantage of the latest incarnation of “Modern C++”. This is just the start, but we already have 28 C++20-specific rules in the latest releases of all our products (with many more in development).

    Let’s take a peek at some of them.

    Beyond Compare

    Arguably the biggest new C++20 feature for making the code you often write safer and easier is the three-way comparison operator - A.K.A. the “Spaceship Operator” (because it’s written as <=>).

    This new operator has many special powers. For a start, it provides new functionality: a single call to the operator tells you whether one value is less than, greater than, or equal/equivalent to another, all in one return value. The compiler can now also synthesize the other relational operators from it (in an overridable way, of course). In fact, even the spaceship operator’s own implementation can be synthesized as a memberwise comparison, simply by using =default instead of writing it out explicitly. And even that’s not all, but it’s not the purpose of this article to be exhaustive. For a bit more depth, see Sy Brand’s introduction.

    More importantly, for now, how can our analyzer help? Well, currently we have three rules relating to the spaceship operator.

    To illustrate them, let’s say you have some existing code that implements an equality operator, like this:

    enum class Provenance { European, African }; // assumed definition, for illustration

    class Swallow {
       Provenance provenance = Provenance::European;
       int weight = 0;
    public:
       // As we’ll discuss: S6230 will be raised on the next line
       bool operator==( Swallow const& other ) const {
           return provenance == other.provenance && weight == other.weight;
       }
    };

    This will now trigger rule S6230, which tells us to ‘use “=default” instead of the default implementation for this comparison function’. We can address that by mostly deleting code (always a good change!):

    bool operator==( Swallow const& other ) const = default;

    Where’s the spaceship, you may ask? Well, this is one of the new powers we get alongside the spaceship itself: being able to synthesize a default implementation of a specific comparison function. We only had == defined before, so that is all we are advised to change.

    What if we had an implementation for <?

    bool operator<( Swallow const& other ) const { // S6187 raised here
       return provenance < other.provenance || 
              (provenance == other.provenance && weight < other.weight);
    }

    Ordering relationships are even more tedious to write, and to get right. We could =default that too, but at this point it’s a better recommendation to just implement the spaceship operator, so we get S6187, ‘define operator<=> and remove operators <, <=, >, >=’.

    auto operator<=>( const Swallow& ) const = default;

    Now, if we add that but leave in one or more of the other comparison operators with a default implementation, we’ll trigger S6186, ‘Keep operator<=>, and remove any operator <, <=, >, >=’, or ‘[..] remove defaulted operator ==’ (a defaulted operator<=> already implicitly declares a defaulted operator==). Mixing the operators adds complexity and a risk of the implementations diverging, so it’s a good idea to clean up the redundancy.

    Of course, if you provide a non-default implementation of a specific comparison, none of these rules will be triggered.

    Not Always Auto

    We’ve had generic lambdas since C++14. By using the auto keyword instead of parameter types we can make them act like templates. This is usually nicer than the explicit template syntax we’ve been stuck with for regular functions and methods. In fact, C++20 now lets us use auto for function template parameters too (abbreviated function templates, where auto acts as an unconstrained placeholder, part of the bigger concepts feature). Meanwhile, lambdas went the other way: you can now supply explicit template parameters there as well. So functions and lambdas have converged on the same options for template syntax. Nice for consistency, but is there ever a good reason to use the explicit syntax with lambdas?

    There are at least two, actually.

    First, where we need two or more parameters to have the same type. We might previously have written something like:

    auto l = []( auto a1, decltype(a1) a2, decltype(a1) a3 ) { /* .. */ };

    This will now trigger S6189, recommending that we ‘Replace "auto" with an explicit template parameter’, which leads to:

    auto l = []<typename T>( T a1, T a2, T a3 ) { /* .. */ };

    Definitely less clumsy.
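    The two versions are not quite equivalent, which is one more reason to prefer the explicit form. In the decltype version, only the first argument's type is deduced and the others are converted to it. With an explicit T, a mismatch is a deduction failure. A sketch (by_decltype and by_template are hypothetical names):

```cpp
#include <type_traits>

// Only a1's type is deduced; decltype(a1) is a non-deduced context, so the
// arguments for a2 and a3 are silently converted to that type
auto by_decltype = []( auto a1, decltype(a1) a2, decltype(a1) a3 ) { return a1 + a2 + a3; };

// All three arguments take part in deducing T, so their types must agree
auto by_template = []<typename T>( T a1, T a2, T a3 ) { return a1 + a2 + a3; };

// The double argument is accepted (and truncated) by the decltype version...
static_assert( std::is_invocable_v<decltype(by_decltype), int, double, int> );
// ...but rejected by the explicit template version (T can't be both int and double)
static_assert( !std::is_invocable_v<decltype(by_template), int, double, int> );
```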

    Secondly, the same thing applies if the type of an argument is needed within the body of the lambda:

    auto l = []( auto&& arg ) {  // S6189 raised here
       do_something( std::forward<decltype( arg )>( arg ));
    };

    This also triggers S6189, as it’s really just a variation on the same situation. But it’s worth calling out because this may be harder to spot at a glance - just the sort of thing you’d like to have a tool to find for you. Instead we can now write:

    auto l = []<typename T>( T&& arg ) {
       do_something( std::forward<T>( arg ));
    };
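    To check that the explicit T really does preserve the caller's value category, here is a runnable sketch. In it, two hypothetical do_something overloads report which one was chosen:

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical overloads standing in for do_something: they report which
// value category they received
std::string_view do_something( std::string const& ) { return "lvalue"; }
std::string_view do_something( std::string&& )      { return "rvalue"; }

// T is deduced as std::string& for lvalue arguments and as std::string for
// rvalues, so std::forward<T> passes the argument on as it was received
auto forward_it = []<typename T>( T&& arg ) {
    return do_something( std::forward<T>( arg ) );
};
```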

    Making things better, bit by bit

    So far we’ve looked at language features, but there are many interesting new library features in C++20 too. Many of them help us to better navigate undefined behavior. For example, it used to be common to reinterpret values by projecting the underlying bit pattern into a different type, using a reinterpret_cast or C-style cast, or by reading an alternate member of a union. The problem is that this is undefined behavior (except in a few limited cases): the type aliasing rules forbid it for the casts, and reading a union member other than the one last written isn't allowed either. In recent years compilers have increasingly relied on this undefined behavior to optimize more aggressively. So the recommendation became to memcpy the bits, e.g.:

    #include <cstdint>
    #include <cstring>

    float const src = 1.0f;
    std::uint32_t dst;

    static_assert( sizeof(float) == sizeof(std::uint32_t) );
    std::memcpy( &dst, &src, sizeof(float) ); // S6181 raised here

    There are a few things to remember and get right, and in the more general case you should also check that both types are trivially copyable. There are enough rough edges that it would be better to put that in a small library function (template). That’s what std::bit_cast is:

    #include <bit>
    #include <cstdint>

    float const src = 1.0f;
    auto dst = std::bit_cast<std::uint32_t>( src );

    Much tidier. And now S6181 looks for code that follows the memcpy pattern and suggests replacing it with std::bit_cast.

    Attention, span

    As programmers we don’t like to write more code than we have to. As C++ programmers we don’t like our code to do more work than it has to. In the following code we do both:

    bool looking_for_these( std::vector<Droid const*> const& droids ); // S6188 raised here
    
    void use_the_force() {
    
       std::vector<Droid*> droids = get_suspicious_droids();
    
       if( !looking_for_these(
          std::vector<Droid const*>{ droids.begin(), droids.end() } ) ) {
           std::cout << "these are not the droids you're looking for\n";
       }
    }

    To pass this vector of non-const pointers to a function that takes a vector of const pointers we previously had to take a copy of the vector. This is similar to the memcpy/bit_cast case - the types seem compatible, but the language rules don’t allow us to pass them directly.

    But now we have span, which handles several variations on this - and our analyzer can guide you through many of the transformations. First, S6188 suggests, on the first line, that we ‘replace this parameter with a more generic “std::span” object’:

    bool looking_for_these( std::span<Droid const* const> droids );

    But we’re still making that copy - now unnecessarily. So S6231 triggers, telling us to ‘Remove this redundant temporary object by constructing "std::span" directly’.

    if( !looking_for_these( droids ) ) {
       std::cout << "these are not the droids you're looking for\n";
    }

    This is the code we were looking for!

    And now the original function is more general, in that it can accept slices of the vector, or pointers held in other contiguous sequences, such as std::arrays.

    It starts_with cleaner code and ends_with better performance

    Along similar lines, we might previously have tested string prefixes and postfixes using code like this:

    std::string s = "long and winding road";
    
    if( s.substr( 0, 4 ) == "long" && // S6178 raised here
       s.size() > 4 && s.substr( s.size() - 4 ) == "road" ) {
       // ...
    }

    It does the job, but creates unnecessary temporary strings, and has several components that need to contain exactly the correct magic numbers (especially for the postfix test).

    Show this code to the analyzer and S6178 will remind us that we can now ‘use starts_with() to check the prefix of a string’ and ‘use ends_with() to check the postfix of a string’.

    if( s.starts_with("long") && s.ends_with("road") ) { /* .. */ }

    Easier, safer and more performant. A combination I particularly like, when I can achieve it!

    Cleaning up by erase-ing

    H.L. Mencken said that, “for every complex problem there is an answer that is clear, simple, and wrong”. Maybe he was reading the C++ standard when he wrote that.

    One such example is the interview question favorite: how do you remove elements from a vector? Using std::remove is only part of the answer (leading to another favorite saying: std::remove doesn’t remove!). The full answer involves passing the return value of std::remove to the container's erase member function, which is known as the erase-remove idiom. For example, to remove all empty strings from a vector of strings:

    v.erase(std::remove( v.begin(), v.end(), std::string{}), v.end());

    Two algorithms and three iterators to achieve something so fundamental!

    This will actually trigger two rules now. First: S6197 - ‘Replace with "std::ranges::remove"’ reduces two of the iterators to just the vector:

    v.erase(std::ranges::remove( v, std::string()).begin(), v.end());

    But the one you really want is to go straight to S6165, ‘Replace this erase-remove idiom with a "std::erase" call’. In fact that's triggered with the ranges version, too - so if you did go with S6197 first you should still end up in the right place:

    std::erase(v, std::string());

    It works for std::remove_if (to std::erase_if) too, and even recognizes hand-rolled removal loops like this:

    auto it = m.begin();
    while( it != m.end() ) {
       if( it->second == "bad" ) {
           it = m.erase( it );
       } else {
           ++it;
       }
    }

    Finishing on the midpoint

    The last C++20 example I’d like to share with you nicely combines two things: recognizing intent from a pattern of code, and a common but hard-to-spot pitfall that involves undefined behavior. Given two (signed) integers, a and b, if you wrote:

    auto m = (a + b) / 2;

    then your intention was probably to find the midpoint between them, and most of the time that is what you'll get. Unfortunately, this code may lead to signed integer overflow, which is undefined behavior in C++ (even in C++20, where signed integers are required to be two's complement)!

    A safer way to write it by hand is to split the difference, thereby avoiding a large intermediate value (although b - a can itself still overflow when a and b have large magnitudes and opposite signs):

    auto m = a + (b - a) / 2;

    Either way, this will now trigger S6179 - ‘Replace with "std::midpoint"’. std::midpoint usually performs the equivalent of the second form - but is always correct, and clearly conveys the intent.

    S6179 is also triggered on:

    auto i = a + (b - a) * 0.3f;

    this time encouraging us to ‘Replace with "std::lerp"’. std::lerp(a, b, t) computes the linear interpolation between two numeric values, a and b, for a coefficient t.

    Wrapping up

    There’s so much more I haven’t covered! I only briefly mentioned concepts, so it’s worth adding S6195, which encourages us to replace std::enable_if with a concept, a requires clause or if constexpr, as appropriate. Then we have rules for source_location (S6190), std::is_constant_evaluated (S6169), [[no_unique_address]] (S6226) and [[nodiscard]] with a reason (S6166), and many more, with plenty still in development. We also had to go back over many older rules to update them for C++20.

    Both C++20 and the C++ analysis in our products (SonarLint, SonarQube and SonarCloud) aim to make your coding easier and your code safer and more performant. Putting the two together is an unbeatable combination. This post offers a taste of what you can try today, and we’ll have more to share in the future, so do keep an eye on the blog.