GCC vs Clang vs ICC for 32-bit register manipulation

Today godbolt released a new compiler comparison feature, so I decided to quickly compare C++-style and C-style register manipulation and see what optimizations the different compilers apply.

Source code

I wrote a naive C++ class that represents a 32-bit register.

Note that I am testing how compilers optimize class member manipulation here, not making a case for embedded C++ (where the register would have to be volatile, since the order in which bits are set matters most of the time).

I then use it to set bits 0, 1, 3, 5, 13 and 30 of the register.
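The original listing is not reproduced here. As a rough idea, a naive class and the function that uses it could look like the minimal sketch below. The names Register32, setBit() and value(), and cppFunction() taking the register by reference, are my assumptions, not the original code. The value is a plain (non-volatile) member, matching the note above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a naive 32-bit register wrapper (not the original code).
// The value is a plain member, not volatile, so the compiler may merge writes.
class Register32 {
public:
    explicit Register32(std::uint32_t initial = 0) : value_(initial) {}

    // Set a single bit; bit must be in [0, 31].
    void setBit(unsigned bit) {
        assert(bit < 32);
        value_ |= std::uint32_t{1} << bit;
    }

    std::uint32_t value() const { return value_; }

private:
    std::uint32_t value_;
};

// C++-style version: set bits 0, 1, 3, 5, 13 and 30 one call at a time.
void cppFunction(Register32& reg) {
    reg.setBit(0);
    reg.setBit(1);
    reg.setBit(3);
    reg.setBit(5);
    reg.setBit(13);
    reg.setBit(30);
}
```

Since all six bit numbers are compile-time constants, nothing stops an optimizer from folding the six calls into a single OR.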

I also wrote the corresponding C implementation.
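A C-style counterpart might look like this sketch. It sticks to the common subset of C and C++, so it should also compile as C11; the pointer signature and the BIT() and CFUNCTION_MASK macros are my own hypothetical choices, not the original listing.

```cpp
#include <assert.h>  /* static_assert (C11 macro / C++ keyword) and assert() */
#include <stdint.h>  /* uint32_t, UINT32_C */

/* Hypothetical helper: a 32-bit mask with only bit n set. */
#define BIT(n) (UINT32_C(1) << (n))

/* All six bits combined into one mask at compile time. */
#define CFUNCTION_MASK (BIT(0) | BIT(1) | BIT(3) | BIT(5) | BIT(13) | BIT(30))

/* Compile-time check that the combined mask is what we expect. */
static_assert(CFUNCTION_MASK == UINT32_C(0x4000202B), "unexpected bit mask");

/* C-style version: a plain uint32_t reached through a pointer,
   updated with a single compound OR. */
void cFunction(uint32_t *reg) {
    *reg |= CFUNCTION_MASK;
}
```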

I also call both functions from a third one, Caller(), to see whether the compilers inline them.
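A self-contained sketch of such a caller is below. The compact Register32, cppFunction() and cFunction() definitions are hypothetical stand-ins included only so the block compiles on its own, and Caller()'s parameter list is my assumption; the original may have used different signatures.

```cpp
#include <cassert>  // assert(), for checking the results
#include <cstdint>

// Compact, hypothetical stand-in for the C++ register class.
struct Register32 {
    std::uint32_t value = 0;
    void setBit(unsigned bit) { value |= std::uint32_t{1} << bit; }
};

// C++-style version: six separate bit sets.
void cppFunction(Register32& reg) {
    reg.setBit(0);
    reg.setBit(1);
    reg.setBit(3);
    reg.setBit(5);
    reg.setBit(13);
    reg.setBit(30);
}

// C-style version: one OR with all six bits.
void cFunction(std::uint32_t* reg) {
    *reg |= (1u << 0) | (1u << 1) | (1u << 3) | (1u << 5) | (1u << 13) | (1u << 30);
}

// Does nothing but call the two functions; at -O3 both calls are
// candidates for inlining, which is what the comparison checks.
void Caller(Register32& cppReg, std::uint32_t* cReg) {
    cppFunction(cppReg);
    cFunction(cReg);
}
```

If both calls are inlined, Caller() should contain no call instructions at all, only the OR operations on its two arguments.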

The compilers

I am comparing three compilers:

  • gcc 6.2
  • clang 3.9.0
  • icc 17

All are using the following options: -O3 -mtune=native -march=native

I would expect the compilers to “see through” the C++ code and optimize it down to a single instruction.
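The reason a single instruction is enough: the six bit positions are distinct, so OR-ing their masks is the same as adding them, and the whole sequence of bit sets collapses into one compile-time constant:

```latex
2^{0} + 2^{1} + 2^{3} + 2^{5} + 2^{13} + 2^{30}
  = 1 + 2 + 8 + 32 + 8192 + 1073741824
  = 1073750059
  = \mathtt{0x4000202B}
```

A compiler that sees through the code only has to OR the register with that one 32-bit constant.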

Let’s see the assembly of GCC:

We see that cFunction() is optimized correctly: the register passed in rdi is set with a single ‘or’ instruction.

cppFunction(), on the other hand, is less than ideal: it uses BYTE PTR addressing three times instead of a single DWORD PTR access.
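A plausible explanation for the number three (my reading of the output, not verified against GCC internals): bits 0, 1, 3 and 5 live in byte 0, bit 13 in byte 1 and bit 30 in byte 3, so merging the bit sets per byte leaves three byte-sized ORs where one 32-bit OR would do. The sketch below, with hypothetical helpers byteMasks() and orByteWise() and assuming a little-endian target such as x86, computes those per-byte masks and applies them one byte at a time.

```cpp
#include <array>
#include <cassert>  // assert(), for checking the results
#include <cstddef>
#include <cstdint>
#include <cstring>

// Split a 32-bit OR mask into per-byte masks. Index i is the byte at
// address offset i on a little-endian target (bits 8*i .. 8*i+7).
std::array<std::uint8_t, 4> byteMasks(std::uint32_t mask) {
    std::array<std::uint8_t, 4> bytes{};
    for (std::size_t i = 0; i < 4; ++i) {
        bytes[i] = static_cast<std::uint8_t>(mask >> (8 * i));
    }
    return bytes;
}

// Apply the mask with byte-sized ORs, skipping zero bytes, in the spirit of a
// sequence of "or BYTE PTR [reg+i], imm8" instructions. Goes through a byte
// array via memcpy to stay well-defined; assumes a little-endian host.
std::uint32_t orByteWise(std::uint32_t reg, std::uint32_t mask) {
    unsigned char raw[4];
    std::memcpy(raw, &reg, sizeof reg);
    const auto masks = byteMasks(mask);
    for (std::size_t i = 0; i < 4; ++i) {
        if (masks[i] != 0) raw[i] |= masks[i];
    }
    std::memcpy(&reg, raw, sizeof reg);
    return reg;
}
```

For the mask 0x4000202B this gives byte masks 0x2B, 0x20, 0x00 and 0x40: three non-zero bytes, hence three byte-wide read-modify-write operations that produce the same value as one DWORD OR.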

Both functions are correctly inlined into Caller(), which does nothing but call them.

Now let’s look at Clang:

Wow, all three functions are optimized correctly… nothing could be done better.

What about the Intel compiler?


The conclusion

So it seems that GCC does a good job on the C code, but not so much on the C++ one.

Clang optimizes both the C and the C++ version perfectly.

And Intel… well that’s just embarrassing.

You can play with the godbolt comparison here.