Breaking News
Loading...
Tuesday 19 July 2011

Want speed? Use constexpr meta-programming!

17:33
It's official: C++11 has two meta-programming languages embedded in it! One is based on templates and other one using constexpr. Templates have been extensively used for meta-programming in C++03. C++11 now gives you one more option of writing compile-time meta-programs using constexpr. The capabilities differ, however.

The meta-programming language that uses templates was discovered accidently and since then countless techniques have been developed. It is a pure functional language which allows you to manipulate compile-time integral literals and types but not floating point literals. Most people find the syntax of template meta-programming quite abominable because meta-functions must be implemented as structures and nested typedefs. Compile-time performance is also a pain point for this language feature.

The generalized constant expressions (constexpr for short) feature allows C++11 compiler to peek into the implementation of a function (even classes) and perform optimizations if the function uses constants (literals) only. Constants can be integral, floating point, as well as string literals. The signature of constexpr functions is just like regular C++ functions but the body has several restrictions, such as only one return statement is allowed. Nevertheless, the syntax of constexpr functions is significantly friendlier than that of template-based meta-functions. Contrary to the design of templates, designers of generalized constant expressions are well-aware of its meta-programming capabilities.

In my view, the most interesting aspect of constexpr is its speed. constexpr functions can perform compile-time computations at lightening speed. To compare the performance I implemented an is_prime algorithm in 3 different ways. Here is the algorithm in regular C++:


static bool IsPrime(size_t number)
{
if (number <= 1)
return false;

for (size_t i = 2; i*i <= number; ++i)
if (number % i == 0)
return false;

return true;
}


Here is the a template version of the same algorithm. Obviously, it is implemented as a collection of meta-functions in terms of structures.


struct false_type
{
typedef false_type type;
enum { value = 0 };
};

struct true_type
{
typedef true_type type;
enum { value = 1 };
};

template<bool condition, class T, class U>
struct if_
{
typedef U type;
};

template <class T, class U>
struct if_<true, T, U>
{
typedef T type;
};

template<size_t N, size_t c>
struct is_prime_impl
{
typedef typename if_<(c*c > N),
true_type,
typename if_<(N % c == 0),
false_type,
is_prime_impl<N, c+1> >::type >::type type;
enum { value = type::value };
};

template<size_t N>
struct is_prime
{
enum { value = is_prime_impl<N, 2>::type::value };
};

template <>
struct is_prime<0>
{
enum { value = 0 };
};

template <>
struct is_prime<1>
{
enum { value = 0 };
};


The above meta-program is a recursive implementation because that's how pure functional languages work. The is_prime_impl meta-function does all the heavy lifting. To prevent infinite regression, lazy instantiation technique is used. All I can say at this point is you've to stare at the code to make sense out of it.

And that's precisely the point: C++03 meta-programs are often unreadable and hard to comprehend. Not anymore! Here is the constexpr version of the same algorithm.


constexpr bool is_prime_recursive(size_t number, size_t c)
{
return (c*c > number) ? true :
(number % c == 0) ? false :
is_prime_recursive(number, c+1);
}

constexpr bool is_prime_func(size_t number)
{
return (number <= 1) ? false : is_prime_recursive(number, 2);
}


Well, I agree, although this version uses friendlier syntax, is not quite easy to read. Just like the previous one, it is recursive. I would say just get used to it! You can't live without recursion in C++ meta-programming world. Secondly, every function has a single return statement and the codifying logic for detecting primality requires abundant use of the ternary operator.

As long as parameter number is an integral constant, this constexpr version will compute the result at compile-time (C++11 compilers only). And when the number is a run-time integer, this same function is perfectly capable of computing the result at run-time. So you don't need two different versions of the same program: one for compile-time and another for run-time. One implementation does it all!


int main(void)
{
static_assert(is_prime_func(7), "..."); // Computed at compile-time
int i = 11;
int j = is_prime_func(i)); // Computed at run-time
}


Now that we have the implementations, lets talk performance. Take 4256233. This is 30,000th prime number! How long does the template version take to check if it is prime? It cannot! Yes, g++ 4.7 fails to compile is_prime<4256233>::value because the computation exceeds the default (900) limit of the recursive template instantiations. So I used -ftemplate-depth-2100 allowing 2100 recursive instantiations! Does it work? Yes! How long does it take? About 1 second! That's not bad at all, you say. How fast is constexpr? About 0.154 seconds!

Seems like templates are not too far behind. Wrong! How about fifth million prime number! It is 86028121. I could not get it to work with g++ 4.7. Somewhere in the range of 9300 recursive instantiations g++ seg-faults. That's brutal! isn't it? 9000+ recursive instantiations? There has to be a better way.

So how long does the constexpr take for our fifth million prime number? 0.220 seconds! That's right! This large prime number makes almost no impact on constexpr compilation time. What could be the reason? There are no template instantiations. C++ compilers always did compile-time computations whenever it was possible. constexpr now allows user-defined abstractions to hide those computations behind functions and yet allow compilers to see through them.
By the way, I had to increase to depth of default allowed constexpr recursion using -fconstexpr-depth=9300. The compiler bails out after this limit and resorts to run-time computation. Remember, the function is perfectly good for run-time computation as well (unlike templates version).

I hope this little exercise will convince you that constexpr is the way to go for static meta-programming in C++ as long as you are dealing with literals (integral, floating point numbers, and strings). It is not clear whether constexpr functions are suitable for manipulating complex meta-programmable types, such as mpl::vector and related meta-functions, such as mpl::contains, mpl::push_back, and so on. If you are interested in playing with the above example more, here is the code: is_prime.cpp and prime-test.sh

0 comments:

Post a Comment

 
Toggle Footer