Do you want to speed up your C++ application up to 4 times? Read on.
Recently I happened to be on a summer holiday with a friend who is a C++ guru. We discussed my language performance benchmark post, which had just received a lot of attention in regards to C++. It took him a few hours to demonstrate something really important and often underestimated by developers — that we need to allocate only as much memory as needed, and nothing more.
The C++ algorithm which we use at the benchmark blog page uses an array of “int” (which on Linux also happens to be 4 bytes == 32 bits). However, if you review the code you’ll notice that the “int” value is used as a pure boolean variable for the given array index. So my friend proposed the following optimized version, which uses 1 bit of memory for the boolean value and thus doesn’t waste the other 31-bits left if we used “int” instead:
--- primes.cpp 2013-06-17 17:16:09.000000000 +0300 +++ primes-bool.cpp 2013-06-17 18:43:28.000000000 +0300 @@ -12,9 +12,9 @@ res.push_back(2); return res; } - vector<int> s; + vector<bool> s; for (int i = 3; i < n + 1; i += 2) { - s.push_back(i); + s.push_back(true); } int mroot = sqrt(n); int half = (int)s.size(); @@ -23,9 +23,9 @@ while (m <= mroot) { if (s[i]) { int j = (int)((m*m - 3)/2); - s[j] = 0; + s[j] = false; while (j < half) { - s[j] = 0; + s[j] = false; j += m; } } @@ -33,9 +33,9 @@ m = 2*i + 3; } res.push_back(2); - for (vector<int>::iterator it = s.begin() ; it < s.end(); ++it) { - if (*it) { - res.push_back(*it); + for (size_t it = 0; it < s.size(); ++it) { + if (s[it]) { + res.push_back(2*it + 3); } }
The end result is that the original C++ implementation finishes for 18.913 CPU seconds, while the memory-optimized completes in 5.354 seconds (28%). Impressive!
The bottom line is that even RAM is very fast and memory allocation is efficient, using only as few memory as really needed could vastly improve the performance of your C++ application. Which applies for programs developed in any language.