Do you want to speed up your C++ application up to 4 times? Read on.
Recently I happened to be on a summer holiday with a friend who is a C++ guru. We discussed my language performance benchmark post, which had just received a lot of attention in regards to C++. It took him a few hours to demonstrate something really important and often underestimated by developers — that we need to allocate only as much memory as needed, and nothing more.
The C++ algorithm which we use at the benchmark blog page uses an array of “int” (which on Linux also happens to be 4 bytes == 32 bits). However, if you review the code you’ll notice that the “int” value is used as a pure boolean variable for the given array index. So my friend proposed the following optimized version, which uses 1 bit of memory for the boolean value and thus doesn’t waste the other 31-bits left if we used “int” instead:
--- primes.cpp 2013-06-17 17:16:09.000000000 +0300
+++ primes-bool.cpp 2013-06-17 18:43:28.000000000 +0300
@@ -12,9 +12,9 @@
res.push_back(2);
return res;
}
- vector<int> s;
+ vector<bool> s;
for (int i = 3; i < n + 1; i += 2) {
- s.push_back(i);
+ s.push_back(true);
}
int mroot = sqrt(n);
int half = (int)s.size();
@@ -23,9 +23,9 @@
while (m <= mroot) {
if (s[i]) {
int j = (int)((m*m - 3)/2);
- s[j] = 0;
+ s[j] = false;
while (j < half) {
- s[j] = 0;
+ s[j] = false;
j += m;
}
}
@@ -33,9 +33,9 @@
m = 2*i + 3;
}
res.push_back(2);
- for (vector<int>::iterator it = s.begin() ; it < s.end(); ++it) {
- if (*it) {
- res.push_back(*it);
+ for (size_t it = 0; it < s.size(); ++it) {
+ if (s[it]) {
+ res.push_back(2*it + 3);
}
}
The end result is that the original C++ implementation finishes for 18.913 CPU seconds, while the memory-optimized completes in 5.354 seconds (28%). Impressive!
The bottom line is that even RAM is very fast and memory allocation is efficient, using only as few memory as really needed could vastly improve the performance of your C++ application. Which applies for programs developed in any language.