<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for /contrib/famzah</title>
	<atom:link href="http://blog.famzah.net/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.famzah.net</link>
	<description>Enthusiasm never stops</description>
	<lastBuildDate>Tue, 18 Jun 2013 14:56:24 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>Comment on C++ vs. Python vs. Perl vs. PHP performance benchmark by Ivan Zahariev</title>
		<link>http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/#comment-5405</link>
		<dc:creator><![CDATA[Ivan Zahariev]]></dc:creator>
		<pubDate>Tue, 18 Jun 2013 14:56:24 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=732#comment-5405</guid>
		<description><![CDATA[Hey, all this was very enlightening. I&#039;ve done some tests with GCC, and here are the speed-up numbers compared to my original implementation (each case also includes all the previous ones):
* pass by reference: 4,4%
* static_cast: 4,2% (no speed-up, some statistical error)
* multiply instead of divide: 4,6% (no speed-up)
* use std::remove() and insert() instead of a loop: 5,3% (some more speed-up)
* hint the vector size with reserve(): 13,0% (much faster)
* pre-allocate memory with resize(): 16,1% (the fastest)

By combining all your suggestions, I got at most a 16% speed-up. Great!

I had a good talk with another C++ guru, and he also suggested that we should always hint the expected memory usage with reserve(), and use resize() when possible, which is exactly what you said. This is good practice, but I won&#039;t include it in the blog results, mainly because it is somewhat specific to the algorithm used, and we are benchmarking a completely generic use case. I&#039;ll add a note about this on the page, though.

Passing by reference is by far the most obvious optimization, and every C/C++ developer must use it whenever possible. I&#039;ve updated my source code, even though this feature is not present in the scripting languages.]]></description>
		<content:encoded><![CDATA[<p>Hey, all this was very enlightening. I&#8217;ve done some tests with GCC, and here are the speed-up numbers compared to my original implementation (each case also includes all the previous ones):<br />
* pass by reference: 4,4%<br />
* static_cast: 4,2% (no speed-up, some statistical error)<br />
* multiply instead of divide: 4,6% (no speed-up)<br />
* use std::remove() and insert() instead of a loop: 5,3% (some more speed-up)<br />
* hint the vector size with reserve(): 13,0% (much faster)<br />
* pre-allocate memory with resize(): 16,1% (the fastest)</p>
<p>By combining all your suggestions, I got at most a 16% speed-up. Great!</p>
<p>I had a good talk with another C++ guru, and he also suggested that we should always hint the expected memory usage with reserve(), and use resize() when possible, which is exactly what you said. This is good practice, but I won&#8217;t include it in the blog results, mainly because it is somewhat specific to the algorithm used, and we are benchmarking a completely generic use case. I&#8217;ll add a note about this on the page, though.</p>
<p>Passing by reference is by far the most obvious optimization, and every C/C++ developer must use it whenever possible. I&#8217;ve updated my source code, even though this feature is not present in the scripting languages.</p>
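<p>A minimal sketch of the push_back(), reserve() and resize() variants from the list above, to show the difference in isolation. It is not the benchmark code: fill_push_back(), fill_reserve(), fill_resize() and append_odds() are hypothetical names, and a reallocation is detected simply by watching capacity() change. Only the resize() variant writes with operator [], because the elements must already exist; reserve() alone adds capacity, not size.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends 3, 5, ..., n with push_back() and counts how
// many times the vector moved to a bigger buffer (i.e. capacity changed).
int append_odds(std::vector<int>& v, int n) {
    int reallocs = 0;
    for (int i = 3; i <= n; i += 2) {
        std::size_t old_cap = v.capacity();
        v.push_back(i);
        if (v.capacity() != old_cap) ++reallocs;
    }
    return reallocs;
}

// No size hint: the vector grows on demand, reallocating several times.
std::vector<int> fill_push_back(int n, int& reallocs) {
    std::vector<int> v;
    reallocs = append_odds(v, n);
    return v;
}

// reserve() first: one allocation up front, so push_back() never
// reallocates while the size stays within the reserved capacity.
std::vector<int> fill_reserve(int n, int& reallocs) {
    std::vector<int> v;
    if (n >= 3) {
        v.reserve(static_cast<std::size_t>(n / 2));  // n / 2 >= count of odd values in [3, n]
        assert(v.capacity() >= static_cast<std::size_t>(n / 2));
    }
    reallocs = append_odds(v, n);
    return v;
}

// resize() first: the elements exist, so operator[] may write them directly;
// the final resize(j) only shrinks the size, not the capacity.
std::vector<int> fill_resize(int n) {
    std::vector<int> v;
    if (n < 3) return v;
    v.resize(static_cast<std::size_t>(n / 2));
    std::size_t j = 0;
    for (int i = 3; i <= n; i += 2, ++j) v[j] = i;
    v.resize(j);
    return v;
}
```

<p>The exact reallocation count reported by fill_push_back() is implementation-defined, because the growth factor is not fixed by the standard; fill_reserve() should report 0 as long as the reserved capacity is large enough.</p>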
]]></content:encoded>
	</item>
	<item>
		<title>Comment on C++ vs. Python vs. Perl vs. PHP performance benchmark by viniciusvmb</title>
		<link>http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/#comment-5353</link>
		<dc:creator><![CDATA[viniciusvmb]]></dc:creator>
		<pubDate>Wed, 12 Jun 2013 17:50:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=732#comment-5353</guid>
		<description><![CDATA[But look, about 10% of the time is lost in allocation:
real	0m0.700s
sys	0m0.088s

That is the 10% you gain by pre-allocation, as I said :)
Best]]></description>
		<content:encoded><![CDATA[<p>But look, about 10% of the time is lost in allocation:<br />
real	0m0.700s<br />
sys	0m0.088s</p>
<p>That is the 10% you gain by pre-allocation, as I said <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
Best</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on C++ vs. Python vs. Perl vs. PHP performance benchmark by Vinicius Miranda</title>
		<link>http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/#comment-5352</link>
		<dc:creator><![CDATA[Vinicius Miranda]]></dc:creator>
		<pubDate>Wed, 12 Jun 2013 17:48:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=732#comment-5352</guid>
		<description><![CDATA[Here is my code with a second vector. It is already much faster than your implementation.

[sourcecode language=&quot;cpp&quot; collapse=&quot;true&quot;]
#include&lt;vector&gt;
#include&lt;algorithm&gt;
#include&lt;iostream&gt;
#include&lt;cmath&gt;

using namespace std;

void get_primes7( int n, std::vector&lt;int&gt;&amp; res ) {

    if (n &lt; 2) {
        res.resize(0);
        return;
    }

    if (n == 2) {
        res.resize(1);
        res[0] = 2;
        return;
    }
    
    res.resize(0);   
    std::vector&lt;int&gt; s;
    size_t j = 0;
	
    s.resize( static_cast&lt;int&gt;(n/2.0) );
    for (int i = 3; i &lt; n + 1; i += 2, ++j) s[j] = i;
    s.resize(j);
        
    /* Alternatively: */
    //s.reserve( static_cast&lt;int&gt;(n/2.0)  );
    //for (int i = 3; i &lt; n + 1; i += 2) s.push_back(i);

    int mroot = sqrt(n);
    int half = static_cast&lt;int&gt;(s.size());
    int i = 0;
    int m = 3;

    while (m &lt;= mroot) {
        if (s[i]) {
            int j = static_cast&lt;int&gt;(0.5*(m*m - 3));
            s[j] = 0;
            while (j &lt; half) {
                s[j] = 0;
                j += m;
            }
        }
        i++;
        m = 2*i + 3;
    }

    res.push_back(2);
    std::vector&lt;int&gt;::iterator pend = std::remove (s.begin(), s.end(), 0);
    res.insert(res.begin()+1, s.begin(), pend);               
}

int main() {
    std::vector&lt;int&gt; res;
    for (int i = 1; i &lt;= 10; ++i) {
        get_primes7(10000000, res);       
        std::cout &lt;&lt; &quot;Number of primes &quot; &lt;&lt; res.size() &lt;&lt; &quot;\n&quot;;
    }
    return 0;
} 
[/sourcecode]]]></description>
		<content:encoded><![CDATA[<p>Here is my code with a second vector. It is already much faster than your implementation.</p>
<pre class="brush: cpp; collapse: true; light: false; title: ; toolbar: true; notranslate">
#include&lt;vector&gt;
#include&lt;algorithm&gt;
#include&lt;iostream&gt;
#include&lt;cmath&gt;

using namespace std;

void get_primes7( int n, std::vector&lt;int&gt;&amp; res ) {

    if (n &lt; 2) {
        res.resize(0);
        return;
    }

    if (n == 2) {
        res.resize(1);
        res[0] = 2;
        return;
    }
    
    res.resize(0);   
    std::vector&lt;int&gt; s;
    size_t j = 0;
	
    s.resize( static_cast&lt;int&gt;(n/2.0) );
    for (int i = 3; i &lt; n + 1; i += 2, ++j) s[j] = i;
    s.resize(j);
        
    /* Alternatively: */
    //s.reserve( static_cast&lt;int&gt;(n/2.0)  );
    //for (int i = 3; i &lt; n + 1; i += 2) s.push_back(i);

    int mroot = sqrt(n);
    int half = static_cast&lt;int&gt;(s.size());
    int i = 0;
    int m = 3;

    while (m &lt;= mroot) {
        if (s[i]) {
            int j = static_cast&lt;int&gt;(0.5*(m*m - 3));
            s[j] = 0;
            while (j &lt; half) {
                s[j] = 0;
                j += m;
            }
        }
        i++;
        m = 2*i + 3;
    }

    res.push_back(2);
    std::vector&lt;int&gt;::iterator pend = std::remove (s.begin(), s.end(), 0);
    res.insert(res.begin()+1, s.begin(), pend);               
}

int main() {
    std::vector&lt;int&gt; res;
    for (int i = 1; i &lt;= 10; ++i) {
        get_primes7(10000000, res);       
        std::cout &lt;&lt; &quot;Number of primes &quot; &lt;&lt; res.size() &lt;&lt; &quot;\n&quot;;
    }
    return 0;
} 
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on C++ vs. Python vs. Perl vs. PHP performance benchmark by Vinicius Miranda</title>
		<link>http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/#comment-5351</link>
		<dc:creator><![CDATA[Vinicius Miranda]]></dc:creator>
		<pubDate>Wed, 12 Jun 2013 17:34:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=732#comment-5351</guid>
		<description><![CDATA[Hi.

Nice reply, thanks. I understand your point. However, even if you want a second vector, I made some crucial changes to that implementation that speed up your code a lot (30%). Actually, pre-allocating the answer made the code only 10% faster compared to the things I will show in this message.

&quot;2. Operator [] is faster than push_back() — ditto. However, this requires an array with a predefined size, correct?&quot;. 

That is the most important optimization!!! The difference between push_back and operator [] can be huge for two reasons. First, push_back does SEVERAL reallocations: you started from size 0, and every time the vector runs out of space it allocates a bigger buffer (with GCC&#039;s libstdc++, the capacity doubles each time), so the vector needs ~log2(10000000) reallocations to reach its final size. Resize, on the other hand, will allocate memory only once. Second, push_back does more work per call than operator [], because it has to check the capacity and update the size every time. To avoid the first problem, you could reserve memory for s first using the .reserve() member function; then you would only have the push_back overhead. Another option, if you don&#039;t know how many iterations you will have (in your case it was easy to calculate, but pretend it is not): you can resize the vector to a bigger size first and then shrink it to the right size after the loop, as I did. Shrinking a vector with resize() does not change its capacity; it only moves .end().

About point 3: I am not sure here, but I saw places in my own research code where it made a huge difference. A division can take up to about 30 cycles, while a multiplication can be done in about 1.

Another point: you were hand-coding things that are better optimized in the standard library. For example, if you really want a second vector, the last step could be something like this:

[sourcecode language=&quot;cpp&quot;]
res.push_back(2);
std::vector&lt;int&gt;::iterator pend = std::remove (s.begin(), s.end(), 0);
res.insert(res.begin()+1, s.begin(), pend);
[/sourcecode]

Insert is much faster than a loop with push_back because, as I said before, the latter will do ~log2(10000000) reallocations. Insert, on the other hand, will allocate only once. But even if you reserve memory first, insert is faster than a hand-coded loop. The moral is: the standard library is always better!

Last, do not pass a vector by value. It is rule 1 of C++. Create a second vector, OK. But pass the first one as a reference argument and resize it to zero to reset it at each call (in the first line of the function you resize it to zero; resizing to zero keeps its old capacity!)

If you follow this, your code will speed up by 25%-30% even with a second vector! (I tested this.)

Last comment: I completely understand your points, but to be fair to C++, you could add a note that it is still possible to speed up the code using pre-allocation (another 10% on top of the 30% you gain by following the points in this comment). I say this because the other languages you tested don&#039;t offer this choice, and this is a great advantage of C++. Java is great because of its libraries, and Python is great because it is easy and has awesome libraries. C++ is great because it allows you to control low-level things like memory. If you code C++ like you code Java, then you may think that Java is almost as fast as C++, and that is really not the case. Or you may think that C is much faster, which is also not the case. When you are comparing languages, it is important to try to get the best out of each one. And note that I did not write any hardcore template code. I just changed your implementation a little bit.]]></description>
		<content:encoded><![CDATA[<p>Hi.</p>
<p>Nice reply, thanks. I understand your point. However, even if you want a second vector, I made some crucial changes to that implementation that speed up your code a lot (30%). Actually, pre-allocating the answer made the code only 10% faster compared to the things I will show in this message.</p>
<p>&#8220;2. Operator [] is faster than push_back() — ditto. However, this requires an array with a predefined size, correct?&#8221;. </p>
<p>That is the most important optimization!!! The difference between push_back and operator [] can be huge for two reasons. First, push_back does SEVERAL reallocations: you started from size 0, and every time the vector runs out of space it allocates a bigger buffer (with GCC&#8217;s libstdc++, the capacity doubles each time), so the vector needs ~log2(10000000) reallocations to reach its final size. Resize, on the other hand, will allocate memory only once. Second, push_back does more work per call than operator [], because it has to check the capacity and update the size every time. To avoid the first problem, you could reserve memory for s first using the .reserve() member function; then you would only have the push_back overhead. Another option, if you don&#8217;t know how many iterations you will have (in your case it was easy to calculate, but pretend it is not): you can resize the vector to a bigger size first and then shrink it to the right size after the loop, as I did. Shrinking a vector with resize() does not change its capacity; it only moves .end().</p>
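<p>A sketch of the resize-then-shrink option for a loop whose result size is only known at the end. The function name keep_not_divisible_by_3() and its filter are hypothetical, chosen only to make the output size data-dependent. The assert checks the claim that shrinking with resize() leaves capacity() unchanged.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: collect the odd numbers in [3, n] that are not
// multiples of 3. The result size is unknown until the loop finishes.
std::vector<int> keep_not_divisible_by_3(int n) {
    std::vector<int> out;
    if (n < 3) return out;
    out.resize(static_cast<std::size_t>(n / 2));  // upper bound: one slot per odd number
    std::size_t j = 0;
    for (int i = 3; i <= n; i += 2) {
        if (i % 3 != 0) out[j++] = i;  // operator[] writes into existing elements
    }
    const std::size_t cap_before = out.capacity();
    out.resize(j);                     // shrink: the size drops, the buffer is kept
    assert(out.capacity() == cap_before);
    return out;
}
```

<p>If the unused capacity matters more than speed, shrink_to_fit() (C++11) can be called afterwards, but it is only a non-binding request to release memory.</p>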
<p>About point 3: I am not sure here, but I saw places in my own research code where it made a huge difference. A division can take up to about 30 cycles, while a multiplication can be done in about 1.</p>
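<p>A small check for point 3: the two ways of computing the start index of the crossing-out loop, (m*m - 3)/2 and 0.5*(m*m - 3), always agree for odd m, because m*m - 3 is then even. The helper names are hypothetical, and the sketch checks equality only; it does not measure speed. For an integer division by the constant 2, optimizing compilers usually emit a shift (plus a small sign correction for signed values) rather than a divide instruction.</p>

```cpp
#include <cassert>

// Hypothetical helpers: start index of the crossing-out loop for an odd m >= 3,
// where index k in the candidate array stands for the odd number 2*k + 3.
int start_index_div(int m) {
    assert(m >= 3 && m % 2 == 1);
    return (m * m - 3) / 2;                       // integer division by the constant 2
}

int start_index_mul(int m) {
    assert(m >= 3 && m % 2 == 1);
    return static_cast<int>(0.5 * (m * m - 3));   // multiply as double, truncate back to int
}

// Returns true if both forms agree for every odd m in [3, max_m].
bool forms_agree(int max_m) {
    for (int m = 3; m <= max_m; m += 2) {
        if (start_index_div(m) != start_index_mul(m)) return false;
    }
    return true;
}
```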
<p>Another point: you were hand-coding things that are better optimized in the standard library. For example, if you really want a second vector, the last step could be something like this:</p>
<pre class="brush: cpp; title: ; notranslate">
res.push_back(2);
std::vector&lt;int&gt;::iterator pend = std::remove (s.begin(), s.end(), 0);
res.insert(res.begin()+1, s.begin(), pend);
</pre>
<p>Insert is much faster than a loop with push_back because, as I said before, the latter will do ~log2(10000000) reallocations. Insert, on the other hand, will allocate only once. But even if you reserve memory first, insert is faster than a hand-coded loop. The moral is: the standard library is always better!</p>
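<p>For reference, here is the remove-and-insert step as a complete function that compiles on its own. It is a sketch: collect_primes() is a hypothetical name, and s is assumed to hold the odd candidates 3, 5, 7, ... with composites already overwritten by 0, as in the sieve. The iterator type needs the element type spelled out, std::vector&lt;int&gt;::iterator.</p>

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: s holds the sieve candidates 3, 5, 7, ... where
// composites were set to 0. Writes 2 followed by the surviving primes into res.
void collect_primes(std::vector<int>& s, std::vector<int>& res) {
    res.resize(0);      // reuse the caller's buffer; capacity is kept
    res.push_back(2);
    // std::remove moves the non-zero values to the front (keeping their order)
    // and returns the new logical end; s.size() itself does not change.
    std::vector<int>::iterator pend = std::remove(s.begin(), s.end(), 0);
    // One range insert instead of a push_back() loop: with random-access
    // iterators the library can compute the count and grow the buffer once.
    res.insert(res.begin() + 1, s.begin(), pend);
}
```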
<p>Last, do not pass a vector by value. It is rule 1 of C++. Create a second vector, OK. But pass the first one as a reference argument and resize it to zero to reset it at each call (in the first line of the function you resize it to zero; resizing to zero keeps its old capacity!)</p>
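<p>The pass-by-reference pattern can be sketched as below, assuming a hypothetical function odds_into() that refills the caller&#8217;s vector on each call. Because resize(0) keeps the capacity, the second call writes into the same buffer instead of allocating a new one; passing the vector by value would instead copy it, which allocates a new buffer on every call.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: fills the caller's vector with 3, 5, ..., n.
// res is taken by reference, so its heap buffer survives between calls.
void odds_into(int n, std::vector<int>& res) {
    res.resize(0);                     // size becomes 0; capacity is unchanged
    for (int i = 3; i <= n; i += 2) {
        res.push_back(i);              // no reallocation while the new size fits in capacity
    }
}

// Calls odds_into() twice and reports whether the second call reused the
// buffer from the first (same data() pointer, same capacity).
bool reuses_buffer(int n) {
    std::vector<int> res;
    odds_into(n, res);
    const int* first = res.data();
    const std::size_t cap = res.capacity();
    odds_into(n, res);
    return res.data() == first && res.capacity() == cap;
}
```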
<p>If you follow this, your code will speed up by 25%-30% even with a second vector! (I tested this.)</p>
<p>Last comment: I completely understand your points, but to be fair to C++, you could add a note that it is still possible to speed up the code using pre-allocation (another 10% on top of the 30% you gain by following the points in this comment). I say this because the other languages you tested don&#8217;t offer this choice, and this is a great advantage of C++. Java is great because of its libraries, and Python is great because it is easy and has awesome libraries. C++ is great because it allows you to control low-level things like memory. If you code C++ like you code Java, then you may think that Java is almost as fast as C++, and that is really not the case. Or you may think that C is much faster, which is also not the case. When you are comparing languages, it is important to try to get the best out of each one. And note that I did not write any hardcore template code. I just changed your implementation a little bit.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on C++ vs. Python vs. Perl vs. PHP performance benchmark by Ivan Zahariev</title>
		<link>http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/#comment-5348</link>
		<dc:creator><![CDATA[Ivan Zahariev]]></dc:creator>
		<pubDate>Wed, 12 Jun 2013 10:11:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=732#comment-5348</guid>
		<description><![CDATA[Hi, thanks for the comment and source code. You have, however, changed the algorithm, which makes this source code not comparable to the others. I&#039;ve already had such a &lt;a href=&quot;http://www.famzah.net/download/langs-performance/java-discussion-by-Isaac-Gouy/&quot; rel=&quot;nofollow&quot;&gt;discussion about Java&lt;/a&gt;; please take a look at it, especially my first reply. I&#039;m quoting the first paragraph here:

[begin quote]
This is a fair competition with fair rules -- every language uses the very same algorithm. I know that the algorithm can be adjusted/optimized specifically for each language, but that&#039;s not the point. The idea is, if we used another algorithm which *did* require that we use dynamic arrays everywhere, to compare how each language does the job in regards to performance. Hence I didn&#039;t try to think if I can optimize the algorithm in any language (not that I can for all of them). I tried to explain that by &quot;The correctness of the implementation is not so important, as we just want to check how fast the languages perform&quot;. Which in plain English means -- we try to use the very same usage pattern, as if the task really required it. So let&#039;s pretend that we really needed the boxing/unboxing, just as we did for the other languages -- all languages do the same algorithm complexity using the same (dynamic) structures.
[end quote]

What I find incompatible in your implementation, compared to the other languages:
1. Let&#039;s pretend that we needed the second vector -- just as we have it in all other implementations. If we weren&#039;t implementing the Sieve of Eratosthenes algorithm, we may have needed it for something else. The idea here is to see how languages operate with two different arrays.
2. Pre-allocation of memory/heap/array-size is not what we were after. As already mentioned, we want to see how each language manages to allocate memory in a dynamic way.

The other proposed changes are good points:
1. static_cast() -- if that&#039;s faster, we should use it indeed.
2. Operator [] is faster than push_back() -- ditto. However, this requires an array with a predefined size, correct?
3. Replace division /2 by mult. 0.5 -- does this really make any difference?

In a nutshell, we could use the static_cast() and the division trick, in order to speed up the original C++ algorithm, and at the same time to keep it comparable with other languages, right?]]></description>
		<content:encoded><![CDATA[<p>Hi, thanks for the comment and source code. You have, however, changed the algorithm, which makes this source code not comparable to the others. I&#8217;ve already had such a <a href="http://www.famzah.net/download/langs-performance/java-discussion-by-Isaac-Gouy/" rel="nofollow">discussion about Java</a>; please take a look at it, especially my first reply. I&#8217;m quoting the first paragraph here:</p>
<p>[begin quote]<br />
This is a fair competition with fair rules &#8212; every language uses the very same algorithm. I know that the algorithm can be adjusted/optimized specifically for each language, but that&#8217;s not the point. The idea is, if we used another algorithm which *did* require that we use dynamic arrays everywhere, to compare how each language does the job in regards to performance. Hence I didn&#8217;t try to think if I can optimize the algorithm in any language (not that I can for all of them). I tried to explain that by &#8220;The correctness of the implementation is not so important, as we just want to check how fast the languages perform&#8221;. Which in plain English means &#8212; we try to use the very same usage pattern, as if the task really required it. So let&#8217;s pretend that we really needed the boxing/unboxing, just as we did for the other languages &#8212; all languages do the same algorithm complexity using the same (dynamic) structures.<br />
[end quote]</p>
<p>What I find incompatible in your implementation, compared to the other languages:<br />
1. Let&#8217;s pretend that we needed the second vector &#8212; just as we have it in all other implementations. If we weren&#8217;t implementing the Sieve of Eratosthenes algorithm, we may have needed it for something else. The idea here is to see how languages operate with two different arrays.<br />
2. Pre-allocation of memory/heap/array-size is not what we were after. As already mentioned, we want to see how each language manages to allocate memory in a dynamic way.</p>
<p>The other proposed changes are good points:<br />
1. static_cast() &#8212; if that&#8217;s faster, we should use it indeed.<br />
2. Operator [] is faster than push_back() &#8212; ditto. However, this requires an array with a predefined size, correct?<br />
3. Replace division /2 by mult. 0.5 &#8212; does this really make any difference?</p>
<p>In a nutshell, we could use the static_cast() and the division trick, in order to speed up the original C++ algorithm, and at the same time to keep it comparable with other languages, right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on C++ vs. Python vs. Perl vs. PHP performance benchmark by Vinicius Miranda</title>
		<link>http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/#comment-5347</link>
		<dc:creator><![CDATA[Vinicius Miranda]]></dc:creator>
		<pubDate>Wed, 12 Jun 2013 08:37:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=732#comment-5347</guid>
		<description><![CDATA[Hello, 
I just want to say that your C++ code is really not the way C++ coders implement algorithms. To show that, I made some really small changes to your code and was able to reduce the runtime by 40% (I use the Intel compiler; this should be checked with GCC)!! Remember: one of the BIGGEST advantages of C++ compared to Java (and also Python) is that you can control the allocation of heap memory (which is a very slow process). So don&#039;t create unnecessary vectors or do unnecessary reallocations if you don&#039;t need to. With one call to the vector function &quot;reserve&quot;, the use of iterators and the use of references, my version only allocates memory ONCE!

Another thing to remember: push_back is slower than operator [] even when there is no reallocation!!!

Here is my code

[sourcecode language=&quot;cpp&quot; collapse=&quot;true&quot;]
#include &lt;vector&gt;
#include &lt;algorithm&gt;
#include &lt;iostream&gt;
#include &lt;cmath&gt;

void get_primes7( int n, std::vector&lt;int&gt;&amp; res ) {

	if (n &lt; 2) {
		res.resize(0);
		return;
	}

	if (n == 2) {
		res.resize(1);
		res[0] = 2;
		return;
	}
	
	// we do not need a second vector
	res.resize( static_cast&lt;int&gt;(n/2.0) );  // we have enough capacity (see reserve() in main),
	                                          // so resize() does NOT allocate or deallocate heap memory
	size_t j = 0;
	for (int i = 3; i &lt; n + 1; i += 2, ++j) res[j] = i;  // operator [] is faster than push_back
	res.resize(j); // Now we need the correct size for the algorithm below

	int mroot = sqrt(n);
	int half = static_cast&lt;int&gt;(res.size()); // static_cast is always better than reinterpret cast!
	int i = 0;
	int m = 3;

	while (m &lt;= mroot) {
		if (res[i]) {
			int j = static_cast&lt;int&gt;(0.5*(m*m - 3)); // inside a loop, replace division /2 by mult. 0.5
			res[j] = 0;
			while (j &lt; half) {
				res[j] = 0;
				j += m;
			}
		}
		i++;
		m = 2*i + 3;
	}

	res.push_back(2);
	std::vector&lt;int&gt;::iterator pend =
		std::remove(res.begin(), res.end(), 0);  // remove() changes neither capacity nor size;
		                                         // it only moves the non-zero values to the front
		                                         // and returns an iterator one past the last of them
	res.resize(pend-res.begin()); // again - no allocation or reallocation of memory!
}

int main() {
	std::vector&lt;int&gt; res;
	res.reserve(10000001);  // after you reserve memory, resize() only adjusts internal pointers
	                        // as long as the new size does not exceed the capacity
	for (int i = 1; i &lt;= 10; ++i) {
		get_primes7(10000000, res);		
		std::cout &lt;&lt; &quot;Found &quot; &lt;&lt; res.size() &lt;&lt; &quot; prime numbers.\n&quot;; 
	}

	return 0;
}
[/sourcecode]]]></description>
		<content:encoded><![CDATA[<p>Hello,<br />
I just want to say that your C++ code is really not the way C++ coders implement algorithms. To show that, I made some really small changes to your code and was able to reduce the runtime by 40% (I use the Intel compiler; this should be checked with GCC)!! Remember: one of the BIGGEST advantages of C++ compared to Java (and also Python) is that you can control the allocation of heap memory (which is a very slow process). So don&#8217;t create unnecessary vectors or do unnecessary reallocations if you don&#8217;t need to. With one call to the vector function &#8220;reserve&#8221;, the use of iterators and the use of references, my version only allocates memory ONCE!</p>
<p>Another thing to remember: push_back is slower than operator [] even when there is no reallocation!!!</p>
<p>Here is my code</p>
<pre class="brush: cpp; collapse: true; light: false; title: ; toolbar: true; notranslate">
#include &lt;vector&gt;
#include &lt;algorithm&gt;
#include &lt;iostream&gt;
#include &lt;cmath&gt;

void get_primes7( int n, std::vector&lt;int&gt;&amp; res ) {

	if (n &lt; 2) {
		res.resize(0);
		return;
	}

	if (n == 2) {
		res.resize(1);
		res[0] = 2;
		return;
	}
	
	// we do not need a second vector
	res.resize( static_cast&lt;int&gt;(n/2.0) );  // we have enough capacity (see reserve() in main),
	                                          // so resize() does NOT allocate or deallocate heap memory
	size_t j = 0;
	for (int i = 3; i &lt; n + 1; i += 2, ++j) res[j] = i;  // operator [] is faster than push_back
	res.resize(j); // Now we need the correct size for the algorithm below

	int mroot = sqrt(n);
	int half = static_cast&lt;int&gt;(res.size()); // static_cast is always better than reinterpret cast!
	int i = 0;
	int m = 3;

	while (m &lt;= mroot) {
		if (res[i]) {
			int j = static_cast&lt;int&gt;(0.5*(m*m - 3)); // inside a loop, replace division /2 by mult. 0.5
			res[j] = 0;
			while (j &lt; half) {
				res[j] = 0;
				j += m;
			}
		}
		i++;
		m = 2*i + 3;
	}

	res.push_back(2);
	std::vector&lt;int&gt;::iterator pend =
		std::remove(res.begin(), res.end(), 0);  // remove() changes neither capacity nor size;
		                                         // it only moves the non-zero values to the front
		                                         // and returns an iterator one past the last of them
	res.resize(pend-res.begin()); // again - no allocation or reallocation of memory!
}

int main() {
	std::vector&lt;int&gt; res;
	res.reserve(10000001);  // after you reserve memory, resize() only adjusts internal pointers
	                        // as long as the new size does not exceed the capacity
	for (int i = 1; i &lt;= 10; ++i) {
		get_primes7(10000000, res);		
		std::cout &lt;&lt; &quot;Found &quot; &lt;&lt; res.size() &lt;&lt; &quot; prime numbers.\n&quot;; 
	}

	return 0;
}
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Boot Linux using Windows 7 boot loader by Tommy</title>
		<link>http://blog.famzah.net/2011/11/12/boot-linux-using-windows-7-boot-loader/#comment-5338</link>
		<dc:creator><![CDATA[Tommy]]></dc:creator>
		<pubDate>Tue, 11 Jun 2013 07:11:33 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=1059#comment-5338</guid>
		<description><![CDATA[These instructions worked beautifully, thanks!

In the Ubuntu setup, I manually created a first partition of about 100 MB in the free space, with /boot as its mount point (in my case, it was /dev/sda3). I made the remaining partitions for / (using most of the free space) and for swap (using a few gigs), then used the drop-down menu to select /dev/sda3 (where /boot would be installed) as the device for installing GRUB, rather than /dev/sda.]]></description>
		<content:encoded><![CDATA[<p>These instructions worked beautifully, thanks!</p>
<p>In the Ubuntu setup, I manually created a first partition of about 100 MB in the free space, with /boot as its mount point (in my case, it was /dev/sda3). I made the remaining partitions for / (using most of the free space) and for swap (using a few gigs), then used the drop-down menu to select /dev/sda3 (where /boot would be installed) as the device for installing GRUB, rather than /dev/sda.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Linux Cached/Buffers memory by Ivan Zahariev</title>
		<link>http://blog.famzah.net/2010/09/14/linux-cached-buffers-memory/#comment-5331</link>
		<dc:creator><![CDATA[Ivan Zahariev]]></dc:creator>
		<pubDate>Mon, 10 Jun 2013 17:58:10 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=905#comment-5331</guid>
		<description><![CDATA[Actually, the graph is generated by RRDtool, which many monitoring solutions use. S2Mon has, however, advanced a lot since I wrote this post: https://www.s2mon.com/static/images/previews/4.png

And, unlike Munin or Nagios, you don&#039;t need to install S2Mon on a server of your own (thus you have lower expenses, zero maintenance, etc.).]]></description>
		<content:encoded><![CDATA[<p>Actually, the graph is generated by RRDtool, which many monitoring solutions use. S2Mon has, however, advanced a lot since I wrote this post: <a href="https://www.s2mon.com/static/images/previews/4.png" rel="nofollow">https://www.s2mon.com/static/images/previews/4.png</a></p>
<p>And, unlike Munin or Nagios, you don&#8217;t need to install S2Mon on a server of your own (thus you have lower expenses, zero maintenance, etc.).</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Linux Cached/Buffers memory by Andy</title>
		<link>http://blog.famzah.net/2010/09/14/linux-cached-buffers-memory/#comment-5325</link>
		<dc:creator><![CDATA[Andy]]></dc:creator>
		<pubDate>Mon, 10 Jun 2013 13:17:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=905#comment-5325</guid>
		<description><![CDATA[The graph looks like Munin. It is very easy to install. You can open the resulting index.html manually or let Apache serve it.]]></description>
		<content:encoded><![CDATA[<p>The graph looks like Munin. It is very easy to install. You can open the resulting index.html manually or let Apache serve it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Super Micro IPMI Console + Java are killing me by pixelab</title>
		<link>http://blog.famzah.net/2010/07/05/the-super-micro-ipmi-console-plus-java-are-killing-me/#comment-5210</link>
		<dc:creator><![CDATA[pixelab]]></dc:creator>
		<pubDate>Tue, 07 May 2013 13:41:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.famzah.net/?p=761#comment-5210</guid>
		<description><![CDATA[This thing is driving me crazy. At least it&#039;s working for several weeks, but quite often we have a rack that suddenly doesn&#039;t want to be accessed via the console, and we get either a black screen in jviewer or other nice bugs... I keep updating the firmware, but nothing seems to be solved yet...]]></description>
		<content:encoded><![CDATA[<p>This thing is driving me crazy. At least it&#8217;s working for several weeks, but quite often we have a rack that suddenly doesn&#8217;t want to be accessed via the console, and we get either a black screen in jviewer or other nice bugs&#8230; I keep updating the firmware, but nothing seems to be solved yet&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>
