Several days ago, I’ve got a discussion with another developer. We talked about whether we should use the input parameters as loop variables in a C program.

To illustrate the problem, suppose we have two functions as below.

void testlocalb(int *a, int lcnt) {

    int i;

    for (i = 0; i < lcnt; ++i) {

    }

    *a = i;

}

 

void testlocal(int *a, int n) {

    for (*a = 0; *a < n; ++(*a)) {

    }

}

The first function use a local variable as the loop counter, while the second one use the input parameter as the loop counter. The questions is: which one is more efficient?

In terms of space complexity, the second function doesn’t use local variable, so it saves some memory space at the stack. But how about execution time?

1. The local variable will be put into registers, so the first function is more efficient???

I wrote three different programs to verify this is true or not. The code is as below,

#include <stdio.h>

#include <stdlib.h>

#include <sys/time.h>

 

void testlocala(int *a, int n) {

    for (*a = 0; *a < n; ++(*a)) {

         //dummy: do nothing

    }

}

 

int main(int argc, char **argv) {

    int n = atoi(argv[1]);

    int rv;

    struct timeval stTime, edTime;

    gettimeofday(&stTime, NULL);

    testlocala(&rv, n);

    gettimeofday(&edTime, NULL);

    printf("%d\n", rv);    

    printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));

    return 0;

}

The program above use the input parameter as the loop counter.

#include <stdio.h>

#include <stdlib.h>

 

void testlocalb(int *a, int lcnt) {

    int i;

    for (i = 0; i < lcnt; ++i) {

    }

    *a = i;

}

 

int main(int argc, char **argv) {

   int n = atoi(argv[1]);

   int rv;

   struct timeval stTime, edTime;

   gettimeofday(&stTime, NULL);

   testlocalb(&rv, n);

   gettimeofday(&edTime, NULL);

   printf("%d\n", rv);

   printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));

}

The second program listed above use a local variable as the loop counter.

#include <stdio.h>

#include <stdlib.h>

 

void testlocalb(int *a, int lcnt) {

    register int i;

    for (i = 0; i < lcnt; ++i) {

    }

    *a = i;

}

 

int main(int argc, char **argv) {

   int n = atoi(argv[1]);

   int rv;

   struct timeval stTime, edTime;

   gettimeofday(&stTime, NULL);

   testlocalb(&rv, n);

   gettimeofday(&edTime, NULL);

   printf("%d\n", rv);

   printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));

}

The third program also uses the local variable, but declares it as register.

To check whether the loop counter is put into register, we compile the code into assembly code using the command below,

gcc -S local1.c

The loop essentially contains a read and a write access to the loop counter. The read happens at the comparison and the write happens at the loop counter increment. We show the assembly code for the three functions above as following.

Firstly, the loop counter increment, which is the write access.

First function: using input parameter as loop counter

movl    8(%ebp), %eax                           #1

movl    (%eax), %eax                             #2

leal    1(%eax), %edx                           #3

movl    8(%ebp), %eax                           #4

movl    %edx, (%eax)                             #5

Second function: using local variable

addl $1, -4(%ebp)

Third function: using local variable with register declaration

addl $1, %ebx

From the assembly code, we can tell that the local variable without register declaration is not put into register, while the third function with register declaration is put into register.

The first function is a bit complicated, but the ++(*a) are actually two operations, including * (the first two lines) and ++(line 3 to 5). The increment is actually performed at register by the third line and then write back to memory at 5th line.

Secondly, the comparison, which is the read access.

First function: using input parameter as loop counter

movl 8(%ebp), %eax
movl (%eax), %eax
cmpl 12(%ebp), %eax

Second function: using local variable

movl -4(%ebp), %eax
cmpl 12(%ebp), %eax

Third function: using local variable with register declaration

cmpl 12(%ebp), %ebx

Again, the first function requires two operations because we’re accessing it using a pointer, and the value is loaded from memory to register eax. The second function also loads the value from stack memory to register eax. The third function just do the comparison as the loop counter is already in register.

Which function is fast? It’s not difficult to tell that function 3 will be the fastest as it’s a single instruction performed at register for both read and write. How about the first one and second one?

Compile the code using the command below,

gcc -o local1 local1.c
gcc -o local2 local2.c
gcc -o local3 local3.c

Below is a sample execution result,

Figure 1. Running 3 testing programs with default compiler options

The third function runs fastest, as expected, follows by the first function and the second function goes last.

One can add code blocks inside the loop as below to further verify the performance difference is due to read or write.

d = *a;

d = *a;

d = *a;

d = *a;

d = *a;

d = *a;

d = *a;

d = *a;

d = *a;

d = *a;

 

++(*a);

--(*a);

++(*a);

--(*a);

++(*a);

--(*a);

++(*a);

--(*a);

++(*a);

--(*a);

On my machine, the result is that input parameter (function 1) performs better for both read and write.

To conclude the first doubt, the local variable is not put into register under the default compiler options. Declare the local variable as “register” gives hint to the compiler that the variable should be put into register and it helps sometimes. Without “register”, input parameter performs better than local variable for both read and write accesses.

2. What if the compiler optimization options are used?

With the optimization options enabled, the loop in the code above will be taken away. I changed the code to below to avoid removal of loop.

First function: using input parameter as loop counter

#include <stdio.h>

#include <stdlib.h>

#include <limits.h>

 

void testlocala(int *a, int n, int *b) {

    unsigned int x = 1, y = 1;

    for (*a = 0; *a < n; ++(*a)) {

        x += y;

        y = x - y;

        if (y > INT_MAX) {

            x = 1;

            y = 1;

        }

    }

    *b = y;

}

 

int main(int argc, char **argv) {

    int n = atoi(argv[1]);

    int rv, rv1;

    struct timeval stTime, edTime;

    gettimeofday(&stTime, NULL);

    testlocala(&rv, n, &rv1);

    gettimeofday(&edTime, NULL);

    printf("%d:%d\n", rv, rv1);    

    printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));

    return 0;

}

Second function: using local variable

#include <stdio.h>

#include <stdlib.h>

#include <limits.h>

 

void testlocalb(int *a, int lcnt, int *b) {

    int i;

    unsigned int x = 1, y = 1;

    for (i = 0; i < lcnt; ++i) {

        x += y;

        y = x - y;

        if (y > INT_MAX) {

            x = 1; 

            y = 1;

        }

    }

    *a = i;

    *b = y;

}

 

int main(int argc, char **argv) {

   int n = atoi(argv[1]);

   int rv, rv1;

   struct timeval stTime, edTime;

   gettimeofday(&stTime, NULL);

   testlocalb(&rv, n, &rv1);

   gettimeofday(&edTime, NULL);

   printf("%d:%d\n", rv, rv1);

   printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));

}

Third function: using local variable with register declaration

#include <stdio.h>

#include <stdlib.h>

#include <limits.h>

 

void testlocalb(int *a, int lcnt, int *b) {

    register int i;

    unsigned int x = 1, y = 1;

    for (i = 0; i < lcnt; ++i) {

        x += y;

        y = x - y;

        if (y > INT_MAX) {

            x = 1;

            y = 1;

        }

    }

    *a = i;

    *b = y;

}

 

int main(int argc, char **argv) {

   int n = atoi(argv[1]);

   int rv, rv1;

   struct timeval stTime, edTime;

   gettimeofday(&stTime, NULL);

   testlocalb(&rv, n, &rv1);

   gettimeofday(&edTime, NULL);

   printf("%d:%d\n", rv, rv1);

   printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));

}

We compile the 3 functions into assembly using the commands below,

gcc -O1 -S local11.c
gcc -O1 -S local22.c
gcc -O1 -S local33.c

The optimized assembly code for increment of loop counter and loop condition test for all three versions are the same,

addl $1, %ecx
cmpl %ebx, %ecx

The code for loading the loop counter values to register ecx are not listed for simplicity. The execution time for the 3 programs are as below,

function 1: input parameter,  3:172661
function 2: local variable, 4:4294877639
function 3: local variable with register declaration, 4:4294786759

We can also compile the code using O2 and O3 optimization options. The execution time are as below,

For O2,

function 1: input parameter,  3:669508
function 2: local variable, 3:572726
function 3: local variable with register declaration, 3:641155

For O3,

function 1: input parameter,  4:4294628948
function 2: local variable, 3:612397
function 3: local variable with register declaration, 3:688610

For O3 + funroll-loops, (gcc -O3 -funroll-loops local11.c -o local11)

function 1: input parameter,  4:4294060070
function 2: local variable, 3:69106
function 3: local variable with register declaration, 3:62729

In conclusion, the proper compiler flags can optimize the code, and the usage of input parameter and local variable can achieve similar run time.

Note that the experiment are carried once only. For more accurate results, one should execute the program many times and take the average.

References:
1. http://www.network-theory.co.uk/docs/gccintro/gccintro_50.html
2. http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

 

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Set your Twitter account name in your settings to use the TwitterBar Section.