Shall We Use the Input Parameters as Loop Variable?
Several days ago, I’ve got a discussion with another developer. We talked about whether we should use the input parameters as loop variables in a C program.
To illustrate the problem, suppose we have two functions as below.
void testlocalb(int *a, int lcnt) {
int i;
for (i = 0; i < lcnt; ++i) {
}
*a = i;
}
void testlocal(int *a, int n) {
for (*a = 0; *a < n; ++(*a)) {
}
}
The first function use a local variable as the loop counter, while the second one use the input parameter as the loop counter. The questions is: which one is more efficient?
In terms of space complexity, the second function doesn’t use local variable, so it saves some memory space at the stack. But how about execution time?
1. The local variable will be put into registers, so the first function is more efficient???
I wrote three different programs to verify this is true or not. The code is as below,
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
void testlocala(int *a, int n) {
for (*a = 0; *a < n; ++(*a)) {
//dummy: do nothing
}
}
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int rv;
struct timeval stTime, edTime;
gettimeofday(&stTime, NULL);
testlocala(&rv, n);
gettimeofday(&edTime, NULL);
printf("%d\n", rv);
printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));
return 0;
}
The program above use the input parameter as the loop counter.
#include <stdio.h>
#include <stdlib.h>
void testlocalb(int *a, int lcnt) {
int i;
for (i = 0; i < lcnt; ++i) {
}
*a = i;
}
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int rv;
struct timeval stTime, edTime;
gettimeofday(&stTime, NULL);
testlocalb(&rv, n);
gettimeofday(&edTime, NULL);
printf("%d\n", rv);
printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));
}
The second program listed above use a local variable as the loop counter.
#include <stdio.h>
#include <stdlib.h>
void testlocalb(int *a, int lcnt) {
register int i;
for (i = 0; i < lcnt; ++i) {
}
*a = i;
}
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int rv;
struct timeval stTime, edTime;
gettimeofday(&stTime, NULL);
testlocalb(&rv, n);
gettimeofday(&edTime, NULL);
printf("%d\n", rv);
printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));
}
The third program also uses the local variable, but declares it as register.
To check whether the loop counter is put into register, we compile the code into assembly code using the command below,
gcc -S local1.c
The loop essentially contains a read and a write access to the loop counter. The read happens at the comparison and the write happens at the loop counter increment. We show the assembly code for the three functions above as following.
Firstly, the loop counter increment, which is the write access.
First function: using input parameter as loop counter
movl 8(%ebp), %eax #1
movl (%eax), %eax #2
leal 1(%eax), %edx #3
movl 8(%ebp), %eax #4
movl %edx, (%eax) #5
Second function: using local variable
addl $1, -4(%ebp)
Third function: using local variable with register declaration
addl $1, %ebx
From the assembly code, we can tell that the local variable without register declaration is not put into register, while the third function with register declaration is put into register.
The first function is a bit complicated, but the ++(*a) are actually two operations, including * (the first two lines) and ++(line 3 to 5). The increment is actually performed at register by the third line and then write back to memory at 5th line.
Secondly, the comparison, which is the read access.
First function: using input parameter as loop counter
movl 8(%ebp), %eax
movl (%eax), %eax
cmpl 12(%ebp), %eax
Second function: using local variable
movl -4(%ebp), %eax
cmpl 12(%ebp), %eax
Third function: using local variable with register declaration
cmpl 12(%ebp), %ebx
Again, the first function requires two operations because we’re accessing it using a pointer, and the value is loaded from memory to register eax. The second function also loads the value from stack memory to register eax. The third function just do the comparison as the loop counter is already in register.
Which function is fast? It’s not difficult to tell that function 3 will be the fastest as it’s a single instruction performed at register for both read and write. How about the first one and second one?
Compile the code using the command below,
gcc -o local1 local1.c
gcc -o local2 local2.c
gcc -o local3 local3.c
Below is a sample execution result,
Figure 1. Running 3 testing programs with default compiler options
The third function runs fastest, as expected, follows by the first function and the second function goes last.
One can add code blocks inside the loop as below to further verify the performance difference is due to read or write.
d = *a;
d = *a;
d = *a;
d = *a;
d = *a;
d = *a;
d = *a;
d = *a;
d = *a;
d = *a;
++(*a);
--(*a);
++(*a);
--(*a);
++(*a);
--(*a);
++(*a);
--(*a);
++(*a);
--(*a);
On my machine, the result is that input parameter (function 1) performs better for both read and write.
To conclude the first doubt, the local variable is not put into register under the default compiler options. Declare the local variable as “register” gives hint to the compiler that the variable should be put into register and it helps sometimes. Without “register”, input parameter performs better than local variable for both read and write accesses.
2. What if the compiler optimization options are used?
With the optimization options enabled, the loop in the code above will be taken away. I changed the code to below to avoid removal of loop.
First function: using input parameter as loop counter
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
void testlocala(int *a, int n, int *b) {
unsigned int x = 1, y = 1;
for (*a = 0; *a < n; ++(*a)) {
x += y;
y = x - y;
if (y > INT_MAX) {
x = 1;
y = 1;
}
}
*b = y;
}
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int rv, rv1;
struct timeval stTime, edTime;
gettimeofday(&stTime, NULL);
testlocala(&rv, n, &rv1);
gettimeofday(&edTime, NULL);
printf("%d:%d\n", rv, rv1);
printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));
return 0;
}
Second function: using local variable
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
void testlocalb(int *a, int lcnt, int *b) {
int i;
unsigned int x = 1, y = 1;
for (i = 0; i < lcnt; ++i) {
x += y;
y = x - y;
if (y > INT_MAX) {
x = 1;
y = 1;
}
}
*a = i;
*b = y;
}
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int rv, rv1;
struct timeval stTime, edTime;
gettimeofday(&stTime, NULL);
testlocalb(&rv, n, &rv1);
gettimeofday(&edTime, NULL);
printf("%d:%d\n", rv, rv1);
printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));
}
Third function: using local variable with register declaration
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
void testlocalb(int *a, int lcnt, int *b) {
register int i;
unsigned int x = 1, y = 1;
for (i = 0; i < lcnt; ++i) {
x += y;
y = x - y;
if (y > INT_MAX) {
x = 1;
y = 1;
}
}
*a = i;
*b = y;
}
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int rv, rv1;
struct timeval stTime, edTime;
gettimeofday(&stTime, NULL);
testlocalb(&rv, n, &rv1);
gettimeofday(&edTime, NULL);
printf("%d:%d\n", rv, rv1);
printf("%u:%u\n", (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));
}
We compile the 3 functions into assembly using the commands below,
gcc -O1 -S local11.c
gcc -O1 -S local22.c
gcc -O1 -S local33.c
The optimized assembly code for increment of loop counter and loop condition test for all three versions are the same,
addl $1, %ecx
cmpl %ebx, %ecx
The code for loading the loop counter values to register ecx are not listed for simplicity. The execution time for the 3 programs are as below,
function 1: input parameter, 3:172661
function 2: local variable, 4:4294877639
function 3: local variable with register declaration, 4:4294786759
We can also compile the code using O2 and O3 optimization options. The execution time are as below,
For O2,
function 1: input parameter, 3:669508
function 2: local variable, 3:572726
function 3: local variable with register declaration, 3:641155
For O3,
function 1: input parameter, 4:4294628948
function 2: local variable, 3:612397
function 3: local variable with register declaration, 3:688610
For O3 + funroll-loops, (gcc -O3 -funroll-loops local11.c -o local11)
function 1: input parameter, 4:4294060070
function 2: local variable, 3:69106
function 3: local variable with register declaration, 3:62729
In conclusion, the proper compiler flags can optimize the code, and the usage of input parameter and local variable can achieve similar run time.
Note that the experiment are carried once only. For more accurate results, one should execute the program many times and take the average.
References:
1. http://www.network-theory.co.uk/docs/gccintro/gccintro_50.html
2. http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Leave a Reply Cancel reply
40% Discount on My Book — Android NDK Cookbook
Android NDK Cookbook ebook 40% discount with promotion code MREANC40 at Packt Publishing The promotion code is valid until 15th June.Categories
- Android Apps (18)
- Android Audio Editor (1)
- TS 2 (3)
- Video Converter Android (8)
- Video2Gif (1)
- Android Tutorial (26)
- Android Dev Tools (1)
- API illustrated (8)
- Multimedia API (3)
- ffmpeg on Android (4)
- NDK (6)
- UI (5)
- Animation (1)
- Code Snippet (2)
- Coding Beyond Technique (18)
- a word, a world (4)
- Bug Rectified (4)
- Programming Habit (1)
- Software as a Career (1)
- Software as User Experience (1)
- Compilers and Related (2)
- ELF (2)
- Computer Languages (31)
- C/C++ (13)
- Java (9)
- JavaScript (2)
- PHP (1)
- Python (8)
- Data Structure & Algorithms (29)
- Bits (1)
- Data Structure (5)
- Integers (10)
- BigInteger (1)
- Prime (4)
- Search (3)
- Sorting (5)
- Strings (5)
- Database (1)
- SQLite (1)
- Digital Signal Processing (33)
- Distributed Systems (17)
- Apache Cassandra (6)
- Apache Hadoop (8)
- Apache Avro (3)
- Apache Nutch (3)
- Apache Solr (1)
- Linux Study Notes (40)
- crontab (1)
- Linux Kernel Programming (8)
- Linux Programming (12)
- IPC (2)
- Linux Network Programming (5)
- Linux Signals (2)
- Linux Shell Scripting (1)
- ssh (3)
- Machinery (30)
- misc (1)
- My Ideas (1)
- My Project (3)
- Mobile Caching (1)
- Selective Decoding (2)
- My Publication (1)
- My Readings (1)
- Networking (15)
- Program for Performance (8)
- Uncategorized (1)
- Virtual Machine (2)
- Web Dev (8)
- web components (3)
- Android Apps (18)
Recent Comments
Archives
- May 2013 (1)
- April 2013 (1)
- March 2013 (4)
- December 2012 (2)
- November 2012 (6)
- October 2012 (6)
- September 2012 (3)
- August 2012 (13)
- July 2012 (15)
- June 2012 (3)
- May 2012 (8)
- April 2012 (4)
- March 2012 (13)
- February 2012 (19)
- January 2012 (9)
- December 2011 (11)
- November 2011 (12)
- October 2011 (4)
- September 2011 (12)
- August 2011 (16)
- July 2011 (15)
- June 2011 (6)
- May 2011 (10)
- April 2011 (13)
- March 2011 (20)
- February 2011 (4)
- November 2010 (2)
- May 2010 (1)
- April 2010 (1)
- February 2010 (1)




