This is the second post for million integer problems. The first post is about generating million integers. You can find it here. This post deals with sorting of million integers.
It can be proven that the best that a comparison-based sorting can do has average run time of nlogn. However, given specific inputs, there’re algorithms that can do better than nlogn. The million integer sorting is one of such cases.
1. Sorting a million integers in the range of [0, 1 000 000]. Numbers can appear many times.
Since the comparison-based sorting algorithms can only do nlogn, we’ll see if we can sort without compare the one million elements.
The inputs are all integers, and it’s within a specific range [0, 1 000 000]. Integers can be the index of arrays. If we have an array with 1 million indices, every time an input appears, we mark the element in the array. If an integer appear N times, we mark the array element N times.
After reading all inputs, the indices corresponding to all input integers are marked. Because indices are ordered, we simply read from array element 0 to 1 million, if the array element is marked as N, we output the index N times.
The C implementation is as below,
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXV 1000000
unsigned int buckets[MAXV+1];
int main(int argc, char **argv) {
FILE* inputF, *outputF;
int i;
char aline[15];
inputF = fopen("input1.txt", "r");
if (inputF == NULL) {
printf("cannot open input file\n");
return 0;
}
memset(buckets, 0x00, MAXV+1);
while (fgets(aline, 15, inputF) != NULL) {
i = atoi(aline);
// printf("...%d\n", i);
++buckets[i];
}
outputF = fopen("output1.txt", "w");
for (i = 0; i <= MAXV; ++i) {
while (buckets[i]--) {
fprintf(outputF, "%d\n", i);
}
}
fclose(inputF);
fclose(outputF);
}
You can use the code in part 1 generate 1 million integers to generate the input files for the above code.
2. Sorting a million integers in the range of [0, 1 000 000]. No number appears twice.
Compare with first problem, there’s one additional condition, the input integers are unique. This means the array element we used in first problem has either value 0 or 1. A single bit is enough to represent the array element.
The C implementation is as below,
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXV 1000000
unsigned char buckets[(MAXV+7)/8];
void setb(int i) {
int idx = i/8;
int bpos = i%8;
buckets[idx] |= (0x01 << bpos);
}
int checkb(int i) {
int idx = i/8;
int bpos = i%8;
return (buckets[idx] & (0x01 << bpos));
}
int main(int argc, char **argv) {
FILE* inputF, *outputF;
int i;
char aline[15];
inputF = fopen("input.txt", "r");
if (inputF == NULL) {
printf("cannot open input file\n");
return 0;
}
memset(buckets, 0x00, (MAXV+7)/8);
while (fgets(aline, 15, inputF) != NULL) {
i = atoi(aline);
// printf("...%d\n", i);
setb(i);
}
outputF = fopen("output.txt", "w");
for (i = 0; i <= MAXV; ++i) {
if (checkb(i)) {
fprintf(outputF, "%d\n", i);
}
}
fclose(inputF);
fclose(outputF);
}
You can use the code in part 1 generate 1 million integers to generate the input files for the above code.
2.1 Implementation from Programming Perls
The classic book Programming Perls has a implementation that sorts the input integers. The code is as below,
/* Copyright (C) 1999 Lucent Technologies */
/* From 'Programming Pearls' by Jon Bentley */
/* bitsort.c -- bitmap sort from Column 1
* Sort distinct integers in the range [0..N-1]
*/
#include <stdio.h>
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];
void set(int i) { a[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i){ return a[i>>SHIFT] & (1<<(i & MASK)); }
int main()
{ int i;
for (i = 0; i < N; i++)
clr(i);
/* Replace above 2 lines with below 3 for word-parallel init
int top = 1 + N/BITSPERWORD;
for (i = 0; i < top; i++)
a[i] = 0;
*/
while (scanf("%d", &i) != EOF)
set(i);
for (i = 0; i < N; i++)
if (test(i))
printf("%d\n", i);
return 0;
}
3. Related problems
3.1 There are 1 000 001 integers in the range of [0, 1 000 000]. Given a million unique integers in the range of [0, 1 000 000], find the integer that is missing.
This is actually the same as integer sorting problems. Once you sort the integers with the array, you can iterate through the array to find the array index whose element is not marked.
There are other problems. If you happens to encounter one, you’re welcome to add it in the comments.
References
1. Programming Perls. Second Edition.
2. Programming Perls Source Code website: http://www.cs.bell-labs.com/cm/cs/pearls/code.html
One comment on “Million Integer Problems — Part 2. Sorting a Million Integers”
Leave a Reply Cancel reply
40% Discount on My Book — Android NDK Cookbook
Android NDK Cookbook ebook 40% discount with promotion code MREANC40 at Packt Publishing The promotion code is valid until 15th June.Categories
- Android Apps (18)
- Android Audio Editor (1)
- TS 2 (3)
- Video Converter Android (8)
- Video2Gif (1)
- Android Tutorial (27)
- Android Dev Tools (1)
- API illustrated (8)
- Multimedia API (3)
- ffmpeg on Android (4)
- NDK (6)
- UI (6)
- Animation (2)
- Code Snippet (2)
- Coding Beyond Technique (18)
- a word, a world (4)
- Bug Rectified (4)
- Programming Habit (1)
- Software as a Career (1)
- Software as User Experience (1)
- Compilers and Related (2)
- ELF (2)
- Computer Languages (31)
- C/C++ (13)
- Java (9)
- JavaScript (2)
- PHP (1)
- Python (8)
- Data Structure & Algorithms (29)
- Bits (1)
- Data Structure (5)
- Integers (10)
- BigInteger (1)
- Prime (4)
- Search (3)
- Sorting (5)
- Strings (5)
- Database (1)
- SQLite (1)
- Digital Signal Processing (33)
- Distributed Systems (17)
- Apache Cassandra (6)
- Apache Hadoop (8)
- Apache Avro (3)
- Apache Nutch (3)
- Apache Solr (1)
- Linux Study Notes (40)
- crontab (1)
- Linux Kernel Programming (8)
- Linux Programming (12)
- IPC (2)
- Linux Network Programming (5)
- Linux Signals (2)
- Linux Shell Scripting (1)
- ssh (3)
- Machinery (30)
- misc (1)
- My Ideas (1)
- My Project (3)
- Mobile Caching (1)
- Selective Decoding (2)
- My Publication (1)
- My Readings (1)
- Networking (15)
- Program for Performance (8)
- Uncategorized (1)
- Virtual Machine (2)
- Web Dev (8)
- web components (3)
- Android Apps (18)
Recent Comments
Archives
- May 2013 (2)
- April 2013 (1)
- March 2013 (4)
- December 2012 (2)
- November 2012 (6)
- October 2012 (6)
- September 2012 (3)
- August 2012 (13)
- July 2012 (15)
- June 2012 (3)
- May 2012 (8)
- April 2012 (4)
- March 2012 (13)
- February 2012 (19)
- January 2012 (9)
- December 2011 (11)
- November 2011 (12)
- October 2011 (4)
- September 2011 (12)
- August 2011 (16)
- July 2011 (15)
- June 2011 (6)
- May 2011 (10)
- April 2011 (13)
- March 2011 (20)
- February 2011 (4)
- November 2010 (2)
- May 2010 (1)
- April 2010 (1)
- February 2010 (1)





For the Missing Integer problem, i duno if you know this. But there’s a linear algorithm for it.
Simply calculate the sum of all the numbers. and subtract it from
n(n+1)/2. Presto your missing number.Be sure to handle Integer overflows.