This is the second post for million integer problems. The first post is about generating million integers. You can find it here. This post deals with sorting of million integers.

It can be proven that the best that a comparison-based sorting can do has average run time of nlogn. However, given specific inputs, there’re algorithms that can do better than nlogn. The million integer sorting is one of such cases.

1. Sorting a million integers in the range of [0, 1 000 000]. Numbers can appear many times.

Since the comparison-based sorting algorithms can only do nlogn, we’ll see if we can sort without compare the one million elements.

The inputs are all integers, and it’s within a specific range [0, 1 000 000]. Integers can be the index of arrays. If we have an array with 1 million indices, every time an input appears, we mark the element in the array. If an integer appear N times, we mark the array element N times.

After reading all inputs, the indices corresponding to all input integers are marked. Because indices are ordered, we simply read from array element 0 to 1 million, if the array element is marked as N, we output the index N times.

The C implementation is as below,

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

 

#define MAXV 1000000

 

unsigned int buckets[MAXV+1];

 

int main(int argc, char **argv) {

   FILE* inputF, *outputF;

   int i;

   char aline[15];

   inputF = fopen("input1.txt", "r");

   if (inputF == NULL) {

       printf("cannot open input file\n");

       return 0;

   }

   memset(buckets, 0x00, MAXV+1);

   while (fgets(aline, 15, inputF) != NULL) {

       i = atoi(aline);

//       printf("...%d\n", i);

       ++buckets[i];

   }

   outputF = fopen("output1.txt", "w");

   for (i = 0; i <= MAXV; ++i) {

       while (buckets[i]--) {

           fprintf(outputF, "%d\n", i);

       }

   }

   fclose(inputF);

   fclose(outputF);

}

You can use the code in part 1 generate 1 million integers to generate the input files for the above code.

2. Sorting a million integers in the range of [0, 1 000 000]. No number appears twice.

Compare with first problem, there’s one additional condition, the input integers are unique. This means the array element we used in first problem has either value 0 or 1. A single bit is enough to represent the array element.

The C implementation is as below,

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

 

#define MAXV 1000000

 

unsigned char buckets[(MAXV+7)/8];

 

void setb(int i) {

    int idx = i/8;

    int bpos = i%8;

    buckets[idx] |= (0x01 << bpos);

}

 

int checkb(int i) {

    int idx = i/8;

    int bpos = i%8;

    return (buckets[idx] & (0x01 << bpos));

}

 

int main(int argc, char **argv) {

   FILE* inputF, *outputF;

   int i;

   char aline[15];

   inputF = fopen("input.txt", "r");

   if (inputF == NULL) {

       printf("cannot open input file\n");

       return 0;

   }

   memset(buckets, 0x00, (MAXV+7)/8);

   while (fgets(aline, 15, inputF) != NULL) {

       i = atoi(aline);

//       printf("...%d\n", i);

       setb(i);

   }

   outputF = fopen("output.txt", "w");

   for (i = 0; i <= MAXV; ++i) {

       if (checkb(i)) {

           fprintf(outputF, "%d\n", i);

       }

   }

   fclose(inputF);

   fclose(outputF);

}

You can use the code in part 1 generate 1 million integers to generate the input files for the above code.

2.1 Implementation from Programming Perls

The classic book Programming Perls has a implementation that sorts the input integers. The code is as below,

/* Copyright (C) 1999 Lucent Technologies */

/* From 'Programming Pearls' by Jon Bentley */

 

/* bitsort.c -- bitmap sort from Column 1

 *   Sort distinct integers in the range [0..N-1]

 */

 

#include <stdio.h>

 

#define BITSPERWORD 32

#define SHIFT 5

#define MASK 0x1F

#define N 10000000

int a[1 + N/BITSPERWORD];

 

void set(int i) {        a[i>>SHIFT] |=  (1<<(i & MASK)); }

void clr(int i) {        a[i>>SHIFT] &= ~(1<<(i & MASK)); }

int  test(int i){ return a[i>>SHIFT] &   (1<<(i & MASK)); }

 

int main()

{    int i;

    for (i = 0; i < N; i++)

        clr(i);

/*    Replace above 2 lines with below 3 for word-parallel init

    int top = 1 + N/BITSPERWORD;

    for (i = 0; i < top; i++)

        a[i] = 0;

 */

    while (scanf("%d", &i) != EOF)

        set(i);

    for (i = 0; i < N; i++)

        if (test(i))

            printf("%d\n", i);

    return 0;

}

3. Related problems
3.1 There are 1 000 001 integers in the range of [0, 1 000 000]. Given a million unique integers in the range of [0, 1 000 000], find the integer that is missing.

This is actually the same as integer sorting problems. Once you sort the integers with the array, you can iterate through the array to find the array index whose element is not marked.

There are other problems. If you happens to encounter one, you’re welcome to add it in the comments. :)

References

1. Programming Perls. Second Edition.

2. Programming Perls Source Code website: http://www.cs.bell-labs.com/cm/cs/pearls/code.html

 

One comment on “Million Integer Problems — Part 2. Sorting a Million Integers

  1. st0le on said:

    For the Missing Integer problem, i duno if you know this. But there’s a linear algorithm for it.

    Simply calculate the sum of all the numbers. and subtract it from n(n+1)/2. Presto your missing number.

    Be sure to handle Integer overflows.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Set your Twitter account name in your settings to use the TwitterBar Section.