## Suffix Array Part 2–String Matching/Searching

This is part 2 of the post “suffix array”, which covers the string matching/searching problem that can be solved by suffix array. You may want to read part 1 first.

String Matching/Searching

Once we have the suffix array, it’s easy to check if another string appears in the input string/text or not. If the string appears in the input text, if will be the prefix of one or more its suffix. Since the suffix array are sorted suffix of the input string, binary search can be applied.

Below is an implementation that finds all occurrence of a string in the input text,

`/**`

`this program demonstrate how to use suffix array to search for a particular substring `

`*/`

` `

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

` `

`int pstrcmp(const void *a, const void *b) {`

`    //printf("pstrcmp: %s %sn", (const char*)*(char **)a, (const char*)*(char **)b);`

`    return strcmp((const char *)*(char **)a, (const char *)*(char **)b);`

`}`

` `

`int mybsearch(const char *target, char **ap, int len) {`

`    int l, r, m;`

`    int tsize = strlen(target);`

`    for (l = 0, r = len - 1; l < len && r >= 0 && l <= r;){`

`        m = (l+r)/2;`

`        if (strncmp(target, ap[m], tsize) < 0) {`

`            r = m - 1;`

`        } else if (strncmp(target, ap[m], tsize) > 0) {`

`            l = m + 1;`

`        } else {`

`            return m;`

`        }`

`    }`

`    return -1;`

`}`

` `

`int main(int argc, char **argv) {`

`    int i;`

`    char **ap;      //suffix pointers array`

`    int len;`

`    int bpos;`

`    len = strlen(argv[1]);`

`    ap = (char**)malloc(len*sizeof(char*));`

`    for (i = 0; i < len; ++i) {`

`        ap[i] = &(argv[1][i]);`

`    }`

`    for (i = 0; i < len; ++i) {`

`        printf("%sn", ap[i]);`

`    }`

`    printf("sort the sufficesn");`

`    qsort(ap, len, sizeof(char *), pstrcmp);`

`    for (i = 0; i < len; ++i) {`

`        printf("%sn", ap[i]);`

`    }`

`    printf("searching for string:%sn", argv[2]);`

`    bpos = mybsearch(argv[2], ap, len);`

`    printf("found at suffix array pos, original string pos: %d, %dn", bpos, (ap[bpos]-argv[1])/sizeof(char));`

`    if (bpos > 0) {`

`        printf("found %sn", ap[bpos]);`

`        //after binary search, we searching in both left directions and right directions for suffix`

`        //that also has the target string as prefix`

`        for (i = bpos-1; i >= 0; --i){`

`            if (strncmp(argv[2], ap[i], strlen(argv[2])) == 0) {`

`                printf("found at suffix array pos, original string pos: %d, %dn", i, (ap[i]-argv[1])/sizeof(char));`

`            } else {`

`                break;`

`            }`

`        }`

`        for (i = bpos + 1; i < len; ++i) {`

`            if (strncmp(argv[2], ap[i], strlen(argv[2])) == 0) {`

`                printf("found at suffix array pos, original string pos: %d, %dn", i, (ap[i]-argv[1])/sizeof(char));`

`            } else {`

`                break;`

`            }`

`        }`

`    }`

`}`

The program accepts two input parameters, the input text and the string to search for.
Compile the code using the command below,

gcc -o sufsearch sufsearch.c

Below is a screenshot of the execution,

Figure 1. Searching Strings using Suffix Array

Suppose the length of the string is m, the input text is n. Note that every comparison of the string with suffix of the input text consumes O(m) and O(logn) comparisons is needed to perform the binary search. Therefore, the run time for the algorithm above is O(mlogn), excluding the cost of building the suffix array.

There’re are better algorithms that use the longest common prefix to help the search process. The run time is O(m + logn). Interested users can refer to reference 6 for details.

Note that the input text can be preprocessed to build the suffix array, and the search target string can be given later. This differs from KMP algorithm that the pattern must be given before the computation can begin. This difference makes them suitable for different applications.

References:
1. Suffix Array, Wikipedia. http://en.wikipedia.org/wiki/Suffix_array
2. Brief Introduction to Suffix Array: http://sary.sourceforge.net/docs/suffix-array.html
3. Algorithms, 4th edition website: http://algs4.cs.princeton.edu/63suffix/
4. Programming Perls. 2nd Edition.
6. Suffix Arrays: a new method for On-Line String Searches: http://delivery.acm.org/10.1145/330000/320218/p319-manber.pdf?ip=137.132.250.14&acc=ACTIVE%20SERVICE&CFID=67974067&CFTOKEN=52233823&__acm__=1330495809_b1bcb53c53d3d7a276f870cb1e0fdf89

## Suffix Array–Part 1

Suffix Array

Suffix array is an array which contains the suffix of a string sorted in lexicographical (alphabetical) order. It’s used in several string matching/searching problems. This post briefly explains suffix array, its applications and provides C implementations to some of them.

To compute the suffix array, we first find all the suffix. In C, this is simply finding the start position of each suffix and use a pointer pointing to it. We then sort the array of suffix. This is implemented as code below,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

` `

`int pstrcmp(const void *a, const void *b) {`

`    //printf("pstrcmp: %s %sn", (const char*)*(char **)a, (const char*)*(char **)b);`

`    return strcmp((const char *)*(char **)a, (const char *)*(char **)b);`

`}`

` `

`int main(int argc, char **argv) {`

`    int i;`

`    char **ap;      //suffix pointers array`

`    int len;`

`    len = strlen(argv[1]);`

`    ap = (char**)malloc(len*sizeof(char*));`

`    for (i = 0; i < len; ++i) {`

`        ap[i] = &(argv[1][i]);`

`    }`

`    for (i = 0; i < len; ++i) {`

`        printf("%sn", ap[i]);`

`    }`

`    printf("sort the sufficesn");`

`    qsort(ap, len, sizeof(char *), pstrcmp);`

`    for (i = 0; i < len; ++i) {`

`        printf("%sn", ap[i]);`

`    }`

`}`

The code accepts an input string and compute the suffix array according to the description above. The C standard library qsort is used to sort the suffix.

Compile the code using the command below,

gcc -o suffix suffix.c

Below is a screenshot of the code executed,

Figure 1. Running Suffix Array Computation

As shown above, the program first output all suffix for the input array “sewfds”, then sort it and output the suffix array.

Note that get the suffix consumes O(n), sort them consumes O(nlogn) of suffix comparison. But the suffix comparison (which is string comparison) requires O(n) time. So the overall runtime is O(n) + O(n^2logn) = O(n^2logn). There’re algorithms can compute suffix array in faster ways, but they’re not discussed in this post.

Longest Repeated String

The first problem that suffix array can be used is the longest repeated string/substring: given an input array, find the longest duplicated substring in it. For example, given the input string “Ask not what your country can do for you, but what you can do for your country”, the longest duplicated substring is “ can do for you “, and “ your country “ is a close second place. (This is an example given in reference 4.)

If we can compute the suffix array for the input string, then all the substrings of the input string become the prefix of some suffix. If the substring is duplicated, then it appears in multiple suffix. Since the suffix array is sorted, then finding the duplicated repeated string becomes finding the prefix of the neighboring suffix. The longest duplicated repeated string is simply the longest common prefix of neighboring suffix.

The C implementation is given as below,

`/**`

`find longest repeated substring using suffix array`

`*/`

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

` `

`int pstrcmp(const void *a, const void *b) {`

`    //printf("pstrcmp: %s %sn", (const char*)*(char **)a, (const char*)*(char **)b);`

`    return strcmp((const char *)*(char **)a, (const char *)*(char **)b);`

`}`

` `

`int comlen(const char *a, const char *b) {`

`    int i = 0;`

`    while ((a[i]!='') && (b[i] != '') && (a[i] == b[i])) {`

`        ++i;`

`    }`

`    return i;`

`}`

` `

`int lrs(char **ap, int len, char **lrsp) {`

`    int i;`

`    int maxlen = 0;`

`    int clen;`

`    for (i = 0; i < len-1; ++i) {`

`        if ((clen = comlen(ap[i], ap[i+1])) > maxlen) {`

`            *lrsp = ap[i];`

`            maxlen = clen;`

`        }`

`    }`

`    return maxlen;`

`}`

` `

`int main(int argc, char **argv) {`

`    int i;`

`    char **ap;      //suffix pointers array`

`    int len, lrslen;`

`    char *lrsp;`

`    len = strlen(argv[1]);`

`    ap = (char**)malloc(len*sizeof(char*));`

`    for (i = 0; i < len; ++i) {`

`        ap[i] = &(argv[1][i]);`

`    }`

`    for (i = 0; i < len; ++i) {`

`        printf("%sn", ap[i]);`

`    }`

`    printf("sort the sufficesn");`

`    qsort(ap, len, sizeof(char *), pstrcmp);`

`    for (i = 0; i < len; ++i) {`

`        printf("%sn", ap[i]);`

`    }`

`    printf("find the longest repeated subtringn");`

`    lrslen = lrs(ap, len, &lrsp);`

`    printf("%.*sn", lrslen, lrsp); `

`}`

The above code is written according to the description above. Save the code to suflrs.c and compile the code using the command below,

gcc -o suflrs suflrs.c

The code accepts an input string and outputs the longest repeated substring. A simple execution (with some printf disabled) is as below,

Figure 2. Finding Longest Repeated Substring using Suffix Array

Note that building the suffix array consumes O(n^2logn). Finding the longest commong substring takes O(n^2). So the total time is O(n^2logn).

For a fast suffix array construction library, please refer to reference 5.

More applications of suffix array are covered in part 2.

References:
1. Suffix Array, Wikipedia. http://en.wikipedia.org/wiki/Suffix_array
2. Brief Introduction to Suffix Array: http://sary.sourceforge.net/docs/suffix-array.html
3. Algorithms, 4th edition website: http://algs4.cs.princeton.edu/63suffix/
4. Programming Perls. 2nd Edition.

## Merge Sort

The Algorithm

Merge Sort is a stable (preserve the order of equal elements), divide and conquer, comparison-based sorting algorithm. It has the worst-case run time of nlogn. The primary disadvantage of merge sort is usage of additional memory space.
The basic idea of merge sort is as below,

• Divide the input array into sub arrays, each sub array contains 1 element.
• Merge every two sub arrays into one sorted array. Repeat this process until there’s only one array left.

To ensure the merged array is sorted, we always compare the first element of the two sorted sub arrays and take the smaller one of the two. For example, we have an input array of “efdgabnm”. The merge process will be as below,
e        f        d        g        a         b      n         m
ef                dg                 ab               mn
defg                                  abmn
abdefgmn

There’re two common ways to implement merge sort. The first is called Top-down implementation, where recursion is used to divided the input array into sub arrays until each sub array contains 1 element and then merge them back. The second implementation is called Bottom-up implementation, where the input array is treated as an array of 1-element sub arrays. The algorithm merge the sub arrays iteratively between two buffers.

An Implementation in C

Below is the Top-down implementation of merge sort in C,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`void mergesort(int *base, int size, int **sorted) {`

`    int i, j;`

`    int *mp, *smp, *sbase;`

`    int *tmp;`

`    if (size <= 1) {`

`        if (size == 1) {`

`            *sorted = (int *)malloc(sizeof(int)*size);`

`            (*sorted)[0] = base[0];`

`        }`

`        return;`

`    }`

`    mp = base + size/2;`

`    mergesort(base, size/2, &sbase);`

`    mergesort(mp, size - size/2, &smp);`

`    printf("before merge:n");`

`    for (i = 0; i < size/2; ++i) {`

`        printf("%d ", sbase[i]);`

`    }`

`    printf("n");`

`    for (i = 0; i < size-size/2; ++i) {`

`        printf("%d ", smp[i]);`

`    }`

`    printf("n");`

`    tmp = (int *)malloc(sizeof(int)*size);`

`    for (i = 0, j = 0; i < size/2 && j < size-size/2; ) {`

`        if (sbase[i] < smp[j]) {`

`            tmp[i+j] = sbase[i];`

`            ++i;`

`        } else {`

`            tmp[i+j] = smp[j];`

`            ++j;`

`        }`

`    }`

`    for (; i < size/2; ++i) {`

`        tmp[i+j] = sbase[i];`

`    }`

`    for (; j < size-size/2; ++j) {`

`        tmp[i+j] = smp[j];`

`    }`

`    printf("after merge:n");`

`    for (i = 0; i < size; ++i) {`

`        printf("%d ", tmp[i]);`

`    }`

`    printf("n");`

`    free(smp);`

`    free(sbase);`

`    *sorted = tmp;`

`}`

` `

`int main(int argc, char **argv) {`

`    int n;`

`    int i;`

`    int* testArray, *sorted;`

`    n = atoi(argv[1]);`

`    testArray = (int*)malloc(sizeof(int)*n);`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000;`

`    }`

`    printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");`

`    mergesort(testArray, n, &sorted);`

`    printf("nafter sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", sorted[i]);`

`    }`

`    printf("n");`

`}`

Compile the code with the command below,

gcc -o ms mergesort.c

Here is a screenshot of a sample execution,

Figure 1. Execution of Top-down Implementation of Merge Sort

Below is the Bottom-up implementation of merge sort in C,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`int merge(int *a, int sa, int *b, int sb, int *res) {`

`    int i, j;`

`    for (i = 0, j = 0; i < sa && j < sb; ){`

`        if (a[i] < b[j]) {`

`            res[i+j] = a[i];`

`            ++i;`

`        } else {`

`            res[i+j] = b[j];`

`            ++j;`

`        }`

`    }`

`    for (; i < sa; ++i) {`

`        res[i+j] = a[i];`

`    }`

`    for (; j < sb; ++j) {`

`        res[i+j] = b[j];`

`    }`

`}`

` `

`void mergesort(int *base, int size, int **sorted) {`

`    int i, j, k;`

`    int sa, sb;`

`    *sorted = (int*)malloc(sizeof(int)*size);`

`    if (size == 1) {`

`         (*sorted)[0] = base[0];`

`         return;`

`    }`

`    for (i = 1; i < size; i *= 2) {   //every round of merge size*2`

`        for (j = 0; j < size; j += 2*i) {`

`            sa = (j + i >= size)? (size - j):i;`

`            sa = (sa < 0) ? 0:sa;`

`            sb = (j + i + i >= size)?(size - (j + i)):i;`

`            sb = (sb < 0) ? 0:sb;`

`            merge(base + j, sa, base + j + i, sb, (*sorted) + j);`

`        }`

`        memcpy(base, *sorted, sizeof(int)*size);`

`        printf("i=%d;n", i, j);`

`        for (k = 0; k < size; ++k) {`

`            printf("%d ", base[k]);`

`        }`

`        printf("n");`

`    } `

`}`

` `

`int main(int argc, char **argv) {`

`    int n;`

`    int i;`

`    int* testArray, *sorted;`

`    n = atoi(argv[1]);`

`    testArray = (int*)malloc(sizeof(int)*n);`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000;`

`    }`

`    printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");`

`    mergesort(testArray, n, &sorted);`

`    printf("nafter sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", sorted[i]);`

`    }`

`    printf("n");`

`}`

Compile the code with the command below,

gcc -o ms2 mergesort2.c

Here is a screenshot of a sample execution,

Figure 2. Execution of Bottom-up Implementation of Merge Sort

References:
1. Programming Perls. 2nd Edition.
2. Wikipedia Merge Sort: http://en.wikipedia.org/wiki/Merge_sort
3. Algorithms. 2.2 Mergesort. http://algs4.cs.princeton.edu/22mergesort/

## Java Programming Cheat Sheet 4 — Collections

This is the 4th part of the Java Programming Cheat Sheet. You can find the first three parts here:

Collection: sometimes called a container, an object that groups multiple elements into a single unit. The Java collection framework contains the following,

• interfaces: abstract data types that represent collections. it allows collections to be manipulated independently of the details of their representation.
• implementations: concrete implementations of the collection interfaces.
• algorithms: they operates on objects that implements the collection interfaces. The algorithms are polymorphic, in the sense that they can operate on different implementations of appropriate interface.

Interfaces: Java Collection Framework  has the following interfaces, note that a map is not a true collection.

Figure 1. Java Collection Interfaces Hierarchy

• All the core collection interfaces are generic.
• Collection: the root of the collection hierarchy. Java doesn’t provide any direct implementations of this interface.

Set: a collection with unique elements. It models the set concept in mathematics. Java contains three general purpose implementations for Set

• HashSet: stores elements in a hash table. It has best performance, but offers no order of iteration.
• TreeSet: stores elements in a red-black tree, a self-balancing binary search tree. The elements are ordered based on their values, but it performs much slower than HashSet.
• LinkedHashSet: stores elements in a hash table with linked list running through it. The elements are ordered according to insertion order. The performance is slightly slower than HashSet, but much better than TreeSet.

List: ordered collection, may contain duplicate elements. It includes interfaces for following operations,

• positional access: access elements based on their numerical position in the list
• search: search for a specific element in the list and returns its numerical position
• iteration: use iterator to go through the list
• range view: performs arbitrary range operations on the list, e.g. sublist
• List provides two general purpose implementations
• ArrayList: implemented using a dynamic array. Just like array.

Queue: holds elements before processing. It provides insertion, removal and inspection operations. It has two general implementations

• LinkedList: LinkedList implements both List and Queue. It provides a FIFO (first in, first out) queue operations.
• PriorityQueue: implemented based on a heap. Orders elements according to the comparator specified at construction.

Map: an object that maps keys to values. It cannot contain duplicate keys, and each key can map to at most one value.

• Java doesn’t provide interface for a multimap, where each key of a map can map to multiple values. However, one can use a map whose values are List instances to emulate a multimap.
• Map has three general-purpose implementations, similar to Set
• HashMap: no order is maintained
• TreeMap: sorted according to key
• LinkedHashMap: insertion order or key access order (look up the value associated with a key brings the key to the end of the map). It also has removeEldestEntry method, which may be overriden to impose a policy to remove stale mapping automatically when new mappings are added.

SortedSet: a set that maintains its elements in ascending order. It provides additional operations compared with a normal set,

• range view: arbitrary range operations on the sorted map
• endpoints: return the first or last key in the sorted map
• comparator access: return the Comparator, if any, used to sort the map

SortedMap: a map that maintains its entries in ascending order. It provides additional operations
compared with a normal map,

• range view: arbitrary range operations on the sorted map
• endpoints: return the first or last key in the sorted map
• comparator access: return the Comparator, if any, used to sort the map

Algorithms: Java platform comes with a few algorithms that operates on the colletion objects. All are static methods of Collections class, with first argument is the collection on which the method is to be performed. The list of methods include,

• Collections.sort: use merge sort, which is stable and fast
• Collections.shuffle
• Collections.reverse, fill, copy, swap, addAll
• Collections.binarySearch
• Collections.frequency
• Collections.disjoint: check if two collections have some elements in common
• Collections.max, Collections.min

References:

Java tutorial: http://docs.oracle.com/javase/tutorial/collections/

## Heapsort

You may want to read the previous post “heap” first.

Heapsort Algorithm

Heapsort is a in-place, not stable, worst-case nlogn sorting algorithm.

The simple implementation of heapsort is simply insert all elements into the heap and then remove the root repeatedly until the heap is empty. This is implemented in the code of post “heap” and “priority queue”.

However, this implementation consumes additional memory due of the usage of the heap/priority queue. Heapsort can actually be implemented using the input data array as the memory for building the heap. Therefore, heapsort is a in-place sorting algorithm.
The in-place version has two phases. Firstly we need to build a heap. This is done by dividing the input array as two parts, the first part contains the heapified data, and the second part contains other inputs. We loop through the input array, adds the current element to the heapified part at every iteration.

The second phase is to retrieve the element from the heap. Every time we remove the root element from the heap, the heap size is reduced by 1, so the space occupied by the last element before removal will be available for storing the removed element. (Essentially, we simply swap the first element and last element of the fully-heapifed array). We do this repeatedly until the heap is empty.

If we’re using a min binary heap, the input will be sorted into descending order as we’re repeatedly putting the smallest element from the heap to the end of the heapified part. If we want to sort the array into ascending order, we simply build a max binary heap. This is easy in code as we only need to change the value comparison from “<” to “>” or “>” to “<”.

C Implementation

Below we show an implementation of heapsort to sort a randomly generated array of integers to ascending order. The heap insert and remove functions are similar to the code in the post “heap”, except that the signs are changed to build a max binary heap instead of min binary heap.

`#include <stdio.h>`

`#include <stdlib.h>`

` `

`typedef struct heapData {`

`    //dummy`

`} heapData;`

` `

`typedef struct heapNode {`

`    int value;`

`    heapData data;               //dummy`

`} heapNode;`

` `

`void insert(heapNode aNode, heapNode* heap, int size) {`

`    int idx;`

`    heapNode tmp;`

`    idx = size + 1;`

`    heap[idx] = aNode;`

`    while (heap[idx].value > heap[idx/2].value && idx > 1) {         //change "<" to ">"`

`    tmp = heap[idx];`

`    heap[idx] = heap[idx/2];`

`    heap[idx/2] = tmp;`

`    idx /= 2;`

`    }`

`}`

` `

`void shiftdown(heapNode* heap, int size, int idx) {`

`    int cidx;        //index for child`

`    heapNode tmp;`

`    for (;;) {`

`        cidx = idx*2;`

`        if (cidx > size) {`

`            break;   //it has no child`

`        }`

`        if (cidx < size) {`

`            if (heap[cidx].value < heap[cidx+1].value) {//change ">" to "<"`

`                ++cidx;`

`            }`

`        }`

`        //swap if necessary`

`        if (heap[cidx].value > heap[idx].value) {    //change "<" to ">"`

`            tmp = heap[cidx];`

`            heap[cidx] = heap[idx];`

`            heap[idx] = tmp;`

`            idx = cidx;`

`        } else {`

`            break;`

`        }`

`    }`

`}`

` `

`heapNode removeMax(heapNode* heap, int size) {`

`    int cidx;`

`    heapNode rv = heap[1];`

`    //printf("%d:%d:%dn", size, heap[1].value, heap[size].value);`

`    heap[1] = heap[size];`

`    --size;`

`    shiftdown(heap, size, 1);`

`    return rv;`

`}`

` `

`int main(int argc, char **argv) {`

`    int n; `

`    int i;`

`    heapNode *heap;`

`    n = atoi(argv[1]);`

`    heap = (heapNode*)malloc(sizeof(heapNode)*(n+1));`

`    srand(time(NULL));`

`    for (i = 1; i <= n; ++i) {`

`        heap[i].value = rand()%10000;`

`    }`

`    printf("input:n");`

`    for (i = 1; i <= n; ++i) {`

`        printf("%d ", heap[i].value);`

`    }`

`    printf("nheapify:n");`

`    for (i = 2; i <=n ;++i) {`

`        insert(heap[i], heap, i-1);`

`    } `

`    for (i = 1; i <= n; ++i) {`

`        printf("%d ", heap[i].value);`

`    }`

`    printf("nsort the heapified array:n");`

`    for (i = n; i > 0; --i) {`

`        heap[i] = removeMax(heap, i);        //use the end of the heapified part to store the removed element`

`    }`

`    for (i = 1; i <= n; ++i) {`

`        printf("%d ", heap[i].value);`

`    }`

`    printf("n");`

`}`

Sample Execution

Compile the code using the command,

gcc -o hs heapsort.c

Below is a screenshot of the sample execution,

Figure 1. Heapsort Sample Run

Run Time

Heapsort has worst-case nlogn run time, which is better than quicksort’s n^2 worst case run time. However, quicksort performs better than heapsort in practice at most cases.

## Priority Queue and an Implementation using Heap in C

You may want to read the post “heap” first.

Priority Queue

A priority queue is a data structure that supports at least the following two operations,

1. insert: insert an item with priority

2. pull_min_element/pull_max_element: depends on your priority queue type (min priority-queue or max-priority-queue), the operation find the item with min/max priority, return the item and remove it from the queue.

A priority queue is usually implemented using a heap, but it doesn’t have to be.
A priority queue can be implemented using a simple linked list/array. Every time an insert operation occurs, we simply append it to the end of the list/array and when a pull request is issued, we iterate through the list/array and find the element to return and remove. Alternatively, we can keep the linked list/array sorted and simply remove the first item every time.

A priority queue can also be implemented using a self-balancing binary search tree (e.g. red-black tree). In this case, both insert and pull take O(logn) worst case running time.
But the most commonly used data structure for priority queue is heap. The simplest heap is binary heap. For the convenient of discussion, the description below assumes min binary heap.

When binary heap is used to implement a priority queue, both insert and pull can be done in O(logN) worst case time. Insert has an average of constant time. To build the heap from already exist N items, it takes O(N) worst case time.

C Implementation using Heap

The implementation of priority queue is just a simple wrapper of the heap implementation in the post “heap”.

`#include <stdio.h>`

`#include <stdlib.h>`

` `

`typedef struct heapData {`

`    //dummy`

`} heapData;`

` `

`typedef struct heapNode {`

`    int value;`

`    heapData data;               //dummy`

`} heapNode;`

` `

`typedef struct PQ {`

`    heapNode* heap;`

`    int size;`

`} PQ;`

` `

`void insert(heapNode aNode, heapNode* heap, int size) {`

`    int idx;`

`    heapNode tmp;`

`    idx = size + 1;`

`    heap[idx] = aNode;`

`    while (heap[idx].value < heap[idx/2].value && idx > 1) {`

`    tmp = heap[idx];`

`    heap[idx] = heap[idx/2];`

`    heap[idx/2] = tmp;`

`    idx /= 2;`

`    }`

`}`

` `

`void shiftdown(heapNode* heap, int size, int idx) {`

`    int cidx;        //index for child`

`    heapNode tmp;`

`    for (;;) {`

`        cidx = idx*2;`

`        if (cidx > size) {`

`            break;   //it has no child`

`        }`

`        if (cidx < size) {`

`            if (heap[cidx].value > heap[cidx+1].value) {`

`                ++cidx;`

`            }`

`        }`

`        //swap if necessary`

`        if (heap[cidx].value < heap[idx].value) {`

`            tmp = heap[cidx];`

`            heap[cidx] = heap[idx];`

`            heap[idx] = tmp;`

`            idx = cidx;`

`        } else {`

`            break;`

`        }`

`    }`

`}`

` `

`heapNode removeMin(heapNode* heap, int size) {`

`    int cidx;`

`    heapNode rv = heap[1];`

`    //printf("%d:%d:%dn", size, heap[1].value, heap[size].value);`

`    heap[1] = heap[size];`

`    --size;`

`    shiftdown(heap, size, 1);`

`    return rv;`

`}`

`void enqueue(heapNode node, PQ *q) {`

`    insert(node, q->heap, q->size);`

`    ++q->size;`

`}`

` `

`heapNode dequeue(PQ *q) {`

`   heapNode rv = removeMin(q->heap, q->size);`

`   --q->size;`

`   return rv; `

`}`

` `

`void initQueue(PQ *q, int n) {`

`   q->size = 0;`

`   q->heap = (heapNode*)malloc(sizeof(heapNode)*(n+1));`

`}`

` `

`int main(int argc, char **argv) {`

`    int n; `

`    int i;`

`    PQ q;`

`    heapNode hn;`

`    n = atoi(argv[1]);`

`    initQueue(&q, n);`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        hn.value = rand()%10000;`

`        printf("enqueue node with value: %dn", hn.value);`

`        enqueue(hn, &q);`

`    }`

`    printf("ndequeue all values:n");`

`    for (i = 0; i < n; ++i) {`

`        hn = dequeue(&q);`

`        printf("dequeued node with value: %d, queue size after removal: %dn", hn.value, q.size);`

`    }`

`}`

Compile the code using,

gcc -o pq pq.c

A sample run would be as below,

References:
1. Wikipedia: http://en.wikipedia.org/wiki/Binary_heap
2. Programming Perls. 2nd Edition

## Heap

Heap and Its Properties
Heap is a data structure frequently used to implement heap sort and priority queue. It has structure property and heap-order property. The simplest heap is binary heap. For the convenient of discussion, the description below assumes min binary heap.

Structure property: A heap is a complete binary tree. It means all except the bottom level of the heap is completely filled, and the bottom level is filled from left to right.

Heap-order property: For every node N in a heap, the value of its parent is smaller than or equal to the value of N, except for the root node, which has no parent. This means the smallest element is the root.

Insertion and Removal

Both insertion and root removal at a heap can be done in O(logN) worst case time. Insert has an average of constant time. To build the heap from already exist N items, it takes O(N) worst case time.

Insertion: to insert an element into a heap, first put the element at the first empty slot at the bottom of the heap binary tree. Then we check if the element is smaller than its parent. If no, the element insertion is done. If yes, we swap the element and its parent and the check is done again. The process repeats until the element is greater than or equal to its parent or it has reached the root.

Removal of Root: to remove root from a heap, we use the last element M at the bottom level to replace the element to be removed. After that, we check if M is smaller than all its children. If yes, the removal is done; otherwise, we select the smallest element from its children and swap M with that element. The process is done repeatedly until M is smaller than or equal to all its children or M has no child.

Build a Heap: to build a heap for existing N elements, one can insert N elements into heap, this takes NlogN time. There is better approach than this. We can initialize the heap without considering the order property. Then try to build the order property from bottom up to the root of the element. It’s proven that this consumes only O(n) time.

Implementation

Heap is often implemented using array. If we put the root at index 1, then the following relationship holds,

left child index = parent index * 2
right child index = parent index * 2 + 1
parent index = child index / 2

This makes traversal through the heap easy.

Below is a sample implementation of heap in C,

`#include <stdio.h>`

`#include <stdlib.h>`

` `

`typedef struct heapData {`

`    //dummy`

`} heapData;`

` `

`typedef struct heapNode {`

`    int value;`

`    heapData data;               //dummy`

`} heapNode;`

` `

`void insert(heapNode aNode, heapNode* heap, int size) {`

`    int idx;`

`    heapNode tmp;`

`    idx = size + 1;`

`    heap[idx] = aNode;`

`    while (heap[idx].value < heap[idx/2].value && idx > 1) {`

`    tmp = heap[idx];`

`    heap[idx] = heap[idx/2];`

`    heap[idx/2] = tmp;`

`    idx /= 2;`

`    }`

`}`

` `

`void shiftdown(heapNode* heap, int size, int idx) {`

`    int cidx;        //index for child`

`    heapNode tmp;`

`    for (;;) {`

`        cidx = idx*2;`

`        if (cidx > size) {`

`            break;   //it has no child`

`        }`

`        if (cidx < size) {`

`            if (heap[cidx].value > heap[cidx+1].value) {`

`                ++cidx;`

`            }`

`        }`

`        //swap if necessary`

`        if (heap[cidx].value < heap[idx].value) {`

`            tmp = heap[cidx];`

`            heap[cidx] = heap[idx];`

`            heap[idx] = tmp;`

`            idx = cidx;`

`        } else {`

`            break;`

`        }`

`    }`

`}`

` `

`heapNode removeMin(heapNode* heap, int size) {`

`    int cidx;`

`    heapNode rv = heap[1];`

`    //printf("%d:%d:%dn", size, heap[1].value, heap[size].value);`

`    heap[1] = heap[size];`

`    --size;`

`    shiftdown(heap, size, 1);`

`    return rv;`

`}`

` `

`void buildHeap(heapNode* heap, int size) {`

`    int i;`

`    //apply shiftdown to all elements that has children, `

`    //it can be proven that it runs on O(n) worst-cast time`

`    for (i = size/2; i > 0; --i) { `

`        shiftdown(heap, size, i);`

`    }`

`}`

` `

`int main(int argc, char **argv) {`

`    int n; `

`    int i;`

`    heapNode *heap, *heap2;`

`    heapNode hn;`

`    n = atoi(argv[1]);`

`    //insert the element one by one to heap`

`    //store elements in heap2 without keeping the order-property`

`    //then buildHeap`

`    heap = (heapNode*)malloc(sizeof(heapNode)*(n+1));`

`    heap2 = (heapNode*)malloc(sizeof(heapNode)*(n+1));`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        hn.value = rand()%10000;`

`        printf("insert node: %dn", hn.value);`

`        insert(hn, heap, i); `

`        heap2[i+1] = hn;`

`    }`

`    buildHeap(heap2, n);`

`    for (i = 1; i <= n; ++i) {`

`        printf("%d:%dn", heap[i].value, heap2[i].value);`

`    }`

`    `

`    printf("remove root values:n");`

`    for (i = n; i > 0; --i) {`

`        hn = removeMin(heap, i);`

`        printf("heap: min: %d, heap size before removal: %dn", hn.value, i);`

`        hn = removeMin(heap2, i);`

`        printf("heap2: min: %d, heap size before removal: %dn", hn.value, i);`

`    }`

`}`

Compile the code with

gcc -o heap heap.c

A sample run would be as below,

Figure 1. Sample Run of Heap Functions

The program builds two heaps. The first one insert the element one by one, the second heap is built with all elements available. As shown in the screenshot, the heap structures are different, but both the structure and order property are maintained in both heaps.

Heap is frequently used to implement priority queue and heapsort.

References:

1. Programming Pearls, 2nd edition.

2. Data Structures and Algorithms Analysis in C++, 3rd edition.

## Quick Sort and a Sample Implementation for C Standard Library qsort Function

Brief Notes of the Algorithm
Quicksort has an average run time of nlogn and worst case run time of n^2. Quicksort is often faster than other O(nlogn) algorithms in practice. And it’s used in implementation of C standard library qsort function.

Quicksort is not a stable sort, which means if two elements are equal, their order is not guaranteed to be preserved.

Quicksort is a divide and conquer algorithm. The basic idea of Quicksort is as follows,

• pick up an element, called a pivot
• reorder the list such that all elements that smaller than pivot is placed before the pivot, and the elements that bigger than pivot is placed after pivot. This operation is called partitioning. After partitioning, the pivot is at its final position.
• recursively sort the sub-list that contains all the smaller elements and the sub-list contains all the bigger elements.
• When a sub-list contains 0 or 1 element, there’s no need to parition.

C Implementation

The C standard library has a function “qsort” to sort an array. The function prototype is as below,

void qsort(void* base, size_t n, size_t size, int (*cmp)(const void*, const void*));

qsort sorts n elements of array pointed by pointer base, where each element is size bytes long, using cmp function to determine the order. It arranges array in ascending order. sss
The function cmp must return negative value if first item is less than second, zero if equal and positive if greater.

Below is the first implementation of qsort,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`int intcmp(const void *a, const void *b) {`

`    return *(const int *)a - *(const int *)b;`

`}`

` `

`void myswap(void *e1, void *e2, int size) {`

`    void* sp = (void*) malloc(size);`

`    memcpy(sp, e1, size);`

`    memcpy(e1, e2, size);`

`    memcpy(e2, sp, size);`

`    free(sp);`

`}`

` `

`void myqsort(void* base, size_t n, size_t size, int (*cmp)(const void*, const void*)) {`

`    int i;`

`    int rv;`

`    void* mp;`

`    void* sp;  //for swap`

` `

`    if (n <= 1) {return;}`

`    //printf("sort %d elementsn", n);`

`    mp = base;`

`    for (i = 1; i < n; ++i) {`

`        rv = cmp(base, base + size*i);`

`        if (rv > 0) {`

`            mp += size;`

`            if (mp != base+size*i) {`

`        //swap element at base+size*i with element at mp`

`        //printf("swap %d and %dn", *(int*)(base+size*i), *(int*)mp);`

`        myswap(base+size*i, mp, size);`

`            }`

`        }`

`    }`

`    //swap the base element with the element at mp`

`    myswap(base, mp, size);`

`    myqsort(base, (mp - base)/size, size, cmp);`

`    myqsort(mp + size, n - 1 - (mp-base)/size, size, cmp);`

`}`

` `

`int main(int argc, char **argv) {`

`    int* testArray;`

`    int i;`

`    int n = atoi(argv[1]);`

`    struct timeval stTime, edTime;`

`    testArray = (int*) malloc(n*sizeof(int));`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000; `

`    }`

`    /*printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    gettimeofday(&stTime, NULL);`

`    myqsort(testArray, n, sizeof(int), &intcmp);`

`    gettimeofday(&edTime, NULL);`

`    printf("%d: %u:%un", n, (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));`

` `

`    /*printf("after sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    free(testArray);`

`    return 0;`

`}`

The myqsort function always use the first element of a list/sub-list as the pivot. The pointer mp always points to the end of the sub-array with all elements smaller than pivot. If an element A is smaller than the pivot, then the pointer mp is incremented to pointing to a element B bigger than pivot, and A is swapped with B. After we scan through the entire array, we swap the pivot with the element pointed by mp.

The implementation has average run time of nlogn, however it cannot handle n equal elements efficiently (n^2). An alternative implementation is as below,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`int intcmp(const void *a, const void *b) {`

`    return *(const int *)a - *(const int *)b;`

`}`

` `

`void myswap(void *e1, void *e2, int size) {`

`    void* sp = (void*) malloc(size);`

`    memcpy(sp, e1, size);`

`    memcpy(e1, e2, size);`

`    memcpy(e2, sp, size);`

`    free(sp);`

`}`

` `

`void myqsort2(void* base, size_t n, size_t size, int (*cmp)(const void*, const void*)) {`

`    int i, j;`

` `

`    if (n <= 1) {return;}`

`   // printf("sort %d elementsn", n);`

`    i = 0; j = n;`

`    while (1) {`

`        do {`

`            ++i;`

`        } while(cmp(base + size*i, base) < 0 && i < n);`

`        do {`

`            --j;`

`        } while(cmp(base + size*j, base) > 0);   //no need to check j >= 0 because it will fail test when it reaches base`

`        if (i > j) break;`

`        myswap(base + size*i, base + size*j, size);`

`        //printf("%d:%dn", *(int*)(base+size*i), *(int*)(base+size*j));`

`    }`

`    //swap the base element with the j element`

`    if (j != 0) {`

`        myswap(base, base+size*j, size);`

`    }`

`    if (j > 0) {`

`        myqsort2(base, j, size, cmp);`

`    }`

`    myqsort2(base + size*(j+1), n - 1 - j, size, cmp);`

`}`

` `

`int main(int argc, char **argv) {`

`    int* testArray;`

`    int i;`

`    int n = atoi(argv[1]);`

`    struct timeval stTime, edTime;`

`    testArray = (int*) malloc(n*sizeof(int));`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000; `

`    }`

`    /*printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    gettimeofday(&stTime, NULL);`

`    myqsort2(testArray, n, sizeof(int), &intcmp);`

`    gettimeofday(&edTime, NULL);`

`    printf("%d: %u:%un", n, (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));`

` `

`    /*printf("after sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    free(testArray);`

`    return 0;`

`}`

The function myqsort2 uses two-sided partitioning code. The indices i and j are initialized to pointing to first element and last element of the array, and the first inner do-while loop goes from first element to the end of the array to check if the element it goes through are smaller than pivot, while the second inner do-while loop goes from the end to the beginning of the array to check if the element it goes through are bigger than pivot.

Every time the first inner loop encounters an element this is bigger than or equal to the pivot, it stops; every time the second inner loop encounters an element that is smaller than or equal to the pivot, it stops. Then we swap the two elements at i and j. When i and j are crossed over, the partition operation is done.

The second implementation can handle the n equal elements, but it still degrade to n^2 when the input n elements are already sorted. An alternative is to use a randomly picked element as pivot. This is implemented as below,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`int intcmp(const void *a, const void *b) {`

`    return *(const int *)a - *(const int *)b;`

`}`

` `

`void myswap(void *e1, void *e2, int size) {`

`    void* sp = (void*) malloc(size);`

`    memcpy(sp, e1, size);`

`    memcpy(e1, e2, size);`

`    memcpy(e2, sp, size);`

`    free(sp);`

`}`

` `

`void myqsort3(void* base, size_t n, size_t size, int (*cmp)(const void*, const void*)) {`

`    int i, j;`

`    int ipos;`

`    if (n <= 1) {return;}`

`    ipos = rand()%n;`

`    myswap(base, base+ipos*size, size);`

`   // printf("sort %d elementsn", n);`

`    i = 0; j = n;`

`    while (1) {`

`        do {`

`            ++i;`

`        } while(cmp(base + size*i, base) < 0 && i < n);`

`        do {`

`            --j;`

`        } while(cmp(base + size*j, base) > 0);   //no need to check j >= 0 because it will fail test when it reaches base`

`        if (i > j) break;`

`        myswap(base + size*i, base + size*j, size);`

`        //printf("%d:%dn", *(int*)(base+size*i), *(int*)(base+size*j));`

`    }`

`    //swap the base element with the j element`

`    if (j != 0) {`

`        myswap(base, base+size*j, size);`

`    }`

`    if (j > 0) {`

`        myqsort3(base, j, size, cmp);`

`    }`

`    myqsort3(base + size*(j+1), n - 1 - j, size, cmp);`

`}`

` `

`int main(int argc, char **argv) {`

`    int* testArray;`

`    int i;`

`    int n = atoi(argv[1]);`

`    struct timeval stTime, edTime;`

`    testArray = (int*) malloc(n*sizeof(int));`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000; `

`    }`

`    /*printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    gettimeofday(&stTime, NULL);`

`    myqsort3(testArray, n, sizeof(int), &intcmp);`

`    gettimeofday(&edTime, NULL);`

`    printf("%d: %u:%un", n, (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));`

` `

`    /*printf("after sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    free(testArray);`

`    return 0;`

`}`

At begining of myqsort3, we swap the first element with a randomly picked element, and then use it as the pivot.

Quicksort is efficient when dealing with large arrays. If the array is small, quicksort’s recursive implementation may not perform better than other sorting algorithms like insertion sort. Insertion sort is efficient to sort an array that is almost sorted because the sequential memory access pattern.

To further improve the quick sort above, we first use quick sort to sort the array in a almost-sorted state, and then use insertion sort to sort the almost-sorted array. This is implemented as below,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`unsigned int cutoff = 1;`

` `

`int intcmp(const void *a, const void *b) {`

`    return *(const int *)a - *(const int *)b;`

`}`

` `

`void myswap(void *e1, void *e2, int size) {`

`    void* sp = (void*) malloc(size);`

`    memcpy(sp, e1, size);`

`    memcpy(e1, e2, size);`

`    memcpy(e2, sp, size);`

`    free(sp);`

`}`

` `

`void insersionSort(void* base, size_t n, size_t size, int (*cmp)(const void*, const void*)) {`

`   int i, j;`

`   void *tmp = (void*)malloc(size);`

`   for (i = 1; i < n; ++i) {`

`       memcpy(tmp, base + i*size, size);`

`       for (j = i-1; j >= 0 && cmp(tmp, base+j*size) < 0; --j) {`

`           memcpy(base+(j+1)*size, base+j*size, size);`

`       }`

`       memcpy(base+(j+1)*size, tmp, size);`

`   } `

`}`

` `

`void myqsort4(void* base, size_t n, size_t size, int (*cmp)(const void*, const void*)) {`

`    int i, j;`

`    int ipos;`

`    if (n <= cutoff) {return;}`

`    ipos = rand()%n;`

`    myswap(base, base+ipos*size, size);`

`   // printf("sort %d elementsn", n);`

`    i = 0; j = n;`

`    while (1) {`

`        do {`

`            ++i;`

`        } while(cmp(base + size*i, base) < 0 && i < n);`

`        do {`

`            --j;`

`        } while(cmp(base + size*j, base) > 0);   //no need to check j >= 0 because it will fail test when it reaches base`

`        if (i > j) break;`

`        myswap(base + size*i, base + size*j, size);`

`        //printf("%d:%dn", *(int*)(base+size*i), *(int*)(base+size*j));`

`    }`

`    //swap the base element with the j element`

`    if (j != 0) {`

`        myswap(base, base+size*j, size);`

`    }`

`    if (j > 0) {`

`        myqsort4(base, j, size, cmp);`

`    }`

`    myqsort4(base + size*(j+1), n - 1 - j, size, cmp);`

`}`

` `

`int main(int argc, char **argv) {`

`    int* testArray;`

`    int i;`

`    int n = atoi(argv[1]);`

`    cutoff = atoi(argv[2]);`

`    struct timeval stTime, edTime;`

`    testArray = (int*) malloc(n*sizeof(int));`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000; `

`    }`

`    /*printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    gettimeofday(&stTime, NULL);`

`    myqsort4(testArray, n, sizeof(int), &intcmp);`

` `

`    /*printf("after sort 1:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    insersionSort(testArray, n, sizeof(int), &intcmp);`

`    gettimeofday(&edTime, NULL);`

`    printf("%d: %u:%un", n, (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));`

` `

`    /*printf("after sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    free(testArray);`

`    return 0;`

`}`

The program accepts an additional input parameter “cutoff” to stop the qsort when the sub-array is short.

To compare the performance of our implementations with the C standard library implementation, an implementation using C standard library qsort function is as below,

`#include <stdio.h>`

`#include <stdlib.h>`

`#include <string.h>`

` `

`int intcmp(const void *a, const void *b) {`

`    return *(const int *)a - *(const int *)b;`

`}`

` `

`int main(int argc, char **argv) {`

`    int* testArray;`

`    int i;`

`    int n = atoi(argv[1]);`

`    struct timeval stTime, edTime;`

`    testArray = (int*) malloc(n*sizeof(int));`

`    srand(time(NULL));`

`    for (i = 0; i < n; ++i) {`

`        testArray[i] = rand()%1000; `

`    }`

`    /*printf("before sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    gettimeofday(&stTime, NULL);`

`    qsort(testArray, n, sizeof(int), &intcmp);`

`    gettimeofday(&edTime, NULL);`

`    printf("%d: %u:%un", n, (unsigned int)(edTime.tv_sec - stTime.tv_sec), (unsigned int)(edTime.tv_usec - stTime.tv_usec));`

` `

`    /*printf("after sort:n");`

`    for (i = 0; i < n; ++i) {`

`        printf("%d ", testArray[i]);`

`    }`

`    printf("n");*/`

` `

`    free(testArray);`

`    return 0;`

`}`

Sample Execution

Sample execution of the above programs are as below (suppose the five programs are compiled as qs1, qs2, qs3, qs4, and qs5 respectively),

The C standard library qsort is highly optimized and outperforms the implementations above.

References:

Programming Perls, second edition

## Java Programming Cheat Sheet 3–Basic I/O

This is the third part of the Java programming cheat sheet. You may also interested in the first part OO concepts and second part Syntax.

I/O from Command Line: Java provides two ways, through the standard streams and through the Console

• standard streams: Java supports three standard streams: standard input (System.in), standard output (System.out) and Standard Error (System.err). The standard streams are byte stream.
`BufferedReader Reader cin`

` = new BufferedReader(new InputStreamReader(System.in));`

• Console c = System.console();
• c.printf();

Byte Streams: for I/O of 8-bit bytes. All byte streams are descended from InputStream and OutputStream. All other stream types are built on byte streams
Character Streams: Java stores character values using Unicode. Character stream I/O automatically translates the internal format to and from the local character set. e.g. Unicode ⇔ ASCII.

• all character stream classes are descended from Reader and Writer.
• there’re two general-purpose byte-to-character bridge streams: InputStreamReader and OutputStreamWriter. One can use them to create character streams when there’re no prepackaged character stream classes that meet our needs.

Buffered Streams: buffered streams are designed to reduce I/O overhead by maintain a I/O buffer at memory, and the I/O is committed when necessary. There’re four buffered stream classes.

• BufferedInputStream and BufferedOutputStream: creates buffered byte streams
• BufferedReader and BufferedWriter: creates buffered character streams

Formatting Streams: there’re two classes with formatting abilities, PrintWriter and PrintStream

• PrintStream: a byte stream class. System.out and System.err are instances of this class.
• PrintWriter: a character stream class.
• Both PrintStream and PrintWriter implement same set of methods for converting internal data into formatted output.
• print and println format individual values in a standard way
• format formats almost any number of values based on a format string, with many options for precise formatting, just like printf

References:

1. Oracle Java Tutorial: http://docs.oracle.com/javase/tutorial/
2. javapapers blog: http://javapapers.com/oops/association-aggregation-composition-abstraction-generalization-realization-dependency/

## Java Programming Cheat Sheet 2–Syntax

This is the second part of java programming cheat sheet. You may also interested in the first part OO Concepts and third part Basic I/O.

Primitive Data Types: Java compiler assign a default value for class fields, but not local variables.

 Data Type Sign Default Value (for fields) byte 8-bit signed 0 short 16-bit signed 0 int 32-bit signed 0 long 64-bit signed 0L float 32-bit signle-precision signed 0.0f double 64-bit double-precision signed 0.0d char 16-bit unsigned ‘u0000’ (or 0) boolean true or false false

Array

• declaration: int myArray[];
• allocation: myArray = new int[5];
• declaration and allocation: int myArray[] = new int[5];
• initialization: myArray = {1, 2, 3, 4, 5}
• declaration, allocation and initialization: int myArray[] = {1, 2, 3, 4, 5}
• access: myArray[0] = 1; Java array starts at index 0.

Operators:

• bitwise operators
• ~: invert a bit pattern
• &: bitwise AND
• ^: bitwise exclusive OR
• |: bitwise inclusive OR
• >>: signed right shift
• <<: signed shift left
• >>>: unsigned right shift, which shifts a zero into the leftmost position
• assignment, arithmetic and unary
• assignment: =
• arithmetic: +, -, *, /, %
• unary: + (indicates positive), -(negates and expression), ++, –, ! (inverts the value of a boolean)
• equality, relational and conditional
• equality and relational operators: ==, !=, >, >=, <, <=
• conditional: &&, ||, ?:
• type comparison: instanceof

Control Flow

• if-else
`if(condition){`

`    statement;`

`} else {`

`    statement;`

`}`

• switch
`switch (expression) {`

`    case <option 1>:`

`        <statement>;`

`        break;`

`    case <option 2>:`

`        <statement>;`

`        break;`

`    default:`

`        <statement>;`

`}`

• while
`while (expression) {`

`     statement;`

`}`

• do-while
`do {`

`    statement;`

`} while (expression);`

• for
`for (initialization; termination; increment) {`

`    statements;`

`}`