How to Calculate IP/TCP/UDP Checksum–Part 3 Usage Example and Validation

This is a follow up of the previous post IP/TCP/UDP Checksum Calculation part 1 theory, and part 2 implementation.

This post gives an example using libnetfiler_queue library and the checksum code we implemented in part 2 to illustrate how to use the checksum code and verify that our code is actually computing correctly.

Note this post is not a post for libnetfilter_queue library. So the usage of this library is not covered here. I’ve written separate posts for libnetfilter_queue.

The code (let’s call it test.c) is as below,

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <linux/types.h>
#include <linux/netfilter.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

#include "checksum.h"

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg, struct nfq_data *nfa, void *data) {
    printf("entering callbackn");
    struct nfqnl_msg_packet_hdr *ph;
    int payload_len;
    unsigned char *payloadData;
    struct iphdr *ipHeader;
    struct tcphdr *tcpHeader;
    struct udphdr *udpHeader;
    unsigned short ipCheck, udpCheck, tcpCheck;
    ph = nfq_get_msg_packet_hdr(nfa);
    u_int32_t id = ntohl(ph->packet_id);
    payload_len = nfq_get_payload(nfa, &payloadData);
    printf("ip datagram len = %dn", payload_len);
    ipHeader = (struct iphdr *)payloadData;
    ipCheck = ipHeader->check;
    printf("ip checksum: %04xn", ipHeader->check);
    //calculate ip checksum, and see if the calculation matches
    printf("calculated ip checksum: %04xn", ipHeader->check);
    if (ipCheck != ipHeader->check) {
    printf("-------------ip checksum calculation is wrong-----------n");
    if (ipHeader->protocol == IPPROTO_TCP) {
        tcpHeader = (struct tcphdr *)(payloadData + (ipHeader->ihl<<2));
    tcpCheck = tcpHeader->check;
    printf("tcp checksum: %04xn", tcpHeader->check);
    //calculate tcp checksum, and see if the calculation matches with original tcp checksum
        compute_tcp_checksum(ipHeader, (unsigned short*)tcpHeader);
    printf("calculated tcp checksum: %04xn", tcpHeader->check);
    if (tcpHeader->check != tcpCheck) {
        printf("-----------calculation is wrong-------n");
    } else if (ipHeader->protocol == IPPROTO_UDP) {
    udpHeader = (struct udphdr *)(payloadData + (ipHeader->ihl<<2));
    udpCheck = udpHeader->check;
    printf("udp checksum: %04xn", udpHeader->check);
    //calculate udp checksum, and see if the calculation matches with original udp checksum
    compute_udp_checksum(ipHeader, (unsigned short*)udpHeader);
    printf("calculated udp checksum: %04xn", udpHeader->check);
        if (udpHeader->check != udpCheck) {
        printf("-----------calculation is wrong-------n");
    //issue a verdict on a packet
    //qh: netfilter queue handle; id: ID assigned to packet by netfilter; verdict: verdict to return to netfilter, data_len: number
    //of bytes of data pointed by buf, buf: the buffer that contains the packet data (payload)
// return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
    return nfq_set_verdict(qh, id, NF_ACCEPT, payload_len, payloadData);

int main(int argc, char **argv) {
    struct nfq_handle *h;
    struct nfq_q_handle *qh;
    //struct nfnl_handle *nh;
    int fd;
    int rv;
    char buf[4096] __attribute__((aligned));

    /*netfilter_queue library set up step 1: call nfq_open to open a NFQUEUE handler*/
    printf("opening library handlen");
    h = nfq_open();
    if (!h) {
        fprintf(stderr, "error during nfq_open()n");
    /*set up step 2: tell the kernel that userspace queuing is handled by NFQUEUE for the selected protocol. This is 
 made by calling nfq_unbind_pf and nfq_bind_pf with protocol info. The idea behind this is to enable simulataneously loaded modules to be used for queuing*/
    printf("unbinding existing nf_queue handler for AF_INET (if any)n");
    if (nfq_unbind_pf(h, AF_INET) < 0) {
        fprintf(stderr, "error during nfq_unbind_pf()n");
    printf("binding nfnetlink_queue as nf_queue handler for AF_INETn");
    if (nfq_bind_pf(h, AF_INET) < 0) {
        fprintf(stderr, "error during nfq_bind_pf()n");
    /*after the above two steps, we can set up and use a queue*/
    /*bind the program to a specific queue, cb is callback function to call for each queued packet
 *h: netfilter queue handle, 0: the number of the queue to bind to, cb: callback function for each queued packet, data: custom data to pass to callback function*/
    printf("binding this socket to queue '0'n");
    qh = nfq_create_queue(h, 0, &cb, NULL);
    if (!qh) {
        fprintf(stderr, "error during nfq_create_queue()n");
    /*set mode: NFQNL_COPY_PACKET: copy entire packet, it defines the part of data that nfqueue copies to userspace
 0xffff: size of the packet that we want to get*/
    printf("setting copy_packet moden");
    if (nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff) < 0) {
        fprintf(stderr, "cannot set packet_copy moden");
    /*handle the incoming packets*/
    fd = nfq_fd(h);
    while ((rv = recv(fd, buf, sizeof(buf), 0)) && rv >= 0) {
    printf("pkt receivedn");
    nfq_handle_packet(h, buf, rv);
    printf("unbinding from queue 0n");
    /*the program has finished with libnetfilter_queue, it can call nfq_close to free all associated resources*/
    printf("closing library handlen");

The code basically use libnetfilter_queue to get the IP datagram of TCP/UDP packets, and retrieve their IP, TCP/UDP checksum. It then use the checksum code we implemented in part 2 to recalculate the checksum. If the calculated the checksum matches with the checksum we retrieved from the original packet, then we’re sure the computation is done correctly.

Compile and Run the Code

To compile the code, you can run the command below,

gcc -Wall -o testsum checksum.c test.c -lnfnetlink -lnetfilter_queue

Note that you’ll need libnfnetlink and libnetfilter_queue to compile the code. Please google for more info.

To run the code, save the following command to, and then use sudo ./,

sudo iptables -t mangle –flush
sudo iptables -t mangle -A OUTPUT -p tcp  -j NFQUEUE –queue-num 0
sudo iptables -t mangle -A OUTPUT -p udp -j NFQUEUE –queue-num 0
sudo ./testsum

Note that after you finish running the code and killed the program, you won’t be able to access Internet, you’ll need to run the following command to bring your Internet back,

sudo iptables -t mangle –flush

Note that the above procedure basically set up Linux iptable rules to queue TCP and UDP packets, so the test.c program code can receive them at userspace.


You can download the source code and scripts from here.

How to Calculate IP/TCP/UDP Checksum–Part 2 Implementation

This is a follow up of the previous post, how to calculate IP/TCP/UDP checksum part 1 — theory.

IP Header Checksum Calculation Implementation
To calculate the IP checksum, one can use the code below,

/* set ip checksum of a given ip header*/

void compute_ip_checksum(struct iphdr* iphdrp){

  iphdrp->check = 0;

  iphdrp->check = compute_checksum((unsigned short*)iphdrp, iphdrp->ihl<<2);


/* Compute checksum for count bytes starting at addr, using one's complement of one's complement sum*/

static unsigned short compute_checksum(unsigned short *addr, unsigned int count) {

  register unsigned long sum = 0;

  while (count > 1) {

    sum += * addr++;

    count -= 2;


  //if any bytes left, pad the bytes and add

  if(count > 0) {

    sum += ((*addr)&htons(0xFF00));


  //Fold sum to 16 bits: add carrier to result

  while (sum>>16) {

      sum = (sum & 0xffff) + (sum >> 16);


  //one's complement

  sum = ~sum;

  return ((unsigned short)sum);


The method compute_ip_checksum initialize the checksum field of IP header to zeros. Then calls a method compute_checksum. The mothod compute_checksum accepts the computation data and computation length as two input parameters. It sum up all 16-bit words, if there’s odd number of bytes, it adds a padding byte. After summing up all words, it folds the sum to 16 bits by adding the carrier to the results. At last, it takes the one’s complement of sum and cast it to 16-bit unsigned short type. You can refer to part 1 for more detailed description of the algorithm.

Note that the data structure iphdr and tcphdr and udphdr are Linux data structures representing IP header, TCP header and UDP header respectively. You may want to Google for more information in order to understand the code.

TCP Header Checksum Calculation Implementation
To calculate the TCP checksum, you can use the code below,

/* set tcp checksum: given IP header and tcp segment */

void compute_tcp_checksum(struct iphdr *pIph, unsigned short *ipPayload) {

    register unsigned long sum = 0;

    unsigned short tcpLen = ntohs(pIph->tot_len) - (pIph->ihl<<2);

    struct tcphdr *tcphdrp = (struct tcphdr*)(ipPayload);

    //add the pseudo header 

    //the source ip

    sum += (pIph->saddr>>16)&0xFFFF;

    sum += (pIph->saddr)&0xFFFF;

    //the dest ip

    sum += (pIph->daddr>>16)&0xFFFF;

    sum += (pIph->daddr)&0xFFFF;

    //protocol and reserved: 6

    sum += htons(IPPROTO_TCP);

    //the length

    sum += htons(tcpLen);


    //add the IP payload

    //initialize checksum to 0

    tcphdrp->check = 0;

    while (tcpLen > 1) {

        sum += * ipPayload++;

        tcpLen -= 2;


    //if any bytes left, pad the bytes and add

    if(tcpLen > 0) {

        //printf("+++++++++++padding, %dn", tcpLen);

        sum += ((*ipPayload)&htons(0xFF00));


      //Fold 32-bit sum to 16 bits: add carrier to result

      while (sum>>16) {

          sum = (sum & 0xffff) + (sum >> 16);


      sum = ~sum;

    //set computation result

    tcphdrp->check = (unsigned short)sum;


The mothod sums the pseudo TCP header first, then the IP payload, which is the TCP segment. It also pads the last byte if there’re odd number of bytes. For detailed description of the algorithm, please refer to comments in the code and part 1.

UDP Header Checksum Calculation Implementation
To calculate the UDP checksum, one can follow the code below,

/* set tcp checksum: given IP header and UDP datagram */

void compute_udp_checksum(struct iphdr *pIph, unsigned short *ipPayload) {

    register unsigned long sum = 0;

    struct udphdr *udphdrp = (struct udphdr*)(ipPayload);

    unsigned short udpLen = htons(udphdrp->len);

    //printf("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~udp len=%dn", udpLen);

    //add the pseudo header 

    //printf("add pseudo headern");

    //the source ip

    sum += (pIph->saddr>>16)&0xFFFF;

    sum += (pIph->saddr)&0xFFFF;

    //the dest ip

    sum += (pIph->daddr>>16)&0xFFFF;

    sum += (pIph->daddr)&0xFFFF;

    //protocol and reserved: 17

    sum += htons(IPPROTO_UDP);

    //the length

    sum += udphdrp->len;


    //add the IP payload

    //printf("add ip payloadn");

    //initialize checksum to 0

    udphdrp->check = 0;

    while (udpLen > 1) {

        sum += * ipPayload++;

        udpLen -= 2;


    //if any bytes left, pad the bytes and add

    if(udpLen > 0) {

        //printf("+++++++++++++++padding: %dn", udpLen);

        sum += ((*ipPayload)&htons(0xFF00));


      //Fold sum to 16 bits: add carrier to result

    //printf("add carriern");

      while (sum>>16) {

          sum = (sum & 0xffff) + (sum >> 16);


    //printf("one's complementn");

      sum = ~sum;

    //set computation result

    udphdrp->check = ((unsigned short)sum == 0x0000)?0xFFFF:(unsigned short)sum;

The code is similar to TCP checksum computation. Except that when the checksum is compted as all 0s, we set them to 1s. As 0x0000 is already reserved for indicating that the checksum is not computed. Also please refer to part 1 for detailed description of the algorithm.

All the code can be downloaded at part 3.

How to Calculate IP/TCP/UDP Checksum–Part 1 Theory

This is the first part of How to Calculate IP/TCP/UDP Checksum. The 2 parts followed are Part 2 Implementation and Part 3 Usage Example and Validation.

This part focuses on the algorithms. We’ll go through IP/TCP/UDP one by one.

IP Header Checksum Calculation

IP checksum is a 16-bit field in IP header used for error detection for IP header. It equals to the one’s complement of the one’s complement sum of all 16 bit words in the IP header. The checksum field is initialized to all zeros at computation.

One’s complement sum is calculated by summing all numbers and adding the carry (or carries) to the result. And one’s complement is defined by inverting all 0s and 1s in the number’s bit representation.

For example, if an IP header is 0x4500003044224000800600008c7c19acae241e2b.
We start by calculating the one’s complement sum. First, divide the header hex into 16 bits each and sum them up,

4500 + 0030 + 4422 + 4000 + 8006 + 0000 + 8c7c + 19ac + ae24 + 1e2b = 2BBCF

Next fold the result into 16 bits by adding the carry to the result,

2 +  BBCF  = BBD1

The final step is to compute the one’s complement of the one’s complement’s sum,

BBD1 = 1011101111010001

IP checksum = one’s complement(1011101111010001) = 0100010000101110 = 442E

Note that IP header needs to be parsed at each hop, because IP addresses are needed to route the packet. To detect the errors at IP header, the checksum is validated at every hop.

The validation is done using the same algorithm. But this time the initialized checksum value is 442E.

2BBCF + 442E = 2FFFD, then 2 + FFFD = FFFF

Take the one’s complement of FFFF = 0.

At validation, the checksum computation should evaluate to 0 if the IP header is correct.

TCP Checksum Calculation

TCP Checksum is a 16-bit field in TCP header used for error detection. It is computed over the TCP segment (might plus some padding) and a 12-byte TCP pseudo header created on the fly. Same as IP checksum, TCP checksum is also one’s complement of the one’s complement sum of all 16 bit words in the computation data.

Below is a figure that illustrates the data used to calculate TCP checksum,

Figure 1. TCP Checksum Computation Data

As shown in the figure, the pseudo header consists of 5 fields,

  • source address: 32 bits/4 bytes, taken from IP header
  • destination address: 32bits/4 bytes, taken from IP header
  • resevered: 8 bits/1 byte, all zeros
  • protocol: 8 bits/1 byte, taken from IP header. In case of TCP, this should always be 6, which is the assigned protocol number for TCP.
  • TCP Length: The length of the TCP segment, including TCP header and TCP data. Note that this field is not available in TCP header, therefore is computed on the fly.

Note that TCP pseudo header does not really exist, and it’s not transmitted over the network. It’s constructed on the fly to compute the checksum.

If a TCP segment contains an odd number of octets to be checksummed, the last octect is padded on the right with zeros to form a 16-bit word. But the padding is not part of the TCP segment and therefore not transmitted.

Also note the checksum field of the TCP header needs to be initialized to zeros before checksum calculation. And it’s set to the computed value after the computation.

When TCP packet is received at the destination, the receiving TCP code also performs the TCP calculation and see if there’s a mismatch. If there is, it means there’s error in the packet and it will be discarded. The same validation logic used for IP header checksum validation can be used.

UDP Checksum Calcuation

UDP Checksum calculation is similar to TCP Checksum computation. It’s also a 16-bit field of one’s complement of one’s complement sum of a pseudo UDP header + UDP datagram.
The Pseudo UDP header also consists of 5 fields,

  • source address: 32 bits/4 bytes, taken from IP header
  • destination address: 32 bits/4 bytes, taken from IP header
  • reserved: 8 bits/1 byte, set to all 0s.
  • protocol: 8 bits/1 byte, taken from IP header
  • length: Because UDP header has a length field that indicates the length of the entire datagram, including UDP header and data, the value from UDP header is used. Note that this is different from TCP pseudo header, which is computed on the fly. But they both indicates the header+payload length.

Note that UDP checksum is optional. If it’s not computed, it’s set to all 0s. This could cause issue as sometimes the checksum can be computed as all 0s. To avoid confusion, if the checksum is computed as all 0s, it’s set to all 1s (which is equivalent in one’s complement arithmetic).


1. IP/TCP/UDP Headers:
2. TCP Checksum Calculation:
3. IP Checksum:
4. One’s complement:’_complement
5. UDP datagram RFC:

Android TCP Client and Server Communication Programming–Illustrated with Example

This is a follow up post of the Android UDP Client and Server communication programming. TCP and UDP are the two main transport layer protocols used in today’s Internet. TCP aims to provide connection-oriented, reliable end-to-end communication service.

Similar to UDP, TCP programming in Android also uses  the APIs provided in package.

To program a TCP server, one needs to create a ServerSocket instance, and listen to (call accept method) incoming connections. If there’s a connection established, accept method will return a socket representing the opened connection. One can then receive and send message through the socket. This is illustrated as the code below,

private void runTcpServer() {

    ServerSocket ss = null;

    try {

        ss = new ServerSocket(TCP_SERVER_PORT);


        //accept connections

        Socket s = ss.accept();

        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));

        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(s.getOutputStream()));

        //receive a message

        String incomingMsg = in.readLine() + System.getProperty("line.separator");

        Log.i("TcpServer", "received: " + incomingMsg);

        textDisplay.append("received: " + incomingMsg);

        //send a message

        String outgoingMsg = "goodbye from port " + TCP_SERVER_PORT + System.getProperty("line.separator");



        Log.i("TcpServer", "sent: " + outgoingMsg);

        textDisplay.append("sent: " + outgoingMsg);



    } catch (InterruptedIOException e) {

        //if timeout occurs


    } catch (IOException e) {


    } finally {

        if (ss != null) {

            try {


            } catch (IOException e) {






To program a TCP client, one needs to open a socket with server ip and port number. Once it’s connected, one can receive and send message through the socket. The code is as below,

private void runTcpClient() {

    try {

        Socket s = new Socket("localhost", TCP_SERVER_PORT);

        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));

        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(s.getOutputStream()));

        //send output msg

        String outMsg = "TCP connecting to " + TCP_SERVER_PORT + System.getProperty("line.separator"); 



        Log.i("TcpClient", "sent: " + outMsg);

        //accept server response

        String inMsg = in.readLine() + System.getProperty("line.separator");

        Log.i("TcpClient", "received: " + inMsg);

        //close connection


    } catch (UnknownHostException e) {


    } catch (IOException e) {




Note that to use the network socket (ServerSocket and Socket, in our case), you’ll need to request for INTERNET permission. Adds the following line to AndroidManifest.xml will do,

<uses-permission android:name="android.permission.INTERNET"/>

Download and Screenshots
You can download the complete example (client) and (server) or from my github android tutorial repo. If you run server and client on your android device (server first), you’ll get something like this,


Figure 1. Screenshot for TCP Communication Client and Server

The TCP server receives the TCP packet from TCP client and outputs the message to a TextView.

Android UDP Client and Server Communication Programming–Illustrated with Example

UDP and TCP are the two main transport protocols used in today’s Internet. This post illustrates how to program UDP client and server in Android.

Android UDP communication uses the APIs provided in package, which are also available in Java standard edition etc.

To program a UDP server, one can use the code below,

private void runUdpServer() {

    String lText;

    byte[] lMsg = new byte[MAX_UDP_DATAGRAM_LEN];

    DatagramPacket dp = new DatagramPacket(lMsg, lMsg.length);

    DatagramSocket ds = null;

    try {

        ds = new DatagramSocket(UDP_SERVER_PORT);

        //disable timeout for testing



        lText = new String(lMsg, 0, dp.getLength());

        Log.i("UDP packet received", lText);


    } catch (SocketException e) {


    } catch (IOException e) {


    } finally {

        if (ds != null) {





The code opens a DatagramSocket and calls the receive method. Note that if setSoTimeout() is not called, the receive method is a block call, which means the execution will block until a udp message is received. If you don’t want this behavior, set the timeout to a value you want and catch for timeout exception.

For the client side, it’s also simple,

private void runUdpClient()  {

    String udpMsg = "hello world from UDP client " + UDP_SERVER_PORT;

    DatagramSocket ds = null;

    try {

        ds = new DatagramSocket();

        InetAddress serverAddr = InetAddress.getByName("");

        DatagramPacket dp;

        dp = new DatagramPacket(udpMsg.getBytes(), udpMsg.length(), serverAddr, UDP_SERVER_PORT);


    } catch (SocketException e) {


    }catch (UnknownHostException e) {


    } catch (IOException e) {


    } catch (Exception e) {


    } finally {

        if (ds != null) {





The code also opens a DatagramSocket, and create a DatagramPacket with destination address and port number. It calls send method to send the DatagramPacket out.

Note that to use the network socket (DatagramSocket, in our case), you’ll need to request for INTERNET permission. Adds the following line to AndroidManifest.xml will do,

<uses-permission android:name="android.permission.INTERNET"/>

Download and Screenshots

You can download the complete example here (client) and here (server) or from my github android tutorial repo here. If you run server and client on your android device (server first), you’ll get something like this,


Figure 1. Screenshot for UDP Communication Client and Server

The UDP server receives the UDP datagram from UDP client and outputs the message to a TextView.

Android Video Recording API–Illustrated with an Example

Android has made video recording very simply by providing a few high level classes, including Camera, SurfaceView and MediaRecorder. This tutorial illustrates the Video Recording API by providing a workable example as shown in the screenshots.


Figure 1. Screenshots of Android Video Capturing Sample Program

The Camera class allows us to access the Camera on an Android device; the SurfaceView class provides a drawing surface, it is used to present a live preview in our application; the MediaRecoder contains the API to configure and record the video, it uses Camera class to access the hardware camera.

0. Live Preview Before Recording

Before users start to record video, they want to see what they’re capturing. That’s what the preview function does, it allows one to see what the camera is able to capture. The preview implementation is briefly illustrated as the code below,

SurfaceView prSurfaceView = (SurfaceView) findViewById(;

SurfaceHolder prSurfaceHolder = prSurfaceView.getHolder();



Camera prCamera =;        

Camera.Parameters lParam = prCamera.getParameters();





1. Video Recording and Live Preview at Recording

The state transition diagram for Android MediaRecorder class is as below,


Figure 2. State Transition of Android MediaRecorder

The implementation follows the state transition diagram closely. Starting from Initial state, our code sets the audio source as microphone, video source as camera, and the program enters Initialized state. The output format can be set to mpeg4 or 3gp, which indicates the container format for the video recorded. At DataSourceConfigured state, we set the audio encoder as AMR_NB (Later versions of Android supports AAC), and video encoder as MPEG4,H263 or H264 based on user configurations. We also set the frame rate, and output file here. After that, we call prepare method to transit to Prepared state, and finally a start method call will start the recording.

The description above doesn’t contain every detail but just for illustration. The actual code in contains lots of error handling and code for remembering and setting user configurations.

Note that the encoders a device supports may differ.

2. Download

You can download the entire source code for the example above from here, or from my github repo.




Effective Color Conversion (YUV->RGB) for Android in Assembly

Please jump to “0. Android NDK” if you want pure technical stuff; please jump to end of the post to download the sample code if you want to figure out everything yourself.

Recently I am doing a project that decodes a video using ffmpeg and render it on Android using Bitmap. At first, I tried to using sws_scale to do color conversion and scaling, it’s very slow (100~300ms).

Then I found from reference 1 that there’s ARM assembly code available to do YUV to RGB/RGBA color conversion. The code downloaded from the website doesn’t compile for Android, so I changed the code a little bit and made it work on Android.

The code also contains C equivalent  procedures to do the conversion. It also runs faster than sws_scale. In case the assembly code doesn’t work for your processor, you can use the C equivalent methods.

0. Android NDK

In order to do yuv to rgb color conversion in assembly code, you’ll need to have Android NDK set up. I’m testing using NDK r5b, but other NDK versions should also work.
Note that you’ll need basic NDK knowledge in order to understand the example given. This includes how to pass bitmap, string, and primitive data from Java and C, how to manipulate bitmap object in C and basic JNI stuff.

1. How the Sample Program Work
The sample program consists of both Java and native C code. The java code will copy the test.yuv file from assets folder to /sdcard/test.yuv (, create a bitmap object (, call the native code (also, native method defined in jni/test.c file) to do conversion and render the converted bitmap on screen.

The native code (jni/test.c) will manipulate the bitmap buffer, open the test.yuv file to get the YUV bytes and call the assembly procedure to do yuv2rgb8888 (defined in jni/yuv2rgb/yuv4202rgb8888.s file) conversion.

The assembly procedure is modified based on the code from reference 1. I don’t understand it fully yet.  🙁

2. How to Compile
I’ve created the file. So you just go the jni folder, and type “ndk-build” will build the native library for you.

3. Who might Need this Code
The code basically contains the color conversion and also rendering on Android. If you’re developing a video player using ffmpeg, or a video game, you might want to handle color conversion and rendering yourself, then you might find this code useful.

4. Final Note
The sample code only test on yuv420 to rgb8888, but the code actually contains a lot other assembly code. You’ll need to modify the assembly procedure yourself and add them to But it should be straightforward and you can refer the yuv4202rgb8888.s file for reference.

When you call the assembly procedure, you might need to switch the order of u and v if you find the color is wired after conversion.

Note that it is the users’ responsibility to ensure that they are patent free before using the code.

5. Download
You can download the entire source code package here or from github here.


Modification of RED AQM in NS2

Please read the previous post here first to get some background information about RED.

0. Download ns2, Compile it and Install it.

This is a preparation step. It’s not the focus of this post, please refer to reference 1 for details. Note that since we’re going to modify NS2, we’ll need to download and source code and build it ourselves.  

1. Change and red.h

RED calculates the probability of dropping a packet by 2 steps when average queue size falls in between th_min and th_max. The first step is to calculate a probability called Pb, which is based on the formula below,

Pb = Pmax*(avg – th_min)/(th_max-th_min)                                        (1)

It can be seen that the probability Pb is calculated based on a linear function.

In the second step, RED counts the number of packets have gone through the gateway (count) since last packet is dropped from the flow and apply the following formula,

Pa = Pb / (1 – count*Pb)                                                                             (2)

In this part, we modified the first step by applying 4 different functions to calculate Pb and compare the results. The curves for the 4 functions are roughly as below,


Figure 1. Five Different Functions to Evaluate Pb

1.1 Analysis

You can ignore this part and jump to 1.2 if you only want to know how to change a protocol in NS2.

NS2 RED queue implementation doesn’t use the formula (1) above directly, instead it uses the formula below,

Pb = Pmax*(v_a * v_ave + v_b)                                                       (3)

where v_a = 1.0 / (th_max – th_min), v_ave is the average queue size, and v_b = -th_min / (th_max – th_min). This forumla is equivalent to formula (1).

For concave function v1, we use log function,

Pb = Pmax*(m*log(v_ave) + n)                                                        (4)

As the curve intersects with original linear function at th_min and th_max, we have the following,

0 = Pmax*(m*log(th_min) + n)                                                         (5)

Pmax = Pmax*(m*log(th_max + n))                                                 (6)

Also the original linear function gives us the function,

0 = Pmax*(v_a*th_min + v_b)                                                          (7)

with (5) (6) and (7), we can get values for m and n in formula (4),

m = 1/(log(th_max) – log(-v_b/v_a))                                                (8)

n = -m*log(-v_b/v_a)                                                                          (9)

Similarly, we can express the formula for function v4 the convex function (we chose exponential function) using th_max, v_b and v_a as below,

Pb = Pmax*(m*exp(v_ave)+n)                                                          (10)


m = 1/(exp(th_max) – exp(-v_b/v_a))                                             (11)

n = -m*exp(-v_b/v_a)                                                                       (12)

As for function v2 and v3, which are piecewise linear functions, we express the functions as below,

Pb = Pmax*(m*v_ave + p) when v_ave in [th_min, (th_min+th_max)/2] (13)

Pb = Pmax*(n*ave + q) when v_ave in [(th_min+th_max)/2, th_max]

We can get the following equations,

m*th_min + p = 0                                                                                  (14)

m*[(th_min+th_max)/2] + p = n*[(th_min+th_max)/2] + q         (15)

n*th_max + q = 1                                                                                  (16)

With (14), (15) and (16), we can derive the relationship between m and n,

n = [m*(th_max+th_min)/2 – m*th_min] / [(th_max+th_min)/2 – th_max]                                                                                                 (17)

For function v2, we set m = 1.2*v_a, so we have,

p = -1.2*v_a*th_min                                                                            (18)

n = [1.2*v_a*(th_max+th_min)/2 – 1.2*v_a*th_min] / [(th_max+th_min)/2-th_max]                                                                                                 (19)

q = 1 – n*th_max                                                                                   (20)

For function v3, we set m = 0.8*v_a, so we have,

p = -0.8*v_a*th_min                                                                             (21)

n = [0.8*v_a*(th_max+th_min)/2 – 0.8*v_a*th_min] / [(th_max+th_min)/2-th_max]                                                                                                  (22)

q = 1 – n*th_max                                                                                    (23)

1.2 Changes the Source Code

In order to add the four different evaluation functions for Pb and make it configurable, we’ll need to add a new input parameter in red.h. We add a variable func to the data structure edp.

struct edp {


     * User supplied.


    int mean_pktsize;    /* avg pkt size, linked into Tcl */

    int idle_pktsize;    /* avg pkt size used during idle times */

    int bytes;        /* true if queue in bytes, false if packets */

    int wait;        /* true for waiting between dropped packets */

    int setbit;        /* true to set congestion indication bit */

    int gentle;        /* true to increases dropping prob. slowly *

                 * when ave queue exceeds maxthresh. */

    double th_min;        /* minimum threshold of average queue size */

    double th_min_pkts;    /* always maintained in packets */

    double th_max;        /* maximum threshold of average queue size */

    double th_max_pkts;     /* always maintained in packets */

    double max_p_inv;       /* 1/max_p, for max_p = maximum prob.  */

                            /* adaptive RED: the initial max_p_inv     */    

    double mark_p;        /* when p < mark_p, mark chosen packets */

                /* when p > mark_p, drop chosen packets */

        int use_mark_p;        /* use mark_p only for deciding when to drop, */

                /*   if queue is not full */

                /* Set use_mark_p true and set mark_p to 2.0 */

                /*   to always mark instead of drop */

                /*   when queue is not full */

    double q_w;        /* queue weight given to cur q size sample */

    int adaptive;        /* 0 for default RED */

                /* 1 for adaptive RED, adapting max_p */

    int cautious;           /* 0 for default RED */

                                /* 1 for not dropping/marking when the */

                                /*  instantaneous queue is much below the */

                                /*  average */

    double alpha;           /* adaptive RED: additive param for max_p */

    double beta;            /* adaptive RED: multip param for max_p */

    double interval;    /* adaptive RED: interval for adaptations */

    double targetdelay;     /* adaptive RED: target queue size */

    double top;        /* adaptive RED: upper bound for max_p */

    double bottom;        /* adaptive RED: lower bound for max_p */

                /* 0 for automatic setting */

    int feng_adaptive;    /* adaptive RED: Use the Feng et al. version */



     * Computed as a function of user supplied paramters.


    double ptc;        /* packet time constant in packets/second */

    double delay;        /* link delay */

    int func;        /* function used to calculate Pb*/


In, we first bind the variable in the REDQueue constructor REDQueue::REDQueue,

bind(“function_”, &edp_.func);            //roman10: function number: 0~4

Then we change the calculate_p_new function as below,


 * Calculate the drop probability.



REDQueue::calculate_p_new(double v_ave, double th_max, int gentle, double v_a, 

    double v_b, double v_c, double v_d, double max_p)


    double p;

    double lm, ln, lp, lq, Xmin;

    if (gentle && v_ave >= th_max) {

        // p ranges from max_p to 1 as the average queue

        // size ranges from th_max to twice th_max 

        p = v_c * v_ave + v_d;

        } else if (!gentle && v_ave >= th_max) { 

                // OLD: p continues to range linearly above max_p as

                // the average queue size ranges above th_max.

                // NEW: p is set to 1.0 

                p = 1.0;

        } else {

                // p ranges from 0 to max_p as the average queue

                // size ranges from th_min to th_max 

        if (edp_.func == 1) {

            lm = 1.0/(log(th_max) - log(-v_b/v_a));

            ln = -lm*log(-v_b/v_a);

            p = lm*log(v_ave) + ln;

        } else if (edp_.func == 2) {

            Xmin = -v_b/v_a;

            if (v_ave < (th_max + Xmin)/2) {

                lm = 1.2*v_a;

                lp = -lm*Xmin;

                p = lm * v_ave + lp;

            } else {

            ln = (1.2*v_a*(th_max+Xmin)/2-1.2*v_a*Xmin)/((th_max+Xmin)/2-th_max);

                lq = 1.0 - ln*th_max;

                p = ln * v_ave + lq;


        } else if (edp_.func == 3) {

            if (v_ave < (th_max - v_b/v_a)/2) {

            lm = 0.8*v_a;

            lp = v_b*lm/v_a;

            p = lm*v_ave + lp;            

            } else {

            ln = (0.8*v_a*(th_max+Xmin)/2-0.8*v_a*Xmin)/((th_max+Xmin)/2-th_max);

            lq = 1.0 - ln*th_max;

                p = ln * v_ave + lq;


        } else if (edp_.func == 4) {

            lm = 1.0/(exp(th_max) - exp(-v_b/v_a));

            ln = -lm*exp(-v_b/v_a);

            p = lm*exp(v_ave) + ln;

            //printf("%f, %f, %f, %fn", exp(th_max), exp(-v_b/v_a), lm, v_ave);

        } else {

                p = v_a * v_ave + v_b;


                // p = (v_ave - th_min) / (th_max - th_min)

                p *= max_p; 


    if (p > 1.0)

        p = 1.0;

    return p;


2. Change ns-default.tcl

We update ns-2.35/tcl/lib/ns-default.tcl file by adding the following to red queue configuration section (search for Qeueu/RED, and add it at some lines nearby),

Queue/RED set function_ 0

This line sets the function_ to be 0 if user doesn’t supply the function_ value.

3. Test the Changes

One can run the simulation configured below to see if our changes work or not.

set ns [new Simulator]


#    s1                             

#       -                       

#          -                   

#    s2 ----- r1            

#                 -       

#    -                r3  - - - - - D1    

#    -            -

#             -

#    S50   -


set NumSenders 50

set NumReceivers 1

#read scenario, seed and bottleneck bandwidth from the command line input arguments

set Scenario [lindex $argv 0]

set function [lindex $argv 1]

set seed [lindex $argv 2]


puts "scenario: $Scenario; function: $function; seed: $seed"

ns-random $seed

set BufSize 100

set PktSize 1024 

#set winSize 200

#in seconds

set Duration 50 

#create all nodes: note that the order of creating the nodes matter

for {set i 1} {$i <= $NumSenders} {incr i} {

    set s($i) [$ns node]


set r1 [$ns node]

set d1 [$ns node]



#open the nam trace file

set nf [open out.nam w]

$ns namtrace-all $nf


#open the traffic trace file to record all events

set nd [open w]

$ns trace-all $nd


#define a finish procedure

proc finish {} {

    global ns nf nd qtf

    $ns flush-trace

    close $nf

    close $nd

    close $qtf

    #start nam

    #exec nam out.nam &

    exit 0



#link the nodes

if { $Scenario == 1 } {

    Queue/RED set thresh_ 15

    Queue/RED set maxthresh_ 45

} elseif { $Scenario == 2 } {

    Queue/RED set thresh_ 20

    Queue/RED set maxthresh_ 60

} elseif { $Scenario == 3 } {

    Queue/RED set thresh_ 25

    Queue/RED set maxthresh_ 75

} elseif { $Scenario == 4 } {

    Queue/RED set thresh_ 30

    Queue/RED set maxthresh_ 90


Queue/RED set queue_in_bytes_ false

Queue/RED set gentle_ false

Queue/RED set function_ $function


for {set i 1} {$i <= $NumSenders} {incr i} {

    $ns duplex-link $s($i) $r1 10Mb 100ms DropTail

    $ns queue-limit $s($i) $r1 $BufSize


#r1 d1 and d1 r1 are different

$ns duplex-link $r1 $d1 3Mb 100ms RED

$ns queue-limit $r1 $d1 $BufSize


#trace the queue: note that link r1 d1 is different from d1 r1

set redq [[$ns link $r1 $d1] queue]

set qtf [open queue.txt w]

$redq trace curq_

$redq trace ave_

$redq attach $qtf


#set up TCP connections

for {set i 1} {$i <= $NumSenders} {incr i} {

    set tcp($i) [new Agent/TCP]

    $ns attach-agent $s($i) $tcp($i)

    set sink($i) [new Agent/TCPSink]

    $ns attach-agent $d1 $sink($i)

    $ns connect $tcp($i) $sink($i)

    $tcp($i) set fid_ $i

    $tcp($i) set packetSize_ $PktSize

    #$tcp($i) set window_ $winSize

    #set up FTP over TCP connection as traffic source

    set ftp($i) [new Application/FTP]

    $ftp($i) attach-agent $tcp($i)

    $ftp($i) set type_ FTP



#schedule events for the FTP agents

set StartTime [expr [ns-random]  / 2147483647.0 / 100]

puts "starttime $StartTime"

#temporarily set to 2

for {set i 1} {$i <= $NumSenders} {incr i}  {

    $ns at $StartTime "$ftp($i) start"

    $ns at $Duration+$StartTime "$ftp($i) stop"


#ensure the ftp application have enough time to finish, so we +1

$ns at $Duration+$StartTime+1 "finish"


#run the simulation

$ns run


Supposed we save the file as b.tcl, a sample command to run it,

ns b.tcl 4 1 100

4. Results

I run a simulation using the 5 different evaluation functions, below are the plots of average queue size and instance queue versus time. The red lines are the average queue size and green lines are instance queue size.




Figure 2. Average Queue Size and Instance Queue Size versus Time

From the plot, we can see that v1 and v2 keeps the average queue size at a lower level than regular RED, while v3 and v4 keeps the average queue size at a higher level than regular RED. This is expected as v1 and v2 are always above the original linear function given the same average queue size value in the range of [th_min, th_max], in other words, v1 and v2 are more aggressive in terms of dropping packets when the average queue size is between th_min and th_max. On the contrast, v3 and v4 are less aggressive than regular RED.

5. Download

You can download the files mentioned in this post here.

Follow the instructions below to apply the changes (you’ll need gnuplot to plot the figures),

1. Apply the changes
1.1 copy and red.h to ns-allinone-2.34/ns-2.34/queue/
1.2 copy ns-default.tcl to ns-allinone-2.34/ns-2.34/tcl/lib/
1.3 at command line, go to ns-allinone-2.34/ns-2.34/, “sudo make”
2. Run the simulation and plot figures: ./


1. How to Install NS2.34 in Ubuntu:

2. Problem description in detail

TCP Tahoe, Reno, NewReno, and SACK–a Brief Comparison

1. TCP and its Algorithms (Slow-Start, Congestion Avoidance, Fast Retransmit and Fast Recovery)

TCP is a complex transport layer protocol containing four interwined algorithms: Slow-start, congestion avoidance, fast retransmit and fast recovery.

In Slow-start phase, TCP increases the congestion window each time an acknowledgement is received, by number of packets acknowledged. This strategy effectively doubles the TCP congestion window for every round trip time (RTT).

When the congestion window exceeds a threshold named ssthresh, it enters congestion avoidance phase. TCP congestion window is increased by 1 for each RTT until a loss event occurs.

TCP maintains a timer after sending out a packet, if no acknowledgement is received after the timer is expired, the packet is considered as lost. However, this might take too long for TCP to realize a packet is lost and take action. A fast retransmit algorithm is proposed to make use of duplicate ACKs to detect packet loss. In fast retransmit, when an acknowledgement packet with the same sequence number is received a specified number of times (normally set to 3), TCP sender is reasonably confident that the TCP packet is lost and will retransmit the packet.

Fast recovery is closely related to fast retransmit. When a loss event is detected by TCP sender, a fast retransmit is performed. If fast recovery is used, TCP sender will not enter slow-start phase, instead it will reduce the congestion window by half, and “inflates” the congestion window by calculating usable window using min(awin, cwnd+ndup), where awin is the receiver’s window, cwnd is the congestion window, and ndup is number of dup ACK received. When an acknowledgement of new data (called recovery ACK) is received, it returns to congestion avoidance phase.

2. Tahoe, Reno, NewReno, and SACK

TCP Tahoe is the simplest one out of the four variants. It doesn’t have fast recovery. At congestion avoidance phase, it treats the triple duplicate ACKs same as timeout. When timeout or triple duplicate ACKs is received, it will perform fast retransmit, reduce congestion window to 1, and enters slow-start phase.

TCP Reno differs from TCP Tahoe at congestion avoidance. When triple duplicate ACKs are received, it will halve the congestion window, perform a fast retransmit, and enters fast recovery. If a timeout event occurs, it will enter slow-start, same as TCP Tahoe. TCP Reno is effective to recover from a single packet loss, but it still suffers from performance problems when multiple packets are dropped from a window of data.

TCP NewReno tries to improve the TCP Reno’s performance when a burst of packets are lost by modifying the fast recovery algorithm. In TCP NewReno, a new data ACK is not enough to take TCP out of fast recovery to congestion avoidance. Instead it requires all the packets outstanding at the start of the fast recovery period are acknowledged.

TCP NewReno works by assuming that the packet that immediately follows the partial ACK received at fast recovery is lost, and retransmit the packet. However, this might not be true and it affects the performance of TCP. SACK TCP adds a number of SACK blocks in TCP packet, where each SACK block acknowledges a non-contiguous set of data has been received. The main difference between SACK TCP and Reno TCP implementations is in the behavior when multiple packets are dropped from one window of data. SACK sender maintains the information which packets is missed at receiver and only retransmits these packets. When all the outstanding packets at the start of fast recovery are acknowledged, SACK exits fast recovery and enters congestion avoidance.

Note that the four variants of TCP only differs when there’s a packet loss. If all packets reach the destination successfully, the four variants behave the same.


1. Allman, M., Paxson, V. and Blanton, E. 2009. TCP Congestion Control. RFC 5681.

2. Fall, K. and Floyd, S. 1996. Simulation-based Comparisons of Tahoe, Reno and SACK TCP. ACM SIGCOMM Computer Communication Review, 26(3):5-21.

Drop Tail and RED–Two AQM Mechanisms

1. Why Active Queue Management (AQM) Mechanism

Network bandwidth is shared by users. Users want to get as much bandwidth as possible. However, there’s always limited bandwidth. When the bandwidth cannot satisfy what everyone is trying to get, congestion might occur. That’s why there’re congestion avoidance algorithms implemented in TCP at the users’ end.

But we cannot simply rely on end hosts to avoid congestion. Although TCP is designed to avoid congestion, hackers might change TCP implementation to grasp more bandwidth than other users; And how about UDP? Although TCP takes most of the Internet traffic, UDP senders can send lots of traffic which congest the network. Therefore, gateway/routers are identified as the most appropriate place to detect and avoid congestion, and allocate bandwidth fairly.

There’re two types of algorithms have been studied related to congestion control at gateways: queue management algorithms and scheduling algorithms. Scheduling algorithms work by scheduling which packet from which flow is going to be processed by the gateway. In this way, it allocates bandwidth among flows. On the other hand, queue management algorithms manage the flows by managing the buffer at the gateway.

2. Drop Tail

Drop Tail is a simple active queuing management (AQM) algorithm used in many routers. It doesn’t differentiate traffic from different sources. As long as the queue is filled up, it will drop subsequent packets arrived. In other words, drop the tail of sequence of packets.

Drop Tail is simple and easy to implement, however, it suffers from a couple of drawbacks.

● It doesn’t distribute buffer space fairly. Because Drop Tail doesn’t differentiate traffic from different sources, sources with higher traffic volume will take more buffer space.

● If multiple TCP connections exist in the system and a buffer overflow will cause TCP global synchronization, which reduce the network throughput and utility significantly.

3. Random Early Detection (RED)

Random Early Detection or Random Early Drop (RED) is another AQM. It monitors the average queue size and take actions on packet (either drop or mark) based on statistical probabilities.

RED is designed with the following goals,

● Provide connection avoidance by controlling the average queue size

● Avoid TCP global synchronization

● Avoid bias against bursty traffic

● Maintain an upper bound on average queue size even the transport-layer protocols doesn’t cooperate.

In order to achieve the goals, RED operates in three regions. When the average queue size lower than th_min, all packets are accepted; when the average queue size is between th_min and th_max, RED drops or marks a packet by a probability, which is calculated based on the average queue size, and the number of packets transmitted since the last dropped/marked packet in this flow; When average queue size goes beyond th_max, all packets will be dropped/marked.

In this way, RED monitors the average queue size to detect congestion, drops/marks the packet randomly to avoid global synchronization, and allows fluctuations in the actual queue size to accommodate bursty traffic and transient congestion. In case the transport-layer protocol doesn’t cooperate, RED can be configured to drop packets to maintain an upper bound.

Below is a figure illustrates how RED works in detail,

Figure 1. How RED Works


1. Floyd, S. and Jacobson, V. 1993. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking.