Ghost Object Causing Crash in Java

A Bad Code Style

Java’s GC has made our lives easier, but it also makes things more implicit, and sometimes that makes mistakes easier to make. Recently I encountered a bad coding style that easily introduces bugs.

Imagine you have a class with a reference field and a method. The method creates a new object and assigns it to the reference, so every time you call the method, a new object is created and the previous one is no longer referenced by the field. The code is essentially like this:


class A {
    SomeClass B;
    void initB() {
        B = new SomeClass();
        // ... do something with B ...
    }
}

If we call initB many times, this pattern creates many objects and keeps a reference to only the latest one. Since we no longer reference the older ones, the GC will do us the favor of collecting them. However, things can easily go wrong.

When we do something with B after creating the object, the object can become referenced from somewhere else (e.g., B is a task, and “do something” adds it to a system task queue). If we call initB many times, we end up with many objects held elsewhere and only one referenced by B. We may no longer have any control over the objects we don’t reference, yet the GC cannot collect them because something else still does. The result is garbage that is never reclaimed: ghost objects.

Things can get worse. If SomeClass defines a callback method that is triggered later by some signal, we may end up with callbacks firing from many instances of SomeClass.

An Example in Android

Below is an example in Android.


package com.example.androidtest;

import android.os.Bundle;
import android.os.Handler;
import android.app.Activity;
import android.util.Log;

public class MainActivity extends Activity {

  private static final String TAG = "MainActivity";
  
  private Handler handler;
  private Runnable runnable;
  private Integer value;
  
  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    value = 10;
    startRunnable();
    startRunnable();
    removeRunnable();
  }
  
  private void startRunnable() {
    handler = new Handler();
    runnable = new Runnable() {
      @Override
      public void run() {
        Log.d(TAG, value.toString());
      }
    }; 
    handler.postDelayed(runnable, 2000);
  }
  
  private void removeRunnable() {
    handler.removeCallbacks(runnable);
    value = null;
  }
  
}

 

The startRunnable method creates a new Handler and a new Runnable and assigns them to the class fields. We call this method twice, which leaves the first Handler and Runnable unreferenced by us.

However, these objects are still referenced by Android system components. The postDelayed method essentially adds the runnable to the message queue associated with the current thread (the actual details are more complicated, but you get the idea).

We then call removeRunnable to remove the runnable associated with handler, and assign null to value.

However, this doesn’t remove the ghost runnable associated with the ghost handler. The run() method of the ghost runnable will still be triggered, and it will crash with a NullPointerException when it calls value.toString(), because value is now null.

The Fix

This coding style is bad because it easily introduces bugs. We essentially want only a single instance of Handler and Runnable at any time. If we have to create a new instance, we should ensure the old one is discarded properly; leaving it referenced by system components is not discarding it properly.

If the system provides callback functions that are guaranteed to be called only once, we can create the objects inside those functions.

If we need to create objects inside arbitrary methods, we can either reuse the old object or properly reset the old object before creating a new one. The code below shows how to do that to fix the example above.


package com.example.androidtest;

import android.app.Activity;
import android.os.Bundle;
import android.os.Handler;
import android.util.Log;

public class MainActivityNew extends Activity {

  private static final String TAG = "MainActivityNew";
  
  private Handler handler;
  private Runnable runnable;
  private Integer value;
  
  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    value = 10;
    startRunnable();
    startRunnable();
    removeRunnable();
  }
  
  private void startRunnable() {
    if (handler == null) {
      handler = new Handler();
    } else {
      handler.removeCallbacks(runnable);
    }
    runnable = new Runnable() {
      @Override
      public void run() {
        Log.d(TAG, value.toString());
      }
    }; 
    handler.postDelayed(runnable, 2000);
  }
  
  private void removeRunnable() {
    handler.removeCallbacks(runnable);
    value = null;
  }
  
}

 

In the startRunnable method, if a Handler instance is already available, we just reuse it; if there may be an old Runnable associated with the handler, we remove it first.

HtmlUnit Memory Leak — A Workaround

HtmlUnit is a programmable browser without a GUI. It’s written in Java and exposes APIs that allow us to open a window, load a web page, execute JavaScript, fill in forms, click buttons and links, etc. HtmlUnit is typically used for website testing or for crawling information from web sites.

Recently I worked on a task that uses HtmlUnit 2.10 to retrieve information from some web pages with fairly complex JavaScript. The JavaScript engine seems to cause a memory leak. After loading a few web pages, memory usage becomes high (>1GB) and eventually an OutOfMemoryError occurs. The HtmlUnit FAQ page suggests calling WebClient.closeAllWindows(). I tried that, but it didn’t help.

Instead of digging into the JavaScript engine to find out why the error happened, I decided to use a workaround: a two-process approach. The main process keeps track of which pages have been crawled, what to crawl next, etc., and creates a child process to do the actual retrieval using HtmlUnit. After the child process finishes crawling several pages, it exits, and the main process creates a new child process to crawl the next few pages. To keep things simple, the two processes use file I/O for inter-process communication (IPC). That is, the child process writes out which pages it has crawled, and the main process reads that file to update its record of what has been crawled.

Because all memory allocated to the child process is freed when it terminates, this approach works even with the memory leak unfixed, though at a performance penalty.

Reference:
1. HtmlUnit website: http://htmlunit.sourceforge.net/
2. HtmlUnit FAQ: http://htmlunit.sourceforge.net/faq.html

A Small Good Habit of Equality Test

Many programmers occasionally write “=” instead of “==” in an equality test. A good programming habit is to write the constant first. For example, suppose we want to test whether the length of a string is equal to 0. Instead of writing

strlen(x) == 0

we can write

0 == strlen(x)

If we write “=” by mistake,

0 = strlen(x)

the compiler will complain about it.

In this way, we find the bug at compile time instead of racking our brains later.

Turn the Selfishness of Individuals into Good of Public

This is an old Chinese story about Zeng Guofan (曾国藩). Zeng was a military general and Confucian scholar in the Qing Dynasty. He had a large group of counselors working for him, but one thing bothered him: many of them left after working for only a short period. One day, he asked one of his trusted counselors why. The counselor answered, “It is hard to get promoted working for you, because you think people should be selfless and you only promote people who have made great contributions. Other generals promote people more quickly.” Zeng then asked, “How can we solve it?” The counselor said, “You need to stop insisting on selflessness and turn the selfishness of individuals into the good of the public (合众人之私,成一人之公).” Zeng took the advice, and his team of counselors became the best thereafter.

Think about it. Few people are selfless. Ordinary people like us all work for our own self-interest. In all the companies I have worked for, people come and go frequently. Of the people who stay, quite a few simply do not have a better option. Few people really enjoy working.

Every company needs to take care of the self-interest of its individuals: not just the management team, but everyone. Everyone should be given the opportunity to realize his/her full potential. Otherwise, that person is going to leave sooner or later, and the company may end up keeping only the ones who don’t have a better choice and don’t enjoy working.

A Bug Occurred at malloc

0. Context

I’m developing a program that allocates memory dynamically in a loop. In every iteration of the loop, the code allocates memory using malloc and then operates on the allocated memory. The program crashes in the second iteration.

1. Locate the Bug

I added a few printf statements and found that the program crashes when calling malloc in the second iteration.

I ran the program under valgrind, and below is the output:

alloc size: 46104
--4149-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--4149-- si_code=1;  Faulting address: 0x74616874;  sp: 0x62ac7a58
valgrind: the 'impossible' happened:
  Killed by fatal signal
==4149==    at 0x3809D201: myvprintf_str (m_debuglog.c:530)
==4149==    by 0x3809D9E2: vgPlain_debugLog_vprintf (m_debuglog.c:877)
==4149==    by 0x380298E5: vprintf_WRK (m_libcprint.c:111)
==4149==    by 0x380299A7: vgPlain_printf (m_libcprint.c:143)
==4149==    by 0x38027A96: vgPlain_assert_fail (m_libcassert.c:259)
==4149==    by 0x656C501F: ???
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==4149==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==4149==    by 0x80527B2: ??? (in /home/roman10/Desktop/github/AndZop/jni/andzop_desktop/ffplay)

It can be seen that the segmentation fault occurs inside malloc. But why?

2. The Reason and Fix

How can malloc cause a segmentation fault?

I searched for a while and found a discussion (reference 1). Someone else had encountered the same error in valgrind, and it was caused by a bug elsewhere in the code rather than in malloc itself.

I checked my code carefully and found the reason. In the first iteration, I wrote past the end of the memory I had allocated. When malloc was called in the second iteration, it probably tried to allocate memory from the same region that had been illegally overwritten in the first iteration.

3. Additional Notes

The code that stopped working doesn’t have to be the code that caused the problem.

Memory-related bugs are hard to debug, and one should pay special attention to memory manipulation.

A memory overflow may not cause any issue at the place where it happens, but it’s likely to cause issues elsewhere.

References:
1. http://comments.gmane.org/gmane.comp.debugging.valgrind/9620

Bug Caused by pthread_create Input Parameters Passing

0. The Context

I was working on a multi-threaded application using the pthread library on Linux. One part of the application uses a thread pool with 100 threads. Each thread is supposed to finish its job and exit. The execution normally takes no more than a few seconds (we’ll call it a short-lived thread).

1. Locate the Bug

The application behaved abnormally, so I added a few printf statements to find out what went wrong. It seemed that the input parameters passed to a short-lived thread changed inside the thread execution function. I know that usually means I’m not handling memory correctly.

2. The Reason and Fix

The input parameter is a data structure. I passed a pointer to the data structure to the thread execution function, and the thread execution function copied its content. The same data structure instance was then reused for the next thread.

Wait a minute. What if multiple threads are created by pthread_create at almost the same time? Before the copy is carried out within a thread, the content may already have been overwritten for another thread.

I realized that I wasn’t handling parameter passing with pthread_create correctly. I should allocate a separate data structure instance for each short-lived thread in the thread pool, so that the data can never be mixed up between threads.

3. How to Avoid Such Bugs

Everybody who has programmed a multi-threaded application knows how difficult debugging can sometimes be. Shared data, scheduling, deadlock avoidance: there’s a lot to be careful about.

Program slowly, think before changing the code, and don’t leave everything to debugging.

A Bug Caused by Overlooking Operator Precedence in C

Side Note: For quite some time I’ve been thinking of writing blog posts about bugs that took me hours to locate and fix.

Sometimes I find myself making silly mistakes, and sometimes I make the same mistake multiple times. Just like every other programmer, I spend lots of time locating bugs and fixing them. It’s worthwhile to note the mistakes down and avoid making them again. So here comes the first one.

0. The Context
I’m developing an application that needs to read files using mmap. I coded the part that mmaps a file, but whenever I ran the program, the mmap operation always gave me a “No Such Device” error.

1. Locate the Bug
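I searched for the mmap documentation. One description I found says the “No Such Device” error code means “The fildes parameter does not refer to a *TYPE2 stream file (*STMF) in the “root” (/), QOpenSys, or user-defined file systems.” The Linux man page is terser: for ENODEV it says the underlying filesystem of the specified file does not support memory mapping.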

A couple of other articles discussed this bug, but I’m not making the same mistake as discussed in those articles.

As the application contains lots of code and is difficult to debug, I decided to write a simple test program (actually I just found an mmap sample online and copied it).

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    int fd = -1, offset;
    char *data;
    struct stat sbuf;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: mmapdemo offset\n");
        exit(1);
    }

    f = fopen("./h1_1280_720_5m.mp4_mbstpos_gop1.txt", "w");
    fprintf(f, "sssssssssssssssssssssssssss");
    fflush(f);

    //this line will cause no such device error
    printf("fd: %d\n", fd);
    if (fd = open("./h1_1280_720_5m.mp4_mbstpos_gop1.txt", O_RDONLY) == -1) {            //line caused bug
    //if ((fd = open("./h1_1280_720_5m.mp4_mbstpos_gop1.txt", O_RDONLY)) == -1) {            //correct source code
        perror("open");
        exit(1);
    }
    printf("fd: %d\n", fd);
    if (stat("./h1_1280_720_5m.mp4_mbstpos_gop1.txt", &sbuf) == -1) {
        perror("stat");
        exit(1);
    }
    printf("file size: %ld", (long) sbuf.st_size);
    offset = atoi(argv[1]);
    if (offset < 0 || offset > sbuf.st_size-1) {
        fprintf(stderr, "mmapdemo: offset must be in the range 0-%ld\n", (long) (sbuf.st_size-1));
        exit(1);
    }

    data = mmap(0, sbuf.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {    //mmap reports failure with MAP_FAILED
        perror("mmap");
        exit(1);
    }

    printf("byte at offset %d is '%c'\n", offset, data[offset]);

    return 0;
}

The test code worked fine. I then compared it with the mmap code from the app and didn’t notice any difference. So I started to replace the test code with the app code line by line (commenting out the original line, copying the app code over and changing the parameters; in this way, both the test code and the app code were kept and I could compare them easily). The error appeared! It’s good that I could reproduce the error; in lots of cases that’s the first step toward locating a bug.

Then I compared the code line by line and located the difference.

if (fd = open("./h1_1280_720_5m.mp4_mbstpos_gop1.txt", O_RDONLY) == -1)
if ((fd = open("./h1_1280_720_5m.mp4_mbstpos_gop1.txt", O_RDONLY)) == -1)

2. The Reason and Fix

Once I had located the bug, I almost knew why it occurred. It must be due to the precedence of the “=” and “==” operators. I didn’t remember the details, so I googled it.

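The “==” operator has higher precedence than the “=” operator. So in the app code, the return value of open is compared with -1 first, and the comparison result is assigned to fd. If the file is opened successfully, the return value of open is not -1, so the comparison result is 0 (false) and fd becomes 0. File descriptor 0 is standard input, so mmap was being asked to map whatever stdin was attached to (typically a terminal) rather than the file, which would explain the “No Such Device” error. I then added some printf statements to confirm the reasoning.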

Once the reason is found, fixing the bug is easy.

3. How to Avoid Such Bugs

I once read a programming suggestion that we should use parentheses to indicate the order of evaluation in a compound expression. I agreed and followed it most of the time.

Sometimes it can be painful to count the parentheses when the expression is really complicated. Maybe that’s why I didn’t follow the practice so strictly.

The bug took me around 1.5 hours to fix and another 45 minutes to write up. I’ve paid my price, and I should remember to use parentheses to indicate the evaluation order clearly, even if they’re ugly sometimes. If the expression is too complex, we can always break it down into multiple simple statements.

The lesson is: make precedence explicit with parentheses!

References:
1. C Operator Precedence and Associativity: http://www.difranco.net/cop2220/op-prec.htm

My Experience of Working at a Startup Company

It has been a week since I left my last job at a startup company working on a video broadcast/streaming product.

I joined the company as its first full-time employee on 18 Oct 2010 and left on 16 Sep 2011. The 11-month journey was full of experiences of all sorts: excitement, depression, joy, sadness, working really hard, being tired of working, etc.

Good Stuff about Working at a Startup

  1. Working on new stuff, which is not normally the case when working at a big or medium-sized company. In a startup, we’re developing something new. We encountered all sorts of difficulties and solved them one by one. It’s fun and a great learning experience.
    • e.g.: I worked on dialing out through multiple mobile 3G modems using AT commands. Making the dialing process fast, reliable and automatic is something you can’t count on publicly available software to do.
  2. Working on Linux. I was mainly a Windows guy with some basic Linux knowledge. The startup was developing a product based on the Linux platform, so I picked up a lot about Linux. I’ve got to say, once you’re forced to use Linux and figure it out, you’ll feel great. You learn much more than you do working under Windows through an IDE.
    • e.g.: vim, gnuplot, vlc, ffmpeg, ssh, netfilter, gcc/g++, gdb, Qt, Poco, Linux kernel programming… I learned to use these things under Linux and saw how powerful and amazing they are once one picks them up.
  3. Learned new programming languages. I learned Python and Perl. Well, I’m not an expert in these two languages, but I did program in them in some projects.
    • e.g.: I developed the modem dialing program in Python, and an automated testing system in Perl.
  4. Worked with People with Dreams. People at startup companies work on their dreams instead of sleeping on them. Most of the time I enjoyed working with people of this kind.
  5. Worked Closely with a Senior Engineer. I was lucky that the company had a senior software developer as CTO. He is experienced and was willing to discuss programming and IT in general with me. The way he handles certain technical issues was a good lesson for me.
  6. Worked on Web Dev. I had learned some basics of web development before and taken some courses on it, but the work allowed me to go one step further.
    • e.g.: I developed the web-based UI for the product and modified the company website.
  7. Worked on Setting up EC2 Stuff. Cloud computing empowers developers to deploy their work more easily than before. It’s great to know something about it, and better still that I got to work with it.
  8. Experienced the Entire Product Development Process. If you go to a big or medium-sized company, you’re probably working on improvements to existing products, or building something for existing products. But at the startup, I worked through the entire product build process.
  9. Experienced a Little Bit of the Business Side. Developers are developers at big companies. At the startup, I did programming, testing, customer demonstrations, internal training, tech support, etc. I interacted with potential customers a little bit, and watched how the founder handled customers. Well, who knows whether I’ll need some of these techniques one day.

Downsides of Working at a Startup

  1. No Time for Myself. For me, this was the biggest issue. I’m taking a part-time master’s at NUS; I’m developing several Android apps (which almost stopped completely while I was working at the startup); and I also have my personal life…
    • 5 days of tiring work; 2 days of coursework study + master’s thesis project; I knew I could not keep it up for long.
  2. Hard to Keep Interest if You Don’t Share the Same Vision with the Founder. OK, this one is about me. I’m more interested in developing apps for everyone’s use, not for businesses, which is what my previous company is doing.
  3. Tired of Working on All Sorts of Stuff, Much of Which is Boring. I worked on the website, the web-based UI, modem dialing, simulation, kernel module development for packet filtering, Amazon and video streaming server setup, etc. Well, it’s good to work on lots of different stuff, but I could not find focus and depth here.
    • The startup didn’t have many projects that required the focus and depth I was looking for, and the senior engineer was a better candidate than me to work on them.
  4. Flexible Time Could Mean Long Hours. Sometimes we came in on weekends. During my school holiday, I worked till past 10 pm almost every day for about two months.

After 10 months, I realized that I was no longer passionate about the work, and I quit.

Approach Matters–How a Different Method Solved a Two-Day Project in 30 Mins

It was Friday afternoon. I was still busy debugging. The bug had been bothering me for 2 days. Then a colleague proposed another method that saved my day and made me laugh at myself.

Let me start from the beginning. Our software has a web-based UI that is only available on the client side, which doesn’t have a public IP. In order to make the web UI accessible from the server side, we developed a tunnel that forwards HTTP requests from the server to the client and responses from the client to the server.

This approach works. The issue is that the UI on the server side is sometimes slow when the client is connected to the Internet through a slow link. We did some simple tests and found that it’s mainly due to the transfer of several JavaScript files.

Then came the natural approach: put the JavaScript files on the server side and serve them from the server. I made the changes and started testing. However, the tunnel started to malfunction because of some changes I had made, and there the debugging began.

Then it was Friday afternoon. My colleague came to me and told me that we could use another approach: host the JavaScript on our website.

OK, this is simple, but brilliant. We want to get the JavaScript from somewhere other than the client, and it doesn’t necessarily have to be our server. Instead of developing our own embedded web server and making sure it works with our tunnel, I can simply change the source so that the JavaScript files are fetched from another, publicly available web server.

If I had spent more time thinking about approaches and discussing them with my colleagues, I would not have spent two days developing and debugging the embedded server approach.

Assume People Do Simple Stuff Right–How Bookmarks Work on Different Browsers

Every browser has a bookmark/favourites function, but different browsers implement the workflow differently. Let’s look at them one by one.

IE 9

In IE9, the user needs to do the following steps.

1. Click the star icon at the right-hand corner. A window then pops up as below.

[Screenshot: ie9]

2. Select the Favorites tab if it’s not the current tab, then click the “Add to favorites” button.

3. I thought that was it. NO! A new window pops up.

[Screenshot: ie92]

4. Select a folder, or create a new folder (another window pops up).

5. Finally, click Add to add the page to favourites.

Firefox 4

1. Click the star icon at the right side of the Location bar. Guess what? That’s all. (“Bookmark this page” is the tooltip that pops up when I hover the mouse over the star icon.)

[Screenshot]

One can find the bookmark as shown in the figure below:

[Screenshot: firefox2]

2. Of course, some people are more organized and want to put bookmarks into different folders. Simple: just click the star icon again. Then you’ll see this:

[Screenshot]

The interface is quite intuitive and somewhat similar across all browsers. Compared with IE9, it also allows you to remove a bookmark previously added.

Safari 5.1

1. Click the “+” icon at the left side of the Location bar, and you’ll see the window below:

[Screenshot: safari2]

2. Well, my first impression was “where is the new folder option?”.

2.1 Never mind, just click “Add”.

3. Later on, I figured out that right-clicking the empty space of the Location bar lets us create a new folder.

Opera 11.50

0. I looked around, but could not find any icon for bookmarks.

1. I clicked the upper left corner, and there I found it:

[Screenshot: opera1]

2. Click “Bookmark Page”, and the familiar dialog pops up:

[Screenshot: opera2]

3. Well, we all know what to do next.

Chrome 12.0

1. Similar to Firefox, it has a star icon at the right side of the Location bar; again, “Bookmark this page” is the tooltip text shown when I hover the mouse over the star icon.

[Screenshot]

Click the icon, and you’ll see this:

[Screenshot: chrome]

2. Click Done, and it’s done. If I want to remove the bookmark, I click Remove. If I want to be more organized, I click Edit. Then another window is shown:

[Screenshot]

OK. This is also similar to all the other browsers, except that Chrome shows more bookmark folders than the others do. (All the other browsers use a dropdown list.)

My Personal Preference and Why

This is purely subjective. But I will give my preference order as Chrome > Firefox > IE 9 = Opera > Safari, in terms of the bookmark function. Why?

Both Chrome and Firefox assume users will do the right thing; in other words, they assume that someone who clicks the star icon actually wants to bookmark the page. So once the user clicks the star icon, they add the bookmark.

Besides, they offer an obvious and easy-to-use interface for users to further organize their bookmarks. (Both require two clicks: for Firefox, it’s two clicks of the star icon; for Chrome, it’s one click of the star icon and one click of the Edit button.) Users also have the option to quickly remove the bookmark if they change their mind or finish reading the page.

Why is Chrome slightly better than Firefox in my ranking? The “Edit Bookmark” window. I like it a lot more than what the other browsers offer. It makes it much easier to choose a folder or organize the folder structure.

IE9, Opera and Safari take another approach. They try to guide users through a process: first this, then that, do you confirm? Yes, and only then is the action taken.

Come on, I know what I’m doing. I simply want to save a page to read it later. Why so many steps? And Safari, please let me create a new folder when I add a bookmark so I can be organized if I want to.

So, lesson learned: assume your users are smart and execute their instructions as quickly as possible, and always give them the option to refine their instructions or roll back.