### Side Note: First draft on Mar 30 2011.

Video bilayer segmentation refers to the process of dividing the video frames into foreground and background. Here we introduce a video bilayer segmentation process which is close to real-time.

The entire process is illustrated in the figure below.

Figure 1. Process Overview of the real-time Bilayer Segmentation

The input consists of the video and the segmentation mask for the first frame. The first frame can be segmented using background subtraction, interactive graph cut, image snapping, lazy snapping, and so on.

The remaining video frames are segmented one by one, automatically, by the process illustrated above.

1. Bayesian Estimation

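For each pixel p in a video frame, the probability Iprob(p) that the pixel belongs to the foreground can be expressed, by Bayes' rule, as

$$I_{prob}(p) = P(F \mid C_p) = \frac{P(C_p \mid F)\,P(F)}{P(C_p \mid F)\,P(F) + P(C_p \mid B)\,P(B)}$$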

where Cp is the color vector of pixel p, and F and B denote foreground and background respectively. The likelihoods P(Cp|F) and P(Cp|B) are calculated by accumulating foreground and background color histograms. The priors P(F) and P(B) are computed from the previous segmentation mask.
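The per-pixel combination of likelihoods and priors described above can be sketched with NumPy as follows. This is a minimal illustration, not the original implementation: the function name `foreground_posterior`, the choice P(B) = 1 - P(F), and the fallback to the prior for colors never seen in either histogram are all assumptions.

```python
import numpy as np

def foreground_posterior(lik_fg, lik_bg, prior_fg, eps=1e-12):
    """Per-pixel Bayes posterior P(F | Cp).

    lik_fg, lik_bg: arrays holding P(Cp|F) and P(Cp|B) for each pixel.
    prior_fg: array holding P(F) for each pixel.
    Assumption: P(B) is taken as 1 - P(F).
    """
    lik_fg = np.asarray(lik_fg, dtype=np.float64)
    lik_bg = np.asarray(lik_bg, dtype=np.float64)
    prior_fg = np.asarray(prior_fg, dtype=np.float64)

    num = lik_fg * prior_fg
    den = num + lik_bg * (1.0 - prior_fg)
    # Assumption: if both likelihoods are zero (unseen color),
    # fall back to the prior instead of dividing by zero.
    return np.where(den > eps, num / np.maximum(den, eps), prior_fg)
```

For example, with equal priors, a color that is three times more likely under the foreground model than under the background model gets a foreground probability of 0.75.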

1.1 Calculation of likelihood P(Cp|F) and P(Cp|B)

The likelihood indicates how probable a color is given that the pixel is foreground (as in P(Cp|F)) or background (as in P(Cp|B)), based on the color distribution of all previous segmentation results.

To build a color histogram (here we use a 2-dimensional histogram as an example), we set up a two-dimensional grid in which each bin covers a certain range of values. For example,

[0..10, 0..10] [0..10, 11..20] ... [0..10, 251..260]

[11..20, 0..10] [11..20, 11..20] ... [11..20, 251..260]

...

[251..260, 0..10] [251..260, 11..20] ... [251..260, 251..260]

We then count the number of pixels that fall into each bin of this 2-dimensional grid. Since a video frame pixel normally has 3 color components, a 3-dimensional histogram can be built in the same way.
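As a rough sketch of the 3-dimensional case, the snippet below counts the pixels selected by a mask into a cubic grid of bins. The bin count of 16 per channel (each bin 16 values wide) and the names `BINS` and `color_histogram` are hypothetical choices for illustration; the bin ranges in the example above are different.

```python
import numpy as np

BINS = 16  # hypothetical: 16 bins per channel, must divide 256 evenly

def color_histogram(frame, mask, bins=BINS):
    """Count the masked pixels of an HxWx3 uint8 frame in a bins^3 grid.

    frame: uint8 array of shape (H, W, 3).
    mask: boolean array of shape (H, W), True for pixels to count.
    """
    width = 256 // bins
    # (N, 3) array of per-channel bin indices for the selected pixels.
    idx = frame[mask].astype(np.intp) // width
    hist = np.zeros((bins, bins, bins), dtype=np.int64)
    # np.add.at handles repeated indices correctly (unbuffered add).
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return hist
```

Calling it once with the foreground mask and once with its inverse gives the two histograms, one per layer.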

There are two ways to create the color histograms for likelihood calculation. The first is an accumulative histogram. As the foreground and background change, the accumulative histogram incorporates these changes into the calculation. Because segmentation always contains some errors, the accumulation process can be improved by updating only the histogram bins that have zero values. In this case, the error pixels do not accumulate; they have only very small values in the histogram and therefore limited influence.
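One possible reading of the accumulation rule is sketched below. The function `accumulate` and its `zero_bins_only` flag are hypothetical names; `new_counts` stands for the histogram computed from the most recent segmentation result.

```python
import numpy as np

def accumulate(hist, new_counts, zero_bins_only=True):
    """Merge counts from the latest segmentation into a running histogram.

    With zero_bins_only=True, only bins that are still empty in `hist`
    receive the new counts; bins that already hold data are left unchanged,
    so mislabeled pixels cannot keep adding to them frame after frame.
    With zero_bins_only=False, counts are simply summed.
    """
    hist = np.asarray(hist)
    new_counts = np.asarray(new_counts)
    if zero_bins_only:
        return np.where(hist == 0, new_counts, hist)
    return hist + new_counts
```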

The other method is to build the color histogram from the first segmentation result only and use it for all subsequent processing. This is useful if we know the foreground and background color distributions are not going to change much.

Once the color histograms are built, we can normalize them and read off the values of the likelihoods P(Cp|F) and P(Cp|B).
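The normalization and lookup step might look like the sketch below. It assumes a cubic histogram whose bin count per channel divides 256 evenly; the name `likelihood_map` is hypothetical.

```python
import numpy as np

def likelihood_map(frame, hist):
    """Look up P(Cp | layer) for every pixel of an HxWx3 uint8 frame.

    hist: cubic count histogram (bins x bins x bins) for one layer,
          either foreground or background.
    Returns an (H, W) array of likelihood values.
    """
    bins = hist.shape[0]
    width = 256 // bins
    total = hist.sum()
    # Normalize counts so that the histogram sums to 1.
    prob = hist / total if total > 0 else np.zeros(hist.shape)
    idx = frame.astype(np.intp) // width
    return prob[idx[..., 0], idx[..., 1], idx[..., 2]]
```

Applying it with the foreground histogram gives P(Cp|F) per pixel, and with the background histogram gives P(Cp|B).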

1.2 Compute Priors P (F) and P(B)

The priors are computed based on the previous frame, in consideration of temporal correlations. The computation can be expressed as the formula below,

where a(t-1) is the previous segmentation mask, with 255 indicating foreground and 0 indicating background. G3x3 and G7x7 are Gaussian filters. Resize denotes the scaling operations. The result Mt is a smoothed mask.

With Mt, P(F) and P(B) are calculated by,

With the priors and likelihoods, Iprob can be calculated.
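To make the ingredients concrete, here is a NumPy-only sketch of one plausible way to combine them. The order of operations (3x3 blur, downscale, 7x7 blur, upscale), the scale factor of 2, the sigma values, and the nearest-neighbor resizing are all assumptions for illustration and are not taken from the formula.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 1-D Gaussian kernel with `size` taps."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, size, sigma):
    """Separable Gaussian blur with edge replication; keeps the input shape."""
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def smooth_mask(prev_mask, scale=2):
    """Smooth the previous 0/255 mask (assumed pipeline, see lead-in)."""
    m = blur(prev_mask.astype(np.float64), 3, 1.0)    # assumed G3x3 step
    small = m[::scale, ::scale]                       # assumed downscale
    small = blur(small, 7, 2.0)                       # assumed G7x7 step
    # Nearest-neighbor upscale, then crop back to the original size.
    up = np.repeat(np.repeat(small, scale, axis=0), scale, axis=1)
    return up[:prev_mask.shape[0], :prev_mask.shape[1]]
```

Filtering at reduced resolution lets a small kernel cover a larger neighborhood cheaply, which is one reason such a pipeline suits near-real-time use.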
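As an illustration of how a smoothed mask could be turned into priors, the sketch below simply rescales the mask from the 0..255 range to 0..1. This mapping, the choice P(B) = 1 - P(F), and the name `priors_from_mask` are assumptions, not the formula used above.

```python
import numpy as np

def priors_from_mask(smoothed_mask):
    """Turn a smoothed mask with values in [0, 255] into per-pixel priors.

    Assumption: P(F) is the mask value rescaled to [0, 1],
    and P(B) is its complement.
    """
    p_fg = np.clip(np.asarray(smoothed_mask, dtype=np.float64) / 255.0, 0.0, 1.0)
    p_bg = 1.0 - p_fg
    return p_fg, p_bg
```

Pixels deep inside the previous foreground then get a prior near 1, pixels far from it get a prior near 0, and pixels around the old boundary get intermediate values.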

Reference: