This is a follow up article on video boundary detection part 2, gradual transition detection.

In part 2, I’ve covered the detection of gradual transition detection. The twin-comparison method is effective in detecting graudual transitions, but it cannot determine which type of gradual transition it is.

This article introduces another method for video boundary detection, Standard Deviation of Pixel Intensities. This method is effective for detecting fade in/fade out transitions.

A sample sequence of fade out transition frames is as below,

This transition is produced by decreasing of pixel intensities over time, until the screen goes completely black. Fade in is the opposite of fade out.

The scaling of pixel intensities of fade in/out transitions is visible in the standard deviation of the pixel intensities. A plot of the standard deviation of pixel intensities for one of the test videos is as below,

The down lines and up lines correspond to fade out and fade in respectively. The zero value corresponds to the frame that is completely black.

Therefore, the fade in / fade out detection problem is equivalent to detection of the down lines and up lines.

Part 1 of video detection has covered the conversion of RGB channels to form an intensity component. The standard deviation calculation is a common mathematical computation and is not covered here. (But you can just google to find tons of materials about it.)

Use it with Twin-Comparison Method

The standard deviation of intensities can be used together with the twin-comparison method to better detect the fade in/fade out transition.

The idea is to use twin-comparison method to detect the gradual transitions, and then use standard deviation of pixel intensity to determine if the gradual transition is fade in/fade out.

A sample plot of the instensity histogram difference (left), accumulated intensitiy histogram difference and the standard deviation of pixel intensities (scaled and overlapped) (right)is as below,

The right graph shows the twin-comparison method can detect the transition effectively, and the standard deviation of pixel intensities method can be applied to determine it’s a fade in/fade out transition.

Side note: First Draft on Apr 15 2011.

Video Boundary Detection–Part 2 Gradual Transition and Its Matlab Implementation

Side Note: First draft on Apr 14 2011.

Gradual transitions are more difficult to detect than abrupt transitions. It has a lot of forms. Wipe, dissolve, fade in/fade out, just name a few of them.

The frame to frame difference of gradual transition is not as significant as it is in abrupt transitions. However, the difference between the first transition frame and the subsequent frames tend to increase, and this difference (called accumulated difference) will eventually become as comparably large as the difference seen in abrupt transition.

There is a popular method called twin-comparison method for gradual transition detection.

In this method, a lower thread hold Ts is set to detect the candidate frames that start a transition, and the same threshold used for abrupt detection Tb is used to compare against accumulative difference to test whether there really exist a transition. The end frame of the transition is detected when the consecutive difference is less than Ts, and the accumulated difference has gone beyond Tb.

An illustration of the twin-comparison is presented as the figure below,

The upper half of the figure shows how the lower threshold detects the potential start frame of the transition based on intensity histogram difference. And the lower half of the figure indicates the accumulated difference goes beyond higher threshold Tb and a gradual transition is detected.

Matlab Implementation

The matlab implementation of this method can be downloaded here. In some of the test videos, the neighboring frames doesn’t always have a difference bigger than Ts, so the implementation sets if there’re 1 out of 3 frames that has a intensity histogram difference bigger than Ts, the trend is considered as continuing.

Video Boundary Detection–Part 1 Abrupt Transition and Its Matlab Implementation

Side Note: First Draft on Apr 13 2011.

Video boundary detection is a foundamental technique for video retrieval. There’re two different types of transitions,  abrupt transition (also called cut) and gradual transitions, which includes fade in/fade out, dissolve, wipe, etc.

This article covers abrupt video transition detection and provide a simple implementation in Matlab.

Abrupt transitions are relatively easy to detect, as there’re normally a big difference between the two transition frames. The problem is equivalent to detect this big difference.

This difference can be measured on a pixel by pixel basis, in block-based manner, or based on some global characteristics of the frames, for example, color histogram and intensity histogram.

One of the effective way is intensity histogram. According to NTSC standard, the intensity for a RGB frame can be calculated as,

I = 0.299R + 0.587G + 0.114B

where R, G and B are Red, Green and Blue channel of the pixel.

For the intensity histogram difference we’re looking for, it can be expressed as,

where Hi(j) is the histogram value for ith frame at level j. G denotes the total number of levels for the histogram.

In a continuous video frame sequence, the histogram difference is small, whereas for abrupt transition detection, the intensity histogram difference spikes. Even there is a notable movement or illumination changes between neighboring frames, the intensity histogram difference is relatively small compared with those peaks caused by abrupt changes. Therefore, the difference of intensity histogram with a proper threshold is effective in detecting abrupt transitions.

The threshold value to determine whether the intensity histogram difference indicates a abrupt transition can be set to,

where mu and sigma are the mean value and standard deviation of the intensity histogram difference. The value of alpha typically varies from 3 to 6.

Implementation in Matlab

The implementation of the above method in Matlab can be found here. Note that sometimes gradual transitions can have spikes that even higher than those in abrupt transitions. In order to differentiate the abrupt transitions from graudal transitions, the neighboring frames of a detected spike are also tested, if there’re multiple spikes nearby, the transition is more likely to be gradual transition, and we simply drop this detection.

Test and Result

One of the video sequence I’ve tested present the following graph,

The straight line is the threshold, it’s clear that there’re two abrupt transitions in this video sequence.

Real-Time Video Bilayer Segmentation–Theory–Part 3 Color Likelihood Propagation

Side Note: First Draft on Apr 1 2011.

4. Color Likelihood Propagation

The Trimap based on Iprob divides the video frame into foreground (F), background (B) and unknown (U) regions. MRF can be applied to classify the pixels in U.

In the MRF energy equation, all other terms can be calculated based on previous computation except for L(pi|F) and L(pi|B), which are the color likelihood of some reliable feature points in the video frame.

The color likelihood propagation technique first will identify reliable pixels from a frame based on feature selection and motion between two adjacent frames (itself and next frame). Once these pixels are identified, their color likelihood are calculated.

The calculation is based on foreground and background distribution, meaning the segmentation result is already known. Therefore we use previous segmentaed frame to estimate the color likelihood of current frame.  In other words, a pixel p’s likelihood in current frame is assumed to be the same as the likelihood of its corresponding pixel q (based on optical flow to find the correspondence) in the previous frame.

The calculation is as below,

Let’s use foreground as an example. First, the mean color Fi and covariance of the foreground color distribution is calculated as above. where Wi is a regularization constant. wq is spatial Gaussian coefficient with σ = 8. Cq is the color vector for a pixel q. Ni is the known foreground colors within q’s neighborhood.

Then the foreground color likelihood of q is computed as,

This L(qi|F) is assumed to be the same as L(pi|F) in the current frame, which is what we’re looking for.

Based on background color distribution, L(pi|B) can also be calculated.

But how to select reliable feature and do optical flow in the first place?

Good Features to Track is a research paper talking about a technique of selecting reliable features to track in feature-based vision system.  It is used to select good features.

Lucas-Kanade Optical Flow is a technique to compute motion between two frames.

These two techniques are not covered here, and there’s tons of material online. Fortunately, OpenCV library has a good implementation of these two techniques. And one can apply them by making very simple function calls.

This color likelihood is different from the part 1 color likelihood in the sense it based on local color distribution, as it’s computed against a pixel’s neighborhood Ni, where the color likelihood of part 1 is based on color histogram, which indicates the global distribution of colors.

5. To be Continued on Implementation

With this final piece, MRF can be applied to solve the energy equation. This completes the theory for Real-Time Bilayer Segmentation. Part 4 will start with implementation, it’s done with Matlab, OpenCV C++ libraries, and MRF C++ libraries.

Real-Time Video Bilayer Segmentation–Theory–Part 2 Producing the Trimap

Side Note: First draft on Mar 31 2011.

2. Producing the Trimap

Below is an example of a trimap,

Figure 1. Trimap based on Iprob

How do we get this trimap?

Part 1 has covered how to compute the Iprob, once we have the Iprob for every pixel of a video frame, we compare the Iprob with a small real value epsilon.

here B stands for background, F stands for foreground and U refers to unknown. In figure 1, the white area are F, black area are B and gray area are U.

For the white and black areas, it’s confident to decide whether they’re foreground or background based on Iprob alone. The gray regions are further processed through MRF to decide it’s foreground or background.

As a matter of fact, the benefit of this trimap is to reduce the MRF processing. With the trimap, MRF processing is only done for pixels in unknown region.

3. GraphCut for Unkown Regions

There are many problems can be described using MRF in the form of

E = Ed+λEs

where the Ed is the data cost and Es is the smoothness cost. In essence, Ed describes the energy of a pixel being classified as certain category, Es describes the dependence of this classification’s dependency on its neighbors. The solution for these problems are usually finding a classification that results in minimum cost for all pixels.

There’re many ways to solve the cost (some refer as energy) minimization problem, one of them is GraphCut.

For this real-time bilayer segmentation, the Ed and Es function can be expressed as,

here Data(Seg) is the Ed for all pixels under unknown region. And,

here Sth(Seg) is the Es for all neighboring pixels under unknown region. dist(p,q) refers Euclidean distance between pixels p and q.

Note that Seg can be either F or B in the two equations above. For smoothness energy, if the two neighboring pixels p and q are classified as the same category (both F or both B), the energy is 0.

P(Cp|F), P(Cp|B) and Iprob are already calculated in part 1, smoonthness energy is also straightforward to calculated for two given pixels.

The method of calculating L(pi|F) and L(pi|B) will be covered in Part 3.

Real-Time Video Bilayer Segmentation–Theory–Part 1 Bayesian Estimation

Side Note: First draft on Mar 30 2011.

Video bilayer segmentation refers to the process of dividing the video frames into foreground and background. Here we introduce a video bilayer segmentation process which is close to real-time.

The entire process can be illustrated as the figure below,

Figure 1. Process Overview of the real-time Bilayer Segmentation

The input includes the video and the segementation mask for the first frame. The segmentation of the first frame can be done using background subtraction, interactive graph cut, image snapping, lazy snapping and so on.

The segmentation for the rest of the video frames are done one by one automatically by the process illustrated above.

1. Bayesian Estimation

For each pixel p in a video frame, a probability Iprob (p) of a pixel belongs to foreground can be expressed as,

where Cp is the color vector of pixel p, F and B are foreground and background respectively. The likelihood P(Cp|F) and P(Cp|B) are calculated by accumulating background and foreground color historgrams. The prior P (F) and P (B) are computed from the previous segmentation mask.

1.1 Calculation of likelihood P(Cp|F) and P(Cp|B)

The likelihood basically indicates the probability of a color being foreground (as in P(Cp|F)) or background (as in P(Cp|B)) based on color distribution of all previous segmentation results.

To build a color histogram (here we use 2 dimensional histogram as example), we set a two dimentional grid, each bin in the grid with certain value ranges. For example,

[0…10, 0..10][0..10, 11…20]….[0..10, 251..260]

[11..20, 0..10][11..20, 11..20]…[11..20, 251..260]

[251..260, 0..10][251..260, 11..20]…[251..260, 251..260]

And we count the number of pixels that falls into this 2-dimentional grids. As the video frame pixel normally contains 3 components, therefore, a 3-dimensional can be built for it.

There’re two ways of creating the color histograms for likelihood calculation. The first one is the accumlative histograms. As the foreground and background changes, the accumlative histogram can incorporate these changes into the calculation. As segmentation always contain some errors, the accumlation process can be improved by only updating the histogram for the bins which have zero values. In this case, the error pixels doesn’t accumulate and only have very small values in the histogram with limited influence.

The other method is to use the first segmentation result to build the color histogram and use it for subseqent processing. This is useful if we know the foreground and background color distribution is not going to change much.

Once the color histogram is built, We can normalize them and read the values for the likelihoods P(Cp|F) and P(Cp|B).

1.2 Compute Priors P (F) and P(B)

The priors are computed based on the previous frame, in consideration of temporal correlations. The computation can be expressed as the formula below,

where a(t-1) is the previous segmentation mask, with 255 indicates the foreground and 0 for background. G3x3 and G7x7 are Gaussian filters. Resize are scaling transformation operations. The result Mt is a smoothed mask.

With Mt, the P (F) and P(B) are be calculated by,

With the priors and likehoods, Iprob can be calculated.

Reference: