These are some notes I took to summarize what I learned after taking Andrew Ng's machine learning course on Coursera.

**Introduction**

Linear regression predicts continuous values, but at times we need to categorize things. Logistic regression is a probabilistic classification model that does exactly that.

We will first examine how logistic regression classifies things into two categories (either 0 or 1), and then how it extends to multiple categories.

The logistic regression model is described by the logistic/sigmoid function below:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

h(x) can be interpreted as the estimated probability that y = 1 given input x.

If theta'X >= 0, then h(x) >= 0.5, and we predict output y = 1.

If theta'X < 0, then h(x) < 0.5, and we predict output y = 0.

The equation theta'X = 0 essentially defines the decision boundary. Note that we can use a value other than 0.5 as the cutoff point if it is more suitable.
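As a minimal sketch of the hypothesis and prediction rule above in NumPy (the function names here are my own, not from the course):

```python
import numpy as np

def sigmoid(z):
    """The logistic/sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, cutoff=0.5):
    """Predict y = 1 when h(x) = g(theta' x) >= cutoff, else y = 0.

    theta' x >= 0 is exactly h(x) >= 0.5, so the default cutoff of 0.5
    places the decision boundary at theta' x = 0.
    """
    return (sigmoid(X @ theta) >= cutoff).astype(int)
```

Each row of `X` is one example (with a leading 1 for the intercept term), so `X @ theta` computes theta'x for all examples at once.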

**Cost Function**

The cost function for logistic regression is defined as below:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$$

The cost of a single example is further defined as:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Since y is always either 0 or 1, we can merge the two cases, and the cost function eventually becomes:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

With regularization, the cost function becomes:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
$$

Note that j starts from 1 as a convention, so the intercept term theta_0 is not regularized.
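A minimal sketch of this regularized cost in NumPy (assuming `X` carries a leading column of ones for the intercept; the names are mine):

```python
import numpy as np

def cost(theta, X, y, lam=0.0):
    """Regularized logistic regression cost J(theta).

    The regularization sum runs over theta[1:], i.e. j starts from 1,
    so the intercept term theta[0] is not penalized.
    """
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))               # sigmoid hypothesis
    unreg = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2.0 * m) * np.sum(theta[1:] ** 2)
    return unreg + reg
```

As a sanity check, with theta = 0 the hypothesis is 0.5 for every example, so the unregularized cost is exactly -log(0.5) = log 2.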

**Gradient Descent**

The gradient descent update for logistic regression is identical in form to that of linear regression, except that the hypothesis h(x^(i)) is now the sigmoid function:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
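The resulting batch gradient descent loop can be sketched as follows (NumPy; the learning rate and iteration count are illustrative choices, not values from the course):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent for logistic regression.

    The update has the same form as in linear regression,
        theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
    but h is the sigmoid of theta' x rather than theta' x itself.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid hypothesis
        theta -= (alpha / m) * (X.T @ (h - y))   # vectorized update over all j
    return theta
```

`X.T @ (h - y)` computes the whole gradient vector at once, so every theta_j is updated simultaneously, as the update rule requires.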

**Multi-class Classification: One-vs-All**

We can use the one-vs-all technique to apply logistic regression to multi-class classification. The idea is to train a separate logistic regression classifier for each class i to predict the probability that y = i. For a new input, we then pick the class whose classifier outputs the maximum probability.
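A sketch of the one-vs-all scheme in NumPy (the `train` argument stands in for any binary trainer, e.g. the gradient descent loop above; these helper names are my own):

```python
import numpy as np

def one_vs_all(X, y, classes, train):
    """Fit one binary logistic regression classifier per class.

    `train(X, y_binary)` is any binary trainer that returns a theta vector;
    class c's classifier is trained against the relabeled targets (y == c).
    Returns an (n x k) matrix with one theta column per class.
    """
    return np.column_stack([train(X, (y == c).astype(float)) for c in classes])

def predict_one_vs_all(Theta, X, classes):
    """For each row of X, pick the class whose classifier gives the highest h(x)."""
    probs = 1.0 / (1.0 + np.exp(-(X @ Theta)))   # m x k matrix of probabilities
    return classes[np.argmax(probs, axis=1)]
```

Note that the k probabilities for one input need not sum to 1 — each classifier was trained independently — which is why we simply take the argmax rather than treating them as a distribution.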