Tools & Application for Business Excellence: Logistic Regression

Overview

While statistical techniques like regression, Analysis of Variance aka ANOVA are useful when a response variable (Y) is continuous. However, if the (Y) aka Key Performance Output Variable is discrete than these methods end up being redundant or futile.

If the response variable is binary (discrete) and the input variable(s) is/are continuous than we can use the binary logistic regression. Binary logistic regression are helpful to find out how various factors affect the probability of an event.

To gain in-depth understanding of the binary logistic regression it will be a good idea to break the equation of BLR (Binary Logistic Regression) and understand it bit by bit.

Equation – Ρ = ßo + ß1 + ß1×1 + ß2×2 + ß3×3…. ..+ ßnxn

Where

  • Ρ = Probability
  • ß1,ß2, 3 are the coefficients, which we want to see if they are statistically significant or not and if they are what are their values
  • x1, x2,x3 = are the factors or independent variables having affect (significant or non-significant) on the on the probability

Binary logistic regression also has a concept of “Odds” (O) this can be understood by the example of winning a bet. If the probability of winning a bet is 0.75, odds in favor of winning the bet are O = 0.75/(1-0.75) = 3. This means that it is three times likely to win the bet compared to loosing. Those who are familiar with betting will be in a better position to understand the workings of odds compared to those who purely understand this concept from an equation perspective.

How to assess the model?

“The Log Likelihood Statistic”, it is similar to the residual sum of squares in multiple regression and is an indicator of how much unexplained information is there post model fitting. Large values indicates poorly fitted statistical models.

How to assess the Predictors?

“The Odds Ratio” can be used to assess the predictors. The odds ratio quantify how each predictor effects the probabilities of each response level. For Example: if you want to determine whether age and gender affect whether a person will watch a horror movie. You create a logistic regression model using the following variables.

Variable Type Description
Horror Movie Binary Response Variable 0 if the person did not watch Horror Movie, 1 if the person did watch.
Gender Binary Predictor Variable 0 if Male , 1 if Female
Age Continuous Predictor Variable Any non-negative value.

Suppose the logistic regression model declares both predictors to be significant and Gender with an odds ratio of 2.0, the odds of a female watching a horror movie are two times the odds of a male. If Age has an odds ratio of 1.05 then the odds that a person watches a horror movie increases by 5% for each additional year of age.

Apart from the above statistics, most applications provide exhaustive additional information which can help to assess the performance of the model and are out of the scope of this article. However, in the end I can conclude by stating that the Logit Regression Models are the best when the response is discrete and predictors or input variables are continuous. Logistic regression models can be used to identify which of the factors or variables are most important contributors.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s