Tools and Application for Business Excellence: Logistic Regression

Overview

Statistical techniques such as regression and Analysis of Variance (ANOVA) are useful when the response variable (Y) is continuous. However, if Y, also known as the Key Performance Output Variable (KPOV), is discrete, then these methods are of little use.

If the response variable is binary (discrete) and the input variable(s) are continuous, then we can use binary logistic regression (BLR). Binary logistic regression helps us understand how various factors affect the probability of an event.

To gain in-depth knowledge of binary logistic regression, it is a good idea to break the equation down and understand it bit by bit.

Equation: ln(P / (1 − P)) = β0 + β1x1 + β2x2 + β3x3 + … + βnxn

  • P = the probability of the event; ln(P / (1 − P)) is the log of the odds (the logit)
  • β0 = the intercept (constant term)
  • β1, β2, …, βn = the coefficients; we want to know whether they are statistically significant and, if they are, what their values are
  • x1, x2, …, xn = the factors, or independent variables, which have some effect (significant or non-significant) on the probability
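To make the equation concrete, here is a minimal Python sketch of how a set of coefficients turns factor values into a probability. It assumes the standard logit form of the model, ln(P / (1 − P)) = β0 + β1x1 + … + βnxn. The function name and the coefficient values are made up purely for illustration and do not come from any fitted model.

```python
import math


def predicted_probability(intercept, coefficients, factors):
    """Turn a linear combination of factors into a probability.

    Assumes the standard logit link: the linear predictor is the log-odds.
    """
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, factors))
    # Invert the logit: P = 1 / (1 + exp(-log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values: beta0 = -1.0, beta1 = 0.8, beta2 = -0.5, x1 = 2.0, x2 = 1.0
p = predicted_probability(-1.0, [0.8, -0.5], [2.0, 1.0])
print(round(p, 4))  # log-odds = 0.1, so P is about 0.525
```

Whatever the coefficients and factor values are, the result always stays between 0 and 1, which is why the log-odds, rather than the probability itself, is modeled as a straight-line combination of the factors.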

Binary logistic regression also uses the concept of “odds” (O), which can be understood through the example of winning a bet. If the probability of winning a bet is 0.75, the odds in favor of winning the bet are 0.75/(1 − 0.75) = 3, which means that winning is three times as likely as losing. Those who are familiar with betting will find the workings of odds easier to grasp than novices who approach this logic purely from an equation perspective.
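The betting arithmetic can be checked in a few lines of Python. This is only a small sketch; the two helper names are my own and are not part of any library.

```python
def odds_from_probability(p):
    """Odds in favor of an event that has probability p (requires p < 1)."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1.0 + odds)


print(odds_from_probability(0.75))  # 3.0
print(probability_from_odds(3.0))   # 0.75
```

Going back and forth like this is useful because logistic regression output is usually reported on the odds (or log-odds) scale, while most people think in probabilities.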

Best Practices for Binary Logistic Regression:

  1. Go “Full Throttle” or “Full Model”: start with a model that includes all the factors in the data that could be significant.
  2. “Reduce one variable at a time”, then rerun the regression using the reduced model. This helps reduce the model to only those variables which are vital, with no multicollinearity.
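The two steps above can be sketched in plain Python as a backward elimination loop. Several things here are my own assumptions and not a prescribed method: the rule for choosing which variable to drop (the one with the largest likelihood-ratio p-value, as long as it exceeds 0.05), the simple gradient-ascent fit, the function names, and the toy data. The sketch does not check for multicollinearity, and it assumes that every factor takes more than one distinct value.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_log_likelihood(columns, y, steps=5000, rate=0.5):
    """Fit intercept + columns by gradient ascent; return the log-likelihood."""
    n = len(y)
    scaled = []
    for col in columns:
        # Standardise each factor so a fixed step size behaves well
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scaled.append([(v - mean) / sd for v in col])
    rows = [[1.0] + [col[i] for col in scaled] for i in range(n)]
    beta = [0.0] * (len(columns) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, target in zip(rows, y):
            error = target - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += error * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    total = 0.0
    for row, target in zip(rows, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, row)))
        total += math.log(p) if target == 1 else math.log(1.0 - p)
    return total


def backward_eliminate(data, y, alpha=0.05):
    """Start from the full model and drop one weak variable at a time."""
    kept = list(data)
    while kept:
        full = fit_log_likelihood([data[k] for k in kept], y)
        p_values = {}
        for name in kept:
            others = [data[k] for k in kept if k != name]
            reduced = fit_log_likelihood(others, y)
            lr = max(0.0, 2.0 * (full - reduced))
            # Chi-square survival function for 1 degree of freedom
            p_values[name] = math.erfc(math.sqrt(lr / 2.0))
        worst = max(p_values, key=p_values.get)
        if p_values[worst] <= alpha:
            break  # every remaining variable is significant
        kept.remove(worst)
    return kept


# Toy data: x1 drives the outcome, x2 is unrelated to it
x1 = list(range(1, 11)) * 2
x2 = [0] * 10 + [1] * 10
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(backward_eliminate({"x1": x1, "x2": x2}, y))
```

In practice a statistical package does the fitting and reports the p-values for you; the point of the sketch is only the loop structure: fit the full model, remove the single weakest variable, refit, and repeat.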

How to assess the model?

“The Log-Likelihood Statistic” is similar to the residual sum of squares in multiple regression: it is an indicator of how much unexplained information remains after the model has been fitted. Large values of the statistic, commonly reported as −2 times the log-likelihood (the deviance), indicate a poorly fitting statistical model.
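As a rough illustration, the statistic can be computed by hand from the observed outcomes and the probabilities a model predicts for them. The sketch below assumes the common −2 × log-likelihood (deviance) convention, and the outcomes and predicted probabilities are invented for the example.

```python
import math


def log_likelihood(observed, predicted):
    """Sum of log P(observed outcome) over all cases (outcomes coded 0 or 1)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(observed, predicted)
    )


def deviance(observed, predicted):
    """-2 x log-likelihood: larger values mean more unexplained information."""
    return -2.0 * log_likelihood(observed, predicted)


observed = [1, 0, 1, 1, 0]
good_model = [0.9, 0.2, 0.8, 0.7, 0.1]  # hypothetical predictions close to the outcomes
poor_model = [0.5, 0.5, 0.5, 0.5, 0.5]  # hypothetical predictions no better than a coin toss

print(round(deviance(observed, good_model), 2))  # roughly 2.03
print(round(deviance(observed, poor_model), 2))  # roughly 6.93
```

The model whose predictions sit closer to what actually happened ends up with the smaller value, which is the sense in which a large value signals a poor fit.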

Apart from the above statistic, most applications provide extensive additional information that can help assess the performance of the model. I will be discussing this regression technique in much more detail in my upcoming article.

Till then, stay tuned and happy modeling!