Demystifying the Correlation Conundrum

Choosing a correlation technique can be confusing, because the right choice depends on the properties of the X and Y variables. In this article we discuss two widely used correlation analysis techniques: the Pearson correlation coefficient and the Spearman rank correlation.

Correlation is a bivariate technique that measures the strength and direction of the relationship between two variables. The correlation coefficient ranges from -1 to +1, where +1 denotes a perfect positive relationship between the two variables and -1 a perfect negative (inverse) relationship. As the coefficient approaches 0, the relationship between the two variables gets weaker.

Pearson Correlation

This correlation technique is widely used in statistics to measure the strength of the relationship between linearly related variables. For example, in a contact center, if we want to measure how two metrics (Call Duration and Non-Talk) are related to each other, the Pearson correlation can be used to measure the degree of relationship between them.

The following formula is used to calculate the Pearson correlation:

$$r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}$$

Where:

r = Pearson r correlation coefficient

N = number of paired values in each data set

∑xy = sum of the products of paired scores

∑x = sum of x scores

∑y = sum of y scores

∑x² = sum of squared x scores

∑y² = sum of squared y scores
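
As a minimal sketch, the sums defined above can be combined into r with a few lines of Python. The function name `pearson_r` and the sample call duration and non-talk values are made up for illustration; they do not come from any real contact center data.

```python
import math


def pearson_r(x, y):
    """Pearson r computed from the sums N, sum(xy), sum(x), sum(y), sum(x^2), sum(y^2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: call duration and non-talk time, both in seconds.
call_duration = [300, 420, 510, 610, 720]
non_talk = [40, 65, 70, 95, 110]
print(round(pearson_r(call_duration, non_talk), 3))
```

In practice a statistics library would normally be used instead; the hand-written version is only meant to show how each term of the formula enters the result.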

Questions that can be answered by Pearson correlation

  • Is there a statistically significant relationship between call duration and Non-Talk?
  • Is there a relationship between temperature, measured in degrees Celsius, and ice cream sales, measured by income?
  • Is there a relationship between age in years and height in inches?

Assumptions

The Pearson correlation technique assumes that both variables are normally distributed. It also assumes linearity and homoscedasticity. Linearity means there is a straight-line relationship between the two variables, and homoscedasticity means the data are spread evenly around the regression line, that is, the variance of the residuals is roughly constant across the range of values.

Spearman Rank Correlation

Spearman rank correlation is a non-parametric test used to measure the degree of association between two variables. It is named after Charles Spearman, who developed it. The Spearman test does not assume anything about the distribution of the data. It is appropriate when the variables are measured on a scale that is at least ordinal.

The following formula is used to calculate the Spearman rank correlation:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

Where:

ρ = Spearman rank correlation

d_i = the difference between the ranks of corresponding values X_i and Y_i

n = number of paired values in each data set
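
The rank-difference calculation can be sketched in Python as follows. The helper names `average_ranks` and `spearman_rho` and the Likert responses are hypothetical. Note that the rank-difference formula is exact only when there are no tied values; the helper assigns tied values the average of their positions, which is the usual convention, but with many ties the result is only an approximation.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    n = len(x)
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical responses of five participants to two Likert-scale questions.
question_1 = [1, 2, 3, 4, 5]
question_2 = [2, 1, 4, 3, 5]
print(round(spearman_rho(question_1, question_2), 3))  # 0.8
```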

Questions that can be answered by Spearman correlation

  • Is there a statistically significant relationship between participant responses to two Likert scale questions?
  • Is there a statistically significant relationship between Member Experience Survey ratings, given on a scale of 1 to 10, and the members' experience scores?

Assumptions

The Spearman rank correlation test makes no distributional assumptions about the data. Its assumptions are that the data must be at least ordinal and that scores on one variable must be monotonically related to the other variable.
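
To illustrate what "monotonically related" means in practice, here is a small sketch comparing the two coefficients on data that always increase together but not along a straight line. It relies on the fact that Spearman's coefficient equals Pearson's coefficient computed on the ranks. The function names and the exponential toy data are assumptions chosen for illustration, and the `ranks` helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson r from mean-centered values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes no tied values to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman rho computed as Pearson r on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


# Toy data: y grows exponentially with x, so the relationship is
# perfectly monotonic but far from linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

On data like this, Spearman reports a perfect association because the ordering of the values matches exactly, while Pearson comes out noticeably lower because the points do not lie on a straight line.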

 

 
