EDA made simple using R Function

In one of my previous posts I shared a single line of Python code that lets any analyst perform EDA at the snap of a finger. In this article we will learn how to create a simple and efficient R code template (don’t be scared by the word “template”; this cool function consists of only 8 lines of code) that will allow us to perform EDA with ease and finesse. …

WOE (Weight of Evidence) & Information Value (IV) Methods for Variable Selection in Machine Learning

Evolved from the logistic regression technique, two concepts, WOE (Weight of Evidence) and Information Value (IV), are used as a benchmarking technique to screen variables in credit risk and customer churn models, which predict the probability of fraud or customer attrition. The Weight of Evidence (WOE) tells the predictive power of …
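To make the two measures concrete, here is a minimal sketch of how WOE and IV are commonly computed for a binned variable. It assumes the widespread convention WOE = ln(% of non-events / % of events) and IV = sum of (% of non-events - % of events) × WOE; the function name `woe_iv`, the bin labels, and the counts are hypothetical, and every bin is assumed to contain at least one event and one non-event.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and the total IV.

    bins: list of (label, non_event_count, event_count) tuples.
    Assumes every count is greater than zero (no smoothing is applied).
    """
    total_non_events = sum(non for _, non, _ in bins)
    total_events = sum(evt for _, _, evt in bins)

    rows = []
    iv = 0.0
    for label, non, evt in bins:
        pct_non = non / total_non_events  # share of all non-events in this bin
        pct_evt = evt / total_events      # share of all events in this bin
        woe = math.log(pct_non / pct_evt)
        iv += (pct_non - pct_evt) * woe
        rows.append((label, woe))
    return rows, iv


# Hypothetical counts for a variable split into two bins.
rows, iv = woe_iv([("low", 100, 10), ("high", 100, 30)])
for label, woe in rows:
    print(f"{label}: WOE = {woe:.4f}")
print(f"IV = {iv:.4f}")
```

A bin whose share of events matches its share of non-events gets a WOE of 0 and adds nothing to the IV, which is why a variable with no separating power ends up with an IV near zero.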

Pandas One Line Magical Code For EDA: Pandas Profile Report

For a lot of us, EDA may simply mean getting deep into the data and finding some initial patterns and trends. It may also mean establishing correlations between variables to curate some interesting insights. However, one thing that we as data analytics practitioners cannot afford to overlook, which can potentially …

Google Colab: Create predictive models in no time

To democratize data analytics and do all the heavy lifting related to data munging, Google offers Colaboratory, a Jupyter notebook environment that requires no setup and runs entirely in the cloud. Google’s Colaboratory is a perfect solution for today’s data analysts and engineers. In this article we will see how to use this cloud-based platform and a Random Forest model to predict customer churn in fewer than 200 lines of code. …

Microsoft Azure ML Studio – A Tutorial on How to Create a Churn Model in No Time

In this article, we will see how to implement a simple customer churn model built with Azure Machine Learning Studio. This article gives us a starting point for understanding, in the most accessible manner, how Azure ML based models are created and deployed. The experiment (Azure ML Model …

Customer Churn Analysis: Using Logistic Regression to predict at Risk Customers

We all know that Linear Regression routines are pretty straightforward and easy to understand: when the value of an independent variable increases by 1 point, the dependent variable increases by b units. However, when it comes to predicting a discrete variable, for example, whether a customer will stay …
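The difference in interpretation can be sketched in a few lines. In a linear model y = a + b·x, a one-point increase in x always adds b to the prediction; in a logistic model with the same form inside the sigmoid, a one-point increase multiplies the odds by e^b instead. The coefficients `a` and `b` below are hypothetical values chosen only for illustration, not fitted to any data.

```python
import math


def linear_prediction(a, b, x):
    """Linear model: the prediction moves by b for each unit of x."""
    return a + b * x


def logistic_probability(a, b, x):
    """Logistic model: a + b*x is passed through the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


a, b = 2.0, 0.5  # hypothetical intercept and slope

# Linear: the step from x = 3 to x = 4 equals b.
linear_step = linear_prediction(a, b, 4) - linear_prediction(a, b, 3)

# Logistic: the same step multiplies the odds by exp(b).
odds_ratio = odds(logistic_probability(a, b, 4)) / odds(logistic_probability(a, b, 3))

print(f"linear step: {linear_step:.4f}")
print(f"odds ratio:  {odds_ratio:.4f} (exp(b) = {math.exp(b):.4f})")
```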

Data Science or Spiritual Science: Your data has a false ego

On the heels of my previously written article “Data science and Spirituality: The Common Grounds”, I am writing this article to expand further on a topic that is not only interesting but also bewildering for many. In Chapter 2 (Contents of the Gita Summarised) of the Srimad Bhagwad Gita, Text 71 says …

Understanding Confusion Matrix

Confusion Matrix is one of the most popular and widely used performance measurement techniques for classification models. While it is super easy to understand, its terminology can be confusing. Therefore, this article aims to clear the “fog” around this amazing model evaluation system. To get things started I have …
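As a quick illustration of the terminology, the sketch below counts the four cells of a binary confusion matrix by hand and derives accuracy, precision, and recall from them. The function names and the label lists are hypothetical; in practice a library routine would usually do the counting.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four cells."""
    tp, fp, tn, fn = (counts[k] for k in ("tp", "fp", "tn", "fn"))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),  # assumes at least one positive prediction
        "recall": tp / (tp + fn),     # assumes at least one actual positive
    }


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(summarize(counts))
```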

Types of Biases Analysts Need to Know

The tendency to over- or under-sample populations while performing an experiment is known as bias. A statistic is considered biased if it is calculated in such a way that it is systematically different from the population parameter being estimated. For example, Jon is working on an experiment for his physical education class. He wants to explore how a …
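One classic case of a statistic that is "systematically different" from the parameter it estimates is the sample variance computed with a divisor of n. The sketch below is only an illustration of that definition, not an example taken from the article: it enumerates every possible sample of size 2 (drawn with replacement) from a tiny hypothetical population and averages the two variance estimators over all of them.

```python
from itertools import product


def variance(values, ddof=0):
    """Variance with divisor (n - ddof); ddof=0 divides by n, ddof=1 by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def average_estimate(population, sample_size, ddof):
    """Average a variance estimator over every equally likely sample."""
    samples = list(product(population, repeat=sample_size))
    return sum(variance(sample, ddof) for sample in samples) / len(samples)


population = [1, 2, 3]  # hypothetical population
true_variance = variance(population)

biased = average_estimate(population, 2, ddof=0)    # divides by n
unbiased = average_estimate(population, 2, ddof=1)  # divides by n - 1

print(f"population variance:        {true_variance:.4f}")
print(f"average of n estimator:     {biased:.4f}")
print(f"average of n - 1 estimator: {unbiased:.4f}")
```

The n-divisor estimator comes out too low on average, no matter how many samples are drawn, while the n - 1 version matches the population variance on average.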

How to be a self taught data scientist?

Here is a curated list of articles, tutorials and courses that one can use to become a ‘self-taught data scientist’. The list contains free learning resources on data science and big data techniques and concepts. Each element in the list comes with a level indicator such as Rookie, Intermediate, or Expert, which can help …