Common Data Mining Techniques: Key to detecting and preventing fraud

In my previous article, “Detecting Healthcare Fraud Using Speech and Data Analytics”, I described at length the two most robust and common techniques used to identify and predict fraud. This article focuses on the most common data mining methodology, CRISP-DM (“Cross-Industry Standard Process for Data Mining”), and on some common techniques that organizations can apply within it to mine their data and identify fraud.

Before we proceed, we should understand what data mining is:

“The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques.” – Gartner Group

Let’s look at some common, basic fraud detection techniques, grouped by the business motivation behind them:

  1. Predicting or classifying a fraud
  2. Grouping or finding affinities/associations


  1. Techniques used to predict or classify fraud
    • Regression algorithms: These algorithms predict a numeric outcome. The most commonly used are:
      1. Neural Networks
      2. Regression
      3. General Linear Modeling
    • Classification algorithms: These algorithms predict a symbolic (categorical) outcome, such as “fraud” or “not fraud”. The most commonly used are:
      1. CART (Classification and Regression Trees)
      2. Logistic Regression
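
To make the classification idea concrete, here is a minimal sketch of logistic regression trained with plain stochastic gradient descent. It is an illustration only, not a production fraud model: the toy claims, the two features (claim amount and claims filed per month), the learning rate and the helper names `train_logistic` and `fraud_probability` are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(rows, labels, lr=0.01, epochs=5000):
    """Fit weights and a bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def fraud_probability(x, weights, bias):
    """Predicted probability that a claim belongs to the 'fraud' class."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical toy data: (claim amount in $1000s, claims filed this month)
claims = [(0.5, 1), (0.8, 2), (1.0, 1), (1.2, 2),
          (4.0, 8), (5.5, 9), (6.0, 7), (4.8, 10)]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]  # symbolic outcome encoded as 0/1

weights, bias = train_logistic(claims, is_fraud)
low = fraud_probability((0.9, 1), weights, bias)   # resembles the normal claims
high = fraud_probability((5.0, 9), weights, bias)  # resembles the flagged claims
print(f"typical claim: {low:.3f}, suspicious claim: {high:.3f}")
```

In practice the probability would be compared against a threshold chosen by the business (for example, to decide which claims are sent for manual audit), and the model would be trained on real, labeled history rather than eight invented rows.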


  2. Techniques used to group and associate fraudulent transactions/events
    • Grouping algorithms: K-means (clustering of similar records) and Factor Analysis (grouping of correlated variables) are among the most popular grouping techniques used to detect fraud.
    • Association algorithms: Apriori is one of the most popular association algorithms used in the healthcare industry for frequent itemset mining and association rule learning over a transaction database.
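
As an illustration of frequent itemset mining, the following is a minimal sketch of the Apriori idea in plain Python: count single items, keep those that meet a minimum support, then repeatedly join surviving itemsets into larger candidates and prune any candidate with an infrequent subset. The service names, the transactions and the 0.5 support threshold are hypothetical and chosen only for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every individual item
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    return frequent[antecedent | consequent] / frequent[antecedent]

# Hypothetical claims, each listing the services billed together
billed = [
    {"office_visit", "xray", "mri"},
    {"office_visit", "xray"},
    {"office_visit", "mri"},
    {"xray", "mri"},
    {"office_visit", "xray", "mri"},
]

frequent = apriori(billed, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
print("confidence(xray -> mri):", round(confidence(frequent, {"xray"}, {"mri"}), 2))
```

In a fraud setting, service combinations that are frequent for one provider but rare across peers are the kind of pattern an analyst might then review; this sketch only finds the itemsets and leaves that comparison out.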

Use Case – 1: Regression


Use Case – 2: Decision Trees


Use Case – 3: Clustering and Association


Benefits of Fraud Detection Using CRISP-DM

  • Provides a systematic methodology and tool set for detecting and preventing fraud
  • Helps maximize investigative efforts in auditing fraudulent transactions
  • Results in higher recoupments
  • Keeps the model continually updated to identify newly emerging abuse patterns
