The very thought of using an algorithm to solve a murder mystery is intriguing and nerve-tingling. Today we will take a different slant on understanding the mechanics of Naive Bayes. In my previous blogs and articles I have written at length about the basics of this algorithm, so without getting into its nuts and bolts, I will simply start this journey.
Naive Bayes: the Sherlock Holmes of Algorithms
The Monday Morning Reds!
On a cold Monday morning I received a text on my mobile phone saying that there had been a murder in the parking lot of the nearby shopping complex and that I was required there “ASAP”. I said to myself, “Well, seems to be the Monday Morning Reds.” Anyhow, I set off from my office, and while driving to the scene I called some of my confidantes to find out who had been seen there. To my dismay, all I could get was the gender of the people spotted near the parking lot the day before the murder. It was not a great starting point, but after some initial rigmarole I was able to get the CCTV footage and identify some of the common faces that strolled around the parking lot.
The next step was to interview them and find some initial clues that might lead me to solid intel about the murderer. The people I interviewed were split fifty-fifty between male and female. My aim was to isolate the gender first and, once that was done, drill down further to get hold of the perpetrator.
As expected, I was unable to get much information from interviewing the suspects (I shouldn’t be calling them suspects, but as the author of this so-called murder mystery I will take that liberty). Talking to myself, I said, “I am stuck. Do I have any options? Wow, I will be in trouble if I don’t get a solid clue soon.”
Buried under behavioral, physical and age-related data, I started to question my capabilities as a detective. Then it suddenly occurred to me that I have a nerd friend who claims to have solved murder mysteries by applying algorithms to basic data about the suspects. For a while I wanted to laugh at myself for even considering the possibility of solving a murder mystery with an algorithm. Still, I was willing to take my chances, as I was desperate for a solid clue.
Next Pit Stop: The Nerd Mansion
It had been a month without any progress on this case, and I was hard pressed for time to get a solid start. Finally, I decided to visit my nerd friend with whatever data I had at my disposal. After I explained the entire situation to him, The Nerd Man said with a smirk on his face, “I think I can help ya!!” I said, “Really, can you?” He said, “At least I can help you isolate the gender, which can lead you to a solid start,” and I was like, “Wow, let’s get cracking.”
The Nerd Man wanted me to leave him the data I had from the interviews so that he could massage it to his requirements and extract the clues from it. I left his mansion after an hour or so, and he promised that within a week’s time he would have everything I needed for a good start on this case. Leaving my nerd friend’s mansion, I was as happy and excited as a bunny that had found a fresh new carrot to chew on.
The Day of Revelation
Exactly one week later I knocked on the door of the Nerd Mansion and found my nerd friend wearing a whimsical smile, accompanied by a victorious laugh: “Hahaha… didn’t I tell ya I would have something for you within one week? I can point you towards the probable gender of the perpetrator.” Honestly, I was not ready to accept that and wanted to get under the hood to see the real mustang.
I asked my friend to explain how he could be so sure of what he was saying, and I accompanied him to his workstation to see the magic. This is what he explained.
He had to convert all the numbers into categories and bins, which certainly required a lot of data wrangling and massaging.
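To give a feel for that binning step, here is a minimal sketch. The bin edges, the height threshold, the field names and the two sample records are all assumptions made up for illustration; the actual bins my friend used may well have been different.

```python
# Hypothetical bin edges and labels; not taken from the real case data.
AGE_BINS = [(20, 30, "20-30"), (30, 40, "30-40"), (40, 50, "40-50")]


def age_to_bin(age):
    """Return the label of the half-open interval [low, high) containing age."""
    for low, high, label in AGE_BINS:
        if low <= age < high:
            return label
    return "other"


def height_to_category(height_cm, tall_threshold_cm=175):
    """Label a height as 'tall' or 'short' using an assumed threshold."""
    return "tall" if height_cm >= tall_threshold_cm else "short"


# Made-up raw interview records, for illustration only.
raw = [
    {"age": 34, "height_cm": 182},
    {"age": 27, "height_cm": 160},
]

# Replace every number with the category it falls into.
binned = [
    {"age_group": age_to_bin(r["age"]), "height": height_to_category(r["height_cm"])}
    for r in raw
]
print(binned)
```

Once every column is a category rather than a number, counting how often each category occurs becomes straightforward.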
Below is the process that he followed to answer the question
My nerd friend further shared the transformed data that looked something like this:
By now I was totally confused by what he had done with the data that I supplied, and before I could question him further he said, “Wait… now is the time when the real fun will begin, so hold on to your horses.” I was startled like a kid trying to find his imaginary friend and unable to find him anywhere.
So I let him continue with his data munging story. He explained that once he had converted all the data into frequency tables, he calculated, for each gender, the percentage of suspects showing each trait, and then multiplied those percentages together to get a likelihood for that gender.
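A rough sketch of how a frequency table turns into those per-gender percentages might look like this. The six records, the field names and the helper `conditional_frequencies` are hypothetical and only stand in for the real interview data.

```python
from collections import Counter, defaultdict

# Made-up categorical records, for illustration only.
records = [
    {"gender": "male", "hair": "short", "hand": "right"},
    {"gender": "male", "hair": "short", "hand": "left"},
    {"gender": "male", "hair": "long", "hand": "right"},
    {"gender": "female", "hair": "long", "hand": "right"},
    {"gender": "female", "hair": "long", "hand": "right"},
    {"gender": "female", "hair": "short", "hand": "left"},
]


def conditional_frequencies(rows, target, feature):
    """Estimate P(feature value | target class) from raw counts."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][row[feature]] += 1
    return {
        cls: {value: n / sum(c.values()) for value, n in c.items()}
        for cls, c in counts.items()
    }


hair_given_gender = conditional_frequencies(records, "gender", "hair")
# Share of the male records that have short hair (2 out of 3 here).
print(hair_given_gender["male"]["short"])
```

Each inner dictionary is one column of the frequency table expressed as shares within a gender, which is the form the multiplication step needs.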
Likelihood Calculation of a Male Suspect
Multiply the percentages for each variable among the male suspects, i.e. Dominating Behavior x 30-40 Age Group x Tall x Writes with Right Hand x Short Hair, and then multiply by the share of suspects who are Male. This gave him the likelihood for a male suspect, and he performed the same steps to calculate the likelihood for a female suspect.
Now it all started to make sense to me: I could see how he had narrowed things down to a particular gender using the behavioral and physical appearance data that I provided. The last part, calculating the probability of a suspect from the likelihoods, was killer and helped me connect the dots in this amazing data-driven approach.
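The multiplication itself can be sketched in a few lines. The prior of 0.5 reflects the fifty-fifty split of the people I interviewed; every other percentage below is a hypothetical placeholder, not a figure from the actual case, and `class_likelihood` is just an illustrative helper name.

```python
from math import prod


def class_likelihood(prior, conditionals):
    """Multiply the class prior by each conditional percentage.

    This is the 'naive' step: every trait is treated as independent
    of the others once the gender is fixed.
    """
    return prior * prod(conditionals.values())


# Hypothetical shares of male suspects showing each trait.
male_conditionals = {
    "dominating_behavior": 0.6,
    "age_30_40": 0.5,
    "tall": 0.7,
    "right_handed": 0.8,
    "short_hair": 0.9,
}

male_likelihood = class_likelihood(prior=0.5, conditionals=male_conditionals)
print(round(male_likelihood, 4))
```

The same function, fed with the female percentages, would give the female likelihood. On their own these two numbers are not yet probabilities; they only become comparable once they are scaled against each other.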
Calculating the Male and Female Suspect Probabilities
Male Suspect Probability = Male likelihood ÷ (Male likelihood + Female likelihood)
Female Suspect Probability = Female likelihood ÷ (Male likelihood + Female likelihood)
This calculation showed that the probability of a male suspect is about 82%, far higher than the roughly 17% for a female suspect, based on the conditions that my friend selected using this amazing mathematical equation.
Now I at least had a starting point for my investigation, with a solid mathematical explanation attached to it. Out of curiosity, I asked my nerd friend the name of this cool equation, and with a glint in his eyes he said, “It is Naïve Bayes, built on Bayes’ theorem.”
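For the curious, the final scaling step can be sketched as below. The two likelihood values are hypothetical and are not meant to reproduce the figures from the case; `normalize` is an illustrative helper name.

```python
def normalize(likelihoods):
    """Divide each class likelihood by the sum over all classes."""
    total = sum(likelihoods.values())
    if total == 0:
        raise ValueError("all likelihoods are zero; nothing to normalize")
    return {label: value / total for label, value in likelihoods.items()}


# Hypothetical likelihoods for the two genders.
likelihoods = {"male": 0.06, "female": 0.02}
posteriors = normalize(likelihoods)
print(posteriors)
```

Because both values are divided by the same total, the resulting probabilities add up to one, so a high figure for one gender automatically means a low figure for the other.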