The linear model, better known as linear regression, is one of the most common and flexible analysis frameworks for identifying the relationship between two or more variables. In its simplest and most widely used form, it is represented by the best-fit line drawn through a series of data points on a scatter plot.
For any budding business analyst, this should be the starting point for understanding how a model works at its core.
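As a rough illustration of what "best-fit line" means, the sketch below computes the least-squares intercept and slope for one predictor by hand. It is written in plain Python rather than taken from Deducer, and the `spend` and `sales` figures are made-up values used only for the example.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: advertising spend (x) and sales (y)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")
```

The slope is the estimated change in the outcome for a one-unit change in the predictor, which is the quantity a linear model reports for each numeric covariate.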
Selecting the Variables in Deducer GUI:
- Outcome Variable: Y or the dependent variable should be put in this list
- As Numeric: Independent variables that should be treated as numeric covariates should be put in this list. Deducer automatically converts a factor placed here into a numeric variable, so make sure that the order of the factor levels is correct
- As Factor: Categorical independent variables (Language, Ethnicity etc.)
- Weights: This option allows the users to apply sampling weights to the regression model
- Subset: Restricts the analysis to a subset of the whole data set
Note: Only one outcome variable is allowed. It can be transformed by double-clicking on it. For example, to log-transform weight for the analysis, change it to log(weight).
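To make the effect of transforming the outcome concrete, here is a minimal sketch in plain Python (not Deducer code) that fits a line to log(weight) instead of weight. The `age` and `weight` values are hypothetical and were chosen so that weight grows roughly exponentially.

```python
import math
from statistics import mean

# Hypothetical data: weight grows roughly exponentially with age
age = [1, 2, 3, 4, 5]
weight = [2.7, 7.4, 20.1, 54.6, 148.4]

# The transformed outcome, i.e. log(weight)
log_weight = [math.log(w) for w in weight]

# Least-squares fit of log(weight) on age
age_bar, lw_bar = mean(age), mean(log_weight)
slope = (sum((a - age_bar) * (lw - lw_bar) for a, lw in zip(age, log_weight))
         / sum((a - age_bar) ** 2 for a in age))
intercept = lw_bar - slope * age_bar

print(f"log(weight) = {intercept:.3f} + {slope:.3f} * age")
```

On the log scale the relationship is close to a straight line, which is the usual reason for applying such a transformation before fitting a linear model.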
Users can add terms to the model by selecting one or more variables from the variable list.
- 2-way Add all two-way and lower-order interactions between the selected variables.
- 3-way Add all three-way and lower-order interactions between the selected variables.
- + Add main effects for all the selected variables
- : Add an interaction between the selected variables
- * Add an interaction between the selected terms, as well as any lower-order interactions between them
- – Remove Term
- In Add nested terms
- Poly Add orthogonal polynomial terms to the model
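The difference between the main-effect, 2-way, and full-crossing buttons is easiest to see by listing the terms each one would produce. The sketch below is a plain Python illustration that follows the usual R formula convention of writing an interaction as `a:b`; it is not how Deducer builds its terms internally, and the variable names are hypothetical.

```python
from itertools import combinations

def up_to_order(variables, max_order):
    """List main effects and all interactions of the variables up to max_order."""
    terms = []
    for order in range(1, max_order + 1):
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms

selected = ["age", "gender", "income"]  # hypothetical variable names

main_effects = up_to_order(selected, 1)               # comparable to "+"
two_way = up_to_order(selected, 2)                    # comparable to "2-way"
full_crossing = up_to_order(selected, len(selected))  # comparable to "*"

print(" + ".join(two_way))
# age + gender + income + age:gender + age:income + gender:income
```

With three variables selected, full crossing yields seven terms: three main effects, three two-way interactions, and one three-way interaction.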
Exploring the Model
Once the model has been created, its features can be explored using this tab. The preview panel shows what will be displayed in the console when the model is run. In the upper left-hand portion of the dialog, icons represent the assumptions made by the model.
The dialog provides the following options for more detailed analysis:
- Option: Controls the main tests and diagnostic summaries of the model
  - ANOVA Table
  - Summary Table
  - Unequal Variance
  - Diagnostics: VIF (Variance Inflation Factors), Influence Summary
- Post Hoc: Compares the levels of factors
  - Post Hoc: The factors for which the comparisons should be calculated
  - Type: The comparison type. For example, Tukey performs all pairwise comparisons
  - Estimate CI: Whether confidence intervals should be calculated
  - Corrections: Adjusts the p-values and confidence intervals when the factor has more than two levels
- Tests: Custom hypothesis tests based on the model parameters
- Plots: Visualizes the marginal effects of the model
  - Point-wise intervals: Plots point-wise confidence intervals
  - Y-axis labels: Labels for the y-axes of the plots
  - Multiple Lines Per Panel: If the effect is an interaction effect, this option decides whether the interaction is plotted as multiple lines within the same panel or as separate panels
  - Rug: Small lines on the x-axis denoting the distribution of the data
  - # of Levels: The number of levels for which the effect should be calculated
- Means (Marginal Means): Like the effects plots, marginal means are the model-estimated means of the outcome variable across the levels of a term, with the other terms held fixed at a typical level.
- Export: Allows users to export a number of variables related to the model
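The Corrections option exists because comparing many pairs of factor levels inflates the chance of a false positive. The text above does not say which corrections the dialog offers, so the sketch below uses two well-known adjustments, Bonferroni and Holm, purely as an illustration in plain Python. The raw p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm step-down adjustment: smaller p-values get larger multipliers."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * p_values[i])
        # Keep the adjusted values non-decreasing in the sorted order
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values from three pairwise comparisons
raw = [0.010, 0.040, 0.030]

print("Bonferroni:", [round(p, 3) for p in bonferroni(raw)])
print("Holm:      ", [round(p, 3) for p in holm(raw)])
```

Holm never produces larger adjusted p-values than Bonferroni, which is why it is often preferred when both are available.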
Diagnostic Tab (Top of the preview window)
This panel contains six plots for evaluating outliers, influence, and equality of variance.
Two of the plots show the distribution of the residuals; ideally, the residuals should be normally distributed.
Residual vs. Fitted: Shows the residuals of the model plotted against the predicted values. If the red line is not flat, then the model may have significant non-linearity.
Scale-Location: Plots the square root of the absolute standardized residuals against the predicted values. Also known as a Spread-Level plot.
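The quantities behind the Residual vs. Fitted and Scale-Location plots can be computed by hand for a model with one predictor. The following is a minimal sketch in plain Python, assuming simple linear regression and hypothetical data; it is not Deducer's own code.

```python
import math
from statistics import mean

def diagnostics(x, y):
    """Return fitted values, residuals, and sqrt(|standardized residuals|)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Residual standard error with n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each observation in a one-predictor model
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    scale_location = [math.sqrt(abs(r)) for r in standardized]
    return fitted, residuals, scale_location

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]

fitted, residuals, scale_location = diagnostics(x, y)
for f, e, sl in zip(fitted, residuals, scale_location):
    print(f"fitted={f:.2f}  residual={e:+.2f}  sqrt|std resid|={sl:.2f}")
```

Plotting `residuals` against `fitted` gives the first plot, and plotting `scale_location` against `fitted` gives the second. A visible trend in either suggests non-linearity or unequal variance.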
Cook's Distance: A linear model is sensitive to outliers that can unduly influence its results. The Cook's distance plot helps the analyst identify influential observations, namely those with Cook's distance values greater than 1.
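To show how the "greater than 1" rule picks out an influential point, the sketch below computes Cook's distance for a one-predictor model using the standard formula based on residuals and leverage. It is plain Python with hypothetical data in which the last observation is deliberately placed far from the others; it is an illustration, not Deducer's implementation.

```python
from statistics import mean

def cooks_distance(x, y):
    """Cook's distance for each observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [(e ** 2 / (p * s2)) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverage)]

# Hypothetical data: the last point has an unusual x and breaks the trend
x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 2]

distances = cooks_distance(x, y)
flagged = [i for i, d in enumerate(distances) if d > 1]
print("Observations with Cook's distance > 1:", flagged)
```

Only the last observation (index 5) exceeds the threshold here, because it combines a large residual with high leverage.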
Residuals vs. Leverage: Another plot to examine outliers and influence
Term Plots: Also known as Component or Partial Residual Plots
For models without interactions, component residual plots are given. These can be used to examine the linearity of the relationship between the predictor and outcome variables.
- For numeric variables a scatter plot is produced
- For factors a box plot is generated
Added Variable Plots
Just like the term plots, added variable plots are used to examine the linearity of covariates. They are highly recommended when no term plots are available.
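An added variable plot for a predictor is built from two sets of residuals: the outcome regressed on the other predictors, and the predictor of interest regressed on those same predictors. The sketch below shows the construction for a model with two predictors in plain Python. The data are hypothetical and noise-free, generated as y = 1 + 2*x1 + 3*x2, so that the result is easy to check.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Residuals from regressing y on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical predictors and a noise-free outcome
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Remove the part of y and of x1 that x2 already explains
res_y = residuals(x2, y)
res_x1 = residuals(x2, x1)

# The added variable plot is res_y against res_x1; its slope is x1's coefficient
_, av_slope = fit_line(res_x1, res_y)
print(f"slope of added variable plot for x1: {av_slope:.3f}")
```

The slope recovered from the two residual series matches the coefficient of x1 in the full model (2 in this constructed example), and curvature in the plotted points would indicate a non-linear relationship.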
In a nutshell, Deducer is one of the most functional GUIs, with the potential for mass appeal, and its ease of use is a major strength. Deducer also accepts file formats from the leading statistical software packages.
Being a Java-based GUI, it competes with rivals like SAS and SPSS without compromising on the quality of output. For businesses and individuals with tight budgets in particular, Deducer can be deployed without spending hundreds or thousands of dollars.