As a psychology student, statistics are my worst nightmare. But I have put together a few things to remember when trying to decide which stats test to use and when. This is a very simple version, so please make sure you read up more about the different tests as there is much more to know.
First of all, ask yourself: which type of experiment has been used? If you have used a simple independent measures design, and the results meet the requirements for a parametric test, then an independent t-test can be used. The t-test compares two sample means to check whether there is a statistically significant difference between them, and is possibly the simplest statistical test there is.
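To see what the independent t-test actually calculates, here is a minimal sketch in Python using only the standard library. The scores are made up and the function name `independent_t` is my own, not from any package. It assumes roughly equal variances (the pooled-variance version) and stops at the t value and degrees of freedom, so you would still look up the p value or let a statistics program work it out.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Return (t, df) for an independent t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Made-up scores for two separate groups of participants
control = [4, 5, 6, 5, 5]
treatment = [7, 8, 9, 8, 8]

t, df = independent_t(control, treatment)
print(f"t({df}) = {t:.2f}")  # t(8) = -6.71
```

The sign of t only tells you which group had the higher mean; it is the size of t relative to the degrees of freedom that decides significance.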
If, however, the requirements for a parametric test have not been met, the non-parametric equivalent of this test is the Mann-Whitney U test. This test ranks the data and works from the ranks, rather than the raw scores, to determine statistical significance.
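The ranking idea can be sketched in a few lines of Python. This is only an illustration with made-up scores: `average_ranks` and `mann_whitney_u` are hypothetical helper names, and the sketch returns the smaller of the two U values, which is the one normally checked against a table of critical values.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def mann_whitney_u(group_a, group_b):
    """Return the smaller U statistic for two independent groups."""
    # Rank all the scores together, ignoring which group they came from
    ranks = average_ranks(group_a + group_b)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Made-up scores for two separate groups of participants
group_a = [3, 5, 8]
group_b = [9, 12, 15, 7]

u = mann_whitney_u(group_a, group_b)
print(f"U = {u}")  # U = 1.0
```

A small U means the two groups barely overlap once the scores are ranked.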
On the other hand, if you have used a repeated measures design, the parametric test you should use is the related samples t-test. We are encouraged to use this test wherever possible: because each participant acts as their own control, it is good at detecting the smaller effects of the independent variable.
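As a rough sketch of how the related samples t-test works, the Python below (standard library only) takes each participant's difference between the two conditions and tests whether the mean difference is far from zero. The scores and the function name `related_t` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def related_t(condition_1, condition_2):
    """Return (t, df) for a related samples t-test on paired scores."""
    # One difference score per participant
    differences = [b - a for a, b in zip(condition_1, condition_2)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

# Made-up scores: the same five participants in both conditions
before = [10, 12, 9, 11, 13]
after = [12, 13, 12, 12, 15]

t, df = related_t(before, after)
print(f"t({df}) = {t:.2f}")  # t(4) = 4.81
```

Because the test only looks at the differences, how much participants vary from one another overall does not count against you.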
The non-parametric equivalent of this test is the Wilcoxon signed-rank test, which again uses a ranking method, but is suited to the repeated measures design.
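Here is a minimal sketch of the ranking step for the repeated measures case, again in plain Python with made-up scores. The helper names `average_ranks` and `wilcoxon_t` are my own. The sketch drops participants whose two scores are identical, which is the usual convention, and returns the smaller of the two rank sums.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def wilcoxon_t(condition_1, condition_2):
    """Return the smaller of the positive and negative rank sums."""
    # Difference per participant, leaving out anyone who did not change
    differences = [b - a for a, b in zip(condition_1, condition_2) if a != b]
    # Rank the sizes of the differences, ignoring their direction
    ranks = average_ranks([abs(d) for d in differences])
    positive = sum(r for r, d in zip(ranks, differences) if d > 0)
    negative = sum(r for r, d in zip(ranks, differences) if d < 0)
    return min(positive, negative)

# Made-up scores: the same six participants in both conditions
before = [10, 12, 9, 11, 13, 8]
after = [12, 11, 13, 14, 18, 14]

t_statistic = wilcoxon_t(before, after)
print(f"T = {t_statistic}")  # T = 1.0
```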
Finally, there are occasions where you can only collect data from one sample, and therefore have nothing to compare it with except the known population parameters. This is especially common when the population is quite large. In this case, a one-sample t-test can be used: it compares the mean of your sample with the known population mean, using the sample's standard deviation to estimate the standard error.
The main problem with using categories is that they cannot be placed in a meaningful order. For example, you cannot determine which nationality is greater in value: French, German or British. But categories are often used in psychological tests, so there is still a way we can test the significance of our results if we have used categorical data.
In experiments where participants can only be a part of ONE category, such as nationality, we use Chi Square. You can use this test with any number of categories. 2×2 Chi Squares are quite interesting: you can investigate the relationship between gender and smoking, for example. However, the more categories you use, the fewer participants fall into each cell, which makes the test less reliable and significant results harder to find.
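To make the gender and smoking example concrete, here is a small sketch in plain Python. The counts are invented and `chi_square` is a hypothetical function name. It compares the observed count in each cell with the count you would expect if the two variables were unrelated, and it does not apply any continuity correction.

```python
def chi_square(table):
    """Return (chi-square, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Invented counts: rows are male/female, columns are smoker/non-smoker
observed = [
    [20, 30],
    [30, 20],
]

statistic, df = chi_square(observed)
print(f"chi-square({df}) = {statistic:.2f}")  # chi-square(1) = 4.00
```

With one degree of freedom the critical value at the .05 level is 3.84, so these invented counts would just reach significance.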
Can you manipulate the Independent Variable?
In some cases, the independent variable cannot be manipulated. For example, if you’re investigating the relationship between self-esteem and income, you can’t manipulate either variable, as it would be unethical. But it is still possible to measure the relationship between the variables using correlation and covariance.
Pearson’s r Correlation Coefficient is the parametric test for correlations. It divides the covariance of the two variables by the product of their standard deviations to calculate the correlation coefficient (expressed in this case as: r). Its value has a minimum of -1, meaning a perfect negative correlation, and a maximum of 1, meaning a perfect positive correlation. A value of 0 means that there is no linear correlation at all.
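The link between covariance, standard deviation and r can be sketched in a few lines of Python (standard library only). The self-esteem and income numbers are invented, and `covariance` and `pearson_r` are names I have chosen for the illustration.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance: how much x and y vary together."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance scaled by both standard deviations, so it lies between -1 and 1."""
    return covariance(x, y) / (stdev(x) * stdev(y))

# Invented scores for five participants
self_esteem = [1, 2, 3, 4, 5]
income = [2, 4, 5, 4, 5]

print(f"covariance = {covariance(self_esteem, income):.2f}")  # covariance = 1.50
print(f"r = {pearson_r(self_esteem, income):.2f}")            # r = 0.77
```

Dividing by the standard deviations is what removes the units, which is why r can be compared across studies while the raw covariance cannot.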
For non-parametric correlations, Spearman’s Rho is used. It is calculated in the same way as Pearson’s r, but Spearman’s Rho first converts the data into ranks in order to reduce the effect of extreme/outlier scores.
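A quick sketch shows why ranking helps. The Python below uses invented data with one extreme score; `average_ranks`, `pearson_r` and `spearman_rho` are illustrative names, not library functions. Spearman's Rho here is simply the Pearson calculation run on the ranks.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mean_x, mean_y = mean(x), mean(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)

def spearman_rho(x, y):
    """Spearman's Rho: the Pearson correlation of the ranked scores."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Invented data: the last score on the second variable is an outlier
hours = [1, 2, 3, 4, 5]
score = [2, 3, 4, 5, 100]

r = pearson_r(hours, score)
rho = spearman_rho(hours, score)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
# Pearson r = 0.73, Spearman rho = 1.00
```

The scores always rise together, so the ranks agree perfectly, but the single extreme score drags Pearson's r down.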
Where experiments measure the relationship between three or more variables, a Scatter Plot Matrix can be used as a quick way of producing multiple scatter plots. This way you can look at many scatter plots at once to see which relationships look worth testing.
Whereas correlation measures the scatter of data points around the line of best fit, simple regression assesses the GRADIENT of the relationship between variables. Linear Regression is very useful for making predictions based on the data that is available (such as predicting record sales from advertising). If you have more than one predictor variable, and want to predict the outcome of another variable, then multiple regression can be used. This is a hypothetical model of the relationship between the outcome and several predictors, instead of just one.
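For simple regression, the gradient and the intercept of the line of best fit can be worked out by hand. The sketch below is plain Python with invented advertising and sales figures; `fit_line` is a made-up name, and it covers only the one-predictor case, not multiple regression.

```python
from statistics import mean

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of best fit."""
    mean_x, mean_y = mean(x), mean(y)
    # Gradient: how much y changes, on average, for each one-unit change in x
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented figures: advertising spend and record sales
advertising = [1, 2, 3, 4, 5]
record_sales = [2, 4, 5, 4, 5]

slope, intercept = fit_line(advertising, record_sales)
predicted = intercept + slope * 6
print(f"sales = {intercept:.1f} + {slope:.1f} * advertising")  # sales = 2.2 + 0.6 * advertising
print(f"predicted sales when advertising is 6: {predicted:.1f}")  # 5.8
```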
There are three methods of regression that can be used by statistics programs (our University is a big fan of SPSS). The most popular one, I think, is the hierarchical model. This is where known predictors (based on past research) are entered into the model first, and the new predictors are entered as a separate step. It is based on theory testing which makes it the best method, and it allows you to see the predictive influence of a new variable on the outcome.
The second method is forced entry, where all the factors are entered into the model simultaneously. You can choose the variables you enter into the model, but the results obtained depend on which ones you choose – so make sure you have a good reason for the decision you make.
The final one is the stepwise model. For this model, the predictors entered are determined using their partial correlation with the outcome.
If the Independent Variable has more than two levels.
When the independent variable you are using has more than two levels, you need to conduct an ANALYSIS OF VARIANCE (ANOVA). An example of this is if we gave participants anagrams to solve under three conditions: one with no letters given, one with the first letter given, and one with the last letter given. We would need to use ANOVA to analyze our results.
For independent measures designs, we would use a one-way between-groups ANOVA, and for repeated measures designs we would use a one-factor repeated measures ANOVA.
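For the between-groups case, the F ratio compares how much the group means differ from each other with how much the scores vary inside each group. Here is a minimal Python sketch using invented anagram scores for three separate groups; `one_way_anova` is a hypothetical name, and the sketch stops at F and its degrees of freedom.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Invented numbers of anagrams solved in each condition
no_letters = [2, 3, 4]
first_letter = [5, 6, 7]
last_letter = [3, 4, 5]

f_ratio, df_between, df_within = one_way_anova(no_letters, first_letter, last_letter)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")  # F(2, 6) = 7.00
```

A significant F only says that at least one condition differs from the others; it does not say which one.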
Again, the above statistical tests are used when parametric assumptions are met. The non-parametric tests for these are the Kruskal-Wallis Test (for independent measures) and the Friedman Test (for repeated measures).
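When there is more than one independent variable, each with two or more levels (this is where things get really complicated), you can use a two-way ANOVA, but only when there are two independent variables. If there are three, then a three-way ANOVA is used, and if the design combines independent measures and repeated measures variables it is called a mixed ANOVA.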
Post-hoc tests are used when the ANOVA is found to be significant, to find out which conditions actually differ from each other. For repeated measures designs, the most popular post-hoc test is Bonferroni. This test divides the significance level by the number of comparisons made, and each comparison must reach that stricter level to count as truly significant.
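The Bonferroni arithmetic is simple enough to sketch in a couple of lines of Python. The p values are invented and `bonferroni` is just an illustrative function name.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected significance level and which p values pass it."""
    corrected_alpha = alpha / len(p_values)
    return corrected_alpha, [p < corrected_alpha for p in p_values]

# Invented p values from three pairwise comparisons
p_values = [0.010, 0.020, 0.040]

corrected_alpha, significant = bonferroni(p_values)
print(f"corrected alpha = {corrected_alpha:.4f}")  # corrected alpha = 0.0167
print(significant)  # [True, False, False]
```

All three comparisons would have passed at the usual .05 level, but only the first survives the correction.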
For independent measures, there are many post-hoc tests. Tukey’s Honestly Significant Difference (HSD) test compares every pair of condition means. This test is very useful if you have NO prior expectations of where the differences should lie. It can also be used for repeated measures designs, but it is not advised as it isn’t very effective.
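Dunnett’s test compares each condition with a single control condition. It runs a t-test for each of these comparisons, and the result is then judged against Dunnett’s own critical values instead of the standard ones. It is certainly one of the simplest post-hoc tests around.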
REGWQ is the final post-hoc test used for independent measures. It is recommended when all the group sizes are the same.
Following ANOVA, you can also carry out multiple t-tests (but the more you run, the greater the chance that one of them comes out significant by accident); planned comparisons or contrasts (hypothesis driven, but you need to plan them before the experiment); or simple main effects (testing for the effect of one factor at each level of the other factor in a two-way ANOVA).
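The trouble with running lots of t-tests is that each one carries its own chance of a false positive, and those chances add up. As a rough sketch, assuming the tests are independent of each other (which is a simplification), the chance of at least one false positive can be worked out like this; `familywise_error` is a made-up name.

```python
def familywise_error(number_of_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** number_of_tests

for number_of_tests in (1, 3, 6):
    rate = familywise_error(number_of_tests)
    print(f"{number_of_tests} test(s): {rate:.2f}")
# 1 test(s): 0.05
# 3 test(s): 0.14
# 6 test(s): 0.26
```

Comparing four conditions pair by pair already takes six t-tests, which is why corrections and planned comparisons are preferred.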
When you already know that an extraneous variable is affecting the outcome/dependent variable, an analysis of covariance (ANCOVA) is used. This test controls the known extraneous variables while it tests for the difference between group means, and also reduces error.
Multiple Dependent Variables.
When we have several dependent variables, we can use multivariate analysis of variance (MANOVA) to test the differences between our group means. This is the alternative to running multiple ANOVAs, which would increase our chance of a false positive (Type I error), much like with multiple t-tests.
I hope this has helped simplify statistics a little – just a basic outline of what test to use when. There are many other tests around, taught by different schools, used by different researchers. These are just the ones that I know.
- independent measures design: one independent variable and one dependent variable. Two conditions are used to test the hypothesis, and a different group of participants takes part in each condition. (Also known as “between participants design”)
- parametric test: these tests assume that the data is of a sufficient “quality”, and they are sensitive to violations of that assumption. Simply put, the assumptions are: the sampling distribution is a NORMAL DISTRIBUTION, the data is either INTERVAL or RATIO scale, there are no extreme scores or outliers, and the variances of the groups are approximately equal.
- non-parametric test: these tests are used when the data is SKEWED, has a UNIFORM distribution, contains OUTLIERS, or if the VARIANCE of one participant group is three times bigger than the other. (Also known as “distribution free tests”)
- repeated measures design: one independent variable and one dependent variable. Two conditions are used, but only one group of participants takes part in BOTH conditions. (Also known as “within participants design”).
- covariance: the measure of how much two variables change together (so, how much variation they have in common).
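- standard deviation: the calculation of how much all the scores in a data set vary around the mean. For a sample, it is expressed as: s = √( Σ(x − x̄)² / (N − 1) ), where x is each score, x̄ is the mean and N is the number of scores.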
- negative correlation: as one value increases, the other one decreases, and vice versa.
- positive correlation: both values increase together.
Written by: Philippa Berry