SWK 3401.4-5.5 Correlation (r) Exercise
Introduction
The purpose of this activity is to learn how to calculate and evaluate a correlation coefficient using Excel. The practice data are, again, from the Safe At Home project. This time, the question asked by investigators concerns the existence of an association (relationship) between the two interval (continuous) variables of time since the precipitating incident of intimate partner violence (sincwk) and client readiness to change at program entry, prior to treatment beginning (redypre). Not only do they wish to know if there exists a significant association, but if one exists, what is its strength and direction. The total sample size was N=369.
Instructions
- Open the file named battertx correlation start.xlsx. You will need to select editing to work with this file.
- This time our data happen to be properly configured to conduct the correlation analysis. Yay! This test is as simple as knowing the range for values on each variable. The first variable, sincwk, contains data in cells G2:G370. The second variable, redypre, contains data in cells L2:L370.
- The syntax for the analysis is =CORREL(range variable 1, range variable 2). So, in our exercise, the syntax is =CORREL(G2:G370,L2:L370). Place this syntax in any empty cell, such as P.
- The correlation coefficient is quickly computed for you: r= -0.046. While this represents a negative association between the two variables, it also represents a very weak association (a low correlation coefficient, close to zero and far from +1 or =1). In a report, you would also include the degrees of freedom, which is N-2 in a 2 variable correlation analysis. Since we had N=369, we write this as: r(367)= -0.046.
- The p-value associated with this correlation coefficient (r) is computed by Excel using the Data Analysis tool called Regression. Select the “Data” tab in the top menu bar, then select “Data Analysis” in the far right upper level tool bar. Scroll down to Regression in the menu and select it. A dialogue box should open for you.
- In the dialogue box, for the Y variable range, put in the range for the second variable in the analysis including the label row. In our exercise, this is redypre and it runs (with label) from L1:L370. In the X variable range, put in the range for the first variable, again including the label row. In our exercise, this is sincwk and it runs (with the label) from G1:G370. Then be sure to check “label” since we used that row to keep everything clear.
- Select the confidence interval range at 95% and select a place for the output to appear—perhaps beginning in cell P10. When you give Excel the ok to run the analysis, a block of information should appear.
- We need only a portion of this information, and this is what it is telling us. First, the “multiple R” value since we have only 2 variables is the same as the absolute value of our correlation coefficient, r. This shows the strength of the association (but not the direction): 0.046.
- Next, we look in the ANOVA table at the significance F value. This is the p-value for our correlation coefficient: p=.38 (with rounding). You will notice that the exact same p-value is reported for our 2-variable analysis in the next section for the first variable, sincwk. It is in cell T27. This is another way of finding the p-value for our correlation.
- Now you can compare the computed p-value (.38) with the decision rule where α=.05. Since .38 is not less than .05, we fail to reject the null hypothesis of no association. In other words, it is quite possible (but not a guarantee) that no association exists between these two variables.
- If you like, you can compare your output to the file named battertx correlation finish.xlsx. If you obtained similar results and reached similar conclusions, then you have successfully completed this learning activity!