SWK 3401.4-5.4 Chi-Square (χ2) Analysis Exercise
INTRODUCTION
The purpose of this activity is to learn how a chi-square (χ2)analysis is conducted and evaluated using Excel. We continue to use the Safe At Home data for this analysis. The investigators wish to know if there exists a relationship between how clients enter batterer treatment (nonvoluntarily through court order or voluntarily) and whether they complete or drop out from treatment before completion. In this case, the research question was: is there a significant relationship between program entry status and program completion? The null hypothesis being tested in this example is:
H0: No statistically significant relationship exists between program entry status and program completion.
INSTRUCTIONS
- Open the data file named battertx chisquare start.xlsx. You will need to select enable editing to work with this file.
- To conduct a chi-square test of independence (the null hypothesis of no relationship), we will need to reconfigure the data again. Excel wants to know the observed values in the analysis contingency table. In our example, we have a 2 x 2 contingency table: program entry status (nonvoluntary/court ordered vs. voluntary) by program completion status (completed vs. dropped out). Empty, that contingency table looks like this:
- We can fill in this contingency table by making Excel count cases for us. Start with placing labels in cells to help us keep things clear. The word nonvoluntary could go in cell E373, voluntary in cell F373, completed in cell D374, dropped out in D375, and totals in both G374 and D376.
- We know how to calculate the total number of cases: in an empty cell (G376 for the grand total in our contingency table) using the COUNT syntax for the column of ID numbers. The syntax would be =COUNT(A2:A372) and it shows the result of 369 cases total.
- The COUNTIF command asks Excel to count cases that meet certain criteria. For example, we could count all those who completed (comple=1) and then count all those who dropped out (comple=2); we could count all those who entered nonvoluntarily (entry=1) and those who entered voluntarily (entry=2). But this would not give us the 4 counts we need to fill those 2 x 2 cells. So, we need to ask Excel to do these COUNTIF procedures on just a range of cases each time, after we sort the data to create groups on the other variable.
- Thus, our next step is to sort the data by entry status. Then we can COUNTIF voluntary=1 in the range of cases where entry status was 1. The sort requires you to first, select all (remember—this keeps all the data for each case together as the same case), then you select “data” from the top menu bar, then “sort” from the menu below, sort by the “entry” variable on cell values, smallest to largest. That puts our nonvoluntary clients in the range of cells F2:F308. Our voluntary clients are in the range of cells F309:F370.
- The data that we want counted are in column M, the comple variable (for program completion). We want to count how many of them in the M2:M308 range (the nonvoluntary group with entry=1) have the value 1 for the comple variable (1 means completed the program). The syntax to type into the first empty cell of our contingency table (E374) would be =COUNTIF(M2:M308,1). The result is 134 cases.
- Repeat that process for the group with comple=2 (dropped out), so the syntax placed in cell E375 of out contingency table would be =COUNTIF(M2:M308,2) which gives the result of 173 cases.
- Next, we need to repeat these two steps with the group who entered voluntarily—their values for completions are in M309:M370, splitting at M308. So our next COUNTIF command would be =COUNTIF(M309:M370,1) placed in the proper cell of our contingency table (F374) to count those voluntary clients who completed treatment. The result is 27.
- Finally, we provide out 4thCOUNTIF command in the final contingency table cell (F375), for which the syntax is =COUNTIF(M309:M370,2). The result is 35.
- Now we are able to fill in the contingency table totals by asking Excel to compute the sum (∑) for the pairs of cells in each row or column. The total for:
- cell G374 is =SUM(E374:F374) which ends up being 161 cases;
- cell G375 is =SUM(E375:F375) which ends up being 208 cases;
- cell E376 is =SUM(E374:E375) which ends up being 307 cases; and
- cell F376 is =SUM(F374:F375) which ends up being 62 cases.
At this point, your contingency table in Excel should look like this:
- So far, so good! Next, Excel needs to compute the Expected values for each cell. Copy and paste your contingency table (D373 to G376) into a set of blank cells (I373—L376 would work well). Then empty the 4 cells with observed values (J374 to K375). You want to type in your 5 computed totals (307, 62, 161, and 208).
- Next, you are going to ask Excel to compute the proportions for each cell multiplying the associated column and row totals, then dividing by the overall total: (row * column)/n to compute the expected value for each cell:
- in cell J374 (nonvoluntary, completed) the syntax is =(L374*J376)/L376, and the expected value is 133.95 (with rounding).
- in cell K374 (voluntary, completed) the syntax is =(L374*K376)/L376, and the expected value is 27.05.
- in cell J375 (nonvoluntary, dropped out) the syntax is =(L375*J376)/L376, the expected value is 173.05.in cell K375 (voluntary, dropped out) the syntax is =(L375*K376)/L376, and the expected value is 34.95 (with rounding).
- Now, we are ready to ask Excel to calculate the probability that the difference between the observed and expected values (proportions) is zero—our null hypothesis. Remember that the formula for computing the χ2 test statistic is:
χ2=∑ (Or,c– Er,c)2/ Er,c In plain English, this means we are going to do the following steps:- For each cell, subtract the expected (E) value from the observed (O) value
- Compute the square of that result
- Divide that result by the expected (E) value for that cell
- Add up (sum) the results for all of the cells together (∑).
Excel needs syntax placed in (parentheses) to tell it the order of the operations—if we said O-E2/E, it would subtract E2/E from O and that would be a very different answer. The sign for squaring a value in Excel is a “^” symbol (the carrot top symbol over the number 6 on a keyboard).
Here is the way we are going to make all of this happen using Excel:
- In cell J378, the syntax is: =((E374-J374)^2)/J374 [the result in scientific notation 1.98E-05—a very small value]
- In cell K378, the syntax is: =((F374-K374)^2)/K374 [the result in scientific notation 9.8E-05—a very small value]
- In cell J379, the syntax is: =((E375-J375)^2)/J375 [the result in scientific notation 1.53E-05—a very small value]
- In cell K379, the syntax is: =((F375-K375)^2)/K375 [the result in scientific notation 7.59E-05—a very small value
- In a new, empty cell, we find the sum total of these four values: which is a very tiny number: 0.00029 (use the “sum” function to add up your four cells J378, K378, J379, and K379).
- Finally, we use a chi-square distribution table with the degrees of freedom for this test to find the criterion value for comparison of our computed χ2 test statistic. Our degrees of freedom are computed as: (number of rows – 1) * (number of columns-1). In our case, we have 2 rows and 2 columns, so our degrees of freedom are (2-1) * (2-1) which is 1*1, and the result is df=1 for our analysis.
- The comparison statistic comes from Excel if you place the following syntax into any empty cell: =CHISQ.INV.RT(0.05,df), where df is filled in for our degrees of freedom and our selected α is .05. In our case, the critical value syntax is: =CHISQ.INV.RT(0.05,1) and the result is 3.841 (with rounding). Since our test statistic (.000209) is smaller than our criterion value (3.84), we fail to reject the null hypothesis.
- In case you wish (or need) to report the test’s corresponding p-value, this is done with Excel by using the syntax =CHISQ.TEST(observed cell range, expected cell range). So, in our example, the syntax is: =CHISQ.TEST(E374:F375,J374:K375), and when this is placed in any empty cell the result is p=.988. Again, we fail to reject the null hypothesis because the p-value is greater than our α criterion of .05.
- Feel free to compare your working results with those presented in the file named battertx chisquare finish.xlsx. If you obtained similar results and drew the same conclusions, you have successfully completed the learning activity! Note that some of the valueds computed in this Excel workbook activity differ slightly from those in the coursebook—this is because missing values for the “comple” variable were filled in for this workbook and not for the coursebook.
A video that helps for a larger contingency table (2 x 3) is presented at the website https://www.youtube.com/watch?v=n08-9tqiDqc and a handout that explains the commands and steps is https://stat.utexas.edu/images/SSC/documents/VideoPDFs/Excel2016/Chi-Square_Independence_2016.pdf