8 ALTERNATIVE CORRELATION
8.1 Correlations for Different Types of Variables
Pearson correlation is generally introduced as a method to evaluate the strength of linear association between scores on two quantitative variables, an X predictor and a Y outcome variable. If the scores on X and Y are at least interval level of measurement and if the other assumptions for Pearson’s r are satisfied (e.g., X and Y are linearly related, X and Y have a bivariate normal distribution), then Pearson’s r is generally used to describe the strength of the linear relationship between variables. However, we also need to have indexes of correlation for pairs of variables that are not quantitative or that fall short of having equal-interval level of measurement properties or that have joint distributions that are not bivariate normal. This chapter discusses some of the more widely used alternative bivariate statistics that describe strength of association or correlation for different types of variables.
When deciding which index of association (or which type of correlation coefficient) to use, it is useful to begin by identifying the type of measurement for the X and Y variables. The X or independent variable and the Y or dependent variable may each be any of the following types of measurement:
1. Quantitative with interval/ratio measurement properties
2. Quantitative but only ordinal or rank level of measurement
3. Nominal or categorical with more than two categories
4. Nominal with just two categories
   a. A true dichotomy
   b. An artificial dichotomy
Table 8.1 presents a few of the most widely used correlation statistics. For example, if both X and Y are quantitative and interval/ratio (and if the other assumptions for Pearson’s r are satisfied), the Pearson product-moment correlation (discussed in Chapter 7) is often used to describe the strength of linear association between scores on X and Y. If the scores come in the form of rank or ordinal data or if it is necessary to convert scores into ranks to get rid of problems such as severely nonnormal distribution shapes or outliers, then Spearman r or Kendall’s tau (τ) may be used. If scores on X correspond to a true dichotomy and scores on Y are interval/ratio level of measurement, the point biserial correlation may be used. If scores on X and Y both correspond to true dichotomies, the phi coefficient (Φ) can be reported. Details about computation and interpretation of these various types of correlation coefficients appear in the following sections of this chapter.
Some of the correlation indexes listed in Table 8.1, including Spearman r, point biserial r, and the phi coefficient, are equivalent to Pearson’s r. For example, a Spearman r can be obtained by converting scores on X and Y into ranks (if they are not already in the form of ranks) and then computing Pearson’s r for the ranked scores. A point biserial r can be obtained by computing Pearson’s r for scores on a truly dichotomous X variable that typically has only two values (e.g., gender, coded 1 = female, 2 = male) and scores on a quantitative Y variable (such as heart rate, HR). (The use of gender or sex as an example of a true dichotomy could be questioned. Additional categories such as transsexual could be included in some studies.) A phi coefficient can be obtained by computing Pearson’s r between scores on two true dichotomies (e.g., Does the person take a specific drug? 1 = no, 2 = yes; Does the person have a heart attack within 1 year? 1 = no, 2 = yes). Alternative computational formulas are available for Spearman r, point biserial r, and the phi coefficient, but the same numerical results can be obtained by applying the formula for Pearson’s r. Thus, Spearman r, point biserial r, and the phi coefficient are equivalent to Pearson’s r. Within SPSS, you obtain the same results when you use the Pearson’s r procedure to compute a correlation between drug use and death (both true dichotomous variables) as when you request a phi coefficient between drug use and death in the Crosstabs procedure. On the other hand, some of the other correlation statistics listed in Table 8.1 (such as the tetrachoric correlation rtet, biserial r, and Kendall’s tau) are not equivalent to Pearson’s r.
For many combinations of variables shown in Table 8.1, several different statistics can be reported as an index of association. For example, for two truly dichotomous variables, such as drug use and death, Table 8.1 lists the phi coefficient as an index of association, but it is also possible to report other statistics such as chi-square and Cramer’s V, described in this chapter, or log odds ratios, described in Chapter 23 on binary logistic regression.
Later chapters in this textbook cover statistical methods that are implicitly or explicitly based on Pearson’s r values and covariances. For example, in multiple regression (in Chapters 11 and 14), the slope coefficients for regression equations can be computed based on sums of squares and sums of cross products based on the X and Y scores, or from the Pearson correlations among variables and the means and standard deviations of variables. For example, we could predict a person’s HR from that person’s scores on several different X predictors (X1 = gender, coded 1 = female, 2 = male; X2 = age in years; X3 = body weight in pounds):
$$ HR' = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 $$
Table 8.1 Widely Used Correlations for Various Types of Independent Variables (X) and Dependent Variables (Y) (Assuming That Groups Are Between Subjects or Independent)
- There may be no purpose-designed statistic for some combinations of types of variables, but it is usually possible to downgrade your assessment of the level of measurement of one or both variables. For example, if you have an X variable that is interval/ratio and a Y variable that is ordinal, you could convert scores on X to ranks and use Spearman r. It might also be reasonable to apply Pearson’s r in this situation.
- In practice, researchers do not always pay attention to the existence of artificial dichotomies when they select statistics. Tetrachoric r and biserial r are rarely reported.
When researchers use dichotomous variables (such as gender) as predictors in multiple regression, they implicitly assume that it makes sense to use Pearson’s r to index the strength of relationship between scores on gender and scores on the outcome variable HR. In this chapter, we will examine the point biserial r and the phi coefficient and demonstrate by example that they are equivalent to Pearson’s r. An important implication of this equivalence is that we can use true dichotomous variables (also called dummy variables; see Chapter 12) as predictors in regression analysis.
However, some problems can arise when we include dichotomous predictors in correlation-based analyses such as regression. In Chapter 7, for example, it was pointed out that the maximum value of Pearson’s r, r = +1.00, can occur only when the scores on X and Y have identical distribution shapes. This condition is not met when we correlate scores on a dichotomous X variable such as gender with scores on a quantitative variable such as HR.
Many of the statistics included in Table 8.1, such as Kendall’s tau, will not be mentioned again in later chapters of this textbook. However, they are included because you might encounter data that require these alternative forms of correlation analysis and they are occasionally reported in journal articles.
8.2 Two Research Examples
To illustrate some of these alternative forms of correlation, two small datasets will be used. The first dataset, which appears in Table 8.2 and Figure 8.1, consists of a hypothetical set of scores on a true dichotomous variable (gender) and a quantitative variable that has interval/ratio level of measurement properties (height). The relationship between gender and height can be assessed by doing an independent samples t test to compare means on height across the two gender groups (as described in Chapter 5). However, an alternative way to describe the strength of association between gender and height is to calculate a point biserial correlation, rpb, as shown in this chapter.
The second set of data come from an actual study (Friedmann, Katcher, Lynch, & Thomas, 1980) in which 92 men who had a first heart attack were asked whether or not they owned a dog. Dog ownership is a true dichotomous variable, coded 0 = no and 1 = yes; this was used as a predictor variable. At the end of a 1-year follow-up, the researchers recorded whether each man had survived; this was the outcome or dependent variable. Thus, the outcome variable, survival status, was also a true dichotomous variable, coded 0 = no, did not survive and 1 = yes, did survive. The question in this study was whether survival status was predictable from dog ownership. The strength of association between these two true dichotomous variables can be indexed by several different statistics, including the phi coefficient; a test of statistical significance of the association between two nominal variables can be obtained by performing a chi-square (χ²) test of association. The data from the Friedmann et al. (1980) study appear in the form of a data file in Table 8.3 and as a summary table of observed cell frequencies in Table 8.4.
Similarities among the indexes of association (correlation indexes) covered in this chapter include the following:
Table 8.2 Data for the Point Biserial r Example: Gender (Coded 1 = Male and 2 = Female) and Height in Inches
- The size of r (its absolute magnitude) provides information about the strength of association between X and Y. In principle, the range of possible values for the Pearson correlation is −1 ≤ r ≤ +1; however, in practice, the maximum possible values of r may be limited to a narrower range. Perfect correlation (either r = +1 or r = −1) is possible only when the X, Y scores have identical distribution shapes.
Figure 8.1 Scatter Plot for Relation Between a True Dichotomous Predictor (Gender) and a Quantitative Dependent Variable (Height)
NOTE: M1 = mean male height; M2 = mean female height.
When distribution shapes for X and Y differ, the maximum possible correlation between X and Y is often somewhat less than 1 in absolute value. For the phi coefficient, we can calculate the maximum possible value of phi given the marginal distributions of X and Y. Some indexes of association covered later in the textbook, such as the log odds ratios in Chapter 23, are scaled quite differently and are not limited to values between −1 and +1.
- For correlations that can have a plus or a minus sign, the sign of r provides information about the direction of association between scores on X and scores on Y. However, in many situations, the assignment of lower versus higher scores is arbitrary (e.g., gender, coded 1 = female, 2 = male), and in such situations, researchers need to be careful to pay attention to the codes that were used for categories when they interpret the sign of a correlation. Some types of correlation (such as η and Cramer’s V) have a range from 0 to +1— that is, they are always positive.
- Some (but not all) of the indexes of association discussed in this chapter are equivalent to Pearson’s r.
Ways in which the indexes of association may differ:
- The interpretation of the meaning of these correlations varies. Chapter 7 described two useful interpretations of Pearson’s r. One involves the “mapping” of scores from zX to zY, or the prediction of a zY score from a zX score for each individual participant. Pearson’s r of 1 can occur only when there is an exact one-to-one correspondence between distances from the mean on X and distances from the mean on Y, and that in turn can happen only when X and Y have identical distribution shapes. A second useful interpretation of Pearson’s r was based on the squared correlation (r²). A squared Pearson correlation can be interpreted as “the proportion of variance in Y scores that is linearly predictable from X,” and vice versa. However, some of the other correlation indexes have different interpretations, even though they are scaled to have the same range as Pearson’s r (from −1 to +1).
Table 8.3 Dog Ownership/Survival Data
SOURCE: Friedmann et al. (1980).
NOTE: Prediction of survival status (true dichotomous variable) from dog ownership (true dichotomous variable): Dog ownership: 0 = no, 1 = yes. Survival status: 0 = no, did not survive; 1 = yes, did survive.
Table 8.4 Dog Ownership and Survival Status 1 Year After the First Heart Attack
SOURCE: Friedmann et al. (1980).
NOTE: The table shows the observed frequencies for outcomes in a survey study of N = 92 men who have had a first heart attack. The frequencies in the cells denoted by b and c represent concordant outcomes (b indicates answer “no” for both variables, c indicates answer “yes” for both variables). The frequencies denoted by a and d represent discordant outcomes (i.e., an answer of “yes” for one variable and “no” for the other variable). When calculating a phi coefficient by hand from the cell frequencies in a 2 × 2 table, information about the frequencies of concordant and discordant outcomes is used.
- Some of the indexes of association summarized in this chapter are applicable only to very specific situations (such as 2 × 2 tables), while other indexes of association (such as the chi-square test of association) can be applied in a wide variety of situations.
- Most of the indexes of association discussed in this chapter are symmetrical. For example, Pearson’s r is symmetrical because the correlation between X and Y is the same as the correlation between Y and X. However, there are some asymmetrical indexes of association (such as lambda and Somers’s d). There are some situations where the ability to make predictions is asymmetrical; for example, consider a study about gender and pregnancy. If you know that an individual is pregnant, you can predict gender (the person must be female) perfectly. However, if you know that an individual is female, you cannot assume that she is pregnant. For further discussion of asymmetrical indexes of association, see Everitt (1977).
- Some of the indexes of association for ordinal data are appropriate when there are large numbers of tied ranks; others are appropriate only when there are not many tied ranks. For example, problems can arise when computing Spearman r using the formula that is based on differences between ranks. Furthermore, indexes that describe strength of association between categorical variables differ in the way they handle tied scores; some statistics subtract the number of ties when they evaluate the numbers of concordant and discordant pairs (e.g., Kendall’s tau), while other statistics ignore cases with tied ranks.
8.3 Correlations for Rank or Ordinal Scores
Spearman r is applied in situations where the scores on X and Y are both in the form of ranks or in situations where the researcher finds it necessary or useful to convert X and Y scores into ranks to get rid of problems such as extreme outliers or extremely nonnormal distribution shapes. One way to obtain Spearman r, in by-hand computation, is as follows. First, convert scores on X into ranks. Then, convert scores on Y into ranks. If there are ties, assign the mean of the ranks for the tied scores to each tied score. For example, consider this set of X scores; the following example shows how ranks are assigned to scores, including tied ranks for the three scores equal to 25:
X     Rank of X (RX)
30    1
28    2
25    (3 + 4 + 5)/3 = 4
25    (3 + 4 + 5)/3 = 4
25    (3 + 4 + 5)/3 = 4
24    6
20    7
12    8
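The ranking rule just described (rank 1 for the highest score, with tied scores sharing the mean of the ranks they occupy) can be sketched in a few lines of Python. The function name `average_ranks` is a hypothetical helper introduced here for illustration; the scores are the eight X values from the example above.

```python
def average_ranks(scores):
    """Rank scores so that 1 = highest; tied scores receive the mean of their ranks."""
    # Indices of the scores, ordered from the highest score to the lowest
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Positions pos+1 .. end+1 are shared; assign their mean to every tied score
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


x_scores = [30, 28, 25, 25, 25, 24, 20, 12]
x_ranks = average_ranks(x_scores)
print(x_ranks)  # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0, 8.0]
```

The three scores of 25 occupy ranks 3, 4, and 5, so each receives the mean rank of 4, matching the table.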
For each participant, let di be the difference between ranks on the X and Y variables. The value of Spearman r, denoted by rs, can be found in either of two ways:
- Compute the Pearson correlation between RX (rank on the X scores) and RY (rank on the Y scores).
- Use the formula below to compute Spearman r (rs) from the differences in ranks:
$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
where di = the difference between ranks = (RX − RY) and n = the number of pairs of (X, Y) scores or the number of di differences.
If rs = +1, there is perfect agreement between the ranks on X and Y; if rs = −1, the rank orders on X and Y are perfectly inversely related (e.g., the person with the highest score on X has the lowest score on Y).
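As a rough check that the two routes to Spearman r agree, the sketch below computes rs both as Pearson’s r on the ranks and from the rank-difference formula, rs = 1 − 6Σd²/[n(n² − 1)]. The six pairs of scores are hypothetical and contain no ties; with many tied ranks the two methods can give different values. The helper names `pearson_r` and `ranks_no_ties` are introduced here for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks_no_ties(scores):
    """Rank 1 = highest score; assumes every score is distinct."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


# Hypothetical scores for n = 6 participants (no tied scores)
x = [30, 28, 25, 24, 20, 12]
y = [15, 18, 11, 9, 10, 4]

rx, ry = ranks_no_ties(x), ranks_no_ties(y)
n = len(x)

# Method 1: Pearson's r applied to the ranks
rs_pearson = pearson_r(rx, ry)

# Method 2: formula based on squared differences between ranks
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(round(rs_pearson, 4), round(rs_formula, 4))
```

For these data the sum of squared rank differences is 4, so both methods give rs = 1 − 24/210.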
Another index of association that can be used in situations where X and Y are either obtained as ranks or converted into ranks is Kendall’s tau; there are two variants of this called Kendall’s tau-b and Kendall’s tau-c. In most cases, the values of Kendall’s τ and Spearman r lead to the same conclusion about the nature of the relationship between X and Y. See Liebetrau (1983) for further discussion.
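This chapter does not give a computational formula for Kendall’s tau. As an illustration only, the sketch below uses the standard definition of tau-b, which counts concordant and discordant pairs of cases and adjusts the denominator for pairs tied on X or on Y. The data are hypothetical, and the function name `kendall_tau_b` is introduced here; statistical packages may use faster algorithms.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b from counts of concordant, discordant, and tied pairs."""
    concordant = discordant = tied_x = tied_y = n_pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        n_pairs += 1
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1          # pair is tied on X
        if dy == 0:
            tied_y += 1          # pair is tied on Y
        if dx * dy > 0:
            concordant += 1      # both variables order the pair the same way
        elif dx * dy < 0:
            discordant += 1      # the variables order the pair in opposite ways
    denominator = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    return (concordant - discordant) / denominator


# Hypothetical scores for n = 6 participants (no tied scores)
x = [30, 28, 25, 24, 20, 12]
y = [15, 18, 11, 9, 10, 4]

tau_b = kendall_tau_b(x, y)
print(round(tau_b, 4))
```

With no ties, the denominator is simply the number of pairs (15 here), and 13 concordant versus 2 discordant pairs give tau = 11/15. The value is smaller than the Spearman r for the same scores would be, but it points to the same conclusion: a strong positive association.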
8.4 Correlations for True Dichotomies
Most introductory statistics books only show Pearson’s r applied to pairs of quantitative variables. Generally, it does not make sense to apply Pearson’s r in situations where X and/or Y are categorical variables with more than two categories. For example, it would not make sense to compute a Pearson correlation to assess whether the categorical variable, political party membership (coded 1 = Democrat, 2 = Republican, 3 = Independent, 4 = Socialist, etc.), is related to income level. The numbers used to indicate party membership serve only as labels and do not convey any quantitative information about differences among political parties. The mean income level could go up, go down, or remain the same as the X scores change from 1 to 2, 2 to 3, and so on; there is no reason to expect a consistent linear increase (or decrease) in income as the value of the code for political party membership increases.
However, when a categorical variable has only two possible values (such as gender, coded 1 = male, 2 = female, or survival status, coded 1 = alive, 0 = dead), we can use the Pearson correlation and related correlation indexes to relate them to other variables. To see why this is so, consider this example: X is gender (coded 1 = male, 2 = female); Y is height, a quantitative variable (hypothetical data appear in Table 8.2, and a graph of these scores is shown in Figure 8.1). Recall that Pearson’s r is an index of the linear relationship between scores on two variables. When X is dichotomous, the only possible relation it can have with scores on a continuous Y variable is linear. That is, as we move from Group 1 to Group 2 on the X variable, scores on Y may increase, decrease, or remain the same. In any of these cases, we can depict the X, Y relationship by drawing a straight line to show how the mean Y score for X = 1 differs from the mean Y score for X = 2.
See Figure 8.1 for a scatter plot that shows how height (Y) is related to gender (X); clearly, mean height is greater for males (Group 1) than for females (Group 2). We can describe the relationship between height (Y) and gender (X) by doing an independent samples t test to compare mean Y values across the two groups identified by the X variable, or we can compute a correlation (either Pearson’s r or a point biserial r) to describe how these variables are related. We shall see that the results of these two analyses provide equivalent information. By extension, it is possible to include dichotomous variables in some of the multivariable analyses covered in later chapters of this book. For example, when dichotomous variables are included as predictors in a multiple regression, they are usually called “dummy” variables (see Chapter 12). First, however, we need to consider one minor complication.
Pearson’s r can be applied to dichotomous variables when they represent true dichotomies—that is, naturally occurring groups with just two possible outcomes. One common example of a true dichotomous variable is gender (coded 1 = male, 2 = female); another is survival status in a follow-up study of medical treatment (1 = patient survives; 0 = patient dies). However, sometimes we encounter artificial dichotomies. For instance, when we take a set of quantitative exam scores that range from 15 to 82 and impose an arbitrary cutoff (scores less than 65 are fail, scores greater than 65 are pass), this type of dichotomy is “artificial.” The researcher has lost some of the information about variability of scores by artificially converting them to a dichotomous group membership variable.
When a dichotomous variable is an artificially created dichotomy, there are special types of correlation; their computational formulas involve terms that attempt to correct for the information about variability that was lost in the artificial dichotomization. (Fitzsimons, 2008, has argued that researchers should never create artificial dichotomies because of the loss of information; however, some journal articles do report artificially dichotomized scores.) The correlation of an artificial dichotomy with a quantitative variable is called a biserial r (rb); the correlation between two artificial dichotomies is called a tetrachoric r (rtet). These are not examples of Pearson’s r; they use quite different computational procedures and are rarely used.
8.4.1 Point Biserial r (rpb)
If a researcher has data on a true dichotomous variable (such as gender) and a continuous variable (such as emotional intelligence, EI), the relationship between these two variables can be assessed by calculating a t test to assess the difference in mean EI for the male versus female groups or by calculating a point biserial r to describe the increase in EI scores in relation to scores on gender. The values of t and rpb are related, and each can easily be converted into the other. The t value can be compared with critical values of t to assess statistical significance. The rpb value can be interpreted as a standardized index of effect size, or the strength of the relationship between group membership and scores on the outcome variable:
$$ r_{pb} = \sqrt{\frac{t^2}{t^2 + df}} \tag{8.3} $$
$$ t = \frac{r_{pb}}{\sqrt{(1 - r_{pb}^2)/df}} \tag{8.4} $$
In these equations, df = N − 2, where N = total number of subjects. The sign of rpb can be determined by looking at the direction of change in Y across levels of X. This conversion between rpb and t is useful because t can be used to assess the statistical significance of rpb, and rpb or r²pb can be used as an index of the effect size associated with t.
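The correspondence between t and rpb can be checked on raw scores. The sketch below uses a small set of hypothetical heights (not the data in Table 8.2), computes Pearson’s r between the gender codes and height, computes the pooled-variance independent samples t, and then converts each statistic into the other using rpb = √[t²/(t² + df)] and t = rpb/√[(1 − r²pb)/df]. The helper names `pearson_r` and `pooled_t` are introduced here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(group1, group2):
    """Independent samples t (M1 - M2) using the pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss2 = sum((v - m2) ** 2 for v in group2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical heights in inches; gender coded 1 = male, 2 = female
males = [70, 68, 72, 69, 71]
females = [64, 63, 66, 62, 65]
gender = [1] * len(males) + [2] * len(females)
height = males + females

r_pb = pearson_r(gender, height)   # point biserial r obtained as Pearson's r
t = pooled_t(males, females)
df = len(height) - 2

r_from_t = math.sqrt(t ** 2 / (t ** 2 + df))            # magnitude only
t_from_r = abs(r_pb) / math.sqrt((1 - r_pb ** 2) / df)  # magnitude only

print(round(r_pb, 4), round(t, 4), round(r_from_t, 4), round(t_from_r, 4))
```

The conversion formulas recover only the magnitude; the negative sign of rpb here reflects the coding (the group with the higher code, females, has the lower mean height).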
To illustrate the correspondence between rpb and t, SPSS was used to run two different analyses on the hypothetical data shown in Figure 8.1. First, the independent samples t test was run to assess the significance of the difference of mean height for the male versus female groups (the procedures for running an independent samples t test using SPSS were presented in Chapter 5). The results are shown in the top two panels of Figure 8.2. The difference in mean height for males (M1 = 69.03) and females (M2 = 63.88) was statistically significant, t(66) = 9.69, p < .001. The mean height for females was about 5 in. lower than the mean height for males. Second, a point biserial correlation between height and gender was obtained by using the Pearson correlation procedure in SPSS: <Analyze> → <Correlate> → <Bivariate>. The results for this analysis are shown in the bottom panel of Figure 8.2; the correlation between gender and height was statistically significant, rpb(66) = −.77, p < .001. The nature of the relationship was that having a higher score on gender (i.e., being female) was associated with a lower score on height. The reader may wish to verify that when these values are substituted into Equations 8.3 and 8.4, the rpb value can be reproduced from the t value and the t value can be obtained from the value of rpb. Also, note that when η² is calculated from the value of t as discussed in Chapter 5, η² is equivalent to r²pb.
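The verification suggested above takes only a few lines. This sketch plugs the reported values, t(66) = 9.69, into the conversion formulas; it assumes that η² is computed from t as t²/(t² + df).

```python
import math

t, df = 9.69, 66  # values reported in the text

r_pb = math.sqrt(t ** 2 / (t ** 2 + df))            # magnitude of r_pb from t
t_back = r_pb / math.sqrt((1 - r_pb ** 2) / df)     # t recovered from r_pb
eta_squared = t ** 2 / (t ** 2 + df)                # assumed formula for eta squared

print(round(r_pb, 2), round(t_back, 2), round(eta_squared, 2))  # 0.77 9.69 0.59
```

The magnitude .77 matches the reported rpb; the sign (negative) has to be supplied from the direction of the mean difference.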
Figure 8.2 SPSS Output: Independent Samples t Test (Top) and Pearson’s r (Bottom) for Data in Figure 8.1
This demonstration is one of the many places in the book where readers will see that analyses that were introduced in different chapters in most introductory statistics textbooks turn out to be equivalent. This occurs because most of the statistics that we use in the behavioral sciences are special cases of a larger data analysis system called the general linear model. In the most general case, the general linear model may include multiple predictor and multiple outcome variables, and it can include one or more quantitative and dichotomous variables on the predictor side of the analysis and one or more quantitative or measured variables as outcome variables (Tatsuoka, 1993). Thus, when we predict a quantitative Y from a quantitative X variable, or a quantitative Y from a categorical X variable, these are special cases of the general linear model where we limit the number and type of variables on one or both sides of the analysis (the predictor and the dependent variable). (Note that there is a difference between the general linear model and the generalized linear model; only the general model, which corresponds to the SPSS general linear model or GLM procedure, is covered in this book.)
8.4.2 Phi Coefficient (Φ)
The phi coefficient (Φ) is the version of Pearson’s r that is used when both X and Y are true dichotomous variables. It can be calculated from the formulas given earlier for the general Pearson’s r using score values of 0 and 1, or 1 and 2, for the group membership variables; the exact numerical value codes that are used do not matter, although 0, 1 is the most conventional representation. Alternatively, phi can be computed from the cell frequencies in a 2 × 2 table that summarizes the number of cases for each combination of X and Y scores. Table 8.5 shows the way the frequencies of cases in the four cells of a 2 × 2 table are labeled to compute phi from the cell frequencies. Assuming that the cell frequencies a through d are as shown in Table 8.5 (i.e., a and d correspond to “discordant” outcomes and b and c correspond to “concordant” outcomes), here is a formula that may be used to compute phi directly from the cell frequencies:
$$ \Phi = \frac{bc - ad}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} $$
where b and c are the number of cases in the concordant cells of a 2 × 2 table and a and d are the number of cases in the discordant cells of a 2 × 2 table.
In Chapter 7, you saw that the Pearson correlation turned out to be large and positive when most of the points fell into the concordant regions of the X, Y scatter plot that appeared in Figure 7.16 (high values of X paired with high values of Y and low values of X paired with low values of Y). Calculating products of z scores was a way to summarize the information about score locations in the scatter plot and to assess whether most cases were concordant or discordant on X and Y. The same logic is evident in the formula to calculate the phi coefficient. The b × c product is large when there are many concordant cases; the a × d product is large when there are many discordant cases. The phi coefficient takes on its maximum value of +1 when all the cases are concordant (i.e., when the a and d cells have frequencies of 0). The Φ coefficient is 0 when b × c = a × d; that is, when the concordant and discordant products balance.
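To illustrate that phi is a special case of Pearson’s r, the sketch below computes phi from cell frequencies as (bc − ad)/√[(a + b)(c + d)(a + c)(b + d)] and again by applying Pearson’s r to 0/1 codes for the same cases. The cell frequencies are hypothetical (they are not the Friedmann et al. data), and the cell labels follow the textbook layout: a = X low, Y high; b = X high, Y high; c = X low, Y low; d = X high, Y low.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def phi_from_cells(a, b, c, d):
    """Phi from 2 x 2 cell frequencies; b, c concordant and a, d discordant."""
    numerator = b * c - a * d
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Hypothetical cell frequencies (N = 100)
a, b, c, d = 10, 40, 30, 20

# Expand the table into one (X, Y) pair of 0/1 codes per case
x = [0] * a + [1] * b + [0] * c + [1] * d
y = [1] * a + [1] * b + [0] * c + [0] * d

phi_cells = phi_from_cells(a, b, c, d)
phi_pearson = pearson_r(x, y)

print(round(phi_cells, 4), round(phi_pearson, 4))
```

Both routes give the same value, which is the sense in which phi is equivalent to Pearson’s r.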
Table 8.5 Labels for Cell Frequencies in a 2 × 2 Contingency Table (a) as Shown in Most Textbooks and (b) as Shown in Crosstab Tables From SPSS
NOTES: Cases are called concordant if they have high scores on both X and Y or low scores on both X and Y. Cases are called discordant if they have low scores on one variable and high scores on the other variable. a = number of cases with X low and Y high (discordant), b = number of cases with X high and Y high (concordant), c = number of cases with X low and Y low (concordant), and d = number of cases with X high and Y low (discordant). In textbook presentations of the phi coefficient, the 2 × 2 table is usually oriented so that values of X increase from left to right and values of Y increase from bottom to top (as they would in an X, Y scatter plot). However, in the Crosstabs tables produced by SPSS, the arrangement of the rows is different (values of Y increase as you read down the rows in an SPSS table). If you want to calculate a phi coefficient by hand from the cell frequencies that appear in the SPSS Crosstabs output, you need to be careful to look at the correct cells for information about concordant and discordant cases. In most textbooks, as shown in this table, the concordant cells b and c are in the major diagonal of the 2 × 2 table (the diagonal that runs from lower left to upper right). In SPSS Crosstabs output, the concordant cells b and c are in the minor diagonal (the diagonal that runs from upper left to lower right).
A formal significance test for phi can be obtained by converting it into a chi-square; in the following equation, N represents the total number of scores in the contingency table:
$$ \chi^2 = N \times \Phi^2 $$
This is a chi-square statistic with 1 degree of freedom (df). Those who are familiar with chi-square from other statistics courses will recognize it as one of the many possible statistics to describe relationships between categorical variables based on tables of cell frequencies. For χ² with 1 df and α = .05, the critical value of χ² is 3.84; thus, if the obtained χ² exceeds 3.84, then phi is statistically significant at the .05 level.
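The significance test just described amounts to one multiplication, χ² = N × Φ², followed by a comparison with 3.84. The sketch below uses N = 92 (the sample size in the dog ownership study) with a hypothetical phi of .25, and also solves for the smallest |Φ| that would reach significance at that N. The function name `phi_to_chi_square` is introduced here for illustration.

```python
import math

CRITICAL_CHI2_1DF_05 = 3.84  # critical value for 1 df, alpha = .05 (from the text)


def phi_to_chi_square(phi, n):
    """Convert a phi coefficient to a chi-square statistic with 1 df."""
    return n * phi ** 2


n = 92               # total number of cases
phi = 0.25           # hypothetical phi value, for illustration only

chi2 = phi_to_chi_square(phi, n)
significant = chi2 > CRITICAL_CHI2_1DF_05

# Smallest absolute phi that would be significant at the .05 level for this N
min_phi = math.sqrt(CRITICAL_CHI2_1DF_05 / n)

print(round(chi2, 2), significant, round(min_phi, 3))
```

Because χ² grows in proportion to N, the same phi can be nonsignificant in a small sample and significant in a large one.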
When quantitative X and Y variables have different distribution shapes, it limits the maximum possible size of the correlation between them because a perfect one-to-one mapping of score location is not possible when the distribution shapes differ. This issue of distribution shape also applies to the phi coefficient. If the proportions of yes/no or 0/1 codes on the X and Y variables do not match (i.e., if p1, the probability of a yes code on X, does not equal p2, the probability of a yes code on Y), then the maximum obtainable size of the phi coefficient may be much less than 1 in absolute value. This limitation on the magnitude of phi occurs because unequal marginal frequencies make it impossible to have 0s in one of the diagonals of the table (i.e., in a and d, or in b and c).
For example, consider the hypothetical research situation that is illustrated in Table 8.6. Let’s assume that the participants in a study include 5 dead and 95 live subjects and 40 Type B and 60 Type A personalities and then try to see if it is possible to arrange the 100 cases into the four cells in a manner that results in a diagonal pair of cells with 0s in it. You will discover that it can’t be done. You may also notice, as you experiment with arranging the cases in the cells of Table 8.6, that there are only six possible outcomes for the study—depending on the way the 5 dead people are divided between Type A and Type B personalities; you can have 0, 1, 2, 3, 4, or 5 Type A/dead cases, and the rest of the cell frequencies are not free to vary once you know the number of cases in the Type A/dead cell.1
It is possible to calculate the maximum obtainable size of phi as a function of the marginal distributions of X and Y scores and to use this as a point of reference in evaluating whether the obtained phi coefficient was relatively large or small. The formula for Φmax is as follows:
$$ \Phi_{max} = \sqrt{\frac{p_i \times q_j}{p_j \times q_i}} \tag{8.7} $$
where pj ≥ pi, qi = 1 − pi, and qj = 1 − pj. That is, use the larger of the values p1 and p2 as pj in the formula above. For instance, if we correlate an X variable (coronary-prone personality, coded Type A = 1, Type B = 0, with a 60%/40% split) with a Y variable (death from heart attack, coded 1 = dead, 0 = alive, with a 5%/95% split), the maximum possible Φ that can be obtained in this situation is about .187 (see Table 8.6).
Table 8.6 Computation of Φmax for a Table With Unequal Marginals
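NOTE: To determine what the maximum possible value of Φ is given these marginal probabilities, apply Equation 8.7:
$$ \Phi_{max} = \sqrt{\frac{p_i \times q_j}{p_j \times q_i}} $$
Because p1 (.60) > p2 (.05), we let pj = .60, qj = .40; pi = .05, qi = .95:
$$ \Phi_{max} = \sqrt{\frac{.05 \times .40}{.60 \times .95}} = \sqrt{\frac{.02}{.57}} \approx .187 $$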
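The Φmax calculation can be cross-checked by brute force. The sketch below evaluates Φmax = √[(pi × qj)/(pj × qi)] for the 60%/40% and 5%/95% marginals and then enumerates the six possible tables described in the text (0 to 5 Type A/dead cases out of 100), computing phi for each. The function names `phi_max` and `phi_2x2` and the cell names are introduced here for illustration.

```python
import math


def phi_max(p1, p2):
    """Maximum phi given the two marginal proportions of 'yes' (1) codes."""
    p_j, p_i = max(p1, p2), min(p1, p2)   # p_j is the larger proportion
    q_j, q_i = 1 - p_j, 1 - p_i
    return math.sqrt((p_i * q_j) / (p_j * q_i))


def phi_2x2(n11, n10, n01, n00):
    """Phi for a 2 x 2 table; n11 and n00 are the concordant cells."""
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator


n_type_a, n_type_b, n_dead = 60, 40, 5   # marginal frequencies, N = 100

phis = []
for k in range(n_dead + 1):              # k = number of Type A / dead cases
    n11 = k                              # Type A, dead
    n10 = n_type_a - k                   # Type A, alive
    n01 = n_dead - k                     # Type B, dead
    n00 = n_type_b - n01                 # Type B, alive
    phis.append(phi_2x2(n11, n10, n01, n00))

print(round(phi_max(0.60, 0.05), 3))     # 0.187
print([round(v, 3) for v in phis])
```

The largest positive phi among the six tables (all 5 deaths among Type A cases) equals the value from the formula. In this enumeration the table with all 5 deaths among Type B cases gives a negative phi that is larger in magnitude, so the formula is best read as the limit in the positive direction for this coding.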
Essentially, Φmax is small when the marginal frequencies are unequal because there is no way to arrange the cases in the cells that would make the frequencies in both of the discordant cells equal zero. One reason why correlations between measures of personality and disease outcomes are typically quite low is that, in most studies, the proportions of persons who die, have heart attacks, or have other specific disease outcomes of interest are quite small. If a predictor variable (such as gender) has a 50/50 split, the maximum possible correlation between variables such as gender and heart attack may be quite small because the marginal frequency distributions for the variables are so different. This limitation is one reason why many researchers now prefer other ways of describing the strength of association, such as the odds ratios that can be obtained using binary logistic regression (see Chapter 23).
8.5 Correlations for Artificially Dichotomized Variables
Artificial dichotomies arise when researchers impose an arbitrary cutoff point on continuous scores to obtain groups; for instance, students may obtain a continuous score on an exam ranging from 1 to 100, and the teacher may impose a cutoff to determine pass/fail status (1 = pass, for scores of 70 and above; 0 = fail, for scores of 69 and below). Special forms of correlation may be used for artificial dichotomous scores (biserial r, usually denoted by rb, and tetrachoric r, usually denoted by rtet). These are rarely used; they are discussed only briefly here.
8.5.1 Biserial r (rb)
Suppose that the artificially dichotomous Y variable corresponds to a “pass” or “fail” decision. Let MXp be the mean of the quantitative X scores for the pass group and p be the proportion of people who passed; let MXq be the mean of the X scores for the fail group and q be the proportion of people who failed. Let h be the height of the normal distribution at the point where the pass/fail cutoff was set for the distribution of Y. Let sX be the standard deviation of all the X scores. Then,
$$r_b = \frac{M_{Xp} - M_{Xq}}{s_X} \times \frac{pq}{h}$$
(from Lindeman et al., 1980, p. 74). Tables for the height of the normal distribution curve h are not common, and therefore this formula is not very convenient for by-hand computation.
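In software, the height of the normal curve is a single function call, so the inconvenience is limited to hand computation. The sketch below assumes the usual form of the biserial formula, rb = [(MXp − MXq)/sX] × (pq/h); the function name `biserial_r`, the choice of the population standard deviation for sX, and the eight exam scores are all hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist, mean, pstdev

def biserial_r(x, passed):
    """Biserial r between quantitative x and an artificial 0/1 dichotomy."""
    x_pass = [xi for xi, yi in zip(x, passed) if yi == 1]
    x_fail = [xi for xi, yi in zip(x, passed) if yi == 0]
    p = len(x_pass) / len(x)          # proportion who passed
    q = 1 - p                         # proportion who failed
    # z score of the cutoff: proportion q of a standard normal lies below it
    z_cut = NormalDist().inv_cdf(q)
    h = NormalDist().pdf(z_cut)       # height of the normal curve at the cutoff
    s_x = pstdev(x)                   # assumption: population SD of all x scores
    return (mean(x_pass) - mean(x_fail)) / s_x * (p * q / h)

# Hypothetical data: a quantitative predictor and a pass/fail outcome
exam_x = [2, 4, 5, 6, 7, 8, 9, 10]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

r_b = biserial_r(exam_x, passed)
print(round(r_b, 3))
```

With the population standard deviation, this value equals the point biserial r multiplied by √(pq)/h, so biserial r is always larger in absolute value than the point biserial r computed from the same scores.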
8.5.2 Tetrachoric r (rtet)
Tetrachoric r is a correlation between two artificial dichotomies. The trigonometric functions included in this formula provide an approximate adjustment for the information about the variability of scores, which is lost when variables are artificially dichotomized; these two formulas are only approximations; the exact formula involves an infinite series.
The cell frequencies are given in the following table:
where b and c are the concordant cases (the participant has a high score on X and a high score on Y, or a low score on X and a low score on Y); a and d are the discordant cases (the participant has a low score on X and a high score on Y, or a high score on X and a low score on Y), and n = the total number of scores, n = a + b + c + d.
If there is a 50/50 split between the number of 0s and the number of 1s on both the X and the Y variables (this would occur if the artificial dichotomies were based on median splits)—that is, if (a + b) = (c + d) and (a + c) = (b + d), then an exact formula for tetrachoric r is as follows:
However, if the split between the 0/1 groups is not a median split on one or both variables, a different formula provides a better approximation of tetrachoric r:
(from Lindeman et al., 1980, p. 79).
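As a rough illustration, the sketch below implements the two trigonometric expressions in the forms in which they are commonly given: a sine formula for median splits and a "cosine-pi" formula for other splits. These forms are an assumption; the expressions in Lindeman et al. (1980) may be written differently. The function names and the cell counts are hypothetical, and the labels follow the text (b and c concordant, a and d discordant).

```python
import math

def tetrachoric_median_split(a, b, c, d):
    """Sine formula, intended for 50/50 splits on both X and Y.

    b, c are the concordant cells; a, d are the discordant cells.
    """
    n = a + b + c + d
    return math.sin(math.radians(90) * (b + c - a - d) / n)

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation, usable when the splits are not 50/50."""
    if a * d == 0 and b * c == 0:
        raise ValueError("approximation is undefined for this table")
    if a * d == 0:
        return 1.0    # no discordant product: the formula's limit is +1
    return math.cos(math.pi / (1 + math.sqrt((b * c) / (a * d))))

# Hypothetical median-split table: a = 10, b = 40, c = 40, d = 10
print(round(tetrachoric_median_split(10, 40, 40, 10), 3))  # 0.809
print(round(tetrachoric_cos_pi(10, 40, 40, 10), 3))        # 0.809
```

For this symmetric table the two expressions agree (sin 54° = cos 36°); they diverge as the marginal splits move away from 50/50.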
8.6 Assumptions and Data Screening for Dichotomous Variables
For a dichotomous variable, the closest approximation to a normal distribution would be a 50/50 split (i.e., half zeros and half ones). Situations where the group sizes are extremely unequal (e.g., 95% zeros and 5% ones) should be avoided for two reasons. First, when the absolute number of subjects in the smaller group is very low, the outcome of the analysis may be greatly influenced by scores for one or a few cases. For example, in a 2 × 2 contingency table, when a row has a very small total number of cases (such as five), then the set of cell frequencies in the entire overall 2 × 2 table will be entirely determined by the way the five cases in that row are divided between the two column categories. It is undesirable to have the results of your study depend on the behavior of just a few scores. When chi-square is applied to contingency tables, the usual rule is that no cell should have an expected cell frequency less than 5. A more appropriate analysis for tables where some rows or columns have very small Ns and some cells have expected frequencies less than 5 is the Fisher exact test (this is available through the SPSS Crosstabs procedure when the table is 2 × 2 in size). Second, the maximum possible value of the phi coefficient is constrained to be much smaller than +1 or –1 when the proportions of ones for the X and Y variables are far from equal.
8.7 Analysis of Data: Dog Ownership and Survival After a Heart Attack
Friedmann et al. (1980) reported results from a survey of patients who had a first heart attack. The key outcome of interest was whether or not the patient survived at least 1 year (coded 0 = no, 1 = yes). One of the variables they assessed was dog ownership (0 = no, 1 = yes). The results for this sample of 92 patients are shown in Table 8.4. Three statistics will be computed for this table, to assess the relationship between pet ownership and survival: a phi coefficient computed from the cell frequencies in this table; a Pearson correlation calculated from the 0, 1 scores; and a chi-square test of significance. Using the formula in Equation 8.5, the phi coefficient for the data in Table 8.4 is .310. The corresponding chi-square, calculated from Equation 8.6, is 8.85. This chi-square exceeds the critical value of chi-square for a 1-df table (χ2 critical = 3.84), so we can conclude that there is a significant association between pet ownership and survival. Note that the phi coefficient is just a special case of Pearson’s r, so the value of the obtained correlation between pet ownership and survival will be the same whether it is obtained from the SPSS bivariate Pearson correlation procedure or as a Φ coefficient from the SPSS Crosstabs procedure.
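These three statistics can be reproduced in a few lines. The sketch below uses the cell frequencies quoted in this section for Table 8.4 (50 of 53 dog owners and 28 of 39 nonowners survived) and assumes that Equation 8.6 is the usual identity χ² = N × Φ². The variable names and the helper `pearson_r` are illustrative, not part of the text.

```python
import math

# Observed cell frequencies for the dog ownership example
own_alive, own_dead = 50, 3
non_alive, non_dead = 28, 11
n = own_alive + own_dead + non_alive + non_dead          # 92 patients

# Phi from the cell frequencies (concordant product minus discordant product)
phi = (own_alive * non_dead - own_dead * non_alive) / math.sqrt(
    (own_alive + own_dead) * (non_alive + non_dead)
    * (own_alive + non_alive) * (own_dead + non_dead)
)
chi_square = n * phi ** 2                                # assumed Equation 8.6

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Expand the table into one pair of 0/1 scores per patient
dog = [1] * (own_alive + own_dead) + [0] * (non_alive + non_dead)
alive = [1] * own_alive + [0] * own_dead + [1] * non_alive + [0] * non_dead

print(round(phi, 3))                    # 0.31
print(round(chi_square, 2))             # 8.85
print(round(pearson_r(dog, alive), 3))  # 0.31, identical to phi
```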
Although it is possible to calculate chi-square by using the values of Φ and N, it is also instructive to consider another method for the computation of chi-square, a method based on the sizes of the discrepancies between observed frequencies and expected frequencies that are based on a null hypothesis that the row and column variables are not related. We will reanalyze the data in Table 8.4 and compute chi-square directly from the cell frequencies. Our notation for the observed frequency of scores in each cell will be O; the expected cell frequency for each cell is denoted by E. The expected cell frequency is the number of observations that are expected to fall in each cell under the null hypothesis that the row and column variables are independent. These expected values for E are generated from a simple model that tells us what cell frequencies we would expect to see if the row and column variables were independent.
First, we need to define independence between events A (such as owning a dog) and B (surviving 1 year after a heart attack). If Pr(A) = Pr(A|B)—that is, if the unconditional probability of A is the same as the conditional probability of A given B, then A and B are independent. Let’s look again at the observed frequencies given in Table 8.4 for pet ownership and coronary disease patient survival. The unconditional probability that any patient in the study will be alive at the end of 1 year is denoted by Pr(alive at the end of 1 year) and is obtained by dividing the number of persons alive by the total N in the sample; this yields 78/92 or .85. In the absence of any other information, we would predict that any randomly selected patient has about an 85% chance of survival. Here are two of the conditional probabilities that can be obtained from this table. The conditional probability of surviving 1 year for dog owners is denoted by Pr(survived 1 year|owner of dog); it is calculated by taking the number of dog owners who survived and dividing by the total number of dog owners, 50/53, which yields .94. This is interpreted as a 94% chance of survival for dog owners. The conditional probability of survival for nonowners is denoted by Pr(survived 1 year|nonowner of dog); it is calculated by taking the number of dog nonowners who survived and dividing by the total number of nonowners of dogs; this gives 28/39 or .72— that is, a 72% chance of survival for nonowners of dogs. If survival were independent of dog ownership, then these three probabilities should all be equal: Pr(alive|owner) = Pr(alive|nonowner) = Pr(alive). For this set of data, these three probabilities are not equal. In fact, the probability of surviving for dog owners is higher than for nonowners and higher than the probability of surviving in general for all persons in the sample. 
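The comparison of unconditional and conditional probabilities described above can be written out directly. This is a minimal sketch using the counts given in the paragraph; the variable names are illustrative.

```python
# Counts quoted in the text for the dog ownership data
own_alive, own_total = 50, 53        # dog owners: survivors, group size
non_alive, non_total = 28, 39        # nonowners: survivors, group size

pr_alive = (own_alive + non_alive) / (own_total + non_total)   # 78/92
pr_alive_given_owner = own_alive / own_total                   # 50/53
pr_alive_given_nonowner = non_alive / non_total                # 28/39

print(round(pr_alive, 2))                  # 0.85
print(round(pr_alive_given_owner, 2))      # 0.94
print(round(pr_alive_given_nonowner, 2))   # 0.72

# Under independence, all three probabilities would be equal
all_equal = pr_alive == pr_alive_given_owner == pr_alive_given_nonowner
print(all_equal)                           # False for these data
```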
We need a statistic to help us evaluate whether this difference between the conditional and unconditional probabilities is statistically significant or whether it is small enough to be reasonably attributed to sampling error. In this case, we can evaluate significance by setting up a model of the expected frequencies we should see in the cells if ownership and survival were independent.
For each cell, the expected frequency, E—the number of cases that would be in that cell if group membership on the row and column variables were independent—is obtained by taking (Row total × Column total)/Table total N. For instance, for the dog owner/alive cell, the expected frequency E = (Number of dog owners × Number of survivors)/Total N in the table. Another way to look at this computation for E is that E = Column total × (Row total/Total N); that is, we take the total number of cases in a column and divide it so that the proportion of cases in Row 1 equals the proportion of cases in Row 2. For instance, the expected number of dog owners who survive 1 year if survival is independent of ownership = Total number of dog owners × Proportion of all people who survive 1 year = 53 × (78/92) = 44.9. That is, we take the 53 dog owners and divide them into the same proportions of survivors and nonsurvivors as in the overall table. These expected frequencies, E, for the dog ownership data are summarized in Table 8.7.
Note that the Es (expected cell frequencies if H0 is true and the variables are not related) in Table 8.7 sum to the same marginal frequencies as the original data in Table 8.4. All we have done is reapportion the frequencies into the cells in such a way that Pr(A) = Pr(A|B). That is, for this table of Es,
Pr(survived 1 year) = 78/92 = .85,
Pr(survived 1 year|owner) = 44.9/53 = .85, and
Pr(survived 1 year|nonowner) = 33.1/39 = .85.
In other words, if survival is not related to ownership of a dog, then the probability of survival should be the same in the dog owner and non-dog-owner groups, and we have figured out what the cell frequencies would have to be to make those probabilities or proportions equal.
Table 8.7 Expected Cell Frequencies (If Dog Ownership and Survival Status Are Independent) for the Data in Tables 8.3 and 8.4
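The expected frequencies can be generated from the marginal totals alone. The sketch below applies E = (Row total × Column total)/N to the observed counts quoted in this section; the dictionary layout and the labels are illustrative choices, not the layout of the original table.

```python
# Observed frequencies, keyed by (ownership, survival status)
observed = {
    ("owner", "alive"): 50, ("owner", "dead"): 3,
    ("nonowner", "alive"): 28, ("nonowner", "dead"): 11,
}
rows = ("owner", "nonowner")
cols = ("alive", "dead")

row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}
n = sum(observed.values())

# E = (row total x column total) / N for every cell
expected = {
    (r, c): row_totals[r] * col_totals[c] / n for r in rows for c in cols
}

for r in rows:
    print(r, [round(expected[(r, c)], 1) for c in cols])
# owner [44.9, 8.1]
# nonowner [33.1, 5.9]
```

The expected values in each row and column add up to the same marginal totals as the observed counts, which is the reapportioning property described above.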
Next we compare the Es (the frequencies we would expect if owning a dog and survival are independent) and Os (the frequencies we actually obtained in our sample). We want to know if our actually observed frequencies are close to the ones we would expect if H0 were true; if so, it would be reasonable to conclude that these variables are independent. If Os are very far from Es, then we can reject H0 and conclude that there is some relationship between these variables. We summarize the differences between Os and Es across cells by computing the following statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E} \tag{8.11}$$
Note that the (O − E) deviations sum to zero within each row and column, which means that once you know the (O − E) deviation for the first cell, the other three (O − E) values in this 2 × 2 table are not free to vary. In general, for a table with r rows and c columns, the number of independent deviations (O − E) = (r − 1)(c − 1), and this is the df for the chi-square. For a 2 × 2 table, df = 1.
In this example,
This obtained value agrees with the value of chi-square that we computed earlier from the phi coefficient, and it exceeds the critical value of chi-square that cuts off 5% in the right-hand tail for the 1-df distribution of χ2 (critical value = 3.84). Therefore, we conclude that there is a statistically significant relation between these variables, and the nature of the relationship is that dog owners have a significantly higher probability of surviving 1 year after a heart attack (about 94%) than nonowners of dogs (72%). Survival is not independent of pet ownership; in fact, in this sample, there is a significantly higher rate of survival for dog owners.
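The computation from observed and expected frequencies can be sketched as follows. The code sums (O − E)²/E over the four cells of the dog ownership table, using unrounded expected frequencies; the list layout (rows = owner, nonowner; columns = alive, dead) is an illustrative choice.

```python
# Rows: dog owners, nonowners; columns: alive, dead at 1 year
observed = [[50, 3],
            [28, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
deviations = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n    # expected frequency under H0
        deviations.append(o - e)
        chi_square += (o - e) ** 2 / e

critical_value = 3.84                            # 1 df, alpha = .05

print([round(dev, 2) for dev in deviations])     # [5.07, -5.07, -5.07, 5.07]
print(round(chi_square, 2))                      # 8.85
print(chi_square > critical_value)               # True
```

All four deviations have the same absolute size, which illustrates why only one of them is free to vary in a 2 × 2 table.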
The most widely reported effect size for the chi-square test of association is Cramer’s V. Cramer’s V can be calculated for contingency tables with any number of rows and columns. For a 2 × 2 table, Cramer’s V is equal to the absolute value of phi. Values of Cramer’s V range from 0 to 1 regardless of table size (but only if the row marginal totals equal the column marginal totals). Values close to 0 indicate no association; values close to 1 indicate strong association:
$$\text{Cramer's } V = \sqrt{\frac{\chi^2}{n \times m}}$$
where chi-square is computed from Equation 8.11, n is the total number of scores in the sample, and m is the minimum of (Number of rows – 1), (Number of columns – 1).
The statistical significance of Cramer’s V can be assessed by looking at the associated chi-square; Cramer’s V can be reported as effect-size information for a chi-square analysis. Cramer’s V is a symmetrical index of association; that is, it does not matter which is the independent variable.
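A minimal sketch of the Cramer’s V computation follows, assuming V = √(χ²/(n × m)) with m defined as above. The function name `cramers_v` is illustrative; the chi-square value of 8.85 and n = 92 are taken from the dog ownership example.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V from a chi-square value, sample size, and table dimensions."""
    m = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * m))

# Dog ownership example: 2 x 2 table, so m = 1 and V equals |phi|
v = cramers_v(8.85, 92, 2, 2)
print(round(v, 3))   # 0.31
```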
Chi-square goodness-of-fit tests can be applied to 2 × 2 tables (as in this example); they can also be applied to contingency tables with more than two rows or columns. Chi-square also has numerous applications later in statistics as a generalized goodness-of-fit test.2 Although chi-square is commonly referred to as a “goodness-of-fit” test, note that the higher the chi-square value, in general, the worse the agreement between the observed data and the expected values generated by the model. When chi-square is applied to contingency tables, the expected frequencies generated by the model correspond to the null hypothesis that the row and column variables are independent. Therefore, a chi-square large enough to be judged statistically significant is a basis for rejection of the null hypothesis that group membership on the row variable is unrelated to group membership on the column variable.
When chi-square results are reported, the write-up should include the following:
- A table that shows the observed cell frequencies and either row or column percentages (or both).
- The obtained value of chi-square, its df, and whether it is statistically significant.
- An assessment of the effect size; this can be phi, for a 2 × 2 table; other effect sizes such as Cramer’s V are used for larger tables.
- A statement about the nature of the relationship, stated in terms of differences in proportions or of probabilities. For instance, in the pet ownership example, the researchers could say that the probability of surviving for 1 year is much higher for owners of dogs than for people who do not own dogs.
8.8 Chi-Square Test of Association (Computational Methods for Tables of Any Size)
The method of computation for chi-square described in the preceding section can be generalized to contingency tables with more than two rows and columns. Suppose that the table has r rows and c columns. For each cell, the expected frequency, E, is computed by multiplying the corresponding row and column total Ns and dividing this product by the N of cases in the entire table. For each cell, the deviation between O (observed) and E (expected) frequencies is calculated, squared, and divided by E (expected frequency for that cell). These terms are then summed across the r × c cells. For a 2 × 3 table, for example, there are six cells and six terms included in the computation of chi-square. The df for the chi-square test on an r × c table is calculated as follows:
$$df = (r - 1) \times (c - 1)$$
Thus, for instance, the degrees of freedom for a 2 × 3 table is (2 − 1) × (3 − 1) = 2 df. Only the first two (O – E) differences between observed and expected frequencies in a 2 × 3 table are free to vary. Once the first two deviations are known, the remaining deviations are determined because of the requirement that (O – E) sum to zero down each column and across each row of the table. Critical values of chi-square for any df value can be found in a table of critical values of the chi-square distribution.
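The general procedure can be sketched as a single function that accepts a table of any size. The function name `chi_square_association` and the 2 × 3 table of counts are hypothetical and serve only to show the computation and the constraint on the (O − E) deviations.

```python
def chi_square_association(table):
    """Return (chi_square, df, deviations) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    deviations = []
    for i, row in enumerate(table):
        dev_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            dev_row.append(observed - expected)
            chi_square += (observed - expected) ** 2 / expected
        deviations.append(dev_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, deviations

# Hypothetical 2 x 3 table of observed frequencies
hypothetical_table = [[20, 30, 10],
                      [10, 20, 30]]

chi_square, df, deviations = chi_square_association(hypothetical_table)
print(round(chi_square, 2), df)   # 15.33 2
print(deviations)                 # [[5.0, 5.0, -10.0], [-5.0, -5.0, 10.0]]
```

Once the first two deviations in the top row (5 and 5) are known, the third must be −10 for the row to sum to zero, and the second row must mirror the first so that each column sums to zero.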
8.9 Other Measures of Association for Contingency Tables
Until about 10 years ago, the most widely reported statistic for the association between categorical variables in contingency tables was the chi-square test of association (sometimes accompanied by Φ or Cramer’s V as effect-size information). The chi-square test of association is still fairly widely reported. However, many research situations involve prediction of outcomes that have low base rates (e.g., fewer than 100 out of 10,000 patients in a medical study may die of coronary heart disease). The effect-size indexes most commonly reported for chi-square, such as phi, are constrained to be less than +1.00 when the marginal distribution for the predictor variable differs from the marginal distribution of the outcome variable; for instance, in some studies, about 60% of patients have the Type A coronary-prone personality, but only about 5% to 10% of the patients develop heart disease. Because the marginal distributions (60/40 split on the personality predictor variable vs. 90/10 or 95/5 split on the outcome variable) are so different, the maximum possible value of phi or Pearson’s r is restricted; even if there is a strong association between personality and disease, phi cannot take on values close to +1 when the marginal distributions of the X and Y variables are greatly different. In such situations, effect-size measures, such as phi, that are marginal dependent may give an impression of effect size that is unduly pessimistic.
Partly for this reason, different descriptions of association are often preferred in clinical studies; in recent years, odds ratios have become the most popular index of the strength of association between a risk factor (such as smoking) and a disease outcome (such as lung cancer) or between a treatment and an outcome (such as survival). Odds ratios are usually obtained as part of a binary logistic regression analysis. A brief definition of odds ratios is provided in the glossary, and a more extensive explanation of this increasingly popular approach to summarizing information from 2 × 2 tables that correlate risk and outcome (or intervention and outcome) is provided in Chapter 23.
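For a 2 × 2 table, an odds ratio can be computed directly from the cell frequencies without fitting a logistic regression. The sketch below assumes the standard definition (odds = number of cases with the outcome divided by number without it) and applies it to the dog ownership counts used elsewhere in this chapter; it is not a substitute for the fuller treatment in Chapter 23.

```python
# Dog ownership example: survivors and nonsurvivors in each group
own_alive, own_dead = 50, 3
non_alive, non_dead = 28, 11

odds_owner = own_alive / own_dead          # odds of survival for dog owners
odds_nonowner = non_alive / non_dead       # odds of survival for nonowners
odds_ratio = odds_owner / odds_nonowner

print(round(odds_owner, 2))      # 16.67
print(round(odds_nonowner, 2))   # 2.55
print(round(odds_ratio, 2))      # 6.55
```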
In addition, dozens of other statistics may be used to describe the patterns of scores in contingency tables. Some of these statistics are applicable only to tables that are 2 × 2; others can be used for tables with any number of rows and columns. Some of these statistics are marginal dependent, while others are not dependent on the marginal distributions of the row and column variables. Some of these are symmetric indexes, while others (such as lambda and Somers’s d) are asymmetric; that is, they show a different reduction in uncertainty for prediction of Y from X than for prediction of X from Y. The McNemar test is used when a contingency table corresponds to repeated measures—for example, participant responses on a binary outcome variable before versus after an intervention. A full review of these many contingency table statistics is beyond the scope of this book; see Everitt (1977) and Liebetrau (1983) for more comprehensive discussion of contingency table analysis.
8.10 SPSS Output and Model Results Write-Up
Two SPSS programs were run on the data in Table 8.4 to verify that the numerical results obtained by hand earlier were correct. The SPSS Crosstabs procedure was used to compute phi and chi-square (this program also reports numerous other statistics for contingency tables). The SPSS bivariate correlation procedure (as described earlier in Chapter 7) was also applied to these data to obtain a Pearson’s r value.
To enter the dog owner/survival data into SPSS, one column was used to represent each person’s score on dog ownership (coded 0 = did not own dog, 1 = owned dog), and a second column was used to enter each person’s score on survival (0 = did not survive for 1 year after heart attack, 1 = survived for at least 1 year). The number of lines with scores of 1, 1 in this dataset corresponds to the number of survivors who owned dogs. The complete set of data for this SPSS example appears in Table 8.3.
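The same case-level layout (one line per patient, two 0/1 columns) can be built and tabulated outside SPSS. The sketch below is an illustration in Python rather than SPSS syntax; the names `cell_counts`, `cases`, and `table` are hypothetical, and the counts are those of the dog ownership example.

```python
from collections import Counter

# (dog ownership, survival) score pairs and how many patients have each pair
cell_counts = {(1, 1): 50, (1, 0): 3, (0, 1): 28, (0, 0): 11}

# One (dog, survived) tuple per patient, as in the two-column data file
cases = [scores for scores, count in cell_counts.items() for _ in range(count)]

# Cross-tabulate the case-level scores and report row percentages
table = Counter(cases)
for dog in (0, 1):
    row_n = table[(dog, 0)] + table[(dog, 1)]
    pct_survived = 100 * table[(dog, 1)] / row_n
    print(dog, table[(dog, 0)], table[(dog, 1)], round(pct_survived, 1))
# 0 11 28 71.8
# 1 3 50 94.3
```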
The SPSS menu selections to run the Crosstabs procedure were as follows (from the top-level menu, make these menu selections, as shown in Figure 8.3): <Analyze> → <Descriptive Statistics> → <Crosstabs>.
This opens the SPSS dialog window for the Crosstabs procedure, shown in Figure 8.4. The names of the row and column variables were placed in the appropriate windows. In this example, the row variable corresponds to the score on the predictor variable (dog ownership), and the column variable corresponds to the score on the outcome variable (survival status). The Statistics button was clicked to access the menu of optional statistics to describe the pattern of association in this table, as shown in Figure 8.5. The optional statistics selected included chi-square, phi, and Cramer’s V. In addition, the Cells button in the main Crosstabs dialog window was used to open up the Crosstabs Cell Display menu, which appears in Figure 8.6. In addition to the observed frequency for each cell, the expected frequencies for each cell and row percentages were requested.
The output from the Crosstabs procedure for these data appears in Figure 8.7. The first panel shows the contingency table with observed and expected cell frequencies and row percentages. The second panel reports the obtained value of χ2 (8.85) and some additional tests. The third panel in Figure 8.7 reports the symmetric measures of association that were requested, including the value of Φ (.310) and that of Cramer’s V (also .310).
Figure 8.3 Menu Selections for Crosstabs Procedure
Figure 8.4 SPSS Crosstabs Main Dialog Window
In addition, a Pearson correlation was calculated for the scores on dog ownership and survival status, using the same procedure as in Chapter 7 to obtain a correlation: <Analyze> → <Correlate> → <Bivariate>. Pearson’s r (shown in Figure 8.8) is .310; this is identical to the value reported for phi using the Crosstabs procedure above.
Figure 8.5 SPSS Crosstabs Statistics Dialog Window
Figure 8.6 SPSS Crosstabs Cell Display Dialog Window
Figure 8.7 SPSS Output From Crosstabs Procedure for Dog/Survival Status Data in Tables 8.3 and 8.4
Figure 8.8 SPSS Output From Pearson Correlation Procedure for Dog/Survival Status Data in Tables 8.3 and 8.4
Results
A survey was done to assess numerous variables that might predict survival for 1 year after a first heart attack; there were 92 patients in the study. Only one predictor variable is reported here: dog ownership. Expected cell frequencies were examined to see whether there were any expected frequencies less than 5; the smallest expected cell frequency was 5.9. (If there were one or more cells with expected frequencies less than 5, it would be preferable to report the Fisher exact test rather than chi-square.) Table 8.4 shows the observed cell frequencies for dog ownership and survival status. Of the 53 dog owners, 3 did not survive; of the 39 nonowners of dogs, 11 did not survive. A phi coefficient was calculated to assess the strength of this relationship: Φ = .310. This corresponds to a medium-size effect. This was a statistically significant association: χ2(1) = 8.85, p < .05. This result was also statistically significant by the Fisher exact test, p = .006. The nature of the relationship was that dog owners had a significantly higher proportion of survivors (94%) than non–dog owners (72%). Because this study was not experimental, it is not possible to make a causal inference.
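The Fisher exact test reported in the write-up can be approximated with the hypergeometric distribution. The sketch below assumes the common two-sided rule (sum the probabilities of all tables with the same marginal totals that are no more probable than the observed table); the function name is illustrative, and SPSS may differ in computational details.

```python
from math import comb

def fisher_exact_two_sided(n11, n10, n01, n00):
    """Two-sided Fisher exact p value for a 2 x 2 table with fixed margins."""
    row1, row2 = n11 + n10, n01 + n00
    col1 = n11 + n01
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(n11)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small tolerance guards against floating-point ties
    return sum(
        prob(k) for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Rows: dog owners, nonowners; columns: survived, did not survive
p_value = fisher_exact_two_sided(50, 3, 28, 11)
print(round(p_value, 3))   # 0.006
```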
This chapter provided information about different forms of correlation that are appropriate when X and Y are rank/ordinal or when one or both of these variables are dichotomous. This chapter demonstrated that Pearson’s r can be applied in research situations where one or both of the variables are true dichotomies. This is important because it means that true dichotomous variables may be used in many other multivariate analyses that build on variance partitioning and use covariance and correlation as information about the way variables are interrelated.
The chi-square test of association for contingency tables was presented in this chapter as a significance test that can be used to evaluate the statistical significance of the phi correlation coefficient. However, chi-square tests have other applications, and it is useful for students to understand the chi-square as a general goodness-of-fit test; for example, chi-square is used as one of the numerous goodness-of-fit tests for structural equation models.
This chapter described only a few widely used statistics that can be applied to contingency tables. There are many other possible measures of association for contingency tables; for further discussion, see Everitt (1977) or Liebetrau (1983). Students who anticipate that they will do a substantial amount of research using dichotomous outcome variables should refer to Chapter 23 in this book for an introductory discussion of binary logistic regression; logistic regression is presently the most widely used analysis for this type of data. For categorical outcome variables with more than two categories, polytomous logistic regression can be used (Menard, 2001). In research situations where there are several categorical predictor variables and one categorical outcome variable, log linear analysis is often reported.
Notes
1. In other words, conclusions about the outcome of this study depend entirely on the outcomes for these five individuals, regardless of the size of the total N for the table (and it is undesirable to have a study where a change in outcome for just one or two participants can greatly change the nature of the outcome).
2. There are other applications of chi-square apart from its use to evaluate the association between row and column variables in contingency tables. For example, in structural equation modeling (SEM), chi-square tests are performed to assess how much the variance/covariance matrix that is reconstructed from SEM parameters differs from the original variance/covariance matrix calculated from the scores. A large chi-square for an SEM model is interpreted as evidence that the model is a poor fit; that is, the model does not do a good job of reconstructing the variances and covariances.
- How are point biserial r (rpb) and the phi coefficient different from Pearson’s r?
- How are biserial r (rb) and tetrachoric r (rtet) different from Pearson’s r?
Is high blood pressure diagnosis (defined as high blood pressure = 1 = systolic pressure equal to or greater than 140 mm Hg, low blood pressure = 0 = systolic pressure less than 140 mm Hg) a true dichotomy or an artificial dichotomy?
The data in the table below were collected in a famous social-psychological field experiment. The researchers examined a common source of frustration for drivers: a car stopped at a traffic light that fails to move when the light turns green. The variable they manipulated was the status of the frustrating car (1 = high status, expensive, new; 0 = low status, inexpensive, old). They ran repeated trials in which they stopped at a red light, waited for the light to turn green, and then did not move the car; they observed whether the driver in the car behind them honked or not (1 = honked, 0 = did not honk). They predicted that people would be more likely to honk at low-status cars than at high-status cars (Doob & Gross, 1968). This table reports part of their results:
- Calculate phi and chi-square by hand for the table above, and write up a Results section that describes your findings and notes whether the researchers’ prediction was upheld.
Enter the data for this table into SPSS. To do this, create one variable in the SPSS worksheet that contains scores of 0 or 1 for the variable status and another variable in the SPSS worksheet that contains scores of 0 or 1 for the variable honking (e.g., because there were 18 people who honked at a high-status car, you will enter 18 lines with scores of 1 on the first variable and 1 on the second variable).
- Using SPSS, do the following: Run the Crosstabs procedure and obtain both phi and chi-square; also, run a bivariate correlation (and note how the obtained bivariate correlation compares with your obtained phi).
- In this situation, given the marginal frequencies, what is the maximum possible value of phi?
- The researchers manipulated the independent variable (status of the car) and were careful to control for extraneous variables. Can they make a causal inference from these results? Give reasons for your answer.
When one or both of the variables are dichotomous, Pearson’s r has specific names; for example, when a true dichotomy is correlated with a quantitative variable, what is this correlation called? When two true dichotomous variables are correlated, what is this correlation called?
- What information should be included in the report of a chi-square test of contingency?
The table below gives the percentage of people who were saved (vs. lost) when the Titanic sank. The table provides information divided into groups by class (first class, second class, third class, and crew) and by gender and age (children, women, men).
Titanic Disaster—Official Casualty Figures
SOURCE: British Parliamentary Papers, Shipping Casualties (Loss of the Steamship “Titanic”). 1912, cmd. 6352, “Report of a Formal Investigation into the circumstances attending the foundering on the 15th April, 1912, of the British Steamship ‘Titanic,’ of Liverpool, after striking ice in or near Latitude 41° 46′ N., Longitude 50° 14′ W., North Atlantic Ocean, whereby loss of life ensued.” (London: His Majesty’s Stationery Office, 1912), page 42.
The information in the table is sufficient to set up some simple chi-square tests. For example, let’s ask, Was there a difference in the probability of being saved for women passengers in first class versus women passengers in third class? There were a total of 309 women in first and third class. The relevant numbers from the casualty table above appear in the table below.
Compute a phi coefficient using the observed cell frequencies in the table above. Also, compute a chi-square statistic for the observed frequencies in the table above. Write up your results in paragraph form. Was there a statistically significant association between being in first class and being saved when we look at the passenger survival data from the Titanic? How strong was the association between class and outcome (e.g., how much more likely were first-class women passengers to be saved than were third-class women passengers)?
10 ADDING A THIRD VARIABLE
10.1 Three-Variable Research Situations
In previous chapters, we reviewed the bivariate correlation (Pearson’s r) as an index of the strength of the linear relationship between one independent variable (X) and one dependent variable (Y). This chapter moves beyond the two-variable research situation to ask, “Does our understanding of the nature and strength of the predictive relationship between a predictor variable, X1, and a dependent variable, Y, change when we take a third variable, X2, into account in our analysis, and if so, how does it change?” In this chapter, X1 denotes a predictor variable, Y denotes an outcome variable, and X2 denotes a third variable that may be involved in some manner in the X1, Y relationship. For example, we will examine whether age (X1) is predictive of systolic blood pressure (SBP) (Y) when body weight (X2) is statistically controlled.
We will examine two preliminary exploratory analyses that make it possible to statistically control for scores on the X2 variable; these analyses make it possible to assess whether controlling for X2 changes our understanding about whether and how X1 and Y are related. First, we can split the data file into separate groups based on participant scores on the X2 control variable and then compute Pearson correlations or bivariate regressions to assess how X1 and Y are related separately within each group. Although this exploratory procedure is quite simple, it can be very informative. Second, if the assumptions for partial correlation are satisfied, we can compute a partial correlation to describe how X1 and Y are correlated when scores on X2 are statistically controlled. The concept of statistical control that is introduced in this chapter continues to be important in later chapters that discuss analyses that include multiple predictor variables.
Partial correlations are sometimes reported as the primary analysis in a journal article. In this textbook, partial correlation analysis is introduced primarily to explain the concept of statistical control. The data analysis methods for the three-variable research situation that are presented in this chapter are suggested as preliminary exploratory analyses that can help a data analyst evaluate what kinds of relationships among variables should be taken into account in later, more complex analyses.
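The two exploratory analyses described above can be sketched in Python. This is a minimal illustration under stated assumptions: the scores are invented for the example (they are not data from this textbook), the function names are ours, and the partial correlation is computed with the standard first-order formula, r1Y.2 = (r1Y − r12 × r2Y) / √[(1 − r12²)(1 − r2Y²)].

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_within_groups(x1, y, x2):
    """Split cases by their X2 score; compute r(X1, Y) within each group."""
    groups = {}
    for a, b, g in zip(x1, y, x2):
        xs, ys = groups.setdefault(g, ([], []))
        xs.append(a)
        ys.append(b)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}

def partial_r(x1, y, x2):
    """First-order partial correlation of X1 and Y, controlling for X2."""
    r1y = pearson_r(x1, y)
    r12 = pearson_r(x1, x2)
    r2y = pearson_r(x2, y)
    return (r1y - r12 * r2y) / sqrt((1 - r12 ** 2) * (1 - r2y ** 2))

# INVENTED scores, for illustration only:
age = [30, 40, 50, 60, 35, 45, 55, 65]            # X1
weight_group = [1, 1, 1, 1, 2, 2, 2, 2]            # X2 (1 = lighter, 2 = heavier)
sbp = [112, 114, 121, 124, 131, 133, 141, 143]     # Y

print("zero-order r(age, SBP):", round(pearson_r(age, sbp), 3))
print("r(age, SBP) within each weight group:", r_within_groups(age, sbp, weight_group))
print("partial r(age, SBP | weight):", round(partial_r(age, sbp, weight_group), 3))
```

Comparing the zero-order correlation with the within-group correlations and the partial correlation shows whether taking X2 into account changes the apparent X1, Y relationship.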
For partial correlation to provide accurate information about the relationship between variables, the following assumptions about the scores on X1, X2, and Y must be reasonably well satisfied. Procedures for data screening to identify problems with these assumptions were reviewed in detail in Chapters 7 and 9, and detailed examples of data screening are not repeated here.
- The scores on X1, X2, and Y should be quantitative. It is also acceptable to have predictor and control variables that are dichotomous. Most of the control variables (X2) that are used as examples in this chapter have a small number of possible score values because this limitation makes it easier to work out in detail the manner in which X1 and Y are related at each level or score value of X2. However, the methods described here can be generalized to situations where the X2 control variable has a large number of score values, as long as X2 meets the other assumptions for Pearson correlation.
- Scores on X1, X2, and Y should be reasonably normally distributed. For dichotomous predictor or control variables, the closest approximation to a normal distribution occurs when the two groups have an equal number of cases.
- For each pair of variables (X1 and X2, X1 and Y, and X2 and Y), the joint distribution of scores should be bivariate normal, and the relation between each pair of variables should be linear. The assumption of linearity is extremely important. If X1 and Y are nonlinearly related, then Pearson’s r does not provide a good description of the strength of the association between them.
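A small numerical illustration may help show why the linearity assumption matters. In the sketch below, the scores are invented so that Y is completely determined by X1 through a curvilinear (quadratic) rule, yet Pearson's r is zero; the function name is ours.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# INVENTED scores: Y is an exact quadratic function of X1
x1 = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x1]

r = pearson_r(x1, y)
print("r between X1 and Y:", r)  # zero, despite the perfect curvilinear relation
```

Here r gives no hint that Y can be predicted perfectly from X1, which is why a scatter plot should be examined before Pearson's r or a partial correlation is interpreted.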