Quantitative Reasoning and Analysis

PRINTED BY: Patricia Sellers <PATRICIA.SELLERS@WALDENU.EDU>. Printing is for personal, private use only. No part of this book may be reproduced or transmitted without publisher’s prior permission. Violators will be prosecuted.

Chapter 4

Measures of Central Tendency

Chapter Learning Objectives

Defining all measures of central tendency and explaining their differences and relative strengths and weaknesses

Determining the mode in a given distribution

Finding or calculating the median and percentiles

Calculating the mean

Determining the shape of a distribution

In Chapters 2 and 3, we learned that frequency distributions and graphical techniques are useful tools for describing data. The main advantage of using frequency distributions or graphs is to summarize quantitative information in ways that can be easily understood even by a lay audience. Often, however, we need to describe a large set of data involving many variables for which graphs and tables may not be the most efficient tools. For instance, let’s say that we want to present information on the income, education, and political party affiliation of both men and women. Presenting this information might require up to six frequency distributions or graphs. The more variables we add, the more complex the presentation becomes.

Another way of describing a distribution is by selecting a single number that describes or summarizes the distribution more concisely. Such numbers describe what is typical about the distribution, for example, the average income among Latinos who are college graduates or the most common party identification among the rural poor. Numbers that describe what is average or typical of the distribution are called measures of central tendency.


Measures of central tendency: Categories or scores that describe what is average or typical of the distribution.

In this chapter, we will learn about three measures of central tendency: the mode, the median, and the mean. You are probably somewhat familiar with these measures. The terms median income and average income, for example, are used quite a bit even in the popular media. Each describes what is most typical, central, or representative of the distribution. In this chapter, we will also learn about how these measures differ from one another. We will see that the choice of an appropriate measure of central tendency for representing a distribution depends on three factors: (1) the way the variables are measured (their level of measurement), (2) the shape of the distribution, and (3) the purpose of the research.

THE MODE

The mode is the category or score with the largest frequency or percentage in the distribution. Of all the averages discussed in this chapter, the mode is the easiest one to identify. Simply locate the category represented by the highest frequency in the distribution.

Mode: The category or score with the highest frequency (or percentage) in the distribution.

We can use the mode to determine, for example, the most common foreign language spoken in the United States today. English is clearly the language of choice in public communication in the United States, but you may be surprised by the U.S. Census Bureau’s finding that 1 out of every 5 people living in the United States speaks 1 of 155 different languages other than English at home. Record immigration from many countries since 1980 has contributed to a sharp increase in the number of people who speak a foreign language.[1]

What is the most common foreign language spoken in the United States today? To answer this question, look at Table 4.1, which lists the 10 most commonly spoken foreign languages in the United States and the number of people who speak each language. The table shows that Spanish is the most common; more than 35 million people speak Spanish. In this example, we refer to “Spanish” as the mode—the category with the largest frequency in the distribution.


Table 4.1 Ten Most Common Foreign Languages Spoken in the United States, 2009

Source: U.S. Census Bureau, Statistical Abstract of the United States: 2012, Table 53.

The mode is always a category or score, not a frequency. Do not confuse the two. That is, the mode in the previous example is “Spanish,” not its frequency of 35,468,501.

The mode is not necessarily the category with the majority (i.e., more than 50%) of cases, as it is in Table 4.1; it is simply the category in which the largest number (or proportion) of cases fall. For example, Figure 4.1 is a pie chart showing the answers of 2010 General Social Survey (GSS) respondents to the following question: “Would you say your own health, in general, is excellent, good, fair, or poor?” Note that the highest percentage (45.04) of respondents is associated with the answer “good.” The answer “good” is therefore the mode.

The mode is used to describe nominal variables. Recall that with nominal variables (such as foreign languages spoken in the United States, race/ethnicity, or religious affiliation) we are only able to classify respondents based on a qualitative and not a quantitative property. By describing the most commonly occurring category of a nominal variable (such as Spanish in our example), the mode thus reflects the most important element of the distribution of a variable measured at the nominal level. The mode is the only measure of central tendency that can be used with nominal-level variables. It can also be used to describe the most commonly occurring category in any distribution. For example, the variable health presented in Figure 4.1 is an ordinal variable.

Figure 4.1 Respondents’ Health

Source: Douglas S. Massey. “The Social and Economic Origins of Immigration,” Annals, AAPSS (July 1990): 510.


In some distributions, there are two scores or categories with the highest frequency. Such distributions have two modes and are said to be bimodal. For instance, Figure 4.2 is a bar graph showing the response of 2010 GSS respondents to the following question: “If you were asked to use one of four names for your social class, which would you say you belong to: the lower class, the working class, the middle class, or the upper class?” The same percentage of respondents (approximately 44%) identified themselves as “working class” or “middle class.” Both response categories have the highest frequency, and therefore, both are the modes. We can describe this distribution as bimodal. When two scores or categories with the highest frequencies are quite close (but not identical) in frequency, the distribution is still “essentially” bimodal. In these situations, you should not rely on merely reporting the (true) mode, but instead report the two highest frequency categories.
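The idea of reporting every category tied for the highest frequency can be sketched in a few lines of Python. This is only an illustration: the function name `modes` and the list of responses are our own assumptions, invented for the example, and are not GSS data.

```python
from collections import Counter


def modes(values):
    """Return every category tied for the highest frequency, sorted by name."""
    counts = Counter(values)        # frequency of each category
    top = max(counts.values())      # the highest frequency in the distribution
    return sorted(cat for cat, n in counts.items() if n == top)


# Hypothetical class identifications (invented for illustration, not GSS data).
responses = ["working", "middle", "working", "lower",
             "middle", "upper", "working", "middle"]

print(modes(responses))  # two categories share the highest frequency
```

Because “working” and “middle” each appear three times, the function returns both, which is what we mean by a bimodal distribution. Note that the result is a list of categories, not their frequencies.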

Learning Check

Listed below are the political party affiliations of 15 individuals. Find the mode.

Figure 4.2 Respondent’s Subjective Class Identification


THE MEDIAN

The median is a measure of central tendency that can be calculated for variables that are at least at an ordinal level of measurement. The median represents the exact middle of a distribution; it is the score that divides the distribution into two equal parts so that half the cases are above it and half below it. For example, according to the U.S. Bureau of Labor Statistics, the median weekly earnings of full-time wage and salary workers in 2012 were $768.[2] This means that half the workers in the United States earned more than $768 a week and half earned less than $768. Since many variables used in social research are ordinal, the median is an important measure of central tendency.


Median: The score that divides the distribution into two equal parts so that half the cases are above it and half below it.

For instance, what are the opinions of Americans about their children’s future? How can we describe their levels of confidence in their children’s future? To answer this question, the 2010 GSS asked respondents what they thought their children’s standard of living would be like when they reach their parents’ current age. Respondents selected from the response options of “much better,” “somewhat better,” “about the same,” “somewhat worse,” or “much worse” (respondents without children were allowed to report “no children”). Parents’ rating of the anticipated standard of living for their children is an ordered (ordinal) variable. Thus, to estimate the average rating, we need to use a measure of central tendency appropriate for ordinal variables. The median is a suitable measure for those variables whose categories or scores can be arranged in order of magnitude from the lowest to the highest. Therefore, the median can be used with ordinal or interval-ratio variables, for which scores can be at least rank ordered, but cannot be calculated for variables measured at the nominal level.

Finding the Median in Sorted Data

It is very easy to find the median. In most cases, it can be done by a simple inspection of the sorted data. The location of the median score differs somewhat, depending on whether the number of observations is odd or even. Let’s first consider two examples with an odd number of cases.

An Odd Number of Cases

Suppose we are looking at the responses of five people to the question, “Thinking about the economy, how would you rate economic conditions in this country today?” Following are the responses of these five hypothetical persons:

To locate the median, first arrange the responses in order from the lowest to the highest (or the highest to the lowest):

The median is the response associated with the middle case. Find the middle case when N is odd by adding 1 to N and dividing by 2: (N + 1)/2. Since N is 5, you calculate (5 + 1)/2 = 3. The middle case is thus the third case, and the median is “only fair,” the response associated with the third case. Notice that the median divides the distribution exactly in half so that there are two respondents who are more satisfied and two respondents who are less satisfied.

Now let’s look at another example. The following is a list of the number of hate crimes reported in the nine most populous U.S. states in the year 2011.[3]

To locate the median, first arrange the number of hate crimes in order from the lowest to the highest: The middle case is (9 + 1)/2 = 5, the fifth state, Texas. The median is 189, the number of hate crimes associated with Texas. It divides the distribution exactly in half so that there are four states with fewer hate crimes and four with more (this is illustrated in Figure 4.3a).

- The states associated with the number of hate crimes listed above (in the same order of the listing) are Georgia, Pennsylvania, Illinois, Florida, Texas, Ohio, Michigan, New York, and California.

Figure 4.3 Finding the Median Number of Hate Crimes for (a) Nine States and (b) Eight States

An Even Number of Cases

Now let’s delete the last score to make the number of states even (Figure 4.3b). The scores have already been arranged in ascending order.

Again, to locate the median, first arrange the number of hate crimes in order from the lowest to the highest:

When N is even (eight states), we no longer have a single middle case. The median is therefore located halfway between the two middle cases. Find the two middle cases by using the previous formula: (N + 1)/2, or (8 + 1)/2 = 4.5. In our example, this means that you average the scores for the fourth and fifth states, Florida and Texas. The numbers of hate crimes associated with these states are 139 and 189. To find the median for this interval-ratio variable, simply average the two middle numbers:

Median = (139 + 189)/2 = 164

The median is therefore 164.

As a note of caution, when data are ordinal, averaging the middle two scores is no longer appropriate. The median simply falls between two middle values.
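The two cases, odd N and even N, can be sketched together in Python. This is a minimal illustration for interval-ratio scores only; the function name `median_interval_ratio` and the five-score list are our own assumptions, while 139 and 189 are the two middle values from the eight-state example.

```python
def median_interval_ratio(scores):
    """Median of interval-ratio scores, located with the (N + 1)/2 rule."""
    ordered = sorted(scores)        # arrange from lowest to highest
    n = len(ordered)
    middle = (n + 1) / 2            # 1-based position of the middle case
    if n % 2 == 1:
        # Odd N: a single middle case exists.
        return ordered[int(middle) - 1]
    # Even N: (N + 1)/2 ends in .5, so average the two cases around it.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical odd-N scores, invented for illustration.
print(median_interval_ratio([7, 3, 9, 1, 5]))   # middle case is the third one

# The two middle values from the eight-state example in the text.
print(median_interval_ratio([139, 189]))
```

The averaging step applies only to interval-ratio data. For ordinal categories there is nothing to average, so a sketch like this one should not be used.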

Learning Check


Find the median of the following distribution of an interval-ratio variable: 22, 15, 18, 33, 17, 5, 11, 28, 40, 19, 8, 20.

Finding the Median in Frequency Distributions

Often our data are arranged in frequency distributions. Take, for instance, the frequency distribution displayed in Table 4.2. It shows the political views of GSS respondents in 2010.

To find the median, we need to identify the category associated with the observation located at the middle of the distribution. We begin by specifying N, the total number of respondents. In this particular example, N = 1,457. We then use the formula (N + 1)/2, or (1,457 + 1)/2 = 729. The median is the value of the category associated with the 729th case. Reading down the cumulative frequency (Cf) column, the 729th case falls in the category “moderate”; thus, the median is “moderate.” This may seem odd; however, the median is always the value of the response category, not the frequency.

Table 4.2 Political Views of GSS Respondents, 2010

A second approach to locating the median in a frequency distribution is to use the cumulative percentages column, as shown in the last column of Table 4.2. In this example, the percentages are cumulated from “extremely liberal” to “extremely conservative.” We could also cumulate the other way, from “extremely conservative” to “extremely liberal.” To find the median, we identify the response category that contains a cumulative percentage value equal to 50%. The median is the value of the category associated with this observation.[4] Looking at Table 4.2, the percentage value equal to 50% falls within the category “moderate.” The median for this distribution is therefore “moderate.” If you are not sure why the middle of the distribution (the 50% point) is associated with the category “moderate,” look again at the cumulative percentage column (C%). Notice that 28.3% of the observations are accumulated below the category “moderate” and that 65.0% are accumulated up to and including the category “moderate.” We know, then, that the percentage value equal to 50% is located somewhere within the “moderate” category.
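Both approaches amount to cumulating frequencies until the middle case is reached, which can be sketched in Python as follows. The function name `median_category` is our own, and the individual frequencies below are hypothetical: they were chosen only so that N = 1,457 and the cumulative percentages around “moderate” match the 28.3% and 65.0% quoted in the text. They are not the actual counts from Table 4.2, and the intermediate category labels are likewise assumed.

```python
def median_category(categories, frequencies):
    """Return the ordered category that contains the middle case, (N + 1)/2."""
    n = sum(frequencies)
    middle_case = (n + 1) / 2
    cumulative = 0
    for category, f in zip(categories, frequencies):
        cumulative += f                 # cumulative frequency (Cf)
        if cumulative >= middle_case:   # the middle case falls in this category
            return category


# Category labels between the two extremes are assumed for illustration.
categories = ["extremely liberal", "liberal", "slightly liberal", "moderate",
              "slightly conservative", "conservative", "extremely conservative"]

# HYPOTHETICAL frequencies (not Table 4.2); they sum to N = 1,457.
frequencies = [50, 180, 182, 535, 200, 250, 60]

print(median_category(categories, frequencies))

# Cumulative percentages (C%), rounded to one decimal place.
running = 0
for category, f in zip(categories, frequencies):
    running += f
    print(category, round(100 * running / sum(frequencies), 1))
```

When N is even and the two middle cases fall in different categories, this sketch returns the higher of the two categories; for ordinal data the median then simply lies between them.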

Learning Check

For a review of cumulative distributions, refer to Chapter 2.

STATISTICS IN PRACTICE: GENDERED INCOME INEQUALITY

We can use the median to compare groups. Consider the significant changes that have taken place during the past few decades in the income levels of men and women in the United States. Income levels profoundly influence our lives both socially and economically. Higher income is associated with increased education and work experience for both men and women.

Figure 4.4 compares the median incomes for men and women in 1973 and in 2011. Because the median is a single number summarizing central tendency in the distribution, we can use it to note differences between subgroups of the population or changes over time. In this example, the increase in median income from 1973 to 2011 clearly shows a significant income gain for women. That said, in 2011 the median income for women was still about $10,000 less than that for men ($37,133 vs. $46,993, respectively).

Figure 4.4 Median Income for Men and Women, 1973 and 2011


Sources: Data for 1973 obtained from the U.S. Census Bureau, Current Population Reports P60-226, Income, Poverty and Health Insurance Coverage in the United States: 2003. Data for 2011 obtained from the U.S. Census Bureau, American Community Survey, 2011.

Learning Check

Examine Figure 4.4 and contrast the median incomes of women and men between 1973 and 2011. What can you learn about gender and income?

Locating Percentiles in a Frequency Distribution

The median is a special case of a more general set of measures of location called percentiles. A percentile is a score at or below which a specific percentage of the distribution falls. The nth percentile is a score below which n% of the distribution falls. For example, the 75th percentile is a score that divides the distribution so that 75% of the cases are below it. The median is the 50th percentile. It is a score that divides the distribution so that 50% of the cases fall below it. Like the median, percentiles require that data be ordinal or higher in level of measurement. Percentiles are easy to identify when the data are arranged in frequency distributions.

Percentile: A score below which a specific percentage of the distribution falls.

To help illustrate how to locate percentiles in a frequency distribution, we display in Table 4.3 the frequency distribution, the percentage distribution, and the cumulative percentage distribution of opinion about police job performance of respondents for the 2008 Monitoring the Future (MTF) survey.

The 50th percentile (the median) is “Fair,” meaning that 50% of the respondents view police job performance above “Fair” and 50% of the respondents view police job performance as “Fair” or below “Fair” (as you can see from the cumulative percentage column, 50% falls somewhere in the third category, associated with the category “Fair”). Similarly, the 20th percentile is “Poor” because 20% of the respondents view police job performance as “Poor” or below “Poor.”

Percentiles are widely used to evaluate relative performance on standardized achievement tests, such as the SAT or ACT. Let’s suppose that your ACT score was 29. To evaluate your performance for the college admissions officer, the testing service translated your score into a percentile rank. Your percentile rank was determined by comparing your score with the scores of all other students who took the test at the same time. Suppose for a moment that 90% of all students received a lower ACT score than you (and 10% scored above you). Your percentile rank would have been 90. If, however, there were more students who scored better than you— let’s say that 15% scored above you and 85% scored lower than you—your percentile rank would have been 85.
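The percentile rank described here is just the percentage of test takers who scored lower than you. A minimal Python sketch follows; the function name `percentile_rank` and both lists of scores are hypothetical, invented so that 90% and then 85% of the other test takers fall below a score of 29. Actual testing services may define percentile ranks somewhat differently.

```python
def percentile_rank(score, other_scores):
    """Percentage of the other scores that fall below the given score."""
    below = sum(1 for s in other_scores if s < score)
    return 100 * below / len(other_scores)


# Hypothetical scores of 20 other test takers: 18 below 29 and 2 above.
others = [15, 17, 18, 19, 20, 20, 21, 22, 22, 23,
          24, 24, 25, 26, 26, 27, 28, 28, 31, 33]
print(percentile_rank(29, others))    # 90% scored lower

# One more student outscores you: 17 below 29 and 3 above.
others_b = [15, 17, 18, 19, 20, 20, 21, 22, 22, 23,
            24, 24, 25, 26, 26, 27, 28, 30, 31, 33]
print(percentile_rank(29, others_b))  # 85% scored lower
```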

Another widely used measure of location is the quartile. The lower quartile is equal to the 25th percentile and the upper quartile is equal to the 75th percentile. (Can you locate the upper quartile in Table 4.3?) A college admissions office interested in accepting the top 25% of its applicants based on their SAT scores could calculate the upper quartile (the 75th percentile) and accept everyone whose score is equivalent to the 75th percentile or higher. (Note that they would be calculating percentiles based on the scores of their applicants, not of all students in the nation who took the SAT.)

Table 4.3 Frequency Distribution for Police Job Performance: 2008 MTF Respondents

THE MEAN

The arithmetic mean is by far the best known and most widely used measure of central tendency. The mean is what most people call the “average.” The mean is typically used to describe central tendency in interval-ratio variables such as income, age, and education. You are probably already familiar with how to calculate the mean. Simply add up all the scores and divide by the total number of scores.

Mean: A measure of central tendency that is obtained by adding up all the scores and dividing by the total number of scores. It is the arithmetic average.

Firearm statistics, for example, can be analyzed using the mean. Table 4.4 shows the 2011 gun ownership rates (per 100 population) for 15 of the most populous countries in the world. We want to summarize the information presented in this table by calculating some measure of central tendency. Because the variable “gun ownership rate” is an interval-ratio variable, we will select the arithmetic mean as our measure of central tendency.

To find the mean gun ownership rate (number of guns per 100 people) for the data presented in Table 4.4, add up the gun ownership rates for all the countries and divide the sum by the number of countries:


The mean gun ownership rate for 15 of the most populous countries in the world is 13.3.[5] That means in these 15 countries, the average number of guns per 100 people is 13.3.

Table 4.4 2011 Gun Ownership Rates per 100 People for 15 of the Most Populous Countries

Source: United Nations Office on Drugs and Crime, 2011 Annual Report.


Calculating the Mean

Another way to calculate the arithmetic mean is to use a formula. Beginning with this section, we introduce a number of formulas that will help you calculate some of the statistical concepts that we are going to present. A formula is a shorthand way to explain what operations we need to follow to obtain a certain result. So instead of saying “add all the scores together and then divide by the number of scores,” we can define the mean by the following formula:

Ȳ = ΣY / N

Let’s take a moment to consider these new symbols because we continue to use them in later chapters. We use Y to represent the raw scores in the distribution of the variable of interest; Ȳ is pronounced “Y-bar” and is the mean of the variable of interest. The symbol represented by the Greek letter Σ is pronounced “sigma,” and it is used often from now on. It is a summation sign (just like the + sign) and directs us to sum whatever comes after it. Therefore, ΣY means “add up all the raw Y scores.” Finally, the letter N, as you know by now, represents the number of cases (or observations) in the distribution.

Let’s summarize the symbols as follows:

Y = the raw scores of the variable Y

Ȳ = the mean of Y

ΣY = the sum of all the Y scores

N = the number of observations or cases

Now that we know what the symbols mean, let’s work through another example. The following are the ages of the 10 students in a graduate research methods class:

21, 32, 23, 41, 20, 30, 36, 22, 25, 27

What is the mean age of the students?

For these data, the ages included in this group are represented by Y; N = 10, the number of students in the class; and ΣY is the sum of all the ages:

ΣY = 21 + 32 + 23 + 41 + 20 + 30 + 36 + 22 + 25 + 27 = 277

Thus, the mean age is

Ȳ = ΣY / N = 277 / 10 = 27.7
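The same calculation can be checked with a few lines of Python. The variable names are our own; the ages are the ten values listed above.

```python
# Ages of the 10 students in the graduate research methods class.
ages = [21, 32, 23, 41, 20, 30, 36, 22, 25, 27]

total = sum(ages)       # ΣY, the sum of all the Y scores
n = len(ages)           # N, the number of cases
mean_age = total / n    # Ȳ = ΣY / N

print(total, n, mean_age)
```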

The mean can also be calculated when the data are arranged in a frequency distribution. We have presented an example involving a frequency distribution in A Closer Look 4.1.

Learning Check

A Closer Look 4.1 Finding the Mean in a Frequency Distribution

When data are arranged in a frequency distribution, we must give each score its proper weight by multiplying it by its frequency. We can use the following modified formula to calculate the mean:

Ȳ = ΣfY / N

where

Y = the raw scores of the variable Y

Ȳ = the mean of Y

ΣfY = the sum of all the fYs (each score multiplied by its frequency, f)

N = the number of observations or cases

We now illustrate how to calculate the mean from a frequency distribution using the preceding formula. In the 2010 GSS, respondents were asked about what they think is the ideal number of children for a family. Their responses are presented in the following table.

Ideal Number of Children: GSS 2010


Notice that to calculate the value of ΣfY (Column 3), each score (Column 1) is multiplied by its frequency (Column 2), and the products are then added together. When we apply the formula Ȳ = ΣfY / N, we find that the mean for the ideal number of children is 2.52.
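The three-column procedure can be sketched in Python as follows. The function name `mean_from_frequencies` and the small frequency distribution are hypothetical, invented for illustration; they are not the GSS responses and therefore do not reproduce the 2.52 reported above.

```python
def mean_from_frequencies(distribution):
    """Mean of a frequency distribution given as {score: frequency}."""
    n = sum(distribution.values())                        # N
    sum_fy = sum(y * f for y, f in distribution.items())  # ΣfY
    return sum_fy / n                                     # Ȳ = ΣfY / N


# HYPOTHETICAL distribution: score (Column 1) -> frequency (Column 2).
ideal_children = {0: 2, 1: 3, 2: 10, 3: 4, 4: 1}

# Column 3: each score multiplied by its frequency.
for y, f in ideal_children.items():
    print(y, f, y * f)

print(mean_from_frequencies(ideal_children))  # 39 / 20
```

Weighting each score by its frequency gives the same answer as listing every individual score and averaging them one by one.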

If you are having difficulty understanding how to find the mean in a frequency distribution, examine this table. It explains the process without using any notation.

Here is another example. The following tables are frequency distributions of years of education for American Indians or Alaska Natives and Hispanic American GSS 2010 respondents. Calculate the mean level of education for each of the two groups by applying the formula for calculating the mean in a frequency distribution.

Years of Education for American Indians or Alaska Natives: GSS 2010

Years of Education for Hispanic Respondents: GSS 2010

Examine the tables showing years of education for American Indians or Alaska Natives and Hispanic respondents. Note the similarities and differences between the two groups. Education is a major component of social class. You may want to take your analysis one step further. SPSS Problem 3 at the end of this chapter provides specific instructions that will help you explore the relationship between social class and the number of children that couples decided to have.

Learning Check

The following distribution is the same as the one you used to calculate the median in an earlier Learning Check: 22, 15, 18, 33, 17, 5, 11, 28, 40, 19, 8, 20. Calculate the mean. Is it the same as the median, or is it different?

Understanding Some Important Properties of the Arithmetic Mean

The following three mathematical properties make the mean the most important measure of central tendency. It is, in fact, a concept that is basic to numerous and more complex statistical operations.

Interval-Ratio Level of Measurement

Because it requires the mathematical operations of addition and division, the mean can be calculated only for variables measured at the interval-ratio level. This is the only level of measurement that provides numbers that can be added and divided.

Center of Gravity

Because the mean (unlike the mode and the median) incorporates all the scores in the distribution, we can think of it as the center of gravity of the distribution. That is, the mean is the point that perfectly balances all the scores in the distribution. If we subtract the mean from each score and add up all the differences, the sum will always be zero!
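We can verify this balancing property with the ages of the 10 graduate students used earlier in the chapter. The sketch below is only an illustration; it uses Python’s `Fraction` type so that the arithmetic is exact and rounding cannot hide or create a small nonzero remainder.

```python
from fractions import Fraction

# Ages of the 10 graduate students from the earlier example.
ages = [21, 32, 23, 41, 20, 30, 36, 22, 25, 27]

mean = Fraction(sum(ages), len(ages))        # exactly 277/10
deviations = [age - mean for age in ages]    # each score minus the mean

print(float(mean))       # the mean age
print(sum(deviations))   # the deviations cancel out exactly
```

Scores below the mean produce negative deviations and scores above it produce positive ones; the two sets always offset each other.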

Learning Check

Why is the mean considered the center of gravity of the distribution? Think of the last time you were in a park on a seesaw (it may have been a long time ago) with a friend who was much heavier than you. You were left hanging in the air until your friend moved closer to the center. In short, a light person far away from the center (the mean) can balance a heavier person who is closer to the center. Can you illustrate this principle with a simple income distribution?

Sensitivity to Extremes

The examples we have used to show how to compute the mean demonstrate that, unlike with the mode or the median, every score enters into the calculation of the mean. This property makes the mean sensitive to extreme scores in the distribution. The mean is pulled in the direction of either very high or very low values. A glance at Figure 4.5 should convince you of that. Figures 4.5a and b show the incomes of 10 individuals. In Figure 4.5b, the income of one individual has shifted from $5,000 to $35,000. Notice the effect it has on the mean; it shifts from $3,000 to $6,000! The mean is disproportionately affected by the relatively high income of $35,000 and is misleading as a measure of central tendency for this distribution. Notice that the median’s value is not affected by this extreme score; it remains at $3,000. Thus, the median gives us better information on the typical income for this group. In the next section, we will see that because of the sensitivity of the mean, it is not suitable as a measure of central tendency in distributions that have a few very extreme values on one side of the distribution. (A few extreme values are no problem if they are not mostly on one side of the distribution.)
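The effect of a single extreme score can be reproduced with Python’s standard `statistics` module. The ten incomes below are hypothetical: they were invented so that the mean and median both start at $3,000 and the highest income is $5,000, matching the summary values described for Figure 4.5, and they are not the individual incomes shown in the figure.

```python
from statistics import mean, median

# HYPOTHETICAL incomes of 10 individuals (chosen to match the summary values).
incomes_a = [1000, 2000, 2000, 3000, 3000, 3000, 3000, 4000, 4000, 5000]

# Same group, but the $5,000 income has shifted to $35,000.
incomes_b = incomes_a[:-1] + [35000]

print(mean(incomes_a), median(incomes_a))   # no extreme scores
print(mean(incomes_b), median(incomes_b))   # one extreme score
```

Only one of the ten scores changed, yet the mean doubles while the median stays where it was, because the median depends only on the middle cases.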

Illustrating the Seesaw Principle


Learning Check

When asked to choose the appropriate measure of central tendency for a distribution, remember that the level of measurement is not the only consideration. When variables are measured at the interval-ratio level, the mean is usually the measure of choice, but remember that extreme scores in one direction make the mean unrepresentative and the median or mode may be the better choice.

Figure 4.5 The Value of the Mean Is Affected by Extreme Scores: (a) No Extreme Scores and (b) One Extreme Score

THE SHAPE OF THE DISTRIBUTION: TELEVISION, EDUCATION, AND SIBLINGS

In this chapter, we have looked at the way in which the mode, median, and mean reflect central tendencies in the distribution. Distributions (this discussion is limited to distributions of interval-ratio variables) can also be described by their general shape, which can be easily represented visually. A distribution can be either symmetrical or skewed, depending on whether there are a few extreme values at one end of the distribution.

A distribution is symmetrical (Figure 4.6a) if the frequencies at the right and left tails of the distribution are identical, so that if it is divided into two halves, each will be the mirror image of the other. In a unimodal, symmetrical distribution, the mean, median, and mode are identical.

Symmetrical distribution: The frequencies at the right and left tails of the distribution are identical; each half of the distribution is the mirror image of the other.

We can illustrate the differences among the three types of distributions by examining three variables in the 2010 GSS.[6] The frequency distributions for these three variables are presented in Tables 4.5 through 4.7, and the corresponding graphs are depicted in Figures 4.7 through 4.9.

The Symmetrical Distribution

First, let’s examine Table 4.5 and Figure 4.7, displaying the distribution of the number of hours per day spent watching television. Notice that the largest number of respondents (247) watch television 2 hours/day (mode = 2.0), and fairly similar numbers (204 and 188, respectively) reported either 1 or 3 hours of watching television per day. As shown in Figure 4.7, the mode, the median, and the mean are almost identical, and they coincide at about the middle of the distribution.

The distribution of number of hours spent per day watching television in Table 4.5 and Figure 4.7 is a nearly symmetrical distribution with the mean, the median, and the mode being almost identical.

Figure 4.6 Types of Frequency Distributions

Table 4.5 Hours Spent per Day Watching Television

Figure 4.7 Hours Spent per Day Watching Television

The Positively Skewed Distribution

Now let’s examine Table 4.6 and Figure 4.8, displaying the distribution of the number of siblings of a respondent. Note that the largest number of respondents (312) has two brothers and/or sisters.

Also note that few people report having six or more siblings. Notice also that in this distribution, the mean, the median, and the mode have different values, with the mode having the lowest value (mode = 2.00), the median having the second lowest value (median = 3.0), and the mean having the highest value (mean = 3.07).

Table 4.6 Number of Brothers and Sisters

Figure 4.8 Number of Brothers and Sisters

The distribution as depicted in Table 4.6 and Figure 4.8 is positively skewed. As a general rule, for skewed distributions the mean, the median, and the mode do not coincide. The mean, which is always pulled in the direction of extreme scores, falls closest to the tail of the distribution where a small number of extreme scores are located.
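The ordering of the three measures in a positively skewed distribution can be checked on a small example. The sketch below uses Python's standard `statistics` module and a made-up frequency table; the counts are hypothetical and are not the values in Table 4.6. It is only meant to show how a few high scores pull the mean above the median and the mode.

```python
from statistics import mean, median, mode

# Hypothetical frequency table (NOT Table 4.6): number of siblings -> respondents.
siblings_freq = {0: 5, 1: 15, 2: 25, 3: 20, 4: 12, 5: 9, 6: 6, 8: 4, 10: 3, 12: 1}

# Expand the table into one score per respondent.
scores = [value for value, count in siblings_freq.items() for _ in range(count)]

sib_mode = mode(scores)      # most frequent score
sib_median = median(scores)  # middle score of the ordered distribution
sib_mean = mean(scores)      # sum of scores divided by N

print(sib_mode, sib_median, sib_mean)  # → 2 3.0 3.28
```

In this made-up distribution the mode is the lowest of the three values and the mean is the highest, the same ordering described above for the number of siblings.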

The Negatively Skewed Distribution

Now examine Table 4.7 and Figure 4.9 for the number of years spent in school among those respondents who did not finish high school. Here you can see the opposite pattern: the distribution of the number of years spent in school for those without a high school diploma is negatively skewed. First, note that the largest number of respondents is concentrated at the high end of the scale (11 years) and that there are fewer respondents at the low end. The mean, the median, and the mode also differ in value, as they did in the previous example.

However, here the mode has the highest value (mode = 11.0), the median has the second highest value (median = 10.0), and the mean has the lowest value (mean = 8.89).

Positively skewed distribution A distribution with a few extremely high values.

Skewed distribution A distribution with a few extreme values on one side of the distribution.

Negatively skewed distribution A distribution with a few extremely low values.

Table 4.7 Years of School Among Respondents Without a High School Degree

Figure 4.9 Years of School Among Respondents Without a High School Degree

Guidelines for Identifying the Shape of a Distribution

Following are some useful guidelines for identifying the shape of a distribution.

- In unimodal distributions, when the mode, the median, and the mean coincide or are almost identical, the distribution is symmetrical.

- When the mean is higher than the median (or is positioned to the right of the median), the distribution is positively skewed.

- When the mean is lower than the median (or is positioned to the left of the median), the distribution is negatively skewed.
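The guidelines above can be sketched as a small Python function. The function name `classify_shape` and its `tolerance` parameter are illustrative assumptions, not part of the text. The function compares only the mean and the median, so it does not check whether the distribution is unimodal.

```python
def classify_shape(mean_value, median_value, tolerance=0.0):
    """Apply the mean-versus-median guideline to label a distribution's shape.

    `tolerance` (a hypothetical parameter) is how close the two values must be
    to count as "almost identical".
    """
    difference = mean_value - median_value
    if abs(difference) <= tolerance:
        return "symmetrical"
    if difference > 0:
        return "positively skewed"   # mean lies to the right of the median
    return "negatively skewed"       # mean lies to the left of the median


# Values reported in this chapter:
print(classify_shape(3.07, 3.0))    # number of siblings → positively skewed
print(classify_shape(8.89, 10.0))   # years of school → negatively skewed
```

How large a tolerance to allow before calling two values "almost identical" is a judgment call that depends on the scale of the variable.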

Learning Check

To identify positively and negatively skewed distributions, look at the tail on the chart. If the tail points to the right (the positive end of the X-axis), the distribution is positively skewed. If the tail points to the left (the negative, or potentially negative, end of the X-axis), the distribution is negatively skewed.

CONSIDERATIONS FOR CHOOSING A MEASURE OF CENTRAL TENDENCY

So far, we have considered three basic kinds of averages: the mode, the median, and the mean. Each can represent the central tendency of a distribution. But which one should we use? The mode? The median? The mean? Or, perhaps, all of them? There is no simple answer to this question. However, in general, we tend to use only one of the three measures of central tendency, and the choice of the appropriate one involves a number of considerations. These considerations and how they affect our choice of the appropriate measure are presented in the form of a decision tree in Figure 4.10.

Level of Measurement

One of the most basic considerations in choosing a measure of central tendency is the variable’s level of measurement. The valid use of any of the three measures requires that the data be measured at the level appropriate for that measure or higher. Thus, as shown in Figure 4.10, with nominal variables our choice is restricted to the mode as a measure of central tendency.

However, with ordinal data, we have two choices: the mode or the median (or sometimes both). Our choice depends on what we want to know about the distribution. If we are interested in showing what is the most common or typical value in the distribution, then our choice is the mode. If, however, we want to show which value is located exactly in the middle of the distribution, then the median is our measure of choice.

When the data are measured on an interval-ratio level, the choice between the appropriate measures is a bit more complex and is restricted by the shape of the distribution.

Skewed Distribution

When the distribution is skewed, the mean may give misleading information on the central tendency because its value is affected by extreme scores in the distribution. The median (see, e.g., A Closer Look 4.2) or the mode can be chosen as the preferred measure of central tendency because neither is influenced by extreme scores. For instance, when examining the number of years that GSS respondents without a high school diploma completed (Table 4.7, Figure 4.9), the mean does not provide as accurate a representation of the “typical” number of years that an individual without a high school degree has spent in school as the median and the mode. Thus, either one could be used as an “average,” depending on the research objective.

Figure 4.10 How to Choose a Measure of Central Tendency

Symmetrical Distribution

When the distribution we want to analyze is symmetrical, we can use any of the three averages. Again, our choice depends on the research objective and what we want to know about the distribution. In general, however, the mean is our best choice because it contains the greatest amount of information and is easier to use in more advanced statistical analyses.
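The considerations discussed in this section (level of measurement first, then shape for interval-ratio variables) can be sketched as a short Python function. The function name `choose_measure` and the string labels it accepts are assumptions made for illustration. The function lists the measures that are appropriate, in rough order of preference, and leaves the final choice to the research objective.

```python
def choose_measure(level, shape=None):
    """Return the appropriate measures of central tendency, preferred first.

    level: "nominal", "ordinal", or "interval-ratio"
    shape: "skewed" or "symmetrical" (needed only for interval-ratio data)
    """
    level = level.lower()
    if level == "nominal":
        return ["mode"]
    if level == "ordinal":
        return ["mode", "median"]
    if level == "interval-ratio":
        if shape == "skewed":
            # The mean is pulled toward extreme scores, so it is left out.
            return ["median", "mode"]
        if shape == "symmetrical":
            # Any of the three is valid; the mean is generally the best choice.
            return ["mean", "median", "mode"]
        raise ValueError("shape must be 'skewed' or 'symmetrical'")
    raise ValueError("unknown level of measurement: " + level)


print(choose_measure("ordinal"))                   # → ['mode', 'median']
print(choose_measure("interval-ratio", "skewed"))  # → ['median', 'mode']
```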

A Closer Look 4.2 A Cautionary Note: Representing Income

Personal income is frequently positively skewed because there are a few people with very high incomes; therefore, the mean may not be the most appropriate measure to represent “average” income. For example, the 2011 American Community Survey (an ongoing survey of economic and income statistics) reported the 2011 mean and median annual earnings of white, black, and Latino households in the United States. In the figure below, we compare the mean and median income for each group.

Source: U.S. Census Bureau, American Community Survey, 2011.

As shown, for all groups, the reported mean is higher than the median. This discrepancy indicates that household income in the United States is highly skewed, with the mean overrepresenting those households in the upper-income bracket and misrepresenting the income of the average household. A preferable alternative (also shown) is to use the median annual earnings of these groups.

Since the earnings of whites are the highest in comparison with all other groups, it is useful to look at each group’s median earnings relative to the earnings of whites. For example, blacks were paid just 58 cents ($32,229/$55,412) and Latinos were paid 70 cents ($38,624/$55,412) for every $1 paid to whites.
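The arithmetic behind these comparisons is a simple ratio of medians. The short Python sketch below reproduces it using the median earnings quoted above; the variable names are illustrative.

```python
# Median annual earnings quoted in the text (2011 American Community Survey).
median_earnings = {"white": 55412, "black": 32229, "Latino": 38624}

reference = median_earnings["white"]

# Cents earned by each group for every $1 of white median earnings.
cents_per_dollar = {
    group: round(100 * amount / reference)
    for group, amount in median_earnings.items()
}

print(cents_per_dollar)  # → {'white': 100, 'black': 58, 'Latino': 70}
```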

MAIN POINTS

- The mode, the median, and the mean are measures of central tendency: numbers that describe what is average or typical about the distribution.

- The mode is the category or score with the largest frequency (or percentage) in the distribution. It is often used to describe the most commonly occurring category of a nominal level variable.

- The median is a measure of central tendency that represents the exact middle of the distribution. It is calculated for variables measured on at least an ordinal level of measurement.

- The mean is typically used to describe central tendency in interval-ratio variables, such as income, age, or education. We obtain the mean by summing all the scores and dividing by the total (N) number of scores.

- In a symmetrical distribution, the frequencies at the right and left tails of the distribution are identical. In skewed distributions, there are either a few extremely high (positive skew) or a few extremely low (negative skew) values.

KEY TERMS

- mean
- measures of central tendency
- median
- mode
- negatively skewed distribution
- percentile
- positively skewed distribution
- skewed distribution
- symmetrical distribution

Sharpen your skills with SAGE edge at edge.sagepub.com/frankfort7e. SAGE edge for students provides a personalized approach to help you accomplish your coursework goals in an easy-to-use learning environment.

Exercises

SPSS DEMONSTRATIONS

[GLOBAL13SSDS]

Demonstration 1: Producing Measures of Central Tendency With Frequencies

The Frequencies command, which we demonstrated in Chapter 2, also has the ability to produce the three measures of central tendency discussed in this chapter. We will use Frequencies to calculate measures of central tendency for SUICIDEFCAT (categorical percentages of international female suicide rates) and WOMENPARCAT (categorical percentages of women holding office at the national level in their respective countries).

Click on Analyze, Descriptive Statistics, then Frequencies. Place SUICIDEFCAT and WOMENPARCAT in the Variable(s) box. Then click on the Statistics button. You will see that the Central Tendency box (Figure 4.11) lists four choices, but we will click on only the first three. Then click on Continue, then on OK to process this request.

The statistics box for WOMENPARCAT and SUICIDEFCAT is displayed here. We have also displayed the frequency distribution for SUICIDEFCAT (Figure 4.12). SUICIDEFCAT is an ordinal variable, which means that the mode and median are appropriate measures of central tendency. SPSS has no idea that these variables are measured on an ordinal scale. In other words, SPSS produces exactly the output we asked for, without regard for whether the output is correct for this type of variable. It’s up to you to select the proper measure of central tendency.

The median for the variable SUICIDEFCAT is 3.00, which is close to the mean value. Because SUICIDEFCAT is a categorical variable, this value refers to a category code rather than to a suicide rate: roughly half of the countries in our sample fall in or above category 3, and roughly half fall in or below it. In addition, SUICIDEFCAT is trimodal, which means that there are three modes with the same highest frequency. If two or more categories have exactly the same frequency, SPSS reports the smallest modal value, but it also reports as a footnote that multiple modes exist. By examining the frequency distribution included in Figure 4.12, we can see that the same number of countries fall in the categories “0.00–2.00,” “4.01–6.00,” and “6.01 or Higher,” thus producing three modes.
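Outside SPSS, the same situation can be illustrated with Python's standard `statistics` module. This is a minimal sketch: the category codes (1 through 4) and the counts are hypothetical and are not the frequencies in Figure 4.12. It shows how to list every mode and then pick out the smallest one, which is the value the text says SPSS reports.

```python
from statistics import multimode

# Hypothetical category codes (NOT the data behind Figure 4.12):
# codes 1, 3, and 4 each occur 10 times; code 2 occurs 6 times.
codes = [1] * 10 + [2] * 6 + [3] * 10 + [4] * 10

modes = multimode(codes)     # every value tied for the highest frequency
smallest_mode = min(modes)   # the single value SPSS would report

print(modes, smallest_mode)  # → [1, 3, 4] 1
```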

Figure 4.11 Statistics Dialog Box

Figure 4.12 also displays the measures of central tendency for WOMENPARCAT. Since WOMENPARCAT is an ordinal variable, we can use the median and mode to summarize its distribution. The median is 2.0, which corresponds to the response “10.1–30.0%.” We can confirm from the cumulative percentages (not shown) that about 60% of the countries in our sample have between 10% and 30% of women in office at the national level; the remaining 40% (which is not shown here) either have less than 10.0% or more than 30.1% of women in office at the national level. The mode is also “10–30%,” meaning that in the majority of countries in our sample, women hold between 10% and 30% of national-level offices.

[GSS10SSDS]

Demonstration 2: Producing Measures of Central Tendency With Descriptives

We begin this exercise by telling the computer that we want our results split by sex. That is, we want separate results for males and females. Select Data, Split File, Organize Output by Groups. Insert the variable SEX into the box labeled “Groups Based on.” Click OK. Now SPSS will report our results separately for each category of SEX.