SPSS Exercises for MAT 301

TABLE OF CONTENTS

STAT1S: Exercise Using SPSS to Explore Levels of Measurement

Goals of Exercise

Part I – Introduction to Levels of Measurement

STAT2S: Exercise Using SPSS to Explore Measures of Central Tendency and Dispersion

Goals of Exercise

Part I – Measures of Central Tendency

Part II – Deciding Which Measure of Central Tendency to Use

Part III – Measures of Dispersion or Variation

STAT3S: Exercise Using SPSS to Explore Measures of Skewness and Kurtosis

Goals of Exercise

Part I – Measures of Skewness

Part II – Measures of Kurtosis

STAT4S: Exercise Using SPSS to Explore Graphs and Charts

Goals of Exercise

Part I – Pie Charts

Part II – Bar Charts

Part III – Histograms

Part IV – Box Plots

Part V – Conclusions

STAT5S: Exercise Using SPSS to Explore Hypothesis Testing – One-Sample t Test

Goals of Exercise

Part I – Simple Random Sampling

Part II – Hypothesis Testing – the One-Sample t Test

Part III – Now it’s Your Turn

STAT6S: Exercise Using SPSS to Explore Hypothesis Testing – Independent-Samples

Goals of Exercise

Part I – Computing Means

Part II – Now it’s Your Turn

Part III – Hypothesis Testing – Independent-Samples t Test

Part IV – Now it’s Your Turn Again

Part V – What Does Independent Samples Mean?

STAT7S: Exercise Using SPSS to Explore Hypothesis Testing – Paired-Samples t Test

Goals of Exercise

Part I – Populations and Samples

Part II – Now it’s Your Turn

Part III – Hypothesis Testing – Paired-Samples t Test

Part IV – Now it’s Your Turn Again

STAT8S: Exercise Using SPSS to Explore Hypothesis Testing – One-Way Analysis of Variance

Goals of Exercise

Part I – Populations and Samples

Part II – Now it’s Your Turn

Part III – Hypothesis Testing – One-Way Analysis of Variance

Part IV – Now it’s Your Turn Again

STAT9S: Exercise Using SPSS to Explore Crosstabulation

Goals of Exercise

Part I – Relationships between Variables

Part II – Interpreting the Percents

Part III – Now it’s Your Turn

Part IV – Adding another Variable into the Analysis

Part V – Now it’s Your Turn Again

STAT10S: Exercise Using SPSS to Explore Chi Square

Goals of Exercise

Part I – Relationships between Variables

Part II – Interpreting the Percents

Part III – Chi Square

Part IV – Now it’s Your Turn

Part V – Expected Values

Part VI – Now it’s Your Turn Again

STAT13S: Exercise Using SPSS to Explore Correlation

Goals of Exercise

Part I – Scatterplots

Part II – Now it’s Your Turn

Part III – Pearson Correlation Coefficient

Part IV – Now it’s Your Turn Again

Part V – Correlation Matrices

Part VI – The Correlation Ratio or Eta-Squared

Part VII – Your Turn

STAT1S: Exercise Using SPSS to Explore Levels of Measurement

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses FREQUENCIES in SPSS to introduce the concept of levels of measurement (nominal, ordinal, interval, and ratio measures). A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author.

Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format)

Goals of Exercise

The goal of this exercise is to explore the concept of levels of measurement (nominal, ordinal, interval, and ratio measures), which is an important consideration for the use of statistics. The exercise also gives you practice in using FREQUENCIES in SPSS.

Part I – Introduction to Levels of Measurement

We use concepts all the time. We all know what a book is. But when we use the word “book” we’re not talking about a particular book that we’re reading. We’re talking about books in general. In other words, we’re talking about the concept to which we have given the name “book.” There are many different types of books – paperback, hardback, small, large, short, long, and so on. But they all have one thing in common – they all belong to the category “book.”

Let’s look at another example. Religiosity is a concept that refers to the degree of attachment that individuals have to their religious preference. It’s different from religious preference, which refers to the religion with which they identify. Some people say they are Lutheran; others say they are Roman Catholic; still others say they are Muslim; and others say they have no religious preference. Religiosity and religious preference are both concepts.

A concept is an abstract idea. So there are the abstract ideas of book, religiosity, religious preference, and many others. Since concepts are abstract ideas and not directly observable, we must select measures or indicants of these concepts. Religiosity can be measured in a number of different ways – how often people attend church, how often they pray, and how important they say their religion is to them.

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial (every two years) survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.

The GSS is an example of a social survey. The investigators selected a sample from the population of all adults in the United States. This particular survey was conducted in XXXXXXXXXX and is a relatively large sample of approximately 2,500 adults. In a survey we ask respondents questions and use their answers as data for our analysis. The answers to these questions are used as measures of various concepts. In the language of survey research these measures are typically referred to as variables. Often we want to describe respondents in terms of social characteristics such as marital status, education, and age. These are all variables in the GSS.

These measures are often classified in terms of their levels of measurement. S. S. Stevens described measures as falling into one of four categories – nominal, ordinal, interval, or ratio. [1] Here’s a brief description of each level.

A nominal measure is one in which objects (i.e., in our survey, the respondents) are sorted into a set of categories that are qualitatively different from each other. For example, we could classify individuals by their marital status. Individuals could be married, widowed, divorced, separated, or never married. Our categories should be mutually exclusive and exhaustive. Mutually exclusive means that every individual can be sorted into one and only one category. Exhaustive means that every individual can be sorted into some category. We wouldn’t want to use single as one of our categories because some people who are single are also divorced and therefore could be sorted into more than one category. We wouldn’t want to leave widowed off our list of categories because then we wouldn’t have any place to sort these individuals.
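Both rules can be checked mechanically. The following is a minimal Python sketch, not part of the original exercise: the respondents and the category schemes are hypothetical, and each category is modeled as a test that a response either passes or fails.

```python
from typing import Callable, Dict, List

# Each category is a test that a response either passes or fails.
Scheme = Dict[str, Callable[[str], bool]]

STATUSES = ["married", "widowed", "divorced", "separated", "never married"]


def check_scheme(responses: List[str], scheme: Scheme) -> Dict[str, bool]:
    """Report whether every response fits at most one category (mutually
    exclusive) and at least one category (exhaustive)."""
    matches = [[name for name, test in scheme.items() if test(r)] for r in responses]
    return {
        "mutually_exclusive": all(len(m) <= 1 for m in matches),
        "exhaustive": all(len(m) >= 1 for m in matches),
    }


# Hypothetical answers from six respondents.
responses = ["married", "married", "divorced", "widowed", "never married", "separated"]

# One category per marital status.
good = {s: (lambda r, s=s: r == s) for s in STATUSES}
# Adding "single" (anyone not currently married) lets a divorced person fit twice.
with_single = dict(good, single=lambda r: r != "married")
# Leaving "widowed" off the list leaves widowed respondents with nowhere to go.
no_widowed = {s: test for s, test in good.items() if s != "widowed"}

print(check_scheme(responses, good))         # {'mutually_exclusive': True, 'exhaustive': True}
print(check_scheme(responses, with_single))  # {'mutually_exclusive': False, 'exhaustive': True}
print(check_scheme(responses, no_widowed))   # {'mutually_exclusive': True, 'exhaustive': False}
```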

The categories in a nominal level measure have no inherent order. This means that it wouldn’t matter how we ordered the categories; they could be arranged in any number of different ways. Run FREQUENCIES in SPSS for the variable d10_marital so you can see the frequency distribution for a nominal level variable. (See Frequencies in Chapter 4 of the SPSS online book mentioned on page XXXXXXXXXX.) It wouldn’t matter how we ordered these categories.
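As a rough illustration outside SPSS, a frequency distribution is just a count and a percent for each category. This Python sketch uses made-up answers rather than d10_marital and prints the same table in two different orders.

```python
from collections import Counter

# Hypothetical marital-status answers from ten respondents.
marital = ["married", "never married", "married", "divorced", "widowed",
           "married", "never married", "separated", "divorced", "married"]

counts = Counter(marital)
n = len(marital)


def print_table(order):
    """Print each category with its count and percent, in the given order."""
    for category in order:
        pct = 100 * counts[category] / n
        print(f"{category:<15}{counts[category]:>3}{pct:>7.1f}%")


# Alphabetical order and most-common-first order describe the same distribution;
# for a nominal variable, neither ordering is more correct than the other.
print_table(sorted(counts))
print()
print_table([category for category, _ in counts.most_common()])
```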

An ordinal measure is a nominal measure in which the categories are ordered from low to high or from high to low. We could classify individuals in terms of the highest educational degree they achieved. Some individuals did not complete high school; others graduated from high school but didn’t go on to college. Other individuals completed a two-year junior college degree but then stopped college. Still others completed their bachelor’s degree, and others went on to graduate work and completed a master’s degree or their doctorate. These categories are ordered from low to high.

But notice that while the categories are ordered, they lack an equal unit of measurement. That means, for example, that the differences between categories are not necessarily equal. Run FREQUENCIES in SPSS for d3_degree. Look at the categories. The GSS assigned values (i.e., numbers) to these categories in the following way:

● 0 = less than high school,

● 1 = high school degree,

● 2 = junior college,

● 3 = bachelors, and

● 4 = graduate.

The difference in education between the first two categories is not the same as the difference between the last two categories. We might think they are the same because 0 minus 1 is equal to 3 minus 4, but this is misleading. These aren’t really numbers. They’re just symbols that we have used to represent these categories. We could just as well have labeled them a, b, c, d, and e. They don’t have the properties of real numbers. They can’t be added, subtracted, multiplied, or divided. All we can say is that b is greater than a, that c is greater than b, and so on.
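A small Python sketch can make the point concrete. The responses and the alternative numbering are hypothetical: relabeling the categories with any other increasing numbers leaves the median category unchanged but moves the mean of the codes, which is why arithmetic on these codes is not meaningful.

```python
import statistics

LABELS = ["less than high school", "high school", "junior college", "bachelors", "graduate"]

# Hypothetical degrees for nine respondents, using the codes 0-4 listed above.
codes = [0, 1, 1, 1, 2, 3, 3, 4, 4]

# Two order-preserving ways of assigning numbers to the same five categories.
original = {c: c for c in range(5)}
alternative = {0: 0, 1: 1, 2: 2, 3: 3, 4: 10}  # hypothetical relabeling

for name, mapping in [("original", original), ("alternative", alternative)]:
    values = [mapping[c] for c in codes]
    inverse = {v: c for c, v in mapping.items()}
    # With an odd number of cases the median is an actual code, so map it back to a label.
    median_label = LABELS[inverse[statistics.median(values)]]
    print(f"{name:<12} median: {median_label:<15} mean of codes: {statistics.mean(values):.2f}")
```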

An interval measure is an ordinal measure with equal units of measurement. For example, consider temperature measured in degrees Fahrenheit. Now we have equal units of measurement – degrees Fahrenheit. The difference between XXXXXXXXXX degrees and XXXXXXXXXX degrees is the same as the difference between XXXXXXXXXX degrees and XXXXXXXXXX degrees. Now the numbers have the properties of real numbers, and we can add and subtract them. But notice one thing about the Fahrenheit scale: there is no absolute zero point. There can be both positive and negative temperatures. That means that we can’t compare values by taking their ratios. For example, we can’t divide 80 degrees Fahrenheit by XXXXXXXXXX degrees and conclude that XXXXXXXXXX is twice as hot as XXXXXXXXXX. To do that we would need a measure with an absolute zero. [2]
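The following Python sketch, with illustrative temperatures chosen here (not the ones in the exercise), converts Fahrenheit to Celsius. Equal differences remain equal on both scales, but the ratio of two temperatures changes completely, which is the sense in which an interval measure does not support ratios.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Equal differences stay equal after switching to the Celsius scale...
print(70 - 60, 40 - 30)  # 10 10
print(round(f_to_c(70) - f_to_c(60), 2), round(f_to_c(40) - f_to_c(30), 2))  # 5.56 5.56

# ...but ratios do not survive, because each scale puts zero in a different place.
print(60 / 30)                            # 2.0
print(round(f_to_c(60) / f_to_c(30), 2))  # -14.0
```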

A ratio measure is an interval measure with an absolute zero point. Run FREQUENCIES for d9_sibs, which is the number of siblings. This variable has an absolute zero point and all the properties of nominal, ordinal, and interval measures, and therefore it is a ratio variable.

Notice that level of measurement is itself ordinal since it is ordered from low (nominal) to high (ratio). It’s what we call a cumulative scale. Each level of measurement adds something to the previous level.

Why is level of measurement important? One of the things that helps us decide which statistic to use is the level of measurement of the variable(s) involved. For example, we might want to describe the central tendency of a distribution. If the variable is nominal, we would use the mode. If it is ordinal, we could use the mode or the median. If it is interval or ratio, we could use the mode, median, or mean. Central tendency will be the focus of another exercise (STAT2S).
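The rule in the previous paragraph can be written as a lookup. This is a hypothetical Python sketch, not SPSS functionality: because level of measurement is a cumulative scale, it is modeled as an ordered enumeration, and each measure is allowed at its minimum level and at every level above it.

```python
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """Levels of measurement, ordered from low to high (a cumulative scale)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level of measurement at which each measure of central tendency applies.
MINIMUM_LEVEL = {"mode": Level.NOMINAL, "median": Level.ORDINAL, "mean": Level.INTERVAL}


def central_tendency_options(level: Level) -> List[str]:
    """Return the measures of central tendency that can be used at this level."""
    return [stat for stat, minimum in MINIMUM_LEVEL.items() if level >= minimum]


for level in Level:
    print(level.name, central_tendency_options(level))
```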

Run FREQUENCIES for the following variables in the GSS:

● f4_satfin,

● f11_wealth,

● hap2_happy,

● p1_partyid,

● r1_relig,

● r4_denom,

● r8_reliten,

● s1_nummen,

● s2_numwomen,

● s9_premarsx, and

● d1_age.

For each variable, decide which level of measurement it represents and write a sentence or two indicating why you think it is that level. Keep in mind that we’re only considering what SPSS calls the valid responses. The missing responses represent missing data (e.g., don’t know or no answer responses).

[1] Stanley Smith Stevens, 1946, “On the Theory of Scales of Measurement,” Science XXXXXXXXXX), pp. XXXXXXXXXX.

[2] You might wonder why we didn’t use an example from the GSS. There isn’t one; interval measures without an absolute zero don’t occur in social science research very often. There are examples from the field of business. Think about profit for businesses over a fiscal year. There is no absolute zero. Profit could be positive or negative.

STAT2S: Exercise Using SPSS to Explore Measures of Central Tendency and Dispersion

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses FREQUENCIES in SPSS to explore measures of central tendency and dispersion. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format)

Goals of Exercise

The goal of this exercise is to explore measures of central tendency (mode, median, and mean) and dispersion (range, interquartile range, standard deviation, and variance). The exercise also gives you practice in using FREQUENCIES in SPSS.

Part I – Measures of Central Tendency

Data analysis always starts with describing variables one at a time. This is sometimes referred to as univariate (one-variable) analysis. Central tendency refers to the center of the distribution.

There are three commonly used measures of central tendency – the mode, median, and mean of a distribution. The mode is the most common value or values in a distribution. [1] The median is the middle value of a distribution. [2] The mean is the sum of all the values divided by the number of values.
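Outside SPSS, the three definitions map directly onto Python's standard statistics module. The values below are hypothetical, not the GSS data; statistics.multimode is used because, as noted above, a distribution can have more than one mode.

```python
import statistics

# Hypothetical number of siblings reported by eleven respondents.
siblings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6, 11]

print(statistics.multimode(siblings))           # [2]  (most common value or values)
print(statistics.median(siblings))              # 2    (middle value of the sorted list)
print(round(sum(siblings) / len(siblings), 2))  # 3.18 (sum of values / number of values)
```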

Run FREQUENCIES in SPSS for the variable d9_sibs. (See Chapter 4, Frequencies, in the online SPSS book mentioned on page XXXXXXXXXX.) Once you have selected this variable, click on the “Statistics” button and check the boxes for mode, median, and mean. Then click on “Continue” and click on the “Charts” button. Select “Histogram” and check the box for “Show normal curve on histograms.” Then click on “Continue.” That will take you back to the screen where you selected the variable. Click on “OK” and SPSS will open the Output window and display the results that you requested.

Your output will display the frequency distribution for d9_sibs and a box showing the mode, median, and mean with the following values.

● Mode = 2, meaning that two brothers and sisters was the most common answer (XXXXXXXXXX%) from the 2,531 respondents who answered this question. However, not far behind are those with one sibling (18.6%) and those with three siblings (XXXXXXXXXX%). So while two siblings is technically the mode, what you really found is that the most common values are one, two, and three siblings. Another part of your output is the histogram, which is a chart or graph of the frequency distribution. The histogram clearly shows that one, two, and three are the most common values (i.e., the highest bars in the histogram). So we would want to report that these three categories are the most common responses.

● Median = 3, which means that three siblings is the middle category in this distribution. The middle category is the category that contains the 50th percentile, which is the value that divides the distribution into two equal parts. In other words, it’s the value that has 50% of the cases above it and 50% of the cases below it. The cumulative percent column of the frequency distribution tells you that 41.4% of the cases have two or fewer siblings and that 59.3% of the cases have three or fewer siblings. So the middle case (i.e., the 50th percentile) falls somewhere in the category of three siblings. That is the median category.

● Mean = XXXXXXXXXX, which is the sum of all the values in the distribution divided by the number of responses. If you were to sum all these values, that sum would be 9, XXXXXXXXXX. Dividing that by the number of responses, 2,531, will give you the mean of 3.74.
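SPSS works from the frequency distribution, and the same three calculations can be reproduced from a table of values and counts. The Python sketch below uses a hypothetical table (not the d9_sibs output): the mode is the most frequent value, the median category is found from the cumulative percents, and the mean weights each value by its count.

```python
# Hypothetical frequency table: number of siblings -> number of respondents.
freq = {0: 5, 1: 20, 2: 22, 3: 20, 4: 13, 5: 10, 8: 6, 12: 4}
n = sum(freq.values())

# Mode: the value with the largest frequency (a tie would mean more than one mode).
mode = max(freq, key=freq.get)

# Median category: the first value whose cumulative percent reaches 50%.
# (If a cumulative percent landed exactly on 50, the median would fall between
# that value and the next one; this sketch would simply report the lower value.)
cumulative = 0
for value in sorted(freq):
    cumulative += freq[value]
    if 100 * cumulative / n >= 50:
        median_category = value
        break

# Mean: multiply each value by its frequency, sum, and divide by the number of cases.
mean = sum(value * count for value, count in freq.items()) / n

print(n, mode, median_category, mean)  # 100 2 3 3.22
```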

Part II – Deciding Which Measure of Central Tendency to Use

The first thing to consider is the level of measurement (nominal, ordinal, interval, ratio) of your variable (see Exercise STAT1S).

● If the variable is nominal, you have only one choice. You must use the mode.

● If the variable is ordinal, you could use the mode or the median. You should report both measures of central tendency since they tell you different things about the distribution. The mode tells you the most common value or values, while the median tells you where the middle of the distribution lies.

● If the variable is interval or ratio, you could use the mode, the median, or the mean. Now it gets a little more complicated. There are several things to consider.

○ How skewed is your distribution? [3] Go back and look at the histogram for d9_sibs. Notice that there is a long tail to the right of the distribution. Most of the values are at the lower end – one, two, and three siblings. But there are quite a few respondents who report having four or more siblings, and about 5% said they have ten or more siblings. That’s what we call a positively skewed distribution, where there is a long tail towards the right, or the positive direction. Now look at the median and mean. The mean XXXXXXXXXX is larger than the median XXXXXXXXXX. The respondents with lots of siblings pull the mean up. That’s what happens in a skewed distribution: the mean is pulled in the direction of the skew. The opposite would happen in a negatively skewed distribution. The long tail would be towards the left, and the mean would be lower than the median. In a heavily skewed distribution the mean is distorted and pulled considerably in the direction of the skew, so consider reporting only the median. That’s why you almost always see median income reported and not mean income. Imagine what would happen if your sample happened to include Bill Gates. The income distribution would have this very, very large value, which would pull the mean up but not affect the median.

○ Is there more than one clearly defined peak in your distribution? The number of siblings has one clearly defined peak – one, two, and three siblings. But what if there is more than one clearly defined peak? For example, consider a hypothetical distribution of XXXXXXXXXX cases in which there are XXXXXXXXXX cases with a value of two and fifty cases with a value of XXXXXXXXXX. The median and mean would be five, but there are really two centers of this distribution – two and eight. The median and the mean aren’t telling the correct story about the center. You’re better off reporting the two clearly defined peaks of this distribution and not reporting the median and mean.

○ If your distribution is normal in appearance, then the mode, median, and mean will all be about the same. A normal distribution is a perfectly symmetrical distribution with a single peak in the center. No empirical distribution is perfectly normal, but distributions often are approximately normal. Here we would report all three measures of central tendency. Go back to your SPSS output and look at the histogram for d9_sibs. When you told SPSS to give you the histogram, you checked the box that said “Show normal curve on histograms.” SPSS then superimposed the normal curve on the histogram. The normal curve doesn’t fit the histogram perfectly, particularly at the lower end, but it does suggest that the distribution roughly approximates a normal curve, particularly at the upper end.
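Both situations can be simulated with hypothetical numbers in Python's statistics module. The sketch below (made-up incomes and a made-up two-peaked distribution, not GSS data) shows one extreme value dragging the mean far above the median, and a distribution whose mean and median sit between its two peaks.

```python
import statistics

# Hypothetical annual incomes for five people.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
with_outlier = incomes + [1_000_000_000]  # one extremely large income joins the sample

for label, data in [("without outlier", incomes), ("with outlier", with_outlier)]:
    print(f"{label}: mean {statistics.mean(data):,.0f}, median {statistics.median(data):,.0f}")
# without outlier: mean 40,000, median 40,000
# with outlier: mean 166,700,000, median 42,500

# Hypothetical distribution with two clearly defined peaks.
two_peaks = [2] * 30 + [8] * 30
# Mean and median are both 5, yet no case has that value; the peaks are at 2 and 8.
print(statistics.mean(two_peaks), statistics.median(two_peaks), statistics.multimode(two_peaks))
```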

Run FREQUENCIES for the following variables. Once you have selected the variables, click on the “Statistics” button and check the boxes for mode, median, and mean. Then click on “Continue” and click on the “Charts” button. Select “Histogram” and check the box for “Show normal curve on histograms.” Then click on “Continue.” That will take you back to the screen where you selected the variables. Click on “OK” and SPSS will open the Output window and display the results of what you requested. For each variable, write a sentence or two indicating which measure(s) of central tendency would be appropriate to use to describe the center of the distribution and what the values of those statistics mean.

● hap2_happy

● p1_partyid

● r8_reliten

● s1_nummen

● s2_numwomen

● d1_age

Part III – Measures of Dispersion or Variation

Dispersion or variation refers to the degree to which values in a distribution are spread out or dispersed. The measures of dispersion that we’re going to discuss are appropriate for interval and ratio level variables (see Exercise STAT1S). [4] We’re going to discuss four such measures – the range, the interquartile range, the variance, and the standard deviation.

The range is the difference between the highest and the lowest values in the distribution. Run FREQUENCIES for d1_age and compute the range by looking at the frequency distribution. You can also ask SPSS to compute it for you. Click on “Statistics” and then click on “Range.” You should get XXXXXXXXXX, which is 89 – XXXXXXXXXX. The range is not a very stable measure since it depends on the two most extreme values – the highest and lowest values. These are the values most likely to change from sample to sample.

A more stable measure of dispersion is the interquartile range, which is the difference between the third quartile (Q3) and the first quartile (Q1). The third quartile is the same thing as the seventy-fifth percentile, which is the value that has 25% of the cases above it and 75% of the cases below it. The first quartile is the same as the twenty-fifth percentile, which is the value that has 75% of the cases above it and 25% of the cases below it. SPSS will calculate Q3 and Q1 for you. Click on the “Statistics” button and then click on “Quartiles” in the “Percentiles” box in the upper left. Once you know Q3 and Q1, you can calculate the interquartile range by subtracting Q1 from Q3. Since it’s not based on the most extreme values, it will be more stable from sample to sample. Go back to SPSS and calculate Q3 and Q1 for d1_age and then calculate the interquartile range. Q3 will equal XXXXXXXXXX and Q1 will equal XXXXXXXXXX, and the interquartile range will equal 60 – XXXXXXXXXX, or 27.
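A minimal Python sketch of both measures, using hypothetical ages rather than d1_age. Different programs define quartiles in slightly different ways, so quartiles from Python's statistics.quantiles (default "exclusive" method) are not guaranteed to match SPSS output for the same data. The second half replaces the oldest respondent with a more extreme value to show that the range moves while the interquartile range does not.

```python
import statistics

# Hypothetical ages for twelve respondents, already sorted.
ages = [19, 22, 25, 30, 34, 41, 47, 52, 58, 63, 70, 84]

age_range = max(ages) - min(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)  # default method="exclusive"
print(age_range, q1, q3, q3 - q1)  # 65 26.25 61.75 35.5

# Replace the oldest respondent with a more extreme value: the range changes,
# but the interquartile range does not.
ages_extreme = ages[:-1] + [99]
q1_e, _, q3_e = statistics.quantiles(ages_extreme, n=4)
print(max(ages_extreme) - min(ages_extreme), q3_e - q1_e)  # 80 35.5
```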

The variance is the sum of the squared deviations from the mean divided by the number of cases minus 1, and the standard deviation is just the square root of the variance. Your instructor may want to go into more detail on how to calculate the variance by hand. SPSS will also calculate it for you. Click on the “Statistics” button and then click on “Variance” and on “Standard deviation.” The variance should equal XXXXXXXXXX and the standard deviation will equal XXXXXXXXXX.
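For readers who want to see the hand calculation, here is a minimal Python sketch on a small hypothetical data set: it follows the definition above step by step (deviations from the mean, squared, summed, divided by n minus 1) and checks the result against statistics.variance and statistics.stdev. By contrast, statistics.pvariance divides by n rather than n minus 1.

```python
import math
import statistics

# Hypothetical values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                        # 5.0
squared_deviations = [(x - mean) ** 2 for x in values]  # these sum to 32.0
variance = sum(squared_deviations) / (len(values) - 1)  # divide by n - 1
std_dev = math.sqrt(variance)                           # square root of the variance

print(round(variance, 3), round(std_dev, 3))  # 4.571 2.138

# The standard library's sample variance and standard deviation agree.
assert math.isclose(variance, statistics.variance(values))
assert math.isclose(std_dev, statistics.stdev(values))
```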

The variance and the standard deviation can never be negative. A value of 0 means that there is no variation or dispersion at all in the distribution; all the values are the same. The more variation there is, the larger the variance and standard deviation.

So what do the variance XXXXXXXXXX and the standard deviation XXXXXXXXXX of the age distribution mean? That’s hard to answer because you don’t have anything to compare them to. But if you knew the standard deviation for both men and women, you would be able to determine whether men or women have more variation. Because the two groups may have different means, instead of comparing the standard deviations for men and women directly you would compute a statistic called the Coefficient of Relative Variation (CRV). CRV is equal to the standard deviation divided by the mean of the distribution. A CRV of 2 means that the standard deviation is twice the mean, and a CRV of 0.5 means that the standard deviation is one-half of the mean. You would compare the CRVs for men and women to see whether men or women have more variation relative to their respective means.
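A minimal Python sketch of the CRV, using two hypothetical groups rather than the GSS data. Group B has the larger standard deviation, but relative to its mean it varies less than group A. The sketch assumes a positive mean, as with ratio-level counts.

```python
import statistics


def crv(values):
    """Coefficient of relative variation: standard deviation divided by the mean.
    Only meaningful when the mean is positive (e.g., ratio-level counts)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical values for two groups with very different means.
group_a = [10, 12, 14, 16, 18]
group_b = [40, 45, 50, 55, 60]

for name, data in [("group A", group_a), ("group B", group_b)]:
    print(name, round(statistics.stdev(data), 3), round(crv(data), 3))
# group A 3.162 0.226
# group B 7.906 0.158
```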

You might also have wondered why you need both the variance and the standard deviation when the standard deviation is just the square root of the variance. You’ll just have to take my word for it that you will need both as you go further in statistics.

Run FREQUENCIES for the following variables. Once you have selected the variables, click on the “Statistics” button and check the boxes for quartiles, range, variance, standard deviation, and mean. Then click on “Continue.” That will take you back to the screen where you selected the variables. Click on “OK” and SPSS will open the Output window and display the results of what you requested. For each variable, write a sentence or two indicating what the values of these statistics are and what they mean. Compare the relative variation for the number of male sex partners since the age of XXXXXXXXXX (s1_nummen) and the number of female sex partners (s2_numwomen) by comparing the CRVs for each variable.

● s1_nummen

● s2_numwomen

● d9_sibs

[1] Frequency distributions can be grouped or ungrouped. Think of age. We could have a distribution that lists all the ages in years of the respondents to our survey. One of the variables (d1_age) in our data set does this. But we could also divide age into a series of categories such as under 30, XXXXXXXXXX to 39, XXXXXXXXXX to 49, 50 to 59, XXXXXXXXXX to 69, and XXXXXXXXXX and older. In a grouped frequency distribution the mode would be the most common category or categories.

[2] In a grouped frequency distribution the median would be the category that contains the middle value.

[3] See Exercise STAT3S for a more thorough discussion of skewness.

[4] The Index of Qualitative Variation can be used to measure variation for nominal variables.

STAT3S: Exercise Using SPSS to Explore Measures of Skewness and Kurtosis

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses FREQUENCIES in SPSS to explore measures of skewness and kurtosis. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format)

Goals of Exercise

The goal of this exercise is to explore measures of skewness and kurtosis. The exercise also gives you practice in using FREQUENCIES in SPSS.

Part I – Measures of Skewness

A normal distribution is a unimodal (i.e., single-peaked) distribution that is perfectly symmetrical. In a normal distribution the mean, median, and mode are all equal. Here’s a graph showing what a normal distribution looks like.


The horizontal axis is marked off in standard scores. A standard score tells us how many standard deviations a value is from the mean of the normal distribution. So a standard score of XXXXXXXXXX is one standard deviation above the mean, and a standard score of XXXXXXXXXX is one standard deviation below the mean. The percents tell us the percent of cases you would expect between the mean and a particular standard score if the distribution were perfectly normal. You would expect to find approximately 34% of the cases between the mean and a standard score of XXXXXXXXXX or XXXXXXXXXX. In a normal distribution, the mean, median, and mode are all equal and are at the center of the distribution. The mean always has a standard score of zero, since it is zero standard deviations from itself.
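You can check the standard-score idea and the 34% figure with a short calculation. The Python sketch below is illustrative only, since the exercise itself uses SPSS. The mean of 47 and standard deviation of 17 are hypothetical numbers, not GSS results. The 34% comes from the standard normal cumulative distribution, written here with `math.erf`.

```python
import math


def standard_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


def share_between_mean_and(z: float) -> float:
    """Expected share of cases between the mean (z = 0) and standard score z
    in a perfectly normal distribution, via the standard normal CDF."""
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return cdf - 0.5


# Hypothetical values: mean age 47, standard deviation 17 (not GSS results).
z = standard_score(64, mean=47, sd=17)
share = share_between_mean_and(z)
print(f"z = {z:.2f}; share between the mean and z = {share:.1%}")
```

The normal curve is symmetrical, so `share_between_mean_and(-1.0)` gives the same value as a standard score of +1.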

Skewness measures how far a particular distribution departs from this symmetrical pattern. In a skewed distribution, one tail is longer or fatter than the other. If the longer tail is to the left, the distribution is called negatively skewed. If the longer tail is to the right, it is called positively skewed. One way to remember this is to recall that any value to the left of zero is negative and any value to the right of zero is positive. Here are graphs of positively and negatively skewed distributions compared to a normal distribution.

The best way to determine the skewness of a distribution is to tell SPSS to give you a histogram along with the mean and median. SPSS will also compute a measure of skewness. Run FREQUENCIES in SPSS for the variables d1_age and d9_sibs. (See Frequencies in Chapter 4 of the online SPSS book mentioned on page XXXXXXXXXX.) Click on the “Charts” button and select “Histogram” and “Show normal curve on histogram.” Then click on “Continue.” Now click on “Statistics” and select mean, median, skewness, and kurtosis. Then click on “Continue” and on “OK.” We’ll talk about kurtosis in a little bit.

Notice that the mean is larger than the median for both variables. This suggests that both distributions are positively skewed. Notice too that, in relative terms, the mean exceeds the median by much more for d9_sibs than for d1_age. This suggests that d9_sibs is the more skewed of the two variables. Look at the histograms and you’ll see the same thing: both variables are positively skewed, but d9_sibs is the more skewed. Now look at the skewness values: XXXXXXXXXX for d9_sibs and XXXXXXXXXX for d1_age. The farther the skewness value is from zero, the more skewed the distribution. Positive skewness values indicate a positive skew, and negative values indicate a negative skew. Various rules of thumb have been suggested for what counts as a lot of skew. For our purposes, we’ll just say that the larger the absolute value, the greater the skew, and the sign of the value indicates the direction of the skew.
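The Python sketch below shows what the skewness statistic measures. It implements the adjusted Fisher-Pearson skewness coefficient. We assume this matches the statistic SPSS reports, but treat that correspondence as an assumption. The sibling counts are invented for illustration.

```python
import statistics


def sample_skewness(values: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness:
    G1 = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation (n - 1 in its denominator)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


# Hypothetical "number of siblings" data with a long right tail.
sibs = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 14]
print("mean:", round(statistics.mean(sibs), 2))      # pulled up by the long right tail
print("median:", statistics.median(sibs))
print("skewness:", round(sample_skewness(sibs), 2))  # positive: longer tail on the right
```

Perfectly symmetrical data such as `[1, 2, 3, 4, 5]` give a skewness of zero. Mirroring the data (reversing the sign of every value) flips the sign of the skewness.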

Run FREQUENCIES for the following variables. Tell SPSS to give you the histogram and to show the normal curve on the histogram. Also ask for the mean, median, and skewness. Write a paragraph for each variable explaining what these statistics tell you about the skewness of the variables.

● d20_hrsrelax

● tv1_tvhours

Part II – Measures of Kurtosis

Kurtosis refers to the flatness or peakedness of a distribution relative to that of a normal distribution. Distributions that are flatter than a normal distribution are called platykurtic, and distributions that are more peaked are called leptokurtic.

SPSS will compute a kurtosis measure, scaled so that a normal distribution has a kurtosis of zero. Negative values indicate a platykurtic distribution, and positive values indicate a leptokurtic distribution. The farther the kurtosis value is from zero, the more peaked (if positive) or flat (if negative) the distribution is.
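Here is a similar Python sketch for kurtosis. It computes excess kurtosis with the G2 formula, which gives zero for a normal distribution. We assume, but have not confirmed, that SPSS reports this same formula. Both data sets are made up.

```python
import statistics


def sample_kurtosis(values):
    """Excess kurtosis G2 (zero for a normal distribution):
    G2 = n(n+1) / ((n-1)(n-2)(n-3)) * sum(z**4) - 3(n-1)**2 / ((n-2)(n-3)),
    with z = (x - mean) / s and s the sample standard deviation."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least four values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical data sets for illustration.
flat = list(range(1, 21))            # evenly spread values: flatter than normal
peaked = [10] * 16 + [9, 11, 2, 18]  # piled up in the middle with a few far-out values
print(round(sample_kurtosis(flat), 2), round(sample_kurtosis(peaked), 2))
```

The evenly spread values give a negative (platykurtic) result. The values piled up at 10 give a positive (leptokurtic) result.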

Look back at the output for d1_age and d9_sibs. For d1_age the kurtosis value was XXXXXXXXXX, indicating a flatter distribution, and for d9_sibs kurtosis was XXXXXXXXXX, indicating a more peaked distribution. To see this visually, look at your histograms.

Run FREQUENCIES for the following variables. Tell SPSS to give you the histogram and to show the normal curve on the histogram. Also ask for kurtosis. Write a paragraph for each variable explaining what these statistics tell you about the kurtosis of the variables.

● d22_maeduc

● d24_paeduc

● s6_sexfreq

STAT4S: Exercise Using SPSS to Explore Graphs and Charts

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses FREQUENCIES and EXPLORE in SPSS to explore different ways of creating graphs and charts. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore different ways of graphing frequency distributions. The exercise also gives you practice in using FREQUENCIES and EXPLORE in SPSS.

Part I – Pie Charts

A pie chart is a chart that shows the frequencies or percents of a variable with a small number of categories. It is presented as a circle divided into a series of slices. The area of each slice is proportional to the number of cases or the percent of cases in each category. It is normally used with nominal or ordinal variables (see Exercise STAT1S) but can be used with interval or ratio variables that have a small number of categories.
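Each slice's area is proportional to its share of the cases, so the slice's angle is simply that share of 360 degrees. A minimal Python sketch follows. The party counts are hypothetical, not GSS results.

```python
def pie_slices(counts: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Return (percent, angle in degrees) for each category; a slice's angle,
    and therefore its area, is proportional to its share of the cases."""
    total = sum(counts.values())
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Hypothetical counts for a three-category variable (not GSS results).
party = {"Democrat": 500, "Independent": 300, "Republican": 200}
for category, (percent, angle) in pie_slices(party).items():
    print(f"{category:12s} {percent:5.1f}%  {angle:6.1f} degrees")
```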

Run FREQUENCIES in SPSS for the variables p1_partyid, p4_polviews, and d12_childs. (See Chapter 4, Frequencies, in the online SPSS book mentioned on page XXXXXXXXXX.) Click on “Charts” and select “Pie charts.”


Notice that there is an option called “Chart Values” that lets you choose whether your chart displays “Percentages” or “Frequencies.” Usually you want to select “Percentages.”

Once SPSS has displayed the pie chart in the output window, you can double-click anywhere inside the pie chart to open the “Chart Editor.” In the “Chart Editor,” right-click anywhere inside one of the pie slices and you will see a list of different ways you can edit your pie chart. Click on “Show Data Labels” and then click on the “Data Value Labels” tab. If “Percent” is not listed in the “Displayed” box, move it to that box and click on “Apply” and then “Close.” If it is already listed in the “Displayed” box, just click on “Close.” This will close the “Properties” box. Click anywhere outside the “Chart Editor” and you will see your edited pie chart. There are lots of other ways you could edit your chart. Explore some of them if you are curious.

If you are wondering why you shouldn’t use pie charts for variables with a large number of categories, create a pie chart for d1_age and you’ll see why.

Part II – Bar Charts

A bar chart is a chart that shows the frequencies or percents of a variable and is presented as a series of vertical bars that do not touch each other. The height of each bar is proportional to the number of cases or the percent of cases in each category. It is normally used with nominal or ordinal variables.

Run FREQUENCIES for the variables p1_partyid and p4_polviews. This time click on “Charts” and select “Bar charts.” Select “Percentages” to display percents in the chart.

Part III – Histograms

A histogram is a graph that shows the frequencies or percents of a variable with a larger number of categories. It is presented as a series of vertical bars that touch each other. The height of each bar is proportional to the number of cases or the percent of cases in each category. It is used with interval or ratio variables.

Run FREQUENCIES for the variables d1_age, d4_educ, and d12_childs.[1] Click on “Charts” and select “Histogram.”

Look at the histogram for d1_age. Let’s say you want to redefine the width of each vertical bar. Double-click anywhere inside the histogram to open the “Chart Editor.” Now right-click anywhere inside the bars in the “Chart Editor” and click on “Properties Window.” This will open the “Properties” box. Click on the “Binning” tab. Under “X Axis,” click on “Custom” and “Interval width.” Enter XXXXXXXXXX in the “Interval width” box to indicate that you want each vertical bar to represent an interval width of ten years. Where do we want the first interval to start? We could let SPSS decide, but let’s make the decision ourselves. Click on “Custom value for anchor” and enter XXXXXXXXXX in the box. Click on “Apply” and look at your histogram. Does it look the way you want it to look? Is there any further editing you want to do? If you are satisfied, click on “Close” to close the “Properties” box. Click anywhere outside the “Chart Editor” and you will see your edited histogram.
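The Python sketch below shows what the “Interval width” and anchor settings do. It is an illustration, not SPSS’s own binning code. It assumes each bar covers the half-open interval [start, start + width). The anchor of 15 and the ages are hypothetical values chosen for the example, not the values the exercise asks you to enter.

```python
import math
from collections import Counter


def bin_counts(values, width, anchor):
    """Count values per bar when bars are `width` wide and the first bar starts at `anchor`.

    Each bar is assumed to cover [start, start + width)."""
    counts = Counter()
    for v in values:
        start = anchor + width * math.floor((v - anchor) / width)
        counts[start] += 1
    return dict(sorted(counts.items()))


# Hypothetical ages; ten-year bars with an arbitrary anchor of 15.
ages = [18, 22, 25, 31, 34, 38, 45, 47, 52, 58, 63, 71, 80, 89]
width = 10
for start, n in bin_counts(ages, width=width, anchor=15).items():
    print(f"{start}-{start + width}: {'#' * n}")
```

Moving the anchor moves every bar boundary, so the same data can produce a noticeably different-looking histogram.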

Part IV – Box Plots


A box plot is a graph that visually displays a number of characteristics of a frequency distribution:

● the third quartile (Q3),

● the first quartile (Q1),

● the interquartile range (IQR),

● the median,

● the range,

● outliers, and

● extreme values.

Run EXPLORE for d1_age, d4_educ, and d12_childs. (See Chapter 4, Explore, in the online SPSS book.) You can use the default settings for EXPLORE, so all you have to do is click “OK” after you have selected your variables.

The first thing you will see is various descriptive statistics for each variable. You’re probably familiar with most of these. Then you’ll see the stem-and-leaf display, which we’re not going to discuss. The last thing you’ll see is the box plot. Let’s look at the box plot for d1_age. The box is bounded at the top by the third quartile (Q3) and at the bottom by the first quartile (Q1). The height of the box (Q3 – Q1) is the interquartile range. The horizontal line inside the box represents the median. There are two vertical lines, often called whiskers, coming out of the box. The upper whisker extends to the largest value that is not an outlier or extreme value. The lower whisker extends to the smallest such value. When a distribution has no outliers or extreme values, the whiskers reach the maximum and minimum values, and the difference between those two values is the range.
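The quantities a box plot displays can be computed directly. This Python sketch uses `statistics.quantiles` with `method="exclusive"`, which interpolates at position (n + 1)p. SPSS may draw the box using a slightly different quartile definition (such as Tukey’s hinges), so small differences from SPSS output are possible. The education values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, interquartile range, median, and range for a box plot.
    Quartiles use statistics.quantiles(method="exclusive"); other
    definitions can give slightly different Q1 and Q3."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": q3 - q1,                       # the height of the box
        "range": max(values) - min(values),
    }


# Hypothetical years of schooling (not GSS output).
educ = [8, 10, 12, 12, 12, 13, 14, 14, 16, 16, 18, 20]
print(box_plot_summary(educ))
```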

You can also learn about skewness from the box plot. In a non-skewed distribution, the median will be in the middle of the box, halfway between the first and third quartiles. In a skewed distribution the median will sit closer to one end of the box. In a positively skewed distribution it tends to be nearer the bottom of the box (Q1), with the longer whisker pointing up. In a negatively skewed distribution it tends to be nearer the top (Q3), with the longer whisker pointing down. Notice that for d1_age and d4_educ the median is in the middle of the box, suggesting that these distributions are not very skewed. For d12_childs, however, the median is not centered in the box, suggesting that this distribution is skewed.

Now look at the box plots for d4_educ and d12_childs. Here you’ll see some circles and numbers. The circles represent outliers, which are values that lie between XXXXXXXXXX and XXXXXXXXXX box lengths above the third quartile or below the first quartile. A box length is just another name for the interquartile range, since the height of the box is the interquartile range. The numbers are the case numbers in SPSS. Extreme values are values that lie more than XXXXXXXXXX box lengths above the third quartile or below the first quartile. There aren’t any extreme values in these distributions.
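The outlier and extreme-value rules can be sketched the same way in Python. The 1.5 and 3 box-length multipliers below are the conventional Tukey fences. They are supplied here as adjustable assumptions, not read from the exercise. The quartiles come from `statistics.quantiles`, which may differ slightly from SPSS’s. A value exactly on a fence is treated as inside it, and the data are hypothetical.

```python
import statistics


def classify_points(values, outlier_k=1.5, extreme_k=3.0):
    """Flag values beyond the box as outliers or extreme values, measuring
    distance from the box in box lengths (IQR). Also return the whisker ends,
    i.e. the smallest and largest values that are not flagged."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    flagged, inside = [], []
    for v in values:
        distance = max(q1 - v, v - q3, 0)   # 0 when v lies inside the box
        if distance > extreme_k * iqr:
            flagged.append((v, "extreme"))
        elif distance > outlier_k * iqr:
            flagged.append((v, "outlier"))
        else:
            inside.append(v)
    return flagged, (min(inside), max(inside))


# Hypothetical numbers of children (not GSS data).
kids = [0, 1, 1, 2, 2, 2, 3, 3, 3, 8, 12]
flagged, whiskers = classify_points(kids)
print("flagged:", flagged)
print("whiskers run from", whiskers[0], "to", whiskers[1])
```

Here the box runs from 1 to 3, so 8 is flagged as an outlier and 12 as an extreme value. The upper whisker stops at 3 rather than at the maximum.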

Sometimes you want to compare box plots for two or more groups of respondents. Let’s compare the box plots of d1_age for men and women. Run EXPLORE for d1_age, but this time put d5_sex in the “Factor List” box. Your output should now show the box plots for men and women side by side.

Part V – Conclusions

We have talked about four different types of graphs: pie charts, bar charts, histograms, and box plots. There are other types of graphs you could use, but these four are among the most commonly used. There are also other ways to construct graphs in SPSS that your instructor might want to talk about. You can click on “Graphs” in the menu bar at the top of the SPSS screen and then on “Chart Builder,” but we aren’t going to go into that in this exercise.

[1] There is a small problem with d12_childs. One of the categories is “eight or more” children. That means we don’t know what these values actually are. They could be 8 or XXXXXXXXXX or XXXXXXXXXX or XXXXXXXXXX or something else. Since there are so few cases in this category, we’re going to ignore this problem.


STAT5S: Exercise Using SPSS to Explore Hypothesis Testing – One-Sample t Test

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses COMPARE MEANS (one-sample t test) and SELECT CASES in SPSS to explore hypothesis testing and the one-sample t test. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the one-sample t test. The exercise also gives you practice in using COMPARE MEANS (one-sample t test) and SELECT CASES in SPSS.

Part I – Simple Random Sampling

Populations are the complete set of objects that we want to study. For example, a population might be all the individuals that live in the United States at a particular point in time. The U.S. does a complete enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero). We call this a census. Another example of a population is all the students in a particular school or all college students in your state. Populations are often large, and a complete enumeration is too costly and time-consuming. So instead we select a sample from the population, where a sample is a subset of the population, and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample, while a parameter describes a characteristic of a population. The mean age of a sample is a statistic, while the mean age of the population is a parameter. We use statistics to make inferences about parameters. In other words, we use the mean age of the sample to make an inference about the mean age of the population. Notice that the mean age of the sample (our statistic) is known, while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in the population has a known, non-zero chance of being in the sample (i.e., the probability of selection). This isn’t the case for non-probability samples. One example of a non-probability sample is the instant poll you hear about on radio and television shows. A show might invite you to go to a website and answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer sample, and we have no idea of the probability of selection.

There are many ways of selecting a probability sample, but the most basic type is a simple random sample. In a simple random sample, everyone in the population has the same chance of being selected. SPSS will select a simple random sample for you. We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav. It’s a large sample of about 2,500 individuals. To illustrate simple random sampling, we’re going to select a simple random sample of 30% of all the individuals in the GSS.[1]

Start by getting a frequency distribution for the variable d4_educ, which is the highest year of school completed by the respondent. (See Chapter 4, Frequencies, in the online SPSS book mentioned on page XXXXXXXXXX.) You’ll see that there are a total of 2,538 cases. One of those respondents said he or she didn’t know. That means there are 2,537 valid cases that answered the question.

Now click on “Data” in the menu bar at the top of the screen. (See Chapter 3, Select Cases, in the online SPSS book.) This will open a drop-down menu. Click on “Select Cases.” Then click on “Random sample of cases” and then on “Sample” in the box below. One of the options will already be selected and will say “Approximately [box] % of all cases.” Fill in XXXXXXXXXX in the box to indicate that you want to select a simple random sample of 30% of all the cases in the GSS. Click on “Continue” and then on “OK.” Now run FREQUENCIES again for the variable d4_educ. Your sample will be smaller than before. This is a random sample of all the cases in the GSS.
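The “Approximately [box] % of all cases” wording suggests that the sample size is not fixed. The Python sketch below produces that behavior by keeping each case independently with the chosen probability. We assume this is roughly how the option works; it is not SPSS’s documented code. The case IDs are placeholders for the 2,538 cases.

```python
import random


def approximate_percent_sample(cases, percent, seed=None):
    """Keep each case independently with probability percent / 100.

    Every case has the same chance of selection, but the sample size is only
    approximately percent% of the cases and varies from run to run."""
    rng = random.Random(seed)
    return [case for case in cases if rng.random() < percent / 100]


case_ids = list(range(1, 2539))   # placeholder IDs for 2,538 cases
sample = approximate_percent_sample(case_ids, 30, seed=1)
print(len(sample), f"{len(sample) / len(case_ids):.1%}")
```

Fixing `seed` makes a run reproducible. Leaving it out gives a different sample each time, so two students’ 30% samples will usually differ slightly in size.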

Part II. Hypothesis Testing – the One-Sample T test

Let’s think about our variable, d4_educ. What do we know about education in the United States? One thing we know is that the average years of school completed has been increasing over the twentieth and twenty-first centuries. It used to be that many people stopped after completing high school, which would be 12 years of education. Now more go on to college. So we would hypothesize that the mean years of school completed is now greater than XXXXXXXXXX. How could we test that hypothesis? We need a statistical procedure to do that. The t test is one of a number of statistical tests that we can use to test such hypotheses.


Notice how we are going about this. We have a sample of adults in the United States (i.e., the XXXXXXXXXX GSS). We can calculate the mean years of school completed by all the adults in the sample who answered the question. But we want to test the hypothesis that the mean years of school completed in the population of all adults is greater than XXXXXXXXXX. We’re going to use our sample data to test a hypothesis about the population.[2]

What do we know about sampling? We know that no sample is ever a perfect representation of the population from which it is drawn, because every sample contains some amount of sampling error. Sampling error is inevitable. We also know that the larger the sample size, the smaller the sampling error.
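The standard error of the mean shows how sampling error shrinks as the sample grows. It is estimated as the sample standard deviation divided by the square root of N. A minimal Python sketch, with a hypothetical standard deviation of 3 years:

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


# Hypothetical standard deviation of 3 years of schooling; only n changes.
for n in (25, 100, 400, 2500):
    print(f"n = {n:5d}  standard error = {standard_error(3.0, n):.3f}")
```

Because of the square root, each fourfold increase in the sample size cuts the standard error in half.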

So the hypothesis we want to test is that the mean years of school completed in the population is greater than XXXXXXXXXX. We’ll call this our research hypothesis. It’s what we expect to be true. But there is no way to prove the research hypothesis directly, so we’re going to use a method of indirect proof. We’re going to set up another hypothesis that says that the research hypothesis is not true and call this the null hypothesis.[3] In our case, the null hypothesis would be that the mean years of school completed in the population is equal to XXXXXXXXXX. If we can reject the null hypothesis, then we have evidence to support the research hypothesis. If we can’t reject the null hypothesis, then we don’t have evidence in support of the research hypothesis. You can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly, but if we can reject the null hypothesis, then we have indirect evidence that supports the research hypothesis.

Here are our two hypotheses.

● research hypothesis – the population mean is greater than 12

● null hypothesis – the population mean is equal to 12

It’s the null hypothesis that we are going to test.

Before we carry out the t test, let’s make sure we are using the full GSS sample and not the 30% simple random sample. Click on “Data” and on “Select Cases.” Select “All cases” and then click on “OK.” Now you are using all the cases.

Now click on “Analyze” in the menu bar, which will open a drop-down menu. Click on “Compare Means,” which will open another drop-down menu, and click on “One-Sample T Test.” Move the variable d4_educ over to the “Test Variable(s)” box on the right. Below that box you will see a box called “Test Value.” This is where we enter the value specified in the null hypothesis, which in our case is XXXXXXXXXX. All you have to do now is click on “OK.”

You should see two output boxes. The first box will have four values in it.

● N is the number of cases for which we have valid information[4] (i.e., the number of respondents who answered the question). In this problem, N equals 2,537.

● Mean is the mean years of school completed by the respondents in the sample who answered the question (see STAT2S). In this problem, the sample mean equals XXXXXXXXXX.

● Standard Deviation is a measure of dispersion (see STAT2S). In this problem, the standard deviation equals XXXXXXXXXX.

● Standard Error of the Mean is an estimate of how much sampling error there is. In this problem, the standard error equals .061.


The second box will have five values in it.

● t is the value of the t test

● df is the number of degrees of freedom

● Significance (2-tailed) value

● Mean Difference

● 95% Confidence Interval of the Difference, which we’re going to discuss in a later exercise

There is a formula for calculating the value of t in the t test. Your instructor may or may not want you to learn how to calculate the value of t. I’m going to leave it to your instructor to do this. In this problem t equals XXXXXXXXXX.
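For instructors who do want to show the calculation, the usual one-sample formula is t = (sample mean - test value) / (s / sqrt(N)), with N - 1 degrees of freedom. The Python sketch below computes the values in SPSS’s two output boxes, except the significance value and the confidence interval. The ten education values are hypothetical.

```python
import math
import statistics


def one_sample_t(values, test_value):
    """Return the pieces of a one-sample t test except the p-value."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)                   # standard error of the mean
    mean_difference = mean - test_value
    return {
        "N": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "t": mean_difference / se,
        "df": n - 1,
        "mean_difference": mean_difference,
    }


# Hypothetical years of schooling for ten respondents, tested against 12.
educ = [12, 14, 16, 12, 13, 18, 16, 12, 14, 15]
result = one_sample_t(educ, test_value=12)
print({k: round(v, 3) for k, v in result.items()})
```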

Degrees of freedom (df) is the number of values that are free to vary. If the sample mean equals XXXXXXXXXX, then how many values are free to vary? The answer is N – 1, which is 2,537 – 1 or 2,XXXXXXXXXX. See if you can figure out why it’s 2,XXXXXXXXXX. Your instructor will help you if you are having trouble figuring it out.

The significance value is a probability. It’s the probability of getting a sample mean at least as far from the value in the null hypothesis as the one you got, if the null hypothesis were true. Here it’s XXXXXXXXXX, which you might think means there is no chance at all of getting such a result. But it’s actually a rounded value, and it means that the probability is less than XXXXXXXXXX, or less than five in ten thousand. So there is a chance, but it’s really, really small.

The mean difference is the difference between the sample mean (XXXXXXXXXX) and the value specified in the null hypothesis (XXXXXXXXXX). So it’s XXXXXXXXXX – XXXXXXXXXX, or 1.68.[5] That’s the amount by which your sample mean differs from the value in the null hypothesis. If it’s positive, your sample mean is larger than the value in the null hypothesis. If it’s negative, your sample mean is smaller.

Now all we have to do is figure out how to use the t test to decide whether or not to reject the null hypothesis. Look again at the significance value, which is less than XXXXXXXXXX. That tells you that, if the null hypothesis were true, you would get a sample mean this far from the value in the null hypothesis less than five times out of ten thousand. With odds like that, we’re going to reject the null hypothesis. A common rule is to reject the null hypothesis if the significance value is less than XXXXXXXXXX, or less than five out of one hundred.

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean? Look back at the research hypothesis, which was that the population mean was greater than XXXXXXXXXX. We’re actually predicting the direction of the difference: we’re predicting that the population mean will be greater than XXXXXXXXXX. That’s called a one-tailed test, and we have to use a one-tailed significance value. It’s easy to get the one-tailed significance value from the two-tailed significance value, as long as the sample mean differs from the null value in the predicted direction, as it does here. If the two-tailed significance value is less than XXXXXXXXXX, then the one-tailed significance value is half that, or XXXXXXXXXX divided by two, or XXXXXXXXXX. We still reject the null hypothesis, which means that we have evidence to support our research hypothesis. We haven’t proven the research hypothesis to be true, but we have evidence to support it.
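The Python function below writes out the halving rule. It also covers the case where the sample mean falls on the opposite side from the prediction. The .05 cutoff is the common rule mentioned above. The significance value and mean difference in the example are hypothetical, not the GSS output.

```python
def one_tailed_p(two_tailed_p, mean_difference, predicted_direction):
    """Convert a two-tailed significance value to a one-tailed one.

    predicted_direction is +1 if the research hypothesis says the population
    mean is greater than the test value, -1 if it says smaller."""
    if mean_difference * predicted_direction > 0:   # result in the predicted direction
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2                     # result in the opposite direction


def decide(p_value, alpha=0.05):
    """Apply the common rule: reject the null hypothesis if p < alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Hypothetical values, not the GSS output.
p = one_tailed_p(0.08, mean_difference=0.9, predicted_direction=+1)
print(p, decide(p))
```

In this made-up case the two-tailed value of .08 would not lead to rejection. Halving it to .04 does, because the difference lies in the predicted direction.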

Part III. Now It’s Your Turn

There is another variable in the XXXXXXXXXX GSS called d18_hrs1, which is the number of hours that the respondent worked last week if he or she was employed. Many people have suggested that Americans are working longer hours than they used to. The traditional work week is XXXXXXXXXX hours. If it’s true that we’re working more hours, our research hypothesis would be that the mean number of hours worked last week is greater than XXXXXXXXXX. Do a one-sample t test to test this hypothesis. For each value in the output, explain what it means. Then decide whether you should reject or not reject the null hypothesis and what this tells you about the research hypothesis.

I’ll tell you that you should reject the null hypothesis even though the mean difference is less than one hour. You might wonder why you reject the null hypothesis when the mean difference is so small. Notice that we have a large sample (N = 1,XXXXXXXXXX). Let’s see what happens when we have a sample that’s only 10% of that size. Take a simple random sample of 10% of the total sample. (Look back at Part I to see how to do this.) Now we have a much smaller sample size. Rerun your t test and see what happens with a smaller sample. For each value in the output, explain what it means. Then decide whether you should reject or not reject the null hypothesis and what this tells you about the research hypothesis.

Now you probably won’t be able to reject the null hypothesis.[6] Why? Remember that we said the larger the sample, the smaller the sampling error. If there is less sampling error, it’s going to be easier to reject the null hypothesis. You can see this by looking at the standard error of the mean. It will probably be smaller in the larger sample and bigger in the smaller sample. So when you have a really large sample, don’t get too excited about rejecting the null hypothesis when the mean difference is small.
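The t formula shows why a large N makes a small difference significant. If the mean difference and standard deviation stay the same, t grows with the square root of N. Here is a Python sketch with hypothetical numbers: a 0.8-hour difference and a 14-hour standard deviation.

```python
import math


def t_statistic(mean_difference: float, sd: float, n: int) -> float:
    """One-sample t: mean difference divided by the standard error s / sqrt(n)."""
    return mean_difference / (sd / math.sqrt(n))


# Hypothetical: same 0.8-hour difference and 14-hour standard deviation, two sample sizes.
for n in (100, 1000):
    print(f"n = {n:4d}  t = {t_statistic(0.8, 14.0, n):.2f}")
```

With ten times as many cases, t is about 3.16 times larger (the square root of 10), even though the difference itself has not changed.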

[1] The GSS is itself not a simple random sample but rather an example of a multistage cluster sample.

[2] Characteristics of a sample are called statistics, while characteristics of a population are called parameters.

[3] The null hypothesis is often called the hypothesis of no difference. We’re saying that the population mean is still equal to XXXXXXXXXX. In other words, nothing has changed. There is no difference.

[4] Missing cases would include those who said they didn’t know or refused to answer the question.

[5] By the way, the value of the mean (XXXXXXXXXX) is a rounded value, which is why the mean difference isn’t exactly 1.68.

[6] Why probably? Because by chance your smaller sample could have a much higher mean. That would produce a larger t value and could make your significance value low enough to reject the null hypothesis.


STAT6S: Exercise Using SPSS to Explore Hypothesis Testing –

Independent-Samples

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the 2014 General Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses COMPARE

MEANS (means and independent-samples t test) to explore hypothesis testing. A good reference on using

SPSS is SPSS for Windows Version XXXXXXXXXX: A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson

(Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and

Instructional Council's website. You have permission to use this exercise and to revise it to fit your

needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are

more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax

file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional

information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the independent-samples t test. The exercise

also gives you practice in using COMPARE MEANS.

Part I – Computing Means

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large and it’s too costly and time consuming to carry


out a complete enumeration. So what we do is to select a sample from the population where a sample is a

subset of the population and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, non-zero chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability

sample of adults in the United States conducted by the National Opinion Research Center (NORC). The

GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going

to use a subset of the 2014 GSS. Your instructor will tell you how to access this data set, which is called

gss14_subset_for_classes_STATISTICS.sav.

Let’s start by asking two questions.

● Do men and women differ in the number of years of school they have completed?

● Do men and women differ in the number of hours they worked in the last week?

Click on “Analyze” in the menu bar and then on “Compare Means” and finally on “Means.” (See Chapter 6,

introduction in the online SPSS book mentioned in the Note to the Instructor.) Select the variables d4_educ and d18_hrs1

and move them to the “Dependent List” box. These are the variables for which you are going to compute

means. Then select the variable d5_sex and move it to the “Independent List” box. This is the variable

which defines the groups you want to compare. In our case we want to compare men and women. The

output from SPSS will show you the mean, number of cases, and standard deviation for men and women

for these two variables.
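If you’d like to see what the Means procedure is computing, here is a minimal Python sketch. The records are invented for illustration (they are not GSS data), None stands in for a missing answer, and the function name means_by_group is our own. The sketch ignores the weights that have been applied to the GSS subset, so it is not meant to reproduce the SPSS output.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented respondents (not GSS data): (d5_sex, d4_educ).
# d5_sex: 1 = male, 2 = female. None marks a missing answer such as "don't know".
records = [(1, 12), (1, 16), (1, 14), (2, 12), (2, 13), (2, None), (2, 16)]

def means_by_group(rows):
    """Return {group: (mean, N, standard deviation)}, leaving out missing values."""
    groups = defaultdict(list)
    for group, value in rows:
        if value is not None:
            groups[group].append(value)
    return {g: (mean(v), len(v), stdev(v)) for g, v in sorted(groups.items())}

for group, (m, n, sd) in means_by_group(records).items():
    print(f"d5_sex={group}: mean={m:.2f}  N={n}  SD={sd:.2f}")
```

Notice that the female with a missing value is simply left out, so each group’s N counts only valid answers.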

Men and women differ very little in the number of years of school they completed. Men have completed a

little less than one-tenth of a year more than women. But men worked quite a bit more than women in the

last week – a difference of almost six hours. By the way, only respondents who are employed are included

in this calculation but both part-time and full-time employees are included.

Why can’t we just conclude that men and women have about the same education and that men work more

than women? If we were just describing the sample, we could. But what we want to do is to make

inferences about differences between men and women in the population. We have a sample of men and

a sample of women and some amount of sampling error will always be present in both samples. The larger

the sample, the less the sampling error and the smaller the sample, the more the sampling error. Because

of this sampling error we need to make use of hypothesis testing as we did in the previous exercise

(STAT5S).

Part II – Now it’s Your Turn

In this part of the exercise you want to compare men and women to answer these two questions.

● Do men and women differ in the number of hours per day they have to relax? This is variable

d20_hrsrelax in the GSS.

● Do men and women differ in the number of hours per day they watch television? This is variable

tv1_tvhours in the GSS.

Use SPSS to get the sample means and then compare them to begin answering these questions.

Part III – Hypothesis Testing – Independent-Samples t Test

In Part I we compared the mean scores for men and women for the following variables.

● d4_educ

● d18_hrs1

Now we want to determine whether those differences are statistically significant by carrying out the

independent-samples t test.

A t test is used when you want to compare two groups. The “grouping variable” defines these two groups.

The variable d5_sex is a dichotomy. It has only two categories – male (value 1) and female (value 2). But

any variable can be made into a dichotomy by establishing a cut point or by recoding. For example, the

variable f4_satfin (satisfaction with financial situation) has three categories – satisfied (value 1), more or

less satisfied (value 2), and not at all satisfied (value 3). The cut point is the value that makes this into a

dichotomy. All values less than the cut point are in one category and all values equal to or larger than the

cut point are in the other category. If your cut point is 3, then values 1 and 2 are in one category and value

3 is in the other category.
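The cut-point rule can be written as a one-line Python function. This is only a sketch of the logic; the function name dichotomize and the group numbers 1 and 2 it returns are our own labels, chosen for illustration.

```python
def dichotomize(value: int, cut_point: int) -> int:
    """Group 1 = values below the cut point; group 2 = values equal to or above it."""
    return 1 if value < cut_point else 2

# f4_satfin: 1 = satisfied, 2 = more or less satisfied, 3 = not at all satisfied.
print([dichotomize(code, cut_point=3) for code in (1, 2, 3)])  # prints [1, 1, 2]
```

With a cut point of 2 instead, only value 1 would fall in the first group, and values 2 and 3 would be combined.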

Click on “Analyze” and then on “Compare Means” and finally on “Independent-Samples T Test.” (See

Chapter 6, independent-samples t test in the online SPSS book.) Move the two variables listed above into

the “Test Variable(s)” box. These are the variables for which you want to compute the mean scores. Right

below the “Test Variable(s)” box is the “Grouping Variable” box. This is where you indicate which variable

defines the groups you want to compare. In this problem the grouping variable is d5_sex. Once you have

entered the grouping variable, then enter either the values of the two groups or the cut point.

In our case, you would enter 1 for males into Group 1 and 2 for females into Group 2. It doesn’t matter

which is Group 1 and which is Group 2; switching them reverses the signs of the mean difference and t but doesn’t change the significance value. Finally, click on “OK.”

You should see two boxes in the output screen. The first box gives you four pieces of information.

● N which is the number of males and females on which the t test is based. This includes only those

cases with valid information. In other words, cases with missing information (e.g., don’t know, no

answer) are excluded.

● Means for males and females.

● Standard deviations for males and females.

● Standard error of the mean for males and females which is an estimate of the amount of sampling

error for the two samples.

The second box has more information in it. The first thing you notice is that there are two t tests for each

variable. One assumes that the two populations (i.e., all males and all females) have equal population

variances and the other doesn’t make this assumption. In our two examples, both t tests give about the

same results. We’ll come back to this in a little bit. The rest of the second box has the following

information. Let’s look at the t test for d4_educ.

● t is the value of the t test, which is XXXXXXXXXX for both t tests. There is a formula for computing t which

your instructor may or may not want to cover in your course.

● Degrees of freedom in the first t test is $(N_{\text{males}} - 1) + (N_{\text{females}} - 1) = N_{\text{males}} + N_{\text{females}} - 2 = 2{,}535$.

In the second t test the degrees of freedom is estimated and turns out to be a decimal.

● The significance (two-tailed) value which we’ll cover in a little bit.

● The mean difference is the mean for the first group (males) minus the mean for the second group

(females) = XXXXXXXXXX – XXXXXXXXXX = XXXXXXXXXX. Instead of using the rounded values, SPSS carries the

computation out to more decimal places, which results in a mean difference of XXXXXXXXXX. In other words,

males have XXXXXXXXXX of a year more education than females, which is a very small difference.

● The standard error of the difference, which is XXXXXXXXXX, is an estimate of the amount of sampling error for

the mean difference.

● 95% confidence interval of the difference which we’ll talk about in a later exercise.
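The arithmetic behind the t value, degrees of freedom, mean difference, and standard error of the difference can be sketched in Python from each group’s mean, standard deviation, and N. This is a minimal sketch assuming the standard pooled-variance formula for the first t test and Welch’s formula (with the Welch-Satterthwaite degrees of freedom) for the second. The function names and the summary statistics in the example are invented, and the sketch neither computes significance values nor applies the GSS weights.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t test assuming equal population variances."""
    df = n1 + n2 - 2  # (n1 - 1) + (n2 - 1)
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    mean_diff = m1 - m2  # Group 1 minus Group 2
    return mean_diff, se_diff, mean_diff / se_diff, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t test not assuming equal population variances (Welch)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se_diff = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation: usually not a whole number.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    mean_diff = m1 - m2
    return mean_diff, se_diff, mean_diff / se_diff, df

# Invented summary statistics: mean, SD, and N for each group.
print(pooled_t(14.0, 5.0, 50, 13.0, 5.0, 50))
print(welch_t(14.0, 2.0, 10, 13.0, 6.0, 40))
```

When the two groups have the same N and the same standard deviation, both versions give the same t and the same degrees of freedom. When the spreads differ, the second version’s degrees of freedom come out as a decimal, which is what you see in the second row of the SPSS output.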

Notice how we are going about this. We have a sample of adults in the United States (i.e., the 2014 GSS).

We calculate the mean years of school completed by men and women in the sample who answered the

question. But we want to test the hypothesis that the mean years of school completed by men and women

in the population are different. We’re going to use our sample data to test a hypothesis about the

population.

The hypothesis we want to test is that the mean years of school completed by men in the population is

different than the mean years of school completed by women in the population. We’ll call this our research

hypothesis. It’s what we expect to be true. But there is no way to prove the research hypothesis directly.

So we’re going to use a method of indirect proof. We’re going to set up another hypothesis that says that

the research hypothesis is not true and call this the null hypothesis. If we can’t reject the null hypothesis

then we don’t have any evidence in support of the research hypothesis. You can see why this is called a

method of indirect proof. We can’t prove the research hypothesis directly but if we can reject the null

hypothesis then we have indirect evidence that supports the research hypothesis. We haven’t proven the

research hypothesis, but we have support for this hypothesis.

Here are our two hypotheses.

● research hypothesis – the population mean for men minus the population mean for women does

not equal 0. In other words, they are different from each other.

● null hypothesis – the population mean for men minus the population mean for women equals 0. In

other words, they are not different from each other.

It’s the null hypothesis that we are going to test.

Now all we have to do is figure out how to use the t test to decide whether to reject or not reject the null

hypothesis. Look again at the significance value, which is XXXXXXXXXX for both t tests. That tells you that the

probability of being wrong if you rejected the null hypothesis is just about XXXXXXXXXX, or XXXXXXXXXX times out of one

hundred. With odds like that, of course, we’re not going to reject the null hypothesis. A common rule is to

reject the null hypothesis if the significance value is less than .05, or less than five out of one hundred.

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean?

Look back at the research hypothesis which was that the population mean for men minus the population

mean for women does not equal 0. We’re not predicting that one population mean will be larger or smaller

than the other. That’s called a two-tailed test and we have to use a two-tailed significance value. If we had

predicted that one population mean would be larger than the other, that would be a one-tailed test. It’s easy

to get the one-tailed significance value if we know the two-tailed significance value. If the two-tailed

significance value is XXXXXXXXXX, then the one-tailed significance value is half that, or XXXXXXXXXX divided by two, or .045.
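The conversion from a two-tailed to a one-tailed significance value can be sketched in Python. The halving rule assumes one thing: the sample difference has to be in the direction the research hypothesis predicts. If it points the other way, the one-tailed significance value is one minus half the two-tailed value, and you would not reject the null hypothesis. The function names and the values below are invented for illustration, and the .05 cutoff is the common rule described above.

```python
def one_tailed_p(two_tailed_p: float, observed_diff: float, predicted_sign: int) -> float:
    """Convert a two-tailed significance value into a one-tailed one.

    predicted_sign is +1 if the research hypothesis predicts a positive
    difference and -1 if it predicts a negative one.
    """
    if observed_diff * predicted_sign > 0:
        # The difference is in the predicted direction: halve the two-tailed value.
        return two_tailed_p / 2
    # The difference points the wrong way, so the one-tailed value is large.
    return 1 - two_tailed_p / 2

def reject_null(p: float, alpha: float = 0.05) -> bool:
    """Common decision rule: reject the null hypothesis when p is less than alpha."""
    return p < alpha

# Invented values: a two-tailed significance of .09 and a positive mean difference.
p1 = one_tailed_p(0.09, observed_diff=0.08, predicted_sign=+1)
print(round(p1, 3), reject_null(0.09), reject_null(p1))
```

The same significance value can lead to different decisions depending on whether the research hypothesis is one-tailed or two-tailed, which is why you decide on the research hypothesis before looking at the output.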

We still haven’t explained why there are two t tests. As we said earlier, one assumes that the two

populations (i.e., all males and all females) have equal population variances and the other doesn’t make

this assumption. To compute the t value we need to estimate the population variances (see STAT2S). If

the population variances are about the same, we can pool our two samples to estimate the population

variance. If they are not about the same we wouldn’t want to do this. So how do we decide which t test to

use? Here’s where Levene’s test for the equality of variances comes in; you’ll find it in the second

box in your SPSS output. For this test, the null hypothesis is that the two population variances are equal.

Levene’s test is an F test, and we’re not going to discuss the F test until a later exercise (STAT8S).

But we know how to interpret significance values, so we can still make use of this test. The significance

value for the variable d4_educ is XXXXXXXXXX, which is not less than .05, so we do not reject the null hypothesis

that the population variances are equal. This means that we would use the t test that assumes equal

population variances.
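Levene’s statistic is an F computed on how far each score falls from its own group’s mean. Here is a minimal Python sketch of this mean-centered version for two groups. We’re assuming it matches the version SPSS prints in the t test output; some other programs center on the median instead. The function name levene_f is our own, the sketch returns only the F value (not its significance), and the scores in the example are invented.

```python
from statistics import mean

def levene_f(a, b):
    """Levene's F for two groups, using absolute deviations from each group's mean."""
    groups = []
    for scores in (a, b):
        center = mean(scores)
        groups.append([abs(x - center) for x in scores])
    all_devs = groups[0] + groups[1]
    grand = mean(all_devs)
    n, k = len(all_devs), len(groups)
    # One-way analysis of variance F, computed on the absolute deviations.
    ss_between = sum(len(z) * (mean(z) - grand) ** 2 for z in groups)
    ss_within = sum((d - mean(z)) ** 2 for z in groups for d in z)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented scores: the second group is twice as spread out as the first.
print(levene_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Two groups with exactly the same spread, such as [1, 2, 3] and [11, 12, 13], give F = 0; the more the spreads differ, the larger F becomes and the smaller its significance value.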

Part IV – Now it’s Your Turn Again

In this part of the exercise you want to compare men and women to answer these two questions but this

time you want to test the appropriate null hypotheses.

● Do men and women differ in the number of hours per day they have to relax?

● Do men and women differ in the number of hours per day they watch television?

Use the independent-samples t test to carry out this part of the exercise. What are the research and the null

hypotheses? Do you reject or not reject the null hypotheses? Explain why.

Part V – What Does Independent Samples Mean?

Why do we call this t test the independent-samples t test? Independent samples are samples in which the

composition of one sample does not influence the composition of the other sample. In this exercise we’re

using the 2014 GSS, which is a sample of adults in the United States. If we divide this sample into men and

women we would have a sample of men and a sample of women and they would be independent samples.

The individuals in one of the samples would not influence who is in the other sample.

Dependent samples are samples in which the composition of one sample does influence the composition of

the other sample. For example, if we have a sample of married couples and divide that sample into two

samples of men and women, then the men in one of the samples determine who the women are in the

other sample. The compositions of the two samples depend on each other. We’re going to discuss the

paired-samples t test in the next exercise (STAT7S).

STAT7S: Exercise Using SPSS to Explore Hypothesis Testing –

Paired-Samples t Test

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the 2014 General Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses COMPARE

MEANS (paired-samples t test) to explore hypothesis testing. A good reference on using SPSS is SPSS

for Windows Version XXXXXXXXXX: A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and

Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional

Council's website. You have permission to use this exercise and to revise it to fit your needs. Please

send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed

notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the

SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the paired-samples t test. The exercise also

gives you practice in using COMPARE MEANS.

Part I – Populations and Samples

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large and it’s too costly and time consuming to carry


out a complete enumeration. So what we do is to select a sample from the population where a sample is a

subset of the population and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, non-zero chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability

sample of adults in the United States conducted by the National Opinion Research Center (NORC). The

GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going

to use a subset of the 2014 GSS. Your instructor will tell you how to access this data set, which is called

gss14_subset_for_classes_STATISTICS.sav.

In STAT6S we compared means from two independent samples. Independent samples are samples in

which the composition of one sample does not influence the composition of the other sample. In this

exercise we’re using the 2014 GSS, which is a sample of adults in the United States. If we divide this

sample into men and women we would have a sample of men and a sample of women and they would be

independent samples. The individuals in one of the samples would not influence who is in the other

sample.

In this exercise we’re going to compare means from two dependent samples. Dependent samples are

samples in which the composition of one sample influences the composition of the other sample. The 2014

GSS includes questions about the years of school completed by the respondent’s parents – d22_maeduc

and d24_paeduc. Let’s assume that we think that respondent’s fathers have more education than

respondent’s mothers. We would compare the mean years of school completed by mothers with the mean

years of school completed by fathers. If the respondent’s mother is in one sample, then the respondent’s

father must be in the other sample. The compositions of the two samples therefore depend on each other.

SPSS calls these paired samples, so we’ll use that term from now on.

Let’s start by asking whether fathers or mothers have more years of school. Click on “Analyze” in the

menu bar and then on “Compare Means” and finally on “Means.” (See Chapter 6, introduction in the online

SPSS book mentioned in the Note to the Instructor.) Select the variables d22_maeduc and d24_paeduc and move them to

the “Dependent List” box. These are the variables for which you are going to compute means. The output

from SPSS will show you the mean, number of cases, and standard deviation for fathers and mothers.

Fathers have about two-tenths of a year more education than mothers. Why can’t we just conclude that

fathers have more education than mothers? If we were just describing the sample, we could. But what we

want to do is to make inferences about differences between fathers and mothers in the population. We

have a sample of fathers and a sample of mothers and some amount of sampling error will always be

present in both samples. The larger the sample, the less the sampling error and the smaller the sample,

the more the sampling error. Because of this sampling error we need to make use of hypothesis testing as

we did in the two previous exercises (STAT5S and STAT6S).

Part II – Now it’s Your Turn

In this part of the exercise you want to compare the years of school completed by respondents and their

spouses to determine whether men have more education than their spouses or whether women have more

education than their spouses.

Use SPSS to get the sample means as we did in Part I and then compare them to begin answering this

question. But we need to be careful here. Respondents could be either male or female. We need to

separate respondents into two groups – men and women – and then separately compare male

respondents with their spouses and female respondents with their spouses. We can do this by putting the

variables d4_educ and d29_speduc into the “Dependent List” box and d5_sex into the “Independent List”

box.

Part III – Hypothesis Testing – Paired-Samples t Test

In Part I we compared the mean years of school completed by fathers and mothers. Now we want to

determine if this difference is statistically significant by carrying out the paired-samples t test.

Click on “Analyze” and then on “Compare Means” and finally on “Paired-Samples T Test.” (See Chapter 6,

paired-samples t test in the online SPSS book.) Move the two variables from Part I, d22_maeduc and d24_paeduc, into the “Paired

Variables” box. Do this by selecting d22_maeduc and clicking on the arrow to move it into the “Variable 1”

box. Then select the other variable, d24_paeduc, and click on the arrow to move it into the “Variable 2”

box. Now click on “OK” and SPSS will carry out the paired-samples t test. It doesn’t matter which variable

you put in the “Variable 1” and “Variable 2” boxes.

You should see three boxes in the output screen. The first box gives you four pieces of information.

● Means for mothers and fathers.

● N which is the number of mothers and fathers on which the t test is based. This includes only

those cases with valid information. In other words, cases with missing information (e.g., don’t

know, no answer) are excluded.

● Standard deviations for mothers and fathers.

● Standard error of the mean for mothers and fathers which is an estimate of the amount of sampling

error for the two samples.

The second box gives you the paired sample correlation which is the correlation between mother’s and

father’s years of school completed for the paired samples. If you haven’t discussed correlation yet, don’t

worry about what this means.

The third box has more information in it. With paired samples what we do is subtract the years of school

completed for one parent in each pair from the years of school completed for the other parent in the same

pair. Since we put mother’s years of school completed in Variable 1 and father’s education in Variable 2,

SPSS will subtract father’s education from mother’s education. So if the father completed XXXXXXXXXX years and the

mother completed XXXXXXXXXX years, we would subtract XXXXXXXXXX from XXXXXXXXXX, which would give you -2. For this pair the father

completed two more years than the mother.

The third box gives you the following information.

● The mean difference score for all the pairs in the sample, which is XXXXXXXXXX. This means that fathers

had an average of almost two-tenths of a year more education than the mothers. By the way, in

Part I when we compared the means for d22_maeduc and d24_paeduc the difference was 0.22.

Here the mean difference score is XXXXXXXXXX. Why aren’t they the same? See if you can figure this out.

(Hint: it has something to do with comparing differences for pairs.)

● The standard deviation of the difference scores for all these pairs which is XXXXXXXXXX.

● The standard error of the mean which is an estimate of the amount of sampling error.

● The 95% confidence interval for the mean difference score. If you haven’t talked about confidence

intervals yet, just ignore this. We’ll talk about confidence intervals in a later exercise.

● The value of t for the paired-samples t test, which is XXXXXXXXXX. There is a formula for computing t which

your instructor may or may not want to cover in your course.

● The degrees of freedom for the t test, which is the number of pairs minus one: 1,796 – 1 = 1,795.

In other words, 1,795 of the difference scores are free to vary. Once these

difference scores are fixed, then the final difference score is fixed or determined.

● The two-tailed significance value, which is XXXXXXXXXX and which we’ll cover next.
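The paired-samples arithmetic can be sketched in Python: compute a difference score for each pair (Variable 1 minus Variable 2), then treat those differences as a single sample. This is a minimal sketch assuming two lists of mothers’ and fathers’ years of school in the same respondent order. The function name paired_t and the five respondents are invented, a pair with a missing value is dropped, and the sketch computes neither the significance value nor the GSS weights.

```python
import math
from statistics import mean, stdev

def paired_t(var1, var2):
    """Paired-samples t test; pair i is (var1[i], var2[i])."""
    # Keep only pairs where both members have a valid value.
    pairs = [(x, y) for x, y in zip(var1, var2) if x is not None and y is not None]
    diffs = [x - y for x, y in pairs]  # Variable 1 minus Variable 2
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    se = sd_diff / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, sd_diff, se, mean_diff / se, n - 1  # df = number of pairs - 1

# Invented years of school for five respondents' mothers and fathers.
maeduc = [12, 12, 14, 16, None]
paeduc = [14, 12, 16, 16, 12]
print(paired_t(maeduc, paeduc))
```

Here the mean difference is negative because in every complete pair the father has at least as much schooling as the mother, and the degrees of freedom are the four complete pairs minus one.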

Notice how we are going about this. We have a sample of adults in the United States (i.e., the 2014 GSS).

We calculate the mean years of school completed by respondent’s fathers and mothers in the sample who

answered the question. But we want to test the hypothesis that the mean years of school completed by

fathers is greater than the mean for mothers in the population. We’re going to use our sample data to test

a hypothesis about the population.

The hypothesis we want to test is that the mean years of school completed by fathers is greater than the

mean years of school completed by mothers in the population. We’ll call this our research hypothesis. It’s

what we expect to be true. But there is no way to prove the research hypothesis directly. So we’re going

to use a method of indirect proof. We’re going to set up another hypothesis that says that the research

hypothesis is not true and call this the null hypothesis. If we can’t reject the null hypothesis then we don’t

have any evidence in support of the research hypothesis. You can see why this is called a method of

indirect proof. We can’t prove the research hypothesis directly but if we can reject the null hypothesis then

we have indirect evidence that supports the research hypothesis. We haven’t proven the research

hypothesis, but we have support for this hypothesis.

Here are our two hypotheses.

● research hypothesis – the mean difference score in the population is negative. In other words, the

mean years of school completed by fathers is greater than the mean years for mothers for all pairs in the

population.

● null hypothesis – the mean difference score for all pairs in the population is equal to 0.

It’s the null hypothesis that we are going to test.

Now all we have to do is figure out how to use the t test to decide whether to reject or not reject the null

hypothesis. Look again at the significance value, which is XXXXXXXXXX. That tells you that the probability of being

wrong if you rejected the null hypothesis is XXXXXXXXXX, or 2 times out of one hundred. With odds like that, of

course, we’re going to reject the null hypothesis. A common rule is to reject the null hypothesis if the

significance value is less than .05, or less than five out of one hundred.

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean?

Look back at the research hypothesis which was that the mean difference score for all pairs in the

population was less than 0. We’re predicting that the mean difference score for all pairs in the population

will be negative. That’s called a one-tailed test and we have to use a one-tailed significance value. It’s

easy to get the one-tailed significance value if we know the two-tailed significance value. If the two-tailed

significance value is XXXXXXXXXX, then the one-tailed significance value is half that, or XXXXXXXXXX divided by two, or .010.

We still reject the null hypothesis which means that we have evidence to support our research hypothesis.

We haven’t proven the research hypothesis to be true but we have evidence to support it.

Part IV – Now it’s Your Turn Again

In this part of the exercise you want to compare the years of school completed by respondents and their spouses to determine if women have more education than their spouses, but this time you want to test the appropriate null hypotheses.

Remember from Part II that we have to test this hypothesis first for men and then for women. We’re going to do this by selecting only the men and then computing the paired-samples t test. Click on “Data” in the menu bar and then on “Select Cases.” Select “If condition is satisfied” and then click on “If” in the box below. Select d5_sex and move it to the box on the right by clicking on the arrow pointing to the right. Now click on the equals sign and then on 1 so the expression in the box reads “d5_sex = 1”. Click on “Continue” and then on “OK.” To make sure you have selected only the males, run a frequency distribution for d5_sex. You should see only the males (i.e., value XXXXXXXXXX). Now carry out the paired-samples t test. Repeat this for the females (i.e., value XXXXXXXXXX) by selecting only the females and then running the paired-samples t test again.

For each paired-samples t test, state the research and the null hypotheses. Do you reject or not reject the null hypotheses? Explain why.
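Selecting one sex at a time and then running the test can be sketched in Python as below. This is a minimal illustration under stated assumptions: the eight records are made-up numbers (not GSS data), sex code 1 stands for males as in the steps above while code 2 for females is assumed here, the data are unweighted, and the function `paired_t` is hypothetical. It computes only the t statistic and its degrees of freedom; SPSS also reports the significance value, which requires the t distribution.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t statistic for the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # t = mean difference / (standard deviation of the differences / sqrt(n))
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical records: (sex code, respondent years of school, spouse years of school).
records = [
    (1, 12, 14), (1, 16, 16), (1, 12, 13), (1, 14, 12),
    (2, 14, 12), (2, 16, 14), (2, 12, 12), (2, 13, 12),
]

for sex in (1, 2):
    # Like Select Cases with "d5_sex = value": keep one group, then run the test.
    group = [r for r in records if r[0] == sex]
    t, df = paired_t([r[1] for r in group], [r[2] for r in group])
    print(f"sex={sex}: t={t:.3f}, df={df}")
```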

STAT8S: Exercise Using SPSS to Explore Hypothesis Testing – One-Way Analysis of Variance

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses COMPARE MEANS and one-way analysis of variance to explore hypothesis testing. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's Website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes for instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and one-way analysis of variance (sometimes abbreviated one-way ANOVA). The exercise also gives you practice in using COMPARE MEANS.

Part I – Populations and Samples

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large, and it’s too costly and time-consuming to carry


out a complete enumeration. So instead we select a sample, which is a subset of the population, and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, non-zero, chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.
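The simplest probability sample is a simple random sample, in which every member of a population of size N has the same known chance of selection, n/N. Here is a minimal Python sketch using the standard library's `random.sample`; the population of 1,000 numbered people and the function name `simple_random_sample` are hypothetical. (The GSS itself uses a more complex, multistage probability design, so this only illustrates the idea of a known selection probability.)

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member's chance of selection is n / N."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(population, n)


population = list(range(1, 1001))  # hypothetical population of 1,000 numbered people
sample = simple_random_sample(population, 50, seed=1)
print(len(sample), 50 / len(population))  # 50 0.05
```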

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial (every two years) survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.

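In STAT6S and STAT7S we used the t test to compare means from two samples. In STAT6S the means were from two independent samples, while in STAT7S they were from paired samples. But what if we wanted to compare means from more than two samples? For that we need a statistical test called analysis of variance. In fact, the independent-samples t test is a special case of one-way analysis of variance: with only two groups, the F statistic equals the square of the t statistic (from the pooled, equal-variances t test), and the two tests give the same significance value.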
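You can check the claim that a two-group analysis of variance reproduces the t test with a small Python sketch. The group values are hypothetical, and the function names `pooled_t` and `one_way_f` are illustrative, not SPSS procedures; only the standard library is used.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples t statistic using the pooled (equal-variances) estimate."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def one_way_f(groups):
    """One-way ANOVA F: between-groups mean square / within-groups mean square."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


a = [2, 4, 3, 5]  # hypothetical hours of TV per day, group A
b = [1, 2, 2, 3]  # hypothetical hours of TV per day, group B
t = pooled_t(a, b)
print(round(t ** 2, 6), round(one_way_f([a, b]), 6))  # 3.857143 3.857143
```

Both numbers equal 27/7, which is the sense in which the t test is a special case of analysis of variance.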

The XXXXXXXXXX GSS includes a variable (d3_degree) that describes the highest degree in school that the person achieved. The categories are less than high school, high school, junior college, bachelor’s degree, and graduate degree. Another variable is the number of hours per day that respondents say they watch television (tv1_tvhours). We want to find out if there is any relationship between these two variables. One way to answer this question would be to see if respondents with different levels of education watch different amounts of television. For example, you might suspect that the more education respondents have, the less television they watch.

Let’s start by looking at the mean number of hours that people watch television broken down by highest educational degree. Click on “Analyze” in the menu bar, then on “Compare Means,” and finally on “Means.” (See Chapter 6, Introduction, in the online SPSS book mentioned on page XXXXXXXXXX.) Select the variable tv1_tvhours and move it to the “Dependent List” box. This is the variable for which you are going to compute means. Then select the variable d3_degree and move it to the “Independent List” box. The output from SPSS will show you the mean, number of cases, and standard deviation for the different levels of education.
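The Means table is a group-by summary. Here is a minimal Python sketch of the same idea; the six (degree, hours) pairs are made up, the function `means_by_group` is hypothetical, and unlike the SPSS run it ignores the survey weights. `statistics.stdev` needs at least two values per group.

```python
from collections import defaultdict
from statistics import mean, stdev


def means_by_group(pairs):
    """Return {group: (mean, n, standard deviation)}, like the SPSS Means table."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {g: (mean(v), len(v), stdev(v)) for g, v in groups.items()}


# Hypothetical (degree, TV hours) pairs; not GSS data.
data = [("high school", 3), ("high school", 4), ("high school", 2),
        ("graduate", 1), ("graduate", 2), ("graduate", 3)]
for degree, (m, n, sd) in means_by_group(data).items():
    print(f"{degree}: mean={m:.2f}, n={n}, sd={sd:.2f}")
# high school: mean=3.00, n=3, sd=1.00
# graduate: mean=2.00, n=3, sd=1.00
```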

Respondents with more education watch less television than those with less education. For example, respondents with a graduate degree watch an average of XXXXXXXXXX hours of television per day, while those who haven’t completed high school watch an average of XXXXXXXXXX hours, a difference of about two hours. Why can’t we just conclude that those with more education watch less television than those with less education? If we were just describing the sample, we could. But what we want to do is to make inferences about differences in the population. We have five samples from five different levels of education, and some amount of sampling error will always be present in all these samples. The larger the samples, the smaller the sampling error; the smaller the samples, the larger the sampling error. Because of this sampling error we need to make use of hypothesis testing as we did in the three previous exercises (STAT5S, STAT6S, and STAT7S).
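One common way to put a number on sampling error for a mean is the standard error: the standard deviation divided by the square root of the sample size. That formula assumes a simple random sample, which the GSS is not exactly, so treat this minimal Python sketch as an illustration only; the standard deviation of 2.0 hours and the function name `standard_error` are hypothetical.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Hypothetical standard deviation of 2.0 hours of TV per day.
for n in (25, 100, 400):
    print(n, standard_error(2.0, n))
# 25 0.4
# 100 0.2
# 400 0.1
```

Quadrupling the sample size halves the standard error, which is why larger samples carry less sampling error.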

Part II – Now it’s Your Turn

In this part of the exercise you want to determine whether people who live in some regions of the country

(d25_region) watch more television (tv1_tvhours) than people in other regions. Use SPSS to get the

sample means as we did in Part I and then compare them to begin answering this question. Write one or

two paragraphs describing the regions in which people watch more and less television.

Part III – Hypothesis Testing – One-Way Analysis of Variance

In Part I we compared the mean hours of television watched per day for different levels of education. Now

we want to determine if these differences are statistically significant by carrying out a one-way analysis of

variance.

Click on “Analyze” in the menu bar, then on “Compare Means,” and finally on “Means.” Select the variable tv1_tvhours and move it to the “Dependent List” box. Then select the variable d3_degree and move it to the “Independent List” box. Now click on “Options” in the upper-right corner and then check the “Anova table and eta” box. Finally, click on “Continue” and then on “OK.”

You should see four boxes in the output screen. The first box tells you how many cases are included in the analysis and how many are excluded. Any case with missing data on either variable will be excluded.

The second table shows you the mean, number of cases, and standard deviation for each of the five levels

of education.

The third table gives you the results of the one-way analysis of variance, which include the statistics listed below. We’re not going to explain these statistics in this exercise; your instructor will decide how much to cover on their calculation and meaning.

● Between groups and within groups sum of squares.

● Degrees of freedom for the between groups and within groups sum of squares.

● Mean square for the between groups and within groups sum of squares.

● F statistic.

● Significance value.
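For readers curious about what sits behind those entries, here is a minimal Python sketch of the standard one-way ANOVA formulas for everything in the list except the significance value (which needs the F distribution). The three groups of hours are hypothetical, the function `anova_table` is illustrative rather than anything SPSS exposes, and weights are ignored.

```python
from statistics import mean


def anova_table(groups):
    """Sums of squares, degrees of freedom, mean squares, and F for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    # Between groups: distance of each group mean from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: distance of each value from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between  # mean square = sum of squares / df
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Three hypothetical education groups with made-up hours of TV per day.
groups = [[4, 5, 6], [3, 3, 4, 2], [1, 2, 2]]
table = anova_table(groups)
print(table["df_between"], table["df_within"], round(table["F"], 2))  # 2 7 12.7
```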

The fourth box gives you the values of Eta and Eta squared, which measure the degree of association between the two variables. Again, we’ll leave it to your instructor to talk about these measures.
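Eta squared is the between-groups sum of squares divided by the total sum of squares, so it can be read as the proportion of the variation in the dependent variable that is associated with the groups; Eta is its square root. A minimal Python sketch with hypothetical data; `eta_squared` is an illustrative name, and weights are ignored.

```python
from statistics import mean


def eta_squared(groups):
    """Eta squared: between-groups sum of squares divided by the total sum of squares."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    return ss_between / ss_total


groups = [[4, 5, 6], [3, 3, 4, 2], [1, 2, 2]]  # hypothetical hours of TV per day
e2 = eta_squared(groups)
print(round(e2, 3), round(e2 ** 0.5, 3))  # 0.784 0.885
```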

Notice how we are going about this. We have a sample of adults in the United States (i.e., the XXXXXXXXXX GSS). We calculate the mean number of hours per day that respondents watch television for each level of education in the sample. But we want to test the hypothesis that the amount respondents watch television varies by level of education in the population. We’re going to use our sample data to test a hypothesis about the population.

Our hypothesis is that the mean number of hours watching television is higher for some levels of education

than for other levels in the population. We’ll call this our research hypothesis. It’s what we expect to be

true. But there is no way to prove the research hypothesis directly. So we’re going to use a method of

indirect proof. We’re going to set up another hypothesis that says that the mean number of hours watching

television is the same for all levels of education in the population and call this the null hypothesis. If we

can’t reject the null hypothesis then we don’t have any evidence in support of the research hypothesis. You

can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly but if

we can reject the null hypothesis then we have indirect evidence that supports the research hypothesis.

We haven’t proven the research hypothesis, but we have support for this hypothesis.

Here are our two hypotheses.

● research hypothesis – the population mean number of hours watching television for at least one level of education is different from the population mean for at least one other level.

● null hypothesis – the mean number of hours watching television is the same for all five levels of education in the population.

It’s the null hypothesis that we are going to test.

Now all we have to do is figure out how to use the F test to decide whether to reject or not reject the null hypothesis. Look again at the significance value, which is XXXXXXXXXX. This actually means less than XXXXXXXXXX, since XXXXXXXXXX is a rounded value. That tells you that if the null hypothesis were true, you would get differences among the sample means at least as large as these in fewer than 5 samples out of ten thousand. With odds like that, of course, we’re going to reject the null hypothesis. A common rule is to reject the null hypothesis if the significance value is less than XXXXXXXXXX, or less than five out of one hundred.

So what have we learned? We learned that the mean number of hours watching television for at least one of the populations is different from that of at least one other population. But which ones? There are statistical tests, often called post hoc or multiple comparison tests, for answering this question. We’re not going to cover them here, although your instructor might want to discuss these tests.

Part IV – Now it’s Your Turn Again

In Part II you computed the mean number of hours that respondents watched television for each of the nine

regions of the country. Now we want to determine if these differences are statistically significant by

carrying out a one-way analysis of variance as described in Part III. Indicate what the research and null

hypotheses are and whether you can reject the null hypothesis. What does that tell you about the research

hypothesis?

STAT9S: Exercise Using SPSS to Explore Crosstabulation

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses CROSSTABS in SPSS to explore crosstabulation. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's Website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes for instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to introduce crosstabulation as a statistical tool to explore relationships between

variables. The exercise also gives you practice in using CROSSTABS in SPSS.

Part I—Relationships between Variables

In exercises STAT5S through STAT8S we used sample means to analyze relationships between variables. For example, we compared men and women to see if they differed in the number of years of school completed and the number of hours they worked in the previous week. We discovered that men and women had about the same amount of education but that men worked more hours than women. We were able to compute means because years of school completed and hours worked are both ratio-level variables. The mean assumes interval or ratio level measurement (see STAT2S).


But what if we wanted to explore relationships between variables that weren’t interval or ratio?

Crosstabulation can be used to look at the relationship between nominal and ordinal variables. Let’s

compare men and women (d5_sex) in terms of the following:

● opinion about abortion (a1_abany),

● fear of crime (c1_fear),

● satisfaction with current financial situation (f4_satfin),

● opinion about gun control (g1_gunlaw),

● gun ownership (g2_owngun),

● voting (p5_pres08), and

● religiosity (r8_reliten).

Before we look at the relationship between sex and these other variables, we need to talk about independent and dependent variables. The dependent variable is whatever you are trying to explain. In our case, that would be opinion about abortion, fear of crime, satisfaction with one’s financial situation, opinion about gun control, gun ownership, voting, and religiosity. The independent variable is some variable that you think might help you explain why some people think abortion should be legal and others think it shouldn’t be legal, or any of the other variables in our list above. In our case, that would be sex. Normally we put the dependent variable in the row and the independent variable in the column. We’ll follow that convention in this exercise.

Let’s start with the first two variables in our list. We’re going to use a1_abany as our measure of opinion about abortion. Respondents were asked if they thought abortion ought to be legal for any reason. And we’re going to use c1_fear as our measure of fear of crime. Respondents were asked if they were afraid to walk alone at night in their neighborhood. Run CROSSTABS to produce two tables. (See Chapter 5, Crosstabs in the online SPSS book.) One will be for the relationship between d5_sex and a1_abany. The other will be for d5_sex and c1_fear. Put the independent variable in the column and the dependent variable in the row. If you don’t ask for percents, SPSS will give you only the counts (i.e., frequencies), so be sure to ask for the percents by clicking on the “Cells” button. SPSS can compute the row percents, column percents, and total percents. Your instructor will probably talk about how to compute these different percents. But how do you know which percents to ask for? Here’s a simple rule for choosing percents.

● If your independent variable is in the column, then you want to use the column percents.

● If your independent variable is in the row, then you want to use the row percents.

Since you put the independent variable in the column, you want the column percents.
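In arithmetic terms, a column percent is each cell count divided by its column total, times 100. Here is a minimal Python sketch; the 2 x 2 counts are hypothetical, not GSS results, and `column_percents` is an illustrative name.

```python
def column_percents(table):
    """Turn counts (rows = dependent variable, columns = independent variable) into column percents."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * count / total for count, total in zip(row, col_totals)] for row in table]


# Hypothetical counts: rows = yes / no, columns = men / women.
counts = [[40, 45],
          [60, 105]]
for row in column_percents(counts):
    print(row)
# [40.0, 30.0]
# [60.0, 70.0]
```

Each column of the result sums to 100, which is how you can tell these are column percents.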

Part II – Interpreting the Percents

Your first table should look like this.

It’s easy to make sure that you have the correct percents. Your independent variable (d5_sex) should be in

the column and it is. Column percents should sum down to 100% and they do.

How are you going to interpret these percents? Here’s a simple rule for interpreting percents.

● If your percents sum down to 100%, then compare the percents across.

● If your percents sum across to 100%, then compare the percents down.

Since the percents sum down to 100%, you want to compare across.
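The interpretation rule can also be written as a small check. This Python sketch is hypothetical throughout (the function name, the rounding tolerance, and the percents are made up for illustration): it picks the comparison direction from whichever way the percents sum to 100, then computes the difference across the first row.

```python
def comparison_direction(pcts, tol=0.2):
    """Percents that sum down to 100 -> compare across; sum across to 100 -> compare down."""
    # tol allows for rounding in percents displayed to one decimal place.
    if all(abs(sum(col) - 100) <= tol for col in zip(*pcts)):
        return "compare across"
    if all(abs(sum(row) - 100) <= tol for row in pcts):
        return "compare down"
    raise ValueError("percents do not sum to 100 in either direction")


col_pcts = [[40.0, 30.0],  # hypothetical column percents, first row
            [60.0, 70.0]]  # hypothetical column percents, second row
print(comparison_direction(col_pcts))   # compare across
print(col_pcts[0][0] - col_pcts[0][1])  # 10.0
```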

Look at the first row. Approximately 47% of men think abortion should be legal for any reason compared to 44% of women. There’s a difference of 3.6%, which is really small. We never want to make too much of small differences. Why not? No sample is ever a perfect representation of the population from which the sample is drawn. This is because every sample contains some amount of sampling error; sampling error is inevitable. The larger the sample size, the smaller the sampling error; the smaller the sample size, the larger the sampling error. So in this case we would conclude that there probably isn’t any difference in the population between men and women in their approval of abortion for any reason.

Now let’s look at your second table.

This time the percent difference is quite a bit larger. About 22% of men are afraid to walk alone at night in

their neighborhood compared to 39% of women. This is a difference of 16.8%. This is a much larger

difference and we have reason to think that women are more fearful of being a victim of crime than men.

Part III – Now it’s Your Turn

Choose two of the tables from the following list and compare men and women:

● satisfaction with current financial situation (f4_satfin),

● opinion about gun control (g1_gunlaw),

● voting (p5_pres08), and

● religiosity (r8_reliten).

Make sure that you put the independent variable in the column and the dependent variable in the row. Be sure to ask for the correct percents. What are the values of the percents that you want to compare? What is the percent difference? Does it look to you as if there is much of a difference between men and women on the variables you chose?

Part IV – Adding another Variable into the Analysis

So far we have only looked at variables two at a time. Often we want to add other variables into the

analysis. Let’s focus on the difference between men and women (d5_sex) in terms of gun ownership

(g2_owngun). First let’s get the two-variable table which should look like this.

Men were more likely to own guns by 9.5%. But what if we wanted to include social class in this analysis?

The XXXXXXXXXX GSS asked respondents whether they thought of themselves as lower, working, middle, or upper

class. This is variable d11_class. What we want to do is to hold constant perceived social class. In other

words, we want to divide our sample into four groups with each group consisting of one of these four

classes and then look at the relationship between d5_sex and g2_owngun separately for each of these four

groups.

We can do this by going back to the SPSS dialog box where we requested the crosstabulation and putting

the variable d11_class in the third box down right below the “Column(s)” box. (See Chapter 8, Crosstabs

Revisited in the online SPSS book.) Your table should look like this.

This table is more complicated. Notice that the table is actually divided into four tables with one on top of

the other. At the top we have those who said they were lower class, then working, middle and upper class.

Let’s look at the percent differences for each of these tables – 12.0%, 9.6%, 9.4%, and 0.4%. The first

three tables are similar to the two-variable table – 9.5% compared to 12.0%, 9.6%, and 9.4%. Remember

not to make too much out of small differences because of sampling error. But the last table for upper class

has a much smaller difference – 0.4%. In other words, when we look at only those who see themselves as

upper class, there really isn’t any difference between men and women in terms of gun ownership.
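Holding a variable constant amounts to splitting the cases by that variable and computing the percent difference within each group. Here is a minimal Python sketch with made-up respondents (not GSS data); the function `layered_percent_differences` is hypothetical and assumes each layer contains both men and women.

```python
from collections import Counter


def layered_percent_differences(records):
    """For each layer (control category): % of men answering yes minus % of women answering yes."""
    counts = Counter(records)  # keys are (layer, sex, answer)
    result = {}
    for layer in sorted({layer for layer, _, _ in records}):
        pcts = {}
        for sex in ("male", "female"):
            yes = counts[(layer, sex, "yes")]
            total = yes + counts[(layer, sex, "no")]
            pcts[sex] = 100 * yes / total
        result[layer] = round(pcts["male"] - pcts["female"], 1)
    return result


# Hypothetical respondents: (class, sex, owns a gun); not GSS data.
records = (
    [("middle", "male", "yes")] * 4 + [("middle", "male", "no")] * 6
    + [("middle", "female", "yes")] * 3 + [("middle", "female", "no")] * 7
    + [("upper", "male", "yes")] * 2 + [("upper", "male", "no")] * 3
    + [("upper", "female", "yes")] * 2 + [("upper", "female", "no")] * 3
)
print(layered_percent_differences(records))  # {'middle': 10.0, 'upper': 0.0}
```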

But notice something else. There are fewer people who say they are lower and upper class than say they

are working or middle class. There are only XXXXXXXXXX respondents in the lower class table and even fewer, 48

respondents, in the upper class table. We’ll have more to say about this in the next exercise (STAT10S).

Part V – Now it’s Your Turn Again

In Part II we compared men and women (d5_sex) in terms of fear of crime (c1_fear). Run this table again, but this time add social class (d11_class) into the analysis as we did in Part IV. What happens to the percent difference when you hold class constant? What does this tell you?

STAT10S: Exercise Using SPSS to Explore Chi Square

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses CROSSTABS in SPSS to explore the Chi Square test. A good reference on using SPSS is SPSS for Windows Version 23.0 A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's Website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes for instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to introduce Chi Square as a test of significance. The exercise also gives you

practice in using CROSSTABS in SPSS.

Part I—Relationships between Variables

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial (every two years) survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.

The XXXXXXXXXX GSS is a sample from the population of all adults in the United States at the time the survey was

done. In the previous exercise (STAT9S) we used crosstabulation and percents to describe the


relationship between pairs of variables in the sample. But we want to go beyond just describing the

sample. We want to use the sample data to make inferences about the population from which the sample

was selected. Chi Square is a statistical test of significance that we can use to test hypotheses about the

population. Chi Square is the appropriate test when your variables are nominal or ordinal (see STAT1S).

In STAT9S we started by using crosstabulation to look at the relationship between sex and opinion about abortion. We’re going to use a1_abany as our measure of opinion about abortion. Respondents were asked if they thought abortion ought to be legal for any reason. Run CROSSTABS to produce the table. (See Chapter 5, Crosstabs in the online SPSS book mentioned on page XXXXXXXXXX.) You want to get the crosstabulation of d5_sex and a1_abany. Put the independent variable in the column and the dependent variable in the row. Since your independent variable is in the column, you want to use the column percents.

Part II – Interpreting the Percents

Your table should look like this.

Since your percents sum down to 100% (i.e., column percents), you want to compare the percents across.

Look at the first row. Approximately 47% of men think abortion should be legal for any reason compared to 44% of women. There’s a difference of 3.6%, which seems small. We never want to make too much of small differences. Why not? No sample is ever a perfect representation of the population from which the sample is drawn. This is because every sample contains some amount of sampling error; sampling error is inevitable. The larger the sample size, the smaller the sampling error; the smaller the sample size, the larger the sampling error.

But what is a small percent difference? Probably you would agree that a one to four percent difference is small. But what about a five or six or seven percent difference? Is that small? Or is it large enough for us to conclude that there is a difference between men and women in the population? Here’s where we can use Chi Square.

Part III – Chi Square

Let’s assume that you think that sex and opinion about abortion are related to each other. We’ll call this our

research hypothesis. It’s what we expect to be true. But there is no way to prove the research hypothesis

directly. So we’re going to use a method of indirect proof. We’re going to set up another hypothesis that

says that the research hypothesis is not true and call this the null hypothesis. In our case, the null

hypothesis would be that the two variables are unrelated to each other. [1] In statistical terms, we often say

that the two variables are independent of each other. If we can reject the null hypothesis then we have

evidence to support the research hypothesis. If we can’t reject the null hypothesis then we don’t have any

evidence in support of the research hypothesis. You can see why this is called a method of indirect proof.

We can’t prove the research hypothesis directly but if we can reject the null hypothesis then we have

indirect evidence that supports the research hypothesis.

Here are our two hypotheses.

● research hypothesis – sex and opinion about abortion are related to each other

● null hypothesis – sex and opinion about abortion are unrelated to each other; in other words, they

are independent of each other

It's the null hypothesis that we are going to test.

SPSS will compute Chi Square for you. Follow the same procedure you used to get the crosstabulation

between d5_sex and a1_abany. Remember to get the column percents. Then click on the “Statistics”

button in the upper right of the dialog box. Check the box for “Chi-Square” and then click on “Continue”

and then on “OK.”

Now you will see another output box below the crosstabulation called “Chi-Square Tests.” We want the test

that is called “Pearson Chi-Square” in the first row of the box. Ignore all the other rows in this box. [2] You

should see three values to the right of “Pearson Chi-Square.”

● The value of Chi Square is XXXXXXXXXX. Your instructor may or may not want to go into the computation of the Chi Square value, but we’re not going to cover the computation in this exercise.

● The degrees of freedom (df) is XXXXXXXXXX. Degrees of freedom is the number of values that are free to vary. In a table with two columns and two rows, only one of the cell frequencies is free to vary, assuming the marginal frequencies are fixed. The marginal frequencies are the values in the margins of the table. There are XXXXXXXXXX males and XXXXXXXXXX females in this table, and there are XXXXXXXXXX who think abortion should be legal for any reason and XXXXXXXXXX who think abortion should not be legal for any reason. Try filling in any one of the cell frequencies in the table. The other three cell frequencies are then fixed, assuming we keep the marginal frequencies the same, so there is one degree of freedom. In general, a table with r rows and c columns has (r - 1)(c - 1) degrees of freedom.

● The two-tailed significance value is XXXXXXXXXX. [3] This tells us that if the null hypothesis were true, we would get a Chi Square value at least this large with a probability of XXXXXXXXXX. In other words, it would happen about XXXXXXXXXX out of 1,000 times. With odds like that, of course, we’re not going to reject the null hypothesis. A common rule is to reject the null hypothesis if the significance value is less than XXXXXXXXXX, or less than five out of one hundred. Since XXXXXXXXXX is not smaller than .05, we don’t reject the null hypothesis.

Since we can’t reject the null hypothesis, we don’t have any support for our research hypothesis.
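For a table with two rows and two columns, the decision steps described above can be sketched in Python. This is an illustration, not SPSS’s own routine: the function names and the Chi Square value of 0.5 are hypothetical, and the `math.erfc` shortcut for the significance value is valid only when there is one degree of freedom (with df = 1, Chi Square is the square of a standard normal variable).

```python
import math


def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Cells free to vary once the marginal totals are fixed: (rows - 1) * (cols - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi_square: float) -> float:
    """Probability of a Chi Square at least this large when df = 1.

    Only valid for one degree of freedom: P(X >= x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Common rule: reject the null hypothesis if the significance value < alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Hypothetical Chi Square value for a 2 x 2 table (not the GSS result).
chi_sq = 0.5
df = degrees_of_freedom(2, 2)
p = p_value_df1(chi_sq)
print(df, round(p, 3), decide(p))
```

With one degree of freedom, a Chi Square of about 3.84 corresponds to a significance value of .05, so anything smaller than that is not significant under the common rule.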

Part IV – Now it’s Your Turn

Choose any two of the variables from the following list and compare men and women using crosstabulation and Chi Square:


● satisfaction with current financial situation (f4_satfin),

● opinion about gun control (g1_gunlaw),

● gun ownership (g2_owngun),

● voting (p6_pres12), and

● religiosity (r8_reliten).

Make sure that you put the independent variable in the column and the dependent variable in the row. Be sure to ask for the correct percents and Chi Square. What are the research hypothesis and the null hypothesis? Do you reject the null hypothesis? How do you know? What does that tell you about the research hypothesis?

Part V – Expected Values

We said we weren’t going to talk about how you compute Chi Square, but we do have to introduce the idea of expected values. The computation of Chi Square is based on comparing the observed cell frequencies (i.e., the cell frequencies that you see in the table that SPSS gives you) with the cell frequencies that you would expect by chance if the null hypothesis were true. SPSS will also compute these expected frequencies for you. Rerun the crosstabulation for d5_sex and a1_abany, remembering to ask for the column percents and Chi Square. But this time, when you click on the “Cells” button to ask for the column percents, look in the upper left of the dialog box where it says “Counts.” “Observed” is selected by default; these are the observed cell frequencies. Click on the “Expected” box to get the expected cell frequencies.

Now you will see both the observed and the expected cell frequencies in your output table. Notice that they aren’t very different. The closer they are to each other, the smaller Chi Square will be. The more different they are, the larger Chi Square will be. The larger Chi Square is, the more likely you are to be able to reject the null hypothesis.
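The comparison of observed and expected frequencies can be sketched in a few lines of Python. Each expected count is the row total times the column total divided by the grand total, and Pearson’s Chi Square adds up (observed - expected)² / expected over every cell. The counts below are hypothetical, not the GSS table, and the function names are illustrative only.

```python
def expected_frequencies(observed):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum of (observed - expected) ** 2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table: rows are the dependent variable (yes / no),
# columns are the independent variable (male / female).
observed = [[40, 50],
            [60, 50]]
print(expected_frequencies(observed))
print(round(pearson_chi_square(observed), 3))
```

If the observed counts were identical to the expected counts, every term would be zero and Chi Square would be 0; the further apart they are, the larger it grows.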

Chi Square assumes that all the expected cell frequencies are at least five. We can see from the table that this is the case for this table. But we don’t have to get the expected frequencies to see this. Look back at the “Chi-Square Tests” table in your output and look at footnote a. It tells you that the smallest expected cell frequency is XXXXXXXXXX, so clearly all four expected cell frequencies are at least five. If it’s just a little bit below five, that’s no problem. But if it gets down to around three, you have a problem. What you’ll have to do is combine rows or columns that have small marginal frequencies.

For example, run the crosstabulation of d5_sex and d9_sibs, which is the number of brothers and sisters that the respondent has. [4] The minimum expected frequency is so small that it rounds to XXXXXXXXXX. That’s because there are only a few respondents with more than XXXXXXXXXX siblings. You will need to recode the number of siblings into fewer categories, making sure that you don’t have any categories with a really small number of cases.
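Here is a minimal sketch of that recoding step in Python, assuming made-up counts rather than the GSS data. The cut points in `recode_siblings` are hypothetical; in practice you would choose them by looking at your own frequency distribution. The sketch checks the smallest expected frequency before and after combining rows.

```python
from collections import defaultdict


def recode_siblings(n_sibs: int) -> str:
    """Collapse a count of siblings into four broader (hypothetical) categories."""
    if n_sibs <= 1:
        return "0-1"
    if n_sibs <= 3:
        return "2-3"
    if n_sibs <= 5:
        return "4-5"
    return "6 or more"


def collapse_rows(table, recode):
    """Add together the [male, female] rows that recode to the same category."""
    collapsed = defaultdict(lambda: [0, 0])
    for value, (male, female) in table.items():
        cell = collapsed[recode(value)]
        cell[0] += male
        cell[1] += female
    return dict(collapsed)


def min_expected(rows):
    """Smallest expected cell count: row total * column total / grand total."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)


# Hypothetical counts (not GSS data): {number of siblings: [male, female]}.
by_sibs = {0: [10, 12], 1: [25, 27], 2: [30, 31], 3: [20, 22], 4: [12, 14],
           5: [8, 9], 6: [5, 6], 9: [1, 2], 12: [1, 0]}

recoded = collapse_rows(by_sibs, recode_siblings)
print(round(min_expected(list(by_sibs.values())), 2))   # well below five
print(round(min_expected(list(recoded.values())), 2))   # at least five
print(recoded)
```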

Part VI – Now it’s Your Turn Again

Look back at the two tables you ran in Part IV and see if any of your expected frequencies were less than five. What does that tell you?


[1] The null hypothesis is often called the hypothesis of no difference. We’re saying that there is no relationship between these two variables. In other words, there’s nothing there.

[2] Unfortunately, there is no way to tell SPSS to give us just the “Pearson Chi-Square.”

[3] What do we mean by two-tailed? We’re not predicting the direction of the relationship. We’re not predicting that men are more likely to think abortion should be legal or that women are more likely. So it’s a two-tailed test.

[4] Number of siblings is a ratio-level variable. You can use Chi Square with ratio-level variables, but usually there are better tests. We’re just using this as an example.


STAT13S: Exercise Using SPSS to Explore Correlation

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses CORRELATE and COMPARE MEANS in SPSS to explore correlation. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's Website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to introduce measures of correlation. The exercise also gives you practice using CORRELATE and COMPARE MEANS in SPSS.

Part I – Scatterplots

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.


In a previous exercise (STAT11S) we considered different measures of association that can be used to determine the strength of the relationship between two variables that have nominal or ordinal level measurement (see STAT1S). In this exercise we’re going to look at two different measures that are appropriate when at least one of the variables is interval or ratio level. The terminology also changes in the sense that we’ll refer to these measures as correlations rather than measures of association.

Before we look at these measures, let’s talk about a type of graph called a scatterplot, which is used to display the relationship between two variables. SPSS refers to it as a Scatter/Dot chart. Click on GRAPH in the menu bar at the top of the SPSS screen. Click on “Chart Builder” in the dropdown menu. A dialog box will open that asks you to define the level of measurement for each variable and to provide labels for the values. Click on “OK” since that has been done for you. In the bottom half of the dialog box, the “Gallery” tab should be selected by default. On the left you can choose the type of graph you want to build. Look down the list and click on “Scatter/Dot.” There are eight different scatterplots that SPSS can create. If you point your mouse at each of them, you will see a label for that scatterplot. The one on the upper left is called a “Simple Scatter.” Click and drag the icon up to the large box in the upper right of the dialog box. Now all you have to do is click and drag the variables you want to the X-Axis and Y-Axis. If you want to treat one of these variables as independent, put that variable on the X-Axis and the dependent variable on the Y-Axis. So that all our scatterplots look the same, let’s put d22_maeduc on the X-Axis and d24_paeduc on the Y-Axis. Click “OK” and SPSS will display your graph.

Now let’s look for the general pattern in our scatterplot. You see more cases in the upper right and lower left of the plot and fewer cases in the upper left and lower right. In general, as one of the variables increases, the other variable tends to increase as well. Moreover, you can imagine drawing a straight line that represents this relationship. The line would start in the lower left and continue towards the upper right of the plot. That’s what we call a positive linear relationship. [1] But how strong is the relationship, and where exactly would you draw the straight line? The Pearson Correlation Coefficient will tell us the strength of the linear relationship, and linear regression will show us the straight line that best fits the data points. We’ll talk about the Pearson Correlation Coefficient in Part III of this exercise and linear regression in exercise STAT14S.

Part II – Now it’s Your Turn

Use GRAPH in SPSS to create the scatterplot for the years of school completed by the respondent (d4_educ) and the spouse’s years of school completed (d29_speduc). So that all our plots look the same, put d29_speduc on the X-Axis and d4_educ on the Y-Axis. Look at your scatterplot and decide if it has a pattern. What is that pattern? Do you think it is a linear relationship? Is it a positive linear or a negative linear relationship?

Part III – Pearson Correlation Coefficient

The Pearson Correlation Coefficient (r) is a numerical value that tells us how strongly related two variables are. It varies between XXXXXXXXXX and XXXXXXXXXX. The sign indicates the direction of the relationship. A positive value means that as one variable increases, the other variable also increases, while a negative value means that as one variable increases, the other variable decreases. The closer the value is to +1 or -1, the stronger the linear relationship, and the closer it is to 0, the weaker the linear relationship.
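The definition can be made concrete with a short Python sketch. This is a minimal illustration, not how SPSS computes r internally; the function name `pearson_r` and the five pairs of schooling values are hypothetical. It shows that reversing one variable flips the sign of r without changing its size.

```python
import math


def pearson_r(x, y):
    """Pearson r: sum of cross-products of deviations from the means, divided
    by the square root of the product of the two sums of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical years of school for five respondents' mothers and fathers
# (made-up numbers, not GSS data).
mother = [8, 12, 12, 14, 16]
father = [10, 12, 11, 16, 18]
print(round(pearson_r(mother, father), 2))

# Reversing one variable flips the sign of r but not its strength.
print(round(pearson_r(mother, [-v for v in father]), 2))
```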

The usual way to interpret the Pearson Coefficient is to square its value. In other words, if r equals .5, then we square XXXXXXXXXX, which gives us XXXXXXXXXX. This is often called the Coefficient of Determination. This means that one of the variables explains 25% of the variation of the other variable. Since the Pearson Correlation is a symmetric measure, in the sense that neither variable is designated as independent or dependent, we could say that 25% of the variation in the first variable is explained by the second variable, or reverse this and say that 25% of the variation in the second variable is explained by the first variable. It’s important not to read causality into this statement. We’re not saying that one variable causes the other variable. We’re just saying that 25% of the variation in one of the variables can be accounted for by the other variable.
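The arithmetic is easy to do by hand, but a short Python sketch makes two points explicit: squaring turns r = .5 into .25 (25%), and the sign of r drops out, so r = -.5 accounts for just as much variation as r = .5. The function name is hypothetical.

```python
def coefficient_of_determination(r: float) -> float:
    """Square Pearson r to get the proportion of variation accounted for."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson r must lie between -1 and +1")
    return r * r


for r in (0.5, -0.5, 0.9):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.1f}: r squared = {r_squared:.2f}, "
          f"or {r_squared:.0%} of the variation")
```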

The Pearson Correlation Coefficient assumes that the relationship between the two variables is linear. This means that the relationship can be represented by a straight line. In geometric terms, this means that the slope of the line is the same for every point on that line. Here are some examples of a positive and a negative linear relationship and an example of the lack of any relationship.

Pearson r would be positive and close to 1 in the left-hand example, negative and close to XXXXXXXXXX in the middle example, and closer to 0 in the right-hand example. You can search for “free images of a positive linear relationship” to see more examples of linear relationships.

But what if the relationship is not linear? Search for “free images of a curvilinear relationship” and you’ll see examples that look like this.

Here the relationship can’t be represented by a straight line. We would need a line with a bend in it to capture this relationship. While there clearly is a relationship between these two variables, Pearson r would be closer to XXXXXXXXXX. Pearson r does not measure the strength of a curvilinear relationship; it only measures the strength of linear relationships.
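A small Python example shows how extreme this can be. In the hypothetical U-shaped data below, y is completely determined by x (y is x squared), yet Pearson r is exactly zero because the rising and falling halves cancel out. The sketch defines its own compact `pearson_r` (a hypothetical helper, no error checks) so it runs on its own.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviations about the means (no error checks)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical U-shaped data: a perfect curvilinear relationship.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
print(pearson_r(x, y))  # 0.0
```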

Another way to think of correlation is to say that the Pearson Correlation Coefficient measures the fit of the line to the data points. If r were equal to +1, then all the data points would fall on a line that has a positive slope (i.e., starts in the lower left and ends in the upper right). If r were equal to -1, then all the data points would fall on a line that has a negative slope (i.e., starts in the upper left and ends in the lower right). (See the diagram above.)

Let’s get the Pearson Coefficient for the two variables in our scatterplot in Part XXXXXXXXXX (see Chapter 7, Correlation, in the online SPSS book mentioned on page XXXXXXXXXX). Click on Analyze in the menu bar and then click on CORRELATE. In the dropdown menu, click on “Bivariate.” Bivariate just means that you want to compute a correlation for two variables, here d22_maeduc and d24_paeduc. Move these two variables into the “Variable(s)” box. Make sure that the box for the Pearson Correlation Coefficient is checked, which it should be since this is the default. Notice that the circle for “Two-tailed” is filled in under “Test of Significance.” A two-tailed significance test is used when you don’t make any prediction as to whether the relationship is positive or negative. In our case, we would expect the relationship to be greater than zero (i.e., positive), so we would want to use a one-tailed test. Click on the circle for “One-tailed” to change the selection. Notice also that “Flag significant correlations” is checked. That means that SPSS will tell you when a relationship is statistically significant. Now click “OK” and SPSS will display your correlation coefficient.
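SPSS does this conversion for you when you choose “One-tailed,” but the relationship between the two significance values can be sketched in Python. The function name and example values are hypothetical, and the halving rule assumes a symmetric test statistic, such as the t statistic used to test r. If r falls in the predicted direction, the one-tailed value is half the two-tailed value; if it falls in the opposite direction, it is 1 minus that half.

```python
def one_tailed_p(two_tailed_p: float, r: float, predicted_positive: bool = True) -> float:
    """Convert a two-tailed significance value for r into a one-tailed one.

    Assumes a symmetric test statistic. The result is small only when r
    falls in the predicted direction.
    """
    in_predicted_direction = (r > 0) == predicted_positive
    half = two_tailed_p / 2.0
    return half if in_predicted_direction else 1.0 - half


# We predicted a positive relationship and observed a positive r.
print(one_tailed_p(0.08, r=0.15))
# We predicted a positive relationship but observed a negative r.
print(one_tailed_p(0.08, r=-0.15))
```

In this hypothetical case, the two-tailed value of .08 is not significant at .05, but the one-tailed value of .04 is, which is why a directional prediction should be made before looking at the data.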

You should see four correlations. The correlations in the upper left and lower right will be 1, since the correlation of any variable with itself will always be XXXXXXXXXX. The correlations in the upper right and lower left will both be XXXXXXXXXX. That’s because the correlation of variable X with variable Y is the same as the correlation of variable Y with variable X. Pearson r is a symmetric measure (see STAT11S), meaning that we don’t designate one of the variables as the dependent variable and the other as the independent variable. Notice that the Pearson r is statistically significant using a one-tailed test at the XXXXXXXXXX level of significance. A Pearson r of XXXXXXXXXX is quite large. You don’t see r’s that big very often. That’s telling us that the linear regression line that we’re going to talk about in STAT14S fits the data points reasonably well.

Part IV – Now it’s Your Turn Again

Use CORRELATE in SPSS to get the Pearson Correlation Coefficient for the years of school completed by the respondent (d4_educ) and the spouse’s years of school completed (d29_speduc). What does this Pearson Correlation Coefficient tell you about the relationship between these two variables?

Part V – Correlation Matrices

What if you wanted to see the values of r for a set of variables? Let’s think of the four variables in Parts I through IV as a set. That means that we want to see the value of r for each pair of variables. This time move all four of the variables into the “Variable(s)” box (i.e., d4_educ, d22_maeduc, d24_paeduc, and d29_speduc) and click on “OK.” With four variables there are six distinct pairs, so SPSS will calculate six coefficients. (Make sure you can list all six.)
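One way to check your list is to let Python enumerate the pairs with `itertools.combinations`. This sketch only lists variable names; it does not compute any correlations. Order within a pair doesn’t matter because r is symmetric, so k variables give k(k - 1)/2 distinct coefficients.

```python
from itertools import combinations

variables = ["d4_educ", "d22_maeduc", "d24_paeduc", "d29_speduc"]

# Every unordered pair of distinct variables.
pairs = list(combinations(variables, 2))
for first, second in pairs:
    print(first, "with", second)

k = len(variables)
print(len(pairs), "pairs; k * (k - 1) / 2 =", k * (k - 1) // 2)
```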

What did we learn from these correlations? First, the correlation of any variable with itself is XXXXXXXXXX. Second, the correlations above the 1’s are the same as the correlations below the 1’s. They’re just mirror images of each other. That’s because r is a symmetric measure. Third, all the correlations are fairly large. Fourth, the largest correlations are between father’s and mother’s education and between the respondent’s education and the spouse’s education.
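Both patterns can be seen in a correlation matrix built from made-up data for three hypothetical variables. In this sketch the diagonal comes out as 1 (up to floating-point rounding) and the matrix is its own mirror image, because r(X, Y) and r(Y, X) are computed from the same cross-products. The helper names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviations about the means (no error checks)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Square matrix holding r for every pair of columns, including each with itself."""
    return [[pearson_r(a, b) for b in columns] for a in columns]


# Hypothetical data: three made-up variables measured on the same five cases.
data = [
    [8, 12, 12, 14, 16],
    [10, 12, 11, 16, 18],
    [12, 12, 13, 15, 14],
]
matrix = correlation_matrix(data)
for row in matrix:
    print(" ".join(f"{value:6.3f}" for value in row))
```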

Part VI – The Correlation Ratio or Eta-Squared

The Pearson Correlation Coefficient assumes that both variables are interval or ratio variables (see STAT1S). But what if one of the variables was nominal or ordinal and the other variable was interval or ratio? This leads us back to one-way analysis of variance, which we discussed in exercise STAT8S. Click on “Analyze” in the menu bar and then on “Compare Means” and finally on “Means.” (See Chapter 6, One-Way Analysis of Variance, in the online SPSS book mentioned on page XXXXXXXXXX.) Select the variable tv1_tvhours and move it to the “Dependent List” box. This is the variable for which you are going to compute means. Then select the variable d3_degree and move it to the “Independent List” box. Notice that we’re using our independent variable to predict our dependent variable. Now click on “Options” in the upper-right corner and then check the “Anova table and eta” box. Finally, click on “Continue” and then on “OK.”

The F test in the one-way analysis of variance tells us to reject the null hypothesis that all the population means are equal. So we know that at least one pair of population means is not equal. But that doesn’t tell us how strongly related these two variables are. The SPSS output tells us that eta is equal to XXXXXXXXXX and eta-squared is equal to XXXXXXXXXX. This tells us that 5.1% of the variation in the dependent variable, number of hours the respondent watches television, can be explained or accounted for by the independent variable, highest educational degree. This doesn’t seem like much, but it’s not an atypical outcome for many research findings.
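Eta-squared is the between-groups sum of squares divided by the total sum of squares, the same quantities that go into the ANOVA table. The Python sketch below computes it by hand for hypothetical hours of television in three made-up degree groups; it is an illustration, not SPSS output, and the group labels in the comments are only examples. Eta is the square root of eta-squared.

```python
def eta_squared(groups):
    """Between-groups sum of squares divided by the total sum of squares."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Hypothetical hours of TV per day in three degree categories (not GSS data).
groups = [
    [4, 5, 3, 6],   # e.g., less than high school
    [3, 4, 2, 3],   # e.g., high school
    [2, 1, 3, 2],   # e.g., graduate
]
eta_sq = eta_squared(groups)
print(f"eta-squared = {eta_sq:.3f} ({eta_sq:.1%} of the variation), "
      f"eta = {eta_sq ** 0.5:.3f}")
```

If every group had the same mean, the between-groups sum of squares, and therefore eta-squared, would be zero.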

Part VII – Your Turn

In Exercise STAT8S you computed the mean number of hours that respondents watched television (tv1_tvhours) for each of the nine regions of the country (d25_region). Then you determined if these differences were statistically significant by carrying out a one-way analysis of variance. Repeat the one-way analysis of variance, but this time focus on eta-squared. What percent of the variation in television viewing can be explained by the region of the country in which the respondent lived?

[1] This assumes that the variables are coded low to high (or high to low) on both the X-Axis and the Y-Axis.



STAT1S: Exercise Using SPSS to Explore Levels of Measurement

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses FREQUENCIES in SPSS to introduce the concept of levels of measurement (nominal, ordinal, interval, and ratio measures). A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's Website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to explore the concept of levels of measurement (nominal, ordinal, interval, and ratio measures), which is an important consideration for the use of statistics. The exercise also gives you practice in using FREQUENCIES in SPSS.

Part I—Introduction to Levels of Measurement

We use concepts all the time. We all know what a book is. But when we use the word “book” we’re not talking about a particular book that we’re reading. We’re talking about books in general. In other words, we’re talking about the concept to which we have given the name “book.” There are many different types of books – paperback, hardback, small, large, short, long, and so on. But they all have one thing in common – they all belong to the category “book.”

Let’s look at another example. Religiosity is a concept which refers to the degree of attachment that individuals have to their religious preference. It’s different from religious preference, which refers to the religion with which they identify. Some people say they are Lutheran; others say they are Roman Catholic; still others say they are Muslim; and others say they have no religious preference. Religiosity and religious preference are both concepts.

A concept is an abstract idea. So there are the abstract ideas of book, religiosity, religious preference, and many others. Since concepts are abstract ideas and not directly observable, we must select measures or indicants of these concepts. Religiosity can be measured in a number of different ways – how often people attend church, how often they pray, and how important they say their religion is to them.

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.

The GSS is an example of a social survey. The investigators selected a sample from the population of all adults in the United States. This particular survey was conducted in XXXXXXXXXX and is a relatively large sample of approximately 2,500 adults. In a survey we ask respondents questions and use their answers as data for our analysis. The answers to these questions are used as measures of various concepts. In the language of survey research these measures are typically referred to as variables. Often we want to describe respondents in terms of social characteristics such as marital status, education, and age. These are all variables in the GSS.

These measures are often classified in terms of their levels of measurement. S. S. Stevens described measures as falling into one of four categories – nominal, ordinal, interval, or ratio. [1] Here’s a brief description of each level.

A nominal measure is one in which objects (i.e., in our survey, the respondents) are sorted into a set of categories which are qualitatively different from each other. For example, we could classify individuals by their marital status. Individuals could be married or widowed or divorced or separated or never married. Our categories should be mutually exclusive and exhaustive. Mutually exclusive means that every individual can be sorted into one and only one category. Exhaustive means that every individual can be sorted into a category. We wouldn’t want to use “single” as one of our categories because some people who are single can also be divorced and therefore could be sorted into more than one category. We wouldn’t want to leave widowed off our list of categories because then we wouldn’t have any place to sort these individuals.
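The two requirements can be checked mechanically. In this Python sketch, each proposed category is a set of marital statuses; a status that fits no category breaks exhaustiveness, and a status that fits more than one breaks mutual exclusivity. The function and the flawed schemes are hypothetical illustrations, not GSS coding rules, and the sketch assumes “single” would cover widowed, divorced, and never-married people.

```python
def check_categories(categories, statuses):
    """Return (unplaced, overlapping) for a proposed set of categories.

    `categories` maps each category label to the set of statuses it covers.
    The scheme is exhaustive if `unplaced` is empty and mutually exclusive
    if `overlapping` is empty.
    """
    unplaced, overlapping = [], []
    for status in statuses:
        matches = [label for label, covered in categories.items() if status in covered]
        if not matches:
            unplaced.append(status)
        elif len(matches) > 1:
            overlapping.append(status)
    return unplaced, overlapping


statuses = ["married", "widowed", "divorced", "separated", "never married"]

# One category per status: exhaustive and mutually exclusive.
good = {status: {status} for status in statuses}

# Adding "single" puts some people into two categories.
with_single = dict(good, single={"widowed", "divorced", "never married"})

# Leaving widowed off gives widowed respondents nowhere to go.
without_widowed = {label: covered for label, covered in good.items() if label != "widowed"}

print(check_categories(good, statuses))
print(check_categories(with_single, statuses))
print(check_categories(without_widowed, statuses))
```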

The categories in a nominal level measure have no inherent order to them. This means that it wouldn’t matter how we ordered the categories; they could be arranged in any number of different ways. Run FREQUENCIES in SPSS for the variable d10_marital so you can see the frequency distribution for a nominal level variable. (See Frequencies in Chapter 4 of the SPSS online book mentioned on page XXXXXXXXXX.) It wouldn’t matter how we ordered these categories.

An ordinal measure is a nominal measure in which the categories are ordered from low to high or from high to low. We could classify individuals in terms of the highest educational degree they achieved. Some individuals did not complete high school; others graduated from high school but didn’t go on to college. Other individuals completed a two-year junior college degree but then stopped college. Still others completed their bachelor’s degree, and others went on to graduate work and completed a master’s degree or their doctorate. These categories are ordered from low to high.

But notice that while the categories are ordered, they lack an equal unit of measurement. That means, for example, that the differences between categories are not necessarily equal. Run FREQUENCIES in SPSS for d3_degree. Look at the categories. The GSS assigned values (i.e., numbers) to these categories in the following way:

● 0 = less than high school,

● 1 = high school degree,

● 2 = junior college,

● 3 = bachelors, and

● 4 = graduate.

The difference in education between the first two categories is not the same as the difference between the last two categories. We might think they are the same because 0 minus 1 is equal to 3 minus 4, but this is misleading. These aren’t really numbers. They’re just symbols that we have used to represent these categories. We could just as well have labeled them a, b, c, d, and e. They don’t have the properties of real numbers; they can’t be added, subtracted, multiplied, or divided. All we can say is that b is greater than a and that c is greater than b and so on.
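A short Python sketch illustrates why arithmetic on these codes is misleading. Any other set of codes that keeps the same order is an equally valid way to label an ordinal variable. Relabeling leaves the median category unchanged but changes the “mean,” because the mean depends on the arbitrary spacing between codes. The responses and the replacement codes are hypothetical; `median_low` is used so the median is always an actual category.

```python
from statistics import mean, median_low

# Hypothetical d3_degree responses for nine respondents, using the GSS codes
# 0 = less than high school ... 4 = graduate.
responses = [0, 1, 1, 1, 2, 3, 3, 4, 4]

# An arbitrary, order-preserving set of replacement codes.
relabel = {0: 0, 1: 1, 2: 5, 3: 6, 4: 20}
relabeled = [relabel[r] for r in responses]

# The median category is the same under both labelings...
print(median_low(responses), relabel[median_low(responses)] == median_low(relabeled))
# ...but the "mean" changes, because it depends on the arbitrary spacing.
print(round(mean(responses), 2), round(mean(relabeled), 2))
```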

An interval measure is an ordinal measure with equal units of measurement. For example, consider temperature measured in degrees Fahrenheit. Now we have equal units of measurement – degrees Fahrenheit. The difference between XXXXXXXXXX degrees and XXXXXXXXXX degrees is the same as the difference between XXXXXXXXXX degrees and XXXXXXXXXX degrees. Now the numbers have the properties of real numbers, and we can add them and subtract them. But notice one thing about the Fahrenheit scale. There is no absolute zero point. There can be both positive and negative temperatures. That means that we can’t compare values by taking their ratios. For example, we can’t divide 80 degrees Fahrenheit by XXXXXXXXXX degrees and conclude that XXXXXXXXXX is twice as hot as XXXXXXXXXX. To do that we would need a measure with an absolute zero. [2]
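A quick numerical check shows why the ratio fails. This sketch compares 80 degrees with 40 degrees Fahrenheit (40 is an arbitrary value chosen for illustration) and then converts both to Kelvin, a temperature scale that does have an absolute zero. On the Fahrenheit scale the ratio looks like 2, but on the Kelvin scale 80 degrees is only about 8% warmer than 40 degrees.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert Fahrenheit to Kelvin, a scale with an absolute zero."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


hot, cool = 80.0, 40.0   # 40 degrees is an arbitrary comparison value
print(hot / cool)        # 2.0, but this ratio is meaningless
print(round(fahrenheit_to_kelvin(hot) / fahrenheit_to_kelvin(cool), 3))
```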

A ratio measure is an interval measure with an absolute zero point. Run FREQUENCIES for d9_sibs, which is the number of siblings. This variable has an absolute zero point and all the properties of nominal, ordinal, and interval measures and therefore is a ratio variable.

Notice that level of measurement is itself ordinal since it is ordered from low (nominal) to high (ratio). It’s what we call a cumulative scale. Each level of measurement adds something to the previous level.


Why is level of measurement important? One of the things that helps us decide which statistic to

use is the level of measurement of the variable(s) involved. For example, we might want to

describe the central tendency of a distribution. If the variable was nominal, we would use the

mode. If it was ordinal, we could use the mode or the median. If it was interval or ratio, we could

use the mode or median or mean. Central tendency will be the focus of another exercise

(Exercise STAT2S).
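Because level of measurement is a cumulative scale, the rule above can be written as a few comparisons. This is a minimal Python sketch; `Level` and `central_tendency_options` are hypothetical names used only for illustration and are not part of SPSS.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from low to high (a cumulative scale)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def central_tendency_options(level):
    """Return the measures of central tendency that suit a given level."""
    options = ["mode"]                # usable at every level
    if level >= Level.ORDINAL:
        options.append("median")      # requires ordered categories
    if level >= Level.INTERVAL:
        options.append("mean")        # requires equal units of measurement
    return options


for level in Level:
    print(level.name, central_tendency_options(level))
```

Each higher level inherits everything the lower levels allow, which is what makes the scale cumulative.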

Run FREQUENCIES for the following variables in the GSS:

● f4_satfin,

● f11_wealth,

● hap2_happy,

● p1_partyid,

● r1_relig,

● r4_denom,

● r8_reliten,

● s1_nummen,

● s2_numwomen,

● s9_premarsx, and

● d1_age.

For each variable, decide which level of measurement it represents and write a sentence or two

indicating why you think it is that level. Keep in mind that we’re only considering what SPSS calls

the valid responses. The missing responses represent missing data (e.g., don’t know or no answer

responses).

[1] Stanley Smith Stevens, 1946, “On the Theory of Scales of Measurement,” Science XXXXXXXXXX), pp. XXXXXXXXXX.

[2] You might wonder why we didn’t use an example from the GSS. There isn’t one. Interval variables that lack an absolute zero don’t occur in social science research very often. There are examples from the field of business. Think about profit for businesses over a fiscal year. There is no absolute zero. Profit could be positive or negative.


STAT2S: Exercise Using SPSS to Explore Measures of Central Tendency and Dispersion

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the XXXXXXXXXXGeneral Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses

FREQUENCIES in SPSS to explore measures of central tendency and dispersion. A good reference on

using SPSS is SPSS for Windows Version XXXXXXXXXXA Basic Tutorial by Linda Fiddler, John Korey, Edward

Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research

and Instructional Council's Website. You have permission to use this exercise and to revise it to fit your

needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are

more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax

file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional

information.

I’m attaching the following files.


● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore measures of central tendency (mode, median, and mean) and

dispersion (range, interquartile range, standard deviation, and variance). The exercise also gives you

practice in using FREQUENCIES in SPSS.

Part I – Measures of Central Tendency

Data analysis always starts with describing variables one at a time. Sometimes this is referred to as univariate (one-variable) analysis. Central tendency refers to the center of the distribution.


There are three commonly used measures of central tendency – the mode, median, and mean of a distribution. The mode is the most common value or values in a distribution. [1] The median is the middle value of a distribution. [2] The mean is the sum of all the values divided by the number of values.
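The three definitions can be checked on a small list of numbers with Python's standard `statistics` module. The ten sibling counts below are hypothetical, chosen only for illustration; they are not the GSS responses, and unlike SPSS with this data set, the sketch applies no survey weights.

```python
import statistics

# Hypothetical numbers of siblings reported by ten respondents (not GSS data).
siblings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 10]

modes = statistics.multimode(siblings)   # most common value(s): [2]
median = statistics.median(siblings)     # middle value: average of the 5th and 6th sorted values
mean = statistics.mean(siblings)         # sum of values / number of values = 28 / 10

print(modes, median, mean)               # [2] 2.0 2.8
```

`multimode` returns a list because a distribution can have more than one most common value.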

Run FREQUENCIES in SPSS for the variable d9_sibs. (See Chapter 4, Frequencies, in the online SPSS book mentioned on page XXXXXXXXXX.) Once you have selected this variable, click on the “Statistics” button and check the boxes for mode, median, and mean. Then click on “Continue” and click on the “Charts” button. Select “Histogram” and check the box for “Show normal curve on histograms.” Then click on “Continue.” That will take you back to the screen where you selected the variable. Click on “OK” and SPSS will open the Output window and display the results that you requested.

Your output will display the frequency distribution for d9_sibs and a box showing the mode, median, and

mean with the following values displayed.

● Mode = 2, meaning that two brothers and sisters was the most common answer (XXXXXXXXXX%) among the 2,531 respondents who answered this question. However, not far behind are those with one sibling (18.6%) and those with three siblings (XXXXXXXXXX%). So while technically two siblings is the mode, what you really found is that the most common values are one, two, and three siblings. Another part of your output is the histogram, which is a chart or graph of the frequency distribution. The histogram clearly shows that one, two, and three are the most common values (i.e., the highest bars in the histogram). So we would want to report that these three categories are the most common responses.

● Median = 3, which means that three siblings is the middle category in this distribution. The middle category is the category that contains the 50th percentile, which is the value that divides the distribution into two equal parts. In other words, it’s the value that has 50% of the cases above it and 50% of the cases below it. The cumulative percent column of the frequency distribution tells you that 41.4% of the cases have two or fewer siblings and that 59.3% of the cases have three or fewer siblings. So the middle case (i.e., the 50th percentile) falls somewhere in the category of three siblings. That is the median category.

● Mean = XXXXXXXXXXwhich is the sum of all the values in the distribution divided by the number of

responses. If you were to sum all these values that sum would be 9, XXXXXXXXXXDividing that by the

number of responses or 2,531 will give you the mean of 3.74.
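Reading the median category off the cumulative percent column, and building the mean from a frequency distribution, can both be sketched in Python. The frequency table is hypothetical (it is not the GSS distribution of d9_sibs), the counts are unweighted, and `median_category` and `frequency_mean` are illustrative names, not SPSS functions.

```python
def median_category(frequencies):
    """Return the category containing the 50th percentile.

    `frequencies` maps each ordered category (e.g., a number of siblings)
    to its count. The median category is the first one whose cumulative
    percent reaches or passes 50%.
    """
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("frequencies must contain at least one case")
    cumulative = 0
    for category in sorted(frequencies):
        cumulative += frequencies[category]
        if 100 * cumulative / total >= 50:
            return category


def frequency_mean(frequencies):
    """Sum of (value * count) over all categories, divided by the number of cases."""
    total = sum(frequencies.values())
    return sum(value * count for value, count in frequencies.items()) / total


# Hypothetical frequency table: number of siblings -> number of respondents.
table = {0: 5, 1: 15, 2: 20, 3: 20, 4: 20, 5: 20}
print(median_category(table))   # 3  (cumulative percents: 5, 20, 40, 60, 80, 100)
print(frequency_mean(table))    # 2.95
```

As in the d9_sibs example, the cumulative percent is below 50 at two siblings and above 50 at three, so three is the median category.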

Part II – Deciding Which Measure of Central Tendency to Use

The first thing to consider is the level of measurement (nominal, ordinal, interval, ratio) of your variable (see

Exercise STAT1S).

● If the variable is nominal, you have only one choice. You must use the mode.

● If the variable is ordinal, you could use the mode or the median. You should report both measures

of central tendency since they tell you different things about the distribution. The mode tells you

the most common value or values while the median tells you where the middle of the distribution

lies.

● If the variable is interval or ratio, you could use the mode or the median or the mean. Now it gets a

little more complicated. There are several things to consider.

○ How skewed is your distribution? [3] Go back and look at the histogram for d9_sibs. Notice

that there is a long tail to the right of the distribution. Most of the values are at the lower

level – one, two, and three siblings. But there are quite a few respondents who report


having four or more siblings and about 5% said they have ten or more siblings. That’s

what we call a positively skewed distribution where there is a long tail towards the right or

the positive direction. Now look at the median and mean. The mean XXXXXXXXXXis larger than

the median XXXXXXXXXXThe respondents with lots of siblings pull the mean up. That’s what

happens in a skewed distribution. The mean is pulled in the direction of the skew. The

opposite would happen in a negatively skewed distribution. The long tail would be towards

the left and the mean would be lower than the median. In a heavily skewed distribution the

mean is distorted and pulled considerably in the direction of the skew. So consider

reporting only the median in a heavily skewed distribution. That’s why you almost always

see median income reported and not mean income. Imagine what would happen if your

sample happened to include Bill Gates. The income distribution would have this very, very

large value which would pull the mean up but not affect the median.

○ Is there more than one clearly defined peak in your distribution? The number of siblings

has one clearly defined peak – one, two and three siblings. But what if there is more than

one clearly defined peak? For example, consider a hypothetical distribution of XXXXXXXXXXcases

in which there are XXXXXXXXXXcases with a value of two and fifty cases with a value of XXXXXXXXXXThe

median and mean would be five but there are really two centers of this distribution – two

and eight. The median and the mean aren’t telling the correct story about the center.

You’re better off reporting the two clearly defined peaks of this distribution and not

reporting the median and mean.

○ If your distribution is normal in appearance then the mode, median, and mean will all be

about the same. A normal distribution is a perfectly symmetrical distribution with a single

peak in the center. No empirical distribution is perfectly normal but distributions often are

approximately normal. Here we would report all three measures of central tendency. Go

back to your SPSS output and look at the histogram for d9_sibs. When you told SPSS to

give you the histogram you checked the box that said “Show normal curve on histograms.”

SPSS then superimposed the normal curve on the histogram. The normal curve doesn’t fit the histogram perfectly, particularly at the lower end, but it does suggest that the distribution approximates a normal curve, particularly at the upper end.
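The first two considerations, skew and more than one peak, can be demonstrated with a short Python sketch. All numbers are hypothetical. The incomes imitate the Bill Gates example, and the bimodal list assumes fifty cases at two and fifty at eight, which is consistent with the stated median and mean of five.

```python
import statistics

# Hypothetical incomes (dollars) for five respondents.
incomes = [30_000, 40_000, 50_000, 60_000, 70_000]
print(statistics.median(incomes), statistics.mean(incomes))   # 50000 50000

# Add one extremely large income, as in the Bill Gates example.
with_outlier = incomes + [1_000_000_000]
print(statistics.median(with_outlier))   # 55000.0: the median barely moves
print(statistics.mean(with_outlier))     # about 166.7 million: pulled toward the skew

# Two clearly defined peaks (assumed: fifty cases at 2 and fifty cases at 8).
bimodal = [2] * 50 + [8] * 50
print(statistics.median(bimodal), statistics.mean(bimodal))   # 5.0 5
print(statistics.multimode(bimodal))                          # [2, 8]
```

In the bimodal case no respondent actually has the value five, which is why the two peaks tell a better story than the median and mean.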

Run FREQUENCIES for the following variables. Once you have selected the variables click on the

“Statistics” button and check the boxes for mode, median, and mean. Then click on “Continue” and click

on the “Charts” button. Select “Histogram” and check the box for “Show normal curve on histograms.”

Then click on “Continue.” That will take you back to the screen where you selected the variables. Click on

“OK” and SPSS will open the Output window and display the results of what you requested. For each

variable write a sentence or two indicating which measure(s) of central tendency would be appropriate to

use to describe the center of the distribution and what the values of those statistics mean.

● hap2_happy

● p1_partyid

● r8_reliten

● s1_nummen

● s2_numwomen

● d1_age

Part III – Measures of Dispersion or Variation

Dispersion or variation refers to the degree that values in a distribution are spread out or dispersed. The

measures of dispersion that we’re going to discuss are appropriate for interval and ratio level variables (see

Exercise STAT1S). [4] We’re going to discuss four such measures – the range, the inter-quartile range, the

variance, and the standard deviation.

The range is the difference between the highest and the lowest values in the distribution. Run

FREQUENCIES for d1_age and compute the range by looking at the frequency distribution. You can also

ask SPSS to compute it for you. Click on “Statistics” and then click on “Range.” You should get XXXXXXXXXXwhich

is 89 – XXXXXXXXXXThe range is not a very stable measure since it depends on the two most extreme values – the

highest and lowest values. These are the values most likely to change from sample to sample.

A more stable measure of dispersion is the interquartile range which is the difference between the third

quartile (Q3) and the first quartile (Q1). The third quartile is the same thing as the seventy-fifth percentile

which is the value that has 25% of the cases above it and 75% of the cases below it. The first quartile is

the same as the twenty-fifth percentile which is the value that has 75% of the cases above it and 25% of

the cases below it. SPSS will calculate Q3 and Q1 for you. Click on the “Statistics” button and then click

on “Quartiles” in the “Percentiles” box in the upper left. Once you know Q3 and Q1 you can calculate the

interquartile range by subtracting Q1 from Q3. Since it’s not based on the most extreme values it will be

more stable from sample to sample. Go back to SPSS and calculate Q3 and Q1 for d1_age and then

calculate the interquartile range. Q3 will equal XXXXXXXXXXand Q1 will equal XXXXXXXXXXand the interquartile range will equal

60 – XXXXXXXXXXor 27.
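Here is a minimal Python sketch of the range and the interquartile range. The ages are hypothetical, and `range_and_iqr` is an illustrative name. Quartiles can be computed with slightly different interpolation rules; the sketch uses the `(n + 1) * p` position rule (`method="exclusive"` in Python's `statistics.quantiles`), which may not match SPSS exactly, especially with weighted data.

```python
import statistics


def range_and_iqr(values):
    """Return (range, Q1, Q3, IQR) for a list of numbers."""
    value_range = max(values) - min(values)
    # quantiles(..., n=4) returns the three cut points: Q1, the median, and Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return value_range, q1, q3, q3 - q1


ages = [18, 25, 33, 40, 47, 52, 60, 71, 89]   # hypothetical ages, not GSS data
print(range_and_iqr(ages))   # (71, 29.0, 65.5, 36.5)
```

Changing the single oldest age from 89 to, say, 99 would change the range but leave Q1, Q3, and the IQR unchanged, which is why the IQR is the more stable measure.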

The variance is the sum of the squared deviations from the mean divided by the number of cases minus 1

and the standard deviation is just the square root of the variance. Your instructor may want to go into more

detail on how to calculate the variance by hand. SPSS will also calculate it for you. Click on the “Statistics”

button and then click on “Variance” and on “Standard deviation.” The variance should equal XXXXXXXXXXand the

standard deviation will equal XXXXXXXXXX.

The variance and the standard deviation can never be negative. A value of 0 means that there is no

variation or dispersion at all in the distribution. All the values are the same. The more variation there is, the

larger the variance and standard deviation.
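The definition of the variance (squared deviations from the mean, divided by the number of cases minus 1) translates directly into Python. This sketch uses hypothetical values and illustrative function names. It cross-checks against the standard library's `statistics.variance`, which uses the same n − 1 divisor.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_sd(values):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


scores = [1, 2, 3, 4, 5]             # hypothetical values
print(sample_variance(scores))       # 2.5
print(round(sample_sd(scores), 3))   # 1.581
print(sample_variance([7, 7, 7]))    # 0.0: all values the same, no variation

# Cross-check against the standard library.
assert math.isclose(sample_variance(scores), statistics.variance(scores))
```

Because every deviation is squared, neither statistic can be negative.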

So what do the variance XXXXXXXXXXand the standard deviation XXXXXXXXXXof the age distribution mean? That’s hard to answer because you don’t have anything to compare them to. But if you knew the standard deviation for both men and women, you would be able to determine whether men or women have more variation. Because standard deviations tend to be larger when the values themselves are larger, comparing them directly can be misleading when the two groups have different means. So instead of comparing the standard deviations for men and women, you would compute a statistic called the Coefficient of Relative Variation (CRV). CRV is equal to the standard deviation divided by the mean of the distribution. A CRV of 2 means that the standard deviation is twice the mean and a CRV of 0.5 means that the standard deviation is one-half of the mean. You would compare the CRVs for men and women to see whether men or women have more variation relative to their respective means.
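A minimal Python sketch of the CRV, using two hypothetical groups (not GSS data) and an illustrative function name `crv`. Group B has twice the standard deviation of group A, but because its mean is also twice as large, the two groups have the same relative variation.

```python
import statistics


def crv(values):
    """Coefficient of Relative Variation: standard deviation divided by the mean.

    Most meaningful when the mean is positive, as with ratio-level counts or ages.
    """
    return statistics.stdev(values) / statistics.mean(values)


group_a = [20, 30, 40]   # hypothetical: mean 30, standard deviation 10
group_b = [40, 60, 80]   # hypothetical: mean 60, standard deviation 20

print(statistics.stdev(group_a), statistics.stdev(group_b))   # 10.0 20.0
print(round(crv(group_a), 3), round(crv(group_b), 3))          # 0.333 0.333
```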

You might also have wondered why you need both the variance and the standard deviation when the

standard deviation is just the square root of the variance. You’ll just have to take my word for it that you will

need both as you go further in statistics.

Run FREQUENCIES for the following variables. Once you have selected the variables click on the

“Statistics” button and check the boxes for quartiles, range, variance, standard deviation, and mean. Then

click on “Continue.” That will take you back to the screen where you selected the variables. Click on “OK”

and SPSS will open the Output window and display the results of what you requested. For each variable

write a sentence or two indicating what the values of these statistics are for each of the variables and what


the values of those statistics mean. Compare the relative variation for the number of male sex partners

since the age of XXXXXXXXXXs1_nummen) and the number of female sex partners (s2_numwomen) by comparing

the CRVs for each variable.

● s1_nummen

● s2_numwomen

● d9_sibs

[1] Frequency distributions can be grouped or ungrouped. Think of age. We could have a distribution that

lists all the ages in years of the respondents to our survey. One of the variables (d1_age) in our data set

does this. But we could also divide age into a series of categories such as under 30, XXXXXXXXXXto 39, XXXXXXXXXXto 49, 50

to 59, XXXXXXXXXXto 69, and XXXXXXXXXXand older. In a grouped frequency distribution the mode would be the most common

category or categories.

[2] In a grouped frequency distribution the median would be the category that contains the middle value.

[3] See Exercise STAT3S for a more thorough discussion of skewness.

[4] The Index of Qualitative Variation can be used to measure variation for nominal variables.


STAT3S: Exercise Using SPSS to Explore Measures of Skewness and Kurtosis

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the XXXXXXXXXXGeneral Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses

FREQUENCIES in SPSS to explore measures of skewness and kurtosis. A good reference on using

SPSS is SPSS for Windows Version XXXXXXXXXXA Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson

(Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and

Instructional Council's Website. You have permission to use this exercise and to revise it to fit your

needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are

more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax

file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional

information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore measures of skewness and kurtosis. The exercise also gives you

practice in using FREQUENCIES in SPSS.

Part I – Measures of Skewness

A normal distribution is a unimodal (i.e., single peak) distribution that is perfectly symmetrical. In a normal

distribution the mean, median, and mode are all equal. Here’s a graph showing what a normal distribution

looks like.


The horizontal axis is marked off in terms of standard scores where a standard score tells us how many

standard deviations a value is from the mean of the normal distribution. So a standard score of XXXXXXXXXXis one

standard deviation above the mean and a standard score of XXXXXXXXXXis one standard deviation below the mean.

The percents tell us the percent of cases that you would expect between the mean and a particular

standard score if the distribution was perfectly normal. You would expect to find approximately 34% of the

cases between the mean and a standard score of XXXXXXXXXXor XXXXXXXXXXIn a normal distribution, the mean, median, and

mode are all equal and are at the center of the distribution. So the mean always has a standard score of

zero.
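Standard scores and the "about 34%" figure can both be checked in Python. The mean of 50 and standard deviation of 10 below are hypothetical. `standard_score` is an illustrative name, while `NormalDist` is part of Python's standard `statistics` module (Python 3.8 and later).

```python
from statistics import NormalDist


def standard_score(value, mean, sd):
    """How many standard deviations `value` lies from the mean."""
    return (value - mean) / sd


# Hypothetical distribution with mean 50 and standard deviation 10.
print(standard_score(60, 50, 10))   # 1.0: one standard deviation above the mean
print(standard_score(40, 50, 10))   # -1.0: one standard deviation below the mean

# Share of a perfectly normal distribution between the mean (z = 0) and z = 1.
standard_normal = NormalDist(mu=0, sigma=1)
share = standard_normal.cdf(1) - standard_normal.cdf(0)
print(f"{share:.1%}")               # 34.1%
```

By symmetry, the same share lies between the mean and a standard score of −1.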

Skewness measures the deviation of a particular distribution from this symmetrical pattern. In a skewed

distribution one side has longer or fatter tails than the other side. If the longer tail is to the left, then it is

called a negatively skewed distribution. If the longer tail is to the right, then it is called a positively skewed

distribution. One way to remember this is to recall that any value to the left of zero is negative and any

value to the right of zero is positive. Here are graphs of positively and negatively skewed distributions

compared to a normal distribution.

The best way to determine the skewness of a distribution is to tell SPSS to give you a histogram along with the mean and median. SPSS will also compute a measure of skewness. Run FREQUENCIES in SPSS for the variables d1_age and d9_sibs. (See Frequencies in Chapter 4 of the online SPSS book mentioned on page XXXXXXXXXX.) Click on the “Charts” button and select “Histogram” and “Show normal curve on histograms.” Then click on “Continue.” Now click on “Statistics” and select mean, median, skewness, and kurtosis. Then click on “Continue” and on “OK.” We’ll talk about kurtosis in a little bit.

Notice that the mean is larger than the median for both variables. This suggests that both distributions are positively skewed. But also notice that, relative to the size of the values, the gap between the mean and the median is much larger for d9_sibs than for d1_age. This suggests that the distribution for d9_sibs is the more skewed of the two variables. Look at the histograms and you’ll see the same thing. Both variables are positively skewed but d9_sibs is the more skewed variable. Now look at the skewness values – XXXXXXXXXXfor d9_sibs and XXXXXXXXXXfor d1_age. The farther the skewness value is from zero, the more skewed the distribution. Positive skewness values indicate a positive skew and negative values indicate a negative skew. There are various rules of thumb for what constitutes a lot of skew, but for our purposes we’ll just say that the larger the absolute value, the greater the skewness, and the sign of the value indicates the direction of the skew.
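A skewness statistic can be computed from the third moment about the mean. The Python sketch below uses the adjusted Fisher-Pearson coefficient (often written G1). We believe this is the formula SPSS reports, but treat that as an assumption and compare with your own output. The function name `skewness` and the data are illustrative, and at least three values are required.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (G1); needs n >= 3.

    Positive -> longer tail to the right; negative -> longer tail to the left.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1    # small-sample adjustment


print(skewness([1, 2, 3, 4, 5]))              # 0.0: perfectly symmetrical
print(skewness([1, 1, 1, 2, 10]) > 0)         # True: long tail to the right
print(skewness([-10, -2, -1, -1, -1]) < 0)    # True: long tail to the left
```

Cubing the deviations keeps their signs, so a long tail on one side dominates the sum and sets the sign of the result.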

Run FREQUENCIES for the following variables. Tell SPSS to give you the histogram and to show the

normal curve on the histogram. Also ask for the mean, median, and skewness. Write a paragraph for each

variable explaining what these statistics tell you about the skewness of the variables.

● d20_hrsrelax

● tv1_tvhours

Part II – Measures of Kurtosis

Kurtosis refers to the flatness or peakedness of a distribution relative to that of a normal distribution.

Distributions that are flatter than a normal distribution are called platykurtic and distributions that are more

peaked are called leptokurtic.

SPSS will compute a kurtosis measure. Negative values indicate a platykurtic distribution and positive values indicate a leptokurtic distribution. The farther the kurtosis value is from zero, the flatter (if negative) or more peaked (if positive) the distribution is.

Look back at the output for d1_age and d9_sibs. For d1_age the kurtosis value was XXXXXXXXXXindicating a

flatter distribution and for d9_sibs kurtosis was XXXXXXXXXXindicating a more peaked distribution. To see this

visually look at your histograms.
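Kurtosis comes from the fourth moment about the mean. This Python sketch computes sample excess kurtosis (often written G2), which is near zero for data drawn from a normal distribution. We believe this matches the kurtosis statistic SPSS reports, but that is an assumption to check against your output. The function name and the two small data sets are illustrative, and at least four values are required.

```python
def excess_kurtosis(values):
    """Sample excess kurtosis (G2); needs n >= 4.

    Negative = flatter than normal (platykurtic);
    positive = more peaked, with heavier tails (leptokurtic).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    g2 = m4 / m2 ** 2 - 3                           # subtract 3 so a normal curve is ~0
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


flat = list(range(1, 11))      # values spread evenly from 1 to 10
peaked = [5] * 8 + [0, 10]     # most values at the center, a few far out
print(round(excess_kurtosis(flat), 3))     # -1.2
print(round(excess_kurtosis(peaked), 3))   # 4.5
```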

Run FREQUENCIES for the following variables. Tell SPSS to give you the histogram and to show the

normal curve on the histogram. Also ask for kurtosis. Write a paragraph for each variable explaining what

these statistics tell you about the kurtosis of the variables.

● d22_maeduc

● d24_paeduc

● s6_sexfreq

STAT4S: Exercise Using SPSS to Explore Graphs and Charts

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the XXXXXXXXXXGeneral Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses

FREQUENCIES and EXPLORE in SPSS to explore different ways of creating graphs and charts. A good

reference on using SPSS is SPSS for Windows Version XXXXXXXXXXA Basic Tutorial by Linda Fiddler, John Korey,

Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science

Research and Instructional Council's Website. You have permission to use this exercise and to revise

it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as

separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the

exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the

author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore different ways of graphing frequency distributions. The exercise also

gives you practice in using FREQUENCIES and EXPLORE in SPSS.

Part I – Pie Charts

A pie chart is a chart that shows the frequencies or percents of a variable with a small number of

categories. It is presented as a circle divided into a series of slices. The area of each slice is proportional

to the number of cases or the percent of cases in each category. It is normally used with nominal or ordinal

variables (see Exercise STAT1S) but can be used with interval or ratio variables which have a small

number of categories.
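Because each slice is proportional to its category's share of the cases, the percent and the angle of each slice follow from the frequencies. This is a minimal Python sketch with hypothetical counts and an illustrative function name `pie_slices`; it is not how SPSS draws the chart internally.

```python
def pie_slices(counts):
    """Return (category, percent, angle in degrees) for each slice.

    A slice's share of the full 360-degree circle, and so of the pie's
    area, equals the category's share of the cases.
    """
    total = sum(counts.values())
    slices = []
    for category, count in counts.items():
        share = count / total
        slices.append((category, round(100 * share, 1), round(360 * share, 1)))
    return slices


# Hypothetical counts for a three-category variable.
party = {"Democrat": 50, "Independent": 30, "Republican": 20}
for category, percent, angle in pie_slices(party):
    print(f"{category:12} {percent:5.1f}% {angle:6.1f} degrees")
```

With dozens of categories, as with d1_age, each slice would be only a few degrees wide, which is why pie charts suit variables with few categories.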

Run FREQUENCIES in SPSS for the variables p1_partyid, p4_polviews, and d12_childs. (See Chapter 4, Frequencies, in the online SPSS book mentioned on page XXXXXXXXXX.) Click on “Charts” and select “Pie charts.”


Notice that there is an option called “Chart Values” that allows you to select whether you want your chart to include “Percentages” or “Frequencies.” Usually you want to select “Percentages.”

Once SPSS has displayed the pie chart in the output window, you can double click anywhere inside the pie

chart to open the “Chart Editor.” Once you have opened the “Chart Editor” right-click anywhere inside one

of the pie slices in the “Chart Editor” and you will see a list of different ways you can edit your pie chart.

Click on “Show Data Labels” and then click on the “Data Value Labels” tab. If “Percent” is not listed in the

“Displayed” box, move it to that box and click on “Apply” and then “Close.” If it is listed in the “Displayed”

box, just click on “Close.” This will close the “Properties” box. Click anywhere outside the “Chart Editor” and

you will see your edited pie chart. There are lots of other ways you could edit your chart. Explore some of

them if you are curious.

If you are wondering why you shouldn’t use pie charts for variables with a large number of categories,

create a pie chart for d1_age and you’ll see why.

Part II – Bar Charts

A bar chart is a chart that shows the frequencies or percents of a variable and is presented as a series of

vertical bars that do not touch each other. The height of each bar is proportional to the number of cases or

the percent of cases in each category. It is normally used with nominal or ordinal variables.

Run FREQUENCIES for the variables p1_partyid and p4_polviews. This time click on “Charts” and select

“Bar charts.” Select “Percentages” to display percents in the chart.

Part III – Histograms

A histogram is a graph that shows the frequencies or percents of a variable with a larger number of

categories. It is presented as a series of vertical bars that touch each other. The height of each bar is

proportional to the number of cases or the percent of cases in each category. It is used with interval or

ratio variables.

Run FREQUENCIES for the variables d1_age, d4_educ, and d12_childs. [1] Click on “Charts” and select “Histogram.”

Look at the histogram for d1_age. Let’s say you want to redefine the width of each vertical bar.

Double-click anywhere inside the histogram which will open the “Chart Editor.” Now right click anywhere

inside the rectangles in the “Chart Editor” and click on “Properties Window.” This will open the “Properties”

box. Click on the tab for “Binning.” Click on “Custom” and “Interval width” under “X Axis.” Enter XXXXXXXXXXin the

“Interval width” box indicating that you want each vertical bar to represent an interval width of ten years.

Where do we want the first interval to start? We could let SPSS decide but let’s make the decision

ourselves. Click on “Custom value for anchor” and enter XXXXXXXXXXin the box. Click on “Apply” and look at your

histogram. Does it look how you want it to look? Is there any further editing you want to do? If you are

satisfied, click on “Close” to close the “Properties” box. Click anywhere outside the “Chart Editor” box and

you will see your edited histogram.
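The “Interval width” and “Custom value for anchor” settings decide which bar each case falls into. This Python sketch shows that binning rule with hypothetical ages. The anchor of 10 is an illustrative choice and is not the value used in the exercise, and `bin_values` is a hypothetical name, not an SPSS function.

```python
from collections import Counter


def bin_values(values, width, anchor):
    """Group values into bins of equal `width`, the first starting at `anchor`.

    Each bin is labeled by its lower bound: anchor, anchor + width, ...
    Bins with no cases are simply absent (a zero-height bar in a histogram).
    """
    counts = Counter()
    for v in values:
        bin_index = (v - anchor) // width          # which interval v falls into
        counts[anchor + bin_index * width] += 1
    return dict(sorted(counts.items()))


sample_ages = [18, 19, 25, 34, 40, 47, 52, 61, 89]   # hypothetical ages
print(bin_values(sample_ages, width=10, anchor=10))
# {10: 2, 20: 1, 30: 1, 40: 2, 50: 1, 60: 1, 80: 1}
```

Moving the anchor shifts every bin boundary, so the same data can produce a differently shaped histogram. That is one reason to choose the anchor yourself.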

Part IV – Box Plots


A box plot is a graph that displays visually a number of characteristics of a frequency distribution:

● the third quartile (Q3),

● the first quartile (Q1),

● the interquartile range (IQR),

● the median,

● the range,

● outliers, and

● extreme values.

Run EXPLORE for d1_age, d4_educ, and d12_childs. (See Chapter 4, Explore in the online SPSS book.)

You can use the default settings for EXPLORE so all you have to do is click “OK” after you have selected

your variables.

The first thing you will see is various descriptive statistics for each variable. You’re probably familiar with

most of these. Then you’ll see the stem-and-leaf display which we’re not going to discuss. The last thing

you’ll see is the box plot. Let’s look at the boxplot for d1_age. The box is bounded at the top by the third

quartile (Q3) and at the bottom by the first quartile (Q1). The height of the box (Q3 – Q1) is the interquartile

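range. The horizontal line inside the box represents the median. There are two vertical lines coming out of the box. One line extends upward to the largest value that is not an outlier or extreme value, and the other extends downward to the smallest such value. When there are no outliers or extreme values, as with d1_age, these are the maximum and minimum values, and the distance between the ends of the two lines is the range.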

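You can also learn about skewness from the box plot. In a non-skewed distribution, the median will be in the middle of the box, halfway between the third and first quartiles. In a skewed distribution the median will be either higher or lower in the box. A median closer to the bottom of the box (Q1) suggests positive skew, and a median closer to the top (Q3) suggests negative skew. Notice that for d1_age and d4_educ the median is in the middle of the box, suggesting that these distributions are not very skewed. For d12_childs the median is in the upper part of the box, which by this rule points toward negative skew within the middle half of the cases. However, the long upper whisker and the outliers above the box show that the distribution as a whole is positively skewed. So read the position of the median together with the whiskers and outliers.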
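The rule of thumb about where the median sits in the box can be turned into a single number with Bowley's quartile skewness coefficient. This measure is not part of the SPSS EXPLORE output or of this exercise. The sketch below applies it to made-up data. The coefficient describes only the middle half of the cases, so it can disagree with the skewness statistic from STAT3S, which uses every case.

```python
import statistics

def quartile_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).

    Positive when the median sits closer to Q1 (bottom of the box),
    negative when it sits closer to Q3 (top of the box), 0 when centered.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    if q3 == q1:
        raise ValueError("the box has zero height (Q1 == Q3)")
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Hypothetical data sets, NOT the GSS variables
centered = [1, 2, 3, 4, 5, 6, 7]
median_low = [1, 2, 2, 3, 5, 8, 13]        # median near Q1
median_high = [1, 6, 9, 10, 11, 11, 12]    # median near Q3
for name, data in [("centered", centered), ("median_low", median_low), ("median_high", median_high)]:
    print(name, round(quartile_skewness(data), 2))
```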

Now look at the box plots for d4_educ and d12_childs. Here you’ll see some circles and numbers. The circles represent outliers, which are values that lie between XXXXXXXXXX and XXXXXXXXXX box lengths above the third quartile or below the first quartile. A box length is just another name for the interquartile range, since the height of the box is the interquartile range. The numbers next to the circles are the case numbers in the SPSS data file. Extreme values are values that are more than XXXXXXXXXX box lengths above the third quartile or below the first quartile; SPSS marks them with asterisks. There aren’t any extreme values in these distributions.
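The outlier rule can be written out as a short Python sketch. It assumes the conventional Tukey fences of 1.5 and 3 box lengths; substitute the values given in the text above if they differ. The numbers of children and the function name `classify_cases` are hypothetical. The quartile method may also differ slightly from SPSS's.

```python
import statistics

def classify_cases(values, inner=1.5, outer=3.0):
    """Label each value 'typical', 'outlier', or 'extreme' using box-length fences.

    inner and outer are ASSUMED multiples of the box length (IQR),
    measured downward from Q1 and upward from Q3.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    box = q3 - q1
    labels = []
    for case_number, v in enumerate(values, start=1):
        distance = max(q1 - v, v - q3, 0)   # how far outside the box; 0 if inside it
        if distance > outer * box:
            label = "extreme"
        elif distance > inner * box:
            label = "outlier"
        else:
            label = "typical"
        labels.append((case_number, v, label))
    return labels

# Hypothetical numbers of children, NOT the GSS data
childs = [0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 8, 15]
for case_number, v, label in classify_cases(childs):
    if label != "typical":
        print(case_number, v, label)
```

The case number printed next to each unusual value plays the same role as the numbers SPSS prints beside the circles.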

Sometimes you want to compare box plots for two or more groups of respondents. Let’s look at the box

plot for d1_age and compare the box plots for men and women. Run EXPLORE for d1_age but this time

put d5_sex in the “Factor List” box. Your output should now show the box plots for men and women

side-by-side.

Part V – Conclusions

We have talked about four different types of graphs: pie charts, bar charts, histograms, and box plots. There are other types of graphs you could use, but these are four of the most commonly used. There are also other ways to construct graphs in SPSS that your instructor might want to talk about. You can click on “Graphs” in the menu bar at the top of the SPSS screen and then on “Chart Builder,” but we aren’t going to go into that in this exercise.

[1] There is a small problem with d12_childs. One of the categories is “eight or more” children. That means we don’t know what these values actually are. They could be 8 or XXXXXXXXXX or XXXXXXXXXX or XXXXXXXXXX or something else. Since there are so few cases in this category, we’re going to ignore this problem.

STAT5S: Exercise Using SPSS to Explore Hypothesis Testing – One-Sample t Test

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses COMPARE MEANS (one-sample t test) and SELECT CASES in SPSS to explore hypothesis testing and the one-sample t test. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX: A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the one-sample t test. The exercise also gives

you practice in using COMPARE MEANS (one-sample t test) and SELECT CASES in SPSS.

Part I – Simple Random Sampling

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large, and it’s too costly and time-consuming to carry

out a complete enumeration. So instead we select a sample, which is a subset of the population, and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, nonzero chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.

There are many ways of selecting a probability sample, but the most basic type of probability sample is a simple random sample, in which everyone in the population has the same chance of being selected into the sample. SPSS will select a simple random sample for you. We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav. It’s a large sample of about 2,500 individuals. To illustrate simple random sampling, we’re going to select a simple random sample of 30% of all the individuals in the GSS. [1]

Start by getting a frequency distribution for the variable d4_educ, which is the highest year of school completed by the respondent. (See Chapter 4, Frequencies, in the online SPSS book mentioned on page XXXXXXXXXX.) You’ll see that there are a total of 2,538 cases. One of those cases said he or she didn’t know, which means there are 2,537 valid cases that answered the question.

Now click on Data in the menu bar at the top of the screen. (See Chapter 3, Select Cases, in the online SPSS book.) This will open a drop-down menu. Click on SELECT CASES. Then click on “Random sample of cases” and then on “Sample” in the box below. One of the options will already be selected and will say “Approximately [box] % of all cases.” Fill in XXXXXXXXXX in the box, indicating that you want to select a simple random sample of 30% of all the cases in the GSS. Click on “Continue” and then on “OK.” Now run FREQUENCIES again for the variable d4_educ. Your sample will be smaller than before, roughly 30% of the original size. This is a random sample of all the cases in the GSS.
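One way to get "approximately" 30% of the cases is to keep each case independently with probability .30. With that approach the sample size varies a little from draw to draw. The Python sketch below shows this approach and contrasts it with drawing an exact number of cases. The case IDs are hypothetical, the seed is arbitrary, and this is not SPSS's actual random-number algorithm.

```python
import random

def approximate_sample(cases, fraction, seed=None):
    """Keep each case independently with probability `fraction`."""
    rng = random.Random(seed)
    return [c for c in cases if rng.random() < fraction]

def exact_sample(cases, size, seed=None):
    """Simple random sample of exactly `size` cases, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(cases, size)

population = list(range(1, 2539))          # hypothetical case IDs 1..2,538
approx = approximate_sample(population, 0.30, seed=1)
exact = exact_sample(population, 761, seed=1)
print(len(approx), len(exact))             # the first number depends on the seed; the second is always 761
```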

Part II. Hypothesis Testing – the One-Sample T test

Let’s think about our variable, d4_educ. What do we know about education in the United States? One thing we know is that the average years of school completed has been increasing over the twentieth and twenty-first centuries. It used to be that many people stopped after completing high school, which would be 12 years of education. Now more people go on to college. So we would hypothesize that the mean years of school completed is now greater than XXXXXXXXXX. How could we test that hypothesis? We need a statistical procedure to do that. The t test is one of a number of statistical tests that we can use to test such hypotheses.

Notice how we are going about this. We have a sample of adults in the United States (i.e., the XXXXXXXXXX GSS). We can calculate the mean years of school completed by all the adults in the sample who answered the question. But we want to test the hypothesis that the mean years of school completed in the population of all adults is greater than XXXXXXXXXX. We’re going to use our sample data to test a hypothesis about the population. [2]

What do we know about sampling? We know that no sample is ever a perfect representation of the population from which it is drawn, because every sample contains some amount of sampling error. Sampling error is inevitable. Another thing we know is that the larger the sample size, the smaller the sampling error tends to be.

So the hypothesis we want to test is that the mean years of school completed in the population is greater than XXXXXXXXXX. We’ll call this our research hypothesis. It’s what we expect to be true. But there is no way to prove the research hypothesis directly, so we’re going to use a method of indirect proof. We’re going to set up another hypothesis that says that the research hypothesis is not true and call this the null hypothesis. [3] In our case, the null hypothesis would be that the mean years of school completed in the population is equal to XXXXXXXXXX. If we can reject the null hypothesis, then we have evidence to support the research hypothesis. If we can’t reject the null hypothesis, then we don’t have any evidence in support of the research hypothesis. You can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly, but if we can reject the null hypothesis, then we have indirect evidence that supports the research hypothesis.

Here are our two hypotheses.

● research hypothesis – the population mean is greater than 12

● null hypothesis – the population mean is equal to 12

It’s the null hypothesis that we are going to test.
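In symbols, the two hypotheses can be written as follows. Here μ stands for the mean years of school completed in the population; this notation is added for illustration and is not used elsewhere in the exercise.

```latex
H_0:\ \mu = 12 \qquad\qquad H_1:\ \mu > 12
```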

Before we carry out the t test, let’s make sure we are using the full GSS sample and not the 30% simple

random sample. Click on “Data” and on “Select Cases.” Select “All cases” and then click on OK. Now you

are using all the cases.

Now click on “Analyze” in the menu bar, which will open a drop-down menu. Click on “Compare Means,” which will open another drop-down menu, and click on “One-Sample T Test.” Move the variable d4_educ over to the “Test Variable(s)” box on the right. Below that box you will see a box called “Test Value.” This is where we enter the value specified in the null hypothesis, which in our case is XXXXXXXXXX. All you have to do now is click on OK.

You should see two output boxes. The first box will have four values in it.

● N is the number of cases for which we have valid information [4] (i.e., the number of respondents

who answered the question). In this problem, N equals 2,537.

● Mean is the mean years of school completed by the respondents in the sample who answered the

question (see STAT2S). In this problem, the sample mean equals XXXXXXXXXX.

● Standard Deviation is a measure of dispersion (see STAT2S). In this problem, the standard

deviation equals XXXXXXXXXX.

● Standard Error of the Mean is an estimate of how much sampling error there is in the sample mean. It equals the standard deviation divided by the square root of N. In this problem, the standard error equals .061.
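Here is a small Python sketch of how the values in the first output box relate to one another. The ten education values are made up. The sketch ignores the case weights applied to the GSS subset. It uses the sample standard deviation, which divides by N – 1.

```python
import math
import statistics

def one_sample_descriptives(values):
    """N, mean, standard deviation, and standard error of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)                 # standard error of the mean
    return n, mean, sd, se

# Hypothetical years of schooling for ten respondents, NOT the GSS data
educ = [12, 14, 16, 12, 13, 18, 12, 16, 14, 11]
n, mean, sd, se = one_sample_descriptives(educ)
print(f"N={n} mean={mean:.2f} SD={sd:.3f} SE={se:.3f}")
```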

The second box will have five values in it.

● t is the value of the t test

● df is the number of degrees of freedom

● Significance (2-tailed) value

● Mean Difference

● 95% Confidence Interval of the Difference which we’re going to discuss in a later exercise

There is a formula for calculating the value of t in the t test. Your instructor may or may not want you to

learn how to calculate the value of t. I’m going to leave it to your instructor to do this. In this problem t

equals XXXXXXXXXX.
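For instructors who do want to show it, the standard one-sample t formula divides the mean difference by the standard error of the mean. In this formula, x̄ is the sample mean, μ₀ is the test value from the null hypothesis, s is the sample standard deviation, and N is the number of valid cases.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{N}}, \qquad \mathrm{df} = N - 1
```

The denominator is the standard error of the mean reported in the first output box, and the numerator is the mean difference reported in the second box.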

Degrees of freedom (df) is the number of values that are free to vary. If the sample mean equals XXXXXXXXXX, then how many values are free to vary? The answer is N – 1, which is 2,537 – 1 or 2, XXXXXXXXXX. See if you can figure out why it’s 2, XXXXXXXXXX. Your instructor will help you if you are having trouble figuring it out.
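If you want to check your reasoning afterward, here is a tiny illustration with hypothetical numbers. Once the mean of N values is fixed, you can choose any N – 1 of them freely, but the last one is then determined. The function name `forced_last_value` is made up for this sketch.

```python
def forced_last_value(known_values, n, mean):
    """Given n - 1 values and the mean of all n values, return the value that is left."""
    if len(known_values) != n - 1:
        raise ValueError("need exactly n - 1 known values")
    return n * mean - sum(known_values)

# Hypothetical: four respondents whose mean years of schooling is 13
chosen = [12, 16, 10]                      # any three values can be chosen freely...
print(forced_last_value(chosen, 4, 13))    # ...but the fourth must then be 14
```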

The significance value is a probability. It’s the probability of getting a sample mean at least as far from the value in the null hypothesis as yours is, if the null hypothesis were true. The smaller it is, the harder it is to believe the null hypothesis. In this problem it’s XXXXXXXXXX, which you might think is telling you that such a result could never happen if the null hypothesis were true. But it’s actually a rounded value, and it means that the probability is less than XXXXXXXXXX, or less than five in ten thousand. So the probability isn’t zero, but it’s really, really small.

The mean difference is the difference between the sample mean, XXXXXXXXXX, and the value specified in the null hypothesis, XXXXXXXXXX. So it’s XXXXXXXXXX – XXXXXXXXXX, or 1.68. [5] That’s the amount by which your sample mean differs from the value in the null hypothesis. If it’s positive, then your sample mean is larger than the value in the null hypothesis, and if it’s negative, then your sample mean is smaller than the value in the null hypothesis.

Now all we have to do is figure out how to use the t test to decide whether to reject or not reject the null hypothesis. Look again at the significance value, which is less than XXXXXXXXXX. That tells you that, if the null hypothesis were true, you would get a sample mean this far from the null value less than five times out of ten thousand. With odds like that, of course, we’re going to reject the null hypothesis. A common rule is to reject the null hypothesis if the significance value is less than XXXXXXXXXX, or less than five out of one hundred.
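For readers who want to see where a significance value comes from, here is a Python sketch that uses only the standard library. It integrates the t density numerically to get a two-tailed significance value and then applies the .05 rule. Simpson's rule is chosen here for illustration; it is not how SPSS computes the value. The t and df values are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20_000):
    """P(|T| >= |t|): 1 minus the central area, found with Simpson's rule (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3            # area between -|t| and |t|
    return max(0.0, 1.0 - central)         # guard against a tiny negative rounding error

def decide(p, alpha=0.05):
    """The common rule from the text: reject when the significance value is below alpha."""
    return "reject the null hypothesis" if p < alpha else "do not reject the null hypothesis"

p = two_tailed_p(3.0, df=10)               # hypothetical t value and degrees of freedom
print(round(p, 4), decide(p))              # decision: reject the null hypothesis
```

With a very large t, such as one based on thousands of cases, the result rounds to zero at three decimal places, which is why the output shows a value like .000.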

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean? Look back at the research hypothesis, which was that the population mean was greater than XXXXXXXXXX. We’re actually predicting the direction of the difference: we’re predicting that the population mean will be greater than XXXXXXXXXX. That’s called a one-tailed test, and we have to use a one-tailed significance value. It’s easy to get the one-tailed significance value from the two-tailed significance value, as long as the sample mean differs from the null value in the predicted direction, as it does here. If the two-tailed significance value is less than XXXXXXXXXX, then the one-tailed significance value is half that, or XXXXXXXXXX divided by two, or XXXXXXXXXX. We still reject the null hypothesis, which means that we have evidence to support our research hypothesis. We haven’t proven the research hypothesis to be true, but we have evidence to support it.
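The conversion from a two-tailed to a one-tailed significance value depends on the direction of the result. The sketch below makes that explicit. The function name and the two-tailed value of .04 are hypothetical.

```python
def one_tailed_p(p_two, t, predicted="greater"):
    """Convert a two-tailed significance value into a one-tailed one.

    Halving is valid only when the sample result lies in the predicted
    direction; otherwise the one-tailed value is 1 - p_two / 2.
    """
    if predicted not in ("greater", "less"):
        raise ValueError("predicted must be 'greater' or 'less'")
    in_predicted_direction = t > 0 if predicted == "greater" else t < 0
    return p_two / 2 if in_predicted_direction else 1 - p_two / 2

# Hypothetical two-tailed value of .04 with the research hypothesis "greater than"
print(round(one_tailed_p(0.04, t=2.1), 3))    # 0.02: sample mean above the test value, as predicted
print(round(one_tailed_p(0.04, t=-2.1), 3))   # 0.98: sample mean below it, opposite to the prediction
```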

Part III. Now It’s Your Turn

There is another variable in the XXXXXXXXXX GSS called d18_hrs1, which is the number of hours that the

respondent worked last week if he or she was employed. Many people have suggested that Americans

are working longer hours than they used to. Since the traditional work week is XXXXXXXXXX hours, if it’s true that

we’re working more hours, our research hypothesis would be that the mean number of hours worked last

week would be greater than XXXXXXXXXX. Do a one-sample t test to test this hypothesis. For each value in the

output, explain what it means. Then decide whether you should reject or not reject the null hypothesis and

what this tells you about the research hypothesis.

I’ll tell you that you should reject the null hypothesis even though the mean difference was less than one hour. You might wonder why you reject the null hypothesis when the mean difference is so small. Notice that we have a large sample (N = 1, XXXXXXXXXX). Let’s see what happens when we have a sample that’s only 10% of that size. Take a simple random sample of 10% of the total sample. (Look back at Part I to see how to do this.) Now we have a much smaller sample size. Rerun your t test and see what happens with a smaller sample. For each value in the output, explain what it means. Then decide whether you should reject or not reject the null hypothesis and what this tells you about the research hypothesis.

Now you probably won’t be able to reject the null hypothesis. [6] Why? Remember that we said the larger the sample, the smaller the sampling error. The less sampling error there is, the easier it is to reject the null hypothesis. You can see this by looking at the standard error of the mean. It will almost always be smaller in the larger sample and bigger in the smaller sample. So when you have a really large sample, don’t get too excited when you reject the null hypothesis on the basis of only a small mean difference. A difference can be statistically significant without being large or important.
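The effect of sample size shows up directly in the standard error formula, SE = SD / √N. Cutting N to one-tenth multiplies the standard error by √10 (about 3.16) and divides t by the same amount, assuming the standard deviation and mean difference stay about the same. The numbers below are hypothetical, not the GSS results.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

sd = 14.0                  # hypothetical SD of hours worked, assumed the same in both samples
mean_difference = 0.8      # hypothetical difference from the test value, in hours
for n in (1600, 160):      # hypothetical full sample and a 10% sample
    se = standard_error(sd, n)
    print(f"N={n}: SE={se:.3f}, t={mean_difference / se:.2f}")
```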

[1] The GSS is itself not a simple random sample but rather an example of a multistage cluster sample.

[2] Characteristics of a sample are called statistics while characteristics of a population are called

parameters.

[3] The null hypothesis is often called the hypothesis of no difference. We’re saying that the population

mean is still equal to XXXXXXXXXXIn other words, nothing has changed. There is no difference.

[4] Missing cases would include those who said they didn’t know or refused to answer the question.

[5] By the way, the value of the mean, XXXXXXXXXX, is a rounded value, which is why the mean difference isn’t exactly 1.68.

[6] Why probably? Because by chance your smaller sample could have a much higher mean, which would produce a larger t value and could give you a significance value low enough to reject the null hypothesis.

STAT6S: Exercise Using SPSS to Explore Hypothesis Testing – Independent-Samples

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses COMPARE MEANS (means and independent-samples t test) to explore hypothesis testing. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX: A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the independent-samples t test. The exercise

also gives you practice in using COMPARE MEANS.

Part I – Computing Means

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large, and it’s too costly and time-consuming to carry

out a complete enumeration. So instead we select a sample, which is a subset of the population, and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, nonzero chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.

Let’s start by asking two questions.

● Do men and women differ in the number of years of school they have completed?

● Do men and women differ in the number of hours they worked in the last week?

Click on “Analyze” in the menu bar, then on “Compare Means,” and finally on “Means.” (See Chapter 6, Introduction, in the online SPSS book mentioned on page XXXXXXXXXX.) Select the variables d4_educ and d18_hrs1 and move them to the “Dependent List” box. These are the variables for which you are going to compute means. Then select the variable d5_sex and move it to the “Independent List” box. This is the variable that defines the groups you want to compare. In our case we want to compare men and women. The output from SPSS will show you the mean, number of cases, and standard deviation for men and women for these two variables.
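The Means procedure is essentially a group-by calculation. Here is a minimal Python sketch that uses the coding given later in this exercise (1 for male, 2 for female). The respondents are hypothetical, the function name `means_by_group` is made up, and `None` stands in for a missing value. The sketch does not apply the GSS case weights.

```python
import statistics
from collections import defaultdict

def means_by_group(records, group_key, value_key):
    """Mean, N, and SD of value_key within each group, skipping missing values."""
    groups = defaultdict(list)
    for r in records:
        v = r.get(value_key)
        if v is not None:                  # treat None as a missing value
            groups[r[group_key]].append(v)
    return {
        g: (statistics.mean(vals), len(vals),
            statistics.stdev(vals) if len(vals) > 1 else float("nan"))
        for g, vals in sorted(groups.items())
    }

# Hypothetical respondents; 1 = male, 2 = female
records = [
    {"d5_sex": 1, "d18_hrs1": 45}, {"d5_sex": 1, "d18_hrs1": 40},
    {"d5_sex": 1, "d18_hrs1": None}, {"d5_sex": 2, "d18_hrs1": 38},
    {"d5_sex": 2, "d18_hrs1": 30}, {"d5_sex": 2, "d18_hrs1": 40},
]
for sex, (mean, n, sd) in means_by_group(records, "d5_sex", "d18_hrs1").items():
    print(sex, round(mean, 2), n, round(sd, 2))
```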

Men and women differ very little in the number of years of school they completed. Men have completed a

little less than one-tenth of a year more than women. But men worked quite a bit more than women in the

last week – a difference of almost six hours. By the way, only respondents who are employed are included

in this calculation but both part-time and full-time employees are included.

Why can’t we just conclude that men and women have about the same education and that men work more than women? If we were just describing the sample, we could. But what we want to do is to make inferences about differences between men and women in the population. We have a sample of men and a sample of women, and some amount of sampling error will always be present in both samples. The larger the sample, the less the sampling error, and the smaller the sample, the more the sampling error. Because of this sampling error we need to make use of hypothesis testing, as we did in the previous exercise (STAT5S).

Part II – Now it’s Your Turn

In this part of the exercise you want to compare men and women to answer these two questions.

● Do men and women differ in the number of hours per day they have to relax? This is variable

d20_hrsrelax in the GSS.

● Do men and women differ in the number of hours per day they watch television? This is variable

tv1_tvhours in the GSS.

Use SPSS to get the sample means and then compare them to begin answering these questions.

Part III – Hypothesis Testing – Independent-Samples t Test

In Part I we compared the mean scores for men and women for the following variables.

● d4_educ

● d18_hrs1

Now we want to determine whether those differences are statistically significant by carrying out the independent-samples t test.

A t test is used when you want to compare two groups. The “grouping variable” defines these two groups. The variable d5_sex is a dichotomy. It has only two categories: male (value XXXXXXXXXX) and female (value XXXXXXXXXX). But any variable can be made into a dichotomy by establishing a cut point or by recoding. For example, the variable f4_satfin (satisfaction with financial situation) has three categories: satisfied (value 1), more or less satisfied (value 2), and not at all satisfied (value XXXXXXXXXX). The cut point is the value that makes this into a dichotomy. All values less than the cut point are in one category, and all values equal to or larger than the cut point are in the other category. If your cut point is 3, then values 1 and 2 are in one category and value 3 is in the other category.
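The cut-point rule described above fits in one line of Python. In this sketch the labels 1 and 2 for the two resulting groups are arbitrary, and the responses are hypothetical.

```python
def dichotomize(value, cut_point):
    """Return 1 if value is below the cut point, 2 if it is equal to or above it."""
    return 1 if value < cut_point else 2

# f4_satfin codes from the text: 1 = satisfied, 2 = more or less satisfied,
# 3 = not at all satisfied; the responses below are hypothetical
responses = [1, 3, 2, 2, 3, 1]
print([dichotomize(v, 3) for v in responses])   # [1, 2, 1, 1, 2, 1]
```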

Click on “Analyze,” then on “Compare Means,” and finally on “Independent-Samples T Test.” (See Chapter 6, independent-samples t test, in the online SPSS book.) Move the two variables listed above into the “Test Variable(s)” box. These are the variables for which you want to compute the mean scores. Right below the “Test Variable(s)” box is the “Grouping Variable” box. This is where you indicate which variable defines the groups you want to compare. In this problem the grouping variable is d5_sex. Once you have entered the grouping variable, click on “Define Groups” and enter either the values of the two groups or the cut point.

In our case, you would enter 1 for males into Group 1 and 2 for females into Group XXXXXXXXXX. It wouldn’t matter which was Group 1 and which was Group XXXXXXXXXX, except that the sign of the mean difference would be reversed. Click on “Continue” and finally on “OK.”

You should see two boxes in the output screen. The first box gives you four pieces of information.

● N which is the number of males and females on which the t test is based. This includes only those

cases with valid information. In other words, cases with missing information (e.g., don’t know, no

answer) are excluded.

● Means for males and females.

● Standard deviations for males and females.

● Standard error of the mean for males and females which is an estimate of the amount of sampling

error for the two samples.

The second box has more information in it. The first thing you notice is that there are two t tests for each

variable. One assumes that the two populations (i.e., all males and all females) have equal population

variances and the other doesn’t make this assumption. In our two examples, both t tests give about the

same results. We’ll come back to this in a little bit. The rest of the second box has the following

information. Let’s look at the t test for d4_educ.

● t is the value of the t test, which is XXXXXXXXXX for both t tests. There is a formula for computing t, which your instructor may or may not want to cover in your course.

● Degrees of freedom in the first t test is $(N_{\text{males}} - 1) + (N_{\text{females}} - 1) = N_{\text{males}} + N_{\text{females}} - 2 = 2{,}535$. In the second t test the degrees of freedom is estimated and turns out to be a decimal.

● The significance (two-tailed) value which we’ll cover in a little bit.

● The mean difference is the mean for the first group (males) minus the mean for the second group (females) = XXXXXXXXXX – XXXXXXXXXX = XXXXXXXXXX. Instead of using the rounded values, SPSS carries the computation out to more decimal places, which results in a mean difference of XXXXXXXXXX. In other words, males have XXXXXXXXXX of a year more education than females, which is a very small difference.

● The standard error of the difference, which is XXXXXXXXXX, is an estimate of the amount of sampling error for the difference score.

● 95% confidence interval of the difference which we’ll talk about in a later exercise.
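Here is a sketch of both versions of the independent-samples t test, using the standard textbook formulas. The first is the pooled-variance t, which assumes equal variances and has N males + N females – 2 degrees of freedom. The second is Welch's t with the Welch-Satterthwaite degrees of freedom, which is why the second df is usually a decimal; SPSS's "equal variances not assumed" row is generally described as this Welch approach. The data and the function name `independent_t` are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance and Welch t statistics for two independent samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    diff = m1 - m2                                   # mean difference (group 1 minus group 2)

    # Equal variances assumed: pool both samples to estimate one common variance.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t_pooled, df_pooled = diff / se_pooled, n1 + n2 - 2

    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    se_welch = math.sqrt(a + b)
    t_welch = diff / se_welch
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"diff": diff,
            "pooled": (t_pooled, df_pooled, se_pooled),
            "welch": (t_welch, df_welch, se_welch)}

# Hypothetical years of schooling, NOT the GSS data
men = [12, 14, 16, 12, 18, 13]
women = [12, 13, 16, 14, 12]
print(independent_t(men, women))
```

When the two samples have the same size and the same variance, the two versions give identical t values and degrees of freedom. The more the variances differ, the more the two rows of output can diverge.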

Notice how we are going about this. We have a sample of adults in the United States (i.e., the XXXXXXXXXX GSS). We calculate the mean years of school completed by men and women in the sample who answered the question. But we want to test the hypothesis that the mean years of school completed by men and women in the population are different. We’re going to use our sample data to test a hypothesis about the population.

The hypothesis we want to test is that the mean years of school completed by men in the population is different from the mean years of school completed by women in the population. We’ll call this our research hypothesis. It’s what we expect to be true. But there is no way to prove the research hypothesis directly, so we’re going to use a method of indirect proof. We’re going to set up another hypothesis that says that the research hypothesis is not true and call this the null hypothesis. If we can reject the null hypothesis, then we have evidence to support the research hypothesis. If we can’t reject the null hypothesis, then we don’t have any evidence in support of the research hypothesis. You can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly, but if we can reject the null hypothesis, then we have indirect evidence that supports the research hypothesis. We haven’t proven the research hypothesis, but we have support for it.

Here are our two hypotheses.

● research hypothesis – the population mean for men minus the population mean for women does not equal XXXXXXXXXX. In other words, they are different from each other.

● null hypothesis – the population mean for men minus the population mean for women equals XXXXXXXXXX. In other words, they are not different from each other.

It’s the null hypothesis that we are going to test.
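In symbols, with μ for each population mean (notation added here for illustration), the difference is zero exactly when the two population means are equal. The hypotheses can be written as:

```latex
H_0:\ \mu_{\text{men}} - \mu_{\text{women}} = 0
\qquad\qquad
H_1:\ \mu_{\text{men}} - \mu_{\text{women}} \neq 0
```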

Now all we have to do is figure out how to use the t test to decide whether to reject or not reject the null

hypothesis. Look again at the significance value which is XXXXXXXXXXfor both t tests. That tells you that, if the

null hypothesis were true, you would get a sample difference this large or larger just about XXXXXXXXXXor XXXXXXXXXXtimes out of one

hundred. With odds like that, of course, we’re not going to reject the null hypothesis. A common rule is to

reject the null hypothesis if the significance value is less than XXXXXXXXXXor less than five out of one hundred.

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean?

Look back at the research hypothesis which was that the population mean for men minus the population

mean for women does not equal XXXXXXXXXXWe’re not predicting that one population mean will be larger or smaller

than the other. That’s called a two-tailed test and we have to use a two-tailed significance value. If we had

predicted that one population mean would be larger than the other, that would be a one-tailed test. It’s easy

to get the one-tailed significance value if we know the two-tailed significance value. If the two-tailed

significance value is XXXXXXXXXXthen the one-tailed significance value is half that or XXXXXXXXXXdivided by two or .045.

We still haven’t explained why there are two t tests. As we said earlier, one assumes that the two

populations (i.e., all males and all females) have equal population variances and the other doesn’t make

this assumption. To compute the t value we need to estimate the population variances (see STAT2S). If

the population variances are about the same, we can pool our two samples to estimate the population

variance. If they are not about the same we wouldn’t want to do this. So how do we decide which t test to

use? Here’s where we’ll talk about Levene’s test for the equality of variances, which is in the second

box in your SPSS output. For this test, the null hypothesis is that the two population variances are equal.

Levene’s test uses an F statistic, which we’re not going to discuss until a later exercise (STAT8S).

But we know how to interpret significance values so we can still make use of this test. The significance

value for the variable d4_educ is XXXXXXXXXXwhich is not less than XXXXXXXXXXso we do not reject the null hypothesis

that the population variances are equal. This means that we would use the t test that assumes equal

population variances.
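The two t tests and Levene’s test can be sketched in Python using only the standard library. This is an illustrative sketch, not SPSS’s own code. The function names and the short lists of years of school are hypothetical. The Levene statistic shown is the mean-centered form: a one-way analysis of variance F computed on each score’s absolute distance from its group mean.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """t test that assumes equal population variances (pooled variance)."""
    n1, n2 = len(x), len(y)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """t test that does not assume equal population variances (Welch)."""
    v1, v2 = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df


def levene_f(x, y):
    """Mean-centered Levene statistic for two groups."""
    # Replace each score by its absolute distance from its own group mean.
    z = [[abs(v - mean(g)) for v in g] for g in (x, y)]
    scores = z[0] + z[1]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df_between, df_within = 1, len(scores) - 2
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical years of school for a handful of men and women.
men = [12, 16, 14, 12, 18, 13, 12, 16]
women = [12, 14, 13, 16, 12, 14, 15, 12]

print("Levene F:", round(levene_f(men, women), 3))
print("equal variances assumed:", pooled_t(men, women))
print("equal variances not assumed:", welch_t(men, women))
```

With equal group sizes, as here, the two t values coincide and only the degrees of freedom differ. When the group sizes and the variances are both unequal, the two tests can give noticeably different results.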

Part IV – Now it’s Your Turn Again

In this part of the exercise you want to compare men and women to answer these two questions but this

time you want to test the appropriate null hypotheses.

● Do men and women differ in the number of hours per day they have to relax?

● Do men and women differ in the number of hours per day they watch television?

Use the independent-samples t test to carry out this part of the exercise. What are the research and the null

hypotheses? Do you reject or not reject the null hypotheses? Explain why.

Part V – What Does Independent Samples Mean?

Why do we call this t test the independent-samples t test? Independent samples are samples in which the

composition of one sample does not influence the composition of the other sample. In this exercise we’re

using the XXXXXXXXXXGSS which is a sample of adults in the United States. If we divide this sample into men and

women we would have a sample of men and a sample of women and they would be independent samples.

The individuals in one of the samples would not influence who is in the other sample.

Dependent samples are samples in which the composition of one sample does influence the composition of

the other sample. For example, if we have a sample of married couples and divide that sample into two

samples of men and women, then the men in one of the samples determine who the women are in the

other sample. The compositions of the two samples depend on each other. We’re going to discuss the

paired-samples t test in the next exercise (STAT7S).

STAT7S: Exercise Using SPSS to Explore Hypothesis Testing – Paired-Samples t Test

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the XXXXXXXXXXGeneral Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses COMPARE

MEANS (paired-samples t test) to explore hypothesis testing. A good reference on using SPSS is SPSS

for Windows Version XXXXXXXXXXA Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and

Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional

Council's Website. You have permission to use this exercise and to revise it to fit your needs. Please

send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed

notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the

SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the paired-samples t test. The exercise also

gives you practice in using COMPARE MEANS.

Part I – Populations and Samples

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large and it’s too costly and time-consuming to carry

out a complete enumeration. So what we do is to select a sample from the population where a sample is a

subset of the population and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, non-zero chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.
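A short simulation can make the distinction between a statistic and a parameter, and the idea of a known probability of selection, concrete. This is a hypothetical sketch. The population of ages is generated at random rather than taken from the GSS, and simple random sampling is only one kind of probability sample.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so every run draws the same numbers

# A made-up population of 10,000 ages; in real research we could not list it all.
population = [random.randint(18, 89) for _ in range(10_000)]
parameter = mean(population)  # population mean: a parameter, normally unknown

# Simple random sample of 500 people drawn without replacement.
n = 500
sample = random.sample(population, n)
statistic = mean(sample)  # sample mean: a statistic we can actually compute

# Every member of the population had the same known, non-zero chance of selection.
probability_of_selection = n / len(population)

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
print(f"probability of selection = {probability_of_selection}")
```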

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability

sample of adults in the United States conducted by the National Opinion Research Center (NORC). The

GSS started in XXXXXXXXXXand has been an annual or biennial survey ever since. For this exercise we’re going

to use a subset of the XXXXXXXXXXGSS. Your instructor will tell you how to access this data set which is called

gss14_subset_for_classes_STATISTICS.sav.

In STAT6S we compared means from two independent samples. Independent samples are samples in

which the composition of one sample does not influence the composition of the other sample. In this

exercise we’re using the XXXXXXXXXXGSS which is a sample of adults in the United States. If we divide this

sample into men and women we would have a sample of men and a sample of women and they would be

independent samples. The individuals in one of the samples would not influence who is in the other

sample.

In this exercise we’re going to compare means from two dependent samples. Dependent samples are

samples in which the composition of one sample influences the composition of the other sample. The 2014

GSS includes questions about the years of school completed by the respondent’s parents – d22_maeduc

and d24_paeduc. Suppose we think that respondents’ fathers have more education than

respondents’ mothers. We would compare the mean years of school completed by mothers with the mean

years of school completed by fathers. If the respondent’s mother is in one sample, then the respondent’s

father must be in the other sample. The compositions of the two samples therefore depend on each other.

SPSS calls these paired samples, so we’ll use that term from now on.

Let’s start by asking whether fathers or mothers have more years of school. Click on “Analyze” in the

menu bar and then on “Compare Means” and finally on “Means.” (See Chapter 6, introduction in the online

SPSS book mentioned on page XXXXXXXXXXSelect the variables d22_maeduc and d24_paeduc and move them to

the “Dependent List” box. These are the variables for which you are going to compute means. The output

from SPSS will show you the mean, number of cases, and standard deviation for fathers and mothers.

Fathers have about two-tenths of a year more education than mothers. Why can’t we just conclude that

fathers have more education than mothers? If we were just describing the sample, we could. But what we

want to do is to make inferences about differences between fathers and mothers in the population. We

have a sample of fathers and a sample of mothers and some amount of sampling error will always be

present in both samples. The larger the sample, the less the sampling error and the smaller the sample,

the more the sampling error. Because of this sampling error we need to make use of hypothesis testing as

we did in the two previous exercises (STAT5S and STAT6S).
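The claim that larger samples carry less sampling error can be checked with a small simulation. In this hypothetical sketch, the population of years of school is drawn from a normal distribution, an assumption made only for illustration. The code draws many samples of two sizes and compares how much their means vary. That spread is what the standard error estimates, roughly the standard deviation divided by the square root of the sample size.

```python
import math
import random
from statistics import fmean, pstdev, stdev

random.seed(2)  # fixed seed so the simulation is reproducible

# Made-up population of years of school completed (not GSS data).
population = [random.gauss(13.5, 3.0) for _ in range(50_000)]
sigma = pstdev(population)  # population standard deviation


def spread_of_sample_means(n, reps=1_000):
    """Standard deviation of the means of `reps` simple random samples of size n."""
    means = [fmean(random.sample(population, n)) for _ in range(reps)]
    return stdev(means)


for n in (25, 400):
    simulated = spread_of_sample_means(n)
    theoretical = sigma / math.sqrt(n)  # standard error of the mean
    print(f"n = {n:3d}: simulated spread {simulated:.3f}, sigma/sqrt(n) {theoretical:.3f}")
```

Multiplying the sample size by 16 cuts the spread of the sample means to about a quarter.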

Part II – Now it’s Your Turn

In this part of the exercise you want to compare the years of school completed by respondents and their

spouses to determine whether men have more education than their spouses or whether women have more

education than their spouses.

Use SPSS to get the sample means as we did in Part I and then compare them to begin answering this

question. But we need to be careful here. Respondents could be either male or female. We need to

separate respondents into two groups – men and women – and then separately compare male

respondents with their spouses and female respondents with their spouses. We can do this by putting the

variables d4_educ and d29_speduc into the “Dependent List” box and d5_sex into the “Independent List”

box.
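Putting d4_educ and d29_speduc in the “Dependent List” and d5_sex in the “Independent List” amounts to computing each mean separately within each category of sex. The Python sketch below mimics that on a few made-up records. The records are hypothetical and the variable names are borrowed from the exercise. Coding 2 as female is an assumption; the exercise only confirms that 1 is male.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: respondent's sex, years of school, and spouse's years of school.
records = [
    {"d5_sex": 1, "d4_educ": 16, "d29_speduc": 14},
    {"d5_sex": 1, "d4_educ": 12, "d29_speduc": 12},
    {"d5_sex": 1, "d4_educ": 14, "d29_speduc": 16},
    {"d5_sex": 1, "d4_educ": 18, "d29_speduc": 16},
    {"d5_sex": 2, "d4_educ": 12, "d29_speduc": 14},
    {"d5_sex": 2, "d4_educ": 16, "d29_speduc": 16},
    {"d5_sex": 2, "d4_educ": 13, "d29_speduc": 12},
    {"d5_sex": 2, "d4_educ": 14, "d29_speduc": 12},
]


def means_by_group(rows, group_var, variables):
    """Mean of each variable within each category of group_var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_var]].append(row)
    return {
        category: {v: mean([r[v] for r in members]) for v in variables}
        for category, members in sorted(groups.items())
    }


for sex, means in means_by_group(records, "d5_sex", ["d4_educ", "d29_speduc"]).items():
    print(sex, means)
```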

Part III – Hypothesis Testing – Paired-Samples t Test

In Part I we compared the mean years of school completed by fathers and mothers. Now we want to

determine if this difference is statistically significant by carrying out the paired-samples t test.

Click on “Analyze” and then on “Compare Means” and finally on “Paired-Samples T Test.” (See Chapter 6,

paired-samples t test in the online SPSS book.) Move the two variables from Part I, d22_maeduc and d24_paeduc, into the “Paired

Variables” box. Do this by selecting d22_maeduc and clicking on the arrow to move it into the “Variable 1”

box. Then select the other variable, d24_paeduc, and click on the arrow to move it into the “Variable 2”

box. Now click on “OK” and SPSS will carry out the paired-samples t test. It doesn’t matter which variable

you put in the “Variable 1” and “Variable 2” boxes, except that the order determines the sign of the difference scores.

You should see three boxes in the output screen. The first box gives you four pieces of information.

● Means for mothers and fathers.

● N which is the number of mothers and fathers on which the t test is based. This includes only

those cases with valid information. In other words, cases with missing information (e.g., don’t

know, no answer) are excluded.

● Standard deviations for mothers and fathers.

● Standard error of the mean for mothers and fathers which is an estimate of the amount of sampling

error for the two samples.

The second box gives you the paired-samples correlation, which is the correlation between mother’s and

father’s years of school completed for the paired samples. If you haven’t discussed correlation yet, don’t

worry about what this means.

The third box has more information in it. With paired samples what we do is subtract the years of school

completed for one parent in each pair from the years of school completed for the other parent in the same

pair. Since we put mother’s years of school completed in variable 1 and father’s education in variable 2,

SPSS will subtract father’s education from mother’s education. So if the father completed XXXXXXXXXXyears and the

mother completed XXXXXXXXXXyears we would subtract XXXXXXXXXXfrom XXXXXXXXXXwhich would give you XXXXXXXXXXFor this pair the father

completed two more years than the mother.
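The subtraction SPSS performs for each pair can be written out directly. The pairs below are hypothetical. As in the text, Variable 1 is the mother and Variable 2 the father, so each difference score is mother minus father, and a negative score means the father completed more years.

```python
# Hypothetical (mother's years, father's years) for five respondents.
pairs = [(12, 14), (16, 16), (10, 12), (14, 13), (12, 12)]

# Variable 1 minus Variable 2 for each pair: mother minus father.
differences = [mother - father for mother, father in pairs]
mean_difference = sum(differences) / len(differences)

print(differences)       # one difference score per pair
print(mean_difference)   # the mean difference score
```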

The third box gives you the following information.

● The mean difference score for all the pairs in the sample which is XXXXXXXXXXThis means that fathers

had an average of almost two-tenths of a year more education than the mothers. By the way, in

Part I when we compared the means for d22_maeduc and d24_paeduc the difference was 0.22.

Here the mean difference score is XXXXXXXXXXWhy aren’t they the same? See if you can figure this out.

(Hint: it has something to do with comparing differences for pairs.)

● The standard deviation of the difference scores for all these pairs which is XXXXXXXXXX.

● The standard error of the mean which is an estimate of the amount of sampling error.

● The 95% confidence interval for the mean difference score. If you haven’t talked about confidence

intervals yet, just ignore this. We’ll talk about confidence intervals in a later exercise.

● The value of t for the paired-samples t test which is XXXXXXXXXXThere is a formula for computing t which

your instructor may or may not want to cover in your course.

● The degrees of freedom for the t test which is 1,795 which is the number of pairs minus one or

1,796 – 1 or 1, XXXXXXXXXXIn other words, 1,795 of the difference scores are free to vary. Once these

difference scores are fixed, then the final difference score is fixed or determined.

● The two-tailed significance value which is XXXXXXXXXXwhich we’ll cover next.
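The quantities in the third box fit together in a simple way: the paired-samples t test is a one-sample t test on the difference scores. The sketch below shows one way to compute them. It assumes that pairs with a missing value on either variable have already been dropped, and the data are hypothetical. It does not reproduce SPSS’s exact significance value. With a very large number of pairs, like the 1,796 here, the t distribution is close to normal, so the hypothetical helper `two_tailed_p_large_df` uses a normal approximation. That approximation is poor for small samples.

```python
import math
from statistics import mean, stdev


def paired_t(var1, var2):
    """Paired-samples t on the difference scores var1 - var2 (complete pairs only)."""
    if len(var1) != len(var2):
        raise ValueError("each pair needs a value on both variables")
    d = [a - b for a, b in zip(var1, var2)]
    n = len(d)
    mean_d = mean(d)            # mean difference score
    sd_d = stdev(d)             # standard deviation of the difference scores
    se = sd_d / math.sqrt(n)    # standard error of the mean difference
    t = mean_d / se
    df = n - 1                  # number of pairs minus one
    return mean_d, sd_d, se, t, df


def two_tailed_p_large_df(t):
    """Normal approximation to the two-tailed significance value (large df only)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical years of school: Variable 1 = mothers, Variable 2 = fathers.
mothers = [12, 16, 10, 14, 12, 11]
fathers = [14, 16, 12, 13, 12, 12]
print(paired_t(mothers, fathers))
```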

Notice how we are going about this. We have a sample of adults in the United States (i.e., the XXXXXXXXXXGSS).

We calculate the mean years of school completed by respondents’ fathers and mothers in the sample who

answered the question. But we want to test the hypothesis that the mean years of school completed by

fathers is greater than the mean for mothers in the population. We’re going to use our sample data to test

a hypothesis about the population.

The hypothesis we want to test is that the mean years of school completed by fathers is greater than the

mean years of school completed by mothers in the population. We’ll call this our research hypothesis. It’s

what we expect to be true. But there is no way to prove the research hypothesis directly. So we’re going

to use a method of indirect proof. We’re going to set up another hypothesis that says that the research

hypothesis is not true and call this the null hypothesis. If we can’t reject the null hypothesis then we don’t

have any evidence in support of the research hypothesis. You can see why this is called a method of

indirect proof. We can’t prove the research hypothesis directly but if we can reject the null hypothesis then

we have indirect evidence that supports the research hypothesis. We haven’t proven the research

hypothesis, but we have support for this hypothesis.

Here are our two hypotheses.

● research hypothesis – the mean difference score (mother minus father) in the population is negative. In other words, the

mean years of school completed by fathers is greater than the mean years for mothers for all pairs in the

population.

● null hypothesis – the mean difference score for all pairs in the population is equal to 0.

It’s the null hypothesis that we are going to test.

Now all we have to do is figure out how to use the t test to decide whether to reject or not reject the null

hypothesis. Look again at the significance value which is XXXXXXXXXXThat tells you that, if the null hypothesis were

true, you would get a mean difference this large or larger only XXXXXXXXXXor 2 times out of one hundred. With odds like that, of

course, we’re going to reject the null hypothesis. A common rule is to reject the null hypothesis if the

significance value is less than XXXXXXXXXXor less than five out of one hundred.

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean?

Look back at the research hypothesis which was that the mean difference score for all pairs in the

population was less than XXXXXXXXXXWe’re predicting that the mean difference score for all pairs in the population

will be negative. That’s called a one-tailed test and we have to use a one-tailed significance value. It’s

easy to get the one-tailed significance value if we know the two-tailed significance value. If the two-tailed

significance value is XXXXXXXXXXthen the one-tailed significance value is half that or XXXXXXXXXXdivided by two or .010.

We still reject the null hypothesis which means that we have evidence to support our research hypothesis.

We haven’t proven the research hypothesis to be true but we have evidence to support it.
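Halving the two-tailed value works here because the sample mean difference is negative, which is the direction the research hypothesis predicts. The small hypothetical helper below (not an SPSS feature) makes that condition explicit. If the sample result had pointed the other way, the one-tailed significance value would be one minus half the two-tailed value, and we could not reject the null hypothesis.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction):
    """Convert a two-tailed significance value to a one-tailed one (symmetric t distribution)."""
    if in_predicted_direction:
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2


# Research hypothesis: the mean difference (mother minus father) is negative.
mean_difference = -0.2   # hypothetical value, for illustration only
two_tailed = 0.02        # hypothetical two-tailed significance value
print(one_tailed_p(two_tailed, mean_difference < 0))
```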

Part IV – Now it’s Your Turn Again

In this part of the exercise you want to compare the years of school completed by respondents and their

spouses to determine whether men and women each have more education than their spouses, but this time you want to test the

appropriate null hypotheses.

Remember from Part II that we have to test this hypothesis first for men and then for women. We’re going

to do this by selecting out all the men and then computing the paired-samples t test. Do this by clicking on

“Data” in the menu bar and then clicking on “Select Cases.” Select “If condition is satisfied” and then click

on “If” in the box below. Select d5_sex and move it to the box on the right by clicking on the arrow pointing

to the right. Now click on the equals sign and then on 1 so the expression in the box reads “d5_sex = 1”.

Click on “Continue” and then on “OK”. To make sure you have selected the males, run a frequency

distribution for d5_sex. You should only see the males (i.e., value XXXXXXXXXXNow carry out the paired-samples t

test. Repeat this for the females (i.e., value XXXXXXXXXXby selecting out the females and then running the

paired-samples t test again.
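Select Cases followed by a paired-samples t test can be mirrored in a few lines of Python. This may help show that the same test is simply run twice, on two different subsets of respondents. Everything here is a hypothetical sketch. The records are made up, `select_cases` and `paired_t` are illustrative helper names, and coding 2 as female is an assumption.

```python
import math
from statistics import mean, stdev

# Made-up respondents: sex (1 = male), own years of school, spouse's years of school.
records = [
    {"d5_sex": 1, "d4_educ": 16, "d29_speduc": 14},
    {"d5_sex": 1, "d4_educ": 12, "d29_speduc": 12},
    {"d5_sex": 1, "d4_educ": 14, "d29_speduc": 16},
    {"d5_sex": 1, "d4_educ": 18, "d29_speduc": 16},
    {"d5_sex": 2, "d4_educ": 12, "d29_speduc": 14},
    {"d5_sex": 2, "d4_educ": 16, "d29_speduc": 16},
    {"d5_sex": 2, "d4_educ": 13, "d29_speduc": 12},
    {"d5_sex": 2, "d4_educ": 14, "d29_speduc": 12},
]


def select_cases(rows, var, value):
    """Keep only the rows where var equals value (like "If condition is satisfied")."""
    return [r for r in rows if r[var] == value]


def paired_t(rows, var1, var2):
    """t and degrees of freedom for the difference scores var1 - var2."""
    d = [r[var1] - r[var2] for r in rows]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


for sex in (1, 2):
    subset = select_cases(records, "d5_sex", sex)
    t, df = paired_t(subset, "d4_educ", "d29_speduc")
    print(f"d5_sex = {sex}: n = {len(subset)}, t = {t:.3f}, df = {df}")
```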

For each paired-samples t test, state the research and the null hypotheses. Do you reject or not reject the

null hypotheses? Explain why.

STAT8S: Exercise Using SPSS to Explore Hypothesis Testing – One-Way Analysis of Variance

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav

which is a subset of the XXXXXXXXXXGeneral Social Survey. Some of the variables in the GSS have been recoded

to make them easier to use and some new variables have been created. The data have been weighted

according to the instructions from the National Opinion Research Center. This exercise uses COMPARE

MEANS and one-way analysis of variance to explore hypothesis testing. A good reference on using SPSS

is SPSS for Windows Version XXXXXXXXXXA Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor),

and Elizabeth Nelson. The online version of the book is on the Social Science Research and

Instructional Council's Website. You have permission to use this exercise and to revise it to fit your

needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are

more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax

file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional

information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; docx format).

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and one-way analysis of variance (sometimes

abbreviated one-way ANOVA). The exercise also gives you practice in using COMPARE MEANS.

Part I – Populations and Samples

Populations are the complete set of objects that we want to study. For example, a population might be all

the individuals that live in the United States at a particular point in time. The U.S. does a complete

enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).

We call this a census. Another example of a population is all the students in a particular school or all

college students in your state. Populations are often large and it’s too costly and time-consuming to carry

out a complete enumeration. So what we do is to select a sample from the population where a sample is a

subset of the population and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a

population. The mean age of a sample is a statistic while the mean age of the population is a parameter.

We use statistics to make inferences about parameters. In other words, we use the mean age of the

sample to make an inference about the mean age of the population. Notice that the mean age of the

sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples. Probability samples are samples in which every object in

the population has a known, non-zero chance of being in the sample (i.e., the probability of selection).

This isn’t the case for non-probability samples. An example of a non-probability sample is an instant poll

which you hear about on radio and television shows. A show might invite you to go to a website and

answer a question such as whether you favor or oppose same-sex marriage. This is a purely volunteer

sample and we have no idea of the probability of selection.

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability

sample of adults in the United States conducted by the National Opinion Research Center (NORC). The

GSS started in XXXXXXXXXXand has been an annual or biennial survey ever since. For this exercise we’re going

to use a subset of the XXXXXXXXXXGSS. Your instructor will tell you how to access this data set which is called

gss14_subset_for_classes_STATISTICS.sav.

In STAT6S and STAT7S we used the t test to compare means from two samples. In STAT6S the means

were from two independent samples while in STAT7S they were from paired samples. But what if we

wanted to compare means from more than two samples? For that we need to use a statistical test called

analysis of variance. In fact, the t test that assumes equal population variances is a special case of analysis of variance.
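The claim that the t test is a special case of analysis of variance can be checked numerically. With two groups, the square of the equal-variances (pooled) t statistic equals the one-way analysis of variance F statistic. The scores below are hypothetical, and the function names are illustrative.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Independent-samples t assuming equal population variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def one_way_f(*groups):
    """One-way ANOVA F: between-groups mean square over within-groups mean square."""
    scores = [v for g in groups for v in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical hours of television for two groups of respondents.
group_a = [2, 3, 5, 4, 6]
group_b = [1, 2, 2, 3, 1, 2]
t = pooled_t(group_a, group_b)
print(t ** 2, one_way_f(group_a, group_b))  # equal apart from floating-point rounding
```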

The XXXXXXXXXXGSS includes a variable (d3_degree) that describes the highest degree in school that the person

achieved. The categories are less than high school, high school, junior college, bachelor’s degree, and

graduate degree. Another variable is the number of hours per day that respondents say they watch

television (tv1_tvhours). We want to find out if there is any relationship between these two variables. One

way to answer this question would be to see if respondents with different levels of education watch different

amounts of television. For example, you might suspect that the more education respondents have, the less

television they watch.

Let’s start by looking at the mean number of hours that people watch television broken down by highest

educational degree. Click on “Analyze” in the menu bar and then on “Compare Means” and finally on

“Means.” (See Chapter 6, introduction in the online SPSS book mentioned on page XXXXXXXXXXSelect the variable

tv1_tvhours and move it to the “Dependent List” box. This is the variable for which you are going to

compute means. Then select the variable d3_degree and move it to the “Independent List” box. The

output from SPSS will show you the mean, number of cases, and standard deviation for the different levels

of education.

Respondents with more education watch less television than those with less education. For example,

respondents with a graduate degree watch an average of XXXXXXXXXXhours of television per day while those who

haven’t completed high school watch an average of XXXXXXXXXXhours – a difference of about two hours. Why

can’t we just conclude that those with more education watch less television than those with less education? If

we were just describing the sample, we could. But what we want to do is to make inferences about

differences in the population. We have five samples from five different levels of education and some

amount of sampling error will always be present in all these samples. The larger the samples, the less the

sampling error and the smaller the samples, the more the sampling error. Because of this sampling error

we need to make use of hypothesis testing as we did in the three previous exercises (STAT5S, STAT6S,

and STAT7S).

Part II – Now it’s Your Turn

In this part of the exercise you want to determine whether people who live in some regions of the country

(d25_region) watch more television (tv1_tvhours) than people in other regions. Use SPSS to get the

sample means as we did in Part I and then compare them to begin answering this question. Write one or

two paragraphs describing the regions in which people watch more and less television.

Part III – Hypothesis Testing – One-Way Analysis of Variance

In Part I we compared the mean hours of television watched per day for different levels of education. Now

we want to determine if these differences are statistically significant by carrying out a one-way analysis of

variance.

Click on “Analyze” in the menu bar and then on “Compare Means” and finally on “Means.” Select the

variable tv1_tvhours and move it to the “Dependent List” box. Then select the variable d3_degree and

move it to the “Independent List” box. Now click on “Options” in the upper-right corner and then check the

“Anova table and eta” box. Finally click on “Continue” and then on “OK.”

You should see four boxes in the output screen. The first box tells you how many cases are included in the

analysis and how many cases are excluded. Any case with missing data on either variable will be excluded.

The second table shows you the mean, number of cases, and standard deviation for each of the five levels

of education.

The third table gives you results of the one-way analysis of variance. We’re not going to explain these

statistics in this exercise. Your instructor will decide how much to cover on the calculation and meaning of

these statistics.

● Between groups and within groups sum of squares.

● Degrees of freedom for the between groups and within groups sum of squares.

● Mean square for the between groups and within groups sum of squares.

● F statistic.

● Significance value.

The fourth box gives you the value of Eta and Eta squared which measure the degree of association

between the two variables. Again we’ll leave it to your instructor to talk about these measures.
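For instructors who do cover the calculations, the entries in the third and fourth boxes can be reproduced from their definitions. The sketch below uses hypothetical hours of television for five education groups (not the GSS data) and follows the standard one-way analysis of variance formulas. It stops at F, because converting F to a significance value requires the F distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean

# Hypothetical hours of television per day for each level of education.
groups = {
    "less than high school": [4, 5, 3, 6],
    "high school": [3, 4, 3, 2, 5],
    "junior college": [3, 2, 3],
    "bachelor's": [2, 1, 3, 2],
    "graduate": [1, 2, 2],
}


def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares, F, and eta for a dict of groups."""
    scores = [v for g in groups.values() for v in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    eta_squared = ss_between / (ss_between + ss_within)  # share of total variation explained
    return {
        "SS between": ss_between, "SS within": ss_within,
        "df between": df_between, "df within": df_within,
        "MS between": ms_between, "MS within": ms_within,
        "F": ms_between / ms_within,
        "eta squared": eta_squared, "eta": math.sqrt(eta_squared),
    }


for name, value in one_way_anova(groups).items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```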

Notice how we are going about this. We have a sample of adults in the United States (i.e., the XXXXXXXXXXGSS).

We calculate the mean number of hours per day that respondents watch television for each level of

education in the sample. But we want to test the hypothesis that the amount respondents watch television

varies by level of education in the population. We’re going to use our sample data to test a hypothesis

about the population.

Our hypothesis is that the mean number of hours watching television is higher for some levels of education

than for other levels in the population. We’ll call this our research hypothesis. It’s what we expect to be

true. But there is no way to prove the research hypothesis directly. So we’re going to use a method of

indirect proof. We’re going to set up another hypothesis that says that the mean number of hours watching

television is the same for all levels of education in the population and call this the null hypothesis. If we

can’t reject the null hypothesis then we don’t have any evidence in support of the research hypothesis. You

can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly but if

we can reject the null hypothesis then we have indirect evidence that supports the research hypothesis.

We haven’t proven the research hypothesis, but we have support for this hypothesis.

Here are our two hypotheses.

● research hypothesis – the mean number of hours watching television for at least one level of

education is different from the mean for at least one other level of education in the population.

● null hypothesis – the mean number of hours watching television is the same for all five levels of

education in the population.

It’s the null hypothesis that we are going to test.

Now all we have to do is figure out how to use the F test to decide whether to reject or not reject the null

hypothesis. Look again at the significance value which is XXXXXXXXXXwhich actually means less than XXXXXXXXXX

since XXXXXXXXXXis a rounded value. That tells you that, if the null hypothesis were true, you would get differences

this large or larger less than 5 times out of ten thousand. With odds like that, of course, we’re going to reject the null

hypothesis. A common rule is to reject the null hypothesis if the significance value is less than XXXXXXXXXXor less

than five out of one hundred.
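SPSS computes F and its significance value for you, so nothing below is needed to finish the exercise. As an illustration only, here is a minimal Python sketch of where F comes from. It compares two kinds of spread: how far the group means sit from the overall mean, and how far individual values sit from their own group mean. The three groups of TV hours are invented and are not GSS data. The sketch stops at F and its degrees of freedom, because turning F into a significance value requires the F distribution (SciPy's `scipy.stats.f.sf`, for example), which is left out here. The `reject_null` function simply applies the .05 rule from the text to a significance value you read off the SPSS output.

```python
# Minimal one-way ANOVA sketch in plain Python (no SPSS, no SciPy).
# The hours-of-TV data below are invented for illustration; they are not GSS values.


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: how far each value sits from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def reject_null(significance, alpha=0.05):
    """Common rule from the text: reject the null when the significance value is below .05."""
    return significance < alpha


# Hypothetical daily TV hours for three education groups
groups = [[4, 5, 3, 4], [3, 2, 3, 2], [2, 1, 2, 1]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}, df = ({df1}, {df2})")  # F = 14.25, df = (2, 9)
print(reject_null(0.0004))  # True: the significance value is below .05
```

The larger the differences between the group means are relative to the spread inside the groups, the larger F becomes and the smaller the significance value SPSS reports.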

So what have we learned? We learned that the mean number of hours watching television for at least one of the populations is different from the mean for at least one other population. But which ones? There are statistical tests for answering this question, often called post hoc or multiple comparison tests. We’re not going to cover them here, although your instructor might want to discuss them.

Part IV – Now it’s Your Turn Again

In Part II you computed the mean number of hours that respondents watched television for each of the nine regions of the country. Now we want to determine whether these differences are statistically significant by carrying out a one-way analysis of variance as described in Part III. Indicate what the research and null hypotheses are and whether you can reject the null hypothesis. What does that tell you about the research hypothesis?

STAT9S: Exercise Using SPSS to Explore Crosstabulation

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses CROSSTABS in SPSS to explore crosstabulation. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to introduce crosstabulation as a statistical tool for exploring relationships between variables. The exercise also gives you practice in using CROSSTABS in SPSS.

Part I – Relationships between Variables

In exercises STAT5S through STAT8S we used sample means to analyze relationships between variables. For example, we compared men and women to see whether they differed in the number of years of school completed and the number of hours they worked in the previous week. We discovered that men and women had about the same amount of education but that men worked more hours than women. We were able to compute means because years of school completed and hours worked are both ratio level variables. The mean assumes interval or ratio level measurement (see STAT2S).


But what if we wanted to explore relationships between variables that weren’t interval or ratio? Crosstabulation can be used to look at the relationship between nominal and ordinal variables. Let’s compare men and women (d5_sex) in terms of the following:

● opinion about abortion (a1_abany),

● fear of crime (c1_fear),

● satisfaction with current financial situation (f4_satfin),

● opinion about gun control (g1_gunlaw),

● gun ownership (g2_owngun),

● voting (p5_pres08), and

● religiosity (r8_reliten).

Before we look at the relationship between sex and these other variables, we need to talk about independent and dependent variables. The dependent variable is whatever you are trying to explain. In our case, that would be opinion about abortion, fear of crime, satisfaction with one’s financial situation, opinion about gun control, gun ownership, voting, and religiosity. The independent variable is a variable that you think might help explain why some people think abortion should be legal and others think it shouldn’t, or might help explain any of the other variables in our list above. In our case, that would be sex. Normally we put the dependent variable in the row and the independent variable in the column. We’ll follow that convention in this exercise.

Let’s start with the first two variables in our list. We’re going to use a1_abany as our measure of opinion about abortion. Respondents were asked if they thought abortion ought to be legal for any reason. We’re also going to use c1_fear as our measure of fear of crime. Respondents were asked if they were afraid to walk alone at night in their neighborhood. Run CROSSTABS to produce two tables. (See Chapter 5, Crosstabs in the online SPSS book.) One will be for the relationship between d5_sex and a1_abany, and the other will be for d5_sex and c1_fear. Put the independent variable in the column and the dependent variable in the row. If you don’t ask for percents, SPSS will give you only the counts (i.e., frequencies), so be sure to ask for the percents. SPSS can compute the row percents, column percents, and total percents. Your instructor will probably talk about how to compute these different percents. But how do you know which percents to ask for? Here’s a simple rule for choosing them.

● If your independent variable is in the column, then you want to use the column percents.

● If your independent variable is in the row, then you want to use the row percents.

Since you put the independent variable in the column, you want the column percents.

Part II – Interpreting the Percents

Your first table should look like this.

It’s easy to make sure that you have the correct percents. Your independent variable (d5_sex) should be in the column, and it is. Column percents should sum down to 100%, and they do.

How are you going to interpret these percents? Here’s a simple rule for interpreting percents.

● If your percents sum down to 100%, then compare the percents across.

● If your percents sum across to 100%, then compare the percents down.

Since the percents sum down to 100%, you want to compare across.
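The two rules above can be sketched in a few lines of Python. First, ask for column percents when the independent variable is in the column. Then compare the percents across a row. The counts below are hypothetical and were chosen so the arithmetic is easy to follow; they are not the GSS numbers behind the table SPSS gives you. Holding the crosstab in a nested dictionary is just one convenient assumption.

```python
# Column percents for a crosstab, then a comparison across one row.
# Counts are hypothetical; they are not the GSS figures.

# counts[row][column]: dependent variable in the rows, independent variable (sex) in the columns
counts = {
    "Legal for any reason": {"Male": 48, "Female": 63},
    "Not legal for any reason": {"Male": 72, "Female": 77},
}


def column_percents(table):
    """Divide each cell by its column total, so each column sums down to 100%."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(table[r][c] for r in table) for c in columns}
    return {r: {c: 100 * table[r][c] / col_totals[c] for c in columns} for r in table}


pct = column_percents(counts)

# Check: column percents sum down to 100
for col in ("Male", "Female"):
    print(col, sum(pct[r][col] for r in pct))

# The percents sum down, so compare across a row
row = "Legal for any reason"
diff = pct[row]["Male"] - pct[row]["Female"]
print(f"{pct[row]['Male']:.1f}% vs {pct[row]['Female']:.1f}%: {diff:.1f} percentage points")
```

In this made-up table, 40% of men and 45% of women are in the first row. The comparison across that row gives a difference of 5 percentage points.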

Look at the first row. Approximately 47% of men think abortion should be legal for any reason compared to 44% of women. That’s a difference of 3.6 percentage points, which is really small. We never want to make too much of small differences. Why not? No sample is ever a perfect representation of the population from which it is drawn, because every sample contains some amount of sampling error. Sampling error is inevitable. The larger the sample, the smaller the sampling error, and the smaller the sample, the larger the sampling error. So in this case we would conclude that there probably isn’t any difference in the population between men and women in their approval of abortion for any reason.

Now let’s look at your second table.

This time the percent difference is quite a bit larger. About 22% of men are afraid to walk alone at night in their neighborhood compared to 39% of women. That’s a difference of 16.8 percentage points. This much larger difference gives us reason to think that women are more fearful of being a victim of crime than men.

Part III – Now it’s Your Turn

Choose two of the tables from the following list and compare men and women:

● satisfaction with current financial situation (f4_satfin),

● opinion about gun control (g1_gunlaw),

● voting (p5_pres08), and

● religiosity (r8_reliten).

Make sure that you put the independent variable in the column and the dependent variable in the row. Be sure to ask for the correct percents. What are the values of the percents that you want to compare? What is the percent difference? Does it look to you as though there is much of a difference between men and women on the variables you chose?

Part IV – Adding another Variable into the Analysis

So far we have looked at variables only two at a time. Often we want to add other variables to the analysis. Let’s focus on the difference between men and women (d5_sex) in terms of gun ownership (g2_owngun). First, let’s get the two-variable table, which should look like this.

Men were more likely than women to own guns, by 9.5 percentage points. But what if we wanted to include social class in this analysis? The XXXXXXXXXX GSS asked respondents whether they thought of themselves as lower, working, middle, or upper class. This is variable d11_class. What we want to do is hold perceived social class constant. In other words, we want to divide our sample into four groups, each consisting of one of these four classes. Then we look at the relationship between d5_sex and g2_owngun separately within each group.

We can do this by going back to the SPSS dialog box where we requested the crosstabulation. Put the variable d11_class in the third box down, right below the “Column(s)” box (SPSS labels it “Layer 1 of 1”). (See Chapter 8, Crosstabs Revisited in the online SPSS book.) Your table should look like this.

This table is more complicated. Notice that it is actually divided into four tables stacked one on top of the other. At the top are those who said they were lower class, followed by working, middle, and upper class. Let’s look at the percent differences for each of these tables: 12.0, 9.6, 9.4, and 0.4 percentage points. The first three are similar to the two-variable table (9.5 compared to 12.0, 9.6, and 9.4). Remember not to make too much out of small differences because of sampling error. But the last table, for upper class respondents, has a much smaller difference of only 0.4 percentage points. In other words, when we look only at those who see themselves as upper class, there really isn’t any difference between men and women in terms of gun ownership.
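Holding class constant amounts to splitting the cases by d11_class and computing the same percent difference inside each group. The Python sketch below shows that logic on invented records with only two classes; these are not GSS data. Each record is assumed to be a simple `(class, sex, owns_gun)` tuple. The sketch also reports the size of each layer, since small layers are where sampling error matters most.

```python
# Holding a third variable constant: the male-female difference in gun ownership
# computed separately within each class. All records are invented for illustration.
from collections import Counter

# Each record is (class, sex, owns_gun)
records = (
    [("Working", "Male", True)] * 30 + [("Working", "Male", False)] * 70
    + [("Working", "Female", True)] * 20 + [("Working", "Female", False)] * 80
    + [("Upper", "Male", True)] * 5 + [("Upper", "Male", False)] * 5
    + [("Upper", "Female", True)] * 5 + [("Upper", "Female", False)] * 5
)


def percent_owning(rows, sex):
    """Column percent owning a gun for one sex within the given rows."""
    group = [r for r in rows if r[1] == sex]
    return 100 * sum(1 for r in group if r[2]) / len(group)


def difference_by_layer(rows):
    """Male minus female percent owning a gun, one value per class (layer)."""
    diffs = {}
    for cls in sorted({r[0] for r in rows}):
        layer = [r for r in rows if r[0] == cls]
        diffs[cls] = percent_owning(layer, "Male") - percent_owning(layer, "Female")
    return diffs


overall = percent_owning(records, "Male") - percent_owning(records, "Female")
by_class = difference_by_layer(records)
layer_sizes = Counter(r[0] for r in records)

print(f"Two-variable table: {overall:.1f} points")
for cls, d in by_class.items():
    print(f"{cls} (n = {layer_sizes[cls]}): {d:.1f} points")
```

In this made-up data, the overall difference of about 9 points comes entirely from the working class layer. The small upper class layer shows no difference at all, which echoes the pattern described above.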

But notice something else. Fewer people say they are lower or upper class than say they are working or middle class. There are only XXXXXXXXXX respondents in the lower class table and even fewer, 48 respondents, in the upper class table. We’ll have more to say about this in the next exercise (STAT10S).

Part V – Now it’s Your Turn Again

In Part II we compared men and women (d5_sex) in terms of fear of crime (c1_fear). Run this table again, but this time add social class (d11_class) to the analysis as we did in Part IV. What happens to the percent difference when you hold class constant? What does this tell you?

STAT10S: Exercise Using SPSS to Explore Chi Square

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses CROSSTABS in SPSS to explore the Chi Square test. A good reference on using SPSS is SPSS for Windows Version 23.0 A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to introduce Chi Square as a test of significance. The exercise also gives you practice in using CROSSTABS in SPSS.

Part I – Relationships between Variables

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.

The XXXXXXXXXX GSS is a sample from the population of all adults in the United States at the time the survey was done. In the previous exercise (STAT9S) we used crosstabulation and percents to describe the relationship between pairs of variables in the sample. But we want to go beyond just describing the sample. We want to use the sample data to make inferences about the population from which the sample was selected. Chi Square is a statistical test of significance that we can use to test hypotheses about the population. Chi Square is the appropriate test when your variables are nominal or ordinal (see STAT1S).

In STAT9S we started by using crosstabulation to look at the relationship between sex and opinion about abortion. We’re going to use a1_abany as our measure of opinion about abortion. Respondents were asked if they thought abortion ought to be legal for any reason. Run CROSSTABS to produce the table. (See Chapter 5, Crosstabs in the online SPSS book mentioned on page XXXXXXXXXX.) You want the crosstabulation of d5_sex and a1_abany. Put the independent variable in the column and the dependent variable in the row. Since your independent variable is in the column, you want to use the column percents.

Part II – Interpreting the Percents

Your table should look like this.

Since your percents sum down to 100% (i.e., column percents), you want to compare the percents across.

Look at the first row. Approximately 47% of men think abortion should be legal for any reason compared to 44% of women. That’s a difference of 3.6 percentage points, which seems small. We never want to make too much of small differences. Why not? No sample is ever a perfect representation of the population from which it is drawn, because every sample contains some amount of sampling error. Sampling error is inevitable. The larger the sample, the smaller the sampling error, and the smaller the sample, the larger the sampling error.

But what counts as a small percent difference? You would probably agree that a difference of one to four percentage points is small. But what about a difference of five, six, or seven points? Is that small, or is it large enough for us to conclude that there is a difference between men and women in the population? Here’s where we can use Chi Square.

Part III – Chi Square

Let’s assume that you think that sex and opinion about abortion are related to each other. We’ll call this our research hypothesis. It’s what we expect to be true. But there is no way to prove the research hypothesis directly, so we’re going to use a method of indirect proof. We’re going to set up another hypothesis that says that the research hypothesis is not true, and call this the null hypothesis. In our case, the null hypothesis would be that the two variables are unrelated to each other. [1] In statistical terms, we often say that the two variables are independent of each other. If we can reject the null hypothesis, then we have evidence to support the research hypothesis. If we can’t reject the null hypothesis, then we don’t have any evidence in support of the research hypothesis. You can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly, but if we can reject the null hypothesis, then we have indirect evidence that supports the research hypothesis.

Here are our two hypotheses.

● research hypothesis – sex and opinion about abortion are related to each other

● null hypothesis – sex and opinion about abortion are unrelated to each other; in other words, they are independent of each other

It's the null hypothesis that we are going to test.

SPSS will compute Chi Square for you. Follow the same procedure you used to get the crosstabulation between d5_sex and a1_abany. Remember to get the column percents. Then click on the “Statistics” button in the upper right of the dialog box. Check the box for “Chi-Square,” then click on “Continue” and then on “OK.”

Now you will see another output box below the crosstabulation called “Chi-Square Tests.” We want the test called “Pearson Chi-Square” in the first row of the box. Ignore all the other rows in this box. [2] You should see three values to the right of “Pearson Chi-Square.”

● The value of Chi Square is XXXXXXXXXX. Your instructor may or may not want to go into the computation of the Chi Square value, but we’re not going to cover the computation in this exercise.

● The degrees of freedom (df) is XXXXXXXXXX. Degrees of freedom is the number of values that are free to vary. In a table with two columns and two rows, only one of the cell frequencies is free to vary, assuming the marginal frequencies are fixed. The marginal frequencies are the values in the margins of the table. There are XXXXXXXXXX males and XXXXXXXXXX females in this table. There are XXXXXXXXXX respondents who think abortion should be legal for any reason and XXXXXXXXXX who think it should not be. Try filling in any one of the cell frequencies in the table. The other three cell frequencies are then fixed, assuming we keep the marginal frequencies the same, so there is one degree of freedom.

● The two-tailed significance value is XXXXXXXXXX. [3] This tells us that if the null hypothesis were true, the probability of getting a Chi Square value at least as large as ours would be XXXXXXXXXX. In other words, if sex and opinion about abortion were unrelated in the population, a difference like the one in our sample would turn up by chance about XXXXXXXXXX out of 1,000 times. With odds like that, of course, we’re not going to reject the null hypothesis. A common rule is to reject the null hypothesis if the significance value is less than XXXXXXXXXX, or less than five out of one hundred. Since XXXXXXXXXX is not smaller than .05, we don’t reject the null hypothesis.

Since we can’t reject the null hypothesis, we don’t have any support for our research hypothesis.
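To see why a 2x2 table has one degree of freedom, here is a small Python sketch using hypothetical margins, not the GSS counts. Once the row and column totals are fixed, choosing a value for one cell forces the other three. The helper `degrees_of_freedom` uses the standard formula (rows - 1) * (columns - 1), which gives 1 for a 2x2 table.

```python
# Degrees of freedom in a crosstab with fixed marginal frequencies.
# The margins below are hypothetical, not the GSS counts.


def degrees_of_freedom(n_rows, n_cols):
    """Standard formula for a crosstab: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def fill_two_by_two(row_totals, col_totals, top_left):
    """Once the margins are fixed, choosing one cell determines the other three."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]


row_totals = (90, 110)  # legal for any reason: yes, no
col_totals = (95, 105)  # male, female
table = fill_two_by_two(row_totals, col_totals, top_left=40)
print(table)  # [[40, 50], [55, 55]]
print(degrees_of_freedom(2, 2))  # 1
```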

Part IV – Now it’s Your Turn

Choose any two of the tables from the following list and compare men and women using crosstabulation and Chi Square:


● satisfaction with current financial situation (f4_satfin),

● opinion about gun control (g1_gunlaw),

● gun ownership (g2_owngun),

● voting (p6_pres12), and

● religiosity (r8_reliten).

Make sure that you put the independent variable in the column and the dependent variable in the row. Be sure to ask for the correct percents and Chi Square. What are the research hypothesis and the null hypothesis? Do you reject the null hypothesis? How do you know? What does that tell you about the research hypothesis?

Part V – Expected Values

We said we weren’t going to talk about how to compute Chi Square, but we do have to introduce the idea of expected values. The computation of Chi Square is based on comparing two sets of cell frequencies. The first is the observed cell frequencies, which are the cell frequencies you see in the table that SPSS gives you. The second is the cell frequencies you would expect by chance if the null hypothesis were true. SPSS will compute these expected frequencies for you. Rerun the crosstabulation for d5_sex and a1_abany, remembering to ask for the column percents and Chi Square. This time, when you click on the “Cells” button to ask for the column percents, look in the upper left of the dialog box where it says “Counts.” “Observed” is selected by default; these are the observed cell frequencies. Click on the “Expected” box to get the expected cell frequencies.
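SPSS computes the expected frequencies for you, but the rule behind them is short. Under the null hypothesis of independence, each expected cell frequency is (row total × column total) / total number of cases. The Python sketch below applies that standard formula to a hypothetical 2x2 table, not the GSS counts.

```python
# Expected cell frequencies under the null hypothesis of independence:
# expected = (row total * column total) / grand total.
# The observed counts are hypothetical.


def expected_frequencies(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[40, 50],   # legal for any reason: male, female
            [55, 55]]   # not legal for any reason: male, female
expected = expected_frequencies(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

Each expected row keeps the same row total as the observed row. The expected counts only spread those totals across the columns in proportion to the column totals, which is what "no relationship" means here.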

Now you will see both the observed and the expected cell frequencies in your output table. Notice that they aren’t very different. The closer they are to each other, the smaller Chi Square will be; the more different they are, the larger Chi Square will be. For a given number of degrees of freedom, the larger Chi Square is, the more likely you are to be able to reject the null hypothesis.
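Here is a minimal Python sketch of how observed and expected frequencies combine into Chi Square. For every cell, square the difference between observed and expected, divide by expected, and add up the results. The counts are hypothetical. The sketch computes the uncorrected Pearson Chi-Square, which is the row the exercise tells you to read, not the continuity-corrected value SPSS also prints for 2x2 tables. The significance function only works for 1 degree of freedom. It relies on the fact that a Chi Square with 1 df is the square of a standard normal value. For larger tables you would need a general Chi Square distribution function, for example SciPy's `scipy.stats.chi2.sf`.

```python
# Pearson Chi Square for a crosstab, plus its significance value when df = 1.
# Counts are hypothetical. This is the uncorrected "Pearson Chi-Square" row,
# not the continuity-corrected value SPSS also prints for 2x2 tables.
import math


def expected_frequencies(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum over cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def significance_df1(chi_sq):
    """Upper-tail probability for Chi Square with 1 df: P(X >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_sq / 2))


observed = [[40, 50], [55, 55]]
chi_sq = pearson_chi_square(observed)
p = significance_df1(chi_sq)
print(f"Chi Square = {chi_sq:.3f}, significance = {p:.3f}")
print("Reject the null hypothesis" if p < 0.05 else "Do not reject the null hypothesis")
```

Here the observed and expected counts differ by less than 3 in every cell, so Chi Square is small and the significance value is well above .05.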

Chi Square assumes that all the expected cell frequencies are at least five. We can see from the table that this is the case here. But we don’t have to get the expected frequencies to see this. Look back at the “Chi-Square Tests” table in your output and find footnote a. It tells you that the smallest expected cell frequency is XXXXXXXXXX. So clearly all four expected cell frequencies are at least five. If the smallest one is just a little below five, that’s not a problem. But if it gets down to around three, you have a problem. What you’ll have to do is combine rows or columns that have small marginal frequencies.

For example, run the crosstabulation of d5_sex and d9_sibs, which is the number of brothers and sisters the respondent has. [4] The minimum expected frequency is so small that it rounds to XXXXXXXXXX. That’s because there are only a few respondents with more than XXXXXXXXXX siblings. You will need to recode the number of siblings into fewer categories, making sure that you don’t have any categories with a really small number of cases.
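The check and the fix described above can be sketched in Python. `min_expected` reports the smallest expected frequency, which is the number SPSS gives in footnote a. `recode_siblings` is a hypothetical recode with made-up cutpoints, so pick cutpoints that suit your own data. In SPSS itself you would do the recode with RECODE or Transform > Recode into Different Variables; this sketch only shows the logic, on invented counts.

```python
# Checking the smallest expected frequency, and a hypothetical recode that
# collapses sparse sibling counts into fewer categories.
from collections import Counter


def expected_frequencies(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def min_expected(observed):
    """Smallest expected cell frequency in the table."""
    return min(min(row) for row in expected_frequencies(observed))


def recode_siblings(n):
    """Hypothetical cutpoints; choose your own so no category is tiny."""
    if n <= 1:
        return "0-1"
    if n <= 3:
        return "2-3"
    if n <= 5:
        return "4-5"
    return "6 or more"


# Rows are sibling categories, columns are male, female (made-up counts)
sparse = [[30, 35], [40, 45], [1, 0]]   # the last row has a single case
collapsed = [[30, 35], [41, 45]]        # last row merged into the one above
print(round(min_expected(sparse), 2), round(min_expected(collapsed), 2))

raw_sibling_counts = [0, 1, 2, 2, 3, 5, 9, 14]
print(Counter(recode_siblings(n) for n in raw_sibling_counts))
```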

Part VI – Now it’s Your Turn Again

Look back at the two tables you ran in Part IV and see whether any of your expected frequencies were less than five. What does that tell you?


[1] The null hypothesis is often called the hypothesis of no difference. We’re saying that there is no relationship between these two variables. In other words, there’s nothing there.

[2] Unfortunately there is no way to tell SPSS to just give us the “Pearson Chi-Square.”

[3] What do we mean by two-tailed? We’re not predicting the direction of the relationship. We’re not predicting that men are more likely to think abortion should be legal, or that women are more likely. So it’s a two-tailed test.

[4] Number of siblings is a ratio level variable. You can use Chi Square with ratio level variables, but usually there are better tests. We’re just using this as an example.


STAT13S: Exercise Using SPSS to Explore Correlation

Author: Ed Nelson

Department of Sociology M/S SS97

California State University, Fresno

Fresno, CA XXXXXXXXXX

Email: XXXXXXXXXX

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav, which is a subset of the XXXXXXXXXX General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use, and some new variables have been created. The data have been weighted according to the instructions from the National Opinion Research Center. This exercise uses CORRELATE and COMPARE MEANS in SPSS to explore correlation. A good reference on using SPSS is SPSS for Windows Version XXXXXXXXXX A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson. The online version of the book is on the Social Science Research and Instructional Council's website. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

● Data subset (.sav format)

● Extended notes for instructors (MS Word; .docx format)

● Syntax file (.sps format)

● Output file (.spv format)

● This page (MS Word; .docx format)

Goals of Exercise

The goal of this exercise is to introduce measures of correlation. The exercise also gives you practice using CORRELATE and COMPARE MEANS in SPSS.

Part I – Scatterplots

We’re going to use the General Social Survey (GSS) for this exercise. The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC). The GSS started in XXXXXXXXXX and has been an annual or biennial survey ever since. For this exercise we’re going to use a subset of the XXXXXXXXXX GSS. Your instructor will tell you how to access this data set, which is called gss14_subset_for_classes_STATISTICS.sav.


In a previous exercise (STAT11S) we considered different measures of association that can be used to determine the strength of the relationship between two variables measured at the nominal or ordinal level (see STAT1S). In this exercise we’re going to look at two different measures that are appropriate for interval and ratio level variables. The terminology also changes: we’ll refer to these measures as correlations rather than measures of association.

Before we look at these measures, let’s talk about a type of graph called a scatterplot, which is used to display the relationship between two variables. SPSS refers to it as a Scatter/Dot chart. Click on Graphs in the menu bar at the top of the SPSS screen. Click on “Chart Builder” in the dropdown menu. A dialog box will open that asks you to define the level of measurement for each variable and to provide labels for the values. Click on “OK” since that has been done for you. In the bottom half of the dialog box, the “Gallery” tab should be selected by default. On the left you can choose the type of graph you want to build. Look down the list and click on “Scatter/Dot.” There are eight different scatterplots that SPSS can create. If you point your mouse at each of them, you will see a label for that scatterplot. The one on the upper left is called a “Simple Scatter.” Click and drag its icon up to the large box in the upper right of the dialog box.

Now all you have to do is click and drag the variables you want to the X-Axis and Y-Axis. If you want to treat one of these variables as independent, put that variable on the X-Axis and the dependent variable on the Y-Axis. So that all our scatterplots look the same, let’s put d22_maeduc on the X-Axis and d24_paeduc on the Y-Axis. Click “OK” and SPSS will display your graph.

Now let’s look for the general pattern in our scatterplot. You see more cases in the upper right and lower left of the plot and fewer cases in the upper left and lower right. In general, as one of the variables increases, the other variable tends to increase as well. Moreover, you can imagine drawing a straight line that represents this relationship. The line would start in the lower left and continue toward the upper right of the plot. That’s what we call a positive linear relationship. [1] But how strong is the relationship, and where exactly would you draw the straight line? The Pearson Correlation Coefficient will tell us the strength of the linear relationship, and linear regression will show us the straight line that best fits the data points. We’ll talk about the Pearson Correlation Coefficient in Part III of this exercise and about linear regression in exercise STAT14S.

Part II – Now it’s Your Turn

Use the Chart Builder in SPSS to create the scatterplot for the years of school completed by the respondent (d4_educ) and the spouse’s years of school completed (d29_speduc). So that all our plots look the same, put d29_speduc on the X-Axis and d4_educ on the Y-Axis. Look at your scatterplot and decide whether it has a pattern. What is that pattern? Do you think it is a linear relationship? Is it a positive linear or a negative linear relationship?

Part III – Pearson Correlation Coefficient

The Pearson Correlation Coefficient (r) is a numerical value that tells us how strongly two variables are linearly related. It varies between XXXXXXXXXX and XXXXXXXXXX. The sign indicates the direction of the relationship. A positive value means that as one variable increases, the other variable also increases. A negative value means that as one variable increases, the other variable decreases. Ignoring the sign, the closer the value is to 1, the stronger the linear relationship, and the closer it is to 0, the weaker the linear relationship.
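Here is a minimal Python sketch of the Pearson Correlation Coefficient computed from its definition. It divides the sum of cross-products of deviations from the means by the square root of the product of the two sums of squared deviations. The first two examples are perfect straight lines, so r comes out at +1 and -1. The education values in the last example are invented and are not GSS data. Python 3.10 and later also include `statistics.correlation`, which returns the same coefficient.

```python
# Pearson Correlation Coefficient computed from its definition.
# The education values are invented, not GSS data.
import math


def pearson_r(x, y):
    """Cross-products of deviations divided by sqrt(product of sums of squared deviations)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfect negative linear relationship

mother_educ = [8, 12, 12, 16, 14]       # hypothetical years of school
father_educ = [10, 12, 14, 16, 12]
print(round(pearson_r(mother_educ, father_educ), 3))
```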

The usual way to interpret the Pearson Coefficient is to square its value. In other words, if r equals .5, then we square XXXXXXXXXX, which gives us XXXXXXXXXX. This is often called the Coefficient of Determination. It means that one of the variables explains 25% of the variation in the other variable. The Pearson Correlation is a symmetric measure, in the sense that neither variable is designated as independent or dependent. So we could say that 25% of the variation in the first variable is explained by the second variable, or reverse this and say that 25% of the variation in the second variable is explained by the first. It’s important not to read causality into this statement. We’re not saying that one variable causes the other variable. We’re just saying that 25% of the variation in one of the variables can be accounted for by the other variable.
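To make "explains 25% of the variation" concrete, the Python sketch below fits a least-squares line to invented data. It then computes the share of the variation in y that the line accounts for: 1 - (residual sum of squares / total sum of squares). For a straight line fitted this way, that share equals r squared. It also comes out the same whichever variable is treated as x, which is the symmetry described above. Linear regression itself is covered in STAT14S; it appears here only to illustrate the idea.

```python
# Coefficient of determination (r squared) and what "variation explained" means.
# Hypothetical data; the least-squares line is used only to illustrate the idea.
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def variation_explained(x, y):
    """1 - SS_residual / SS_total for the least-squares line predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_residual / ss_total


mother_educ = [8, 12, 12, 16, 14]   # hypothetical years of school
father_educ = [10, 12, 14, 16, 12]
r = pearson_r(mother_educ, father_educ)
print(round(r ** 2, 3))
print(round(variation_explained(mother_educ, father_educ), 3))  # same as r squared
print(round(variation_explained(father_educ, mother_educ), 3))  # same in either direction
print(0.5 ** 2)  # r = .5 gives .25, i.e., 25% of the variation
```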

The Pearson Correlation Coefficient assumes that the relationship between the two variables is linear. This means that the relationship can be represented by a straight line. In geometric terms, the slope of the line is the same at every point on that line. Here are some examples of a positive and a negative linear relationship and an example of the lack of any relationship.

Pearson r would be positive and close to +1 in the left-hand example, negative and close to -1 in the middle example, and close to 0 in the right-hand example. You can search for “free images of a positive linear relationship” to see more examples of linear relationships.

But what if the relationship is not linear? Search for “free images of a curvilinear relationship” and you’ll see examples that look like this.

Here the relationship can’t be represented by a straight line. We would need a line with a bend in it to capture this relationship. While there clearly is a relationship between these two variables, for a curve like this one Pearson r would be close to 0. Pearson r does not measure the strength of a curvilinear relationship; it only measures the strength of linear relationships.
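A quick way to convince yourself of this is to compute r for data that follow a perfect U-shaped curve. In the Python sketch below, y is completely determined by x (y = x squared), yet r is zero. The data and function name are hypothetical illustrations, not survey values.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical U-shaped data: a perfect relationship, but curvilinear, not linear
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # [9, 4, 1, 0, 1, 4, 9]

print(pearson_r(x, y))  # 0.0
```

The points on the left half of the curve pull r down by exactly as much as the points on the right half push it up, so the linear measure sees no relationship even though knowing x tells you y exactly.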

Another way to think of correlation is to say that the Pearson Correlation Coefficient measures how well a straight line fits the data points. If r were equal to +1, then all the data points would fall exactly on a line with a positive slope (i.e., one that starts in the lower left and ends in the upper right). If r were equal to -1, then all the data points would fall exactly on a line with a negative slope (i.e., one that starts in the upper left and ends in the lower right). (See the diagram above.)
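The two extreme cases can be reproduced with a small Python sketch: data placed exactly on a rising line give r = +1, and data placed exactly on a falling line give r = -1. The lines, data, and function name are hypothetical. The last lines show why the direction of coding matters: reversing the coding of one variable flips the sign of r.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # every point on a line with positive slope
falling = [10 - 3 * v for v in x]  # every point on a line with negative slope

print(pearson_r(x, rising))   # 1.0
print(pearson_r(x, falling))  # -1.0

# Recoding x from high to low (5 becomes 1, 1 becomes 5) flips the sign of r,
# which is why the footnote assumes both axes are coded in the same direction.
x_reversed = [6 - v for v in x]
print(pearson_r(x_reversed, rising))  # -1.0
```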

Let’s get the Pearson Coefficient for the two variables in our scatterplot in Part XXXXXXXXXX. (See Chapter 7, Correlation, in the online SPSS book mentioned on page XXXXXXXXXX.) Click on Analyze in the menu bar and then click on CORRELATE. In the dropdown box, click on “Bivariate.” Bivariate just means that you want to compute a correlation for two variables, here d22_maeduc and d24_paeduc. Move these two variables into the “Variable(s)” box. Make sure that the box for the Pearson Correlation Coefficient is checked, which it should be since this is the default. Notice that the circle for “Two-tailed” is filled in under “Test of Significance.” A two-tailed significance test is used when you don’t make any prediction as to whether the relationship is positive or negative. In our case, we would expect the relationship to be greater than zero (i.e., positive), so we would want to use a one-tailed test. Click on the circle for “One-tailed” to change the selection. Notice also that “Flag significant correlations” is checked. That means that SPSS will mark the correlations that are statistically significant. Now click “OK” and SPSS will display your correlation coefficient.
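SPSS reports the significance level for you, but it may help to see what sits behind it. The usual test of whether the population correlation is zero uses t = r * sqrt((n - 2) / (1 - r squared)) with n - 2 degrees of freedom. The Python sketch below is an illustration under that assumption, not SPSS’s own code: it computes the t statistic and shows how a two-tailed p value converts to a one-tailed one, which works because the t distribution is symmetric. The function names, r = .5, n = 27, and the p value of .10 are all hypothetical.

```python
import math


def t_for_r(r, n):
    """t statistic and degrees of freedom for testing H0: population r = 0."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need at least 3 cases and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


def one_tailed_p(two_tailed_p, r, predicted_positive=True):
    """Convert a two-tailed p value to a one-tailed p for a directional prediction.

    Half the two-tailed p when r falls in the predicted direction,
    1 minus that half when it falls in the opposite direction.
    """
    half = two_tailed_p / 2
    in_predicted_direction = (r > 0) if predicted_positive else (r < 0)
    return half if in_predicted_direction else 1 - half


# Hypothetical values, not the exercise's output: r = .5 from 27 cases
t, df = t_for_r(0.5, 27)
print(f"t = {t:.3f} with {df} degrees of freedom")

# A separate hypothetical two-tailed p of .10 becomes .05 one-tailed
print(one_tailed_p(0.10, r=0.5))  # 0.05
```

The one-tailed test only helps when the prediction is right: if r had come out negative, the same two-tailed p of .10 would correspond to a one-tailed p of .95.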

You should see four correlations. The correlations in the upper left and lower right will be 1, since the correlation of any variable with itself will always be 1. The correlations in the upper right and lower left will both be XXXXXXXXXX. That’s because the correlation of variable X with variable Y is the same as the correlation of variable Y with variable X. Pearson r is a symmetric measure (see STAT11S), meaning that we don’t designate one of the variables as the dependent variable and the other as the independent variable. Notice that the Pearson r is statistically significant using a one-tailed test at the XXXXXXXXXX level of significance. A Pearson r of XXXXXXXXXX is really pretty large; you don’t see r’s that big very often. It tells us that the linear regression line we’re going to talk about in STAT14S fits the data points reasonably well.

Part IV – Now it’s Your Turn Again

Use CORRELATE in SPSS to get the Pearson Correlation Coefficient for the years of school completed by the respondent (d4_educ) and the spouse’s years of school completed (d29_speduc). What does this Pearson Correlation Coefficient tell you about the relationship between these two variables?

Part V – Correlation Matrices

What if you wanted to see the values of r for a set of variables? Let’s think of the four variables in Parts I through IV as a set. That means that we want to see the value of r for each pair of variables. This time move all four of the variables into the “Variable(s)” box (i.e., d4_educ, d22_maeduc, d24_paeduc, and d29_speduc) and click on “OK.” Four variables form six distinct pairs, so we would calculate six coefficients. (Make sure you can list all six.)

What did we learn from these correlations? First, the correlation of any variable with itself is 1. Second, the correlations above the 1’s are the same as the correlations below the 1’s; they’re just mirror images of each other. That’s because r is a symmetric measure. Third, all the correlations are fairly large. Fourth, the largest correlations are between father’s and mother’s education and between the respondent’s education and the spouse’s education.
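The structure of the correlation matrix (1’s on the diagonal, mirror-image values above and below it, six distinct pairs for four variables) can be reproduced with a short Python sketch. The data below are hypothetical toy values that merely borrow the exercise’s variable names; they are not the survey data, so the coefficients will not match your SPSS output.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """r for every ordered pair of variables, keyed by (row name, column name)."""
    return {(a, b): pearson_r(data[a], data[b]) for a in data for b in data}


# Hypothetical toy data using the exercise's variable names; NOT survey values
data = {
    "d4_educ": [12, 16, 14, 12, 18, 10],
    "d22_maeduc": [10, 14, 12, 12, 16, 8],
    "d24_paeduc": [9, 16, 12, 11, 16, 8],
    "d29_speduc": [12, 14, 16, 11, 18, 12],
}
matrix = correlation_matrix(data)

# Print the full square matrix, laid out the way SPSS shows it
names = list(data)
print(" " * 11 + "".join(f"{n:>12}" for n in names))
for a in names:
    print(f"{a:<11}" + "".join(f"{matrix[(a, b)]:>12.3f}" for b in names))

# Only the cells above the diagonal are distinct: 4 variables give 6 pairs
pairs = list(combinations(names, 2))
print(len(pairs))  # 6
```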

Part VI – The Correlation Ratio or Eta-Squared

The Pearson Correlation Coefficient assumes that both variables are interval or ratio variables (see STAT1S). But what if one of the variables were nominal or ordinal and the other variable were interval or ratio? This leads us back to one-way analysis of variance, which we discussed in exercise STAT8S. Click on “Analyze” in the menu bar, then on “Compare Means,” and finally on “Means.” (See Chapter 6, one-way analysis of variance, in the online SPSS book mentioned on page XXXXXXXXXX.) Select the variable tv1_tvhours and move it to the “Dependent List” box. This is the variable for which you are going to compute means. Then select the variable d3_degree and move it to the “Independent List” box. Notice that we’re using our independent variable to predict our dependent variable. Now click on “Options” in the upper-right corner and then check the “Anova table and eta” box. Finally, click on “Continue” and then on “OK.”

The F test in the one-way analysis of variance tells us to reject the null hypothesis that all the population means are equal. So we know that at least one pair of population means is not equal. But that doesn’t tell us how strongly related these two variables are. The SPSS output tells us that eta is equal to XXXXXXXXXX and eta-squared is equal to .051. This tells us that 5.1% of the variation in the dependent variable, number of hours the respondent watches television, can be explained or accounted for by the independent variable, highest education degree. This doesn’t seem like much, but it’s not an unusual outcome for many research findings.
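Eta-squared is the ratio of the between-groups sum of squares to the total sum of squares from the one-way analysis of variance. The Python sketch below computes it that way for a tiny hypothetical data set (degree categories and daily TV hours invented for illustration, not the survey data); the function name `eta_squared` is also hypothetical. Eta is simply the square root of eta-squared.

```python
from collections import defaultdict


def eta_squared(groups, values):
    """Correlation ratio squared: between-groups SS divided by total SS."""
    if len(groups) != len(values) or not values:
        raise ValueError("groups and values must be non-empty and the same length")
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    # Total variation of the dependent variable around the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical toy data: highest degree (a category) and daily hours of TV
degree = ["High school", "High school", "High school", "Bachelor", "Bachelor", "Bachelor"]
tv_hours = [4, 3, 5, 2, 1, 3]

e2 = eta_squared(degree, tv_hours)
print(f"eta-squared = {e2:.3f} ({e2:.1%} of the variation in TV hours)")
print(f"eta = {e2 ** 0.5:.3f}")
```

In this toy example the group means differ a lot relative to the spread within groups, so eta-squared is far larger than the .051 in the exercise; with real survey data, most of the variation usually lies within the groups.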

Part VII – Your Turn

In Exercise STAT8S you computed the mean number of hours that respondents watched television (tv1_tvhours) for each of the nine regions of the country (d25_region). Then you determined whether these differences were statistically significant by carrying out a one-way analysis of variance. Repeat the one-way analysis of variance, but this time focus on eta-squared. What percent of the variation in television viewing can be explained by the region of the country in which the respondent lived?

[1] This assumes that the variables are coded low to high (or high to low) on both the X-Axis and the Y-Axis.

