10.15 SPSS AND SAS DATA FILE STRUCTURES
10.15.1 THE MULTIVARIATE STRUCTURE USED BY SPSS
One major difference between SPSS and SAS is in the way that they require
the data file to be structured in order to handle a repeated measure (a
within-subjects variable) in the design. As we have seen, SPSS uses the
same data structure for any analysis. This type of data structure, which we
have not needed to label before now, is often referred to as multivariate or
wide form.
In multivariate structure, all of the information for each case is con-
tained in one row, using each column to hold one piece of information. In
a repeated measures design, participants will have multiple scores on the
dependent variable. For example, having five levels of a within-subjects
variable (e.g., pre1, pre2, post1, post2, and post3) means that each level
will have its own column in the row; in each column, we will see the
result of the measurement (the value of the dependent variable) obtained
under each condition. The data structure is called multivariate because
the row for each case contains multiple (more than one) instances of the
dependent variable.
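As a minimal sketch of the multivariate (wide) layout described above, the following Python fragment holds one record per case, with one entry for each of the five levels. The identifier name subid and the level names come from this section; the scores themselves are invented purely for illustration.

```python
# Wide (multivariate) form: one row per case, one column per level
# of the within-subjects variable.
# NOTE: the scores below are made-up illustrative values, not data from the text.
levels = ["pre1", "pre2", "post1", "post2", "post3"]

wide_rows = [
    {"subid": 1, "pre1": 10, "pre2": 11, "post1": 14, "post2": 15, "post3": 17},
    {"subid": 2, "pre1": 8, "pre2": 9, "post1": 12, "post2": 12, "post3": 13},
]

for row in wide_rows:
    # Every case carries multiple instances of the dependent variable
    # in a single row, which is what makes the structure "multivariate".
    scores = [row[level] for level in levels]
    print(row["subid"], scores)
```

Each record here corresponds to one row of the data file, and the five keys other than subid correspond to the five columns holding the dependent variable.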
10.15.2 THE UNIVARIATE (STACKED) STRUCTURE USED FOR REPEATED
MEASURES BY SAS ENTERPRISE GUIDE
SAS Enterprise Guide uses a structure known variously as univariate, narrow,
or stacked form. In univariate or stacked column format, each row is
permitted to contain only one score on the dependent variable; this is the
defining feature of univariate format. In a between-subjects design, where
we have only one score per case, multivariate and univariate data structures
look identical. In designs where each participant provides more than
a single score, that is, in designs containing one or more within-subjects
variables, the stacked format differs from the multivariate format because
each case must occupy more than one row.
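The relationship between the two structures can be sketched in a few lines of Python. The helper to_stacked below is hypothetical, as is the column name score; subid and Time follow the names used in this section, and the data values are invented for illustration. It is a sketch of the restructuring idea only, not the procedure that SPSS or SAS Enterprise Guide uses internally.

```python
def to_stacked(wide_rows, levels, id_name="subid",
               level_name="Time", score_name="score"):
    """Turn wide rows (one per case) into stacked rows (one per score).

    Hypothetical helper for illustration; 'score' is an assumed column name.
    """
    stacked = []
    for row in wide_rows:
        for level in levels:
            # One output row per (case, level) pair, so each row
            # holds exactly one score on the dependent variable.
            stacked.append({
                id_name: row[id_name],
                level_name: level,
                score_name: row[level],
            })
    return stacked


# Made-up illustrative data in wide form (not taken from the text).
levels = ["pre1", "pre2", "post1", "post2", "post3"]
wide_rows = [
    {"subid": 1, "pre1": 10, "pre2": 11, "post1": 14, "post2": 15, "post3": 17},
    {"subid": 2, "pre1": 8, "pre2": 9, "post1": 12, "post2": 12, "post3": 13},
]

stacked_rows = to_stacked(wide_rows, levels)
for row in stacked_rows:
    print(row)
```

With five levels, each case expands to five stacked rows. If levels contained a single name, as in a between-subjects design with one score per case, the stacked result would have exactly as many rows as the wide input.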
A portion of the stacked data file entered into an Excel spreadsheet
that we will import to SAS Enterprise Guide is presented in Figure 10.18.
Note that the first five lines (below the variable names) represent the
information for the participant identified as subid 1. This is because each
case has five different scores, one for each level of the within-subjects
variable.
Under the univariate requirement that only one score may appear on
any given row, we must use five rows to capture the measurements for that
person. Each row contains information relevant to the score it contains.
Let's look across the row and discuss the columns that are represented.
The first column identifies the particular subject whose data is con-
tained in the row. The identifier variable is named subid in the Excel file.
The second variable, which we have named Time, represents the particular
level of the within-subjects variable whose score we are providing. The
variable in the third column is our effort to circumvent the proclivity of
SAS Enterprise Guide to order the levels of the within-subjects variables