Biology Reference
In-Depth Information
Setting the header argument to TRUE tells read.table that the first line of the
file lizards.txt contains the variable names. Each observation must be writ-
ten in a single line, and the values assumed by the variables for that observation
correspond to the fields (separated by spaces or tabulations) present in that line.
In both cases, the data is stored in a data frame called lizards , whose structure
can be examined with the str function.
> str(lizards)
'data.frame': 409 obs. of 3 variables:
$ Species : Factor w/ 2 levels "Sagrei","Distichus":
1 ...
$ Diameter: Factor w/ 2 levels "narrow","wide": 1 ...
$ Height : Factor w/ 2 levels "high","low": 2 2 ...
Like most programming languages, R defines a large set of classes of objects, which
represent and provide an interface to different types of variables. Some of these
classes correspond to various kinds of variables used in statistical modeling:
Logical : indicator variables, e.g., either TRUE or FALSE
Integer : natural numbers, e.g., 1
,
2
,...,
n
N
, π , 2
Numeric : real numbers, such as 1
.
2
Character : character strings, such as "a" , "b" , "c"
Factor : categorical variables, defined over a finite set of levels identified by char-
acter strings
Ordered : ordered categorical variables, similar to factors but with an explicit
ordering of the levels, e.g., "LOW" < "AVERAGE" < "HIGH" .
Other classes correspond to more complex data types, such as multidimensional or
heterogeneous data:
List : a collection of arbitrary objects, often belonging to different classes
Vector : a mathematical vector of elements belonging to the same class (i.e.,
all integers, all factors with the same levels, etc.) with an arbitrary number of
dimensions
Matrix : a matrix (i.e., a 2-dimensional vector) of elements belonging to the same
class
Data frame : a list of objects with the same length but possibly different classes.
It is usually displayed and manipulated in the same way as a matrix.
As we can see from the output of str , read.table saves the data read from
lizards.txt in a data frame to allow each variable to be stored as an object
of the appropriate class. The labels of the possible values of each variable, which
are character strings in lizards.txt , are automatically used as levels and the
variables converted to factors.
Search WWH ::




Custom Search