This format is more free-form than the fixed-width format because fields needn't be
aligned by position. Here is a datafile in tabular format, called statisticians.txt, using
a space character between fields:
Fisher R.A. 1890 1962
Pearson Karl 1857 1936
Cox Gertrude 1900 1978
Yates Frank 1902 1994
Smith Kirstine 1878 1939
The read.table function is built to read this file. By default, it assumes the data fields
are separated by white space (blanks or tabs):
> dfrm <- read.table("statisticians.txt")
> print(dfrm)
V1 V2 V3 V4
1 Fisher R.A. 1890 1962
2 Pearson Karl 1857 1936
3 Cox Gertrude 1900 1978
4 Yates Frank 1902 1994
5 Smith Kirstine 1878 1939
If your file uses a separator other than white space, specify it using the sep parameter.
For example, if our file used a colon (:) as the field separator, we would read it this way:
> dfrm <- read.table("statisticians.txt", sep=":")
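As a minimal sketch of the sep parameter in action, the script below writes a colon-separated copy of a few rows to a temporary file and reads it back. The temporary file and the variable name path are assumptions made for illustration; they are not part of the original example.

```r
# Write a hypothetical colon-separated copy of the data to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("Fisher:R.A.:1890:1962",
             "Pearson:Karl:1857:1936",
             "Cox:Gertrude:1900:1978"), path)

# sep=":" tells read.table to split each line on colons instead of white space
dfrm <- read.table(path, sep = ":")
print(dfrm)
```

Without sep=":", each of these lines would be read as a single field, because none of them contains white space.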
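You can't tell from the printed output, but read.table interpreted the first and last
names as factors, not strings. (This is the default in versions of R before 4.0.0; later
versions read them as character strings.) We see that by checking the class of the
resulting column: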
> class(dfrm$V1)
[1] "factor"
To prevent read.table from interpreting character strings as factors, set the
stringsAsFactors parameter to FALSE:
> dfrm <- read.table("statisticians.txt", stringsAsFactors=FALSE)
> class(dfrm$V1)
[1] "character"
Now the class of the first column is character, not factor.
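Because the default has differed between R versions, it can help to set stringsAsFactors explicitly either way. The following sketch reads the same data twice; the temporary file and the names path, as_factor, and as_string are assumptions for illustration only.

```r
# Write two rows of the sample data to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("Fisher R.A. 1890 1962",
             "Pearson Karl 1857 1936"), path)

# Read the same file twice, once with each explicit setting
as_factor <- read.table(path, stringsAsFactors = TRUE)
as_string <- read.table(path, stringsAsFactors = FALSE)

print(class(as_factor$V1))  # "factor"
print(class(as_string$V1))  # "character"
```

Stating the parameter explicitly makes the script behave the same regardless of the installed version's default.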
If any field contains the string “NA”, then read.table assumes that the value is missing
and converts it to NA. Your datafile might employ a different string to signal missing
values; if it does, use the na.strings parameter. The SAS convention, for example, is
that missing values are signaled by a single period (.). We can read such datafiles like
this:
> dfrm <- read.table("filename.txt", na.strings=".")
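To see the effect of na.strings, here is a small sketch that uses a temporary file in which one birth year is recorded as a period. The second row is an invented placeholder, not real data, and the variable name path is likewise an assumption.

```r
# Temporary file where a lone period marks a missing value.
# The second row is a hypothetical placeholder record.
path <- tempfile(fileext = ".txt")
writeLines(c("Fisher R.A. 1890 1962",
             "Unknown Person . 1950"), path)

# Fields consisting solely of "." become NA; "R.A." is left untouched
dfrm <- read.table(path, na.strings = ".")
print(is.na(dfrm$V3))  # FALSE TRUE
```

Note that supplying na.strings replaces the default of "NA" rather than adding to it. If a file mixes both conventions, na.strings=c("NA", ".") covers them together.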
I am a huge fan of self-describing data: datafiles which describe their own contents. (A
computer scientist would say the file contains its own metadata.) The read.table function
has two features that support this characteristic. First, you can include a header
line at the top of your file that gives names to the columns. The line contains one name
 