If your CSV file does not contain a header line, set the header option to FALSE:
> tbl <- read.csv("filename", header=FALSE)
Discussion
The CSV file format is popular because many programs can import and export data in
that format. Such programs include R, Excel, other spreadsheet programs, many
database managers, and most statistical packages. CSV is a flat file of tabular data, in
which each line in the file is a row of data, and each row contains data items separated
by commas. Here is a very simple CSV file with three rows and three columns (the first
line is a header line that contains the column names, also separated by commas):
label,lbound,ubound
low,0,0.674
mid,0.674,1.64
high,1.64,2.33
The read.csv function reads the data and creates a data frame, which is the usual R
representation for tabular data. The function assumes that your file has a header line
unless told otherwise:
> tbl <- read.csv("table-data.csv")
> tbl
  label lbound ubound
1   low  0.000  0.674
2   mid  0.674  1.640
3  high  1.640  2.330
Observe that read.csv took the column names from the header line for the data frame.
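To reproduce this session without a copy of table-data.csv on hand, you can write the sample rows to a file first and then read them back. This is a minimal sketch: the path built with tempdir() is an assumption standing in for your own file, and only base R functions (writeLines, read.csv, dim, names) are used.

```r
# Hypothetical location: a temporary file stands in for table-data.csv
path <- file.path(tempdir(), "table-data.csv")

# Write the sample file, header line first, one data row per line
writeLines(c("label,lbound,ubound",
             "low,0,0.674",
             "mid,0.674,1.64",
             "high,1.64,2.33"),
           path)

# read.csv assumes a header line by default
tbl <- read.csv(path)
dim(tbl)    # 3 rows, 3 columns
names(tbl)  # column names taken from the header line
```

The same file can then be passed to any of the read.csv calls in this section.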
If the file did not contain a header, we would specify header=FALSE, and R would
synthesize column names for us (V1, V2, and V3 in this case):
> tbl <- read.csv("table-data-with-no-header.csv", header=FALSE)
> tbl
    V1    V2    V3
1  low 0.000 0.674
2  mid 0.674 1.640
3 high 1.640 2.330
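If you know the column names but the file lacks a header, you can supply them yourself instead of accepting V1, V2, and V3. The sketch below assumes the col.names argument of read.table, which read.csv passes through; to stay self-contained it reads from an inline string via read.table's text argument, and the variable name csv_text is hypothetical.

```r
# The same three rows, without a header line
csv_text <- "low,0,0.674
mid,0.674,1.64
high,1.64,2.33"

# Without col.names, R synthesizes V1, V2, V3
auto <- read.csv(text = csv_text, header = FALSE)
names(auto)

# col.names replaces the synthesized names with our own
tbl <- read.csv(text = csv_text, header = FALSE,
                col.names = c("label", "lbound", "ubound"))
names(tbl)
```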
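A good feature of read.csv is that it automatically interprets nonnumeric data as a
factor (categorical variable), which is often what you want since this is, after all, a
statistical package, not Perl. (This describes R versions before 4.0.0; starting with
R 4.0.0, the default is stringsAsFactors=FALSE, so nonnumeric columns are read as
character vectors unless you set stringsAsFactors=TRUE.) The label variable in the tbl
data frame just shown is actually a factor, not a character variable. You see that by
inspecting the structure of tbl: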
> str(tbl)
'data.frame': 3 obs. of 3 variables:
$ label : Factor w/ 3 levels "high","low","mid": 2 3 1
$ lbound: num 0 0.674 1.64
$ ubound: num 0.674 1.64 2.33
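Because the default changed in R 4.0.0, requesting factors explicitly makes the result independent of the R version. This is a minimal sketch, again reading the sample data from an inline string (csv_text is a hypothetical name) through read.table's text argument; it checks that label is a factor whose levels are sorted, which matches the integer codes 2 3 1 in the str() output.

```r
csv_text <- "label,lbound,ubound
low,0,0.674
mid,0.674,1.64
high,1.64,2.33"

# Ask for factors explicitly rather than relying on the version's default
tbl <- read.csv(text = csv_text, stringsAsFactors = TRUE)

is.factor(tbl$label)   # label is a factor
levels(tbl$label)      # levels are sorted: "high" "low" "mid"
as.integer(tbl$label)  # underlying codes: 2 3 1
```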
Sometimes, you really want your data interpreted as strings, not as factors. In that case,
set the as.is parameter to TRUE; this indicates that R should not interpret nonnumeric
data as a factor: