If your CSV file does not contain a header line, set the header option to FALSE:
> tbl <- read.csv("filename", header=FALSE)
Discussion
The CSV file format is popular because many programs can import and export data in
that format. Such programs include R, Excel, other spreadsheet programs, many
database managers, and most statistical packages. CSV is a flat file of tabular data, in
which each line in the file is a row of data, and each row contains data items separated
by commas. Here is a very simple CSV file with three rows and three columns (the first
line is a header line that contains the column names, also separated by commas):
label,lbound,ubound
low,0,0.674
mid,0.674,1.64
high,1.64,2.33
The read.csv function reads the data and creates a data frame, which is the usual R
representation for tabular data. The function assumes that your file has a header line
unless told otherwise:
> tbl <- read.csv("table-data.csv")
> tbl
  label lbound ubound
1   low  0.000  0.674
2   mid  0.674  1.640
3  high  1.640  2.330
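For readers who want to reproduce this result without preparing a file first, here is a minimal sketch. It writes the three-row sample to a temporary file and reads it back with the default header handling. The csv_path variable and the use of tempfile() are assumptions made for illustration; they are not part of the original recipe.

```r
# Write the sample CSV to a temporary file (hypothetical path, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("label,lbound,ubound",
             "low,0,0.674",
             "mid,0.674,1.64",
             "high,1.64,2.33"),
           csv_path)

# header=TRUE is the default for read.csv, so the first line becomes the column names
tbl <- read.csv(csv_path)

names(tbl)   # column names taken from the header line
nrow(tbl)    # number of data rows (the header line is not counted)
```

Checking names() and nrow() is a quick way to confirm that the header line was consumed as names rather than read as data.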
Observe that read.csv took the column names from the header line for the data frame.
If the file did not contain a header, we would specify header=FALSE, and R would
synthesize column names for us (V1, V2, and V3 in this case):
> tbl <- read.csv("table-data-with-no-header.csv", header=FALSE)
> tbl
    V1    V2    V3
1  low 0.000 0.674
2  mid 0.674 1.640
3 high 1.640 2.330
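If the synthesized names are not descriptive enough, one option is to supply names while reading. The sketch below assumes the same sample data, passed inline through the text argument so that it runs without a file. The variable names csv_text, tbl_default, and tbl_named are hypothetical.

```r
# Same sample data, but without a header line (inline text used for illustration)
csv_text <- "low,0,0.674
mid,0.674,1.64
high,1.64,2.33"

# Default behavior with header=FALSE: R synthesizes the names V1, V2, V3
tbl_default <- read.csv(text = csv_text, header = FALSE)
names(tbl_default)

# Optional: supply your own column names through the col.names argument
tbl_named <- read.csv(text = csv_text, header = FALSE,
                      col.names = c("label", "lbound", "ubound"))
names(tbl_named)
```

Both calls read the same three rows; only the column names differ.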
In versions of R before 4.0.0, read.csv automatically interprets nonnumeric data as a
factor (categorical variable), which is often what you want since this is, after all, a
statistical package, not Perl. (Since R 4.0.0, strings are left as character data unless
you set stringsAsFactors=TRUE.) Under the older default, the label variable in the tbl
data frame just shown is actually a factor, not a character variable. You can see that
by inspecting the structure of tbl:
> str(tbl)
'data.frame': 3 obs. of 3 variables:
$ label : Factor w/ 3 levels "high","low","mid": 2 3 1
$ lbound: num 0 0.674 1.64
$ ubound: num 0.674 1.64 2.33
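Because the default treatment of strings differs between R versions, the following sketch requests factor conversion explicitly with stringsAsFactors=TRUE and then inspects the result. The inline csv_text variable is an assumption made for illustration, standing in for the file.

```r
# Sample data passed inline so the example runs without a file
csv_text <- "label,lbound,ubound
low,0,0.674
mid,0.674,1.64
high,1.64,2.33"

# Request factor conversion explicitly so the result does not depend on the R version
tbl <- read.csv(text = csv_text, stringsAsFactors = TRUE)

is.factor(tbl$label)    # is the label column a factor?
levels(tbl$label)       # the distinct values, stored as sorted levels
as.integer(tbl$label)   # integer codes that index into the levels
```

The integer codes correspond to the 2 3 1 shown in the str output: each row stores the position of its value within the sorted levels.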
Sometimes, you really want your data interpreted as strings, not as factors. In that case,
set the as.is parameter to TRUE; this indicates that R should not interpret nonnumeric
data as a factor: