Input and Output in R

Input-Output in S-Plus

There are a number of ways to get data in and out of R. This document discusses most of them and when they might prove useful.

`save()`, `load()` -- binary files

These are the original command-line techniques for moving R data from one machine to another. save() creates a binary file; load() reads in files in that format.

This is the recommended way to move R data from one machine to another. Of course you could save it as, say, Excel on one machine and read it in on the other (see below for how). The file that save() creates can be moved between PCs and Mac or Linux machines.
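As a rough illustration of the round trip, here is a minimal sketch. The object names (my.df, my.vec) and the temporary file are made up for the example; on a real transfer you would copy the saved file to the other machine before calling load().

```r
# Hypothetical objects to move between machines
my.df  <- data.frame(id = 1:3, name = c("a", "b", "c"))
my.vec <- c(2.5, 3.5)
saved.df <- my.df                      # keep a copy for comparison

# Write both objects to one binary file (a temporary file here)
rda.file <- tempfile(fileext = ".RData")
save(my.df, my.vec, file = rda.file)

# Simulate the other machine: remove the objects, then load them back
rm(my.df, my.vec)
restored <- load(rda.file)             # load() returns the names it restored
print(restored)
```

Note that load() recreates the objects under their original names, so it will silently overwrite anything in the workspace that has the same name.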

`read.table()`, `write.table()` -- spreadsheet-style data

The read.table() function reads in ASCII (or UTF-8) data that has been properly put into a rectangular format. You can specify the character that separates fields (by default it's "spaces or tabs," which is often not what you want), tell the function whether the first line should be treated as a header, and specify column or row names. The first argument to read.table() is the name of the file. On Windows the special name "clipboard" is permitted, and it refers to the Windows clipboard (on Mac, use pipe("pbpaste") in place of the file name). So if you highlight an area of an Excel spreadsheet, then use control-C (or Edit | Copy) to put it onto the Windows clipboard, you can then use a command like

thing <- read.table ("clipboard", sep="\t", header=TRUE)

to create a data.frame named "thing." Notice the sep="\t" indicating that the separator is the tab character when going to or from Excel.

read.table() calls a function named count.fields() to determine the number of fields in each row. The most common cause of a read.table() failure, in my experience, is unequal numbers of fields. Sometimes Excel will add "mystery cells" onto the ends of rows, seemingly just for spite. I don't really know what to do about this -- but see scan() below for one inelegant solution. One more note: the output of read.table() is always a data.frame. If you want a matrix you'll have to convert it yourself with as.matrix(). Also, in versions of R before 4.0.0, character columns are converted into factors by default; to keep them as characters, use as.is=TRUE. From R 4.0.0 on, character columns stay as characters unless you ask for factors with stringsAsFactors=TRUE.
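The points above can be tried out without Excel or the clipboard. The following is only a sketch: the file contents and the names tab.file, people and num.mat are invented for the example, and a temporary file stands in for real data.

```r
# Write a small tab-separated file to read back (hypothetical data)
tab.file <- tempfile(fileext = ".txt")
writeLines(c("name\tage\tscore",
             "ann\t31\t4.5",
             "bob\t27\t3.0"), tab.file)

# Check that every line has the same number of fields before reading
print(count.fields(tab.file, sep = "\t"))

# header = TRUE uses the first line as column names;
# as.is = TRUE keeps character columns as character
people <- read.table(tab.file, sep = "\t", header = TRUE, as.is = TRUE)
str(people)

# as.matrix() on just the numeric columns gives a numeric matrix
num.mat <- as.matrix(people[, c("age", "score")])
```

Running count.fields() first is a cheap way to find the offending line when read.table() complains about unequal numbers of fields.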

Getting data to Excel via the clipboard

One task that seems to come up a lot is to move a data frame to Excel quickly. For this job the write.table() function with arguments file="clipboard" or the more capacious "clipboard-128" and sep="\t" is useful. Mac users should specify the output as pipe ("pbcopy", "w"). In fact, I recommend having this function around:

clipp <- function (x, rn=FALSE) write.table (x, "clipboard-128", sep="\t", row.names=rn, col.names = if (rn) NA else TRUE)

Then to move a data.frame to Excel, just clipp() it and paste it: clipp (mydf), alt-tab to move to Excel, and control-V to paste. The rn argument specifies whether you want row names -- if you do, you probably want to set col.names = NA.
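To see what col.names = NA actually does, here is a small sketch that writes to temporary files instead of the clipboard so that it runs on any platform. The data frame small.df and the file names are made up for the example.

```r
small.df <- data.frame(x = 1:2, y = c("a", "b"), row.names = c("r1", "r2"))

# Row names plus col.names = NA: the header gets an empty first cell,
# so the column names line up over the data when pasted into a spreadsheet
na.file <- tempfile(fileext = ".txt")
write.table(small.df, na.file, sep = "\t", row.names = TRUE, col.names = NA)
with.na <- readLines(na.file)
print(with.na)

# Row names plus col.names = TRUE: the header is one field short of the data rows
true.file <- tempfile(fileext = ".txt")
write.table(small.df, true.file, sep = "\t", row.names = TRUE, col.names = TRUE)
with.true <- readLines(true.file)
print(with.true)
```

The second layout is the one that makes column headings land one cell too far to the left in Excel.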

`scan()`, `write()`

scan() reads in data in one big long vector. Since it produces a vector, all of the elements in the output from scan() must be of the same type (so if there are any character items, everything will be converted to character). You'll need the what="" argument if there are characters since scan(), by default, expects numeric data. This gives us one way to handle an "uneven" data set with differing numbers of fields on each row. Suppose the name of the data file is stored in a variable called in.file, and suppose it's all character data. Then consider this code:

#
# count.fields() accepts the sep= argument
#
fld.cnt <- count.fields (in.file) # get number of fields on each line
#
# Set up output matrix, with number of columns given by the largest
# value returned by count.fields. Then read in the file.
#
out.mat <- matrix ("", length(fld.cnt), max (fld.cnt))
big.vec <- scan (in.file, what = "")
start <- 1
for (i in seq_along(fld.cnt)) {
    n <- fld.cnt[i]
    # Fill only the first n cells of row i; the rest stay ""
    out.mat[i, seq_len(n)] <- big.vec[start + seq_len(n) - 1]
    start <- start + n
}

Notes on scan: scan() takes a number of arguments, some quite useful. Also scan() (but not count.fields()) will read the Windows clipboard if you specify that the file name is "clipboard." write() is the counterpart of scan() but it's not needed much.
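A few of those arguments in action, as a sketch only. The temporary files and the variable names (nums, chars, fields) are invented for the example.

```r
num.file <- tempfile(fileext = ".txt")

# write() sends a vector to a file, ncolumns values per line
write(c(1.5, 2, 3.25, 4, 5, 6), num.file, ncolumns = 3)

# scan() reads numeric data by default and returns one long vector
nums <- scan(num.file)

# what = "" reads the same file as character strings
chars <- scan(num.file, what = "")

# sep= handles delimited text; an empty character field comes back as ""
csv.file <- tempfile(fileext = ".csv")
writeLines("a,b,,d", csv.file)
fields <- scan(csv.file, what = "", sep = ",")
print(fields)
```

In all three calls the line structure of the file is lost; scan() just returns the items in the order it meets them.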

`source()`

source() reads in a set of commands. It's commonly used to read in a function written in a text file, but any set of commands can be read in and executed. Actually you have to be careful with some functions, because the line breaks can be moved around. So even with functions I recommend using save()/load() to move them from machine to machine. (Example: the lines

y <- x +
4

assign to y the value (x+4). The lines

y <- x
+ 4

assign to y the value x, then execute the legal line "+4", which doesn't have any effect.) An alternative to source()-ing a file full of commands is to use a script file. Actually they're almost exactly the same thing; the difference is that you run a script file with the F10 button, whereas you run the commands in a regular text file by using source().
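The line-break example can be checked directly with source(). This sketch writes the commands to a temporary file; the names script.file, y1 and y2 are made up for the illustration.

```r
script.file <- tempfile(fileext = ".R")
writeLines(c("x <- 10",
             "y1 <- x +",      # incomplete line: R keeps reading
             "  4",
             "y2 <- x",        # complete line: the assignment ends here
             "  + 4"),         # a separate, legal expression with no effect
           script.file)

source(script.file)            # run the commands in the file
print(c(y1, y2))
```

Here y1 ends up as x + 4 while y2 is just x, which is why a file whose line breaks have been rearranged can change meaning without producing any error.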

`readline()`

This function is used only to read data from the user's command line. For example, the code

cat ("Are you sure? (enter y or n) ")
response <- readline ()
if (response == "y" || response == "Y")
{
...

will display Are you sure? (enter y or n) on the terminal and then wait for the user to respond (that is, type something and hit ENTER). If nothing is entered, "response" is an empty string; otherwise, any characters that the user typed get put into "response."
