Input-Output in S-Plus

There are a number of ways to get data in and out of S-Plus. This document discusses most of them and when they might prove useful.

All of these functions are available in R, too, except where noted by the "not in R" logo.


data.dump(), data.restore() -- SDD files

These are the original command-line techniques for moving S-Plus data from one machine to another. data.dump() creates an ASCII file (it's not particularly readable, but you can sort of see what's going on in there) and data.restore() reads in files in that format. The makers of S-Plus named this format "SDD" ("S-Plus Data Dump"? They call it "S-Plus Transport File") and you can read and write files of this format from the File | Import Data and File | Export Data menus as well.

This is the recommended way to move S-Plus data from one machine to another. Of course you could save it as, say, Excel on one machine and read it in on the other (see below for how). The SDD transport file, however, was specifically designed so that data could be moved between PCs and Unix machines. (Technical note: SDD files are ASCII, so when moving to or from Unix you need to beware of the CR/LF problem.)

Why can't I just move the S-Plus items themselves from one machine to another, using the Windows Explorer? And what's with all those weird names I see in my _Data directory, like "__28"?

These questions go together, though they're not really I/O questions. For information on these topics, click here.

File | Import Data and File | Export Data

S-Plus supports a number of file formats for both reading and writing. Many are formats used by other math and statistics packages (SAS, SPSS, Gauss); others are databases; and you can even read and write ODBC data for real-time database access. Under the "options" tab of this dialogue, you can specify a subset of rows and/or columns to be read in. Under "filter" you can use an expression to restrict the set of rows to be imported. The expression's syntax is not S-Plus: for example, it uses = instead of == for equality. The help on the filter explains this process well. Export works just like import and supports a number of file formats.

Version 6 of S-Plus seems much stronger, but my experience in the past had been that while import worked fine for moderate-sized data sets, it was more likely to fail than read.table() (see below) for large data sets (say, anything over 50,000 or 100,000 rows). When converting from Excel, I'm in the habit of writing the Excel file out as a comma-separated values (CSV) file; that will often be easier to import into S-Plus than the original Excel file, since the latter will be much larger.

Importing a text file whose extension is .DAT

When using import, a file whose extension is .DAT is assumed to be in the format of the Gauss software. However, I often name things .DAT (for "data"). You can't persuade S-Plus that a .DAT file is text; it just won't listen. So rename the file before trying to import it, or use read.table() (below).

read.table(), write.table()

The read.table() function reads in ASCII data that has been properly put into a rectangular format. You can specify the character that separates fields (by default, it's "spaces or tabs," which is often not what you want), as well as tell the function whether the first line should be treated as a header, and specify column or row names. The first argument to read.table() is the name of the file, and the special name "clipboard" is permitted. This, of course, refers to the Windows clipboard. So if you highlight an area of an Excel spreadsheet, then use control-C (or Edit | Copy) to put it onto the Windows clipboard, you can then use a command like

thing <- read.table ("clipboard", sep="\t", header=T)

to create a data.frame named "thing." Notice the sep="\t" indicating that the separator is the tab character when going to or from Excel.
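
The same call works on an ordinary file. Here is a minimal sketch, written against R, that builds a small tab-separated file first so that it can be run anywhere; the file contents and the names in.file and thing are made up for the illustration.

```r
# Build a small tab-separated file so the example is self-contained.
in.file <- tempfile()
writeLines(c("name\tscore", "ann\t3.5", "bob\t4"), in.file)

# Same arguments as the clipboard call: tab separator, first line is a header.
thing <- read.table(in.file, sep = "\t", header = TRUE)

dim(thing)     # number of rows and columns
names(thing)   # column names taken from the header line
```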

read.table() calls a function named count.fields() to determine the number of fields in each row. The most common cause of a read.table() failure, in my experience, is unequal numbers of fields. Sometimes Excel will add "mystery cells" onto the ends of rows, seemingly just for spite. I don't really know what to do about this -- but see scan() below for one inelegant solution. One more note: the output of read.table() is always a data.frame. If you want a matrix you'll have to turn it into a matrix yourself with as.matrix(). Also, by default character columns are converted into factors. If you want them to stay as characters, use as.is=T.
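
When read.table() complains, calling count.fields() yourself will usually show which lines are at fault. The sketch below was written against R and uses made-up file contents; it finds the line with the wrong number of fields, then reads a clean version of the file with as.is and converts the result to a matrix.

```r
# A small comma-separated file in which the third line has an extra field.
bad.file <- tempfile()
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4,oops", "3,cy,2"), bad.file)

fld.cnt <- count.fields(bad.file, sep = ",")   # fields on each line
bad.rows <- which(fld.cnt != fld.cnt[1])       # lines that disagree with line 1

# A clean file reads without trouble; as.is keeps characters as characters.
good.file <- tempfile()
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4"), good.file)
thing <- read.table(good.file, sep = ",", header = TRUE, as.is = TRUE)

# A data frame with mixed columns becomes a character matrix.
thing.mat <- as.matrix(thing)
```

Line numbers in bad.rows count the header line too, so they match what you see in a text editor.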

Getting data to Excel via the clipboard

One task that seems to come up a lot is to move a data frame to Excel quickly. For this job the write.table() function with arguments file="clipboard" and sep="\t" is useful. In fact, I recommend having this function around:
clip <- function(x) write.table (x, file="clipboard", sep="\t")
Then to move a data.frame to Excel, just clip it and paste it: clip (mydf), alt-tab to move to Excel, and control-V to paste.
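
The clipboard is hard to demonstrate in a script, so here is a sketch of the same idea with an ordinary file, written against R. The argument names quote and row.names are R's and may well differ in S-Plus, and mydf is a made-up data frame.

```r
mydf <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Write a tab-separated file without quotes or row names, which is the
# layout that pastes most cleanly into a spreadsheet.
out.file <- tempfile()
write.table(mydf, file = out.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check that nothing was lost on the way.
back <- read.table(out.file, sep = "\t", header = TRUE, as.is = TRUE)
```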

Creating a table for HTML

The html.table() function produces HTML output suitable for displaying tables in web pages. If its first argument is a list of data frames or matrices, then multiple tables are produced. The table that this function produces is a lot easier to edit than the one you get by producing a table in Word and then saving as HTML.

scan(), write()

scan() reads in data in one big long vector. Since it produces a vector, all of the elements in the output from scan() must be of the same type (so if there are any character items, everything will be converted to character). You'll need the what="" argument if there are characters since scan(), by default, expects numeric data. This gives us one way to handle an "uneven" data set with differing numbers of fields on each row. Suppose the name of the data file is stored in a variable called in.file, and suppose it's all character data. Then consider this code:
#
# count.fields() accepts a sep= argument; the default (white space)
# matches the default used by scan() below.
#
fld.cnt <- count.fields (in.file) # number of fields on each line
#
# Set up output matrix, with number of columns given by the largest
# value returned by count.fields. Then read in the file.
#
out.mat <- matrix ("", length(fld.cnt), max (fld.cnt))
big.vec <- scan (in.file, what = "")
start <- 1
for (i in 1:length(fld.cnt)) {
    # Fill only the first fld.cnt[i] columns; the rest of the row stays ""
    out.mat[i, 1:fld.cnt[i]] <- big.vec[start:(start + fld.cnt[i] - 1)]
    start <- start + fld.cnt[i]
}
Notes on scan: scan() takes a number of arguments, some quite useful. Also scan() (but not count.fields()) will read the Windows clipboard if you specify that the file name is "clipboard." write() is the counterpart of scan() but it's not needed much.
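
For the simple cases, here is a short sketch of write() and scan() working together, written against R. The file names and contents are invented for the example.

```r
# write() puts a vector into a text file, ncolumns values per line.
num.file <- tempfile()
write(c(1, 2, 3, 4, 5, 6), file = num.file, ncolumns = 3)

# scan() expects numbers by default and returns one long vector,
# ignoring how the values were split across lines.
nums <- scan(num.file)

# Character data needs what = "".
words.file <- tempfile()
writeLines(c("a b c", "d e"), words.file)
words <- scan(words.file, what = "")
```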

source()

source() reads in a set of commands. It's commonly used to read in a function written in a text file, but any set of commands can be read in and executed. Actually you have to be careful with some functions, because the line breaks can be moved around. So even with functions I recommend using data.dump()/data.restore() to move them from machine to machine. (Example: the lines
y <- x +
4
assign to y the value (x+4). The lines
y <- x
+ 4
assign to y the value x, then execute the legal line "+4", which doesn't have any effect.) An alternative to source()-ing a file full of commands is to use a script file. Actually they're almost exactly the same thing; the difference is that you run a script file with the F10 button, whereas you run the commands in a regular text file by using source().
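
You can see the line-break effect for yourself with two tiny files. This sketch was written against R; passing an environment as the local argument of source() is an R feature and is used here only to keep the example from touching your own objects.

```r
# A private environment holding x, so nothing in the workspace is changed.
x.env <- new.env()
assign("x", 10, envir = x.env)

# The operator at the end of the line tells the parser to keep reading.
joined.file <- tempfile()
writeLines(c("y <- x +", "4"), joined.file)
source(joined.file, local = x.env)
y.joined <- get("y", envir = x.env)   # the value of x + 4

# The operator at the start of the next line makes two separate commands.
split.file <- tempfile()
writeLines(c("y <- x", "+ 4"), split.file)
source(split.file, local = x.env)
y.split <- get("y", envir = x.env)    # the value of x alone
```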

readline()

This function is used only to read data from the user's command line. For example, the code
cat ("Are you sure? (enter y or n) ")
response <- readline ()
if (response == "y" || response == "Y")
{
...
will display Are you sure? (enter y or n) on the terminal and then wait for the user to respond (that is, type something and hit ENTER). If nothing is entered, "response" is an empty string; otherwise, any characters that the user typed get put into "response."
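
One way to package that pattern is sketched below, written against R. The names is.yes and ask.yes.no are hypothetical helpers, not part of S-Plus or R. Splitting the test from the prompt means the test can be checked without anybody sitting at the keyboard.

```r
# TRUE when the response is "y" or "Y"; an empty string counts as no.
is.yes <- function(response) {
    response == "y" || response == "Y"
}

# Show the prompt, wait for the user to hit ENTER, and report the answer.
ask.yes.no <- function(prompt = "Are you sure? (enter y or n) ") {
    cat(prompt)
    is.yes(readline())
}
```

In an interactive session, if (ask.yes.no()) followed by the action to take would then replace the three lines in the example above.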

Building your own menus

It's also possible to build your own menus in S-Plus. I have some example code to do this. The basic function is add.menu.item(), but it is quite a complicated process.

Technical note: the CR/LF problem

This problem stems from the fact that the end of a line in a text file is marked differently in DOS/Windows than in Unix. In DOS/Windows, two characters mark the end of a line: the carriage return (CR) and the line feed (LF). (In the old days, the first of these moved the type head back to the left edge of the paper and the second rolled the paper forward one line.) In Unix there's only one character, the line feed. So if you transfer a text file directly from a PC to a Unix machine, you'll see these weird trailing characters (they'll look like "\r"'s). Conversely, if you transfer a text file from Unix to a PC, you sometimes get really long, unterminated lines. Moral: If you're using FTP, be sure to transfer your SDD items as ASCII, not binary.
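
If a file has already been moved the wrong way, stray carriage returns that survive the transfer show up as "\r" at the end of each line, and they can be stripped after the fact. A minimal sketch, written against R, with made-up lines standing in for the contents of the damaged file:

```r
# Lines as they might look after a binary transfer: each ends in a stray CR.
dos.lines <- c("x,y\r", "1,2\r", "3,4\r")

# Remove one carriage return from the end of each line, if there is one.
clean.lines <- sub("\r$", "", dos.lines)
```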
