Introduction to R


R is a software package for doing statistics. It contains a full-fledged programming language with syntax similar to that of C and Java. It is an open-source product for which thousands of add-on packages are freely available.

R is an implementation of the S programming language, which was developed at AT&T Bell Labs. Past implementations of S have included the (mostly dead) commercial product S-Plus, but these documents will focus on R.

Starting R

To start R, simply double-click the R or RStudio icon. The system may take a moment to start up.

Quitting

It's sad, but someday you may want to quit R. You can quit R in at least two ways (four if you count smashing your computer with a length of heavy pipe). At the command line, the q() command will end your session. (Type it in with the parentheses.) You can also click the Windows "X" in the top-right corner, or the equivalent red button in the top-left for Mac. In either case the system will ask you whether you want to save the changes you've made in this session. Normally, of course, you will want to.

Using R as a calculator

The R console window lets you send commands to R and get answers back. If you ever lose the console, you can find it again under the "Windows" menu. Now you can use R as a calculator. Just type in some commands at the ">" prompt. For example,
> 3 + 4
[1] 7
The "[1]" marks the first entry in the output (of course, in this case there is only one entry), and the 7 is the result. At the command line, use "*" for multiplication, "/" for division, and "^" for "raised to the power of." So, for example, twice the cube root of ten is

> 2 * 10^(1/3)
[1] 4.308869
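A few more calculator examples, as a sketch. One detail worth knowing is that ^ is applied before a leading minus sign, so -2^2 means -(2^2); use parentheses when you mean otherwise.

```r
7 / 2          # division always gives a decimal result: 3.5
2^10           # exponentiation: 1024
-2^2           # ^ is applied before the unary minus: -4
(-2)^2         # parentheses change the order: 4
2 * 10^(1/3)   # twice the cube root of ten, as above
```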
Note: R uses double-precision arithmetic, but by default it displays only 7 digits. This is easy to change. If you wanted extra digits for just this one operation, you might type

> print (2 * 10^(1/3), digits=17)
[1] 4.3088693800637676
If you want to display 17 digits every time, you can change the "digits" setting with the options() function. More on that later.
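As a minimal sketch of the options() approach (the value 10 is just an example), note that options() hands back the old settings, which makes it easy to put things back afterwards:

```r
old <- options(digits = 10)  # change the setting; keep the previous one in 'old'
pi                           # now printed with 10 significant digits: 3.141592654
getOption("digits")          # check the current setting: 10
options(old)                 # restore the previous setting (7, unless you changed it)
```

The setting lasts for the rest of the session, or until you change it again.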

To see the contents of something in R, just type its name. For example, there's a built-in item named "pi" that contains the value pi.

> pi
[1] 3.141593

Functions

Here's another example; the "sqrt" function finds the square root of a number. If you type sqrt you get this:
function (x)  .Primitive("sqrt")
This isn't very informative; it just says that when you call sqrt, R calls some compiled code. Remember, typing the name of the function just displays its contents; it doesn't actually calculate the square root of anything. R has over two thousand built-in functions, many of which will be useful at the command line.

To invoke a function, type its name and put any arguments (things you're trying to pass to the function) inside parentheses. For example,

> sqrt(10)
[1] 3.162278
gives the square root of ten.

All of the common math functions you could want (and lots more you'll never use) are built into R: log (which is the natural log), exp, and log10 (which is log base 10); sin, cos, tan; their inverses (asin, acos, atan); the hyperbolic varieties sinh, cosh, tanh, asinh, acosh, atanh; the operators %/% (integer divide) and %% (modulo); and many, many more. (If you go to Help | Manuals | R Reference Manual, you can start to look at the list.)
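Here is a quick tour of a few of those functions and operators, typed at the prompt. The numbers are arbitrary; the comments give the printed results.

```r
log(100)        # natural log: about 4.605
log10(100)      # log base 10: 2
exp(1)          # the number e: 2.718282
4 * atan(1)     # an inverse trig function; this is another way to get pi
17 %/% 5        # integer division: 3
17 %% 5         # remainder (modulo): 2
(-17) %% 5      # the result takes the sign of the divisor: 3
```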

Getting Help

R has an extensive online help system. To get help on a particular function, type help() with the name of the function inside the parentheses. You don't need quotes around the name, so you might type help(logb), for example. (Quotes are required for operators and other special names, as in help("%%").) The question mark provides a quick shortcut: ?logb will produce the same result as help(logb). Naturally, ?help gives help on the help function. logb, by the way, computes the logarithm to a specified base.

The help system includes lots of good stuff. In particular, the HTML help, which you can open with help.start(), gives a good interface to many of the other help pages.

The Up-Arrow Key Brings Back Previous Commands

The up-arrow key will let you run through all the commands you've issued in the current session and even earlier. (These are stored on disc in a file named .Rhistory -- note the dot at the front.) If you find a command you want to re-run, you can just hit enter; you can also edit it first, using the left and right-arrow keys to move within the line.

You can also pull up the command history with the history() command. This lets you see what you did, and you can copy-and-paste commands to the console if you like. By default, history() only shows you the 25 most recent commands; I usually want more, so I use a command like history(999) or even history(Inf), where Inf is how R represents positive infinity.

More on functions

Some functions can be called with no arguments (like the quit function, q(), above, whose arguments all have default values). Others take more than one argument. Take a look at the logb() function:
> logb
function (x, base = exp(1)) 
if (missing(base)) log(x) else log(x, base)

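As with sqrt, the logarithm itself is computed in compiled code: logb just calls log(), which, like sqrt, is a primitive. (If your printout also shows a <bytecode> line, that only means the R code of logb itself has been byte-compiled.) R does a lot of its work internally by calling C and Fortran code for speed.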

The logb function takes two arguments. The base argument has the default value exp(1) (that is, the number e), so it needn't be specified. However, the x argument has no default. If it's not supplied, an error is generated (the exact wording of the message depends on your version of R):

> logb ()
Error in log(x) : Non-numeric argument to mathematical function
This isn't the place to get into the details of writing functions, but here are two more points. First, the missing() function can be used inside a function to decide whether an argument has been passed. Second, the value of the last expression evaluated in a function serves as its "return value." In the case of logb, the return value is either log(x) or log(x, base): logb calls one of those other functions, gets a value back, and returns it, because that call is the last expression evaluated. There's also an explicit return() function that could have been used here.
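To see both points in action, here is a minimal sketch of a logb-like function. The names my_log and sign_word are hypothetical, made up for this example; they are not built-in functions.

```r
# Hypothetical function modeled on logb(): base has a default, x does not
my_log <- function(x, base = exp(1)) {
  if (missing(base)) {
    message("no base given; using the natural log")
  }
  log(x, base)   # last expression evaluated, so its value is returned
}

my_log(100)              # natural log of 100: about 4.605
my_log(100, base = 10)   # 2
my_log(8, 2)             # arguments can also be matched by position: 3

# Hypothetical function using an explicit return()
sign_word <- function(x) {
  if (x < 0) return("negative")  # return() leaves the function immediately
  "non-negative"                 # otherwise the last value is returned
}
sign_word(-5)   # "negative"
```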

Variables

Data items in R are called objects or variables. Some of these are built into the system, but most will be things you create and manage yourself. You save something (a computation, some data, anything) by assigning it. There are at least three ways to enter an assignment. The first is <-, the less-than sign followed by the hyphen. This is the preferred method, at least if I'm doing the assigning. The second is a single equals sign, which I don't like because it looks like a function argument (see the example of logb() above with base = exp(1)). The third way uses the assign() function, which we'll discuss later. So, for example,
> a <- sqrt(10)
creates an item named "a" whose value is about 3.162. Note that nothing appears on the screen. To see the contents of a, just type its name:
> a
[1] 3.162278
"a" is just another item to R. There's no distinction between built-in items like "pi" and ones you create. Since a is numeric, you can do math on it:
> a + 1
[1] 4.162278
Now this variable is in your "workspace" forever, until you remove it. The ls() command lists all your items:
> ls()
[1] "a"
If you forget to save a result, it's stored (briefly) in an item called .Last.value, which holds the value of the most recent top-level command until the next one replaces it.
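The three assignment styles mentioned above all produce ordinary variables. A short sketch; the names x, y, and z are arbitrary:

```r
x <- sqrt(10)          # the arrow form (my preference)
y = sqrt(10)           # a single equals sign also works at the prompt
assign("z", sqrt(10))  # assign() takes the name as a character string
ls()                   # includes "x", "y", and "z"
identical(x, z)        # TRUE: all three hold the same value
```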

By the way, copies in R are "deep" copies in that they are entirely distinct from the original. If I make a copy of a, and call it, say, b, and then I change a, b is unaffected.
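Here is that copy behavior as a small sketch. It uses c() to build a short vector and a[1] to change its first element; vectors get their own discussion later.

```r
a <- c(1, 2, 3)
b <- a          # b is a copy of a
a[1] <- 100     # change the first element of a
a               # 100 2 3
b               # still 1 2 3: changing a did not affect b
```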

Types of variables

Several types of variables are supported in R, but the three we'll use most often are numeric, character, and logical. We've already seen examples of numeric variables like pi and our a. The ls() command above produced character output: we know that by the quotation marks. (Actually, it produced a character vector of length one. We'll be talking about vectors later.)

Finally, a logical (Boolean) variable is one that typically arises from a comparison of two things. For example: is sqrt(10) bigger than 3?

> sqrt(10) > 3
[1] TRUE
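A logical can take only the values TRUE and FALSE (plus NA, which marks a missing value). T and F are predefined abbreviations for TRUE and FALSE, but they are ordinary variables that can be overwritten, so it's safer to spell out TRUE and FALSE. Those two are reserved words in R; you can't name a variable TRUE. (You may name a variable T, but don't.)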
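The class() function reports what kind of thing a variable is. This sketch checks the three types, and also shows why reusing T is a bad idea:

```r
class(pi)             # "numeric"
class("hello")        # "character"
class(sqrt(10) > 3)   # "logical"
is.logical(NA)        # TRUE: NA is a logical "missing" value
T <- 0                # legal, but now T no longer means TRUE...
isTRUE(T)             # FALSE
rm(T)                 # ...until you remove your T, uncovering the built-in one
isTRUE(T)             # TRUE again
```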

The comparison operators in R are those of C and Java: < and <= for less than and less than or equal to; > and >=; == for exactly equal; and != for not equal. Beware of testing for equality when you're using non-integers. You don't always get what you think you should. For example (and notice that # is the comment symbol),

> (1/49 * 49)
[1] 1              # Seems reasonable
> (1/49 * 49) == 1 # So is this really equal to 1?
[1] FALSE          # R says no
> (1/49 * 49) - 1  # How different is it from 1?
[1] -1.110223e-16  # Not very: about 10-to-the-minus-16
The tiny difference is inevitable in a digital computer: it's rounding error, induced by the fact that 1/49 does not have an exact binary representation. The difference is not worrisome, but the computer really does think that (1/49 * 49) is different from 1. So if you need to check for equality of two floating-point numbers, you'll need to do something like this:
> eps <- 1e-10                   # Set maximum difference you'll tolerate
> abs((1/49 * 49) - 1) <= eps    # Check that the absolute value of the
                                 # difference is no bigger than that
[1] TRUE
or, similarly,
> all.equal((1/49) * 49, 1)      # Compare using a built-in tolerance
[1] TRUE
This is a good habit to get into: never check two floating-point numbers for exact equality. Data types in R are covered in more detail on a separate page.
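One caution about all.equal(): when the numbers are not close, it returns a character string describing the difference instead of FALSE, so inside an if() you should wrap it in isTRUE(). The sketch below also packages the epsilon test as a function; near_equal() and its default tolerance are hypothetical, chosen just for illustration.

```r
# Hypothetical helper: TRUE if x and y differ by no more than tol
near_equal <- function(x, y, tol = 1e-10) {
  abs(x - y) <= tol
}

z <- 1/49 * 49
z == 1                    # FALSE: exact comparison trips over rounding error
near_equal(z, 1)          # TRUE
all.equal(1, 1.1)         # not FALSE: a string describing the relative difference
isTRUE(all.equal(z, 1))   # TRUE, and safe to use inside if()
if (isTRUE(all.equal(z, 1))) print("equal, within tolerance")
```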

Deleting variables

You can delete an item with the rm() command. This acts on objects directly, or, with the list= argument, on the names of objects.
> sin.45 <- sin(45)      # create another item
> rm (a, list="sin.45")  # Remove directly and by name
> rm (list = "sin.45")   # Once removed, it's gone forever. There's no undelete!
Warning message:
In rm(list = "sin.45") : object 'sin.45' not found
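The exists() function tells you whether a name is currently defined, which is handy before and after an rm(). A sketch with arbitrary names; note that the last line clears out every visible item in your workspace, so use it with care.

```r
x1 <- 1; x2 <- 2; keep <- 3
exists("x1")               # TRUE
rm(list = c("x1", "x2"))   # list= takes a character vector of names
exists("x1")               # FALSE: it's gone
rm(list = ls())            # removes everything ls() lists, including keep
```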

Legal Names and Reserved Words

A legal R name starts with a letter, or with a period that is not followed by a digit, and contains only letters, numbers, periods, and underscores. It cannot contain special characters like parentheses, spaces, ampersands, and so on. (If you really need to give something an illegal name, you can use the assign() and get() functions.) Names can be of practically any length. Obviously a descriptive name makes it easier to remember a variable's reason for existing.
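A sketch of working with names. make.names() shows how R would turn a string into a legal name, and assign() and get() store and fetch something under a name that isn't legal syntax. Backticks, which this page doesn't otherwise cover, are a third way to refer to such a name.

```r
make.names(c("my var", "2nd", "ok.name"))  # "my.var" "X2nd" "ok.name"
assign("my var", 42)   # store a value under a name containing a space
get("my var")          # 42
`my var` + 1           # backticks also let you use the name directly: 43
```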

There are only a few reserved words in R: TRUE and FALSE, the keywords of the language itself (like if, else, for, function, break, in), and a few others such as NULL, NA, and Inf. (Type ?Reserved for the full list.) Otherwise you are free to give things unwise names like integer. Don't do that. If you write a function whose name is the same as a built-in one, typing that name will call your version instead of the original until you remove yours, although you can still reach the original with its package prefix, as in base::sqrt. (This is called masking.)
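Here is masking in action, as a deliberately bad example: a homemade sqrt() hides the built-in one until it is removed.

```r
sqrt <- function(x) "not what you expected"  # masks the built-in sqrt
sqrt(16)          # calls your version
base::sqrt(16)    # the original is still reachable with its package prefix: 4
rm(sqrt)          # remove yours...
sqrt(16)          # ...and the built-in one is back: 4
```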

Special Characters

A few characters need special treatment inside R character strings. The new-line character is represented by "\n" and the tab character by "\t". The backslash character itself is represented by "\\" -- this comes up a lot when dealing with file names. In general at the R command line in Windows, you specify a file name with two backslashes or one forward slash. Of course you use single backslashes inside Windows dialog boxes. On Linux and Mac we use one forward slash in the usual way.
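A sketch of these escapes. cat() prints a string with its escapes interpreted, and nchar() confirms that "\\" is a single character. The Windows path C:\temp\data.csv is purely hypothetical; file.path() is a built-in way to assemble a path with forward slashes.

```r
cat("one\ttwo\nthree\n")            # \t is a tab and \n a newline
nchar("\\")                         # 1: "\\" is one backslash character
path1 <- "C:\\temp\\data.csv"       # hypothetical path, doubled backslashes
path2 <- "C:/temp/data.csv"         # the same path with forward slashes
file.path("C:", "temp", "data.csv") # builds "C:/temp/data.csv"
```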
