Vectors and Subscripting in R

Vectors

Lots of things in R are vectors. For example, scalars are vectors of length 1. So understanding how R handles vectors is vital to using it properly. Suppose you want to create a vector consisting of the first 10 integers. You might use the seq function:

> seq(1, 10, by = 1)
 [1]  1  2  3  4  5  6  7  8  9 10

or, since these are all integers, you can just use the (somewhat less powerful) colon operator:

> x <- 1:10
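As a rough sketch of the difference between the two, the lines below build a few sequences both ways. The variable names (a, b, halves, evens, down) are made up for illustration, so x is still 1:10 afterwards.

```r
# Both calls produce the values 1 through 10.
a <- seq(1, 10, by = 1)
b <- 1:10
all(a == b)                    # TRUE: same values, element by element

# seq() can step by something other than 1; the colon operator cannot.
halves <- seq(0, 2, by = 0.5)  # 0.0 0.5 1.0 1.5 2.0
evens  <- seq(2, 10, by = 2)   # 2 4 6 8 10

# The colon operator can count down as well as up.
down <- 10:1                   # 10 9 8 7 6 5 4 3 2 1
```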

If you operate on that vector with, say, the sin() function, you get a vector of 10 sines:

> sin(x)
 [1]  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243
 [6] -0.2794155  0.6569866  0.9893582  0.4121185 -0.5440211

(Of course, the [1] and [6] indicate that the leftmost values are, respectively, the first and sixth in the set of ten.) Here's an example of a logical vector: the question "is sin(x) bigger than 0?"

> sin(x) > 0
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

The answer, of course, depends on the value of x. The first three values of x are smaller than pi, so their sines are positive; the next three are between pi and 2 pi, so their sines are negative; and so on. Again, the comparison operator ">" has operated on a vector and returned a vector.

What if I want to count the number of x's for which sin(x) is > 0?

> sum (sin(x) > 0)
[1] 6

The sum() function converts the logical T's and F's to 1's and 0's, respectively; then adding those up amounts to counting the T's.
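The conversion can be made visible with a small sketch. The names positive, n_positive and frac_positive are illustrative only; the use of mean() to get a proportion is an extra not shown in the text above, but it relies on the same TRUE-to-1 conversion.

```r
x <- 1:10
positive <- sin(x) > 0           # logical vector of length 10

as.numeric(positive)             # 1 1 1 0 0 0 1 1 1 0
n_positive <- sum(positive)      # 6: the number of TRUEs
frac_positive <- mean(positive)  # 0.6: the proportion of TRUEs
```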

Subscripting

R has several ways to subscript (that is, extract specific elements from a vector). The most common way is directly, using the square bracket operator:

> x[4]
[1] 4

In this example, the user has said "give me the fourth element," and R has said, "you get a vector whose first (and only) element is 4."

Here's a similar question: "what are the second and fifth elements of x?"

> x[c(2, 5)]
[1] 2 5

Here the c(), of course, constructs the vector (2,5) to be used as the index; then we extract the second and fifth entries of x.
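A few more positional subscripts, as a sketch using the original x of 1:10. The reordered, repeated and out-of-range cases go slightly beyond the examples above and are included only to show how the index vector drives the result.

```r
x <- 1:10

x[4]            # 4: a vector of length 1
x[c(2, 5)]      # 2 5
x[2:5]          # 2 3 4 5: the colon operator builds the index vector
x[c(5, 2, 2)]   # 5 2 2: indices may be reordered or repeated
x[12]           # NA: asking past the end gives a missing value, not an error
```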

Logical subscripting

You can use a logical vector, of the same length as your data, as an index, and R will pull out the elements of the data vector for which the corresponding index values are TRUE. For example, consider a new x vector consisting of 2, 4, 6, 8, and 10. Let's use a logical subscript to extract the second and fifth entries.

> x <- c(2,4,6,8,10)
> x[c(F, T, F, F, T)]
[1]  4 10

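This says "give me the second and fifth elements of x, not the first, third or fourth." (T and F are R's built-in shorthand for TRUE and FALSE; the full names are safer in scripts, since T and F can be reassigned.) Which of these x's have sines that are positive?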

> sin(x) > 0
[1]  TRUE FALSE FALSE  TRUE FALSE

What are the x values whose sines are positive?

> x[sin(x) > 0]
[1] 2 8

Here the sin(x) > 0 on the inside of the square brackets produces a logical vector of length 5 with two T's; then those two T's, in the first and fourth position, give us the first and fourth elements of x.

Here's a similar question: "what are the indices of the elements of x for which the sine is > 0?"

> (1:length(x))[sin(x) > 0]
[1] 1 4

In this example, the (1:length(x)) produces the vector (1,2,3,4,5); then from that vector we extract the first and fourth items (since sin(x) > 0 for the first and fourth elements of x).
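The same indices can also be had from the built-in which() function. The sketch below puts the approaches side by side; the name keep is made up for illustration.

```r
x <- c(2, 4, 6, 8, 10)
keep <- sin(x) > 0       # TRUE FALSE FALSE TRUE FALSE

x[keep]                  # 2 8: the values with positive sines
(1:length(x))[keep]      # 1 4: the approach used above
which(keep)              # 1 4: the same indices from which()
seq_along(x)[keep]       # 1 4: seq_along(x) is 1, 2, ..., length(x)
```

Saving the logical vector under a name, as with keep here, is convenient when the same condition is used more than once.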

Negative subscripting

Another handy thing is a negative subscript; this says "give me everything except these values." For example:

> x[-c(2,5)] # Give me everything except elements 2 and 5
[1] 2 6 8   # We could also have used x[c(-2, -5)]

You can't mix positive and negative subscripts in a single index; R signals an error if you try. A "0" subscript returns a vector of length zero, which is helpful in the occasional special case.
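These rules can be checked with a short sketch. The tryCatch() wrapper and the name result are only there so that the error from mixing signs can be seen without stopping the script; they are not part of the original text.

```r
x <- c(2, 4, 6, 8, 10)

x[-c(2, 5)]        # 2 6 8
x[c(-2, -5)]       # 2 6 8: the same thing
x[-1]              # 4 6 8 10: drop just the first element
x[0]               # numeric(0): a vector of length zero
length(x[0])       # 0

# Mixing signs is an error; tryCatch() turns it into a value we can inspect.
result <- tryCatch(x[c(1, -2)], error = function(e) "error")
result             # "error"
```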

More on vector operations

Just as the sin() function operates on each element in a vector, so do many other R functions. In fact, functions that operate on a vector and produce a scalar are fairly unusual. Important examples include sum(), which gives the sum of a vector; mean(); median(); var() and sd(), which give the sample variance and standard deviation; prod(), which gives the product; and length(), which tells you how many elements the vector has.
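As a quick illustration, here are those functions applied to the x used in this section, c(2,4,6,8,10). Each call returns a vector of length 1.

```r
x <- c(2, 4, 6, 8, 10)

sum(x)      # 30
mean(x)     # 6
median(x)   # 6
var(x)      # 10: sample variance, dividing by n - 1
sd(x)       # 3.162278: the square root of var(x)
prod(x)     # 3840
length(x)   # 5
```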

Most functions operate element-wise. These include the simple arithmetic operators. For example, remember that x is c(2,4,6,8,10):

> x + 2
[1]  4  6  8 10 12
> x^2
[1]   4  16  36  64 100
> 2^x
[1]    4   16   64  256 1024

(Notice the different formatting. This is just R's way of making sure that all the elements in a vector can be represented by character strings of the same length. Don't worry about it.)

Let's create another vector called, say, y:

> y <- c(10, 20, 30, 40, 50) # or y <- 10 * 1:5
> x + y
[1] 12 24 36 48 60           # Each x is added to the corresponding y
> x * y
[1]  20  80 180 320 500      # Each x multiplied by the corresponding y

What if the lengths don't match up? Then R recycles elements from the shorter one. You get a warning message if the length of the shorter one doesn't divide the length of the longer one. For example:

> x[1:4] + y[1:2]
[1] 12 24 16 28     # 2 + 10, 4 + 20, 6 + 10, 8 + 20; no warning
> x + y[1:2]
[1] 12 24 16 28 20  # last one is 10 (x[5]) + 10 (y[1])
Warning message:
In x + y[1:2] :
  longer object length is not a multiple of shorter object length
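One way to see what recycling does is to stretch the shorter vector by hand with rep() and compare. This is only a sketch of the idea, not a description of R's internals; the names short, stretched and recycled are made up for illustration.

```r
x <- c(2, 4, 6, 8, 10)
y <- c(10, 20, 30, 40, 50)

short <- y[1:2]                                  # 10 20
stretched <- rep(short, length.out = length(x))  # 10 20 10 20 10

x + stretched                            # 12 24 16 28 20, and no warning
recycled <- suppressWarnings(x + short)  # same values, warning silenced
all(recycled == x + stretched)           # TRUE

# Adding a single number is the same mechanism: the 2 is recycled five times.
x + 2                                    # 4 6 8 10 12
```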

This vectorization gives R much of its power. You will rarely need to write an explicit loop in R. One major exception is when the i-th element of a vector depends explicitly on the (i-1)th, as in some simulations.
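Here is a minimal sketch of that exception. The recursion (each value is half the previous one plus a "shock") and the names shock and z are hypothetical, chosen only to show a case where each step needs the result of the step before it.

```r
# Hypothetical recursion: z[i] = 0.5 * z[i - 1] + shock[i]
shock <- c(1, 0, 2, -1, 0)
z <- numeric(length(shock))   # set up the result vector in advance
z[1] <- shock[1]

for (i in 2:length(shock)) {
  # Each pass needs the value computed on the previous pass.
  z[i] <- 0.5 * z[i - 1] + shock[i]
}
z   # 1.0000 0.5000 2.2500 0.1250 0.0625
```

By contrast, something like shock^2 or shock + 1 needs no loop at all, because each element of the result depends only on the corresponding element of the input.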
