Matrices
Data Frames
Higher-Dimensional Arrays
A matrix is a two-dimensional data structure. All the elements of a matrix must be of the same type (numeric, logical, character, complex). You can create a matrix with the matrix() command:
> matrix (1:12, nrow = 4, ncol = 3)   # Use the integers 1 through 12 in four rows, three columns
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

Actually, you don't need to specify both "nrow" and "ncol," because given one, R can deduce the other. Notice that the data goes in column-by-column, unless you specify byrow=T:
> matrix (1:12, nrow = 4, byrow = T)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

Let's save this as our matrix to work with.
> x <- matrix (1:12, nrow = 4, byrow=T)
> dim(x)   # What are the dimensions of x?
[1] 4 3
# Answer: 4 rows by three columns. Notice this is a vector of length 2.
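As a quick check of the two points above, here is a minimal sketch showing that R deduces the missing dimension and that byrow changes only the fill order. The names m_col and m_row are illustrative and are not used elsewhere in this text.

```r
# Fill order: column-wise by default, row-wise with byrow = TRUE.
m_col <- matrix(1:12, nrow = 4)                # ncol is deduced as 3
m_row <- matrix(1:12, ncol = 3, byrow = TRUE)  # nrow is deduced as 4

dim(m_col)                                     # 4 3
identical(dim(m_col), dim(m_row))              # same shape either way

m_col[2, 1]   # the value 2 sits just below the top-left corner
m_row[1, 2]   # the value 2 sits just right of the top-left corner

# A row-wise fill is the transpose of a column-wise fill of the swapped shape.
identical(m_row, t(matrix(1:12, nrow = 3)))
```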
Subscripting
With a vector, we subscript with a single index. It seems natural that we should use two subscripts for a matrix. Separate them with a comma. For example:
> x[3,2]   # Give me the item in the third row, second column.
[1] 8
> x[1:3, c(1,3)]   # Give me rows 1 through 3, columns 1 and 3
     [,1] [,2]
[1,]    1    3
[2,]    4    6
[3,]    7    9
# The result is a 3x2 matrix.

If you omit a subscript you get the whole row or column:
> x[3,]
[1] 7 8 9
> x[,2]
[1]  2  5  8 11

Note these results are vectors, not matrices with one row/column. If we ask for two columns, then of course we get a matrix:
> x[,c(1,3)]
     [,1] [,2]
[1,]    1    3
[2,]    4    6
[3,]    7    9
[4,]   10   12

Actually, if you really want to, you can force the result of asking for one row or column to continue to be a matrix, using the drop=F argument. It doesn't come up much, so this is just for completeness.
> x[2,,drop=F]
     [,1] [,2] [,3]
[1,]    4    5    6
# This is a 1x3 matrix...
> x[,2,drop=F]
     [,1]
[1,]    2
[2,]    5
[3,]    8
[4,]   11
# ...this is a 4x1 matrix.
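One situation where the drop argument can matter is when the number of selected rows is only known at run time. The sketch below rebuilds x so that it runs on its own; row_vec, row_mat and keep are illustrative names.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

row_vec <- x[2, ]                 # dimensions dropped: a plain vector
row_mat <- x[2, , drop = FALSE]   # dimensions kept: a 1 x 3 matrix

is.matrix(row_vec)                # FALSE
dim(row_vec)                      # NULL
dim(row_mat)                      # 1 3

# The condition below happens to select exactly one row (the last one).
keep <- x[, 1] > 8
nrow(x[keep, , drop = FALSE])     # 1
nrow(x[keep, ])                   # NULL, because the result is a vector
```

Code that later calls nrow() or ncol() on the result would behave differently in the two cases, which is why drop = FALSE is sometimes used defensively.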
Logical Subscripting
As with a vector, we can use logical vectors to select certain rows or columns. Normally we would select rows by using a logical vector with one entry for each row, and similarly for columns. So, for example, consider the expression x[,2] > 5:
> x[,2] > 5
[1] FALSE FALSE  TRUE  TRUE

This has four entries, one for each row. If we wanted only the rows for which the second column is > 5, we could do that simply:
> x[x[,2] > 5,]   # Give me just those rows, and all columns
     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12

Logical and Character Matrices
Here's an example of a logical matrix:
> x > 5
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE  TRUE
[3,]  TRUE  TRUE  TRUE
[4,]  TRUE  TRUE  TRUE
> x[x>5]
[1]  7 10  8 11  6  9 12
# This extracts the values > 5. It gives a vector, not a matrix, and note
# that the extraction goes column-by-column.

Here's a character matrix. It uses the built-in variable "letters" that contains the twenty-six lowercase letters in order.
> matrix (letters[1:12], nrow = 4, byrow = T)
     [,1] [,2] [,3]
[1,] "a"  "b"  "c"
[2,] "d"  "e"  "f"
[3,] "g"  "h"  "i"
[4,] "j"  "k"  "l"
# You can tell they're characters by the quotes
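A logical matrix can also be counted and located, not just used as a subscript. The following is a small sketch along those lines; it rebuilds x so it can run on its own, and big is an illustrative name.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

big <- x > 5                      # logical matrix with the same shape as x
sum(big)                          # how many entries exceed 5 (TRUE counts as 1)
which(big, arr.ind = TRUE)        # row and column position of each such entry
x[big]                            # the values themselves, column-by-column

# Row selection with two conditions combined by &:
# second column above 5 and third column below 12 leaves only the third row.
x[x[, 2] > 5 & x[, 3] < 12, , drop = FALSE]
```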
Handy matrix functions
Some matrix functions that seem to come up a lot are t(), which transposes your matrix; %*%, which does matrix multiplication; and solve(), which inverts a matrix and solves linear systems. For example:
> t(x)   # Give me x-transpose
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> t(x) %*% x   # Here's x-transpose times x
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
> solve (t(x) %*% x)   # Can we invert this matrix?
Error in solve.default(t(x) %*% x) :
  Lapack routine dgesv: system is exactly singular

Here x is not of full rank, so neither is x-transpose x, and we can't invert it.
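One way to see the rank problem before calling solve() is to ask for the numerical rank from a QR decomposition. This is only a sketch: qr() and crossprod() are base R functions, while the 2x2 matrix a is an illustrative full-rank example that does not appear in the text above.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

xtx <- crossprod(x)               # computes t(x) %*% x
qr(x)$rank                        # numerical rank of x; less than 3 here

# A small full-rank matrix can be inverted without trouble.
a <- matrix(c(2, 1, 1, 3), nrow = 2)
a_inv <- solve(a)
a %*% a_inv                       # the identity matrix, up to rounding error
all.equal(a %*% a_inv, diag(2))
```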
Solving Linear Systems
Here's an example of solving a system of linear equations. Suppose we have the system
    14 c1 + 5 c2 + 5 c3 + 2 c4 = 2
     8 c1 + 3 c2 + 4 c3 + 4 c4 = 2
    16 c1 + 7 c2 + 3 c3 + 7 c4 = 3
     6 c1 + 6 c2 + 1 c3 + 9 c4 = 3

If we create the matrix of this system (call it mat) and the result vector (call it res), so that the system reads (mat) c = res, then we can find c by inverting the matrix with solve() and matrix-multiplying by res, or by calling solve() with both mat and res as arguments:
> res <- c(2,2,3,3)
> mat <- matrix (c(14, 8, 16, 6, 5, 3, 7, 6, 5, 4, 3, 1, 2, 4, 7, 9), ncol=4)
> solve (mat)
           [,1]  [,2]       [,3]        [,4]
[1,] -0.1511111  0.06  0.2288889 -0.17111111
[2,]  0.5688889 -0.52 -0.3911111  0.40888889
[3,]  0.1733333  0.24 -0.3066667  0.09333333
[4,] -0.2977778  0.28  0.1422222 -0.05777778
> solve (mat) %*% res
             [,1]
[1,] -0.008888889
[2,]  0.151111111   # Note: result is a 4x1 matrix
[3,]  0.186666667
[4,]  0.217777778
> solve (mat, c(2, 2, 3, 3))   # Result is a vector of length 4
[1] -0.008888889  0.151111111  0.186666667  0.217777778
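It is easy to check a solution by multiplying it back through the matrix. The sketch below does that for the system above; sol is an illustrative name. Calling solve() with both arguments is generally considered preferable to forming the inverse first, since it avoids an extra matrix multiplication, though for a system this small either route is fine.

```r
res <- c(2, 2, 3, 3)
mat <- matrix(c(14, 8, 16, 6,  5, 3, 7, 6,  5, 4, 3, 1,  2, 4, 7, 9), ncol = 4)

sol <- solve(mat, res)                     # solves mat %*% sol = res directly
mat %*% sol                                # reproduces res, as a 4 x 1 matrix

# drop() turns the 4 x 1 matrix back into a plain vector for comparison.
all.equal(drop(mat %*% sol), res)
all.equal(sol, drop(solve(mat) %*% res))   # both routes give the same answer
```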
Row and column names
One final handy thing is that the rows and/or columns of your matrix can have names. For example, we might set the columns of x to have the names of colors, and the rows to be people's names:
> dimnames(x) <- list (c("Bob", "Dave", "Mary", "Sandy"), c("Blue", "White", "Red"))

Note that dimnames() is a function that expects a list (see lists). The first item on the list is the vector of row names; the second is the vector of column names; either (or both) can be omitted by replacing the vector with the reserved word NULL. Now what does x look like?
> x
      Blue White Red
Bob      1     2   3
Dave     4     5   6
Mary     7     8   9
Sandy   10    11  12
# Same contents, it's just there are now row and column names.

We can now extract by name, rather than by number. This is handy because the numbers are subject to change, if for example we delete some rows or columns.
> x["Bob","Blue"]   # The top-left element
[1] 1
> x[,"Red"]   # The "Red" column. Note the result is a vector with names.
  Bob  Dave  Mary Sandy
    3     6     9    12
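To illustrate why names are more robust than numbers, here is a minimal sketch that removes a row and then extracts by name. It rebuilds x so it runs on its own; y and z are illustrative names, and rownames() and colnames() are base R shortcuts for the two components of dimnames().

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)
dimnames(x) <- list(c("Bob", "Dave", "Mary", "Sandy"),
                    c("Blue", "White", "Red"))

rownames(x)                       # first component of dimnames(x)
colnames(x)                       # second component of dimnames(x)

# After removing Bob, Mary is row 2 rather than row 3,
# but extraction by name still finds the same value.
y <- x[-1, ]
y["Mary", "White"]                # 8
x["Mary", "White"]                # 8

# Column names only: NULL stands in for the missing row names.
z <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "b")))
z[, "b"]
```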
Data Frames
> PlantGrowth
   weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
:      :     :
30   5.26  trt2

This looks rectangular, but it's not a matrix since the second column isn't numeric like the first. The names of the list are the column headers: every data frame must have column names. (In contrast, a matrix doesn't have to have names.) A data frame must also have row names, although often, as here, they're just ascending integers. Since a data frame is a list, you can get at the column names with the names() function; since it also behaves like a matrix, you can get at them with the dimnames() function we used above as well.
In general (as here) the rows of a data frame will contain incompatible data (numbers, characters, and so on). So in contrast to the matrix case, if you extract a single row from a data frame you get a data frame:
> PlantGrowth[3,]
  weight group
3   5.18  ctrl
# This is not a vector, it's a 1x2 data frame -- and note the row name is "3", not "1".

There are some things you just can't do to a data frame. For example, you can't transpose it, because then you'd have columns with different types of things in them. When you try, R does all it can -- it converts everything to character, and then does the transposition:
> t(PlantGrowth)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   ...
weight "4.17" "5.58" "5.18" "6.11" "4.50" "4.61" ...
group  "ctrl" "ctrl" "ctrl" "ctrl" "ctrl" "ctrl" ...
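The behaviour described above can be confirmed by inspecting the classes and types involved. This sketch uses the built-in PlantGrowth data set; row3 and tp are illustrative names.

```r
row3 <- PlantGrowth[3, ]
class(row3)                       # "data.frame": a single row stays a data frame
dim(row3)                         # 1 2
rownames(row3)                    # "3": the original row name is kept

sapply(PlantGrowth, class)        # the two columns have different classes

# Transposing goes through a matrix, so everything is coerced to one type.
tp <- t(PlantGrowth)
dim(tp)                           # 2 30
typeof(tp)                        # "character"
```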
Data frames are handy because real-life data frequently comes in this form: it's very often rectangular, with each row representing one case and the columns representing the observations. Since a data frame behaves as both a list and a matrix, we can use either matrix-type extraction or list-type extraction. For example, all four of these produce the same result:
> PlantGrowth[,1]   # (Matrix type) Give me column 1
[1] 4.17 5.58 5.18 6.11 ...
> PlantGrowth[[1]]   # (List type) Give me item 1
[1] 4.17 5.58 5.18 6.11 ...
> PlantGrowth[,"weight"]   # (Matrix type) Give me the column named "weight"
[1] 4.17 5.58 5.18 6.11 ...
> PlantGrowth$weight   # (List type) Give me the item named "weight"
[1] 4.17 5.58 5.18 6.11 ...

For list extraction with $, you only have to give enough of the name to make it unambiguous. Here PlantGrowth$w would be enough to get the information you wanted. Of course if there were a column named weather, you'd have to specify at least wei to be unambiguous.
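The equivalence of the four forms, and the partial matching done by $, can be checked directly. This is a sketch using the built-in PlantGrowth data; the variable names are illustrative. Note that double brackets match names exactly by default, so the abbreviation only works with $.

```r
by_matrix_pos  <- PlantGrowth[, 1]
by_list_pos    <- PlantGrowth[[1]]
by_matrix_name <- PlantGrowth[, "weight"]
by_list_name   <- PlantGrowth$weight

# All four routes return the same numeric vector.
identical(by_matrix_pos, by_list_pos) &&
  identical(by_list_pos, by_matrix_name) &&
  identical(by_matrix_name, by_list_name)

identical(PlantGrowth$w, PlantGrowth$weight)   # $ accepts an unambiguous prefix
is.null(PlantGrowth[["w"]])                    # [[ ]] wants the full name by default

class(PlantGrowth["weight"])                   # single brackets keep a data frame
```

Spelling names out in full is usually the safer habit in scripts, since a prefix that is unambiguous today may stop being so when a column is added.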
Higher-Dimensional Arrays
> mytable <- table (PlantGrowth$weight > 4, PlantGrowth$weight > 5, PlantGrowth$group)
> mytable
, ,  = ctrl

        FALSE TRUE
  FALSE     0    0
  TRUE      4    6

, ,  = trt1

        FALSE TRUE
  FALSE     2    0
  TRUE      6    2

, ,  = trt2

        FALSE TRUE
  FALSE     0    0
  TRUE      1    9

Since weight > 4 was the first argument to the table function, it appears in the rows. weight > 5 appears in the columns, and group appears in the "layers." We need three subscripts to extract things from this array. Here's how we get only the first layer.
> mytable[,,"ctrl"]   # Drops the extra dimension, returns a 2x2 matrix
        FALSE TRUE
  FALSE     0    0
  TRUE      4    6
> mytable[,,"ctrl", drop=F]   # Doesn't drop: returns a 2x2x1 array.
, ,  = ctrl

        FALSE TRUE
  FALSE     0    0
  TRUE      4    6

> mytable["TRUE",,"ctrl"]   # Drops two dimensions, returns a vector of length 2
FALSE  TRUE
    4     6
> mytable["TRUE",,"ctrl", drop=F]   # Returns a 1x2x1 array.
, ,  = ctrl

       FALSE TRUE
  TRUE     4    6
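The same ideas carry over to arrays built by hand. The sketch below inspects the dimensions of mytable, sums over each layer with apply(), and builds a plain 2x3x4 array with array(); arr is an illustrative name.

```r
mytable <- table(PlantGrowth$weight > 4, PlantGrowth$weight > 5, PlantGrowth$group)

dim(mytable)                      # 2 2 3: rows, columns, layers
dimnames(mytable)[[3]]            # the layer names

# Sum over everything except the third dimension: one total per group.
apply(mytable, 3, sum)            # 10 plants in each group

# An array filled directly: first down the columns, then layer by layer.
arr <- array(1:24, dim = c(2, 3, 4))
arr[2, 3, 4]                      # the last element, 24
dim(arr[, , 1])                   # 2 3: one layer is an ordinary matrix
```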