Matrices and Data Frames

## Matrices


A matrix is a two-dimensional data structure. All the elements of a matrix must be of the same type (numeric, logical, character, complex). You can create a matrix with the matrix() command:
```
> matrix(1:12, nrow = 4, ncol = 3)  # Use the integers 1 through 12 in four rows, three columns
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
```
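Actually, you don't need to specify both nrow and ncol, because given one, R can deduce the other. Notice that the data go in column by column, unless you specify byrow = TRUE. (T and F are accepted as abbreviations for TRUE and FALSE. The full words are safer, though, because T and F are ordinary variables that can be reassigned.)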
```
> matrix(1:12, nrow = 4, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
```
Let's save this as our matrix to work with.
```
> x <- matrix(1:12, nrow = 4, byrow = TRUE)
> dim(x)   # What are the dimensions of x?
[1] 4 3    # Answer: 4 rows by 3 columns. Notice this is a vector of length 2.
```
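Under the hood, a matrix is just a vector with a dim attribute attached. That is why it fills column by column. Here is a minimal sketch of that idea. The names v and by_row are only for illustration. It also checks that byrow = TRUE gives the transpose of filling a 3x4 matrix by column.

```r
# Attaching a dim attribute turns a plain vector into a matrix,
# filled column by column exactly as matrix() does
v <- 1:12
dim(v) <- c(4, 3)
print(identical(v, matrix(1:12, nrow = 4)))   # TRUE

# nrow() and ncol() pick out the two entries of dim()
by_row <- matrix(1:12, nrow = 4, byrow = TRUE)
print(c(nrow(by_row), ncol(by_row)))          # 4 3

# Filling 4 rows by row is the transpose of filling 3 rows by column
print(identical(by_row, t(matrix(1:12, nrow = 3))))   # TRUE
```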

Subscripting

With a vector, we subscript with a single index. A matrix naturally takes two subscripts, the row first and then the column, separated by a comma. For example:

```
> x[3, 2]          # Give me the item in the third row, second column.
[1] 8
> x[1:3, c(1, 3)]  # Give me rows 1 through 3, columns 1 and 3
     [,1] [,2]
[1,]    1    3
[2,]    4    6
[3,]    7    9     # The result is a 3x2 matrix.
```
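Because the data are stored column by column, a matrix also accepts a single subscript, which counts down the columns. Here is a small sketch using the same x as above, recreated so the snippet runs on its own. The names i, j, and k are illustrative.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

# Element (i, j) of an n-row matrix sits at position (j - 1) * n + i
# of the underlying vector, because storage is column-major
i <- 3; j <- 2
k <- (j - 1) * nrow(x) + i
print(k)        # 7
print(x[k])     # 8, the same element as x[3, 2]
print(x[i, j])  # 8
```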
If you leave a subscript empty, you get the whole row or column:
```
> x[3, ]
[1] 7 8 9
> x[, 2]
[1]  2  5  8 11
```
Note these results are vectors, not matrices with one row/column. If we ask for two columns, then of course we get a matrix:
```
> x[, c(1, 3)]
     [,1] [,2]
[1,]    1    3
[2,]    4    6
[3,]    7    9
[4,]   10   12
```
Actually, if you really want to, you can force a single row or column to stay a matrix by using the drop = FALSE argument. It doesn't come up much, so this is just for completeness.
```
> x[2, , drop = FALSE]
     [,1] [,2] [,3]
[1,]    4    5    6   # This is a 1x3 matrix...
> x[, 2, drop = FALSE]
     [,1]
[1,]    2
[2,]    5
[3,]    8
[4,]   11             # ...this is a 4x1 matrix.
```
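Where drop = FALSE earns its keep is inside functions that must work for one row or for many. Here is a sketch with a hypothetical helper, selected_row_totals(), which is not part of R:

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

# Without drop = FALSE a single row comes back as a plain vector with
# no dim attribute, and rowSums() would reject it
print(dim(x[2, ]))                 # NULL
print(dim(x[2, , drop = FALSE]))   # 1 3

# Hypothetical helper: row totals for any set of rows, including just one
selected_row_totals <- function(m, rows) {
  rowSums(m[rows, , drop = FALSE])
}
print(selected_row_totals(x, 2))        # 15
print(selected_row_totals(x, c(1, 4)))  # 6 33
```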

Logical Subscripting

As with a vector, we can use logical vectors to select certain rows or columns. Normally we would select rows by using a logical vector with one entry for each row, and similarly for columns. So, for example, consider the expression x[,2] > 5:

```
> x[, 2] > 5
[1] FALSE FALSE  TRUE  TRUE
```
This has four entries, one for each row. If we wanted only the rows for which the second column is > 5, we could do that simply:
```
> x[x[, 2] > 5, ]   # Give me just those rows, and all columns
     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12
```
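Conditions can be combined with & (and) and | (or), and the same trick works on columns. Here is a sketch using the x from above, recreated so it runs alone. The name keep is just illustrative.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

# One TRUE/FALSE per row: column 1 above 3 AND column 3 below 12
keep <- x[, 1] > 3 & x[, 3] < 12
print(keep)                   # FALSE TRUE TRUE FALSE
print(x[keep, ])              # rows 2 and 3, all columns

# One TRUE/FALSE per column: columns whose total exceeds 22
print(colSums(x))             # 22 26 30
print(x[, colSums(x) > 22])   # columns 2 and 3, all rows
```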
Logical and Character Matrices

Here's an example of a logical matrix:

```
> x > 5
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE  TRUE
[3,]  TRUE  TRUE  TRUE
[4,]  TRUE  TRUE  TRUE
> x[x > 5]
[1]  7 10  8 11  6  9 12   # This extracts the values > 5. It gives a vector, not a matrix, and note
                          # that the extraction goes column by column.
```
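Two related idioms are worth knowing. which() with arr.ind = TRUE tells you where the TRUE entries are. A logical matrix can also appear on the left of an assignment. Here is a sketch; pos and y are illustrative names.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

# which() gives the column-major positions of the TRUE entries;
# arr.ind = TRUE reports them as (row, col) pairs instead
print(which(x > 5))             # 3 4 7 8 10 11 12
pos <- which(x > 5, arr.ind = TRUE)
print(pos)

# Replace every value above 5 by 0; the 4x3 shape is kept
y <- x
y[y > 5] <- 0
print(y)
```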
Here's a character matrix. It uses the built-in constant letters, which contains the 26 lowercase letters in order.
```
> matrix(letters[1:12], nrow = 4, byrow = TRUE)
     [,1] [,2] [,3]
[1,] "a"  "b"  "c"
[2,] "d"  "e"  "f"
[3,] "g"  "h"  "i"
[4,] "j"  "k"  "l"    # You can tell they're characters by the quotes
```
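Since all the elements of a matrix must share one type, mixing numbers and characters makes R quietly coerce everything to character. Here is a quick check; m is an illustrative name.

```r
# Binding a numeric column to a character column: the numbers
# are converted to strings so the whole matrix has one type
m <- cbind(1:2, c("a", "b"))
print(m)
print(typeof(m))   # "character"
print(m[1, 1])     # "1", a string, not the number 1
```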

Handy matrix functions

Some matrix functions that seem to come up a lot are t(), which transposes your matrix; %*%, which does matrix multiplication; and solve(), which inverts a matrix and solves linear systems. For example:

```
> t(x)                   # Give me x-transpose
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> t(x) %*% x             # Here's x-transpose times x
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
> solve(t(x) %*% x)      # Can we invert this matrix?
Error in solve.default(t(x) %*% x) :
  Lapack routine dgesv: system is exactly singular
```
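Here x is not of full rank: its third column equals twice the second minus the first, so its rank is 2. Therefore x-transpose x (3x3, also of rank 2) is singular, and solve() can't invert it.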
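You can confirm the rank deficiency numerically. This sketch uses qr(), whose rank component reports the numerical rank under its default tolerance.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE)

# Numerical rank from the QR decomposition
print(qr(x)$rank)                    # 2

# The linear dependency behind it
print(x[, 1] - 2 * x[, 2] + x[, 3])  # 0 0 0 0
```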

Solving Linear Systems

Here's an example of solving a system of linear equations. Suppose we have the system

```
14 c1 + 5 c2 + 5 c3 + 2 c4 = 2
 8 c1 + 3 c2 + 4 c3 + 4 c4 = 2
16 c1 + 7 c2 + 3 c3 + 7 c4 = 3
 6 c1 + 6 c2 + 1 c3 + 9 c4 = 3
```
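Create the matrix of coefficients of this system (call it mat) and the vector of right-hand sides (call it res), so that the system reads mat × (c1, c2, c3, c4) = res. We can then find the unknowns in two ways: invert the matrix with solve() and matrix-multiply the inverse by res, or call solve() with both mat and res as arguments. Remember that matrix() fills column by column, so the first four numbers below are the coefficients of c1: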
```
> res <- c(2, 2, 3, 3)
> mat <- matrix(c(14, 8, 16, 6, 5, 3, 7, 6, 5, 4, 3, 1, 2, 4, 7, 9), ncol = 4)
> solve(mat)
           [,1]  [,2]       [,3]        [,4]
[1,] -0.1511111  0.06  0.2288889 -0.17111111
[2,]  0.5688889 -0.52 -0.3911111  0.40888889
[3,]  0.1733333  0.24 -0.3066667  0.09333333
[4,] -0.2977778  0.28  0.1422222 -0.05777778
> solve(mat) %*% res
             [,1]
[1,] -0.008888889
[2,]  0.151111111   # Note: result is a 4x1 matrix
[3,]  0.186666667
[4,]  0.217777778

> solve(mat, c(2, 2, 3, 3))   # Result is a vector of length 4
[1] -0.008888889  0.151111111  0.186666667  0.217777778
```
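It's worth plugging the answer back in to check it. This sketch compares with all.equal() rather than ==, because floating-point results rarely match exactly; sol is an illustrative name. As a general numerical habit, solve(mat, res) is usually preferred to solve(mat) %*% res, since it avoids forming the inverse explicitly.

```r
mat <- matrix(c(14, 8, 16, 6, 5, 3, 7, 6, 5, 4, 3, 1, 2, 4, 7, 9), ncol = 4)
res <- c(2, 2, 3, 3)

sol <- solve(mat, res)

# mat %*% sol is a 4x1 matrix; as.vector() drops the dim for comparison
print(mat %*% sol)
print(isTRUE(all.equal(as.vector(mat %*% sol), res)))   # TRUE
```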

Row and column names

One final handy thing is that the rows and/or columns of your matrix can have names. For example, we might set the columns of x to have the names of colors, and the rows to be people's names:

```
> dimnames(x) <- list(c("Bob", "Dave", "Mary", "Sandy"), c("Blue", "White", "Red"))
```
Note that dimnames() expects a list (see lists). The first item on the list is the vector of row names, and the second is the vector of column names. Either (or both) can be omitted by putting the reserved word NULL in place of the vector. Now what does x look like?
```
> x
      Blue White Red
Bob      1     2   3
Dave     4     5   6
Mary     7     8   9
Sandy   10    11  12   # Same contents; there are just row and column names now.
```
We can now extract by name rather than by number. This is handy because positions can change, for example if we delete some rows or columns.
```
> x["Bob", "Blue"]      # The top-left element
[1] 1
> x[, "Red"]            # The "Red" column. Note the result is a vector with names.
  Bob  Dave  Mary Sandy
    3     6     9    12
```
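Names can also be supplied when the matrix is created, through matrix()'s dimnames argument, and rownames() and colnames() read them back. This sketch also shows name-based extraction surviving a deleted row; x2 is an illustrative name.

```r
x <- matrix(1:12, nrow = 4, byrow = TRUE,
            dimnames = list(c("Bob", "Dave", "Mary", "Sandy"),
                            c("Blue", "White", "Red")))

print(rownames(x))   # "Bob" "Dave" "Mary" "Sandy"
print(colnames(x))   # "Blue" "White" "Red"

# Names and numbers can be mixed within one subscript
print(x["Mary", 2])  # 8
print(x[c("Bob", "Sandy"), c("Blue", "Red")])

# Drop Dave: Mary moves from row 3 to row 2, but her name still works
x2 <- x[-2, ]
print(x2["Mary", "White"])  # 8
```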

## Data Frames

A data frame combines features of matrices and lists. In fact, we can think of a data frame as a rectangular list, that is, a list in which all items have the same length. The items of the list serve as the columns of the data frame, so every item within a particular column has to be of the same type. However, different columns can be of different types. For example, consider the built-in data frame called PlantGrowth:
```
> PlantGrowth
   weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
:     :    :
30   5.26  trt2
```
This looks rectangular, but it's not a matrix, since the second column (a factor) isn't numeric like the first. The names of the list are the column headers: every data frame must have column names. (In contrast, a matrix doesn't have to have names.) A data frame must also have row names, although often, as here, they're just ascending integers. Since a data frame is a list, you can get at the column names with the names() function. Since it also behaves like a matrix, you can get at them with the dimnames() function we used above, which returns the row names and the column names together.
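Here is a quick way to see both views of the same data frame, using PlantGrowth from R's built-in datasets package:

```r
# List view: the column names are the names of the list items
print(names(PlantGrowth))          # "weight" "group"

# Matrix-style view: dimensions and dimnames
print(colnames(PlantGrowth))       # "weight" "group"
print(dim(PlantGrowth))            # 30 2
str(dimnames(PlantGrowth))         # row names first, then column names

# Each column has a single type, but the columns differ
print(sapply(PlantGrowth, class))  # weight: "numeric", group: "factor"
```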

In general (as here), the columns of a data frame hold different kinds of data (numbers, characters, factors, and so on), so a single row mixes types. So, in contrast to the matrix case, if you extract a single row from a data frame you get a data frame:

```
> PlantGrowth[3, ]
  weight group
3   5.18  ctrl   # Not a vector but a 1x2 data frame; note the row name is "3", not "1".
```
There are some things you just can't do to a data frame. For example, you can't really transpose it, because then the columns would mix different types of things. When you try, R does all it can: it converts everything to character, and then does the transposition:
```
> t(PlantGrowth)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   ...
weight "4.17" "5.58" "5.18" "6.11" "4.50" "4.61" ...
group  "ctrl" "ctrl" "ctrl" "ctrl" "ctrl" "ctrl" ...
```
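The conversion to character is caused by the mixed column types. R's t() method for data frames first converts the data frame with as.matrix(), so if only numeric columns are kept, the result stays numeric. Here is a sketch; m and w are illustrative names.

```r
# as.matrix() on the whole data frame: the factor column forces character
m <- as.matrix(PlantGrowth)
print(typeof(m))   # "character"

# Keep only the numeric column (drop = FALSE keeps it a data frame)
w <- t(PlantGrowth[, "weight", drop = FALSE])
print(typeof(w))   # "double"
print(dim(w))      # 1 30
```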

Data frames are handy because real-life data frequently comes in this form. It's very often rectangular, with each row representing one case (observation) and each column representing one variable recorded for those cases. Since a data frame is a list that also behaves like a matrix, we can use either matrix-type extraction or list-type extraction. For example, all four of these produce the same result:

```
> PlantGrowth[, 1]             # (Matrix type) Give me column 1
[1] 4.17 5.58 5.18 6.11 ...
> PlantGrowth[[1]]             # (List type) Give me item 1
[1] 4.17 5.58 5.18 6.11 ...
> PlantGrowth[, "weight"]      # (Matrix type) Give me the column named "weight"
[1] 4.17 5.58 5.18 6.11 ...
> PlantGrowth$weight           # (List type) Give me the item named "weight"
[1] 4.17 5.58 5.18 6.11 ...
```
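With `$`, you only have to give enough of the name to make it unambiguous. Here `PlantGrowth$w` would be enough to get the information you wanted. Of course, if there were also a column named weather, you'd have to give at least `wei`. Be aware that current versions of R print a warning when `$` partially matches a data frame column, and `[[` matches only exact names by default. Spelling the name out in full is the safer habit.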
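The list-style forms differ in one respect. Single brackets with a name return a one-column data frame, while `[[` and `$` return the column itself. Matrix-style row selection with a logical condition also works just as it did for matrices. Here is a sketch; trt1 is an illustrative name.

```r
# Single brackets keep the data-frame wrapper...
print(class(PlantGrowth["weight"]))    # "data.frame"
# ...while [[ ]] and $ extract the column vector
print(class(PlantGrowth[["weight"]]))  # "numeric"

# Logical row selection: one TRUE/FALSE per row, all columns kept
trt1 <- PlantGrowth[PlantGrowth$group == "trt1", ]
print(nrow(trt1))                      # 10
print(mean(trt1$weight))
```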

## Higher-Dimensional Arrays

Data frames must be two-dimensional (rows and columns). Occasionally, though, we run into a three- or higher-dimensional array. Normally this would be the output from the table() function. An array like that requires one subscript for every dimension. Here's a slightly odd example of a three-dimensional array:
```
> mytable <- table(PlantGrowth$weight > 4, PlantGrowth$weight > 5, PlantGrowth$group)
> mytable
, ,  = ctrl

      FALSE TRUE
FALSE     0    0
TRUE      4    6

, ,  = trt1

      FALSE TRUE
FALSE     2    0
TRUE      6    2

, ,  = trt2

      FALSE TRUE
FALSE     0    0
TRUE      1    9
```
Since weight > 4 was the first argument to the table function, it appears in the rows. weight > 5 appears in the columns, and group appears in the "layers." We need three subscripts to extract things from this array. Here's how we get only the first layer.
```
> mytable[, , "ctrl"]                       # Drops the extra dimension, returns a 2x2 matrix

      FALSE TRUE
FALSE     0    0
TRUE      4    6

> mytable[, , "ctrl", drop = FALSE]         # Doesn't drop: returns a 2x2x1 array.

, ,  = ctrl

      FALSE TRUE
FALSE     0    0
TRUE      4    6

> mytable["TRUE", , "ctrl"]                 # Drops two dimensions, returns a vector of length 2
FALSE  TRUE
    4     6

> mytable["TRUE", , "ctrl", drop = FALSE]   # Returns a 1x2x1 array.
, ,  = ctrl

     FALSE TRUE
TRUE     4    6
```
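Arrays don't have to come from table(). array() builds one from a vector and a vector of dimensions, varying the first subscript fastest, just as matrix() fills columns first. This sketch also uses apply() to total the table above over its third dimension; the table is rebuilt here so the snippet runs alone. The name a is illustrative.

```r
# A 2x3x4 array: four "layers", each a 2x3 matrix
a <- array(1:24, dim = c(2, 3, 4))
print(dim(a))        # 2 3 4
print(a[2, 3, 1])    # 6: position (3 - 1) * 2 + 2 of the data
print(a[, , 2])      # the second layer: 7 to 12, filled by column

# apply() with MARGIN = 3 sums each layer of the table
mytable <- table(PlantGrowth$weight > 4, PlantGrowth$weight > 5, PlantGrowth$group)
print(apply(mytable, 3, sum))   # 10 plants in each group
```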