
## Types of Data

R objects come in many forms. The word object refers to just about anything in R: data, functions, expressions, and so on. I'll use data to refer to the information we're trying to process. Data may be made up of values of only one type (for example, it might be all numeric) or it might be a mixture. For example, you might have a data frame that describes people by giving their name and their age: a character string and a number.

At the lowest level of R data is the idea of a vector. A vector is one or more items of the same type, or mode: all numeric, all character, all logical, all factor (factor items are actually stored as integer codes, so mode() reports them as numeric: see factors), or (these don't come up much) complex and NULL.

A vector is said to be atomic because its entries are all of one mode. A matrix is also atomic, as is a higher-dimensional array.

There are some other atomic modes, but they generally won't apply to what we think of as data. Often our data will be in the form of data frames or lists. These are not atomic, since their components can have different modes, and they have mode "list". The mode() function tells you what mode an object has.
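Here's a quick way to check the modes described above at the console. The variable names are only for illustration; the comments show what I'd expect mode() and is.atomic() to report in a current version of R.

```r
# Atomic vectors: every element shares one mode
nums  <- c(1.5, 2, 3)
chars <- c("Ann", "Bob")
flags <- c(TRUE, FALSE)
mode(nums)     # "numeric"
mode(chars)    # "character"
mode(flags)    # "logical"
mode(1 + 2i)   # "complex"
mode(NULL)     # "NULL"

# A factor reports mode "numeric" because it is stored as integer codes
f <- factor(c("lo", "hi", "lo"))
mode(f)        # "numeric"

# Matrices (and arrays) are atomic; lists and data frames are not
m <- matrix(1:6, nrow = 2)
is.atomic(m)   # TRUE
people <- data.frame(name = chars, age = c(31, 45))
mode(people)       # "list"
is.atomic(people)  # FALSE
```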

#### Types of Numeric Data

Numeric data has mode "numeric", but underneath it is stored as one of two types: integer or double (double precision). The typeof() or storage.mode() function will tell you which. Most numbers in R are stored as doubles. R has no single-precision (called float in C) storage of its own: as.single() simply returns a double vector flagged with a "Csingle" attribute, so it won't save memory even in a really big matrix. The only place single precision really matters in R is in calling C and Fortran routines (through .C() or .Fortran()) that expect floats.

Integers are converted into doubles as needed, so you rarely need to worry about whether something that looks like an integer is actually stored as an integer, unless, again, you want to pass that value to an external routine.
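To see the integer/double distinction for yourself, compare typeof() with mode(). This is a small sketch using only base R; x, y, and s are illustrative names.

```r
x <- 5          # a plain numeric literal is stored as a double
y <- 5L         # the L suffix asks for an integer
typeof(x)       # "double"
typeof(y)       # "integer"
mode(y)         # "numeric": mode() doesn't distinguish the two
typeof(1:3)     # "integer": the colon operator gives integers here
typeof(y + 0.5) # "double": the integer is promoted when mixed with a double
x == y          # TRUE: same value...
identical(x, y) # FALSE: ...but different storage types

# as.single() doesn't create a new storage type; it flags a double vector
s <- as.single(c(1, 2))
typeof(s)           # "double"
attr(s, "Csingle")  # TRUE
```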

#### Converting from one mode to another

R will try to convert from one mode to another as context demands. Remember that every element of a vector must have the same mode. As a result, if you insert a character value into a numeric vector, the whole vector is converted to character. These automatic conversions always go from a more specific mode to a more general one: logical to integer to double, and any of those to character.
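A few console checks illustrate this automatic coercion. The vector v is just an example; the comments give the results I'd expect from base R.

```r
v <- c(1, 2, 3)
v[2] <- "b"           # assign a character value into a numeric vector
v                     # "1" "b" "3": the whole vector is now character
typeof(c(TRUE, 2L))   # "integer": logical promoted to integer
typeof(c(TRUE, 2.5))  # "double": logical promoted to double
typeof(c(2L, 2.5))    # "double": integer promoted to double
typeof(c(2.5, "a"))   # "character": everything promoted to character
```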

To force a conversion from one mode to another, use one of the as.* functions: as.logical(), as.numeric(), as.character(), and so on. These functions exist for non-atomic modes as well (as.list(), for example), and there's also a general as() function, from the methods package, that takes the destination mode as an argument: as("3.5", "numeric"), for example, converts the character string "3.5" into the number 3.5.
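Here are those explicit conversions side by side. I call as() as methods::as() so the snippet also works if the methods package isn't attached (for example, in some scripted sessions).

```r
as.numeric("3.5")              # 3.5
as.character(42)               # "42"
as.logical(c(0, 1, 2))         # FALSE TRUE TRUE
as.list(1:3)                   # non-atomic target: a list of three elements
methods::as("3.5", "numeric")  # 3.5, via the general as() function
```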

The conversion rules include:

• When converting numeric to logical, 0 turns into FALSE and anything else turns into TRUE. Warning: a double-precision result that ought to be zero isn't always exactly zero. To re-use an example from earlier, consider this:
```r
> as.logical(1 - (1/47:50 * 47:50))   # These should all be zero, so all FALSE
[1] FALSE FALSE  TRUE FALSE           # The third one isn't: why not?
> 1 - (1/49)*49                       # Because in floating point...
[1] 1.110223e-16                      # ...this number isn't exactly zero.
```
• When converting character to logical, "T", "TRUE", "true", and "True" become TRUE; "F", "FALSE", "false", and "False" become FALSE; everything else (including lower-case "t" and "f") turns into NA.
• When converting character to numeric, strings that cannot be parsed turn into NA, with a warning. Scientific notation like "123e34" (with upper-case or lower-case "e") is understood.
• as.integer() truncates a floating-point number toward zero, dropping its fractional part, so as.integer(-2.9) is -2. This has the same effect as trunc(), except that trunc() returns a double. For other rounding rules, use round(), floor(), or ceiling().
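The rules in the list above can be checked directly. The last two lines revisit the floating-point example with a tolerance; the 1e-9 cutoff is an arbitrary choice for illustration, not a value R prescribes.

```r
as.logical(c("T", "TRUE", "true", "True"))     # TRUE TRUE TRUE TRUE
as.logical(c("F", "FALSE", "false", "False"))  # FALSE FALSE FALSE FALSE
as.logical(c("t", "yes", "1"))                 # NA NA NA
as.numeric(c("123e34", "1.5E-3"))              # 1.23e36 and 0.0015
suppressWarnings(as.numeric("abc"))            # NA (R normally warns "NAs introduced by coercion")
as.integer(c(2.9, -2.9))                       # 2 -2: truncation toward zero
c(trunc(-2.9), floor(-2.9), ceiling(-2.9), round(-2.9))  # -2 -3 -2 -3

# Compare against a tolerance instead of relying on exact zeros
z <- 1 - (1/47:50 * 47:50)
as.logical(z)    # FALSE FALSE TRUE FALSE
abs(z) > 1e-9    # FALSE FALSE FALSE FALSE (assumed tolerance of 1e-9)
```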

You can also change a mode directly with a command like mode(a) <- "integer". This has the advantage of preserving the object's other attributes, such as its names and dimensions. If the item is a matrix, this keeps it a matrix, whereas as.integer() turns it into a plain vector and drops the names.
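To see the difference, here's a small matrix with row and column names converted both ways. The object names (m, a, b, x) are illustrative.

```r
m <- matrix(c(1.7, -2.3, 3.9, 4.0), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

a <- m
mode(a) <- "integer"   # values truncated; dim and dimnames kept
a                      # still a 2 x 2 matrix with the same row/column names

b <- as.integer(m)     # attributes dropped
b                      # plain integer vector: 1 -2 3 4

# The same applies to names on an ordinary vector
x <- c(first = 1.5, second = 2.5)
mode(x) <- "integer"
names(x)                            # "first" "second"
names(as.integer(c(first = 1.5)))   # NULL
```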