A "string" is a sequence of characters that makes up one element of a character vector. You can usually recognize a string because it is printed enclosed in double quotation marks. Likewise, you can construct a string by enclosing some characters in quotation marks. You may use either single quotes (the unshifted character on the double-quote key, next to the Enter key on most keyboards) or double quotes, which provides a natural way to include one kind of quotation character inside a string delimited by the other:
> "This is a string"
[1] "This is a string"
> "A string won't omit quotes" # Single quote needn't be paired
[1] "A string won't omit quotes"
> 'This string has "double quotes"'
[1] "This string has \"double quotes\""

Observe in the last example that R prints the embedded double quotes by preceding them with the backslash character. That character is not part of the string; it's only visible when you print out the string.
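The difference between the printed form and the actual contents can be checked directly. The following is a minimal sketch, assuming a current version of R; the variable name s is only illustrative.

```r
# Build the string with single quotes so it can hold double quotes.
s <- 'This string has "double quotes"'

print(s)       # printed form: the embedded quotes are shown with backslashes
writeLines(s)  # raw contents: the quotes appear without backslashes

# The same string can be written with escaped double quotes instead.
same <- identical(s, "This string has \"double quotes\"")
same
```

Both ways of typing the string produce the same value; the backslashes belong to the notation, not to the data.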
In addition to the double quote character, several other special characters can appear in strings, each written with a leading backslash. The most commonly used are "\t" for TAB, "\n" for new-line, and "\\" for a (single) backslash character. The nchar() function tells you how many characters are in a string; it shows that the "escaping" backslash character (that is, the one that precedes the "t" in the tab character, for example) doesn't count as a character.
> "Tab\t"
[1] "Tab\t"
> cat ("Tab\t") # print it formatted to the screen
Tab >
> nchar ("Tab\t") # This string has 4, not 5, characters
[1] 4
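The same count can be made for the other escape sequences mentioned above. This is a small sketch, assuming current R; the vector name escapes and its element names are made up for illustration.

```r
# Three strings, each containing exactly one escape sequence.
escapes <- c(tab = "Tab\t", newline = "a\nb", backslash = "\\")

# Each escape sequence counts as a single character.
counts <- nchar(escapes)
counts

# cat() writes the characters themselves, so the new-line really breaks the line.
cat(escapes["newline"], "\n", sep = "")
```

In particular, the string typed as "\\" has length one: it holds a single backslash.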
> str <- letters[1:5]
> str
[1] "a" "b" "c" "d" "e"
> mean (str)
Warning messages:
  Warning in as.double(x): 5 missing values generated coercing from character to numeric
[1] NA

Arithmetic on a character vector fails in this way, but you can use the usual extraction and assignment operators:
> str[3:4] <- c("Yes", "No")
> str
[1] "a"   "b"   "Yes" "No"  "e"
> length(str)
[1] 5
> str <- c("1", "2", "Yes", "No", "5")
> as.numeric(str)
Warning messages:
  Warning in as.double(x): 2 missing values generated coercing from character to numeric
[1]  1  2 NA NA  5
> as.character (1:3) # conversely...
[1] "1" "2" "3"

If you really don't want warnings when converting to numeric, you can turn them off with the options() command. Make sure you know what you're doing, though.
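As a hedged illustration of silencing the conversion warning, the sketch below shows two approaches in current R: options(warn = -1), restored afterwards, and suppressWarnings(), which is not mentioned in the text above but limits the effect to a single expression. The variable names are illustrative only.

```r
x <- c("1", "2", "Yes", "No", "5")

# Approach 1: turn warnings off globally, then restore the old setting.
old <- options(warn = -1)
nums.global <- as.numeric(x)
options(old)

# Approach 2: silence warnings for this one expression only.
nums <- suppressWarnings(as.numeric(x))

# Either way, the NAs mark the entries that could not be converted.
bad <- x[is.na(nums)]
bad
```

The second form is usually the safer choice, since warnings from unrelated code remain visible.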
> paste (c("a", "b", "c"), 1:5)
[1] "a 1" "b 2" "c 3" "a 4" "b 5"

Here the first argument is used up after three items, so the system returns to its first element for the last two items of the second argument. By default the separator is one space. You can specify a different separator with the sep= argument; one important choice is the empty string, "":
> paste (c("a", "b", "c"), 1:5, sep="")
[1] "a1" "b2" "c3" "a4" "b5"

Here's an example of combining some numbers and some percentages. This is pretty close already:
> paste (1:3, c(10, 20, 30), sep=" which is ")
[1] "1 which is 10" "2 which is 20" "3 which is 30"
> paste (1:3, c(10, 20, 30), "%", sep=" which is ") # Does this work?
[1] "1 which is 10 which is %" "2 which is 20 which is %"
[3] "3 which is 30 which is %"

Not quite. The "%" string was replicated to length 3 to match the other strings; then the "which is" separator was added before the "%". However, this works:
> paste (paste (1:3, c(10, 20, 30), sep=" which is "), "%")
[1] "1 which is 10 %" "2 which is 20 %" "3 which is 30 %"
> paste (paste (1:3, c(10, 20, 30), sep=" (which is "), "%)") # neater
[1] "1 (which is 10 %)" "2 (which is 20 %)" "3 (which is 30 %)"
> hold.this <- .Last.value # save that

If you want to combine your vector of strings into one long string, use the collapse= argument. Whatever you put in there will be inserted between the strings, and then everything is crunched down into one long string. Often you won't want anything in that collapse argument:
> paste (hold.this, "\n", collapse="")
[1] "1 (which is 10 %) \n2 (which is 20 %) \n3 (which is 30 %) \n"
> cat (paste (hold.this, "\n", collapse=""))
1 (which is 10 %)
2 (which is 20 %)
3 (which is 30 %)
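The nested paste() calls above can also be avoided. The sketch below is one possible alternative, not the approach used in the text: it uses sprintf(), which is vectorized in current R, and then collapse= to join the results. Note that it deliberately omits the space before the percent sign; the variable names are illustrative.

```r
n <- 1:3
pct <- c(10, 20, 30)

# "%%" in the format produces a literal percent sign.
labels <- sprintf("%d (which is %g%%)", n, pct)
labels

# Put the new-line in collapse= so it goes between the strings only.
one.string <- paste(labels, collapse = "\n")
cat(one.string, "\n", sep = "")
```

Using collapse="\n" rather than pasting "\n" onto each element avoids both the stray space before each new-line and the trailing new-line at the end.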
> strsplit ("Nospaces", "s")
[[1]]
[1] "No"   "pace"

Observe that the s's, including the last one, are removed. It's rare that you want the list in this form. Typically you'll want to use unlist() together with strsplit():

> unlist (strsplit ("Nospaces", "s"))
[1] "No"   "pace"

This allows quick manipulation of certain weird strings by splitting them and then pasting the pieces together with a different separator character. Consider this function, for example:
convert.delimiter <- function (string, old="_", new = ".") {
    # convert string delimited by "_" into strings delimited by "."
    paste (unlist (strsplit (string, old)), collapse=new)
}
> convert.delimiter ("a_thing_with_delimiters")
[1] "a.thing.with.delimiters"
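One caveat about the function above: strsplit() treats its split argument as a regular expression by default, so an old delimiter such as "." would match every character. The following is a hypothetical variant, not part of the original text, that uses gsub() with fixed = TRUE so the delimiter is taken literally; the name convert.delimiter.fixed is made up for this sketch.

```r
convert.delimiter.fixed <- function (string, old = "_", new = ".") {
    # fixed = TRUE makes gsub() treat `old` as plain text, not a regex.
    # gsub() is vectorized, so `string` may hold several strings.
    gsub(old, new, string, fixed = TRUE)
}

# Converting in the other direction now works as expected.
back <- convert.delimiter.fixed("a.thing.with.dots", old = ".", new = "_")
back
```

This version also leaves each element of a character vector separate, whereas the split-and-paste version collapses everything into one string.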
> st <- dimnames(state.x77)[[1]] # State names, from built-in dataset
> st[1:5]
[1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
> st <- st[1:5] # Let's just use these five for now
> substring (st, 1, 3) # Give me the first three characters from each
[1] "Ala" "Ala" "Ari" "Ark" "Cal"
> substring (st, 1:5, 3:7) # Give me 1:3 from the first, 2-4 from the second...
[1] "Ala" "las" "izo" "ans" "for"
> substring (st, nchar(st) - 2, nchar(st)) # Give me the last three
[1] "ama" "ska" "ona" "sas" "nia"

In that last example, we used nchar() to return a vector of lengths. The final three characters in each name are at positions nchar(st) - 2, nchar(st) - 1, and nchar(st), so we can extract the final three characters of each element in a vectorized fashion.
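The last-three-characters idiom generalizes to any number of trailing characters. Here is a minimal sketch, assuming current R; the helper name last.chars and its argument n are hypothetical and not part of the original text.

```r
last.chars <- function (x, n = 3) {
    # Vector of string lengths, one per element of x.
    len <- nchar(x)
    # Start n - 1 positions before the end, but never before position 1.
    substring(x, pmax(len - n + 1, 1), len)
}

ends <- last.chars(c("Alabama", "Alaska", "Ok"))
ends
```

Strings shorter than n are returned whole, since the starting position is clamped at 1.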