Subsetting allows us to select specific elements within a **vector** of items.

We can select elements by position (or **index**) in the vector. Note that R indexes by position (like Matlab) and not by offset (like Python and JavaScript), so the first element is `1` (not `0`).

- `x[1]` Select the first element
- `x[4]` Select the fourth element
- `x[-4]` Select everything **but** the fourth element
- `x[2:5]` Select elements two to five
- `x[-(2:5)]` Select everything **but** elements two to five
- `x[c(1, 3)]` Select elements one and three

**Trying it out!**

Let us see how some of these operations will look in R. First, we’ll create a vector with some values in it, and then we’ll perform some operations on it.

`x <- c(1, 9, 1, 18, 4, 1, 3, 8, 9, 13) # create a vector filled with random integers`

`x # print out the values to terminal`

` [1] 1 9 1 18 4 1 3 8 9 13`

`x[4] # select the fourth element`

`[1] 18`

`x[-4] # select everything but the fourth element`

`[1] 1 9 1 4 1 3 8 9 13`

`x[2:5] # select elements two to five`

`[1] 9 1 18 4`

`x[-(2:5)] # select everything but elements two to five`

`[1] 1 1 3 8 9 13`

`x[c(1, 3)] # select elements one and three`

`[1] 1 1`
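
As a small sketch of how negative indexes relate to positive ones, dropping positions two to five should give the same result as keeping every other position. The base R helpers `seq_along()` and `setdiff()` are not used elsewhere in this section, and the variable names are only illustrative.

```r
x <- c(1, 9, 1, 18, 4, 1, 3, 8, 9, 13)

# every position in x: 1, 2, ..., 10
all_positions <- seq_along(x)

# the positions that are NOT two to five
keep <- setdiff(all_positions, 2:5)

dropped <- x[-(2:5)]  # drop positions with a negative index
kept <- x[keep]       # keep the remaining positions with a positive index

identical(dropped, kept)  # both approaches select the same elements
```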

To select elements that have a specific value, we’ll first ask R to tell us which items match. To do this we’ll use one of the **logical operators**. Some examples of **logical operators** are:

- `==` for equal to
- `!=` for **not** equal to
- `>` for greater than
- `<` for less than
- `<=` for less than or equal to
- `>=` for greater than or equal to

To use these logical operators we need our vector, the operator, and the value to compare it to.

For example, `x >= 1` means “which elements in `x` are greater than or equal to 1?”

When we perform a logical operation on a vector like this, R will tell us which elements match the logical rule and which don’t. It’ll print out `TRUE` for those that match and `FALSE` for those that don’t.

**Trying it out!**

Let us see what some of these operations will look like in R. We’ll again create a vector with some values in it, and then we’ll perform some operations on it.

```
x <- c(2, 8, 11, 10, -1) # create a vector with some numbers
x # print out the values to terminal
```

`[1] 2 8 11 10 -1`

`x >= 1 # which elements are greater than or equal to 1`

`[1] TRUE TRUE TRUE TRUE FALSE`

`x == 2 # which elements are equal to 2`

`[1] TRUE FALSE FALSE FALSE FALSE`

`x != 2 # which elements are NOT equal to 2`

`[1] FALSE TRUE TRUE TRUE TRUE`
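
Because `TRUE` counts as 1 and `FALSE` as 0 in arithmetic, the result of a logical test can also be summarised. The following is a minimal sketch using the base R functions `sum()` and `mean()`; the variable names are only illustrative.

```r
x <- c(2, 8, 11, 10, -1)

matches <- x >= 1            # TRUE TRUE TRUE TRUE FALSE

n_matches <- sum(matches)    # how many elements match the rule
prop_matches <- mean(matches) # what proportion of elements match the rule

n_matches
prop_matches
```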

We can also use logical operators to find which elements in a vector are a **member of a set**.

- `%in%` for *is a member of a set*

To use `%in%` we’ll also need another vector (our comparison set).

**Trying it out!**

`x # print out x just in case we forgot what was in it!`

`[1] 2 8 11 10 -1`

`x %in% c(1, 8, -1) # which elements are members of the set {1, 8, -1}`

`[1] FALSE TRUE FALSE FALSE TRUE`
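
One detail worth seeing in a small sketch: `%in%` returns one result for each element on its left-hand side, so swapping the two vectors asks a different question. The name `comparison_set` is just an illustrative choice.

```r
x <- c(2, 8, 11, 10, -1)
comparison_set <- c(1, 8, -1)

# one result per element of x: is each element of x in the set?
x_in_set <- x %in% comparison_set

# one result per element of the set: is each set member found in x?
set_in_x <- comparison_set %in% x

x_in_set
set_in_x
```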

Finally, the function `is.na()` can be used to test whether an element is a **missing value**. `is.na()` is used, for example, when checking your data for missing values that you might want to impute.

**Trying it out!**

```
x <- c(1,2,3,NA) # create a vector with a missing value
x
```

`[1] 1 2 3 NA`

`is.na(x) # check which values are missing`

`[1] FALSE FALSE FALSE TRUE`

One thing to note about **missing values** is that they can’t be tested using regular logical operations like `==`, `>`, and the like. If you test a value that is `NA` using these operations, the result will be `NA`, not `FALSE`.

```
x <- c(1, 2, 4, NA)
x == 4
```

`[1] FALSE FALSE TRUE NA`
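
The same behaviour can be seen with a single missing value. This short sketch (the variable names are illustrative) shows that even comparing `NA` to `NA` gives `NA`, which is why `is.na()` is the test to use.

```r
missing_value <- NA

equal_test <- missing_value == NA    # NA, not TRUE
greater_test <- missing_value > 1    # NA, not FALSE
proper_test <- is.na(missing_value)  # TRUE

equal_test
greater_test
proper_test
```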

In addition to using logical operations to check **numeric** values, we can also use them to test the values of **strings**. This works in just the same way as it does for numeric values, except for the tests `<`, `<=`, `>`, and `>=`: applied to strings, these compare alphabetical (sort) order rather than numeric size, so they are rarely what you want.

```
x <- "dog"
x == "cat"
```

`[1] FALSE`

`x == "dog"`

`[1] TRUE`

`x != "cat"`

`[1] TRUE`

`x %in% c("dog", "cat", "rabbit")`

`[1] TRUE`
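
The examples above test a single string, but the same tests work element by element on a vector of strings. A minimal sketch, using a made-up vector called `pets`:

```r
pets <- c("dog", "cat", "rabbit", "dog")

is_dog <- pets == "dog"                     # TRUE FALSE FALSE TRUE
not_cat <- pets != "cat"                    # TRUE FALSE TRUE TRUE
cat_or_rabbit <- pets %in% c("cat", "rabbit") # FALSE TRUE TRUE FALSE

is_dog
not_cat
cat_or_rabbit
```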

We can also do some more advanced logical subsetting by combining logical operations using `&` (AND) and `|` (OR). Two operations joined by `&` evaluate to `TRUE` if **all conditions** are true, and to `FALSE` if **any condition** is false. Two operations joined by `|` evaluate to `TRUE` if **either condition** is true, and to `FALSE` if **all conditions** are false.

- `x > 5 & x < 10` will evaluate to `TRUE` only if `x` is a number between 5 and 10 (not including 5 and 10)
- `x > 5 | x < 10` will evaluate to `TRUE` for **any** number

**Trying it out!**

```
x <- 6
x > 5 & x < 10
```

`[1] TRUE`

```
x <- 11
x > 5 & x < 10
```

`[1] FALSE`

```
x <- 6
x > 5 | x < 10
```

`[1] TRUE`

```
x <- 11
x > 5 | x < 10
```

`[1] TRUE`

In the examples above, `x` only contains a single number (it is a one-element vector). When combining logical operators and testing vectors with multiple elements, the tests are evaluated **element by element**, just as you would expect.

```
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x > 5 & x < 10
```

` [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE`

```
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x > 5 | x < 10
```

` [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE`

If we have a vector of logical results, we can use the `any()` and `all()` functions to test whether **any are true** or **all are true**.

`any(x > 5 & x < 10)`

`[1] TRUE`

`any(x > 5 | x < 10)`

`[1] TRUE`

`all(x > 5 & x < 10)`

`[1] FALSE`

`all(x > 5 | x < 10)`

`[1] TRUE`
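
If the vector contains a missing value, `any()` and `all()` can themselves return `NA` when the answer depends on the unknown element. The sketch below shows this, along with the `na.rm = TRUE` argument of `all()`, which ignores missing results; the variable names are illustrative.

```r
x <- c(6, 7, NA)

in_range <- x > 5 & x < 10  # TRUE TRUE NA

any_match <- any(in_range)                 # TRUE: at least one element matches
all_match <- all(in_range)                 # NA: the missing element might not match
all_known <- all(in_range, na.rm = TRUE)   # TRUE: every non-missing element matches

any_match
all_match
all_known
```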

Combining logical operations with `is.na()` can be useful if your vector contains `NA`s, for example if you want to know which elements match a particular condition, but you also want elements that contain an `NA` to evaluate to `FALSE` (and not `NA`).

```
x <- c(1, 2, 3, 5, NA, 6)
x == 5 & !is.na(x) # elements equal to 5 evaluate as TRUE and NA as FALSE
```

`[1] FALSE FALSE FALSE TRUE FALSE FALSE`

Compare this to what happens when the `& !is.na(x)` is omitted.

```
x <- c(1, 2, 3, 5, NA, 6)
x == 5 # elements equal to 5 evaluate as TRUE and NA as NA
```

`[1] FALSE FALSE FALSE TRUE NA FALSE`

Once we know which elements of a vector match a logical rule, there are a couple of things we can do with this information: 1) we can ask for the element **indexes** or 2) we can ask for the element **values**.

To get the positions in a vector that match a particular rule, we just wrap our logical operation in the `which()` function.

For example, `which(x > 5)` asks “what are the positions in `x` that have a value greater than 5?”

**Trying it out!**

```
x <- c(2, 8, 11, 10, -1) # create a vector with some numbers
x # print out x
```

`[1] 2 8 11 10 -1`

`which(x == 11) # which position is equal to 11`

`[1] 3`

```
# the next line will return an empty vector because no values are greater than 15
which(x > 15) # which position is greater than 15
```

`integer(0)`

```
# we can save the output of our logical operation to a variable and then use that as input for the which() function
matches <- x == 11
which(matches)
```

`[1] 3`
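
`which()` only reports positions where the test is `TRUE`, so positions where the test is `NA` are left out. A minimal sketch, assuming a vector with one missing value (the names `positions` and `n_matches` are illustrative):

```r
x <- c(2, 8, NA, 10, -1)

matches <- x > 5              # FALSE TRUE NA TRUE FALSE
positions <- which(matches)   # only the positions that are TRUE

n_matches <- length(positions) # number of matching positions

positions
n_matches
```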

We saw that when we used a logical operator on a vector, it returned a vector of `TRUE` and `FALSE` values. We can use these `TRUE` and `FALSE` values to select elements in a vector.

- `x[x > 0]` Select all elements that have a value greater than 0
- `x[x %in% c(1, 6, 8)]` Select all elements that are a member of the set {1, 6, 8}

**Trying it out!**

Let us see how some of these operations will look in R. We’ll again create a vector with some values in it, and then we’ll perform some operations on it.

`x <- c(10, 10, 10, 11, -1) # create a vector with some numbers`

`x # print out the values to terminal`

`[1] 10 10 10 11 -1`

`x[x == 10] # all the elements with value 10`

`[1] 10 10 10`

```
matches <- x == 10
x[matches] # note that there's no '' around matches because it's a variable name
```

`[1] 10 10 10`
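
Combined conditions using `&` and `|` can be used for subsetting in exactly the same way. The following sketch reuses the earlier one-to-ten vector; the result names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# values strictly between 5 and 10
between <- x[x > 5 & x < 10]

# values outside that range, written with OR
outside <- x[x <= 5 | x >= 10]

# the same selection, written by negating the AND condition
not_between <- x[!(x > 5 & x < 10)]

between
outside
identical(outside, not_between)
```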

Be careful: if the condition evaluates to `NA` for some elements, then `NA`s will be returned when you subset the vector.

```
x <- c(1, 2, 3, 5, NA, 6)
x[x == 5]
```

`[1] 5 NA`

One way to get around this is to ask for the elements that match the condition and aren’t `NA`.

`x[x == 5 & !is.na(x)]`

`[1] 5`

Or to wrap the logic in `which()`.

`x[which(x == 5)]`

`[1] 5`
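
Logical subsetting with `is.na()` is also how missing values are typically filled in. The sketch below replaces each `NA` with the mean of the non-missing values. Mean imputation is only an illustrative choice here, and assigning into a subset with `<-` is not covered elsewhere in this section.

```r
x <- c(1, 2, 3, 5, NA, 6)

# mean of the non-missing values (na.rm = TRUE drops the NA first)
x_mean <- mean(x, na.rm = TRUE)

# copy x, then overwrite only the elements where is.na() is TRUE
x_imputed <- x
x_imputed[is.na(x_imputed)] <- x_mean

x_imputed
```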

When we have **named** vectors we can select elements using the **name** of that element.

- `x["arms"]` Select the element named **arms**
- `x[c("arms","legs")]` Select the elements named **arms** and **legs**

**Trying it out!**

```
x <- c(arms = 2, legs = 2, eyes = 2, heads = 1)
x
```

```
 arms  legs  eyes heads
    2     2     2     1
```

`x["arms"]`

```
arms
   2
```

`x[c("arms","legs")]`

```
arms legs
   2    2
```

When we subset a **named** vector, the result will also be a named vector. The fact that it’s named makes very little difference: **named** and **unnamed** vectors behave identically in almost every situation. But if you don’t want the name, then you can just double up on the `[]`.

- `x[["arms"]]` Select the element named **arms** and discard the name.

`x[["arms"]]`

`[1] 2`
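
Names also survive logical subsetting, and they can be inspected or removed for the whole vector at once. A minimal sketch using the base R functions `names()` and `unname()`, which are not used elsewhere in this section:

```r
x <- c(arms = 2, legs = 2, eyes = 2, heads = 1)

element_names <- names(x)  # the names as a character vector

singles <- x[x == 1]       # logical subsetting keeps the matching names

no_names <- unname(x)      # the same values with all names removed

element_names
singles
no_names
```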

We can also ask R to give us subsets of data tables and lists. There are a lot of similarities between data tables and lists when it comes to subsetting. There are also a few differences, but the similarities make it worth dealing with them together.

First, let’s create a list and a data table (we’ll create a data table of type `data.frame`). Like vectors, lists can have named and unnamed elements, so we’ll create a list with two named elements and one unnamed element. Data tables, in contrast, are **tabular**, and therefore each column needs a name (if you don’t set one R will set it for you, so it’s best to do it yourself!).

```
# our list
our_list <- list(el1 = c(1,2,3),
                 el2 = c("a","b","c"),
                 c("x","y"))
our_list
```

```
$el1
[1] 1 2 3
$el2
[1] "a" "b" "c"
[[3]]
[1] "x" "y"
```

```
# our data table
our_dt <- data.frame(col1 = c(1,2,3),
                     col2 = c("a","b","c"),
                     col3 = c("x","y",NA_character_))
our_dt
```

```
  col1 col2 col3
1    1    a    x
2    2    b    y
3    3    c <NA>
```

To get named elements from lists and tables there are two general approaches. The first is the same as you would use for a named vector: `[]` and the name of the element. This works for both lists and data tables. The output for a list is also a list, and the output for a data table is also a data table. However, they each contain only the selected element.

```
# extract the named element el1
our_list["el1"]
```

```
$el1
[1] 1 2 3
```

`our_dt["col1"]`

```
  col1
1    1
2    2
3    3
```

We can also ask for multiple elements:

`our_list[c("el1","el2")]`

```
$el1
[1] 1 2 3
$el2
[1] "a" "b" "c"
```

`our_dt[c("col1","col2")]`

```
  col1 col2
1    1    a
2    2    b
3    3    c
```

While using single `[]` returns an object of the same type (a list for a list and a data table for a data table), sometimes we might want to access the data **inside** the list element, or the data **inside** the column. To get the data **inside**, you simply use `[[]]`^{1}. When using `[[]]`, however, you can only select a single element, because the returned data will no longer be organised inside the list or data table.

`our_list[["el1"]]`

`[1] 1 2 3`

`our_dt[["col2"]]`

`[1] "a" "b" "c"`

In addition to getting elements using the `[]` and `[[]]` syntax, it’s also possible to get **named** elements using `$`. Using `$` is the same as using `[[]]` with the element’s name.

`our_list$el1`

`[1] 1 2 3`

`our_dt$col1`

`[1] 1 2 3`
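
One way to see the difference between `[]`, `[[]]`, and `$` is to check the type of what each returns. The sketch below recreates the list and data table from above and uses the base R function `class()`, which is not used elsewhere in this section; the result names are illustrative.

```r
our_list <- list(el1 = c(1, 2, 3),
                 el2 = c("a", "b", "c"),
                 c("x", "y"))
our_dt <- data.frame(col1 = c(1, 2, 3),
                     col2 = c("a", "b", "c"),
                     col3 = c("x", "y", NA_character_))

# single [] keeps the container type
list_single <- class(our_list["el1"])    # "list"
dt_single <- class(our_dt["col1"])       # "data.frame"

# double [[]] returns the content inside
list_double <- class(our_list[["el1"]])  # "numeric"
dt_double <- class(our_dt[["col1"]])     # "numeric"

# $ gives the same content as [[]] with the name
same_content <- identical(our_dt$col1, our_dt[["col1"]])
```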

You can also subset data tables and lists using indexes the same way that you would for a vector.

`our_dt[c(1,3)] # the first and third column`

```
  col1 col3
1    1    x
2    2    y
3    3 <NA>
```

`our_list[c(1,3)] # first and third list element`

```
$el1
[1] 1 2 3
[[2]]
[1] "x" "y"
```

`our_list[[1]] # content *inside* element 1`

`[1] 1 2 3`

`our_dt[[3]] # content *inside* column 3`

`[1] "x" "y" NA`

Note that when the content inside the element is a vector or a list, we can subset it further without having to save the intermediate result to a variable. To do this, we just add further `[]` to the end.

`our_list[[2]][3]`

`[1] "c"`

`our_dt[[2]][[3]]`

`[1] "c"`

Unlike lists, data tables are **tabular**, which means you can access specific cells inside the table. To do this, we just employ matrix notation. This is less cumbersome than using `[[]][[]]` style notation. To demonstrate this, we’ll create a new data table where each cell contains information about where in the table it’s located (its row and its column position).

```
our_dt2 <- data.frame(col1 = c("r1_c1","r2_c1","r3_c1"),
                      col2 = c("r1_c2","r2_c2","r3_c2"))
our_dt2
```

```
   col1  col2
1 r1_c1 r1_c2
2 r2_c1 r2_c2
3 r3_c1 r3_c2
```

We can now request the cell by position using `[row_num,col_num]` (apart from using `[]` instead of `()`, this is the same as it would be done in Matlab).

`our_dt2[3,2] # row 3 column 2`

`[1] "r3_c2"`
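
Matrix notation also accepts column names, and either side of the `,` can be left empty to mean "everything". The sketch below recreates `our_dt2`; the argument `stringsAsFactors = FALSE` is added as an assumption so that the columns are plain strings on older versions of R as well, and the result names are illustrative.

```r
our_dt2 <- data.frame(col1 = c("r1_c1", "r2_c1", "r3_c1"),
                      col2 = c("r1_c2", "r2_c2", "r3_c2"),
                      stringsAsFactors = FALSE)

cell <- our_dt2[3, "col2"]  # row 3 of the column named col2
row2 <- our_dt2[2, ]        # all columns of row 2 (still a data table)
first_col <- our_dt2[, 1]   # all rows of column 1 (returned as a vector)

cell
row2
first_col
```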

Finally, rows, columns, or single cells can also be accessed using logic. As with logical subsetting of vectors, we just use some logic to generate our indexes and then use the indexes to subset.

Logical subsetting of a data table is usually used for filtering rows. To see this in action, we’ll first find which cells in `col2` match a particular condition.

```
# The elements in col2 that match the condition of being equal to "r2_c2"
our_dt2$col2 == "r2_c2"
```

`[1] FALSE TRUE FALSE`

The output tells us that element 2 matches the condition. We can now use this condition inside `[]` to request row 2 of the data table. Because we want all the columns, we just don’t write anything after the `,` (in Matlab this would be done by using `:` after the `,`).

`our_dt2[our_dt2$col2 == "r2_c2",]`

```
   col1  col2
2 r2_c1 r2_c2
```
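
Row filtering can be combined with the other ideas in this section, such as joining conditions with `&`, guarding against `NA` with `is.na()`, and selecting columns by name. The following is a minimal sketch using the earlier `our_dt` table; `stringsAsFactors = FALSE` is added as an assumption for older versions of R, and the names `keep_rows` and `filtered` are illustrative.

```r
our_dt <- data.frame(col1 = c(1, 2, 3),
                     col2 = c("a", "b", "c"),
                     col3 = c("x", "y", NA_character_),
                     stringsAsFactors = FALSE)

# rows where col1 is greater than 1 AND col3 is not missing
keep_rows <- our_dt$col1 > 1 & !is.na(our_dt$col3)

# keep only the matching rows, and only the columns col1 and col3
filtered <- our_dt[keep_rows, c("col1", "col3")]

keep_rows
filtered
```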

1. If you use Matlab, you can think of the distinction between the element and the stuff inside the element as the distinction between using `()` and `{}` to access elements in a cell array.