This worksheet relies on the teachingtools package. To make sure that teachingtools is loaded, run the following code in your console.

If you need to install teachingtools then follow the instructions in the introduction.

xfun::pkg_attach2("teachingtools")

This worksheet covers the material from the subsetting session.

Subsetting vectors

Problem 3.1

The vector vector1 contains a list of numbers.

vector1
 [1]  4  2 12 15  5  9  6  2  3 14  5  8
  1. Select the 4th and the 7th elements

  2. Select the first 4 elements

Get a hint
  • Use [] to take a subset of a vector
  • Use c() to concatenate a list of values
Solution 1
vector1[c(4,7)]
[1] 15  6
Solution 2
vector1[1:4]
[1]  4  2 12 15
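
To try these solutions without loading teachingtools, the vector can be typed in by hand. The sketch below assumes the values printed above; the names picked and first_four are only illustrative and are not part of the worksheet.

```r
# Recreate vector1 from the values printed in the worksheet (typed in by hand)
vector1 <- c(4, 2, 12, 15, 5, 9, 6, 2, 3, 14, 5, 8)

# c(4, 7) builds a vector of positions; [] pulls out the elements at those positions
picked <- vector1[c(4, 7)]

# 1:4 is shorthand for c(1, 2, 3, 4)
first_four <- vector1[1:4]

print(picked)
print(first_four)
```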

Problem 3.2

The vector vector1 contains a list of numbers.

vector1
 [1]  4  2 12 15  5  9  6  2  3 14  5  8
  1. Check each element of vector1 and determine which are equal to 6.

  2. Find the index of the elements that are equal to 6

  3. Subset vector1 to keep only the elements that are equal to 6

Get a hint
  • Use == to test for equality.
  • The function which() can be used to find the index of elements that match a condition
  • Use [] to take a subset of a vector
Solution 1
vector1 == 6
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
Solution 2
which(vector1 == 6)
[1] 7
Solution 3
vector1[vector1 == 6]
[1] 6
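
The three solutions are linked: the comparison produces a logical vector, which() turns that into positions, and either can be placed inside []. The sketch below shows this with the value 2, chosen here only because it appears twice in vector1; the variable names are illustrative.

```r
# Recreate vector1 from the values printed in the worksheet
vector1 <- c(4, 2, 12, 15, 5, 9, 6, 2, 3, 14, 5, 8)

# One TRUE/FALSE per element of vector1
is_two <- vector1 == 2

# Positions at which the logical vector is TRUE
idx <- which(is_two)

# Subsetting by the logical vector or by the positions gives the same elements
by_logical <- vector1[is_two]
by_index <- vector1[idx]

print(idx)
print(by_logical)
```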

Problem 3.3

The vector vector2 contains a list of numbers, although some values are missing.

vector2
 [1] 15 13  9 NA 10  6  6 13  6 14 NA  3
  1. Find the indices of the elements of vector2 that are missing

  2. Take a subset of vector2 that contains only values between 5 and 10 (including 5 and 10). Make sure only to return numbers.

  3. Determine how many missing values vector2 contains

Get a hint
  • is.na() can be used to find missing values
  • which() will tell you which elements are TRUE
  • You can use the length() function to determine the length of a vector (i.e., how many elements it contains).
Solution 1
which(is.na(vector2))
[1]  4 11
Solution 2
vector2[which(vector2 >= 5 & vector2 <= 10)] 
[1]  9 10  6  6  6
Solution 3
length(which(is.na(vector2)))
[1] 2
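
The reason Solution 2 wraps the condition in which() is that comparing NA with a number gives NA, and subsetting with an NA returns NA. The sketch below, which assumes the values of vector2 printed above, contrasts the two forms. It also shows sum(is.na()) as another way of counting missing values, which is not the worksheet's own solution.

```r
# Recreate vector2 from the values printed in the worksheet
vector2 <- c(15, 13, 9, NA, 10, 6, 6, 13, 6, 14, NA, 3)

# The condition is NA wherever vector2 is NA
in_range <- vector2 >= 5 & vector2 <= 10

# Subsetting with the logical vector keeps an NA for every NA in the condition
with_na <- vector2[in_range]

# which() keeps only the positions that are TRUE, so the NAs are dropped
numbers_only <- vector2[which(in_range)]

# TRUE counts as 1 when summed, so this counts the missing values
n_missing <- sum(is.na(vector2))

print(with_na)
print(numbers_only)
print(n_missing)
```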

Problem 3.4

The vector months1 contains a list of months.

months1
[1] "February" "October"  "July"     "November" "May"      "May"     
  1. Take a subset of the vector so that it only contains elements with months in the set {May, June, July}.
  2. Take a subset of the vector so that it only contains elements that are NOT part of the set {May, June, July}
Get a hint
  • To test whether elements are a member of a set use the %in% operator.
  • If you have a vector of TRUE and FALSE values, you can switch them around by wrapping the vector in !()
Solution 1
months1[months1 %in% c("May","June","July")]
[1] "July" "May"  "May" 
Solution 2
months1[!(months1 %in% c("May","June","July"))]
[1] "February" "October"  "November"
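
It may help to see why %in% is used here rather than ==. The sketch below assumes the values of months1 printed above; the name wanted is illustrative. With ==, R compares the two vectors position by position and recycles the shorter one, so a month can be missed even though it is in the set.

```r
# Recreate months1 from the values printed in the worksheet
months1 <- c("February", "October", "July", "November", "May", "May")
wanted <- c("May", "June", "July")

# %in% asks, for each month, whether it appears anywhere in wanted
in_set <- months1 %in% wanted

# == compares position by position, recycling wanted along months1,
# so the "May" in position 5 is compared with "June" and is missed
pairwise <- months1 == wanted

print(in_set)
print(pairwise)
```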

Subsetting parts of data tables

Problem 3.5

The data table starwars contains information about Star Wars characters.

starwars
# A tibble: 87 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO    167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2     96    32 <NA>       white, bl… red             33   none  mascu…
 4 Dart…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia…    150    49 brown      light      brown           19   fema… femin…
 6 Owen…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4     97    32 <NA>       white, red red             NA   none  mascu…
 9 Bigg…    183    84 black      light      brown           24   male  mascu…
10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>
  1. Select the column that contains the names. The result should still be a data table.

  2. Select the values inside the species column. The result should be a vector.

Get a hint
  • For one of these problems you'll use [], and for the other you'll use [[]] or $
Solution 1
starwars['name']
# A tibble: 87 x 1
   name              
   <chr>             
 1 Luke Skywalker    
 2 C-3PO             
 3 R2-D2             
 4 Darth Vader       
 5 Leia Organa       
 6 Owen Lars         
 7 Beru Whitesun lars
 8 R5-D4             
 9 Biggs Darklighter 
10 Obi-Wan Kenobi    
# … with 77 more rows
Solution 2
starwars$species # option 1
 [1] "Human"          "Droid"          "Droid"          "Human"         
 [5] "Human"          "Human"          "Human"          "Droid"         
 [9] "Human"          "Human"          "Human"          "Human"         
[13] "Wookiee"        "Human"          "Rodian"         "Hutt"          
[17] "Human"          "Human"          "Yoda's species" "Human"         
[21] "Human"          "Droid"          "Trandoshan"     "Human"         
[25] "Human"          "Mon Calamari"   "Human"          "Human"         
[29] "Ewok"           "Sullustan"      "Human"          "Neimodian"     
[33] "Human"          "Gungan"         "Gungan"         "Gungan"        
[37] NA               "Toydarian"      "Dug"            NA              
[41] "Human"          "Zabrak"         "Twi'lek"        "Twi'lek"       
[45] "Vulptereen"     "Xexto"          "Toong"          "Human"         
[49] "Cerean"         "Nautolan"       "Zabrak"         "Tholothian"    
[53] "Iktotchi"       "Quermian"       "Kel Dor"        "Chagrian"      
[57] "Human"          "Human"          "Human"          "Geonosian"     
[61] "Mirialan"       "Mirialan"       "Human"          "Human"         
[65] "Human"          "Human"          "Clawdite"       "Besalisk"      
[69] "Kaminoan"       "Kaminoan"       "Human"          "Aleena"        
[73] "Droid"          "Skakoan"        "Muun"           "Togruta"       
[77] "Kaleesh"        "Wookiee"        "Human"          NA              
[81] "Pau'an"         "Human"          "Human"          "Human"         
[85] "Droid"          NA               "Human"         
Solution 2 (alternative approach)
starwars[['species']] # another way of doing it
 [1] "Human"          "Droid"          "Droid"          "Human"         
 [5] "Human"          "Human"          "Human"          "Droid"         
 [9] "Human"          "Human"          "Human"          "Human"         
[13] "Wookiee"        "Human"          "Rodian"         "Hutt"          
[17] "Human"          "Human"          "Yoda's species" "Human"         
[21] "Human"          "Droid"          "Trandoshan"     "Human"         
[25] "Human"          "Mon Calamari"   "Human"          "Human"         
[29] "Ewok"           "Sullustan"      "Human"          "Neimodian"     
[33] "Human"          "Gungan"         "Gungan"         "Gungan"        
[37] NA               "Toydarian"      "Dug"            NA              
[41] "Human"          "Zabrak"         "Twi'lek"        "Twi'lek"       
[45] "Vulptereen"     "Xexto"          "Toong"          "Human"         
[49] "Cerean"         "Nautolan"       "Zabrak"         "Tholothian"    
[53] "Iktotchi"       "Quermian"       "Kel Dor"        "Chagrian"      
[57] "Human"          "Human"          "Human"          "Geonosian"     
[61] "Mirialan"       "Mirialan"       "Human"          "Human"         
[65] "Human"          "Human"          "Clawdite"       "Besalisk"      
[69] "Kaminoan"       "Kaminoan"       "Human"          "Aleena"        
[73] "Droid"          "Skakoan"        "Muun"           "Togruta"       
[77] "Kaleesh"        "Wookiee"        "Human"          NA              
[81] "Pau'an"         "Human"          "Human"          "Human"         
[85] "Droid"          NA               "Human"         
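
The difference between the two kinds of brackets can be checked on a much smaller table. The sketch below uses a hand-made base R data frame called chars (a hypothetical stand-in holding the first three names and species shown above), not the starwars tibble itself.

```r
# A small hand-made stand-in for the starwars table (hypothetical name: chars)
chars <- data.frame(
  name = c("Luke Skywalker", "C-3PO", "R2-D2"),
  species = c("Human", "Droid", "Droid"),
  stringsAsFactors = FALSE
)

# Single brackets keep the table structure: the result is a one-column table
name_tbl <- chars["name"]

# Double brackets and $ reach inside the column: the result is a plain vector
species_vec <- chars[["species"]]
species_dollar <- chars$species

print(class(name_tbl))
print(class(species_vec))
```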

Problem 3.6

  1. Again using the starwars data table, take a subset so that it only contains humans (i.e., Human in the species column). Be careful, because some rows have missing values.
Get a hint
  • which(starwars$species == "Human") will tell you which rows contain humans.
  • When selecting specific rows, you should also indicate that you want all the columns
Solution
starwars[which(starwars$species == "Human"),]
# A tibble: 35 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172    77 blond      fair       blue            19   male  mascu…
 2 Dart…    202   136 none       white      yellow          41.9 male  mascu…
 3 Leia…    150    49 brown      light      brown           19   fema… femin…
 4 Owen…    178   120 brown, gr… light      blue            52   male  mascu…
 5 Beru…    165    75 brown      light      blue            47   fema… femin…
 6 Bigg…    183    84 black      light      brown           24   male  mascu…
 7 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
 8 Anak…    188    84 blond      fair       blue            41.9 male  mascu…
 9 Wilh…    180    NA auburn, g… fair       blue            64   male  mascu…
10 Han …    180    80 brown      fair       brown           29   male  mascu…
# … with 25 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>
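
The warning about missing values can be seen on a small example. The sketch below uses a hand-made base R data frame called chars, a hypothetical stand-in for starwars in which the last row (with the made-up name "Unknown") has a missing species. Without which(), the missing value in the condition produces a row made up entirely of NA.

```r
# Hand-made stand-in for starwars (hypothetical); the last row has no species
chars <- data.frame(
  name = c("Luke Skywalker", "C-3PO", "Darth Vader", "Unknown"),
  species = c("Human", "Droid", "Human", NA),
  stringsAsFactors = FALSE
)

# The condition is NA for the row whose species is missing
is_human <- chars$species == "Human"

# Subsetting with the logical vector adds an all-NA row for that NA
with_na_row <- chars[is_human, ]

# which() drops the NA, so only the rows that are really Human remain
humans_only <- chars[which(is_human), ]

print(nrow(with_na_row))
print(humans_only$name)
```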

Subsetting parts of lists

Problem 3.7

The list list1 has two elements. animals contains a vector of animal names, and cars contains a vector of car names.

list1
$animals
 [1] "Indri"           "Chamois"         "Sea Turtle"      "Chamois"        
 [5] "Coral"           "Axolotl"         "Coral"           "Tasmanian Devil"
 [9] "Sea Turtle"      "Labradoodle"    

$cars
 [1] "Merc 450SLC"         "Toyota Corona"       "Lotus Europa"       
 [4] "Pontiac Firebird"    "Toyota Corona"       "Ferrari Dino"       
 [7] "Maserati Bora"       "Datsun 710"          "Porsche 914-2"      
[10] "Camaro Z28"          "Maserati Bora"       "Lincoln Continental"
[13] "Merc 240D"           "Honda Civic"         "Hornet Sportabout"  
  1. Take a subset of list1, so that you create a new list with one element that only has car names
  2. Take a subset of list1, so that you create a vector of only animal names.
  3. Select the 3rd item in the cars element of list1
Solution 1
list1['cars']
$cars
 [1] "Merc 450SLC"         "Toyota Corona"       "Lotus Europa"       
 [4] "Pontiac Firebird"    "Toyota Corona"       "Ferrari Dino"       
 [7] "Maserati Bora"       "Datsun 710"          "Porsche 914-2"      
[10] "Camaro Z28"          "Maserati Bora"       "Lincoln Continental"
[13] "Merc 240D"           "Honda Civic"         "Hornet Sportabout"  
Solution 2
list1$animals
 [1] "Indri"           "Chamois"         "Sea Turtle"      "Chamois"        
 [5] "Coral"           "Axolotl"         "Coral"           "Tasmanian Devil"
 [9] "Sea Turtle"      "Labradoodle"    
Solution 2 (alternative approach)
list1[["animals"]]
 [1] "Indri"           "Chamois"         "Sea Turtle"      "Chamois"        
 [5] "Coral"           "Axolotl"         "Coral"           "Tasmanian Devil"
 [9] "Sea Turtle"      "Labradoodle"    
Solution 3
list1$cars[3]
[1] "Lotus Europa"
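
The same bracket rules apply to lists as to data tables. The sketch below builds a shortened version of list1 by hand (only the first three animals and cars printed above) and checks what each form of subsetting returns; the variable names are illustrative.

```r
# A shortened, hand-made version of list1 (first three values of each element)
list1 <- list(
  animals = c("Indri", "Chamois", "Sea Turtle"),
  cars = c("Merc 450SLC", "Toyota Corona", "Lotus Europa")
)

# Single brackets return a list, here with the single element cars
cars_list <- list1["cars"]

# Double brackets (or $) return the contents of the element, here a character vector
animals_vec <- list1[["animals"]]

# Once the vector is extracted, [] selects positions within it
third_car <- list1[["cars"]][3]

print(class(cars_list))
print(class(animals_vec))
print(third_car)
```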

CC-BY-NC-SA-4.0 Lincoln J Colling