This worksheet relies on the teachingtools package. To make sure that teachingtools is loaded, run the following code in your console.
If you need to install teachingtools then follow the instructions in the introduction.
xfun::pkg_attach2("teachingtools")
This worksheet covers the material from the subsetting session.
The vector vector1 contains a list of numbers.
vector1
[1] 4 2 12 15 5 9 6 2 3 14 5 8
Select the 4th and 7th elements
Select the first 4 elements
Use [] to take a subset of a vector
Use c() to concatenate a list of values
vector1[c(4,7)]
[1] 15 6
vector1[1:4]
[1] 4 2 12 15
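The two solutions above can be tried without the package. This is a minimal sketch that assumes vector1 is recreated by hand from the printed values (in the worksheet it comes from teachingtools); the names fourth_and_seventh and first_four are only illustrative.

```r
# Assumed copy of vector1, typed in from the output printed above
vector1 <- c(4, 2, 12, 15, 5, 9, 6, 2, 3, 14, 5, 8)

# 1:4 is shorthand for the index vector c(1, 2, 3, 4)
identical(1:4, c(1L, 2L, 3L, 4L))

# c() builds a vector of positions, and [] keeps the elements at those positions
fourth_and_seventh <- vector1[c(4, 7)]

# A range of positions works the same way
first_four <- vector1[1:4]
```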
The vector vector1 contains a list of numbers.
vector1
[1] 4 2 12 15 5 9 6 2 3 14 5 8
Check each element of vector1 and determine which are equal to 6.
Find the index of the elements that are equal to 6
Subset vector1 to keep only the elements that are equal to 6
Use == to test for equality.
which() can be used to find the index of elements that match a condition
Use [] to take a subset of a vector
vector1 == 6
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
which(vector1 == 6)
[1] 7
vector1[vector1 == 6]
[1] 6
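The three answers above are closely related: the logical test, the positions from which(), and the subset all describe the same elements. The sketch below assumes vector1 is retyped from the printed output, and the intermediate names (is_six, six_index, and so on) are made up for illustration.

```r
# Assumed copy of vector1, typed in from the output printed above
vector1 <- c(4, 2, 12, 15, 5, 9, 6, 2, 3, 14, 5, 8)

# One TRUE/FALSE value per element of vector1
is_six <- vector1 == 6

# Positions of the TRUE values
six_index <- which(is_six)

# Subsetting with the logical vector or with the positions keeps the same elements
by_logical <- vector1[is_six]
by_index <- vector1[six_index]
identical(by_logical, by_index)
```

When the vector has no missing values, as here, either form of subsetting can be used.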
The vector vector2 contains a list of numbers, although some values are missing.
vector2
[1] 15 13 9 NA 10 6 6 13 6 14 NA 3
Find the indices of the elements of vector2 that are missing
Take a subset of vector2 that contains only values between 5 and 10 (including 5 and 10). Make sure to return only numbers.
Determine how many missing values vector2 contains
is.na() can be used to find missing values
which() will tell you which elements are TRUE
length() can be used to determine the length of a vector (i.e., how many elements it contains).
which(is.na(vector2))
[1] 4 11
vector2[which(vector2 >= 5 & vector2 <= 10)]
[1] 9 10 6 6 6
length(which(is.na(vector2)))
[1] 2
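The reason which() appears in the second solution is that a comparison involving NA is itself NA, and subsetting with an NA returns an NA. The sketch below illustrates this, assuming vector2 is retyped from the printed values; the names in_range, with_na, without_na and n_missing are illustrative only.

```r
# Assumed copy of vector2, typed in from the output printed above
vector2 <- c(15, 13, 9, NA, 10, 6, 6, 13, 6, 14, NA, 3)

# The condition is NA wherever vector2 is NA
in_range <- vector2 >= 5 & vector2 <= 10

# Subsetting with the logical vector keeps an NA for every NA in the condition
with_na <- vector2[in_range]

# which() returns only the positions that are TRUE, so the NAs are dropped
without_na <- vector2[which(in_range)]

# An alternative way to count missing values: TRUE counts as 1 when summed
n_missing <- sum(is.na(vector2))
```

sum(is.na(vector2)) gives the same count as length(which(is.na(vector2))); which one to use is a matter of taste.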
The vector months1 contains a list of months.
months1
[1] "February" "October" "July" "November" "May" "May"
Use the %in% operator.
If you have TRUE and FALSE values, you can switch them around by wrapping them in !()
months1[months1 %in% c("May","June","July")]
[1] "July" "May" "May"
months1[!(months1 %in% c("May","June","July"))]
[1] "February" "October" "November"
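It can help to look at the intermediate logical vector that %in% produces before it is used for subsetting. This is a small sketch assuming months1 is retyped from the printed output; summer, keep, kept and dropped are hypothetical names chosen for the example.

```r
# Assumed copy of months1, typed in from the output printed above
months1 <- c("February", "October", "July", "November", "May", "May")
summer <- c("May", "June", "July")

# One TRUE/FALSE per element of months1: is it found anywhere in summer?
keep <- months1 %in% summer
# keep is FALSE FALSE TRUE FALSE TRUE TRUE

# Keep the matches
kept <- months1[keep]

# ! flips TRUE and FALSE, so this keeps everything that did not match
dropped <- months1[!keep]
```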
The data table starwars contains information about Star Wars characters.
starwars
# A tibble: 87 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
4 Dart… 202 136 none white yellow 41.9 male mascu…
5 Leia… 150 49 brown light brown 19 fema… femin…
6 Owen… 178 120 brown, gr… light blue 52 male mascu…
7 Beru… 165 75 brown light blue 47 fema… femin…
8 R5-D4 97 32 <NA> white, red red NA none mascu…
9 Bigg… 183 84 black light brown 24 male mascu…
10 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu…
# … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
Select the column that contains the names. The result should still be a data table.
Select the values inside the species column. The result should be a vector.
You'll use [] for one, and for the other you'll use [[]] / $
starwars['name']
# A tibble: 87 x 1
name
<chr>
1 Luke Skywalker
2 C-3PO
3 R2-D2
4 Darth Vader
5 Leia Organa
6 Owen Lars
7 Beru Whitesun lars
8 R5-D4
9 Biggs Darklighter
10 Obi-Wan Kenobi
# … with 77 more rows
starwars$species # option 1
[1] "Human" "Droid" "Droid" "Human"
[5] "Human" "Human" "Human" "Droid"
[9] "Human" "Human" "Human" "Human"
[13] "Wookiee" "Human" "Rodian" "Hutt"
[17] "Human" "Human" "Yoda's species" "Human"
[21] "Human" "Droid" "Trandoshan" "Human"
[25] "Human" "Mon Calamari" "Human" "Human"
[29] "Ewok" "Sullustan" "Human" "Neimodian"
[33] "Human" "Gungan" "Gungan" "Gungan"
[37] NA "Toydarian" "Dug" NA
[41] "Human" "Zabrak" "Twi'lek" "Twi'lek"
[45] "Vulptereen" "Xexto" "Toong" "Human"
[49] "Cerean" "Nautolan" "Zabrak" "Tholothian"
[53] "Iktotchi" "Quermian" "Kel Dor" "Chagrian"
[57] "Human" "Human" "Human" "Geonosian"
[61] "Mirialan" "Mirialan" "Human" "Human"
[65] "Human" "Human" "Clawdite" "Besalisk"
[69] "Kaminoan" "Kaminoan" "Human" "Aleena"
[73] "Droid" "Skakoan" "Muun" "Togruta"
[77] "Kaleesh" "Wookiee" "Human" NA
[81] "Pau'an" "Human" "Human" "Human"
[85] "Droid" NA "Human"
starwars[['species']] # another way of doing it
[1] "Human" "Droid" "Droid" "Human"
[5] "Human" "Human" "Human" "Droid"
[9] "Human" "Human" "Human" "Human"
[13] "Wookiee" "Human" "Rodian" "Hutt"
[17] "Human" "Human" "Yoda's species" "Human"
[21] "Human" "Droid" "Trandoshan" "Human"
[25] "Human" "Mon Calamari" "Human" "Human"
[29] "Ewok" "Sullustan" "Human" "Neimodian"
[33] "Human" "Gungan" "Gungan" "Gungan"
[37] NA "Toydarian" "Dug" NA
[41] "Human" "Zabrak" "Twi'lek" "Twi'lek"
[45] "Vulptereen" "Xexto" "Toong" "Human"
[49] "Cerean" "Nautolan" "Zabrak" "Tholothian"
[53] "Iktotchi" "Quermian" "Kel Dor" "Chagrian"
[57] "Human" "Human" "Human" "Geonosian"
[61] "Mirialan" "Mirialan" "Human" "Human"
[65] "Human" "Human" "Clawdite" "Besalisk"
[69] "Kaminoan" "Kaminoan" "Human" "Aleena"
[73] "Droid" "Skakoan" "Muun" "Togruta"
[77] "Kaleesh" "Wookiee" "Human" NA
[81] "Pau'an" "Human" "Human" "Human"
[85] "Droid" NA "Human"
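The difference between the two kinds of brackets can be checked directly. The sketch below uses a small base R data frame, sw_small, built from the first three rows printed above; it is a hypothetical stand-in for starwars (which is a tibble), so the printed output will look slightly different.

```r
# Hypothetical three-row stand-in for starwars, using values printed above
sw_small <- data.frame(
  name = c("Luke Skywalker", "C-3PO", "R2-D2"),
  species = c("Human", "Droid", "Droid"),
  stringsAsFactors = FALSE
)

# Single brackets keep the table structure: a data frame with one column
name_table <- sw_small["name"]
is.data.frame(name_table)

# Double brackets and $ pull out the column itself: a character vector
species_vec <- sw_small[["species"]]
is.character(species_vec)
identical(sw_small$species, species_vec)
```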
Using the starwars data table, take a subset so that it only contains humans (i.e., Human in the species column). Be careful, because some rows have missing values.
which(starwars$species == "Human")
will tell you which rows contain humans.
starwars[which(starwars$species == "Human"),]
# A tibble: 35 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 Dart… 202 136 none white yellow 41.9 male mascu…
3 Leia… 150 49 brown light brown 19 fema… femin…
4 Owen… 178 120 brown, gr… light blue 52 male mascu…
5 Beru… 165 75 brown light blue 47 fema… femin…
6 Bigg… 183 84 black light brown 24 male mascu…
7 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu…
8 Anak… 188 84 blond fair blue 41.9 male mascu…
9 Wilh… 180 NA auburn, g… fair blue 64 male mascu…
10 Han … 180 80 brown fair brown 29 male mascu…
# … with 25 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
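To see why the missing values matter here, compare row subsetting with and without which(). The sketch assumes a tiny base R data frame, sw_small, as a hypothetical stand-in for starwars; "Character X" is a made-up placeholder for a row whose species is missing.

```r
# Hypothetical stand-in for starwars with one missing species
sw_small <- data.frame(
  name = c("Luke Skywalker", "C-3PO", "Character X"),
  species = c("Human", "Droid", NA),
  stringsAsFactors = FALSE
)

# The comparison is NA where species is missing: TRUE FALSE NA
is_human <- sw_small$species == "Human"

# Subsetting rows with an NA in the condition adds a row filled with NA
with_na_row <- sw_small[is_human, ]

# which() keeps only the TRUE positions, so only the human row is returned
humans_only <- sw_small[which(is_human), ]
```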
The list list1 has two elements. animals contains a vector of animal names, and cars contains a vector of car names.
list1
$animals
[1] "Indri" "Chamois" "Sea Turtle" "Chamois"
[5] "Coral" "Axolotl" "Coral" "Tasmanian Devil"
[9] "Sea Turtle" "Labradoodle"
$cars
[1] "Merc 450SLC" "Toyota Corona" "Lotus Europa"
[4] "Pontiac Firebird" "Toyota Corona" "Ferrari Dino"
[7] "Maserati Bora" "Datsun 710" "Porsche 914-2"
[10] "Camaro Z28" "Maserati Bora" "Lincoln Continental"
[13] "Merc 240D" "Honda Civic" "Hornet Sportabout"
Subset list1 so that you create a new list with one element that only has car names
Subset list1 so that you create a vector of only animal names.
Select the third value of the cars element of list1
list1['cars']
$cars
[1] "Merc 450SLC" "Toyota Corona" "Lotus Europa"
[4] "Pontiac Firebird" "Toyota Corona" "Ferrari Dino"
[7] "Maserati Bora" "Datsun 710" "Porsche 914-2"
[10] "Camaro Z28" "Maserati Bora" "Lincoln Continental"
[13] "Merc 240D" "Honda Civic" "Hornet Sportabout"
list1$animals
[1] "Indri" "Chamois" "Sea Turtle" "Chamois"
[5] "Coral" "Axolotl" "Coral" "Tasmanian Devil"
[9] "Sea Turtle" "Labradoodle"
list1[["animals"]]
[1] "Indri" "Chamois" "Sea Turtle" "Chamois"
[5] "Coral" "Axolotl" "Coral" "Tasmanian Devil"
[9] "Sea Turtle" "Labradoodle"
list1$cars[3]
[1] "Lotus Europa"
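The same single versus double bracket distinction applies to lists, and it can be checked with is.list() and is.character(). The sketch below assumes a shortened copy of list1 holding only the first three values of each element printed above; cars_list, animals_vec and third_car are illustrative names.

```r
# Shortened, assumed copy of list1 (first three values of each element)
list1 <- list(
  animals = c("Indri", "Chamois", "Sea Turtle"),
  cars = c("Merc 450SLC", "Toyota Corona", "Lotus Europa")
)

# Single brackets return a list, here a list with one element named cars
cars_list <- list1["cars"]
is.list(cars_list)

# Double brackets (or $) return the element itself, a character vector
animals_vec <- list1[["animals"]]
is.character(animals_vec)

# Once the vector is extracted, ordinary [] indexing applies to it
third_car <- list1$cars[3]
```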