This worksheet relies on the teachingtools and tidyverse packages. To make sure that teachingtools and tidyverse are loaded, run the following code in your console.
If you need to install teachingtools then follow the instructions in the introduction.
xfun::pkg_attach2("teachingtools","tidyverse")
This worksheet covers the material from the Basic data wrangling with tidyverse session. All these questions should be answered using tidyverse style syntax.
The following questions all rely on the starwars
data table. The starwars
data table contains information about the characters in the Star Wars films.
starwars
# A tibble: 87 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
4 Dart… 202 136 none white yellow 41.9 male mascu…
5 Leia… 150 49 brown light brown 19 fema… femin…
6 Owen… 178 120 brown, gr… light blue 52 male mascu…
7 Beru… 165 75 brown light blue 47 fema… femin…
8 R5-D4 97 32 <NA> white, red red NA none mascu…
9 Bigg… 183 84 black light brown 24 male mascu…
10 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu…
# … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
Take a subset of the data table so that it is only contains characters between 100 and 200 cm (inclusive) tall (use the height
column)
filter()
function
filter(.data = starwars, height >= 100 & height <= 200)
# A tibble: 64 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
3 Leia… 150 49 brown light brown 19 fema… femin…
4 Owen… 178 120 brown, gr… light blue 52 male mascu…
5 Beru… 165 75 brown light blue 47 fema… femin…
6 Bigg… 183 84 black light brown 24 male mascu…
7 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu…
8 Anak… 188 84 blond fair blue 41.9 male mascu…
9 Wilh… 180 NA auburn, g… fair blue 64 male mascu…
10 Han … 180 80 brown fair brown 29 male mascu…
# … with 54 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
Take a subset of the data table so that it only contains characters which are human (use the species
column)
filter(.data = starwars, species == "Human")
# A tibble: 35 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 Dart… 202 136 none white yellow 41.9 male mascu…
3 Leia… 150 49 brown light brown 19 fema… femin…
4 Owen… 178 120 brown, gr… light blue 52 male mascu…
5 Beru… 165 75 brown light blue 47 fema… femin…
6 Bigg… 183 84 black light brown 24 male mascu…
7 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu…
8 Anak… 188 84 blond fair blue 41.9 male mascu…
9 Wilh… 180 NA auburn, g… fair blue 64 male mascu…
10 Han … 180 80 brown fair brown 29 male mascu…
# … with 25 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
Take a subset of the data table so that is only contains characters that have a home world of Tatooine
{or Naboo
.
%in%
to check whether a value is part of a set
filter(.data = starwars, homeworld %in% c("Tatoonie","Naboo"))
# A tibble: 11 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
2 Palp… 170 75 grey pale yellow 82 male mascu…
3 Jar … 196 66 none orange orange 52 male mascu…
4 Roos… 224 82 none grey orange NA male mascu…
5 Rugo… 206 NA none green orange NA male mascu…
6 Ric … 183 NA brown fair blue NA <NA> <NA>
7 Quar… 183 NA black dark brown 62 <NA> <NA>
8 Greg… 185 85 black dark brown NA male mascu…
9 Cordé 157 NA brown light brown NA fema… femin…
10 Dormé 165 NA brown light brown NA fema… femin…
11 Padm… 165 45 brown light brown 46 fema… femin…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
Take a subset of the data table where the output is a vector containing only the home worlds
select()
function will return a data table.
pull(.data = starwars, homeworld)
[1] "Tatooine" "Tatooine" "Naboo" "Tatooine"
[5] "Alderaan" "Tatooine" "Tatooine" "Tatooine"
[9] "Tatooine" "Stewjon" "Tatooine" "Eriadu"
[13] "Kashyyyk" "Corellia" "Rodia" "Nal Hutta"
[17] "Corellia" "Bestine IV" NA "Naboo"
[21] "Kamino" NA "Trandosha" "Socorro"
[25] "Bespin" "Mon Cala" "Chandrila" NA
[29] "Endor" "Sullust" NA "Cato Neimoidia"
[33] "Coruscant" "Naboo" "Naboo" "Naboo"
[37] "Naboo" "Toydaria" "Malastare" "Naboo"
[41] "Tatooine" "Dathomir" "Ryloth" "Ryloth"
[45] "Vulpter" "Troiken" "Tund" "Haruun Kal"
[49] "Cerea" "Glee Anselm" "Iridonia" "Coruscant"
[53] "Iktotch" "Quermia" "Dorin" "Champala"
[57] "Naboo" "Naboo" "Tatooine" "Geonosis"
[61] "Mirial" "Mirial" "Naboo" "Serenno"
[65] "Alderaan" "Concord Dawn" "Zolan" "Ojom"
[69] "Kamino" "Kamino" "Coruscant" "Aleen Minor"
[73] NA "Skako" "Muunilinst" "Shili"
[77] "Kalee" "Kashyyyk" "Alderaan" "Umbara"
[81] "Utapau" NA NA NA
[85] NA NA "Naboo"
Take a subset of the data table where the output is a data table containing only the home worlds
select(.data = starwars, homeworld)
# A tibble: 87 x 1
homeworld
<chr>
1 Tatooine
2 Tatooine
3 Naboo
4 Tatooine
5 Alderaan
6 Tatooine
7 Tatooine
8 Tatooine
9 Tatooine
10 Stewjon
# … with 77 more rows
Take a subset of the data table so that it only contains characters that have both black hair and blue eyes.
filter(.data = starwars, hair_color == "black" & eye_color == "blue")
# A tibble: 2 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Lumi… 170 56.2 black yellow blue 58 fema… femin…
2 Barr… 166 50 black yellow blue 40 fema… femin…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
Take a subset of the data table so that it only contains characters that either have black hair or have blue eyes.
filter(.data = starwars, hair_color == "black" | eye_color == "blue")
# A tibble: 30 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 Owen… 178 120 brown, gr… light blue 52 male mascu…
3 Beru… 165 75 brown light blue 47 fema… femin…
4 Bigg… 183 84 black light brown 24 male mascu…
5 Anak… 188 84 blond fair blue 41.9 male mascu…
6 Wilh… 180 NA auburn, g… fair blue 64 male mascu…
7 Chew… 228 112 brown unknown blue 200 male mascu…
8 Jek … 180 110 brown fair blue NA male mascu…
9 Boba… 183 78.2 black fair brown 31.5 male mascu…
10 Land… 177 79 black dark brown 31 male mascu…
# … with 20 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
Take a subset of the data so that it includes all the characters EXCEPT those that have both black hair and blue eyes.
!
filter(.data = starwars, !(hair_color == "black" & eye_color == "blue"))
# A tibble: 85 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke… 172 77 blond fair blue 19 male mascu…
2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
4 Dart… 202 136 none white yellow 41.9 male mascu…
5 Leia… 150 49 brown light brown 19 fema… femin…
6 Owen… 178 120 brown, gr… light blue 52 male mascu…
7 Beru… 165 75 brown light blue 47 fema… femin…
8 R5-D4 97 32 <NA> white, red red NA none mascu…
9 Bigg… 183 84 black light brown 24 male mascu…
10 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu…
# … with 75 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
The following questions all rely on beck_data
data table. The beck_data
contains data from 20 participants (identified by SubjCode
), and their scores on 3 items from the Beck depression inventory (BDI) and 3 items from the Beck Anxiety Inventory (BAI). This data was supplied to you by a collaborator that isn’t very detail oriented, so they may not have employed consistent in how they named their variables
beck_data
# A tibble: 20 x 7
SubjCode BDI1 BDI2 Bdi3 BAI1 BAx2 BAI3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 OTLCZE 1 0 1 1 3 0
2 NDMCLE 1 2 3 2 3 2
3 LWBHYZ 2 0 3 1 1 3
4 MDBONG 3 1 2 1 1 2
5 IEHLOP 3 2 2 2 2 2
6 MOLDIY 2 1 1 3 1 0
7 FMESAI 2 2 2 1 0 0
8 OAKBCE 2 0 2 0 1 2
9 RKUPQC 2 2 0 1 1 1
10 OQRUZE 2 3 2 1 2 1
11 RESZOA 0 2 2 0 3 3
12 MKQJWV 2 2 0 3 2 0
13 PUSELO 3 3 2 2 1 1
14 FRBHDW 1 2 3 2 2 2
15 JXIWPN 2 0 0 2 2 1
16 ZPCGIO 2 2 3 1 0 2
17 ZMONYS 2 3 2 0 2 0
18 LMWVRZ 1 0 1 1 2 1
19 VALHXF 2 3 2 2 2 2
20 YAFVOM 1 1 1 0 3 3
Take a subset of the BDI data so that you’re left with a data table containing only the participant identifier and the data from DBI
starts_with()
, ends_with()
, and contains()
may be useful.
select(.data = beck_data, SubjCode, starts_with("bdi", ignore.case = T))
# A tibble: 20 x 4
SubjCode BDI1 BDI2 Bdi3
<chr> <dbl> <dbl> <dbl>
1 OTLCZE 1 0 1
2 NDMCLE 1 2 3
3 LWBHYZ 2 0 3
4 MDBONG 3 1 2
5 IEHLOP 3 2 2
6 MOLDIY 2 1 1
7 FMESAI 2 2 2
8 OAKBCE 2 0 2
9 RKUPQC 2 2 0
10 OQRUZE 2 3 2
11 RESZOA 0 2 2
12 MKQJWV 2 2 0
13 PUSELO 3 3 2
14 FRBHDW 1 2 3
15 JXIWPN 2 0 0
16 ZPCGIO 2 2 3
17 ZMONYS 2 3 2
18 LMWVRZ 1 0 1
19 VALHXF 2 3 2
20 YAFVOM 1 1 1
Take a subset of the data so that you only have the participant identifier and the third item of each scale
select(.data = beck_data, SubjCode, ends_with("3"))
# A tibble: 20 x 3
SubjCode Bdi3 BAI3
<chr> <dbl> <dbl>
1 OTLCZE 1 0
2 NDMCLE 3 2
3 LWBHYZ 3 3
4 MDBONG 2 2
5 IEHLOP 2 2
6 MOLDIY 1 0
7 FMESAI 2 0
8 OAKBCE 2 2
9 RKUPQC 0 1
10 OQRUZE 2 1
11 RESZOA 2 3
12 MKQJWV 0 0
13 PUSELO 2 1
14 FRBHDW 3 2
15 JXIWPN 0 1
16 ZPCGIO 3 2
17 ZMONYS 2 0
18 LMWVRZ 1 1
19 VALHXF 2 2
20 YAFVOM 1 3