×

This worksheet relies on the teachingtools and tidyverse packages. To make sure that teachingtools and tidyverse are loaded, run the following code in your console.

If you need to install teachingtools then follow the instructions in the introduction.

xfun::pkg_attach2("teachingtools","tidyverse")

This worksheet covers the material from the Basic data wrangling with tidyverse session. All these questions should be answered using tidyverse style syntax.

The star wars data

The following questions all rely on the starwars data table. The starwars data table contains information about the characters in the Star Wars films.

starwars
# A tibble: 87 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO    167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2     96    32 <NA>       white, bl… red             33   none  mascu…
 4 Dart…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia…    150    49 brown      light      brown           19   fema… femin…
 6 Owen…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4     97    32 <NA>       white, red red             NA   none  mascu…
 9 Bigg…    183    84 black      light      brown           24   male  mascu…
10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>

Problem 4.1

Take a subset of the data table so that it is only contains characters between 100 and 200 cm (inclusive) tall (use the height column)

Get a hint
  • You’ll need to use the filter() function
  • You’ll need to combine two logical operations
Solution
filter(.data = starwars, height >= 100 & height <= 200) 
# A tibble: 64 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO    167    75 <NA>       gold       yellow         112   none  mascu…
 3 Leia…    150    49 brown      light      brown           19   fema… femin…
 4 Owen…    178   120 brown, gr… light      blue            52   male  mascu…
 5 Beru…    165    75 brown      light      blue            47   fema… femin…
 6 Bigg…    183    84 black      light      brown           24   male  mascu…
 7 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
 8 Anak…    188    84 blond      fair       blue            41.9 male  mascu…
 9 Wilh…    180    NA auburn, g… fair       blue            64   male  mascu…
10 Han …    180    80 brown      fair       brown           29   male  mascu…
# … with 54 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>

Problem 4.2

Take a subset of the data table so that it only contains characters which are human (use the species column)

Solution
filter(.data = starwars, species == "Human")
# A tibble: 35 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172    77 blond      fair       blue            19   male  mascu…
 2 Dart…    202   136 none       white      yellow          41.9 male  mascu…
 3 Leia…    150    49 brown      light      brown           19   fema… femin…
 4 Owen…    178   120 brown, gr… light      blue            52   male  mascu…
 5 Beru…    165    75 brown      light      blue            47   fema… femin…
 6 Bigg…    183    84 black      light      brown           24   male  mascu…
 7 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
 8 Anak…    188    84 blond      fair       blue            41.9 male  mascu…
 9 Wilh…    180    NA auburn, g… fair       blue            64   male  mascu…
10 Han …    180    80 brown      fair       brown           29   male  mascu…
# … with 25 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>

Problem 4.3

Take a subset of the data table so that is only contains characters that have a home world of Tatooine {or Naboo.

Get a hint
  • You can use %in% to check whether a value is part of a set
Solution
filter(.data = starwars, homeworld %in% c("Tatoonie","Naboo"))
# A tibble: 11 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 R2-D2     96    32 <NA>       white, bl… red               33 none  mascu…
 2 Palp…    170    75 grey       pale       yellow            82 male  mascu…
 3 Jar …    196    66 none       orange     orange            52 male  mascu…
 4 Roos…    224    82 none       grey       orange            NA male  mascu…
 5 Rugo…    206    NA none       green      orange            NA male  mascu…
 6 Ric …    183    NA brown      fair       blue              NA <NA>  <NA>  
 7 Quar…    183    NA black      dark       brown             62 <NA>  <NA>  
 8 Greg…    185    85 black      dark       brown             NA male  mascu…
 9 Cordé    157    NA brown      light      brown             NA fema… femin…
10 Dormé    165    NA brown      light      brown             NA fema… femin…
11 Padm…    165    45 brown      light      brown             46 fema… femin…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

Problem 4.4

Take a subset of the data table where the output is a vector containing only the home worlds

Get a hint
  • The select() function will return a data table.
Solution
pull(.data = starwars, homeworld)
 [1] "Tatooine"       "Tatooine"       "Naboo"          "Tatooine"      
 [5] "Alderaan"       "Tatooine"       "Tatooine"       "Tatooine"      
 [9] "Tatooine"       "Stewjon"        "Tatooine"       "Eriadu"        
[13] "Kashyyyk"       "Corellia"       "Rodia"          "Nal Hutta"     
[17] "Corellia"       "Bestine IV"     NA               "Naboo"         
[21] "Kamino"         NA               "Trandosha"      "Socorro"       
[25] "Bespin"         "Mon Cala"       "Chandrila"      NA              
[29] "Endor"          "Sullust"        NA               "Cato Neimoidia"
[33] "Coruscant"      "Naboo"          "Naboo"          "Naboo"         
[37] "Naboo"          "Toydaria"       "Malastare"      "Naboo"         
[41] "Tatooine"       "Dathomir"       "Ryloth"         "Ryloth"        
[45] "Vulpter"        "Troiken"        "Tund"           "Haruun Kal"    
[49] "Cerea"          "Glee Anselm"    "Iridonia"       "Coruscant"     
[53] "Iktotch"        "Quermia"        "Dorin"          "Champala"      
[57] "Naboo"          "Naboo"          "Tatooine"       "Geonosis"      
[61] "Mirial"         "Mirial"         "Naboo"          "Serenno"       
[65] "Alderaan"       "Concord Dawn"   "Zolan"          "Ojom"          
[69] "Kamino"         "Kamino"         "Coruscant"      "Aleen Minor"   
[73] NA               "Skako"          "Muunilinst"     "Shili"         
[77] "Kalee"          "Kashyyyk"       "Alderaan"       "Umbara"        
[81] "Utapau"         NA               NA               NA              
[85] NA               NA               "Naboo"         

Problem 4.5

Take a subset of the data table where the output is a data table containing only the home worlds

Solution
select(.data = starwars, homeworld)
# A tibble: 87 x 1
   homeworld
   <chr>    
 1 Tatooine 
 2 Tatooine 
 3 Naboo    
 4 Tatooine 
 5 Alderaan 
 6 Tatooine 
 7 Tatooine 
 8 Tatooine 
 9 Tatooine 
10 Stewjon  
# … with 77 more rows

Problem 4.6

Take a subset of the data table so that it only contains characters that have both black hair and blue eyes.

Get a hint
  • You’ll need to combine two logical operations.
Solution
filter(.data = starwars, hair_color == "black" & eye_color == "blue")
# A tibble: 2 x 14
  name  height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Lumi…    170  56.2 black      yellow     blue              58 fema… femin…
2 Barr…    166  50   black      yellow     blue              40 fema… femin…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

Problem 4.7

Take a subset of the data table so that it only contains characters that either have black hair or have blue eyes.

Solution
filter(.data = starwars, hair_color == "black" | eye_color == "blue")
# A tibble: 30 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172  77   blond      fair       blue            19   male  mascu…
 2 Owen…    178 120   brown, gr… light      blue            52   male  mascu…
 3 Beru…    165  75   brown      light      blue            47   fema… femin…
 4 Bigg…    183  84   black      light      brown           24   male  mascu…
 5 Anak…    188  84   blond      fair       blue            41.9 male  mascu…
 6 Wilh…    180  NA   auburn, g… fair       blue            64   male  mascu…
 7 Chew…    228 112   brown      unknown    blue           200   male  mascu…
 8 Jek …    180 110   brown      fair       blue            NA   male  mascu…
 9 Boba…    183  78.2 black      fair       brown           31.5 male  mascu…
10 Land…    177  79   black      dark       brown           31   male  mascu…
# … with 20 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>

Problem 4.8

Take a subset of the data so that it includes all the characters EXCEPT those that have both black hair and blue eyes.

Get a hint
  • Once you have a logical rule or rules set up, you can negate them by using the !
Solution
filter(.data = starwars, !(hair_color == "black" & eye_color == "blue"))
# A tibble: 85 x 14
   name  height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO    167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2     96    32 <NA>       white, bl… red             33   none  mascu…
 4 Dart…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia…    150    49 brown      light      brown           19   fema… femin…
 6 Owen…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4     97    32 <NA>       white, red red             NA   none  mascu…
 9 Bigg…    183    84 black      light      brown           24   male  mascu…
10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# … with 75 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>

The Beck Depression Inventory data

The following questions all rely on beck_data data table. The beck_data contains data from 20 participants (identified by SubjCode), and their scores on 3 items from the Beck depression inventory (BDI) and 3 items from the Beck Anxiety Inventory (BAI). This data was supplied to you by a collaborator that isn’t very detail oriented, so they may not have employed consistent in how they named their variables

beck_data
# A tibble: 20 x 7
   SubjCode  BDI1  BDI2  Bdi3  BAI1  BAx2  BAI3
   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 OTLCZE       1     0     1     1     3     0
 2 NDMCLE       1     2     3     2     3     2
 3 LWBHYZ       2     0     3     1     1     3
 4 MDBONG       3     1     2     1     1     2
 5 IEHLOP       3     2     2     2     2     2
 6 MOLDIY       2     1     1     3     1     0
 7 FMESAI       2     2     2     1     0     0
 8 OAKBCE       2     0     2     0     1     2
 9 RKUPQC       2     2     0     1     1     1
10 OQRUZE       2     3     2     1     2     1
11 RESZOA       0     2     2     0     3     3
12 MKQJWV       2     2     0     3     2     0
13 PUSELO       3     3     2     2     1     1
14 FRBHDW       1     2     3     2     2     2
15 JXIWPN       2     0     0     2     2     1
16 ZPCGIO       2     2     3     1     0     2
17 ZMONYS       2     3     2     0     2     0
18 LMWVRZ       1     0     1     1     2     1
19 VALHXF       2     3     2     2     2     2
20 YAFVOM       1     1     1     0     3     3

Problem 4.9

Take a subset of the BDI data so that you’re left with a data table containing only the participant identifier and the data from DBI

Get a hint
  • look for something that’s consistent between all the columns that you want and use that as the selector
  • The functions starts_with(), ends_with(), and contains() may be useful.
Solution
select(.data = beck_data, SubjCode, starts_with("bdi", ignore.case = T))
# A tibble: 20 x 4
   SubjCode  BDI1  BDI2  Bdi3
   <chr>    <dbl> <dbl> <dbl>
 1 OTLCZE       1     0     1
 2 NDMCLE       1     2     3
 3 LWBHYZ       2     0     3
 4 MDBONG       3     1     2
 5 IEHLOP       3     2     2
 6 MOLDIY       2     1     1
 7 FMESAI       2     2     2
 8 OAKBCE       2     0     2
 9 RKUPQC       2     2     0
10 OQRUZE       2     3     2
11 RESZOA       0     2     2
12 MKQJWV       2     2     0
13 PUSELO       3     3     2
14 FRBHDW       1     2     3
15 JXIWPN       2     0     0
16 ZPCGIO       2     2     3
17 ZMONYS       2     3     2
18 LMWVRZ       1     0     1
19 VALHXF       2     3     2
20 YAFVOM       1     1     1

Problem 4.10

Take a subset of the data so that you only have the participant identifier and the third item of each scale

Solution
select(.data = beck_data, SubjCode, ends_with("3"))
# A tibble: 20 x 3
   SubjCode  Bdi3  BAI3
   <chr>    <dbl> <dbl>
 1 OTLCZE       1     0
 2 NDMCLE       3     2
 3 LWBHYZ       3     3
 4 MDBONG       2     2
 5 IEHLOP       2     2
 6 MOLDIY       1     0
 7 FMESAI       2     0
 8 OAKBCE       2     2
 9 RKUPQC       0     1
10 OQRUZE       2     1
11 RESZOA       2     3
12 MKQJWV       0     0
13 PUSELO       2     1
14 FRBHDW       3     2
15 JXIWPN       0     1
16 ZPCGIO       3     2
17 ZMONYS       2     0
18 LMWVRZ       1     1
19 VALHXF       2     2
20 YAFVOM       1     3

CC-BY-NC-SA-4.0Lincoln J Colling