Lab 2: Data Visualization with ggplot

Learning objectives

  • Installing R packages
  • Built-in R data sets and data set packages
  • ggplot2

Introduction to R Graphics

R provides comprehensive graphics utilities for visualizing and exploring scientific data. So far we have made a few plots using R's base graphics. Several more recent graphics environments extend these utilities, including the grid, lattice and ggplot2 packages. Each has its role, but ggplot2, which is part of the tidyverse, has become especially popular and is now used by many R packages and in scientific publications.

ggplot2 and the Grammar of Graphics

ggplot2 is meant to be an implementation of the Grammar of Graphics, hence the gg in ggplot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics. By directly controlling that grammar, you can generate a large set of carefully constructed graphics from a relatively small set of operations. As Hadley Wickham (2010), the author of ggplot2, said:

“A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics.”

That said, learning good grammar can be challenging, especially when the computer is catching your mistakes.

https://ggplot2.tidyverse.org/index.html
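As a minimal sketch of how that grammar composes, the example below builds a plot in layers: data and aesthetic mappings first, then a geometric layer and labels added with `+`. It uses the built-in iris data set rather than today's exercise data, and the object name `p` is only illustrative.

```r
library(ggplot2)

# Data + aesthetic mappings: which columns go on x, y, and color
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species))

# Add a geometric layer (points) and axis labels with `+`
p <- p +
  geom_point() +
  labs(x = "Sepal length (cm)", y = "Sepal width (cm)")

p  # printing the object draws the plot
```

Swapping `geom_point()` for another geom changes the type of graphic without touching the data or the mappings, which is the sense in which a small set of operations yields many different plots.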

Tutorials and resources

You can make amazing graphs with ggplot, but there is a long learning curve so we will have multiple lab sessions on ggplot and graphing. Here are a few different resources for ggplot.

On the Computer

Create and save your Quarto Markdown (qmd) file

Just like last week, we will write our code in a Quarto Markdown (qmd) file. Remember to use the following formatting in your YAML block:

---
title: "Lab 2 Data Visualization"
author: "You"
format:
  html:
    toc: true
    toc_float: true
    embed-resources: true
editor: visual
---

Installing and loading R packages

In this course we will work with many different R packages that need to be installed on your computer. I have already installed most of these packages for students on Posit Cloud. If you are working on your own computer or on Unity, you can install them using Tools > Install Packages. You only need to install a package once!
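Packages can also be installed from the Console with `install.packages()`. The sketch below is one possible way to install only what is missing; the package list is taken from the `library()` calls in this lab, and the names `pkgs` and `missing_pkgs` are illustrative. The `interactive()` guard is an assumption about how you want this to behave: it keeps the install step from running when the qmd file is rendered.

```r
# Packages used in this lab (from the library() calls in this document)
pkgs <- c("tidyverse", "palmerpenguins", "ggthemes", "ratdat")

# Which of them are not yet installed in the current library?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
missing_pkgs

# Install only what is missing, and only in an interactive session
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```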

To work with an R package, load it with the library command. I always load my packages at the beginning of my files.

R code
#| warning: false
# The chunk option above (warning: false) suppresses
# known warning messages when rendering the qmd file

library(tidyverse)

Data for today’s lab

In most labs we will be loading in data from files (e.g. our 23andMe SNP data). Today and next week, for simplicity, we will work with data sets that come with R or that are available in R packages.

Data sets (data frames) that come with R

R contains pre-loaded data sets that you will see in many examples posted on the internet. The mtcars and iris data sets are very popular. You can see the whole list by typing data(). This will pop up a window with a list of the data sets.

R code
#| eval: false
# The chunk option above (eval: false) is used because running data()
# would result in an error when rendering the qmd file.

data()
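If you want the list inside a rendered document rather than in a pop-up window, one option is to use the object that `data()` returns. This is a sketch, assuming the standard `datasets` package that ships with R; the name `builtin` is illustrative.

```r
# data() returns an object whose `results` matrix has one row per data set
builtin <- data(package = "datasets")$results

# Show the name (Item) and description (Title) of the first few data sets
head(builtin[, c("Item", "Title")])

# Check that the two data sets mentioned above are in the list
c("mtcars", "iris") %in% builtin[, "Item"]
```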

In class we will talk more about the structure of a data set, which can be summarized using the str command or, as here, the glimpse command from the tidyverse:

R code
glimpse(iris)
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
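For comparison, the same structure can be inspected with base R alone. The following sketch uses only functions that come with R, so it does not need the tidyverse to be loaded.

```r
# str() is the base R counterpart of glimpse(): one line per column
str(iris)

# Dimensions and column types can also be queried directly
dim(iris)             # number of rows and columns
sapply(iris, class)   # class of each column
levels(iris$Species)  # categories of the factor column
```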

You can see the whole data set by typing the name iris or by typing view(iris), which will pop up a window with the data set. However, we don’t want to show all 150 observations (rows) of the iris data set in this document. We can use the head command to show just the first 6 rows.

R code
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
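The number of rows shown can be changed with the `n` argument, and `tail()` works the same way from the end of the data frame. A short sketch (the object names are illustrative):

```r
# head() returns the first 6 rows by default; `n` changes that
first_three <- head(iris, n = 3)
first_three

# tail() does the same from the end of the data frame
last_three <- tail(iris, n = 3)
last_three
```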

Data sets that are part of R packages

R for Data Science uses the palmerpenguins package, “which includes the penguins dataset containing body measurements for penguins on three islands in the Palmer Archipelago, and the ggthemes package, which offers a colorblind safe color palette.” We will load these for our work today.

R code
library(palmerpenguins)
library(ggthemes)
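As a preview of how the two packages fit together, here is a minimal sketch of a scatterplot of the penguins data using the colorblind-safe palette from ggthemes. The choice of variables and the object name `penguin_plot` are illustrative, not the required answer to any exercise; the data set is referenced as `palmerpenguins::penguins` so that it is unambiguous which `penguins` is meant.

```r
library(ggplot2)
library(ggthemes)

# Use the package's data set explicitly to avoid picking up another `penguins`
penguins <- palmerpenguins::penguins

# Scatterplot of flipper length vs. body mass, colored by species;
# na.rm = TRUE drops rows with missing measurements without a warning
penguin_plot <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species), na.rm = TRUE) +
  scale_color_colorblind()  # colorblind-safe palette from ggthemes

penguin_plot
```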

Data Analysis and Visualization in R for Ecologists uses the ratdat package, which provides a long-term dataset from Portal, Arizona, in the Chihuahuan desert.

R code
library(ratdat)

The help command can be used to learn more about the palmerpenguins and ratdat packages. After running the commands below, you can view the package documentation under the Help tab in the bottom-right pane.

R code
help(package="palmerpenguins")
help(package="ratdat")
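To see which data sets each package provides without opening the Help tab, one option is the same `data()` function used earlier. This sketch assumes both packages are installed; the names `penguin_sets` and `rat_sets` are illustrative.

```r
# List the data sets shipped in each package without loading the package
penguin_sets <- data(package = "palmerpenguins")$results[, "Item"]
rat_sets     <- data(package = "ratdat")$results[, "Item"]

penguin_sets
rat_sets

# Open the help page for one specific data set
help("penguins", package = "palmerpenguins")
```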

Exercises

R for Data Science Chapter 1.

Today we will walk through Chapter 1 of R for Data Science. By putting the examples and exercises in our own Quarto Markdown file, we can create our own personal path through the chapter. We will do sections 1.2 and 1.3 today and the later sections next week. In section 1.2 you only need to do exercises 1-5, 9 and 10. Make a readable report by delineating the sections (e.g. 1.2.3 Creating a ggplot) with hashtags (#) so they are visible in your report outline.

What to upload to Canvas

After you Render the qmd file to an html file, export the file to your computer and upload it to Canvas.