# The above warning: false command is used to suppress
# known warning messages when rendering the qmd file
library(tidyverse)
R provides comprehensive graphics utilities for visualizing and exploring scientific data. So far we have made a few plots using R Base Graphics. In addition, several more recent graphics environments extend these utilities. These include the grid, lattice and ggplot2 packages. All have their roles, but the ggplot2 environment, which is part of the Tidyverse, has become popular and is now used by many R packages and in scientific publications.
ggplot2 is meant to be an implementation of the Grammar of Graphics, hence the gg in ggplot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics. By directly controlling that grammar, you can generate a large set of carefully constructed graphics from a relatively small set of operations. As Hadley Wickham (2010), the author of ggplot2, said,
“A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics.”
That said, learning good grammar can be challenging, especially when the computer is catching your mistakes.
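To make the idea of a grammar concrete, here is a minimal sketch that builds a plot from separate pieces: the data, an aesthetic mapping, a geometry layer, and labels. It assumes the built-in iris data set (introduced later in this lab); the object names p and p_points are just illustrative.

```r
library(ggplot2)

# Data + aesthetic mapping: which columns map to x, y, and color
p <- ggplot(
  data = iris,
  mapping = aes(x = Sepal.Length, y = Petal.Length, color = Species)
)

# Add a geometry layer (points), then axis labels
p_points <- p +
  geom_point() +
  labs(x = "Sepal length (cm)", y = "Petal length (cm)")

# Printing the object draws the plot
p_points
```

Swapping geom_point() for another geom changes the kind of plot while the data and mapping stay the same, which is the sense in which a small set of operations can produce many graphics.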
https://ggplot2.tidyverse.org/index.html
You can make amazing graphs with ggplot, but there is a long learning curve, so we will have multiple lab sessions on ggplot and graphing. Here are a few different resources for ggplot.
Just like last week, we will be writing our code in a Quarto Markdown (qmd) file. Remember to use the following formatting in your YAML block:
---
title: "Lab 2 Data Visualization"
author: "You"
format:
  html:
    toc: true
    toc_float: true
    embed-resources: true
editor: visual
---
In this course we will work with many different R packages that will need to be installed on your computer. I have already installed most of these packages for students on Posit Cloud. If you are working on your own computer or on Unity, you can install them using Tools > Install Packages. You only need to install a package once!
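If you prefer the console to the menu, the following sketch checks which of the packages used in this lab are missing and installs only those. The vector names pkgs and missing_pkgs are illustrative, and the interactive() guard is an assumption on my part so that rendering a qmd file never triggers an installation.

```r
# Packages used in this lab
pkgs <- c("tidyverse", "palmerpenguins", "ggthemes", "ratdat")

# TRUE for each package that is already installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!installed]

# Install only what is missing, and only in an interactive session
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

Run this in the console rather than keeping it in your report; installation is a one-time step, while library() is needed in every session.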
To work with an R package, load it with the library command. I always load my packages at the beginning of my files.
# The above warning: false command is used to suppress
# known warning messages when rendering the qmd file
library(tidyverse)
In most labs we will be loading in data from files (e.g. our 23andMe SNP data). Today and next week, for simplicity, we will work with data sets that come with R or are available as R packages.
R contains pre-loaded data sets that you will see in many examples posted on the internet. The mtcars and iris data sets are very popular. You can see the whole list by typing data(). This will pop up a window with a list of the data sets.
# The above eval: false statement is used because it will result
# in an error when rendering the qmd file.
data()
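Because data() opens a separate window, it is not useful inside a rendered report. As an alternative sketch, the object returned by data() has a results matrix that can be printed directly; the variable name available is illustrative.

```r
# List the data sets shipped in the base "datasets" package
available <- data(package = "datasets")$results

# Show the name and title of the first few entries
head(available[, c("Item", "Title")])

# Check whether a particular data set is in the list
"mtcars" %in% available[, "Item"]
```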
In class we will talk more about the structure of a data set, which can be summarized using the glimpse command from the Tidyverse (similar to the str command in base R).
glimpse(iris)
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
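For comparison, here is a short sketch of the base R equivalents, which work even without the Tidyverse loaded. The variable name iris_dim is illustrative.

```r
# Compact summary of the structure: column names, types, first values
str(iris)

# Number of rows and columns
iris_dim <- dim(iris)
iris_dim

# The three species are stored as levels of a factor
levels(iris$Species)
```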
You can see the whole data set by typing the name iris or by typing view(iris), which will pop up a window with the data set. However, we don’t want to show all 150 observations (rows) of the iris data set in this document. We can use the head command to show just the first 6 rows.
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
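The number of rows that head returns can be changed with its n argument, and tail does the same from the end of the data set. A small sketch (the variable names are illustrative):

```r
# First 5 rows instead of the default 6
first_five <- head(iris, n = 5)
first_five

# Last 3 rows of the data set
last_three <- tail(iris, n = 3)
last_three
```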
R for Data Science uses the palmerpenguins package, “which includes the penguins dataset containing body measurements for penguins on three islands in the Palmer Archipelago, and the ggthemes package, which offers a colorblind safe color palette.” We will load these for our work today.
library(palmerpenguins)
library(ggthemes)
Data Analysis and Visualization in R for Ecologists uses the ratdat package, which contains a long-term dataset from Portal, Arizona, in the Chihuahuan desert.
library(ratdat)
The help command can be used to learn more about the palmerpenguins and ratdat packages. After running the commands below, the package documentation can be viewed under the Help tab in the bottom right corner.
help(package="palmerpenguins")
help(package="ratdat")
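The Help tab is only available interactively. As a complementary sketch that also works in a rendered report, the data sets contained in a package can be listed with data(); this assumes palmerpenguins is installed, and the variable name penguin_sets is illustrative.

```r
# List the data sets that ship with the palmerpenguins package
penguin_sets <- data(package = "palmerpenguins")$results

# Show the name and title of each data set
penguin_sets[, c("Item", "Title")]
```

The same call with package = "ratdat" should list the data sets in that package.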
Today we will walk through Chapter 1 of R for Data Science. By putting the examples and exercises in our own Quarto Markdown file, we can create our own personal path through the chapter. We will do sections 1.2 and 1.3 today and the later sections next week. In section 1.2 you only need to do exercises 1-5, 9 and 10. Make a readable report by delineating the sections (e.g. 1.2.3 Creating a ggplot) with hashtags so they are visible in your report outline.
After you Render the qmd file to an html file, export the file to your computer and upload it to Canvas.