```r
library(tidyverse)
```

AI won't take your job, but someone who knows how to use AI might. Think of AI as a force multiplier: you have to learn to code first before you can use AI to help you. Google recently reported that about 25% of its new code is AI-generated.
Microsoft built Copilot on OpenAI's GPT models; at the time of writing it used GPT-4, the latest version, and GPT-5 was expected soon.
R provides comprehensive graphics utilities for visualizing and exploring scientific data. So far we have made a few plots with R's base graphics. Several more recent graphics systems extend these utilities, including the grid, lattice, and ggplot2 packages. Each has its uses, but ggplot2, which is part of the tidyverse collection of packages, has become especially popular and is now used by many R packages and in scientific publications.
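To make the contrast concrete, here is a minimal sketch of the same scatterplot of the built-in iris data drawn first with base graphics and then with ggplot2. It assumes only that ggplot2 is installed (library(tidyverse) attaches it); the object name p_gg is just an illustration.

```r
library(ggplot2)

# Base graphics: each call draws directly on the current plotting device
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species,             # factor codes 1-3 pick colors from the default palette
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)  # legend built by hand

# ggplot2: build a plot object from data, aesthetic mappings, and a geom;
# the legend is created automatically from the color mapping
p_gg <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  labs(x = "Sepal length (cm)", y = "Petal length (cm)")
p_gg
```

Notice that the base version needs a separate legend() call, while ggplot2 derives the legend from the mapping of Species to color.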
ggplot2 is meant to be an implementation of the Grammar of Graphics, hence the gg in ggplot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics. By directly controlling that grammar, you can generate a large set of carefully constructed graphics from a relatively small set of operations. As Hadley Wickham (2010), the author of ggplot2, said:
“A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics.”
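A small sketch of that idea using the built-in mtcars data: a plot is assembled from components (data, aesthetic mappings, geoms, labels) joined with +, and swapping a single component turns one kind of graphic into another. The object names (p_base, p_points, and so on) are only for illustration.

```r
library(ggplot2)

# 1. Data + aesthetic mappings: which variables go to which visual property.
#    On its own this draws empty axes, because no geometry has been added yet.
p_base <- ggplot(mtcars, aes(x = wt, y = mpg))

# 2. Add a geometric object (geom) that draws each row as a point
p_points <- p_base + geom_point()

# 3. Layers stack: color only the points by cylinder count, add one overall
#    linear fit, and relabel the axes, each as an independent component
p_layers <- p_base +
  geom_point(aes(color = factor(cyl))) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

# 4. Change the mapping and the geom: same data, a different kind of graphic
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

p_layers
```

Because the color mapping sits inside geom_point() rather than ggplot(), geom_smooth() fits a single line to all cars instead of one line per cylinder group.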
You can make amazing graphs with ggplot, but the learning curve is long, so we will have multiple lab sessions on ggplot and graphing. Here are a few different resources for ggplot.
Just like last week, we will write our code in a Quarto Markdown (qmd) file. Use the following YAML block at the top of your file. You can add different themes or change the parameters below, but you must include the embed-resources: true line so that the rendered HTML file is self-contained.
---
title: "Lab 2 Data Visualization"
author: "You"
format:
  html:
    toc: true
    toc_float: true
    embed-resources: true
execute:
  warning: false
  message: false
---
In this course we will work with many different R packages that will need to be installed on your computer. I have already installed most of these packages for students on Posit Cloud. If you are working on your own computer or on Unity, you can install them using Tools > Install Packages. You only need to install a package once!
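The console equivalent of Tools > Install Packages is install.packages(). Since a package only needs to be installed once, one option is a small helper that installs only what is missing. install_if_missing() below is a hypothetical helper written for this lab, not a function from any package.

```r
# Hypothetical helper: install only the packages that are not already installed.
# requireNamespace() quietly returns FALSE when a package is missing.
install_if_missing <- function(pkgs) {
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!have]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)  # return the names that needed installing
}

# Example use (run once per computer or R installation):
# install_if_missing(c("tidyverse", "palmerpenguins", "ggthemes", "ratdat"))
```

On Posit Cloud most of these packages are already installed, so the helper would simply return an empty vector for them.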
To work with an R package load it with the library command. I always load my packages at the beginning of my files.
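As a sketch, a setup chunk at the top of a qmd file might look like this; the #| label: setup line is a Quarto chunk option and is just a comment to R. After library(), a package's functions can be called by name, which is equivalent to writing the package name with :: in front.

```r
#| label: setup
# Attach packages once, at the beginning of the file
library(tidyverse)   # attaches ggplot2, dplyr, tidyr, readr, and other core packages

# With the package attached, call functions by name ...
p1 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))
# ... which is equivalent to naming the package explicitly with ::
p2 <- ggplot2::ggplot(iris, ggplot2::aes(x = Sepal.Length, y = Sepal.Width))
```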
In most labs we will load data from files (e.g., our 23andMe SNP data). Today and next week, for simplicity, we will work with data sets that come with R or that are available as R packages.
R contains pre-loaded data sets that you will see in many examples posted on the internet. The mtcars and iris data sets are very popular. You can see the whole list by typing data(), which opens a window listing the data sets. Put #| eval: false at the top of an R code chunk if you want to show the code without running it. You can also use the older R Markdown style of putting the option in the chunk header: ```{r eval = FALSE}
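If you would rather not open a pop-up window (for example, inside a rendered report), a sketch like the one below lists the data sets in R's built-in datasets package as a table instead. In a qmd file, a chunk that only shows data() would be a natural place for #| eval: false.

```r
# List the data sets in the built-in 'datasets' package as a matrix
ds <- data(package = "datasets")
head(ds$results[, c("Item", "Title")])

# Pre-loaded data sets can be used immediately by name
dim(mtcars)   # 32 rows (cars), 11 columns (variables)
dim(iris)     # 150 rows (flowers), 5 columns (variables)
```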
In class we will talk more about the structure of a data set, which can be summarized with the base R str command or with the tidyverse glimpse() function. The summary below is the output of glimpse(iris):
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
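Both commands describe the same thing: the number of rows and columns, each column's type (dbl for numbers, fct for factors), and its first few values. A quick sketch to compare them side by side:

```r
# Base R: compact structure summary (class, dimensions, first values per column)
str(iris)

# dplyr (tidyverse) alternative; produces output in the Rows/Columns format shown above
dplyr::glimpse(iris)
```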
You can see the whole data set by typing its name, iris, or by typing View(iris) (or view(iris) with the tidyverse loaded), which opens the data set in a new window. However, we don't want to print all 150 observations (rows) of iris in this document. The head command shows just the first rows: six by default, or five with head(iris, n = 5).
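For example, these calls print only a few rows of iris; the n argument controls how many.

```r
head(iris)          # first 6 rows (the default)
head(iris, n = 5)   # first 5 rows
tail(iris, n = 3)   # last 3 rows
```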
R for Data Science uses the palmerpenguins package, "which includes the penguins dataset containing body measurements for penguins on three islands in the Palmer Archipelago, and the ggthemes package, which offers a colorblind safe color palette." We will load these for our work today. You will likely need to install these packages before loading them.
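A sketch of loading these packages and taking a first look at the penguins data, assuming palmerpenguins and ggthemes have already been installed:

```r
# Run once if needed: install.packages(c("palmerpenguins", "ggthemes"))
library(tidyverse)
library(palmerpenguins)  # provides the penguins data frame
library(ggthemes)        # provides scale_color_colorblind()

glimpse(penguins)        # one row per penguin, some measurements are NA
```

Newer versions of R (4.5 and later) also ship a penguins data set in the datasets package with shorter column names; because palmerpenguins is attached later, penguins here refers to the palmerpenguins version used in R for Data Science.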
Data Analysis and Visualization in R for Ecologists uses the ratdat package, which contains a long-term data set from Portal, Arizona, in the Chihuahuan Desert.
The help command can be used to learn more about the palmerpenguins and ratdat packages. After you run the commands below, the package documentation appears under the Help tab in the lower-right pane. I used #| eval: false in the code chunk below.
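One way to write such a chunk is sketched below, assuming both packages are installed; help(package = ...) opens a package's index of help pages, and help(topic, package = ...) opens the page for a single object such as a data set.

```r
#| eval: false
# Package-level documentation: an index of the package's help pages
help(package = "palmerpenguins")
help(package = "ratdat")

# Help page for a single data set in a specific package
help("penguins", package = "palmerpenguins")
```

Naming the package in the last call avoids ambiguity with the penguins data set that newer versions of R include in the datasets package.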
Today we will walk through Chapter 1 of R for Data Science. By putting the examples and exercises in our own Quarto Markdown file, we can create our own personal path through the chapter. Make a readable report by marking each section (e.g., 1.2.3 Creating a ggplot) with Markdown headings (# symbols) so that the sections appear in your report outline. Include all of the example code from the chapter in your report, in addition to the exercises.
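For example, a section of the report could start with a heading such as ### 1.2.3 Creating a ggplot, followed by an R chunk. The chunk below is a sketch of the kind of penguin plot the chapter builds toward (body mass versus flipper length, colored and shaped by species, with a linear fit); check the details against the book as you work through it.

```r
library(tidyverse)
library(palmerpenguins)
library(ggthemes)

# Points are colored and shaped by species; the smooth layer fits one
# linear model to all penguins (rows with missing values are dropped with a warning)
p_final <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
    color = "Species", shape = "Species"
  ) +
  scale_color_colorblind()   # colorblind-safe palette from ggthemes
p_final
```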
Working through the exercises is a great time to explore changing the code, with or without Copilot! Answers to all the questions are available online thanks to Martin Lukic and others. I recommend not using these; instead, learn how to use Copilot to help when you are not sure, and ask me questions during class, at help sessions, or by email.
In your report include notes on the places you used Copilot and your prompts. One way to do this would be to have
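One possible format (a suggestion, not a requirement) is a comment block at the top of any chunk where Copilot helped, recording the prompt, what Copilot suggested, and what you changed. The prompt and suggestion in this sketch are hypothetical examples, not real Copilot output.

```r
# --- Copilot log (hypothetical example) ---------------------------------
# Prompt:     "color the points by species in this ggplot"
# Suggestion: add color = Species inside aes()
# What I did: accepted it, then also mapped shape = Species myself
# ------------------------------------------------------------------------
library(ggplot2)

p_note <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width,
                           color = Species, shape = Species)) +
  geom_point()
p_note
```

Keeping the log as comments means it stays next to the code it describes and shows up in the rendered HTML.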
There are probably better ways to do this. Think of one that works for you and clearly communicates to me your strategies.
After you Render the qmd file to an html file, export the file to your computer and upload it to Canvas.