This note is a very quick introduction to the R statistical software. You do not need to do any of this, or even download R, to follow Econ 102. It is extra material for those who want to play with the data, do their own research, or even fact-check me.
R is a free software environment for statistical computing and graphics. It is released under the GNU General Public License, and is an alternative to popular commercial software such as Stata. Many statisticians and data scientists use R (and Python) for exploring data.
Again, you do not have to, but you might want to use Econ 102 as an excuse to teach yourself some elements of R. Macroeconomic data evolves constantly, and R lets you retrieve real-time data from up-to-date sources and work with these datasets quite easily. Learning statistical software would also help you think critically about the macroeconomic analysis typically found in the Wall Street Journal, the New York Times, and other reliable sources of information, which often make heavy use of data. R makes it easy to run computations (such as transforming values into growth rates), to detrend series, and to present data clearly (such as plotting the time series of economic quantities).
Moreover, graduates with data science skills are currently in demand in California. Even the consulting and finance industries increasingly require knowing how to manipulate datasets. I view macroeconomics as an occasion to teach you these general-purpose skills. Finally, these skills might also prove useful when you take Economics 103, in which you will learn the tools of regression analysis more thoroughly (if you have not already). Again, you do not need to learn R at all in order to succeed in this class; I am only providing some material for those who are interested and want to do more.
Downloading. You need to install R and RStudio:
First you must get the R statistical software, which you may download on the UCLA website here. The latest release (2018-07-02, Feather Spray) is version 3.5.1. For Mac OSX: download here. For Windows: download here.
Second, I recommend you use a Graphical User Interface (GUI) for R such as RStudio. RStudio’s latest release is 1.1.456: download here.
Introduction to R. Cheatsheets are a great way to get started with R. Many are available here, but the two main cheatsheets are:
Base R Cheatsheet.
Advanced R Cheatsheet
I use tidyverse, from Hadley Wickham, for data manipulation as well as plotting data. This cheatsheet has a beginner’s introduction to tidyverse. tidyverse is a powerful collection of R packages that are data tools for transforming and visualizing data. Datacamp has a free tutorial for tidyverse, which can get you started. The following packages are particularly useful:
dplyr for data manipulation. Cheatsheet. Note, in particular, the use of pipes %>%:
# # A tibble: 3 x 2
#   Species      avg
#   <fct>      <dbl>
# 1 versicolor  2.77
# 2 virginica   2.97
# 3 setosa      3.43
ggplot2 for data visualization. Cheatsheet.
stringr for string manipulation. Cheatsheet. Cheatsheet on Regular Expressions.
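The tibble printed under the dplyr item above shows only the result of a piped computation. A minimal sketch of a pipeline that yields this kind of output is given below; it assumes the built-in iris dataset and an average of Sepal.Width by species, which is a guess consistent with the numbers shown rather than the exact code used.

```r
library(dplyr)

# iris ships with R. Each %>% passes the result on the left
# as the first argument of the function on the right.
iris_avg <- iris %>%
  group_by(Species) %>%                  # one group per species
  summarise(avg = mean(Sepal.Width)) %>% # average sepal width in each group
  arrange(avg)                           # sort from smallest to largest average

print(iris_avg)
```

Reading the pipe from top to bottom gives the order of operations, which is usually easier to follow than the equivalent nested calls.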
In addition to the tidyverse collection of R packages, I also use the following packages:
lubridate for working with dates (very useful in macroeconomics!). Cheatsheet.
tidyverse also contains readr, which reads in data. Cheatsheet.
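To give a flavor of why lubridate helps with dated series, here is a small sketch. The date used is simply the R release date mentioned earlier, and the variable names are illustrative.

```r
library(lubridate)

release <- ymd("2018-07-02")                   # parse a year-month-day string into a Date

release_year    <- year(release)               # calendar year of the date
release_quarter <- quarter(release)            # quarter of the year (1 to 4)
quarter_start   <- floor_date(release, "quarter") # first day of that quarter
three_months_on <- release %m+% months(3)      # add three calendar months

print(list(release_year, release_quarter, quarter_start, three_months_on))
```

Functions such as floor_date() are handy when monthly or daily observations need to be aligned with quarterly series like GDP.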
My lecture notes are created using R Markdown, which you can learn using this cheatsheet, as well as this reference guide.
The power of R is now illustrated using the first Figure in Lecture 1.
Data for GDP is taken from the Bureau of Economic Analysis’s National Income and Product Accounts (NIPA) here, and data for Personal Consumption Expenditures is taken from there. Note that the BEA’s website is here, and that you get access to this data by clicking on Section 1 - Domestic Product and Income and then on Table 1.1.5, which has Gross Domestic Product (A) (Q) (Annual and Quarterly). However, instead of retrieving this data from the BEA, I use the rdbnomics R package, which retrieves data from the DBnomics website. You must first install and load this package through the following code:
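# Install rdbnomics from GitHub (requires the devtools package)
devtools::install_github("dbnomics/rdbnomics")
# Packages to load; the sourced script is expected to load each one in pklist
pklist <- c("tidyverse", "lubridate", "rdbnomics", "scales")
source("https://fgeerolf.com/code/load-packages.R")
# NBER recession dates (the Peak and Trough columns are used for shading)
load(url("https://fgeerolf.com/data/us/nber_recessions.RData"))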
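For readers who prefer not to source a remote script, the loading step can be sketched locally as follows. This assumes that the remote script does nothing more than attach each package listed in pklist, which is not confirmed here; the name not_installed is illustrative.

```r
pklist <- c("tidyverse", "lubridate", "rdbnomics", "scales")

# Check which packages are available without attaching them yet
is_installed  <- vapply(pklist, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pklist[!is_installed]

if (length(not_installed) > 0) {
  # Report what is missing instead of installing silently
  message("Not installed yet: ", paste(not_installed, collapse = ", "))
}

# Attach every package that is available
for (pk in setdiff(pklist, not_installed)) {
  library(pk, character.only = TRUE)
}
```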
Note that I also loaded tidyverse, containing in particular ggplot2, dplyr and stringr mentioned above, as well as lubridate for working easily with dates, and scales to use 2-digit years on the x-axis. Finally, I added a dataset of US recessions as defined by the NBER. The R code shown in Lecture 1 looks like this:
# Annual GDP (A191RC) and Personal Consumption Expenditures (DPCERC), NIPA Table 1.1.5
rdb(ids = c("BEA/NIPA-T10105/A191RC-A",
            "BEA/NIPA-T10105/DPCERC-A")) %>%
  # Rescale the values and shorten the series names used in the legend
  mutate(value = value / 1000000,
         series_name = series_name %>% gsub(" - Annually", "", .)) %>%
  select(date = period, value, series_name) %>%
  ggplot(.) +
  geom_line(aes(x = date, y = value, linetype = series_name)) + theme_minimal() +
  # Shade NBER recessions in grey
  geom_rect(data = nber_recessions %>%
              filter(Peak > as.Date("1928-01-01")),
            aes(xmin = Peak, xmax = Trough, ymin = -Inf, ymax = +Inf),
            fill = 'grey', alpha = 0.5) +
  theme(legend.title = element_blank(),
        legend.position = c(0.3, 0.8)) +
  # One x-axis tick per recession peak, labelled with a 2-digit year
  scale_x_date(breaks = nber_recessions$Peak,
               labels = date_format("%y")) +
  xlab("") + ylab("Trillion") +
  scale_y_continuous(breaks = seq(0, 20, 2.5),
                     labels = scales::dollar_format(accuracy = 0.1))
The first command loads in the data. A series of manipulations then divides the values by 1,000,000 (so that the y-axis reads in trillions), strips the " - Annually" suffix from the series names, and keeps only the useful information for plotting: the date, the value, and the series name. Finally, ggplot2 is used to plot this data.
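The manipulation steps can be tried without an internet connection on a small stand-in table. In the sketch below, the data frame raw and its numbers are made up purely for illustration; only the column names period, value and series_name mirror those used in the code above.

```r
library(dplyr)

# Hypothetical stand-in for the table returned by rdb(); values are invented
raw <- data.frame(
  period      = as.Date(c("2016-01-01", "2017-01-01")),
  value       = c(18000000, 19000000),
  series_name = c("Gross domestic product - Annually",
                  "Gross domestic product - Annually"),
  extra       = c("x", "y")   # a column that is not needed for plotting
)

tidy <- raw %>%
  mutate(value = value / 1000000,                              # rescale the values
         series_name = gsub(" - Annually", "", series_name)) %>% # shorten the names
  select(date = period, value, series_name)                    # keep and rename columns

print(tidy)
```

Note that select() both drops the unused column and renames period to date in a single step.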
legend.title = element_blank() removes the legend title, and legend.position places the legend inside the plot area. You can work out what the other options do using the ggplot2 Cheatsheet, or simply by searching for them online.
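The two legend options can be explored on their own with a toy plot. The data frame df below is invented for illustration, and the remark about newer ggplot2 versions reflects my understanding of recent releases rather than anything stated in these notes.

```r
library(ggplot2)

# Made-up two-series data, only to show the legend options
df <- data.frame(
  date        = rep(as.Date(c("2000-01-01", "2010-01-01", "2020-01-01")), times = 2),
  value       = c(1, 2, 3, 0.5, 1.5, 2.5),
  series_name = rep(c("Series A", "Series B"), each = 3)
)

p <- ggplot(df) +
  geom_line(aes(x = date, y = value, linetype = series_name)) +
  theme_minimal() +
  theme(legend.title = element_blank(),   # no title above the legend keys
        # Relative coordinates: 30% from the left, 80% from the bottom.
        # Newer ggplot2 releases may warn here and suggest
        # legend.position = "inside" with legend.position.inside = c(0.3, 0.8).
        legend.position = c(0.3, 0.8))

print(p)
```

Changing the two numbers moves the legend around the panel, which is a quick way to see what the coordinates mean.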