R is a free software environment for statistical computing and graphics. It is released under the GNU General Public License and is an alternative to popular commercial software such as Stata, or even Microsoft Excel. Many statisticians and data scientists use R (many also use Python) for exploring data.
We shall use Intermediate Macroeconomics as an excuse to teach ourselves some elements of R. Macroeconomic data is constantly evolving, and R makes it quite easy to retrieve real-time data from up-to-date sources and to work with these datasets. Learning statistical software will also help you think critically about the macroeconomic analysis typically found in The Economist, the Wall Street Journal, the New York Times, and other reliable sources of information, which often make heavy use of data. R makes it easy to do computations (such as transforming values into growth rates), to detrend series, and to present data clearly (for example by plotting the time series of economic quantities).
Moreover, many companies and universities around the world now demand graduates with data science skills. Even the consulting and finance industries increasingly require knowing how to manipulate datasets. I view Intermediate Macroeconomics as an occasion to teach you these general-purpose skills (as economists like to call them). Finally, these skills might also prove useful when you take an Econometrics or a Statistics class, in which you will learn the tools of regression analysis more thoroughly (if you have not already).
Cheatsheets are a great way to get started with R. A list of cheatsheets is available here. The most important cheatsheets are:
+ dplyr cheatsheet.
+ ggplot2 cheatsheet.
dplyr
dplyr for data transformation. Note, in particular, the use of pipes %>%:
+ x %>% f(y) is the same as f(x, y).
+ y %>% f(x, ., z) is the same as f(x, y, z).
+ “Piping” with %>% makes code more readable.
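To make the readability point concrete, here is a minimal sketch that writes the same computation twice, once with nested calls and once with pipes. It assumes the dplyr package is installed and uses the built-in iris dataset; the code behind the table that follows is not shown in these notes, so the column name avg and the choice of Sepal.Width are an assumed reconstruction, not necessarily the original code.

```r
# Sketch only: assumes dplyr is installed; iris ships with base R.
library(dplyr)

# Without pipes: nested calls have to be read from the inside out.
nested <- arrange(summarise(group_by(iris, Species), avg = mean(Sepal.Width)), avg)

# With pipes: each step reads left to right, top to bottom.
piped <- iris %>%
  group_by(Species) %>%                  # one group per species
  summarise(avg = mean(Sepal.Width)) %>% # average sepal width per group
  arrange(avg)                           # sort from smallest to largest average

# Both forms return the same three-row summary table.
all.equal(as.data.frame(nested), as.data.frame(piped))
piped
```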
# # A tibble: 3 × 2
# Species avg
# <fct> <dbl>
# 1 versicolor 2.77
# 2 virginica 2.97
# 3 setosa 3.43
The following packages are particularly useful:
+ ggplot2 for data visualization. Cheatsheet.
+ stringr for string manipulation. Cheatsheet. Cheatsheet on Regular Expressions.
In addition to the tidyverse collection of R packages, I also use the following packages:
+ lubridate for working with dates (very useful in macroeconomics!). Cheatsheet.
The tidyverse also contains readr, which reads in data. Cheatsheet.
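Since dates come up constantly in macroeconomic series, here is a small sketch of the kind of date handling lubridate makes easy. It assumes the lubridate package is installed; the example dates are arbitrary and the variable names d and next_q are hypothetical.

```r
# Sketch only: assumes lubridate is installed.
library(lubridate)

d <- ymd("1946-04-01")      # parse a year-month-day string into a Date

year(d)                     # calendar year of the observation
quarter(d)                  # quarter of the year (1 to 4)

next_q <- d %m+% months(3)  # shift forward by one quarter
next_q

# Snap any date to the first day of its quarter, e.g. to align
# monthly or daily data with quarterly series.
floor_date(ymd("1946-05-17"), unit = "quarter")
```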
The power of R is now illustrated using the first figure in Lecture 1. Data for GDP is taken from the Bureau of Economic Analysis's (BEA) National Income and Product Accounts (NIPA) here, and data for Personal Consumption Expenditures is taken from there. Note that the BEA's website is here; you get access to this data by clicking on Section 1 - Domestic Product and Income and then on Table 1.1.5. Gross Domestic Product (A) (Q), where A and Q stand for annual and quarterly.
However, instead of retrieving this data directly from the BEA, we use the rdbnomics R package, which retrieves data from the DBnomics website. You must first install and load this package with the following code:
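# Install rdbnomics from GitHub (requires the devtools package)
devtools::install_github("dbnomics/rdbnomics")
# Packages used in this section; the sourced script loads them
pklist <- c("tidyverse", "lubridate", "rdbnomics", "scales", "fredr")
source("https://fgeerolf.com/code/load-packages.R")
# NBER recession dates, used later to shade recessions on the graph
load(url("https://fgeerolf.com/data/us/nber_recessions.RData"))
# fred_key must already hold your own FRED API key
fredr_set_key(fred_key)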
rdb
rdb(ids = c("BEA/NIPA-T10105/A191RC-A",
            "BEA/NIPA-T10105/DPCERC-A")) %>%
  glimpse()
# Rows: 186
# Columns: 19
# $ `@frequency` <chr> "annual", "annual", "annual", "annual", "annual", "ann…
# $ concept <chr> "gross-domestic-product", "gross-domestic-product", "g…
# $ Concept <chr> "Gross domestic product", "Gross domestic product", "G…
# $ dataset_code <chr> "NIPA-T10105", "NIPA-T10105", "NIPA-T10105", "NIPA-T10…
# $ dataset_name <chr> "Table 1.1.5. Gross Domestic Product - LastRevised: Ja…
# $ FREQ <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
# $ Frequency <chr> "Annually", "Annually", "Annually", "Annually", "Annua…
# $ indexed_at <dttm> 2022-01-28 05:22:49, 2022-01-28 05:22:49, 2022-01-28 …
# $ metric <chr> "millions-of-current-dollars", "millions-of-current-do…
# $ Metric <chr> "Millions of current Dollars", "Millions of current Do…
# $ original_period <chr> "1929", "1930", "1931", "1932", "1933", "1934", "1935"…
# $ original_value <chr> "104556", "92160", "77391", "59522", "57154", "66800",…
# $ period <date> 1929-01-01, 1930-01-01, 1931-01-01, 1932-01-01, 1933-…
# $ provider_code <chr> "BEA", "BEA", "BEA", "BEA", "BEA", "BEA", "BEA", "BEA"…
# $ series_code <chr> "A191RC-A", "A191RC-A", "A191RC-A", "A191RC-A", "A191R…
# $ series_name <chr> "Gross domestic product (line 1) - Annually", "Gross d…
# $ unit <chr> "level", "level", "level", "level", "level", "level", …
# $ Unit <chr> "Level", "Level", "Level", "Level", "Level", "Level", …
# $ value <dbl> 104556, 92160, 77391, 59522, 57154, 66800, 74241, 8483…
fredr
map_dfr(c("GDP", "PCE"), fredr) %>%
  glimpse()
# Rows: 1,060
# Columns: 5
# $ date <date> 1946-01-01, 1946-04-01, 1946-07-01, 1946-10-01, 1947-0…
# $ series_id <chr> "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP", "GDP",…
# $ value <dbl> NA, NA, NA, NA, 243.164, 245.968, 249.585, 259.745, 265…
# $ realtime_start <date> 2022-02-14, 2022-02-14, 2022-02-14, 2022-02-14, 2022-0…
# $ realtime_end <date> 2022-02-14, 2022-02-14, 2022-02-14, 2022-02-14, 2022-0…
rdb(ids = c("BEA/NIPA-T10105/A191RC-A",
            "BEA/NIPA-T10105/DPCERC-A")) %>%
  # values are in millions of dollars: convert to trillions, shorten labels
  mutate(value = value / 1000000,
         series_name = series_name %>% gsub(" - Annually", "", .)) %>%
  select(date = period, value, series_name) %>%
  glimpse()
# Rows: 186
# Columns: 3
# $ date <date> 1929-01-01, 1930-01-01, 1931-01-01, 1932-01-01, 1933-01-0…
# $ value <dbl> 0.104556, 0.092160, 0.077391, 0.059522, 0.057154, 0.066800…
# $ series_name <chr> "Gross domestic product (line 1)", "Gross domestic product…
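The same mutate() pattern can also turn levels into growth rates, one of the computations mentioned in the introduction. The sketch below assumes dplyr is installed and, so that it runs offline, types in the first seven annual GDP values shown in the rdb() output above (millions of current dollars) rather than downloading them; the names gdp, gdp_growth and growth are hypothetical.

```r
# Sketch only: assumes dplyr is installed.
library(dplyr)

# Nominal GDP, 1929 to 1935, in millions of current dollars
# (values copied from the rdb() output shown earlier).
gdp <- tibble(
  date  = seq(as.Date("1929-01-01"), by = "year", length.out = 7),
  value = c(104556, 92160, 77391, 59522, 57154, 66800, 74241)
)

gdp_growth <- gdp %>%
  arrange(date) %>%                        # lag() relies on rows being in time order
  mutate(growth = value / lag(value) - 1)  # year-on-year growth; NA for the first year

gdp_growth
```

With several series in one long table, as in the rdb() output, adding group_by(series_name) before mutate() would keep the lag from crossing from one series into the next.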
ggplot2
rdb(ids = c("BEA/NIPA-T10105/A191RC-A",
            "BEA/NIPA-T10105/DPCERC-A")) %>%
  mutate(value = value / 1000000,
         series_name = series_name %>% gsub(" - Annually", "", .)) %>%
  select(date = period, value, series_name) %>%
  ggplot(.) + xlab("") + ylab("Trillion") + theme_minimal() +
  # one line per series, distinguished by line type
  geom_line(aes(x = date, y = value, linetype = series_name)) +
  scale_y_continuous(breaks = seq(0, 20, 2.5),
                     labels = scales::dollar_format(accuracy = 0.1))
ggplot2
The pretty graph looks like this:
rdb(ids = c("BEA/NIPA-T10105/A191RC-A",
            "BEA/NIPA-T10105/DPCERC-A")) %>%
  mutate(value = value / 1000000,
         series_name = series_name %>% gsub(" - Annually", "", .)) %>%
  select(date = period, value, series_name) %>%
  ggplot(.) +
  geom_line(aes(x = date, y = value, linetype = series_name)) + theme_minimal() +
  # shade each NBER recession from peak to trough
  geom_rect(data = nber_recessions %>%
              filter(Peak > as.Date("1928-01-01")),
            aes(xmin = Peak, xmax = Trough, ymin = -Inf, ymax = +Inf),
            fill = 'grey', alpha = 0.5) +
  # legend inside the panel; ggplot2 >= 3.5.0 prefers legend.position.inside
  theme(legend.title = element_blank(),
        legend.position = c(0.3, 0.8)) +
  # x-axis ticks at recession peaks, labelled with two-digit years
  scale_x_date(breaks = nber_recessions$Peak,
               labels = date_format("%y")) +
  xlab("") + ylab("Trillion") +
  scale_y_continuous(breaks = seq(0, 20, 2.5),
                     labels = scales::dollar_format(accuracy = 0.1))