The ggplot2 package is part of tidyverse, a suite of packages maintained by the RStudio company. In ggplot, graphics are built by supplying data and mapping of data values to aesthetics, and then adding layers that build geometric objects, scales, labels, and more.
Before using ggplot, make sure to load the package. You can either load tidyverse or ggplot2 directly.
library(tidyverse)
library(ggplot2)
dplyr and other tidyverse packages.
ggigraph), animated graphics (gganimate), alluvial diagrams (ggalluvial), maps (ggmap), networks (ggraph), and more.
ggplot2.
ggplot2, this will be more useful in conjunction with the cheatsheet since it contains more detail on syntax and links to full help files.
tidyverse.
ggplot2 that enables users to launch a graphical user interface (GUI) with which to build ggplot graphics layer-by-layer. When completed, the user can copy the code required to generate the plot and return to their script file.
ggplot works by combining several functions using the + operator. This creates layers in an additive fashion. You can start a new line after each + operator for readability if you would like (I do so throughout these notes). Each function does something specific: provide a dataset, create a geometric object, add labels, add scales, change the coordinate system or layout, change the color palette, etc.
Do not confuse the %>% from dplyr with the + from ggplot2!
Each graphic generated by ggplot requires at least three basic components:
- A data frame, provided to the ggplot() function’s data = argument.
- A geometric object, created by one of the geom functions, such as geom_point() or geom_histogram().
- An aesthetic mapping, provided by the aes() function nested within the mapping = argument of the geom function.
You can also provide the aes() function to the mapping = argument of ggplot(). If you do so, all geom functions will pull from the aesthetic mapping provided in this initial ggplot() function.
Together, the basic structure looks like this:
ggplot(data = <DATA FRAME>) +
<GEOM_FUNCTION>(mapping = aes(<VARIABLES>))
To demonstrate with an example, let’s re-create the scatterplot we made in Day 1 using data from Gapminder. Recall that we used the 2007 data subset and plotted the relationship between life expectancy on the y-axis and GDP per capita on the x-axis. Using the structure above, we would proceed as follows:
- We provide the gapminder07 data frame to the data = argument of the initial ggplot() function.
- We add geom_point() as our second function, since the geometric object we want to generate is points.
- In the mapping = argument of geom_point(), we provide our aesthetic mapping. In this case, we need to provide an x and y variable for the points.
Altogether, it looks like this:
ggplot(data = gapminder07) +
geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
## Or, you can provide the mapping in the call to `ggplot`
# ggplot(data = gapminder07, mapping = aes(x = gdpPercap, y = lifeExp)) +
# geom_point()
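If the gapminder07 data frame from Day 1 is not loaded, the same three-component structure can be tried on any data frame. The sketch below is only an illustration: it assumes R's built-in mtcars data as a stand-in (it is not part of the workshop data) and uses layer_data() to show what ggplot computed for the points layer.

```r
library(ggplot2)

# Assumption: mtcars (built into R) stands in for the workshop data
p <- ggplot(data = mtcars) +                    # 1. the data frame
  geom_point(mapping = aes(x = wt, y = mpg))    # 2. the geom, 3. the aesthetic mapping

# Printing the object draws the plot
print(p)

# layer_data() returns the values computed for a layer (here: one row per point)
pts <- layer_data(p, 1)
nrow(pts)
```

Storing the plot in an object, as done here with p, is optional; it simply makes it easy to inspect the plot or add more layers later.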
Recall how we added a horizontal line to the scatterplot on Day 1. We can replicate that in ggplot by adding another layer using the geom_hline() function, which generates another geometric object (specifically, a horizontal line). Notice that data = and mapping = have been dropped here: data is the first argument of ggplot() and mapping is the first argument of the geom functions, and as with all functions in R, unnamed inputs are matched to arguments by position.
ggplot(gapminder07) +
geom_point(aes(x = gdpPercap, y = lifeExp)) +
geom_hline(aes(yintercept = mean(lifeExp)))
Remember that you can also specify the aesthetics in the call to ggplot(). This allows you to use the same aesthetic mapping across a number of different geoms.
# Doesn't work!
# ggplot(data = gapminder07) +
# geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) +
# geom_smooth()
# Works!
ggplot(data = gapminder07, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
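One way to see why the first version fails: a mapping given inside a geom belongs to that layer only, while a mapping given to ggplot() is stored on the plot and inherited by every layer. The sketch below assumes the built-in mtcars data rather than gapminder07, and uses a linear smoother (method = "lm") purely to keep the example quiet; neither choice comes from the workshop materials.

```r
library(ggplot2)

# Mapping supplied to one geom only: the plot itself carries no x or y
p_local <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

# Mapping supplied to ggplot(): every layer can inherit it
p_global <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

length(p_local$mapping)   # no plot-level aesthetics
names(p_global$mapping)   # the plot-level aesthetics: x and y
```

A geom_smooth() added to p_local would have no x or y to work with unless it received its own aes().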
We can also add a title and axis labels as a layer using the labs() function.
ggplot(data = gapminder07) +
geom_point(aes(x = gdpPercap, y = lifeExp)) +
geom_hline(aes(yintercept = mean(lifeExp))) +
labs(title = "Relationship between life expectancy and GDP per capita in 2007",
x = "GDP per capita", y = "Life expectancy")
As you can see, the grammar of graphics used in ggplot2 breaks down the information that goes into each graphic into several layers, each of which you can customize.
Recall the role of pipes (%>%) in dplyr this morning. Since dplyr and ggplot2 are both part of tidyverse, they are designed to work well together. You can use a series of pipes to prepare your data to be in the appropriate format, subset, etc. before plotting.
For example, consider the above plot. Instead of specifying the data in the ggplot() function, we can supply the data through a pipe. Instead of using the subsetted gapminder07 data, we can use the original gapminder and use a filter() function to select the 2007 observations only.
gapminder %>%
filter(year == 2007) %>%
ggplot() +
geom_point(aes(x = gdpPercap, y = lifeExp)) +
geom_hline(aes(yintercept = mean(lifeExp))) +
labs(title = "Relationship between life expectancy and GDP per capita in 2007",
x = "GDP per capita", y = "Life expectancy")
For another example of preparing data for plotting, let’s turn to the California energy data, which we will use for the rest of the ggplot session. Recall that in the advanced manipulation session this morning, you used dplyr to create long-format data frames long_gen (all generation data) and long_merged_energy (all generation + imports data). Say we wanted to visualize the total energy generated in the state over time. Starting with long_gen, and using group_by() and summarize(), we can sum the output for each date-time value (i.e. each hour) and feed this directly into ggplot(). Note that this kind of graphing task is one of the uses of having a long-format version of your data.
long_gen %>%
group_by(datetime) %>%
summarise(output=sum(output)) %>%
ggplot() +
geom_col(aes(x=datetime, y=output)) +
labs(title="Total energy generated, by hour", x="Hour", y="Output (MW)")
Perhaps we are concerned that the above plot has too much granularity, and we want to plot the total output per day instead of per hour. Again, we can achieve this by manipulating our data prior to piping it into ggplot(). Note that we use the date() function in lubridate, which takes a date-time value and returns the date portion only.
long_gen %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date) %>%
summarise(output=sum(output)) %>%
ggplot() +
geom_col(aes(x=date, y=output)) +
labs(title="Total energy generated, by day", x="Day", y="Output (MW)")
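To see what the mutate(), group_by(), and summarise() steps hand over to ggplot(), it can help to run them on a tiny data frame first. The sketch below uses a made-up tibble called toy (a hypothetical stand-in for long_gen, with invented values) covering two days of hourly readings.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for long_gen: 48 hourly readings with invented values
toy <- tibble(
  datetime = as.POSIXct("2018-09-03 00:00:00", tz = "UTC") + 3600 * (0:47),
  output   = rep(c(10, 20), times = 24)
)

daily <- toy %>%
  mutate(date = lubridate::date(datetime)) %>%  # keep the date, drop the time of day
  group_by(date) %>%
  summarise(output = sum(output))               # one row per day

daily  # two rows, one per calendar day

# The summarised data frame is what geom_col() receives
ggplot(daily) +
  geom_col(aes(x = date, y = output))
```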
We have already seen two kinds of geometries: geom_point(), which generates points using x and y values, and geom_col(), which generates columns for a bar chart. Let’s explore a few more, and learn how to modulate the appearance of the geometric objects created.
Let’s say we’d like to plot the amount of energy imported over time. We can use geom_line() to generate a line connecting all the data points. To do so, we need to provide x and y values in the aesthetic mapping function nested within geom_line(). Note that we are using the imports data frame here, but we could easily use the wide- or long-format merged data frames, as long as we use the appropriate filtering functions.
imports %>%
ggplot() +
geom_line(aes(x=datetime, y=imports)) +
labs(title="Energy imports over time", x="Hour", y="Amount imported (MW)")
Once we have created the geometric object we want, using the appropriate aesthetic mapping, we may want to change the appearance of the object. The features that we can modify vary depending on the geom object (the cheatsheet and each function’s reference files are helpful here). For most functions, we can modify the size and shape of the object(s) created. Let’s try to increase the size of the line in the plot above and make it red in color. Note that the col = and size = arguments here are outside the aes() function since we are not mapping anything from the data frame here.
imports %>%
ggplot() +
geom_line(aes(x=datetime, y=imports), size=1.2, col="red") +
labs(title="Energy imports over time", x="Hour", y="Amount imported (MW)")
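The difference between setting an appearance (outside aes()) and mapping one (inside aes()) can be checked directly. The following sketch assumes the built-in mtcars data, not the energy data, and uses layer_data() to look at the colors ggplot ends up assigning to each point.

```r
library(ggplot2)

# Setting: col is outside aes(), so every point is literally red
p_set <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg), col = "red")

# Mapping: col is inside aes(), so colors are derived from the data through a scale
p_map <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, col = factor(cyl)))

unique(layer_data(p_set, 1)$colour)          # "red"
length(unique(layer_data(p_map, 1)$colour))  # 3, one color per value of cyl
```

Only the mapped version produces a legend, because only there does color carry information from the data.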
To learn more about the color inputs you can provide in R, see the R colors cheatsheet.
In addition to lines and columns, we can also generate an area plot with geom_area(). Let’s try it with a plot of wind power generation over time. Note that to change the color on this plot, we use fill = rather than col =. This is because we want to fill the geometric object with a color; using col = would create an outline around the plot (try it out to see the difference).
generation %>%
ggplot() +
geom_area(aes(x=datetime, y=wind), fill="darkblue") +
labs(title="Hourly wind power generation, Sept 3-9", x="Hour", y="Output (MW)")
Let’s explore one more plot that visualizes a different kind of relationship. Instead of plotting trends over time, let’s plot the distribution of each source’s output using a box plot (aka a box-and-whisker plot), which shows the 25% quantile, median, and 75% quantile in a “box”. By default in geom_boxplot(), the “whiskers” extend to the most extreme values within 1.5 times the interquartile range of the box, and values beyond that are drawn as individual points. We can draw one using the function geom_boxplot(); note again that this is a case where having long-format data is useful.
long_gen %>%
ggplot() +
geom_boxplot(aes(x=source, y=output)) +
labs(title="Amount of energy generated by each source, Sept 3-9", x="Source type", y="Output (MW)")
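The numbers behind a box plot can be inspected and compared with the quartiles of the raw data. This is a small sketch under the assumption that the built-in mtcars data is an acceptable stand-in for long_gen; the column names ymin, lower, middle, upper, and ymax are the ones ggplot2's boxplot statistic produces.

```r
library(ggplot2)

p <- ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg))

# Values computed by the boxplot statistic, one row per group (cyl = 4, 6, 8)
box <- layer_data(p, 1)[, c("ymin", "lower", "middle", "upper", "ymax")]
box

# The box edges and the middle line are the quartiles of the group's data
q4 <- quantile(mtcars$mpg[mtcars$cyl == 4], c(0.25, 0.5, 0.75))
q4
```

The first row of box should match q4 in its lower, middle, and upper columns, while ymin and ymax give the whisker ends.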
The above examples show cases where one geometric object is plotted per graphic. ggplot allows you to add multiple geometric objects as layers. Below is a plot of large hydro power generation over time, first shown using a line with geom_line(). On top of that line plot, we can also add a smoothed line with geom_smooth(), which plots smoothed conditional means (estimated using a loess regression) in order to aid observation of trends in cases of overplotting.
generation %>%
ggplot() +
geom_line(aes(x=datetime, y=large_hydro), col="turquoise3") +
geom_smooth(aes(x=datetime, y=large_hydro)) +
labs(title="Hydroelectric (large) generation per hour, Sept 3-9", x="Hour", y="Output (MW)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
An alternative syntax is to do the aesthetic mapping in the ggplot() function, after which all geom functions will adopt the previously defined aesthetic mapping. The below code generates a plot identical to the above.
merged_energy %>%
ggplot(aes(x=datetime, y=large_hydro)) +
geom_line(col="turquoise3") +
geom_smooth() +
labs(title="Hydroelectric (large) generation per hour, Sept 3-9", x="Hour", y="Output (MW)")
Usually when we plot multiple geometric objects in one graphic, we are trying to plot the same relationship across different groups in the data. We will explore how this works in the below section on visualizing grouped data.
There are many, many more geometric objects that you can create using ggplot. For a full list, see the cheatsheet and reference guide linked above. Several extension packages add even more geometric objects.
We have already seen how we can specify the title and axes labels of a graphic with the labs() layer. Note that you can also specify a subtitle and caption for the graphic in the same layer. Let’s see this in action using the line plot of imports data we created earlier.
imports %>%
ggplot() +
geom_line(aes(x=datetime, y=imports), col="red") +
labs(title="Energy imports over time in California", subtitle="Hourly data from September 3-9, 2018",
caption="Source: California Energy Commission",
x="Hour", y="Amount imported (MW)")
For more advanced labeling, you can use the annotate() layer, which generates text according to specifications at any point on the coordinate system. See the cheatsheet and reference guide for more details.
Scales allow you to manipulate the relationship between data values and aesthetics beyond aesthetic mapping. In other words, they “control the details of how data values are translated to visual properties” (quote from reference guide). You can set the transparency of objects using scale_alpha functions, modify color palettes using scale_color and scale_fill functions, set the position of axis markers using scale_x and scale_y functions, and more. The cheatsheet helpfully organizes scale functions based on the type of task they are suited for, and you can see a full list of scale functions in the reference guide.
To see an example at work, let’s use the scale_x_datetime() function to manipulate the scaling and display of datetime values of the x-axis. Specifically, we will use date_labels = to define which labels to use for the date values and date_breaks = to define how far apart the breaks between the x-axis ticks should be.
(Note: %H:%M means the datetime should be displayed in 24-hour clock format without any date information; you can see a list of datetime conversion specifications in R by looking up ?strptime.)
imports %>%
ggplot() +
geom_line(aes(x=datetime, y=imports), col="red") +
scale_x_datetime(date_labels="%H:%M", date_breaks="12 hours") +
labs(title="Energy imports over time in California", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Amount imported (MW)")
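The strings passed to date_labels = use the conversion specifications described in ?strptime, and they can be tried out with base R's format() before being used in a plot. The timestamp below is a hypothetical example value, not taken from the imports data.

```r
# A datetime value like those on the x-axis (hypothetical example value)
t <- as.POSIXct("2018-09-03 14:30:00", tz = "UTC")

format(t, "%H:%M")           # 24-hour time only: "14:30"
format(t, "%b %d")           # abbreviated month name and day (month name depends on locale)
format(t, "%Y-%m-%d %H:%M")  # full date followed by the time
```

Any of these strings could be supplied to date_labels = in scale_x_datetime() to change how the axis ticks are labeled.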
The most common use of scale layers is to specify and adjust color palettes used when plotting grouped data. We will see some examples of this in the below section on that topic.
ggplot comes with several preset themes, including:
- theme_grey(): this is the default
- theme_bw(): strip colors, including grey gradients
- theme_dark() and theme_light(): these change the backgrounds of the coordinate system
- theme_minimal() and theme_void(): these are what they sound like. Try them out!
For a full list, see the reference guide file on complete themes (looking up the help file for any of the above functions will open this file).
There are also many ggplot extension packages that provide additional themes.
Instead of using a pre-set theme, you can modulate many components of the default theme using the theme() function. In this manually controlled theme layer, some of the things you can modify include:
- the text of the x-axis labels (using the axis.text.x = element_text() argument in the theme() function; axis.title.x controls the axis title).
- the background of the plot (using the plot.background = argument).
For a full list of the components that can be modified in the theme layer, see the help file for ggplot2::theme().
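As a rough illustration of a manually controlled theme layer, the sketch below starts from a preset theme and then overrides three components. It assumes the built-in mtcars data; the element names used (axis.text.x for the tick labels, axis.title.x for the axis title, plot.background for the area behind the whole plot) are the ones listed in ?theme, and the specific angle, face, and fill values are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  theme_bw() +  # start from a preset theme, then adjust individual components
  theme(
    axis.text.x     = element_text(angle = 45, hjust = 1),  # x-axis tick labels
    axis.title.x    = element_text(face = "bold"),          # x-axis title
    plot.background = element_rect(fill = "grey95")         # area behind the whole plot
  )

print(p)
```

Because later theme layers override earlier ones, the theme() call should come after the preset theme, as it does here.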
A common coordinate system adjustment layer is coord_flip(), which rotates the plot such that the x and y axes are flipped. Let’s see an example with the bar chart that we generated with geom_col() earlier in the session.
long_gen %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date) %>%
summarise(output=sum(output)) %>%
ggplot() +
geom_col(aes(x=date, y=output)) +
labs(title="Total energy generated, by day", x="Day", y="Output (MW)") +
coord_flip()
You can manipulate the coordinate system in other ways:
- coord_cartesian() is the default Cartesian coordinate system used; by explicitly calling this function, you can change the limits of the x and y axes from their defaults
- coord_fixed() sets a fixed aspect ratio between the x and y axes
- coord_transform() lets you transform the Cartesian coordinates using functions like sqrt or log
- coord_polar() changes the coordinate system to polar coordinates rather than a Cartesian system
There are several stat functions in ggplot that enable you to conduct statistical transformations of your data prior to plotting. stat and geom layers can be used interchangeably, as each stat layer has a geom argument and vice versa. Some reasons that you may want to use a stat layer instead of a geom layer, or specify the stat argument in a geom layer include:
- Using stat="identity" to specify y values in geom_bar() instead of the default stat="count", which uses counts of each x value for the y-axis.
Since these are relatively rare and more advanced use cases of ggplot, we will not explore them in detail here. If you are interested, see Chapter 3.7 of the R for data science online textbook.
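The stat="identity" case can be sketched with a few rows of made-up data. The totals data frame below is hypothetical (invented source names and values); the point is only to contrast what the two statistics compute for the bar heights.

```r
library(ggplot2)

# Hypothetical pre-summarised data: one row per category
totals <- data.frame(source = c("solar", "wind", "hydro"),
                     output = c(30, 20, 50))

# Default stat = "count": bar height is the number of rows for each x value
p_count <- ggplot(totals) +
  geom_bar(aes(x = source))

# stat = "identity": bar height is taken directly from the y variable
p_identity <- ggplot(totals) +
  geom_bar(aes(x = source, y = output), stat = "identity")

layer_data(p_count, 1)$count   # every source appears once, so each count is 1
layer_data(p_identity, 1)$y    # the output values themselves
```

geom_col(), used earlier in these notes, behaves like the second version: it expects a y variable and plots it as given.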
A common task in data visualization is plotting variables that are grouped, in order to make comparisons across groups or to show how a particular trend breaks down on a group level. For example, instead of visualizing energy generation as a whole, we may want to visualize generation from each source relative to other sources.
There are two broad ways to visualize grouped data. One is to generate multiple geometric objects of the same type for each group by indicating the grouping variable using the group = argument in the aes() function. Another is to use facets. We will examine both below. Throughout these examples, you will see one of the uses of converting data to long-format.
Note that the col = argument is specified within aes() here, as our goal is to map data values to the color aesthetics. When supplied outside the aes() function, the role of col = is to modify the color of the geometric object unrelated to any data values.
Although ggplot will generally interpret a col = argument inside aes() as providing a grouping variable, it is good practice to specify group = anyway.
long_merged_energy %>%
ggplot() +
geom_line(aes(x=datetime, y=output, group=source, col=source)) +
labs(title="Output by energy source over time", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Output (MW)")
Let’s take a look at a more complex plot that uses grouping to generate multiple lines. For example, we might look at the above plot and think that the repeated patterns over multiple days are too noisy. Perhaps we are interested in the average trend over the course of a single day. To visualize this, we can plot the average output of each energy source per hour over the 7 days in the data. Comments are included in the code below describing what each step does. Note how the code for this plot combines several of the topics we have discussed so far (data manipulation before plotting, scales, themes, and labels).
long_merged_energy %>%
# Create a variable indicating hour only
mutate(hour=lubridate::hour(datetime)) %>%
# Group data by hour and source
group_by(hour, source) %>%
# Compute mean output for each hour-source unit
summarise(output=mean(output)) %>%
# Pipe data into ggplot
ggplot() +
# Plot lines for output over hour, grouped by source
geom_line(aes(x=hour, y=output, group=source, col=source), size=1) +
# Use Set3 color palette to distinguish lines from each other, and give legend a title
scale_color_brewer(palette="Set3", name="Energy Source") +
# Use dark theme to make colors more visible
theme_dark() +
# Add labels
labs(title="Average hourly output by energy source", subtitle="Data collected during Sept 3-9, 2018",
x="Hour", y="Mean output (MW)")
## `summarise()` has grouped output by 'hour'. You can override using the
## `.groups` argument.
Remember from earlier that we may need to use either col = or fill = depending on the type of geometric object whose appearance we are trying to change. This is the same when we are mapping data to the colors. For example, in order to change the color of the objects in an area plot or bar chart, we will need to use fill = rather than col =.
In the above example, all groups are plotted overlaying each other. One way to visualize data more clearly, especially when we are dealing with bars or area plots rather than lines, is to use position adjustment.
For example, let’s plot a column chart with the total energy output per day. Note that we use the fill = argument here to demarcate each group in a different color. By default, ggplot stacks all the groups on top of each other.
long_merged_energy %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date, source) %>%
summarize(output=sum(output)) %>%
ggplot() +
geom_col(aes(x=date, y=output, group=source, fill=source)) +
labs(title="Energy use by day", x="Day", y="Output (MW)")
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
Instead of stacking the groups on top of each other, we can also use the position = "dodge" argument to create a column per group and arrange them next to each other horizontally.
long_merged_energy %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date, source) %>%
summarize(output=sum(output)) %>%
ggplot() +
geom_col(aes(x=date, y=output, group=source, fill=source), position="dodge") +
labs(title="Energy use by day", x="Day", y="Output (MW)")
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
A third alternative is to use position = "fill", which normalizes the height of each bar and as such shows the proportion of the data in each bin that consists of each group.
long_merged_energy %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date, source) %>%
summarize(output=sum(output)) %>%
ggplot() +
geom_col(aes(x=date, y=output, group=source, fill=source), position="fill") +
labs(title="Energy use by day", x="Day", y="Output (MW)")
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
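What position = "fill" displays can be reproduced by hand: within each bar, every group's value is divided by the bar's total. The sketch below uses a small made-up tibble (toy, with invented values for two hypothetical sources) to show the calculation next to the corresponding plot call.

```r
library(dplyr)
library(ggplot2)

# Hypothetical daily totals for two sources over two days
toy <- tibble(
  date   = rep(as.Date(c("2018-09-03", "2018-09-04")), each = 2),
  source = rep(c("solar", "wind"), times = 2),
  output = c(30, 10, 20, 20)
)

# Share of each day's total contributed by each source
shares <- toy %>%
  group_by(date) %>%
  mutate(share = output / sum(output)) %>%
  ungroup()

shares$share  # 0.75 0.25 0.50 0.50

# The same proportions, drawn by ggplot
ggplot(toy) +
  geom_col(aes(x = date, y = output, fill = source), position = "fill")
```

Since the y-axis then shows proportions rather than totals, it may be worth adjusting the y label in labs() when using this position.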
Let’s see another example of how position adjustment works with an area plot of energy output over time by source. geom_area() defaults to position="stack" when using groups, but I have explicitly defined it here for illustration.
long_merged_energy %>%
ggplot() +
geom_area(aes(x=datetime, y=output, group=source, fill=source), position="stack") +
labs(title="Energy use over time", subtitle="Data collected during September 3-9, 2018",
x="Hour", y="Output (MW)")
We might look at this plot and think it’s not particularly helpful. To simplify the above plot, we can do two things: (1) use the categories contained in the regroup data frame to reduce the number of categories; and (2) switch to position="fill" instead of position="stack".
# Make sure regroup data has been imported
regroup <- read_csv("data/ca_energy_regroup.csv")
## Rows: 12 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): type, group
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Prepare data frame
long_merged_energy %>%
rename(type = source) %>%
merge(regroup, by = "type") %>%
group_by(datetime, group) %>%
summarise(output=sum(output)) %>%
# Pipe data into ggplot
ggplot() +
geom_area(aes(x=datetime, y=output, group=group, fill=group), position="fill") +
labs(title="Energy use over time", subtitle="Data collected during September 3-9, 2018",
x="Hour", y="Output (MW)")
## `summarise()` has grouped output by 'datetime'. You can override using the
## `.groups` argument.
For certain geom functions, such as geom_point(), we can demarcate groups by shape. This is especially helpful in conjunction with line plots. For example, let’s plot total output by day, grouped using the grouping variable from regroup. First, we will plot lines and use colors to demarcate groups. Then, we will plot points and use shapes to demarcate groups.
# Prepare data
long_merged_energy %>%
rename(type = source) %>%
merge(regroup, by = "type") %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date, group) %>%
summarise(output=sum(output)) %>%
# Pipe data into ggplot
ggplot() +
geom_line(aes(x=date, y=output, group=group, col=group), size=0.8) +
geom_point(aes(x=date, y=output, group=group, shape=group)) +
labs(title="Output by source group over time", subtitle="Data collected during September 3-9, 2018",
x="Date", y="Output (MW)")
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
The shape = approach works well for points, but when working with line plots, we can demarcate groups by linetype = instead. For example, instead of combining colored lines and shapes in the above plot, let’s use linetypes to demarcate groups.
# Prepare data
long_merged_energy %>%
rename(type = source) %>%
merge(regroup, by = "type") %>%
mutate(date=lubridate::date(datetime)) %>%
group_by(date, group) %>%
summarise(output=sum(output)) %>%
# Pipe data into ggplot
ggplot() +
geom_line(aes(x=date, y=output, group=group, linetype=group), size=1) +
labs(title="Output by source group over time", subtitle="Data collected during September 3-9, 2018",
x="Date", y="Output (MW)")
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
You can also map the size and alpha (transparency) of geometric objects to data values, which is especially useful for continuous variables. Since the grouping variables we have been working with in the California energy data are categorical, let us briefly return to the gapminder data to show an example. You could create a scatterplot of the relationship between life expectancy and logged GDP per capita using the gapminder data, and map the values of population to the size of the points by using aes(size=pop) in geom_point(). To show how you can incorporate multiple grouping variables, I have also added aes(col=continent).
gapminder07 %>%
ggplot() +
geom_point(aes(x=log(gdpPercap), y=lifeExp, size=pop, col=continent)) +
scale_size_continuous(name="Population") +
scale_color_discrete(name="Continent") +
labs(title="Life expectancy as a function of GDP per capita in 2007",
x="Logged GDP per capita", y="Life expectancy")
While color, fill, shapes, and linetypes are useful ways to plot grouped data, we sometimes want to separate the visualization for each group even more clearly. To do so, we can use faceting, i.e. plot each group on a separate coordinate system. The simplest way to create facets using a grouping variable is facet_wrap(), but you can also control the arrangement of facets or use two grouping variables with facet_grid().
Let’s say we are interested in examining trends in energy generation by source. Using the skills we’ve learned so far, we could generate a line plot as follows.
long_gen %>%
ggplot() +
geom_line(aes(x=datetime, y=output, group=source, col=source), size=1) +
labs(title="Generation over time, by energy source", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Output (MW)")
But this feels too noisy, especially if our primary goal is to visualize the trend for each source. An alternative strategy is to facet by source rather than using a color aesthetic to separate each source.
long_gen %>%
ggplot() +
geom_line(aes(x = datetime, y = output)) +
facet_wrap(~source) +
labs(title="Generation over time, by energy source", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Output (MW)")
This is a little better! But note that the scales for each coordinate system are fixed, i.e. the scale limits are the same for each plot. If our main goal is to examine the patterns over time for each source, rather than comparing the sources to each other, we can specify scales="free" in facet_wrap().
long_gen %>%
ggplot() +
geom_line(aes(x = datetime, y = output)) +
facet_wrap(~source, scales="free") +
labs(title="Generation over time, by energy source", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Output (MW)")
That’s much better! Note that we can combine faceting with the previous skills we’ve learned to incorporate multiple groupings. For example, we can map the source groups in regroup to the color aesthetic in geom_line() as follows. Since this is a fairly complicated mix of data manipulation and visualization, comments are included describing each step.
# Begin data preparation
long_gen %>%
# Rename "source" variable to "type", to prepare for merging
rename(type = source) %>%
# Merge energy generation data with regroup data by "type" variable
merge(regroup, by = "type") %>%
# Pipe data into ggplot
ggplot() +
# Generate lines of output ~ datetime, color based on "group" variable, and increase size
geom_line(aes(x=datetime, y=output, group=group, col=group), size=1) +
# Adjust color palette for "group" colors and give legend a better name
scale_color_brewer(palette="Set1", name="Type of energy source") +
# Create facets by source, with free scales
facet_wrap(~type, scales="free") +
# Add labels
labs(title="Generation over time, by energy source", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Output (MW)") +
# Use the black-and-white theme
theme_bw() +
# Customize theme to move the legend to the bottom
theme(legend.position = "bottom")
Graphics generated using ggplot2 can be saved as images in three ways:
- Use the ggsave() function at the end of your code. You must specify the filename when using ggsave(), and can additionally specify the filepath (if different from working directory) and dimensions of the image, along with many other settings.
Let’s see an example where we combine methods 1 and 3. First, we will save a graphic as an object, and then pass the object to ggsave() to create a .png file in our directory.
# Save a line chart of imports over time
plot_importsovertime <- ggplot(imports) +
geom_line(aes(x=datetime, y=imports), col="red") +
scale_x_datetime(date_labels="%H:%M", date_breaks="12 hours") +
labs(title="Energy imports over time in California", subtitle="Hourly data from September 3-9, 2018",
x="Hour", y="Amount imported (MW)")
# Save the plot as an image (recent versions of ggplot2 do not allow adding ggsave() with +)
ggsave("importsovertime.png", plot = plot_importsovertime, width=5, height=3)
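For trying ggsave() without cluttering the project folder, the file can be written to a temporary location. This is only a sketch: it assumes the built-in mtcars data, and the file name example_plot.png is a hypothetical choice.

```r
library(ggplot2)

p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# Hypothetical output location: a temporary file rather than the working directory
out_file <- file.path(tempdir(), "example_plot.png")

# Pass the plot object explicitly; width and height are in inches by default
ggsave(filename = out_file, plot = p, width = 5, height = 3)

file.exists(out_file)
```

If the plot = argument is left out, ggsave() saves the last plot that was displayed, so naming the object explicitly is the safer habit in scripts.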
Comment your code
In some of the examples above, you have seen that we can write comments within a flow of %>% or + operators. It is highly recommended that you use such comments when doing complex data manipulation or data visualization tasks, or indeed when doing both together. This not only enables others to read your code, but also aids your own understanding of the code when returning to it at a later time.