Read both California energy datasets. Make sure the datetime variable is in an appropriate data type (i.e. not character).
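As a rough illustration of the datetime check, the sketch below parses a character column with lubridate. The data frame, its column names, and its values are invented for the example; the real files may use a different timestamp format, in which case a different parser (for example mdy_hm) would be needed.

```r
library(lubridate)

# Toy stand-in for one of the energy files; names and values are hypothetical.
generation <- data.frame(
  datetime = c("2019-09-03 00:00:00", "2019-09-03 01:00:00"),
  biogas   = c(239, 236),
  stringsAsFactors = FALSE
)

class(generation$datetime)  # "character"

# ymd_hms() parses "year-month-day hour:minute:second" strings into POSIXct.
generation$datetime <- ymd_hms(generation$datetime)

class(generation$datetime)  # "POSIXct" "POSIXt"
```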
Merge the two datasets and then melt the resulting dataframe/datatable to make it tidy.
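The merge-then-melt step might look like the sketch below on toy data. The two data frames and their columns are assumptions made for illustration, and tidyr's pivot_longer is used here as a stand-in for melt; data.table users would call melt on the merged table instead.

```r
library(tidyr)

# Two toy wide tables sharing a datetime key; names and values are hypothetical.
times <- as.POSIXct(c("2019-09-03 00:00:00", "2019-09-03 01:00:00"), tz = "UTC")
generation <- data.frame(datetime = times, biogas = c(239, 236), coal = c(9, 10))
imports    <- data.frame(datetime = times, imports = c(7527, 7012))

# Base R merge() joins the tables on the shared key column.
merged <- merge(generation, imports, by = "datetime")

# Reshape to long (tidy) form: one row per datetime and source.
long <- pivot_longer(merged, cols = -datetime,
                     names_to = "source", values_to = "output")
```

After reshaping, each row holds a single observation, which is what the later grouping steps expect.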
Create a series of new variables:
day, which is the year-month-day, without the hour. The lubridate function as_date will do this.
log_output, which is the natural log of the output.
per_output, which is the percent of daily output represented by each observation. You will need to use group_by and to create a new variable with the total output for the day. (Make sure to use ungroup() after this!)
Bonus: If you are using dplyr, try to do this all in one pipe!
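One possible shape for the single pipe is sketched below on a toy long-format table. The input data, the helper column total_output, and the choice to express per_output on a 0 to 100 scale are assumptions for this example, not requirements of the exercise.

```r
library(dplyr)
library(lubridate)

# Toy long-format data: two observations on each of two days (values invented).
long <- data.frame(
  datetime = as.POSIXct(c("2019-09-03 00:00:00", "2019-09-03 12:00:00",
                          "2019-09-04 00:00:00", "2019-09-04 12:00:00"),
                        tz = "UTC"),
  source = "solar",
  output = c(10, 30, 20, 20)
)

result <- long %>%
  mutate(day        = as_date(datetime),  # drop the hour, keep year-month-day
         log_output = log(output)) %>%    # log() is the natural log in R
  group_by(day) %>%
  mutate(total_output = sum(output),      # daily total, repeated on each row
         per_output   = 100 * output / total_output) %>%
  ungroup()                               # avoid carrying groups into later steps
```

Forgetting ungroup() does not raise an error, but any later summarise or mutate would silently keep operating within each day.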
(Use the dplyr verb arrange(desc(variable)) to order the data frame so that the largest value of variable is first. Without desc, it arranges in ascending order. The data.table function is setorder.) Which has the least?
The dataset regroup.csv has information about which sources are considered renewable by the state of California. Use this dataset, along with your data manipulation skills, to explore the use of renewable and non-renewable sources. Annotate your decisions for the analysis.
Hint: Use your merge skills to merge the CA energy data with the regroup data. Which variable should you join by?
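The general pattern of joining a lookup table and then summarising by the new category can be sketched as follows. Both tables and every column name here (source, type, category) are made up for the example; the actual columns in regroup.csv and in your energy data need to be inspected before choosing the key.

```r
library(dplyr)

# Toy energy data in long form and a toy lookup table (all names hypothetical).
long <- data.frame(source = c("solar", "coal", "wind"),
                   output = c(30, 10, 20))
lookup <- data.frame(type     = c("solar", "coal", "wind"),
                     category = c("renewable", "nonrenewable", "renewable"))

totals <- long %>%
  # When the key columns have different names, map them explicitly.
  left_join(lookup, by = c("source" = "type")) %>%
  group_by(category) %>%
  summarise(total = sum(output)) %>%
  arrange(desc(total))  # largest total first
```

A left_join keeps every row of the energy data, so any source missing from the lookup shows up with an NA category instead of being dropped, which makes mismatched keys easy to spot.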