Read both California energy datasets. Make sure the
datetime variable is in an appropriate data type (i.e. not
character).
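
As a minimal sketch of the datetime check, the example below reads a small inline CSV and converts a character column with lubridate's ymd_hms. The column names (datetime, hydro, solar), the timestamp format, and the inline data are assumptions for illustration only; the real files may be laid out differently.

```r
library(lubridate)

# Hypothetical stand-in for one of the energy CSV files. With the real data
# you would pass the file path to read.csv() instead of using `text =`.
csv_text <- "datetime,hydro,solar
2019-09-03 00:00:00,2000,0
2019-09-03 01:00:00,1900,0"

generation <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# read.csv() leaves the timestamps as plain character strings.
class(generation$datetime)

# Parse the character column into a date-time (POSIXct) column.
generation$datetime <- ymd_hms(generation$datetime, tz = "UTC")
class(generation$datetime)
```

If the timestamps in the real files use a different layout, another lubridate parser such as mdy_hm may be the better fit.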
Merge the two datasets and then melt the resulting dataframe/datatable to make it tidy.
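
One possible shape for the merge-and-melt step, sketched with data.table. The two toy tables and their column names are hypothetical, and the sketch assumes that both datasets share a datetime column to merge on.

```r
library(data.table)

# Hypothetical toy tables; the real column names may differ.
times <- as.POSIXct(c("2019-09-03 00:00:00", "2019-09-03 01:00:00"), tz = "UTC")
generation <- data.table(datetime = times,
                         hydro = c(2000, 1900),
                         solar = c(0, 10))
imports <- data.table(datetime = times,
                      imports = c(7000, 6800))

# Join on the shared datetime column: one row per hour, one column per source.
merged <- merge(generation, imports, by = "datetime")

# Melt to long form: one row per (datetime, source) pair.
long <- melt(merged, id.vars = "datetime",
             variable.name = "source", value.name = "output")
```

With dplyr and tidyr, the rough equivalents would be inner_join followed by pivot_longer.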
Create a series of new variables:
- day, which is the year-month-day, without the hour. The
  lubridate function as_date will do this.
- log_output, which is the natural log of the output.
- per_output, which is the percent of daily output represented
  by each observation. You will need to use group_by and to
  create a new variable with the total output for the day.
  (Make sure to use ungroup() after this!)
Bonus: If you are using dplyr, try to do this all in one
pipe!
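
The three variables could be built in a single dplyr pipe roughly as follows. This is a sketch on made-up data: the input columns (datetime, source, output) and the helper column daily_output are assumed names, not ones taken from the real datasets.

```r
library(dplyr)
library(lubridate)

# Hypothetical long-format data: two sources over two hours of one day.
long <- tibble(
  datetime = ymd_hms(c("2019-09-03 00:00:00", "2019-09-03 01:00:00",
                       "2019-09-03 00:00:00", "2019-09-03 01:00:00"),
                     tz = "UTC"),
  source = c("wind", "wind", "solar", "solar"),
  output = c(100, 300, 200, 400)
)

energy <- long %>%
  mutate(day = as_date(datetime),          # drop the hour, keep the date
         log_output = log(output)) %>%     # natural log
  group_by(day) %>%
  mutate(daily_output = sum(output),       # total output for the day
         per_output = 100 * output / daily_output) %>%
  ungroup()                                # later steps should not stay grouped
```

Note that log(0) is -Inf in R, so any observation with zero output will need some care when you work with log_output.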
(Hint: Use the dplyr verb arrange(desc(variable)) to order
the data frame so that the largest value of variable is
first. Without desc, arrange sorts in ascending order.
The data.table function is setorder.) Which
has the least?

The dataset regroup.csv has information about which
sources are considered renewable by the state of California. Use this
dataset, along with your data manipulation skills, to explore the use of
renewable and non-renewable sources. Annotate your decisions for
the analysis.
Hint: Use your merge skills to merge the CA energy data with the
regroup data. Which variable should you join by?
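
A rough sketch of the join, followed by a grouped summary sorted with arrange(desc(...)). Both tables here are invented for illustration: the key column source, the label column renewable, and the labels themselves are assumptions, so check the actual columns of regroup.csv before choosing what to join by.

```r
library(dplyr)

# Hypothetical tidy energy data.
energy <- tibble(
  source = c("wind", "solar", "natural_gas", "solar"),
  output = c(100, 200, 500, 50)
)

# Hypothetical lookup table standing in for regroup.csv.
regroup <- tibble(
  source = c("wind", "solar", "natural_gas"),
  renewable = c("renewable", "renewable", "non-renewable")
)

by_group <- energy %>%
  left_join(regroup, by = "source") %>%     # attach the renewable label
  group_by(renewable) %>%
  summarise(total_output = sum(output)) %>%
  arrange(desc(total_output))               # largest total first
```

A left_join keeps every energy observation even when a source has no match in the lookup table, which makes unmatched sources show up as NA rather than silently disappearing.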