Forecasting Electricity Demand in Seattle: Beating the Government Forecast

Photo by Nicholas Doherty on Unsplash

My final project in the Metis data science bootcamp was on forecasting hourly electricity demand. We barely touched on time series during the bootcamp, but my strong desire to work in clean energy after graduation required me to get acquainted with the dark horse of data science. As I discuss in great detail in another blog post, improving energy demand forecasts with machine learning is a key step in modernizing the electric grid, transitioning to renewables, and ultimately mitigating the effects of climate change.

Time series is a wicked beast of a subject, and I learned a lot while grinding away on this project (TL;DR: statistical analysis before and after modeling is essential for handling non-stationarity, sometimes less data is better for time series models, and a proper time-ordered train/test split is a must). If anyone wants a comprehensive lesson on time series, I suggest the following resources: fpp2 and stats510. Along with several articles on Towards Data Science, these two texts proved invaluable to my understanding of the math and theory behind time series analysis.

Data Collection and Cleaning

There remains a lot of uncertainty in electricity demand and generation, and this uncertainty leads to wasted resources and higher costs for both providers and consumers. According to the Environmental Defense Fund, three-quarters of what consumers spend on electricity every month is wasted. Considering Americans spend about $350 billion on electricity annually, that adds up to a lot of money. In Seattle, seven percent of the electricity that’s generated is later lost in the delivery path to homes and businesses, or misallocated due to forecasting errors and incorrect estimates of peak load time.

Advanced forecasting can make the generation and distribution of electricity as efficient and cost-effective as possible for utilities like Seattle City Light while also mitigating the effects of climate change.

For my project, I sourced historical hourly electricity demand data from February 2021 through the EIA API, and acquired historical hourly weather data from a NOAA bulk download. Overall the data was pretty clean, but there were a handful of outliers and missing values.

Outliers on the right

I manually researched the outliers and determined they were erroneous recordings, since nothing extraordinary happened in Seattle on those days. I marked them as missing values and imputed them using spline interpolation. Imputing with the mean or median is not recommended for time series, because it ignores the ordering and local shape of the data. Some hours had more than one recording, in either the electricity demand or the weather data, so I wrote a set of custom functions to subset the data and align the two datasets to one value per hour. In the end, I decided to use only one month of hourly data, from February 2021.
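A minimal sketch of these cleaning steps with pandas, assuming demand is a Series indexed by timestamp. The readings, the outlier timestamp, the name `demand_mwh`, and averaging duplicate readings within an hour are all illustrative assumptions, not the custom functions used in the project. The `"spline"` interpolation method requires SciPy.

```python
import numpy as np
import pandas as pd

# Hypothetical raw demand readings (MWh): 01:00 has two readings and the
# 03:00 value is a deliberately implausible spike. Values are made up.
idx = pd.to_datetime([
    "2021-02-01 00:00", "2021-02-01 01:00", "2021-02-01 01:30",
    "2021-02-01 02:00", "2021-02-01 03:00", "2021-02-01 04:00",
    "2021-02-01 05:00", "2021-02-01 06:00",
])
raw = pd.Series([1000.0, 980.0, 990.0, 970.0, 9999.0, 960.0, 975.0, 1010.0],
                index=idx, name="demand_mwh")

# Collapse multiple readings per hour into a single hourly value
# (averaging duplicates is an assumption; keeping the first is another option).
hourly = raw.resample("h").mean()

# Mark readings judged erroneous as missing instead of dropping the hour.
outlier_times = pd.to_datetime(["2021-02-01 03:00"])
hourly.loc[outlier_times] = np.nan

# Fill the gaps with a cubic spline (requires SciPy). Unlike a mean or median
# fill, the spline follows the local shape of the neighboring hours.
cleaned = hourly.interpolate(method="spline", order=3)
print(cleaned)
```

Resampling first guarantees one row per hour, so the demand and weather series can then be joined on a shared hourly index.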

Preprocessing: Handling Non-stationarity

The next major hurdle in a time series project is understanding the statistical properties of the data. In particular, great care must be taken to make the data stationary before modeling; a stationary series is one whose statistical properties, such as its mean and variance, do not change over time.

ADF and KPSS Tests

I first ran my data through the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests to check for stationarity. The ADF test checks for the presence of a unit root in the time series, and hence helps determine whether the series is stationary. The null and alternate hypotheses of this test are:

Null Hypothesis: The series has a unit root.

Alternate Hypothesis: The series has no unit root.

If the null hypothesis cannot be rejected, the test may provide evidence that the series is non-stationary.

The ADF test returned a p-value of 0.66 for my data. At a significance level of 0.05, this p-value is far too high to reject the null hypothesis, so the ADF test suggests the series is non-stationary.

KPSS is another test for checking the stationarity of a time series. Its null and alternate hypotheses are the opposite of those for the ADF test.

Null Hypothesis: The process is trend stationary.

Alternate Hypothesis: The series has a unit root (series is not stationary).

The KPSS test returned a p-value of 0.046. Since this is below the 0.05 significance level, we reject the null hypothesis in favor of the alternative, so the KPSS test also indicates that the series is non-stationary.

ACF and PACF Plots

We can get a sense of the data and glean insights into its non-stationarity through visualizations, such as by plotting the raw time series, the autocorrelation plot, and the partial autocorrelation plot.

Looking at the plot of the raw time series on top, we can already see some signs of non-stationarity: the mean is not constant, and the repetitive, fork-like intervals suggest some kind of seasonality. Because these intervals keep a similar amplitude and frequency throughout, the series looks additive rather than multiplicative.

The ACF plot on the left shows a strong correlation between the observation at time t and its neighboring observations, suggesting an autoregressive process. The peaks in the plot also show a strong correlation between the observation at time t and the one at time t-24. This confirms at least one seasonal component, most likely a daily one, that we need to remove before modeling.

The PACF plot shows a sharp drop-off at the second lag, again pointing to an autoregressive process rather than a moving average process.

Based on these findings, we can try differencing at lag 24 and lag 1 to see whether that makes the data stationary. After differencing at lag 24, most of the visible seasonality has been handled: the repetitive, fork-like intervals have disappeared. However, the autocorrelation function still has too many significant lags. To remove them, let’s take first-order differences, subtracting from the series a copy of itself shifted by one lag.

After both seasonal and first-order differencing, most of the significant autocorrelations are gone and the remaining values oscillate around 0. The differenced data passed both the ADF and KPSS tests, indicating that it is now stationary enough to be modeled.

In summary, these graphs and tests build an intuition for the time series data and help inform which models to use and what their parameters should be.


I made a chronological train/test split on the four weeks of hourly data, holding out the last 5 days of February as the test set (roughly an 80/20 split). For my baseline, I built an ARIMA model with one autoregressive term, one order of differencing, and no moving average term, i.e. ARIMA(1, 1, 0).

Next, I fit a seasonal ARIMA model with exogenous variables (SARIMAX). I ran a grid search over parameter lists derived from the plots and statistical tests above, using both temperature and day of week as exogenous variables, and selected the model with the lowest AIC.

I also wanted a model that could account for complex seasonality, so I built a TBATS model, which is designed for such cases. Finally, I fed my data into a Facebook Prophet model, which can handle complex seasonality and hidden trends, and compared the root mean squared error (RMSE) of all of my models with that of the government’s day-ahead forecast.

Here we have a plot of percent error for my models, where percent error is the RMSE divided by the mean of the test set. The ARIMA model performed so poorly that I left it out: its forecast quickly flattened out to a near-constant value, which goes to show that accounting for seasonality is crucial for an accurate forecast. The seasonal ARIMA model performed well but was not as adaptive as TBATS and Prophet. As I expected, Prophet had the lowest percent error. Importantly, all three of these models outperformed the government’s (EIA) day-ahead demand forecast.
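The percent-error metric described above is simple to compute directly. This sketch uses toy numbers, not values from the project.

```python
import numpy as np


def rmse(actual, forecast) -> float:
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def percent_error(actual, forecast) -> float:
    """RMSE as a percentage of the mean of the test set."""
    return 100.0 * rmse(actual, forecast) / float(np.mean(actual))


# Toy values (made up) to show the calculation.
actual = [1000, 1100, 1200, 1100]
forecast = [1010, 1090, 1230, 1080]
print(round(rmse(actual, forecast), 2), round(percent_error(actual, forecast), 2))
```

Normalizing by the test-set mean makes models comparable on a single scale, though it is not the same as MAPE, which averages per-hour percentage errors.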

This is Prophet’s 5-day hourly electricity demand forecast, with the forecast in orange and the actual values in blue. It performed quite well on the test set, missing the actual demand by about 87 MWh on average. It also tended to overestimate demand, especially around the peaks and troughs. After day 3 or so, the errors start to increase on average, suggesting my model may be most useful for short-term forecasts.
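To check claims like "errors increase after day 3" and "tends to overestimate," the test-set errors can be broken down by forecast day. This is a hypothetical helper, not the project's code: bias is defined here as the mean of forecast minus actual, so a positive value means overestimation, and the toy data below simply grows the error by 10 MWh per day.

```python
import numpy as np
import pandas as pd


def daily_error_summary(actual: pd.Series, forecast: pd.Series) -> pd.DataFrame:
    """Per-day RMSE and bias for an hourly forecast; day 1 is the first 24 hours.

    Bias is the mean of (forecast - actual), so a positive value means the
    model overestimated demand on that day.
    """
    err = forecast - actual
    day = np.arange(len(err)) // 24 + 1
    grouped = err.groupby(day)
    return pd.DataFrame({
        "rmse": grouped.agg(lambda e: np.sqrt(np.mean(e ** 2))),
        "bias": grouped.mean(),
    })


# Toy 5-day example (made-up numbers): the error grows by 10 MWh each day.
idx = pd.date_range("2021-02-24", periods=24 * 5, freq="h")
actual = pd.Series(1000.0, index=idx)
forecast = actual + np.repeat([10.0, 20.0, 30.0, 40.0, 50.0], 24)
summary = daily_error_summary(actual, forecast)
print(summary)
```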


My best model outperforms the government’s forecast by about 12 MWh on average and has less forecast bias. As we saw before, Prophet tends to overestimate hourly demand, by about 32 MWh. The government forecast, by contrast, tends to underestimate, which can have disastrous social consequences, such as blackouts caused by insufficient power supply. With the current cost of electricity in Seattle at 11.3 cents per kWh, implementing my model could potentially save the people of Seattle and Seattle City Light hundreds of dollars an hour in consumption and generation costs.
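A quick back-of-envelope check of the savings figure, using only the numbers quoted above. It rests on a simplifying assumption that each MWh of avoided forecast error is worth the retail rate of 11.3 cents per kWh. Under that assumption the 12 MWh improvement comes to about $1,356 per hour, which is best read as a rough ceiling, since the utility's actual avoided cost per MWh may be lower than the retail price.

```python
# Inputs quoted in the text above.
error_reduction_mwh = 12.0      # average hourly improvement over the EIA forecast
price_usd_per_kwh = 0.113       # Seattle retail rate (11.3 cents per kWh)
KWH_PER_MWH = 1_000

# Assumption: every MWh of avoided error is valued at the retail rate.
value_per_hour = error_reduction_mwh * KWH_PER_MWH * price_usd_per_kwh
value_per_day = value_per_hour * 24

print(f"~${value_per_hour:,.0f} per hour, ~${value_per_day:,.0f} per day")
```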

Our nation’s electric infrastructure is aging and is being pushed to do more than it was originally designed to do. Modernizing the grid through better forecasting will make it smarter and more resilient, deliver electricity more reliably and efficiently, and can reduce peak loads and lower operational costs for utilities like Seattle City Light.

