In this article, I will show you how to use the Savitzky-Golay filter in Python and explain how it works. To follow along, you should be familiar with the moving average and linear regression.
The Savitzky-Golay filter has two parameters: the window size and the degree of the polynomial.
The window size parameter specifies how many data points will be used to fit a polynomial regression function. The second parameter specifies the degree of the fitted polynomial function (if we choose 1 as the polynomial degree, we end up using a linear regression function).
In every window, a new polynomial is fitted, which gives us the effect of smoothing the input dataset.
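To connect the two parameters with the moving average mentioned at the beginning, here is a minimal sketch on synthetic data (the random walk and the variable names are my assumptions, not part of the Bitcoin dataset). With a polynomial degree of 1 and a symmetric window, the value of the fitted line at the center of the window equals the mean of the window, so for interior points the filter should behave like a plain moving average.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic random walk, used only for illustration
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=30))

window = 5
half = window // 2

# Savitzky-Golay filter with a first-degree polynomial (linear regression)
sg_linear = savgol_filter(signal, window_length=window, polyorder=1)

# Plain moving average over the same window size (only full windows)
moving_avg = np.convolve(signal, np.ones(window) / window, mode="valid")

# Compare interior points only; the first and last `half` points are
# handled separately by savgol_filter because no full window exists there
interior_matches = np.allclose(sg_linear[half:-half], moving_avg)
print(interior_matches)
```

With a higher polynomial degree the two results no longer coincide, because the fitted curve can bend inside the window.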
Take a look at the following animation (Source: Wikipedia, Author: Cdang, Licence: CC BY-SA 3.0).
[Animation: Savitzky-Golay smoothing]
In every step, the window moves and a different part of the original dataset is used. Then, the local polynomial function is fitted to the data in the window, and a new data point is calculated using the polynomial function. After that, the window moves to the next part of the dataset, and the process repeats.
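The procedure described above can be sketched by hand. This is a simplified illustration, not how SciPy implements the filter: `savgol_by_hand` is a hypothetical helper I made up for this example, the data is synthetic, and the points near the edges (where a full window does not fit) are left as NaN.

```python
import numpy as np
from scipy.signal import savgol_filter


def savgol_by_hand(y, window_length, polyorder):
    """Hypothetical helper: fit one polynomial per window, keep its center value."""
    y = np.asarray(y, dtype=float)
    half = window_length // 2
    # Local x coordinates, centered so that the window center is at 0
    x_local = np.arange(-half, half + 1)
    smoothed = np.full(len(y), np.nan)
    for center in range(half, len(y) - half):
        window = y[center - half:center + half + 1]
        # Least-squares polynomial fit to the points in the current window
        coeffs = np.polyfit(x_local, window, polyorder)
        # The new data point is the fitted polynomial evaluated at the center
        smoothed[center] = np.polyval(coeffs, 0)
    return smoothed


# Synthetic noisy signal, used only for illustration
rng = np.random.default_rng(42)
noisy = np.sin(np.linspace(0, 3, 40)) + rng.normal(scale=0.1, size=40)

manual = savgol_by_hand(noisy, window_length=5, polyorder=2)
reference = savgol_filter(noisy, window_length=5, polyorder=2)

# Interior points should agree with SciPy up to floating-point error
print(np.allclose(manual[2:-2], reference[2:-2]))
```

Refitting a polynomial in every window is slow for long series, but it makes every step of the procedure visible.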
Python
Here is a dataset of daily Bitcoin prices from 2019-07-19 to 2019-08-17.
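import matplotlib.pyplot as plt

# bitcoin is assumed to be a pandas DataFrame of daily prices, loaded beforehand
bitcoin.plot()
plt.title('Bitcoin price: 2019-07-19 - 2019-08-17')
plt.xlabel('Day')
plt.ylabel('BTC price in USD')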
[Figure: Bitcoin price: 2019-07-19 - 2019-08-17]
I’m going to smooth the data in 5-day windows, once using a first-degree polynomial and once using a second-degree polynomial.
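from scipy.signal import savgol_filter

# btc is assumed to be the one-dimensional series of prices taken from the bitcoin DataFrame
# Second-degree polynomial fitted in 5-point windows
smoothed_2dg = savgol_filter(btc, window_length=5, polyorder=2)
smoothed_2dg
# First-degree polynomial (linear regression) fitted in 5-point windows
smoothed_1dg = savgol_filter(btc, window_length=5, polyorder=1)
smoothed_1dg
# Store both results next to the original prices
bitcoin['smoothed_2dg'] = smoothed_2dg
bitcoin['smoothed_1dg'] = smoothed_1dg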
When we plot the result, we see the original data and the two smoothed time series.
bitcoin.plot()
plt.title('Bitcoin price: 2019-07-19 - 2019-08-17')
plt.xlabel('Day')
plt.ylabel('BTC price in USD')
[Figure: Bitcoin price with the smoothed_2dg and smoothed_1dg series]
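One way to see why the two smoothed series differ is to look at the weights the filter applies inside a window. For a fixed window size and polynomial degree, fitting the polynomial and evaluating it at the center is equivalent to taking a weighted average of the points in the window, and SciPy exposes those weights through `scipy.signal.savgol_coeffs`. The sketch below uses a synthetic series (an assumption of mine, since the Bitcoin data is not included here).

```python
import numpy as np
from scipy.signal import savgol_coeffs, savgol_filter

# Weights for a 5-point window with a first- and a second-degree polynomial
coeffs_1dg = savgol_coeffs(5, 1)  # equal weights, i.e. a moving average
coeffs_2dg = savgol_coeffs(5, 2)  # [-3, 12, 17, 12, -3] / 35
print(coeffs_1dg)
print(coeffs_2dg)

# Synthetic series, used only for illustration
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=30))

# Applying the weights as a convolution reproduces the interior of the filter output
by_convolution = np.convolve(series, coeffs_2dg, mode="valid")
by_filter = savgol_filter(series, window_length=5, polyorder=2)
print(np.allclose(by_convolution, by_filter[2:-2]))
```

The second-degree weights put almost half of the total weight on the center point, while the first-degree weights spread it evenly across the window. That is why the second-degree result tends to stay closer to the original data and the first-degree result looks smoother.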