Let’s say you’ve got some data points and you make a scatterplot:

You might say *great!* and call it a day. But what if we want to see the behavior of the data between these points? Linear interpolation is a simple way to connect the dots:
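As a rough sketch of that step, assuming NumPy and a made-up sample of nine points taken from a sine curve (not the data plotted in this post), `np.interp` evaluates the straight-line segments between neighboring points:

```python
import numpy as np

# Made-up sample for illustration: nine points taken from a sine curve.
x = np.linspace(0, 2 * np.pi, 9)
y = np.sin(x)

# Evaluate the piecewise-linear interpolant on a dense grid.
x_dense = np.linspace(x[0], x[-1], 400)
y_linear = np.interp(x_dense, x, y)

# Halfway between two samples, the value is just the average of the two.
midpoint = np.interp((x[1] + x[2]) / 2, x, y)
print(midpoint, (y[1] + y[2]) / 2)
```

Because each segment is a straight line, the interpolated curve can never rise above the largest sample or fall below the smallest one.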

And now at this point you might say *great!* and call it a day. Or, you might decide that you can do better than linear interpolation. That sure does look like a sine curve. But you’re working with a collection of discrete points, and you wouldn’t want to erroneously just plot a sine function. Instead, you can reach for a smoother interpolation function, such as a spline:

The term “spline” refers to a wide class of functions involving interpolation and smoothing. In data viz, we often see the basis spline (or B-spline). Think of spline interpolation like a flexible ruler that bends to pass smoothly through all your data points, but in doing so, it might sometimes bend too much or too little. Sometimes the spline overshoots, introducing peaks or valleys that weren’t there in the original data.
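Here is a minimal sketch of a spline fit, assuming SciPy is available and reusing a made-up sample of nine sine points. `make_interp_spline` with `k=3` builds a cubic interpolating B-spline; this is one of several spline constructors SciPy offers, and not necessarily the one used for the plots in this post.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Made-up sample for illustration: nine points taken from a sine curve.
x = np.linspace(0, 2 * np.pi, 9)
y = np.sin(x)

# Cubic (k=3) interpolating B-spline; it passes through every sample.
spline = make_interp_spline(x, y, k=3)

x_dense = np.linspace(x[0], x[-1], 400)
y_spline = spline(x_dense)

# How far does the curve stray from the sine the samples came from?
spline_err = np.max(np.abs(y_spline - np.sin(x_dense)))
print(f"max deviation from sin(x): {spline_err:.4f}")
```

In a real analysis you would not know the underlying function, so a check like `spline_err` is only possible in a toy setup such as this one.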

Sometimes this is okay! Depending on your data, a spline may be ideal for generating a very smooth curve, especially when smoothness is more critical than accurately interpolating between every data point. And when the underlying function is oscillatory, a spline can capture the movement between points quite accurately. But real-world data is often not oscillatory.

Let’s say you’ve got a month’s worth of observed temperatures recorded in the Austin area:

And because temperature varies continuously over time, we could do a simple linear interpolation to show the rate of change between points:

But temperatures are unlikely to ascend or descend on linear gradients, so we could also try a spline:

That’s a bit more natural, but it looks a bit weird, too. Unlike our sine wave sampling from before, the data points here are real, observed daily maximum temperatures. So it’s a little strange that the fitted curve overshoots and undershoots those known values. The interpolation is smooth, but the shape of the data has not been preserved.
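To make the overshoot concrete, here is a small sketch using a made-up series of daily highs (not the Austin observations): four days at 100 °F followed by four days at 90 °F. It assumes SciPy's `make_interp_spline`, whose default cubic fit uses not-a-knot end conditions.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Made-up daily highs (°F): four days at 100, then a front drops them to 90.
days = np.arange(8)
temps = np.array([100.0, 100.0, 100.0, 100.0, 90.0, 90.0, 90.0, 90.0])

spline = make_interp_spline(days, temps, k=3)
day_dense = np.linspace(days[0], days[-1], 701)
temp_spline = spline(day_dense)

# Positive values mean the curve left the range of the observed data.
overshoot = temp_spline.max() - temps.max()
undershoot = temps.min() - temp_spline.min()
print(f"overshoot: {overshoot:.2f} °F, undershoot: {undershoot:.2f} °F")
```

Both numbers come out positive: the fitted curve claims a high above 100 °F and a low below 90 °F, neither of which was ever observed.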

While a spline produces smooth curves, the artifacts of overshooting, undershooting, or unwanted oscillations between data points can misrepresent what the data actually says. Fortunately, we have another option: the PCHIP, or Piecewise Cubic Hermite Interpolating Polynomial. Hermite refers to a method of interpolating data points where both the function values and the derivatives at those points are matched.
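To illustrate what “Hermite” means here, the sketch below uses SciPy's `CubicHermiteSpline`, which takes the slopes as an explicit input. The slopes are an assumption made for the demo: the samples come from a sine curve, so the true derivative (the cosine) is known. PCHIP differs in that it estimates the slopes from the data itself.

```python
import numpy as np
from scipy.interpolate import CubicHermiteSpline

# Made-up sample for illustration: nine points taken from a sine curve.
x = np.linspace(0, 2 * np.pi, 9)
y = np.sin(x)
dydx = np.cos(x)  # assumption: we happen to know the true slope at each sample

hermite = CubicHermiteSpline(x, y, dydx)

# Both the values and the first derivatives are reproduced at the samples.
values_match = np.allclose(hermite(x), y)
slopes_match = np.allclose(hermite(x, 1), dydx)  # second argument: derivative order
print(values_match, slopes_match)
```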

A PCHIP preserves the shape of the data and avoids oscillations. The monotonicity (increasing or decreasing trend) of the data is preserved, ensuring no overshoots between data points. I like to think of PCHIP as a hand that firmly (but not rigidly) guides a curve through the data points without allowing any unnecessary dips or rises.
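A minimal sketch of that behavior, assuming SciPy's `PchipInterpolator` and a made-up series of daily highs (four days at 100 °F, then four days at 90 °F, not the Austin observations):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Made-up daily highs (°F): four days at 100, then a front drops them to 90.
days = np.arange(8)
temps = np.array([100.0, 100.0, 100.0, 100.0, 90.0, 90.0, 90.0, 90.0])

pchip = PchipInterpolator(days, temps)
day_dense = np.linspace(days[0], days[-1], 701)
temp_pchip = pchip(day_dense)

# The curve stays inside the range of the observed values.
stays_in_range = (temp_pchip.max() <= temps.max() + 1e-9) and (
    temp_pchip.min() >= temps.min() - 1e-9
)
print(temp_pchip.max(), temp_pchip.min(), stays_in_range)
```

The interpolated highs never exceed the hottest observed day or dip below the coolest one, and the curve only moves downward across the drop.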

Looks good! This results in a curve that better captures the shape of the data, especially when the slope information is critical. In our case, the slope is critical. It makes no sense for the curve to keep rising past a data point (overshooting) when the next observed value is lower.

But PCHIP isn’t always better than a spline. Let’s apply a PCHIP interpolation to the oscillating data from before:

It’s not wrong, it’s just a little weird and lumpy. It’s a curve that connects the dots, but it has somewhat lost the true movement between points.

PCHIP can flatten a curve aggressively near local extrema. When you need to capture those local extrema, as we did in our temperature plots, PCHIP works well. When you need to capture the smooth movements of oscillatory data, a spline works well. Sometimes it’s fairly intuitive what you need. Sometimes you need to plot it to really see which works better. Other times it takes more thought.
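When you aren't sure, and you have some stand-in for the truth, you can also compare the two numerically. The sketch below is a toy setup: it assumes the samples come from a known sine curve, which real data rarely allows, and uses SciPy's `make_interp_spline` and `PchipInterpolator`.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, make_interp_spline

# Made-up sample for illustration: nine points taken from a sine curve.
x = np.linspace(0, 2 * np.pi, 9)
y = np.sin(x)

x_dense = np.linspace(x[0], x[-1], 400)
truth = np.sin(x_dense)  # only available because this is a toy example

# Largest gap between each interpolant and the curve the samples came from.
spline_err = np.max(np.abs(make_interp_spline(x, y, k=3)(x_dense) - truth))
pchip_err = np.max(np.abs(PchipInterpolator(x, y)(x_dense) - truth))
print(f"spline: {spline_err:.4f}  pchip: {pchip_err:.4f}")
```

On this oscillating sample the spline tracks the underlying curve more closely than PCHIP does, which is the reverse of the temperature case.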

Consider the nature of your data. If your data is smooth and continuous, like a waveform or a gradient, spline interpolation might work well. If your data has sharp changes or you need to preserve the natural shape of the data without introducing artifacts, PCHIP might be the better choice. In practical applications like election modeling, financial forecasting, or engineering metrics, the choice can have significant implications.
