This content originally appeared on DEV Community and was authored by Randhir Kumar
Locally Weighted Linear Regression: When One Line Isn’t Enough (and Why It’s Non-Parametric!)
Hey everyone! My name is Randhir, and as an ethical hacker, machine learning enthusiast, deep learning practitioner, and web developer, I’m constantly exploring algorithms to build better tools like my current AI SaaS project: TailorMails.dev (my personalized cold email tool that crafts outreach based on LinkedIn bios!).
In our journey through Linear Regression, we’ve talked about finding a single set of parameters $\theta$ for our hypothesis $h_\theta(x) = \theta^T x$. But what if the real relationship between $x$ and $y$ isn’t a straight line? Adding polynomial features can lead to overfitting… so, what’s a data scientist to do?
Enter Locally Weighted Linear Regression (LWR), a clever alternative that adapts locally! Let’s dive in!
Addressing Model Fit Issues: Beyond Simple Lines 
Standard linear regression tries to fit one global line (or hyperplane) through all your data. This can lead to problems:
- Underfitting: If the true relationship between $x$ and $y$ is non-linear, a simple linear function simply can’t capture it. Your model will perform poorly, even on training data.
- Overfitting: To compensate for non-linearity, one might add many polynomial features (e.g., $x^2, x^3$). While this can fit the training data perfectly, it often leads to a model that’s too complex and performs terribly on new, unseen data. It essentially “memorizes” the training examples rather than learning the underlying pattern (see the quick sketch below).
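To make the contrast concrete, here is a small illustrative sketch of both failure modes (my own example, not from the original post): fitting the same noisy, non-linear data with a degree-1 and a degree-9 global polynomial in NumPy.

```python
# Illustrative sketch only: under- vs. over-fitting the same noisy, non-linear data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # non-linear truth + noise

underfit = np.polyfit(x, y, deg=1)   # one global line: too simple, misses the curve
overfit = np.polyfit(x, y, deg=9)    # many polynomial features: can chase the noise

x_new = np.linspace(0, 1, 5)
print(np.polyval(underfit, x_new))   # smooth but biased predictions
print(np.polyval(overfit, x_new))    # wiggly, noise-sensitive predictions
```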
LWR aims to sidestep these issues by making the choice of features “less critical,” assuming you have enough training data. It’s about adapting the model locally.
Core Mechanism: Weighted Least Squares
Instead of fitting one $\theta$ for the entire dataset, LWR takes a different approach:
- Local Fitting: For every specific query point $x$ where you want a prediction, LWR computes a new set of parameters $\theta$. This means the model isn’t global; it’s tailored to the specific region around your prediction point.
- Weighted Cost Function: This “local” fitting is achieved by minimizing a weighted least-squares cost function: $\sum_{i=1}^n w^{(i)}(y^{(i)} - \theta^T x^{(i)})^2$. Here, the $w^{(i)}$ are non-negative weights. Intuitively, they dictate how much influence each training example’s error ($y^{(i)} - \theta^T x^{(i)}$) has on determining the $\theta$ for this specific query point $x$ (a minimal implementation sketch appears after this list).
- The Gaussian Kernel Weights: A common and effective choice for these weights is a Gaussian kernel: $w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$. This formula is key! It means:
  - Training examples $x^{(i)}$ that are closer to the query point $x$ will have a $(x^{(i)} - x)^2$ value close to zero, making $w^{(i)}$ close to $\exp(0) = 1$. They get a very high “weight” or importance.
  - Training examples $x^{(i)}$ that are farther from $x$ will have a large $(x^{(i)} - x)^2$, causing $w^{(i)}$ to rapidly approach zero. They are given very little importance.
- The Bandwidth Parameter $\tau$ (tau): This crucial parameter controls how quickly the weight of a training example diminishes with distance.
  - A small $\tau$ means weights drop off very quickly, leading to a “very local” fit (potentially overfitting if too small).
  - A large $\tau$ means weights drop off slowly, making the fit more “global” (closer to standard linear regression).

It’s important to remember that these weights $w^{(i)}$ are deterministic values based on distance, not random variables, despite the Gaussian form. If $x$ is a vector, the distance is typically Euclidean.
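Putting the pieces together, here is a minimal NumPy sketch of a single LWR prediction, under my own assumptions: the inputs already include an intercept column, the function name `lwr_predict` and its default `tau` are mine, and the local $\theta$ comes from the closed-form weighted normal equation rather than gradient descent.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Sketch of one LWR prediction: re-fit theta locally around x_query.

    X       : (n, d) training inputs, first column assumed to be all ones (intercept)
    y       : (n,) training targets
    x_query : (d,) query point, laid out like a row of X
    tau     : bandwidth controlling how fast weights decay with distance
    """
    # Gaussian kernel weights: near 1 for close points, near 0 for far ones.
    dists_sq = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-dists_sq / (2 * tau ** 2))

    # Minimize sum_i w_i (y_i - theta^T x_i)^2 via the weighted normal equation:
    # theta = (X^T W X)^{-1} X^T W y
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    return float(x_query @ theta)
```

The `tau` argument here plays exactly the role of the bandwidth parameter described in the list above.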
Non-Parametric Nature: A Different Kind of Model 
LWR is often introduced as a prime example of a non-parametric algorithm. This is a significant distinction from what we’ve seen so far:
- Parametric Algorithms (e.g., Standard Linear Regression):
  - Have a fixed, finite number of parameters (the $\theta_j$’s).
  - Once these parameters are learned from the data, the original training data is no longer needed to make future predictions. You just need the $\theta$ values.
- Non-Parametric Algorithms (e.g., LWR):
  - The “complexity” of the hypothesis (the amount of information needed to represent $h$) grows linearly with the size of the training set.
  - To make any prediction for a new query point $x$, the entire training set must be kept available because the model parameters $\theta$ are re-computed for each new query.
This “non-parametric” nature is both a strength (adaptability) and a weakness (computational cost for large datasets and predictions).
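To see that trade-off in code, here is a small usage sketch (again my own, assuming the hypothetical `lwr_predict` from the earlier snippet is in scope): every single query re-solves a weighted fit against the full training set, which is why the data must be kept around.

```python
import numpy as np

rng = np.random.default_rng(1)
x_raw = np.sort(rng.uniform(0, 3, size=200))
y = np.sin(x_raw) + rng.normal(scale=0.1, size=x_raw.shape)
X = np.column_stack([np.ones_like(x_raw), x_raw])        # intercept column + feature

# Unlike a parametric model, X and y must stay available at prediction time:
# each query point triggers its own weighted least-squares solve.
queries = np.column_stack([np.ones(5), np.linspace(0.5, 2.5, 5)])
preds = [lwr_predict(X, y, q, tau=0.3) for q in queries]
print(np.round(preds, 3))
```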
Placement and Importance in the Text
While LWR provides an elegant solution for non-linearity and offers a glimpse into different model complexities, it’s often labeled as “optional reading” in foundational texts. This suggests it might be considered less fundamental than the core LMS algorithm or The Normal Equations for an initial grasp of linear regression.
However, it beautifully illustrates diverse strategies for handling complex data relationships beyond simply adding more global polynomial features. It shows that sometimes, a local approach can be more flexible and robust!
Wrapping Up
Locally Weighted Linear Regression offers a fascinating departure from global model fitting in linear regression. By re-computing parameters locally for each prediction using weighted least squares, it effectively handles non-linear relationships without explicit feature engineering. Its non-parametric nature is a key concept, highlighting that not all models can “forget” their training data.
As I continue to build out my AI SaaS tool, TailorMails.dev, exploring these nuances in algorithms helps me choose the right tool for the right job, balancing complexity, performance, and interpretability.
If you found this helpful or insightful, consider supporting my work! You can grab me a virtual coffee here: https://buymeacoffee.com/randhirbuilds. Your support helps me keep learning, building, and sharing!