This content originally appeared on DEV Community and was authored by Randhir Kumar
Locally Weighted Linear Regression: When One Line Isn’t Enough (and Why It’s Non-Parametric!)
Hey everyone! My name is Randhir, and as an ethical hacker, machine learning enthusiast, deep learning practitioner, and web developer, I’m constantly exploring algorithms to build better tools like my current AI SaaS project: TailorMails.dev (my personalized cold email tool that crafts outreach based on LinkedIn bios!).
In our journey through Linear Regression, we’ve talked about finding a single set of parameters $\theta$ for our hypothesis $h_\theta(x) = \theta^T x$. But what if the real relationship between $x$ and $y$ isn’t a straight line? Adding polynomial features can lead to overfitting… so, what’s a data scientist to do?
Enter Locally Weighted Linear Regression (LWR), a clever alternative that adapts locally! Let’s dive in!
Addressing Model Fit Issues: Beyond Simple Lines 
Standard linear regression tries to fit one global line (or hyperplane) through all your data. This can lead to problems:
- Underfitting: If the true relationship between $x$ and $y$ is non-linear, a simple linear function simply can’t capture it. Your model will perform poorly, even on training data.
- Overfitting: To compensate for non-linearity, one might add many polynomial features (e.g., $x^2, x^3$). While this can fit the training data perfectly, it often leads to a model that’s too complex and performs terribly on new, unseen data. It essentially “memorizes” the training examples rather than learning the underlying pattern (see the quick sketch below).
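To make the contrast concrete, here is a small illustrative sketch of both failure modes (my own example, not from the original post): fitting the same noisy, non-linear data with a degree-1 and a degree-9 global polynomial in NumPy.

```python
# Illustrative sketch only: under- vs. over-fitting the same noisy, non-linear data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # non-linear truth + noise

underfit = np.polyfit(x, y, deg=1)   # one global line: too simple, misses the curve
overfit = np.polyfit(x, y, deg=9)    # many polynomial features: can chase the noise

x_new = np.linspace(0, 1, 5)
print(np.polyval(underfit, x_new))   # smooth but biased predictions
print(np.polyval(overfit, x_new))    # wiggly, noise-sensitive predictions
```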
LWR aims to sidestep these issues by making the choice of features “less critical,” assuming you have enough training data. It’s about adapting the model locally.
Core Mechanism: Weighted Least Squares
Instead of fitting one $\theta$ for the entire dataset, LWR takes a different approach:
- Local Fitting: For every specific query point $x$ where you want a prediction, LWR computes a new set of parameters $\theta$. This means the model isn’t global; it’s tailored to the specific region around your prediction point.
- Weighted Cost Function: This “local” fitting is achieved by minimizing a weighted least-squares cost function: $\sum_{i=1}^n w^{(i)}(y^{(i)} - \theta^T x^{(i)})^2$. Here, the $w^{(i)}$ are non-negative weights. Intuitively, they dictate how much influence each training example’s error ($y^{(i)} - \theta^T x^{(i)}$) has on determining the $\theta$ for this specific query point $x$ (a minimal implementation sketch appears after this list).
- The Gaussian Kernel Weights: A common and effective choice for these weights is a Gaussian kernel: $w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$. This formula is key! It means:
  - Training examples $x^{(i)}$ that are closer to the query point $x$ will have a $(x^{(i)} - x)^2$ value close to zero, making $w^{(i)}$ close to $\exp(0) = 1$. They get a very high “weight” or importance.
  - Training examples $x^{(i)}$ that are farther from $x$ will have a large $(x^{(i)} - x)^2$, causing $w^{(i)}$ to rapidly approach zero. They are given very little importance.
- The Bandwidth Parameter $\tau$ (tau): This crucial parameter controls how quickly the weight of a training example diminishes with distance.
  - A small $\tau$ means weights drop off very quickly, leading to a “very local” fit (potentially overfitting if too small).
  - A large $\tau$ means weights drop off slowly, making the fit more “global” (closer to standard linear regression).

It’s important to remember that these weights $w^{(i)}$ are deterministic values based on distance, not random variables, despite the Gaussian form. If $x$ is a vector, the distance is typically Euclidean.
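Putting the pieces together, here is a minimal NumPy sketch of a single LWR prediction, under my own assumptions: the inputs already include an intercept column, the function name `lwr_predict` and its default `tau` are mine, and the local $\theta$ comes from the closed-form weighted normal equation rather than gradient descent.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Sketch of one LWR prediction: re-fit theta locally around x_query.

    X       : (n, d) training inputs, first column assumed to be all ones (intercept)
    y       : (n,) training targets
    x_query : (d,) query point, laid out like a row of X
    tau     : bandwidth controlling how fast weights decay with distance
    """
    # Gaussian kernel weights: near 1 for close points, near 0 for far ones.
    dists_sq = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-dists_sq / (2 * tau ** 2))

    # Minimize sum_i w_i (y_i - theta^T x_i)^2 via the weighted normal equation:
    # theta = (X^T W X)^{-1} X^T W y
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    return float(x_query @ theta)
```

The `tau` argument here plays exactly the role of the bandwidth parameter described in the list above.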
Non-Parametric Nature: A Different Kind of Model 
LWR is often introduced as a prime example of a non-parametric algorithm. This is a significant distinction from what we’ve seen so far:
- Parametric Algorithms (e.g., Standard Linear Regression):
  - Have a fixed, finite number of parameters (the $\theta_j$’s).
  - Once these parameters are learned from the data, the original training data is no longer needed to make future predictions. You just need the $\theta$ values.
- Non-Parametric Algorithms (e.g., LWR):
  - The “complexity” of the hypothesis (the amount of information needed to represent $h$) grows linearly with the size of the training set.
  - To make any prediction for a new query point $x$, the entire training set must be kept available because the model parameters $\theta$ are re-computed for each new query.
This “non-parametric” nature is both a strength (adaptability) and a weakness (computational cost for large datasets and predictions).
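To see that trade-off in code, here is a small usage sketch (again my own, assuming the hypothetical `lwr_predict` from the earlier snippet is in scope): every single query re-solves a weighted fit against the full training set, which is why the data must be kept around.

```python
import numpy as np

rng = np.random.default_rng(1)
x_raw = np.sort(rng.uniform(0, 3, size=200))
y = np.sin(x_raw) + rng.normal(scale=0.1, size=x_raw.shape)
X = np.column_stack([np.ones_like(x_raw), x_raw])        # intercept column + feature

# Unlike a parametric model, X and y must stay available at prediction time:
# each query point triggers its own weighted least-squares solve.
queries = np.column_stack([np.ones(5), np.linspace(0.5, 2.5, 5)])
preds = [lwr_predict(X, y, q, tau=0.3) for q in queries]
print(np.round(preds, 3))
```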
Placement and Importance in the Text
While LWR provides an elegant solution for non-linearity and offers a glimpse into different model complexities, it’s often labeled as “optional reading” in foundational texts. This suggests it might be considered less fundamental than the core LMS algorithm or The Normal Equations for an initial grasp of linear regression.
However, it beautifully illustrates diverse strategies for handling complex data relationships beyond simply adding more global polynomial features. It shows that sometimes, a local approach can be more flexible and robust!
Wrapping Up
Locally Weighted Linear Regression offers a fascinating departure from global model fitting in linear regression. By re-computing parameters locally for each prediction using weighted least squares, it effectively handles non-linear relationships without explicit feature engineering. Its non-parametric nature is a key concept, highlighting that not all models can “forget” their training data.
As I continue to build out my AI SaaS tool, TailorMails.dev, exploring these nuances in algorithms helps me choose the right tool for the right job, balancing complexity, performance, and interpretability.
If you found this helpful or insightful, consider supporting my work! You can grab me a virtual coffee here: https://buymeacoffee.com/randhirbuilds. Your support helps me keep learning, building, and sharing!