Understanding Linear Regression Inside-Out: Practical Implementation with scikit-learn and TensorFlow



This content originally appeared on DEV Community and was authored by John Ojo

Welcome back to Part 2 of the series:
“Understanding Linear Regression Inside-Out”

In Part 1, we explored the foundations of linear regression by building it from scratch using NumPy. We delved into concepts like prediction, loss, gradients, and gradient descent, gaining an inside-out understanding of how linear regression models learn.

While implementing these algorithms manually is essential for learning, real-world machine learning applications require more efficient, scalable, and production-ready tools. In this second part, we shift our focus to practical implementations using two widely used machine learning libraries: scikit-learn and TensorFlow.

We’ll use the same dataset with identical data loading logic, so we’ll skip that section (refer to Part 1’s Data Processing section if needed). Although scikit-learn’s LinearRegression doesn’t strictly require normalized features, we’ll keep the same preprocessing steps so the results stay directly comparable with Part 1.
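For reference, here’s a minimal sketch of that normalization step using scikit-learn’s StandardScaler; the exact helper used in Part 1 may differ, and X_train, X_val, and X_test are assumed to come from the data loading logic there.

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then reuse it for the
# validation and test sets to avoid data leakage
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)
X_val_norm = scaler.transform(X_val)
X_test_norm = scaler.transform(X_test)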

Let’s dive in.

The source code is available on GitHub: Linear Regression Using Library

Prerequisites
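The prerequisites from Part 1 carry over; in addition, you’ll need scikit-learn and TensorFlow installed (e.g., pip install scikit-learn tensorflow). The imports below are what the snippets in this post assume; utils is the small plotting helper module from the Part 1 code (its name is inferred from the calls further down).

# Libraries used throughout this post
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import tensorflow as tf

import utils  # plotting helpers from Part 1 (provides plot_predictions)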

scikit-learn Implementation

scikit-learn’s LinearRegression model is straightforward to use. You instantiate it, fit it to your training data, and then call predict() on your test data. During the fitting process, the model learns the best parameters (weights and bias), the same goal we achieved with gradient descent in Part 1.

# Train Linear Regression model using sklearn
# Initialize and train the Linear Regression model
sklearn_model = LinearRegression()
sklearn_model.fit(X_train_norm, y_train)

# Make predictions on validation and test sets
val_predictions = sklearn_model.predict(X_val_norm)
y_predict_sklearn = sklearn_model.predict(X_test_norm)

# Calculate mean squared error for both validation and test sets
val_loss = mean_squared_error(y_val, val_predictions)
test_loss = mean_squared_error(y_test, y_predict_sklearn)

That’s all you need to train a Linear Regression model using scikit-learn. From there, you can evaluate performance with metrics like mean squared error and plot actual vs. predicted values.

print(f"Validation MSE: {val_loss:.4f}")
print(f"Test MSE: {test_loss:.4f}")

# Create a plot comparing actual vs predicted sales
utils.plot_predictions(y_test, y_predict_sklearn, 'Predicted vs Actual Sales', 'Actual Sales', 'Predicted Sales')

# Print the first 25 actual and predicted values for comparison
print("Actual vs predicted values:")
for i in range(25):
    print(f"Actual: {y_test[i]}, Predicted: {y_predict_sklearn[i]:.1f}")

scikit-learn Predicted vs Actual plot

If you compare this plot to the one from Part 1, you’ll see that we get essentially the same results.
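You can also inspect the parameters scikit-learn learned: the fitted model exposes them as coef_ and intercept_, which correspond directly to the weights and bias we obtained via gradient descent in Part 1.

# Inspect the learned parameters (the counterparts of Part 1's weights and bias)
print(f"Weights: {sklearn_model.coef_}")
print(f"Bias: {sklearn_model.intercept_:.4f}")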

TensorFlow Implementation

TensorFlow takes a neural network approach. We’ll create a simple one-layer neural network, use summary() to view the model architecture, and call model.fit() to learn the best parameters (weights and bias), just like our gradient descent implementation in Part 1. Finally, we’ll call predict() on the test data to generate predictions.

# Train Linear Regression model using TensorFlow
# Define a simple Sequential model with a Dense layer (1 unit for linear regression)
tf_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train_norm.shape[1],)),
    tf.keras.layers.Dense(1)  # Linear regression (no activation)
])

# Display model architecture
tf_model.summary()

# Configure the model training parameters
tf_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss='mse', # Mean Squared Error loss
    metrics=['mae'] # Track Mean Absolute Error during training
)

# Train the model
history = tf_model.fit(
    X_train_norm, y_train,
    validation_data=(X_val_norm, y_val),
    epochs=200,
    batch_size=32
)

# Evaluate on test data
test_loss, test_mae = tf_model.evaluate(X_test_norm, y_test, verbose=0)

# Generate predictions on test data
y_predict_tf = tf_model.predict(X_test_norm).flatten() # Flatten converts 2D array to 1D for easier comparison
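As a side note, fit() returns a History object that records the loss per epoch. If you’d like to visualize how training progressed, here’s a quick optional sketch (it assumes matplotlib, which isn’t part of the original script):

import matplotlib.pyplot as plt

# Plot the training and validation MSE recorded during fit()
plt.plot(history.history['loss'], label='Training loss (MSE)')
plt.plot(history.history['val_loss'], label='Validation loss (MSE)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()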

That’s all you need to train a Linear Regression model using TensorFlow. As before, you can evaluate performance with metrics like mean squared error and plot actual vs. predicted values.

print(f"Test Loss (MSE): {test_loss:.4f}")
print(f"Test Mean Absolute Error: {test_mae:.4f}")

# Create a plot comparing actual vs predicted sales
utils.plot_predictions(y_test, y_predict_tf, 'Predicted vs Actual Sales', 'Actual Sales', 'Predicted Sales')

# Print the first 25 actual and predicted values for comparison
print("Actual vs predicted values:")
for i in range(25):
    print(f"Actual: {y_test[i]}, Predicted: {y_predict_tf[i]:.1f}")

TensorFlow Predicted vs Actual plot

Comparing this output to our results from Part 1 and sklearn, you’ll see they are virtually identical.
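Beyond the plots, one way to verify this is to compare the learned parameters themselves. A minimal sketch, assuming both fitted models from above are still in scope:

# Extract the Dense layer's kernel (weights) and bias from the TensorFlow model
tf_weights, tf_bias = tf_model.layers[0].get_weights()

print(f"sklearn weights: {sklearn_model.coef_}")
print(f"TensorFlow weights: {tf_weights.flatten()}")
print(f"sklearn bias: {sklearn_model.intercept_:.4f}, TensorFlow bias: {tf_bias[0]:.4f}")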

Conclusion

In this second part of the series, we transitioned from building linear regression from scratch to using powerful machine learning libraries, scikit-learn and TensorFlow, to implement the same model more efficiently.

What’s particularly rewarding is how both approaches — manual and library-based — converged to similar results. This reinforces the foundational concepts covered in Part 1, while also demonstrating the real-world value of leveraging mature frameworks for faster development and scalability.

Understanding what’s happening under the hood, while also knowing how to use the tools available, gives you a balanced and confident footing in your machine learning journey.

Thanks for following along! If you found this helpful or have questions, feel free to reach out. You can also buy me a coffee — happy learning!

