Machine Learning Guide (Part 1)



This content originally appeared on DEV Community and was authored by Youssof Naghibi

Machine Learning Guide

Quick Info

Audience: This guide is made for beginners with basic knowledge of Python programming.

Goal: Introduction to this guide series.

Resources: On my GitHub page you can download the whole guide as a PDF or find the links to all parts of this series.

Last Edit: 2025 April 03

Credits: This guide is inspired by chapter 2 of “Hands-On Machine Learning” by Aurélien Géron. I am in no way associated with the author. This guide does not replicate any parts of the book, and the code presented here is based on publicly available source code (see Colab).

Introduction

I want to use this introduction to briefly explain how to learn the basics of machine learning, because the topic can be quite intimidating for newcomers with little background knowledge. Even without much knowledge of Python, you can learn the language on the fly by following this guide. If you want more preparation, you should get familiar with the most basic concepts (variables, lists, tuples, dictionaries, functions, loops, if-else statements). You will also encounter other concepts like lambda functions or classes, but our use cases are rather simple.
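As a quick self-check, the concepts listed above can be sketched in a few lines. This is only an illustrative example; all names (`price`, `rooms`, `describe`, `House`) are made up for this sketch and are not part of the later guide code.

```python
# Variables and basic containers
price = 250_000                        # variable holding an integer
rooms = [3, 4, 2]                      # list: ordered and mutable
shape = (2, 3)                         # tuple: ordered and immutable
house = {"rooms": 3, "garage": True}   # dictionary: key-value pairs

def describe(n_rooms):
    """Function containing an if-else statement."""
    if n_rooms >= 4:
        return "large"
    else:
        return "small"

# Loop over a list and call the function for each element
labels = []
for n in rooms:
    labels.append(describe(n))

# Lambda function: a short anonymous function
double = lambda x: 2 * x

class House:
    """Minimal class that bundles data with a method."""
    def __init__(self, rooms):
        self.rooms = rooms

    def size_label(self):
        return describe(self.rooms)

print(labels)                  # ['small', 'large', 'small']
print(double(price))           # 500000
print(House(4).size_label())   # large
```

If every line here looks familiar, you have enough Python background for this series.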

You will probably find that learning the Python modules for machine learning or data science almost feels like learning a new language anyway.

For now, you do not need much mathematical background beyond very simple school mathematics. Of course, more advanced topics require more knowledge (like basic linear algebra or probability theory), but as long as you do not intend to build your own machine learning tools, you can simply use the existing ones without knowing every mathematical or technical detail under the hood.

The best way to get a grasp of machine learning is to start with very practical books like “Hands-On Machine Learning” by Aurélien Géron, because they explain working source code for real-world examples. The alternative would be starting from scratch with very basic books, but you may not have time to learn every detail right from the beginning.

Of course, practical books can have a very steep learning curve, but if you combine learning techniques like priming, incubation, and the 24-hour rule with practical coding, you can get started with machine learning within just a few days or weeks. This means that you should not try to memorize everything from the beginning, but rather skim through the working examples and revisit the details later on while experimenting with parts of the code. The more often you repeat the first-skim-then-revisit cycle, the better you will get without wasting too much time on less important details.

One way to soften the steep learning curve is to start with crash courses like this one. So without further ado, let us begin.

Setup and installations

To run the Python code, you only need a web browser if you use Google’s Colab. The source code from Aurélien Géron’s book is also publicly available on Colab, even though it may not be very beginner-friendly.

However, I recommend running everything locally on your computer for ease of use. We will be using Visual Studio Code, which has a lot of convenience features that are probably not available on Colab. The only downside is that the initial installation takes a bit of effort and about 10 GB of disk space in total.

Jupyter notebooks

First, install Visual Studio Code and its Jupyter extension. Jupyter lets you run selected parts of your code in any order you like. We will refer to these code parts as Jupyter cells. In a Python script, each cell starts with a #%% marker.

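#%%
# Cell 1: define a variable
x = 3
#%%
# Cell 2: print it (works as long as cell 1 has been run before)
print(x)
#%%
# Cell 3: delete the variable; running cell 2 afterwards raises a NameError
del x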

Once you run a Jupyter cell, an interactive window opens in Visual Studio Code and shows the outputs, such as numbers, arrays, tables, or even plots.

Note that the interactive window may have a restart button that resets all variables, but this does not necessarily reset module-level attributes like __file__. In that case, you have to close the interactive window before you can safely run cells from a new script file. Otherwise, problems can occur: for example, __file__ may still point to a previously executed Python script instead of the current one. Even restarting Visual Studio Code itself is not a substitute for starting a new interactive window.
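If you are unsure whether the interactive window still carries a stale value, a small check cell can help. The following is only a sketch; the helper name `current_script_path` is hypothetical and simply reads `__file__` from the namespace you pass in.

```python
#%%
from pathlib import Path

def current_script_path(namespace):
    """Return the path stored under `__file__` in `namespace`, or None if it is not set."""
    value = namespace.get("__file__")
    return Path(value) if value is not None else None

# Pass globals() so that the check inspects the interactive session itself.
path = current_script_path(globals())
print("__file__ currently points to:", path)
```

If the printed path belongs to a different script than the one you are editing, close the interactive window and run the cell again.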

Jupyter cells are useful because you can debug or modify the code without re-running previously computed, time-intensive cells. Of course, you should be careful with this functionality. Sometimes it is better to restart the whole code from scratch before causing too much chaos.
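A minimal sketch of this workflow could look as follows. The function `slow_square_sum` is a made-up stand-in for a time-intensive step, and the `time.sleep` call only simulates a long computation.

```python
#%%
import time

def slow_square_sum(n):
    """Hypothetical stand-in for a time-intensive computation."""
    time.sleep(0.5)  # simulate a long-running step
    return sum(i * i for i in range(n))

# Run this cell once; `total` stays in memory afterwards.
total = slow_square_sum(1000)

#%%
# This cell can be edited and re-run many times without recomputing `total`.
print(f"Sum of squares: {total}")
```

The second cell only reads `total`, so changing the print format or adding a plot there does not trigger the slow computation again.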

Anaconda

After installing Jupyter, your Python setup also needs the core modules required for machine learning. Instead of downloading them separately, you can install Anaconda, which is widely used for data science because it handles module dependencies well. It should also be compatible with other Python-based tasks that are not related to machine learning.

#%%
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In case you ever need to install missing modules:

conda install -c conda-forge xgboost
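To find out which modules are missing before installing anything, you could run a short check like the one below. It is a sketch that relies only on the standard library; the helper name `missing_modules` and the list `required` are assumptions chosen for this example.

```python
#%%
import importlib.util

def missing_modules(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level module names used in this guide (adjust the list as needed).
required = ["numpy", "pandas", "matplotlib"]
print("Missing modules:", missing_modules(required))
```

An empty list means that all listed modules can be imported; any name that shows up can then be installed with conda as shown above.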

Kaggle competitions

One way to practice machine learning is to participate in Kaggle competitions. We will demonstrate this with a competition for beginners: House Prices – Advanced Regression Techniques.
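In this competition the task is to predict the sale price of a house, and the training data contains a `SalePrice` column as the target. A first step could therefore be to separate the target from the remaining columns. The sketch below assumes pandas is installed; the helper `split_features_target` is hypothetical, and the tiny table is made up so that the example runs without downloading the competition files.

```python
#%%
import pandas as pd

def split_features_target(df, target="SalePrice"):
    """Split a table into feature columns X and the target column y."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y

# Made-up rows standing in for the real training data.
demo = pd.DataFrame({
    "Id": [1, 2],
    "LotArea": [8000, 9500],
    "SalePrice": [200000, 180000],
})

X, y = split_features_target(demo)
print(X.columns.tolist())  # ['Id', 'LotArea']
print(y.tolist())          # [200000, 180000]
```

With the real data you would replace `demo` by something like `pd.read_csv("train.csv")`, where the file location depends on where you saved the download.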

References

  1. House Prices Competition
  2. Anaconda Install Guide
  3. Géron’s Colab Notebooks
  4. Complete Code Examples
  5. Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

