Filtering Out Rows Using LEFT JOIN: A Clean Alternative to NOT IN



This content originally appeared on DEV Community and was authored by Dror Atariah

Have you ever needed to exclude certain records from your dataset based on a blacklist or exclusion table? While many developers reach for NOT IN or NOT EXISTS, there’s an elegant alternative using LEFT JOIN that’s often more readable and performant. Let me show you how.

The Problem

Imagine you have a dataset and you want to exclude certain rows based on values in another table. This is a common scenario when working with:

  • Blacklisted users or items
  • Excluded categories
  • Records that should be filtered out based on business rules

The Solution: LEFT JOIN with NULL Check

Here’s a clean approach using LEFT JOIN that I’ll demonstrate with DuckDB:

Step 1: Create Sample Data

First, let’s create some sample data to work with:

from duckdb import sql

# Create a main dataset
sql("""
    CREATE OR REPLACE TABLE data AS (
    SELECT
        'ID' || generate_series AS id,
        'Name_' || generate_series AS name,
        generate_series * 10 AS value
    FROM generate_series(1, 6)
    )
""")

sql("SELECT * FROM data").df()

This results in:

id name value
ID1 Name_1 10
ID2 Name_2 20
ID3 Name_3 30
ID4 Name_4 40
ID5 Name_5 50
ID6 Name_6 60

Step 2: Create the Blacklist

Now let’s create our exclusion list:

# Create a blacklist of IDs to exclude
sql("""
    CREATE OR REPLACE TABLE black_list AS (
    SELECT 'ID2' AS ignore_id
    UNION ALL
    SELECT 'ID4' AS ignore_id
    UNION ALL
    SELECT 'ID5' AS ignore_id
    )
""")

This blacklist contains three IDs that we want to exclude from our results.

Step 3: Filter Using LEFT JOIN

Here’s where the magic happens:

SELECT d.*
FROM data AS d
LEFT JOIN black_list ON d.id = black_list.ignore_id
WHERE black_list.ignore_id IS NULL

How It Works

The LEFT JOIN technique works in three steps:

  1. LEFT JOIN: Joins every row from the data table with matching rows from black_list. If no match exists, the joined columns are filled with NULL.

  2. NULL Check: The WHERE black_list.ignore_id IS NULL condition keeps only rows where no match was found in the blacklist.

  3. Result: We get all rows from data except those whose IDs appear in the blacklist.

Why This Approach?

Advantages over NOT IN:

  • NULL Safety: NOT IN can return unexpected results when the exclusion list contains NULL values
  • Performance: Often more efficient, especially with proper indexing
  • Readability: The intent is clear – we’re excluding based on a join condition

Advantages over NOT EXISTS:

  • Simplicity: Less nested subquery complexity
  • Consistency: Follows the same pattern as regular filtering joins

The Result

Running our query will return only ID1, ID3, and ID6 – effectively filtering out ID2, ID4, and ID5 as specified in our blacklist.

Real-World Applications

This pattern is incredibly useful for:

  • User Management: Excluding banned or inactive users
  • Product Catalogs: Filtering out discontinued items
  • Data Cleanup: Removing test or invalid records
  • Compliance: Excluding data based on regulatory requirements

Conclusion

The LEFT JOIN filtering technique is a powerful tool in your SQL arsenal. It’s clean, performant, and handles edge cases better than alternatives like NOT IN. Next time you need to exclude rows based on another table, give this approach a try!

Of course, by replacing WHERE black_list.ignore_id IS NULL with WHERE black_list.ignore_id IS NOT NULL we get the complement! That is, whitelisting.

This example uses DuckDB, but the same pattern works across all major SQL databases including PostgreSQL, MySQL, SQL Server, and others.


This content originally appeared on DEV Community and was authored by Dror Atariah