This content originally appeared on DEV Community and was authored by Dror Atariah
Have you ever needed to exclude certain records from your dataset based on a blacklist or exclusion table? While many developers reach for NOT IN
or NOT EXISTS
, there’s an elegant alternative using LEFT JOIN
that’s often more readable and performant. Let me show you how.
The Problem
Imagine you have a dataset and you want to exclude certain rows based on values in another table. This is a common scenario when working with:
- Blacklisted users or items
- Excluded categories
- Records that should be filtered out based on business rules
The Solution: LEFT JOIN with NULL Check
Here’s a clean approach using LEFT JOIN that I’ll demonstrate with DuckDB:
Step 1: Create Sample Data
First, let’s create some sample data to work with:
from duckdb import sql
# Create a main dataset
sql("""
CREATE OR REPLACE TABLE data AS (
SELECT
'ID' || generate_series AS id,
'Name_' || generate_series AS name,
generate_series * 10 AS value
FROM generate_series(1, 6)
)
""")
sql("SELECT * FROM data").df()
This results in:
id | name | value |
---|---|---|
ID1 | Name_1 | 10 |
ID2 | Name_2 | 20 |
ID3 | Name_3 | 30 |
ID4 | Name_4 | 40 |
ID5 | Name_5 | 50 |
ID6 | Name_6 | 60 |
Step 2: Create the Blacklist
Now let’s create our exclusion list:
# Create a blacklist of IDs to exclude
sql("""
CREATE OR REPLACE TABLE black_list AS (
SELECT 'ID2' AS ignore_id
UNION ALL
SELECT 'ID4' AS ignore_id
UNION ALL
SELECT 'ID5' AS ignore_id
)
""")
This blacklist contains three IDs that we want to exclude from our results.
Step 3: Filter Using LEFT JOIN
Here’s where the magic happens:
SELECT d.*
FROM data AS d
LEFT JOIN black_list ON d.id = black_list.ignore_id
WHERE black_list.ignore_id IS NULL
How It Works
The LEFT JOIN technique works in three steps:
LEFT JOIN: Joins every row from the
data
table with matching rows fromblack_list
. If no match exists, the joined columns are filled with NULL.NULL Check: The
WHERE black_list.ignore_id IS NULL
condition keeps only rows where no match was found in the blacklist.Result: We get all rows from
data
except those whose IDs appear in the blacklist.
Why This Approach?
Advantages over NOT IN:
-
NULL Safety:
NOT IN
can return unexpected results when the exclusion list contains NULL values - Performance: Often more efficient, especially with proper indexing
- Readability: The intent is clear – we’re excluding based on a join condition
Advantages over NOT EXISTS:
- Simplicity: Less nested subquery complexity
- Consistency: Follows the same pattern as regular filtering joins
The Result
Running our query will return only ID1, ID3, and ID6 – effectively filtering out ID2, ID4, and ID5 as specified in our blacklist.
Real-World Applications
This pattern is incredibly useful for:
- User Management: Excluding banned or inactive users
- Product Catalogs: Filtering out discontinued items
- Data Cleanup: Removing test or invalid records
- Compliance: Excluding data based on regulatory requirements
Conclusion
The LEFT JOIN filtering technique is a powerful tool in your SQL arsenal. It’s clean, performant, and handles edge cases better than alternatives like NOT IN
. Next time you need to exclude rows based on another table, give this approach a try!
Of course, by replacing WHERE black_list.ignore_id IS NULL
with WHERE black_list.ignore_id IS NOT NULL
we get the complement! That is, whitelisting.
This example uses DuckDB, but the same pattern works across all major SQL databases including PostgreSQL, MySQL, SQL Server, and others.
This content originally appeared on DEV Community and was authored by Dror Atariah