Left Join Gotcha: Rows Duplication



This content originally appeared on DEV Community and was authored by Dror Atariah

I am sure you saw it a zillion times, but it can still hit you if you’re being sloppy.
The uniqueness of your left table is not guaranteed to hold after a left join.
Assume that your left table is:

customer_id customer_name
1 Alice
2 Bob
3 Charlie

and your right table is:

product_id customer_id
a 1
b 1
b 2
c 2
c 3

Then, the result of the join would be:

customer_id customer_name product_id
1 Alice a
1 Alice b
2 Bob b
2 Bob c
3 Charlie c

This is a very straightforward example, so you might not be surprised.
But, rest assured, in more complex setups you can easily miss that and end up with duplications in your result set.

Data quality

Data quality issues can mean that your understanding of the data is misaligned with the actual nature of the data you use.
Namely, you read a right table where you expect it to have unique identifiers, but in reality it has duplications.
These duplications will end up in the result of the join!
So the lesson learned is: ALWAYS CHECK PRE-CONDITIONS!


This content originally appeared on DEV Community and was authored by Dror Atariah