Clustering



This content originally appeared on DEV Community and was authored by Carlos Almonte

Sometimes the data we need to evaluate is unlabeled, meaning it is missing target values. Target values are the values we are trying to predict. When this happens, we need a way to group the data into meaningful clusters: pictures of babies in one cluster, pictures of young women in another, and pictures of old people in a third. Going through thousands of pictures and manually providing a target value for each one is very time-consuming, and this is where clustering comes into play.

A more beginner-friendly example is grouping data the same way we might group groceries at the store. If you dump everything on the counter (apples, bananas, bread, milk, and chips), you don’t need labels to know that the apples and bananas belong in the “fruit” pile and the bread and chips belong in the “snacks” pile. Clustering works the same way: it looks at the similarities between items and puts them into groups, even when we don’t have target labels to guide us.
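
To make the grocery analogy concrete, here is a minimal sketch using k-means, one common clustering algorithm (the post itself does not name a specific algorithm). It assumes each item is described by two hypothetical numeric features, sweetness and moisture on a 0-10 scale; those numbers are made up for illustration, and milk is left out because the example above doesn’t assign it to a pile. Notice that no target labels are passed in: the algorithm only sees the features.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Group points into k clusters using only their features (no target labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments won't change either
        centroids = new_centroids
    return labels


# Hypothetical features on a 0-10 scale: (sweetness, moisture)
groceries = {
    "apple": (8, 8),
    "banana": (9, 7),
    "bread": (2, 2),
    "chips": (1, 1),
}
labels = kmeans(list(groceries.values()), k=2)
for name, label in zip(groceries, labels):
    print(name, "-> cluster", label)
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is that the apple and banana end up sharing one cluster while the bread and chips share the other. The number of clusters, `k`, is something you choose up front, which is a design decision k-means leaves to you.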

Think of features as the different ways you could describe an object to a friend. If you say ‘round, red, and sweet,’ they’ll probably guess you’re talking about an apple. Clustering relies on the same idea: it looks at those descriptive features (shape, color, and taste) and puts similar items together.
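
A rough sketch of how descriptive features become something a computer can compare: each item is turned into a feature vector, and similarity is measured as Euclidean distance between vectors. The feature scores below, (roundness, redness, sweetness) on a 0-1 scale, and the helper `most_similar` are hypothetical, chosen only to mirror the ‘round, red, and sweet’ description. This is a nearest-match lookup rather than a full clustering algorithm, but the same distance idea is what clustering algorithms use to decide which items belong together.

```python
import math

# Hypothetical feature scores on a 0-1 scale: (roundness, redness, sweetness)
known_items = {
    "apple": (0.9, 0.9, 0.8),
    "banana": (0.1, 0.0, 0.9),
    "bread": (0.3, 0.1, 0.2),
}


def most_similar(description, items):
    """Return the item whose feature vector is closest (Euclidean distance) to the description."""
    return min(items, key=lambda name: math.dist(description, items[name]))


# "Round, red, and sweet" expressed as a feature vector
guess = most_similar((1.0, 1.0, 1.0), known_items)
print(guess)  # apple
```

Like your friend, the code never needs a label saying “this is an apple”; it only needs the features to be close to those of an apple.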

