How classification and clustering work: the easy way

Machine learning gets a lot of buzz. The two most talked-about classes of algorithms are classification and clustering. Classification is assigning things a label. Clustering is grouping things that look like they go together. Yet people are often confused about what these are and how they differ.

That confusion is partly because many explanations quickly go into a bunch of formulas. Instead, here is an explanation of clustering and classifying things the old-fashioned way: in an Excel spreadsheet.

How classification works

Let’s say that you want to predict which students will likely graduate and which will likely drop out. Perhaps you want to flag the students likely to drop out so you can assign them a counselor. So, you have two labels: at-risk and low-risk. To do this using classification, you need a training set of past students whose outcomes are already known: some who graduated and some who dropped out.

(Please note that I acquired this data the same way a stable genius does: I made it up. Don’t use it for anything but understanding what classification means.)
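For readers who prefer code to spreadsheets, here is a minimal sketch of the same idea in Python. Everything in it is an assumption made for illustration: the two features (GPA and attendance rate), the six invented students, and the choice of a nearest-neighbor rule, which simply copies the outcome of the most similar known student. It is not necessarily the method the spreadsheet uses.

```python
import math

# Hypothetical training set (invented for illustration):
# (GPA on a 0-4 scale, attendance rate from 0 to 1) -> known outcome.
TRAINING = [
    ((3.8, 0.95), "graduated"),
    ((3.2, 0.90), "graduated"),
    ((3.5, 0.85), "graduated"),
    ((1.9, 0.60), "dropped out"),
    ((2.1, 0.50), "dropped out"),
    ((2.4, 0.55), "dropped out"),
]


def distance(a, b):
    """Straight-line distance between two students.

    GPA is divided by 4 so both features fall in a 0-1 range
    and neither one dominates the comparison.
    """
    return math.hypot((a[0] - b[0]) / 4.0, a[1] - b[1])


def classify(student):
    """Label a new student using the outcome of the closest known student."""
    _, outcome = min(TRAINING, key=lambda row: distance(row[0], student))
    return "low-risk" if outcome == "graduated" else "at-risk"


print(classify((3.6, 0.92)))  # closest known student graduated
print(classify((2.0, 0.58)))  # closest known student dropped out
```

The key point is that the labels come from students whose outcomes are already known. The code only decides which known student a new one most resembles.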
