K-Means Elbow Method

K-Means Elbow Method

How to implement K-Means Elbow Method.

Source Code Link: Elbow Method Project Files

K-Means Elbow Method is a heuristic used in determining the number of clusters in a data set.

A lot of the tutorial/articles I am writing on this blog are related to K-Means clustering. One of the major decisions we have to make is how many clusters our data set have?

K-Means Clustering

K-means is a simple unsupervised machine learning algorithm that groups data into a specified number (k) of clusters. Because the user must specify in advance what k to choose, the algorithm is somewhat naive – it assigns all members to k clusters even if that is not the right k for the dataset.

Sometimes choosing the right number for k is not that simple. More often than not analyzing the dataset before hand is a must.

There are multiple articles using K-Means algorithm on this blog:

and although these are fairly simple problems to solve, you can still run the K-Means Elbow Method to determine the number of clusters in the dataset.

To complete this article you can use my K-Means implementation provided with this project, or you can use a third-party library such as Accord.NET and you can use the following K-Means implementation.

K-Means Elbow Method

The elbow method runs k-means clustering on the dataset for a range of values for k (say from 1-10) and then for each value of k computes an average score for all clusters.

When these overall metrics for each model are plotted, it is possible to visually determine the best value for k. If the line chart looks like an arm, then the “elbow” (the point of inflection on the curve) is the best value of k. The “arm” can be either up or down, but if there is a strong inflection point, it is a good indication that the underlying model fits best at that point.

K-Means Elbow Method – Intuition

The intuition is that increasing the number of clusters will naturally improve the fit (explain more of the variation), since there are more parameters (more clusters) to use, but that at some point this is over-fitting, and the elbow reflects this.

This means that you should choose a number k so that adding another cluster doesn’t give much better modelling of the data.

In this scenario the intuition of over-fitting is expressed by subdividing the labeled groups into multiple clusters. Therefore, once the number of clusters exceeds the actual number of groups in the data, the added information will drop sharply, because it is just subdividing the actual groups.

Analyzing the project

K-Means Elbow Method
Image 1. K-Means Elbow Method

From the application we can see that we are clustering three distinct groups. We are running the Elbow Method algorithm 6 times (The value K gets values from 1 to 6). By now you should know that we are actually running the K-Means algorithm and setting the K to values from 1 to 6.

By plotting the results we can observe that the best value for K is 3. And this is the correct solution just by observing the dataset. If we set K=1 then we can see the error is pretty big. For K=2 we can observe that the error decreased.

If we increase the number K (number of clusters) to 4,5,6 we can detect no changes in accuracy. The problem with higher number is that eventually these clusters will be divided into sub groups and the error rate will decrease even more.

To avoid that from happening we are plotting the results and using the “elbow” as a cut off point to choose a K where diminishing returns no longer worth the additional cost.

Also worth mentioning is that in practice there may not be a sharp elbow (like the one I provided) and such “elbow” cannot always be unambiguously identified.

Code Sample

for (int k = (int)start.Value; k <= (int)end.Value; k++)
{
     List<double[]> distanceResult = new List<double[]>();

     _kMeans = new KMeans(k, eucledeanDistance);
     Centroid[] centroids = _kMeans.Run(trainingSetList.ToArray());

     double result = 0;
     foreach (double[] point in trainingSetList)
     {
          double minValue = Double.MaxValue;
          foreach (Centroid centroid in centroids)
          {
             double distance = eucledeanDistance.Run(point, centroid.Array);
             if (distance < minValue)
                minValue = distance;
          }
          result += minValue;
     }
     result /= trainingSetList.Count;
}

6 thoughts on “K-Means Elbow Method

  1. Reply
    Dk
    March 24, 2021 at 3:26 am

    Hi
    can you please share code.

    1. Reply
      vanco
      March 24, 2021 at 7:12 am

      You can find the code at: http://code-ai.mk/wp-content/uploads/2020/09/DiscoverGroups_ElbowMethod_C.zip

  2. Reply
    vpn code 2024
    April 6, 2024 at 9:06 am

    I am in fact delighted to glance at this web site posts
    which consists of tons of useful data, thanks for providing these kinds of information.

    Feel free to surf to my webpage :: vpn code 2024

  3. Reply
    vpn coupon code 2024
    April 6, 2024 at 7:35 pm

    Attractive section of content. I just stumbled upon your website and
    in accession capital to assert that I get in fact enjoyed account your blog posts.
    Anyway I’ll be subscribing to your feeds and
    even I achievement you access consistently rapidly.

    My web page :: vpn coupon code 2024

  4. Reply
    vpn code 2024
    April 10, 2024 at 7:01 pm

    Hey there would you mind sharing which blog platform you’re using?
    I’m planning to start my own blog soon but I’m having a hard
    time making a decision between BlogEngine/Wordpress/B2evolution and
    Drupal. The reason I ask is because your design and style seems different then most blogs
    and I’m looking for something completely unique.
    P.S Apologies for being off-topic but I had to ask!

    my web blog … vpn code 2024

  5. Reply
    Maxwell
    April 16, 2024 at 12:09 am

    Wow, amazing blog format! How lengthy have you ever been blogging for?
    you make blogging look easy. The total glance of your web site
    is excellent, as smartly as the content! You can see similar here najlepszy sklep

Leave a Reply

Your email address will not be published. Required fields are marked *