Discover Groups with K-Means in C# Tutorial

Discover groups with K-Means in C# with source code.

Source Code Link: Discover Groups – NBA Player Position

Problem Description

In today’s post we are going to see if we can group modern NBA players into five groups:

  • Point guard
  • Shooting guard
  • Small forward
  • Power forward
  • Center

Basketball positions – Wikipedia

Image 1: Traditional 5 positions in basketball

Discover Groups with K-Means

One of the biggest advantages of K-Means is that, given a data set of player statistics, we can use it to find underlying groups and patterns.

These underlying groups and patterns will show us how many clusters/groups there are in the data. If we believe that the traditional five-position system accurately describes the players' skills, we should be able to observe five distinct clusters of players.

In other words, we will find out whether the skill sets of modern NBA players can be grouped into five categories on the court.

Because K-Means is an unsupervised clustering algorithm, it fits our scenario well.

If you need a refresher on the K-Means algorithm, you can check one of the following articles:

Code Walkthrough

Before we dive right into implementing the solution, let’s look at the data.

Image 2: The data

You can find the dataset, players_stats.csv, in the project folder.

First, let’s read in the CSV file:

_trainingSet = CSVFile.ImportCSVFileAsList(ofd.FileName, true, ',', 6, 7, 8, 9, 12, 13, 14, 16, 17, 18);

Here we specify the file path, whether or not to include the header, and the field delimiter. You can read more about CSV files on this link.
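To make the loading step more concrete, here is a minimal sketch of how selected numeric columns could be pulled out of CSV lines. It is not the CSVFile class from the project: the CsvSketch class and its ParseColumns method are hypothetical names. It assumes that the trailing integers in the call above are column indexes, treats them as zero-based, and uses a plain split that does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; not the CSVFile class used in the project.
public static class CsvSketch
{
    // Parses the given lines, optionally skipping a header row, and keeps only
    // the requested zero-based column indexes, converted to double.
    public static List<double[]> ParseColumns(
        IEnumerable<string> lines, bool hasHeader, char delimiter, params int[] columns)
    {
        var rows = new List<double[]>();
        foreach (string line in lines.Skip(hasHeader ? 1 : 0))
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // ignore blank lines

            // Naive split: assumes no quoted fields containing the delimiter.
            string[] fields = line.Split(delimiter);
            double[] row = columns
                .Select(c => double.Parse(fields[c], CultureInfo.InvariantCulture))
                .ToArray();
            rows.Add(row);
        }
        return rows;
    }
}
```

With this sketch, a call such as CsvSketch.ParseColumns(File.ReadLines("players_stats.csv"), true, ',', 6, 7, 8) would return one double[] per player, which is the shape the clustering code below works with.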

To find out how many groups of players there are, we are going to use the “Elbow Method”.

for (int k = (int)start.Value; k <= (int)end.Value; k++)
{
     List<double[]> distanceResult = new List<double[]>();

     // Cluster the training set into k groups
     _kMeans = new KMeans(k, eucledeanDistance);
     Centroid[] centroids = _kMeans.Run(_trainingSet.ToArray());

     // Sum the distance from each point to its nearest centroid
     double result = 0;
     foreach (double[] point in _trainingSet)
     {
          double minValue = Double.MaxValue;
          foreach (Centroid centroid in centroids)
          {
               double distance = eucledeanDistance.Run(point, centroid.Array);
               if (distance < minValue)
                    minValue = distance;
          }
          result += minValue;
     }

     // Average distance for this k: one point on the elbow curve
     result /= _trainingSet.Count;
     _kmeansPointList.Add(new DPoint(k, result));
}

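Here we run the K-Means clustering algorithm for K=1 to K=10 and, for each K, compute the average distance from every point to its nearest centroid. Plotting that average against K gives a curve whose “elbow”, the point after which adding more clusters brings only a small improvement, indicates the optimal number of clusters for the dataset we are processing.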
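For readers who want to try the idea without the project's KMeans and Centroid classes, the following is a self-contained sketch of the same measurement. The ElbowSketch class and its method names are hypothetical, the clustering is a basic Lloyd iteration with random seeding and a fixed iteration count (an assumption, since the project's implementation may differ), and it assumes k is not larger than the number of points.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only; the project's KMeans and Centroid classes are not reproduced here.
public static class ElbowSketch
{
    // Straight-line distance between two points of equal dimension.
    public static double Euclidean(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.Sqrt(sum);
    }

    // Index of the centroid closest to the point.
    static int Nearest(double[] point, double[][] centroids)
    {
        int best = 0;
        for (int j = 1; j < centroids.Length; j++)
            if (Euclidean(point, centroids[j]) < Euclidean(point, centroids[best]))
                best = j;
        return best;
    }

    // Basic Lloyd iteration: seed with k randomly chosen points, then alternate
    // assignment and centroid update for a fixed number of iterations.
    public static double[][] Cluster(IList<double[]> data, int k, int iterations = 50, int seed = 0)
    {
        var rng = new Random(seed);
        double[][] centroids = data.OrderBy(_ => rng.Next()).Take(k)
                                   .Select(p => (double[])p.Clone()).ToArray();
        for (int it = 0; it < iterations; it++)
        {
            var sums = new double[k][];
            var counts = new int[k];
            for (int j = 0; j < k; j++) sums[j] = new double[data[0].Length];

            foreach (double[] p in data)
            {
                int j = Nearest(p, centroids);
                counts[j]++;
                for (int d = 0; d < p.Length; d++) sums[j][d] += p[d];
            }
            for (int j = 0; j < k; j++)
            {
                if (counts[j] == 0) continue; // keep an empty cluster's centroid as-is
                for (int d = 0; d < sums[j].Length; d++)
                    centroids[j][d] = sums[j][d] / counts[j];
            }
        }
        return centroids;
    }

    // Average distance from each point to its nearest centroid (the y-axis of the elbow plot).
    public static double AverageDistance(IList<double[]> data, double[][] centroids)
    {
        return data.Average(p => Euclidean(p, centroids[Nearest(p, centroids)]));
    }
}
```

Calling Cluster for each k from 1 to 10 and recording AverageDistance for the returned centroids yields the points of the elbow curve. Because the seeding is random, results can vary between seeds, so it can be worth repeating each k a few times before reading off the elbow.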


Image 3: Results of using the Elbow Method to find the optimal number of clusters

From the image above we can observe that five is not the optimal number of clusters/groups based on the players' statistics. Therefore, we can conclude that the traditional five player positions oversimplify the skill sets of modern-day NBA players.