What is K-means Clustering?
K-means clustering is an unsupervised learning method: there are no ground-truth labels to compare the algorithm's output against, so we can't evaluate its performance the way we would for a supervised model. The goal is only to investigate the structure of the data by grouping the data points into distinct subgroups.
The k-means algorithm works as follows:
- Specify the number of clusters K.
- Initialize the centroids by first shuffling the dataset and then randomly selecting K data points as centroids, without replacement.
- Keep iterating the steps below until the centroids stop changing, i.e. the assignment of data points to clusters no longer changes:
  - Compute the squared distance between each data point and every centroid.
  - Assign each data point to the closest cluster (centroid).
  - Recompute the centroid of each cluster by taking the average of all the data points that belong to it.
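The steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function names `squared_distance` and `kmeans`, the `max_iter` and `seed` parameters, and the sample points are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Pick K distinct data points as the initial centroids
    # (random.sample draws without replacement).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the index of its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical 2-D sample: two loose groups of three points each.
sample = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(sample, k=2)
print(centroids)
print(labels)
```

Which cluster ends up with index 0 depends on the random initialization, so the labels themselves carry no meaning beyond "same group" or "different group".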
Can we form any number of clusters?
The number of clusters that we choose for the algorithm shouldn't be arbitrary. If there are 50 data points and we form 50 clusters, the clustering is of no use, because every point sits in its own cluster and we can't conclude anything from it.
We can choose the right number of clusters with the help of the Within-Cluster Sum of Squares (WCSS) method.
WCSS is the sum of the squared distances between every data point and the centroid of the cluster it belongs to.
For a given K, the algorithm tries to minimize the distance between the data points and their cluster centroids. WCSS keeps shrinking as K grows and reaches zero when every point is its own cluster, so we run the algorithm for several values of K and pick the value after which WCSS stops dropping sharply. This is commonly called the elbow method.
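To make WCSS concrete, here is a small sketch that computes it for a few groupings of the same data. The helper names (`centroids_from_labels`, `wcss`), the six sample points, and the groupings are all assumptions for illustration. The groupings are written by hand, not produced by running k-means, so that the arithmetic is easy to check.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroids_from_labels(points, labels):
    """Mean of the points in each cluster, keyed by cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return {
        lab: tuple(sum(c) / len(members) for c in zip(*members))
        for lab, members in clusters.items()
    }


def wcss(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    centroids = centroids_from_labels(points, labels)
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical 1-D data with two obvious groups.
points = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]

# Hand-picked groupings for K = 1, 2, 3 and 6 (one label per point).
groupings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
    6: [0, 1, 2, 3, 4, 5],
}

wcss_by_k = {k: wcss(points, labels) for k, labels in groupings.items()}
for k, value in wcss_by_k.items():
    print(k, value)
# prints:
# 1 125.5
# 2 4.0
# 3 2.5
# 6 0.0
```

The drop from K=1 to K=2 is large, while going beyond K=2 gains very little, so K=2 is the elbow for this data. At K=6 every point is its own cluster and WCSS is zero, which tells us nothing about the data.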
Some use cases of K-means clustering:
Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiles, which give the investigation division information to classify the types of criminals who were at the crime scene.
Every company depends on its customers. A company holds data about its customers, so by applying K-means clustering it can segment them on the basis of different parameters and use those segments to improve its profits.
With a large number of students, we can cluster them by academic performance, for example forming groups of students with similar grades.
Identifying threats in network traffic
As more and more services begin to use APIs on your application, or as your website grows, it is important you know where the traffic is coming from. For example, you want to be able to block harmful traffic and double down on areas driving growth. However, it is hard to know which is which when it comes to classifying the traffic.
How clustering works: K-means clustering is used to group together characteristics of the traffic sources. When the clusters are created, you can then classify the traffic types. The process is faster and more accurate than the previous Autoclass method. By having precise information on traffic sources, you are able to grow your site and plan capacity effectively.
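Once clusters exist, classifying a new traffic source comes down to finding its closest centroid. The sketch below assumes the clustering has already been run and that someone has inspected the clusters and given them names. The centroid values, the cluster names, the two features (request rate and error rate, both scaled to the 0 to 1 range) and the function `nearest_cluster` are all hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_cluster(observation, centroids):
    """Return the name of the centroid closest to `observation`."""
    return min(centroids, key=lambda name: squared_distance(observation, centroids[name]))


# Hypothetical centroids: (scaled request rate, scaled error rate).
# K-means only yields the numbers; the names are added by hand afterwards.
centroids = {
    "regular": (0.2, 0.05),
    "heavy_api": (0.8, 0.10),
    "suspicious": (0.9, 0.70),
}

new_source = (0.85, 0.65)
print(nearest_cluster(new_source, centroids))  # prints: suspicious
```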
So that’s all about K-means clustering!!!
Thanks for reading!!