Just Another Control System Blog: K-Medoid Algorithm in Matlab

According to Wikipedia, k-medoids algorithm is a clustering algorithm related to the k-means algorithm. In contrast to k-means algorithm, k-medoids chooses data points as centres. I wrote a Matlab program for implementing this algorithm. I am posting it here in the hope that people may find it useful.

function [IDX, Cluster, Err] = kmedoid2(data, NC, maxIter, varargin)
% Performs clustering using Partition Around Medoids (PAM) algorithm
% Initial clusters must be selected from the available data points

% Usage
% [IDX,C,cost] = kmedoid(data, NC, maxIter, [init_cluster])
% 
% Input : 
%       data - input data matrix
%       NC - number of clusters
%       maxIter - Maximum number of iterations
%       init_cluster - Initial cluster centers (optional)
%       init_index - Index of initial clusters
%
% Output :
%       IDX - data point index which became cluster centres
%       C -  Cluster centers
%       Err - cost associated with the clusters for all iterations
% ------------------------------------------

% Size of data
dsize = size(data);

% Number of data points
L = dsize(1);

% No. of features
NF = dsize(2); 

% Cluster size
csize = [NC, NF];

if (L < NC)
    error('Too few data points for making k clusters');
end

if(nargin > 5)
    error('Usage : [IDX,C,error] = kmedoid(data, NC, maxIter, [init_cluster], [init_idx])');
elseif(nargin == 5)
    vsize1 = size(varargin{1});
    vsize2 = length(varargin{2});
    
    if(isequal(vsize1,csize))
        Cluster = varargin{1};
    else
        error('Incorrect size for initial cluster');
    end
    
    if(vsize2 == NC)
        IDX = varargin{2};
    else
        error('Incorrect size for initial cluster index');
    end
elseif(nargin == 4)
   error('You must provide two optional arguments: init_cluster, init_idx');
elseif(nargin == 3) % no initial cluster provided
    IDX = randint(NC,1,L)+1;
    Cluster = data(IDX,:); % Initialize the initial clusters randomly
else
    display('Usage : [IDX,C] = kmedoid(data, NC, maxIter, [init_cluster], [init_idx]');
    error('Function takes at least 3 arguments');
end

%Array for storing cost for each iteration
Err = zeros(maxIter,1);

Z = zeros(NC, NF, L);
for i = 1:L
    Z(:,:,i) = repmat(data(i,:),NC,1);
end


FIRST = 1;
total_cost = 0;


for Iteration = 1:maxIter
    %disp(Iteration);
    
    if(FIRST) 
        % First time, compute the cost associated with the initial cluster
        
        C = repmat(Cluster,[1,1,L]);
        B = sqrt(sum((abs(Z-C).^2),2)); %Euclidean distance
        B1 = squeeze(B);
        [Bmin,Bidx] = min(B1,[],1);

        cost = zeros(1,NC);
        for k = 1:NC
            cost(k) = sum(Bmin(Bidx==k));
        end
        total_cost = sum(cost);
        
        FIRST = 0; % Reset the FIRST flag
        
    else % Not first time
        
        % change one cluster center randomly and see if the cost is
        % decreased. If yes, accept the change, otherwise reject the change

        while(1)
            Tidx = randint(1,1,L)+1; % find a random number betn 1 to L
            if(isempty(find(IDX==Tidx))) % Avoid previously selected centers
                break;
            end
        end
        % randomly change one cluster
        pos = randint(1,1,NC) + 1;
        OLD_IDX = IDX; % Preserve the old index list
        IDX(pos) = Tidx;

        %Assign cluster centers
        PCluster = Cluster;
        Cluster = data(IDX,:);

        % For each data point, find the cluster which is closest to it
        % and compute the cost obtained from this association

        C = repmat(Cluster,[1,1,L]);

        %B = sum(abs(Z-C),2); %Manhattan distance
        B = sqrt(sum((abs(Z-C).^2),2)); %Euclidean distance

        % Row - cluster (NC)
        % Column - each data point 1, 2, ... L
        B1 = squeeze(B);

        % For each data point, find the nearest cluster
        % size(Bmin) = 1xL
        [Bmin,Bidx] = min(B1,[],1);

        cost = zeros(1,NC);
        for k = 1:NC
            cost(k) = sum(Bmin(Bidx==k));
        end

        previous_cost = total_cost;
        total_cost = sum(cost);


        if(total_cost > previous_cost) % cost increases
            % Bad choice, restore the previous index list
            IDX = OLD_IDX;
            total_cost = previous_cost;
            Cluster = PCluster;
        end
    end
    Err(Iteration) = total_cost;
end

if(nargout == 0)
    disp(IDX);
elseif(nargout > 3)
    error('Function gives only 3 outputs');
end   
    
return

Save this file as "kmedoid.m". Now we can create clusters for a given data set as follows :

[Idx, C,err] = kmedoid(data, 5, 1000);

Above command iterates for 1000 times to create 5 clusters for a given data.

Idx - the indices of data points selected as cluster centres.
C - cluster centres selected from the data sets
err - cost for each iteration. It can plotted to see if the error is decreasing.

6 comments:

Unknown said...: Hello! How can I extract a vector containing the number of the corresponding cluster for each data point? (size=[n data point,1])
Thank you!; April 26, 2010 at 1:40 PM
Unknown said...: Thank you very much for sharing the code, but you should note that your implementation is Clarans (Ng & Han, 1994) and not Pam (Kaufman & Rousseeuw, 1990). While both are k-medoid algorithms, the latter uses an exhaustive search and different initialisation. As can be clearly seen in your code you restrict yourself to removing a medoid and evaluate the total cost as proposed by Ng & Han.; July 14, 2010 at 4:26 AM
rajendra kumar said...: Thank you for it...
please tell me what will be the structure (size of )of the data matrix in the matlab function.... i'm not getting,please elavorate it with example..

thanks
Rajendra Kumar; January 10, 2012 at 2:54 AM
pvgajjar said...: Dear sir,
i have tried to run this kmedoid code in matlab 7.13.0.564, R2011b.
i have save file as kmedoid2.m as the function name is kmedoid2.
but it gives error:
function [IDX, Cluster, Err] = kmedoid2(data, NC, maxIter, varargin)
|
Error: Function definitions are not permitted in this context.

plz give me solution so that i can run this prog.; March 12, 2013 at 4:22 AM
Unknown said...: hi,
I am trying to cluster the digital circuit using clustering algorithm,so whether ur k medoid algorithm can be used my work??; December 22, 2013 at 11:59 PM
Unknown said...: Thank you ,It is perfect.; May 28, 2016 at 4:11 AM

Just Another Control System Blog

Wednesday, January 27, 2010

K-Medoid Algorithm in Matlab

6 comments:

Followers

Labels

Related Links

Blog Archive

Just Another Control System Blog

Wednesday, January 27, 2010

K-Medoid Algorithm in Matlab

6 comments:

Followers

Labels

Subscribe To Swagat's blog

Related Links

Blog Archive