Remove duplicate variables depending on a second variable

Dear experts,

I have a list of variables from which I need to remove duplicates. However, when a variable is duplicated, I want to keep the entry that has the value 1 in the second column. If several duplicates have a 1, only one of them, chosen at random, should be kept.

See the example below: for BG1028 I want to keep the entry whose third column is 1.3. For BG1030, I want to keep the entry with 3.0 or 1.3 in the third column (both have a 1 in the second column). I hope this is clear. I'm puzzling over how to do this. This is the code I have come up with so far:
ppn(:,1) = {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'};
ppn(:,2) = {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'};
ppn(:,3) = {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'};
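% find rows whose name matches the row directly above
% (only adjacent duplicates are found, so the names must be sorted)
ppn2 = ppn(:,1);
idx = find(strcmp(ppn2(1:end-1),ppn2(2:end)))+1;
% remove those rows: this keeps the first row of each run, whatever its value in column 2
ppn((idx),:) = [];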
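A loop-free sketch of the whole rule, assuming column 2 only ever holds '0' or '1': shuffle the rows, sort them so rows with a 1 come first (MATLAB's sort is stable, so ties should stay in their shuffled order), then keep the first row of each name with unique(...,'stable'). The names flag, order, keep and ppnFiltered are illustrative and not part of the original code.

```matlab
% Example data from the question
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},...
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'},...
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'}];

flag = str2double(ppn(:,2));            % '0'/'1' text -> 0/1 numbers
p = randperm(size(ppn,1));              % shuffle rows so ties are broken at random
[~, q] = sort(-flag(p));                % rows with a 1 move to the front of the shuffled order
order = p(q);                           % original row numbers in preference order
[~, firstIdx] = unique(ppn(order,1), 'stable');  % first (preferred) row per name
keep = sort(order(firstIdx));           % restore the original row order
ppnFiltered = ppn(keep, :)
```

Because the kept row is simply the first one left after sorting, a name whose duplicates all have 0 still keeps one random row instead of disappearing.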

Accepted Answer

Kirby Fears on 21 Sep 2015
Hi Marty,
Try the code below.
% Defining ppn (all at once)
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'},... % start col 3
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'}];
% Storing ppn column 2 as numerical values (str2double accepts a cell array)
bPpn = str2double(ppn(:,2));
% Deleting all duplicates with 0 in bPpn
idx = strcmp(ppn(1:end-1,1),ppn(2:end,1));
delidx = ([idx;false] | [false;idx]) & ~bPpn;
ppn(delidx,:)=[];
clear bPpn idx delidx;
% Get names of remaining duplicates
% (rows are compared with their neighbours, so ppn must be sorted by name)
chooseNames = ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1);
% Loop over chooseNames and keep one at random
for j = 1:numel(chooseNames)
    dupidx = find(strcmp(chooseNames{j},ppn(:,1)));
    dupidx(randi(numel(dupidx))) = [];   % spare one random row from deletion
    ppn(dupidx,:) = [];
end
Hope this helps.
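The code above finds duplicates by comparing each row with the next one, so it assumes rows with the same name are adjacent, as they are in the example. If that is not guaranteed, sorting by name first is one option. A minimal sketch with hypothetical three-row data:

```matlab
% Hypothetical unsorted input: the two BG1028 rows are not next to each other
ppn = {'BG1028','1','1.3';
       'BG1027','0','2.2';
       'BG1028','0','0.2'};

% The neighbour comparison finds no duplicates here
adjacentDup = strcmp(ppn(1:end-1,1), ppn(2:end,1))               % [false; false]

% Sorting the rows by name first makes duplicates adjacent
[~, order] = sort(ppn(:,1));
ppnSorted = ppn(order, :);
adjacentDupSorted = strcmp(ppnSorted(1:end-1,1), ppnSorted(2:end,1))   % [false; true]
```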
  2 Comments
Marty Dutch on 22 Sep 2015
Hi Kirby,
Thanks for your response, this works perfectly! However, I forgot to mention something: your script deletes every duplicate that has a zero. When a variable has multiple duplicates that all have a zero, it still needs to keep one of them at random.
I really appreciate your time helping me! I'll have a look at your script and maybe I can adapt it on my own.
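One way to cover this follow-up is to pick, per name, a random row among those with a 1 and fall back to all of that name's rows when none has a 1. A minimal sketch; the two BG1031 rows (values 4.1 and 4.2) are hypothetical, added only so that one name has nothing but zeros, and the variable names are illustrative.

```matlab
% Question data plus two hypothetical BG1031 rows that both have '0' in column 2
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030';'BG1031';'BG1031'},...
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0';'0';'0'},...
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'4.1';'4.2'}];

flag = str2double(ppn(:,2));
[names, ~, group] = unique(ppn(:,1));   % group(i) = which name row i belongs to
keep = zeros(numel(names), 1);
for g = 1:numel(names)
    rows = find(group == g);
    candidates = rows(flag(rows) == 1);  % prefer rows with a 1 ...
    if isempty(candidates)
        candidates = rows;               % ... otherwise any row of this name
    end
    keep(g) = candidates(randi(numel(candidates)));  % one at random
end
ppnFiltered = ppn(sort(keep), :)
```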
Marty Dutch on 22 Sep 2015
Wait, it works now. I just deleted this part of your code:
% Deleting all duplicates with 0 in bPpn
idx = strcmp(ppn(1:end-1,1),ppn(2:end,1));
delidx = ([idx;false] | [false;idx]) & ~bPpn;
ppn(delidx,:)=[];
clear bPpn idx delidx;
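With that block removed, the random loop chooses among all duplicates of a name, so the preference for a 1 in column 2 no longer applies: for BG1028 it may keep the 0.2 or 8.9 row. A small check can confirm whether a filtered result still follows the original rule. This is a sketch; the two candidate results are hand-picked row sets from the question's data, and candidates, result and ok are illustrative names.

```matlab
% Question data
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},...
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'},...
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'}];

% Two hand-picked results to check (row numbers into ppn):
% the first keeps BG1028's '1.3' row, the second keeps its '0.2' row instead
candidates = {ppn([1 2 3 6 7],:), ppn([1 2 4 6 7],:)};

names = unique(ppn(:,1));
ok = true(1, numel(candidates));
for c = 1:numel(candidates)
    result = candidates{c};
    for k = 1:numel(names)
        inOrig   = strcmp(ppn(:,1), names{k});
        inResult = strcmp(result(:,1), names{k});
        if nnz(inResult) ~= 1
            ok(c) = false;   % every name must survive exactly once
        elseif any(strcmp(ppn(inOrig,2), '1')) && ~strcmp(result{inResult,2}, '1')
            ok(c) = false;   % a row with 1 was available but a 0 row was kept
        end
    end
end
ok   % [1 0]: only the first result follows the rule
```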


More Answers (1)

the cyclist on 21 Sep 2015
This is not the world's most efficient code, but it is a very straightforward implementation of what you want (or at least my understanding of it). It displays the indices of the rows you want to keep.
It's not documented at all, but I tried to use some intuitive variable names, so maybe you can figure it out.
ppn(:,1) = {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'};
ppn(:,2) = {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'};
ppn(:,3) = {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'};
[unique_ppn,~,indexFromUniqueBackToAll] = unique(ppn(:,1));
number_unique_ppn = numel(unique_ppn);
indices_to_keep = [];
for np = 1:number_unique_ppn
    index_to_this_ppn = find(indexFromUniqueBackToAll==np);
    if numel(index_to_this_ppn) == 1
        indices_to_keep = [indices_to_keep; index_to_this_ppn];
    else
        remove_zero_index = ismember(ppn(index_to_this_ppn,2),'0');
        if ~all(remove_zero_index)
            index_to_this_ppn(remove_zero_index) = [];
        end
        random_one_to_keep = index_to_this_ppn(randi(numel(index_to_this_ppn)));
        indices_to_keep = [indices_to_keep; random_one_to_keep];
    end
end
indices_to_keep
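The loop above only displays row indices. To get the filtered cell array itself, index ppn with them; sorting the indices is a cheap safeguard for keeping the original row order. A minimal sketch, with a hypothetical set of indices standing in for one run's output:

```matlab
% Question data
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},...
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'},...
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'}];

% Hypothetical output of one run (BG1030 happened to keep row 9)
indices_to_keep = [1; 2; 3; 6; 9];
ppnFiltered = ppn(sort(indices_to_keep), :)
```

Calling rng with a fixed seed (for example rng(1)) before the loop makes the random pick repeatable between runs.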
