Hello, I'm doing text mining in an attempt to organize my data. I have a table with more than 200K rows and 12 columns, and I want to extract some information from one of the columns. Specifically, I'm looking for names that match my reference table (approx. 14K names). For that, I'm using the contains function inside two nested loops: the outer loop picks one of the 14K names, and the inner loop looks for that name in the 200K rows. This takes a very long time. Could you help me speed up my script? Thanks.
Here is the code:
% 1st loop (reference name table)
virgula = ',';
Space = ' ';
for k = 2:14045
    test = DNP(k,10);              % current reference name
    Genr = Space + test + Space;   % pad with spaces so only space-delimited occurrences match
    % Second loop (my raw table with more than 200K rows)
    for i = 15001:16000
        BiolSource = DNP(i,3);
        Presence = contains(BiolSource, Genr, 'IgnoreCase', true);
        if Presence
            A = DNP(i,13);
            B = DNP(k,11);
            % append ", name B" to the row's result and erase every "0, " substring
            DNP(i,13) = erase(A + virgula + Space + test + Space + B, "0, ");
        end
    end
end
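
One direction that might help, as a minimal sketch rather than a tested fix: contains accepts a whole string array, so the inner loop over rows can be replaced by one call per reference name. The toy arrays below (names, codes, source, result) are hypothetical stand-ins for columns 10, 11, 3 and 13 of DNP, and the sketch assumes DNP is a string array.

```matlab
% Hypothetical toy data standing in for the real DNP columns (assumed string arrays).
names  = ["Aspergillus"; "Penicillium"];   % reference names (column 10 in the code above)
codes  = ["A1"; "P2"];                     % extra label per name (column 11)
source = ["isolated from Aspergillus niger"; ...
          "from penicillium sp. and aspergillus sp."; ...
          "unknown origin"];               % text to search (column 3)
result = repmat("0", numel(source), 1);    % accumulated matches (column 13), "0" = none yet

for k = 1:numel(names)
    pat = " " + names(k) + " ";                        % same space padding as the loop version
    hit = contains(source, pat, 'IgnoreCase', true);   % one logical mask over all rows
    % Append ", name code" to every matching row at once, then erase "0, " as before.
    result(hit) = erase(result(hit) + ", " + names(k) + " " + codes(k), "0, ");
end

disp(result)
```

This keeps the outer loop over the 14K names but drops the inner loop over rows, so contains is called once per name instead of once per name and row. Whether that is fast enough on the full table is something I have not measured.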