What is the fastest way to convert a cell array of delimited numbers into a matrix?

I have a cell array called 'out', which has 1 column and 500k rows and looks like this:
1,5012,0,35,6
2,395,1,35,8
...
That is, each cell contains a string of numbers separated by commas. I'd like to obtain a numeric matrix, D. I used the following script:
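for i=1:size(out,1)
    % split cell i at the commas; textscan returns a 1x1 cell holding the substrings
    singline = textscan(out{i},'%s','delimiter',',');
    % convert the substrings to numbers and store them as row i of D
    % (D is not preallocated here, so it grows on every iteration)
    D(i,:) = str2double(singline{1});
end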
It takes 206 seconds, which is too long, since this is only a small part of a larger script that has to be repeated many times.
However, if I instead write the cell array to a file and read it back:
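File = 'E:\result.csv';
dlmcell(File,out,',');   % dlmcell is a File Exchange submission, not a built-in function
D = dlmread(File);       % read the numeric file back into a matrix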
it takes only 31 seconds and produces exactly what I wanted, the numeric matrix D.
It seems odd to me that writing a file to the hard drive and reading it back is faster than converting the data in memory. So what is actually the fastest way to do this (in MATLAB 2010)?
Thanks.

Answers (2)

Jos (10584) on 22 Mar 2016
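A = {'1,5012,0,35,6' ; '2,395,1,35,8'} ;
A = repmat(A,250000,1) ; % big array!
tic ;
A2 = strcat(A,',') ;            % append a comma to every cell so rows stay separated once joined
V = sscanf([A2{:}],'%f,') ;     % join all cells into one string and parse every number in one call
V = reshape(V,5,[]).' ;         % 5 numbers per row: fill column-wise, then transpose
toc
% Elapsed time is 2.397291 seconds.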
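The `5` passed to `reshape` above is hard-coded for the five-field example rows, while the real cell quoted by the asker further down has 14 fields. A minimal sketch, assuming every cell has the same number of comma-separated fields and no empty fields, that infers the column count from the first cell and joins the cells with `sprintf` instead of `strcat` (the variable names `C`, `nCols`, `S` are illustrative):

```matlab
% Illustrative input; in the question this would be the 500k-by-1 cell array 'out'
C = {'1,5012,0,35,6'; '2,395,1,35,8'};

% Number of fields per row, taken from the first cell (assumed the same for all rows)
nCols = numel(strfind(C{1}, ',')) + 1;

% Join every cell into one string, each followed by a comma so rows stay separated
S = sprintf('%s,', C{:});

% Parse all numbers in a single call; V is a column vector
V = sscanf(S, '%f,');

% Cheap sanity check on the total count only (it cannot catch every ragged row)
assert(mod(numel(V), nCols) == 0, 'Total number of values is not a multiple of the column count');

% Fill column-wise, then transpose so that each cell becomes one row of D
D = reshape(V, nCols, []).';
```

If a cell contains something `sscanf` cannot parse as a number, parsing stops early and `V` comes out short, which the count check will usually, but not always, flag.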
Sandeep Chaudhuri on 4 Aug 2023
Sorry, I forgot to attach the files. I can attach the file with 100 rows, but the one with 10000 rows is about 144 MB, so there is no way I can attach it.
Stephen23 on 4 Aug 2023
Edited: Stephen23 on 4 Aug 2023
"I can attach the file with 100 rows, but the one with 10000 rows is about 144 MB, so there is no way I can attach it."
The file you attached has exactly one row, not 100 rows. It is very easy to import:
V = readmatrix('testpulses_100_08042023.txt')
V = 1×146900
0.0020 -0.0001 -0.0001 0.0006 -0.0006 0.0022 -0.0011 0.0000 -0.0010 -0.0009 0.0004 0.0007 -0.0005 -0.0015 -0.0024 -0.0005 -0.0006 -0.0021 -0.0012 -0.0022 0.0013 0.0001 -0.0014 0.0022 -0.0006 -0.0021 0.0004 0.0022 0.0029 0.0006
Of course, if the number of elements is suitable, nothing stops you from reshaping it:
M = reshape(V,[],100).' % taking a guess about the order
M = 100×1469
0.0020 -0.0001 -0.0001 0.0006 -0.0006 0.0022 -0.0011 0.0000 -0.0010 -0.0009 0.0004 0.0007 -0.0005 -0.0015 -0.0024 -0.0005 -0.0006 -0.0021 -0.0012 -0.0022 0.0013 0.0001 -0.0014 0.0022 -0.0006 -0.0021 0.0004 0.0022 0.0029 0.0006 ... (display truncated)
If your imported data cannot be reshaped, it is because the total number of elements is not an exact multiple of the requested number of rows.
Show us the outputs from these commands:
V = readmatrix('your huge data file.txt');
size(V)
and tell us the exact size you want to reshape it into.
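As a sketch of the divisibility point above, using a small stand-in vector rather than the attached file (the row count `nRows` is a hypothetical value you would choose yourself), one can check the element count before calling `reshape` and fail with a clear message:

```matlab
% Stand-in for the row vector returned by readmatrix
V = 1:12;

% Hypothetical number of rows (for example, pulses) you expect in the data
nRows = 3;

% reshape needs numel(V) to be an exact multiple of nRows
if mod(numel(V), nRows) ~= 0
    error('reshapeCheck:count', '%d elements cannot be split into %d equal rows.', numel(V), nRows);
end

% Same ordering guess as above: consecutive blocks of V become the rows of M
M = reshape(V, [], nRows).';
```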



Fangjun Jiang on 22 Mar 2016
Edited: Fangjun Jiang on 22 Mar 2016
a={'1,5012,0,35,6';'2,395,1,35,8'};
b=str2num(char(a))   % char pads the strings into a char matrix; str2num evaluates it as a matrix expression
b =
1 5012 0 35 6
2 395 1 35 8
a={'1,5012,0,35,6';'2,395,1,35,8'};
aa=repmat(a,250000,1);
tic;
b=str2num(char(aa));
toc
Elapsed time is 19.681015 seconds.
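Both answers avoid calling a parser once per cell, which is what the loop in the question does with `textscan`. A sketch of the same idea applied to `textscan` itself, not benchmarked here and assuming every row has the same number of fields, joins the cells with newlines and parses them in one call (the names `C`, `nCols`, `parsed` are illustrative):

```matlab
% Illustrative input; in the question this would be the cell array 'out'
C = {'1,5012,0,35,6'; '2,395,1,35,8'};

% Number of fields per row, taken from the first cell (assumed constant)
nCols = numel(strfind(C{1}, ',')) + 1;

% One line of text per cell
S = sprintf('%s\n', C{:});

% Parse every line in one call; CollectOutput merges the numeric columns into one matrix
parsed = textscan(S, repmat('%f', 1, nCols), 'Delimiter', ',', 'CollectOutput', true);
D = parsed{1};
```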
Fangjun Jiang on 22 Mar 2016
Do you mean it is slower than the dlmcell() approach? In my comparison, the str2num(char(...)) approach is faster than the dlmcell() approach.
Olga Petrik on 22 Mar 2016
Edited: Olga Petrik on 22 Mar 2016
Yes, I tested both methods with a generated cell array of the same size, and indeed the dlmcell method was slower:
char(a): Elapsed time is 15.682667 seconds.
dlmcell(): Elapsed time is 23.495048 seconds.
But with my real cell array, dlmcell is still faster:
char(a): Elapsed time is 59.761913 seconds.
dlmcell(): Elapsed time is 31.591709 seconds.
So it may be something about my particular cell array. A real cell from it looks like this: '71,1,4,1,4,16856,538131,5,1,21.25,21.75,1380003891,506080,21.25'
It was obtained from another cell array, which contained both numbers and words, by replacing the words with numbers.
However, when I copied just this single cell's content, replicated it 500k times with repmat, and tested again, the results were again about 15 and 25 seconds respectively, rather than the 59 and 31 seconds observed with the real data.
