Get numerical data from a string with multiple kinds of delimiters

I have a text file (with very strange formatting) where each row is printed as
{2, 3} -> {-0.003201472132661235 + 0.00011724512961300188*I, 0.0024489343681961366 + 0.0012251936705803077*I,....}
This is very messy - let me clarify
The numbers 2 and 3 are indices. At point (2,3) we have a list of complex numbers where I = sqrt(-1). I'd like to translate this string into the row of a matrix formatted as
[2, 3, -0.00320+0.0001172i, 0.0024489+0.001225i, ...]
I have tried load and csvread, but they are obviously not the right functions for this. Currently I am using fopen, reading each line with fgets, and then calling strtok repeatedly, but this is going badly.
Do you have any suggestions? I would be most appreciative of your input.
Thanks!
PS: these files can be north of 14 MB, so this is a real pain, too.

Accepted Answer

Walter Roberson on 4 Dec 2012
T = fileread('YourFileNameHere.txt');
T = regexprep(T, '[{},]|->|\*I|(?<=[-+]) (?=\d)', '');
At this point, T is a single string delimited by newlines, in which all of the decoration is gone (including the commas and *I), so each line is just a list of numbers, such as
2 3 -0.003201472132661235 +0.00011724512961300188 0.0024489343681961366 +0.0012251936705803077 ...
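To see what the pattern does, here is a minimal sketch on a single made-up row. The sample values are shortened for readability and the variable names sampleRow, clean, and vals are illustrative, not from the original post.

```matlab
% Sample row in the format from the question (values shortened, hypothetical).
sampleRow = '{2, 3} -> {-0.0032 + 0.00012*I, 0.0024 - 0.0012*I}';

% Same pattern as above: drop braces, commas, the arrow and *I, and
% remove the space between a sign and the digits that follow it.
clean = regexprep(sampleRow, '[{},]|->|\*I|(?<=[-+]) (?=\d)', '');

% The cleaned row is whitespace-separated numbers, so one sscanf call reads it.
vals = sscanf(clean, '%f').';
disp(vals)
```

The lookbehind/lookahead alternative is what attaches the sign to the imaginary part, so "+ 0.00012" becomes "+0.00012" and is read as one number.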
If there is a fixed number of items per line, you can use
NumComplex = 2;
NumItem = 2 + 2 * NumComplex;
fmt = repmat('%f', 1, NumItem);
C = textscan(T, fmt, 'CombineOutput', 1);
and your matrix would then be
M = [C{1}(:,1) C{1}(:,2), complex(C{1}(:,3:2:end), C{1}(:,4:2:end))];
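For reference, the textscan documentation spells the option that merges same-class columns as 'CollectOutput', so 'CombineOutput' as written above may be rejected. A minimal end-to-end sketch with that spelling follows; it assumes two complex values per row, and the two-row sample text is made up to stand in for the cleaned T.

```matlab
% Two sample rows, already stripped of decoration (hypothetical data).
T = sprintf('2 3 -0.0032 +0.00012 0.0024 -0.0012\n2 4 0.5 -0.25 1.5 +2.5\n');

NumComplex = 2;                   % complex values per row (assumed fixed)
NumItem = 2 + 2 * NumComplex;     % two indices plus real/imaginary pairs
fmt = repmat('%f', 1, NumItem);

% 'CollectOutput' concatenates consecutive same-class columns into one array.
C = textscan(T, fmt, 'CollectOutput', 1);
D = C{1};                         % 2-by-6 double matrix

% Indices first, then real parts (columns 3,5,...) paired with
% imaginary parts (columns 4,6,...).
M = [D(:,1:2), complex(D(:,3:2:end), D(:,4:2:end))];
```

Because one column of M is complex, the whole matrix is stored as complex, so the index columns will display with a zero imaginary part.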
If the number of entries per line is variable, but there is a defined maximum, then you can use the above with NumComplex being the maximum number of complex pairs, and add 'MissingValue', NaN to the calling options.
If the number of entries per line is variable and there is no defined maximum, you cannot be using regular numeric arrays to store all the data at the same time. For efficient solutions to this scenario, please deposit more chocolate ;-)
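The answer above stops short of code for the unbounded case. One common workaround, which is not from the thread, is to keep each row in its own cell of a cell array. A minimal sketch, assuming the text has already been cleaned as described and using made-up rows of different lengths:

```matlab
% Cleaned sample text with a different number of complex values per row
% (hypothetical data).
T = sprintf('2 3 0.5 -0.25\n2 4 1.5 +2.5 -3 +4\n');

rows = regexp(strtrim(T), '\n', 'split');   % one cell per line
R = cell(numel(rows), 1);                   % preallocate the container

for k = 1:numel(rows)
    v = sscanf(rows{k}, '%f').';            % all numbers on this row
    % First two values are the indices; the rest are real/imaginary pairs.
    R{k} = [v(1:2), complex(v(3:2:end), v(4:2:end))];
end
```

Each R{k} is then a row vector of whatever length that line needs, at the cost of not having a single rectangular matrix.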

More Answers (1)

David K on 5 Dec 2012
Thanks Walter your example code was really helpful! The number of entries per line is constant. Unfortunately, the 'CombineOutput' option for textscan didn't seem to work for me... Here is the code that I now have (the only issue is when I call it, it can take 5-10 minutes to run for large files but hey it gets the format right):
%% Script to get the first line of complex coefficients
% step through fid in other versions of code to get all lines
clc
clear all
close all
fid = fopen('CWT.csv','r'); %# open csv file for reading (sorry not .txt)
line = fgets(fid); %# read first line as a string
line = regexprep(line,...
    '[{},]|->|\*I|(?<=[-+]) (?=\d)', ''); %# remove decorations
line = strrep(line,'*^','e'); %# change exponential format
[token, remain] = strtok(line, ' '); %# get first index (octave)
oct = str2num(token);
[token, remain] = strtok(remain, ' '); %# get second index (voice)
voc = str2num(token);
i = sqrt(-1);
j = 1;
while isempty(remain)~=1
    [token, remain] = strtok(remain, ' '); %# get real part
    X = str2num(token);
    [token, remain] = strtok(remain, ' '); %# get imaginary part
    Y = str2num(token);
    C(j) = X + i*Y; %# get complex coefficient
    j=j+1;
end
fclose(fid);
It takes a while to run because the size of C is not specified ahead of time, but gets the output right. To do every line of the file (say 200 lines each with 2000 entries) can take 5 minutes. My largest files are 200+ lines each with 100,000+ entries.
I'd say this question is answered, unless you know how to make the code more efficient.
Thank you again,
David
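On the efficiency question, one option worth trying is to replace the strtok/str2num loop with a single sscanf call per line. This is only a sketch and has not been timed on the files described here. It would likely help because str2num evaluates each token through eval and because C no longer grows one element at a time. The sample row below is made up and uses the `*^` exponent notation handled in the script above; in practice the row would come from fgets or fgetl.

```matlab
% Hypothetical sample row; in the real script this comes from fgets(fid).
row = '{2, 3} -> {-3.2*^-3 + 1.1*^-4*I, 2.4*^-3 - 1.2*^-3*I}';

row = regexprep(row, '[{},]|->|\*I|(?<=[-+]) (?=\d)', '');  % remove decorations
row = strrep(row, '*^', 'e');                               % 1.1*^-4 -> 1.1e-4

v = sscanf(row, '%f');      % one call parses every number on the row
oct = v(1);                 % first index (octave)
voc = v(2);                 % second index (voice)

% Remaining values alternate real, imaginary; build all coefficients at once.
C = complex(v(3:2:end), v(4:2:end)).';
```

For a whole file with a constant number of entries per line, the same two cleanup calls can be applied to the output of fileread before handing the text to textscan, which avoids the per-line loop entirely.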
  1 Comment
David K on 5 Dec 2012
Ugh should have used the code button
%% Script to get the first line of complex coefficients
% step through fid to complete the task
clc
clear all
close all
fid = fopen('CWT.csv','r'); %# open text/csv file for reading
line = fgets(fid); %# read first line as a string
line = regexprep(line,...
'[{},]|->|\*I|(?<=[-+]) (?=\d)', ''); %# remove decorations
line = strrep(line,'*^','e'); %# change exponential format
[token, remain] = strtok(line, ' '); %# get first index (octave)
oct = str2num(token);
[token, remain] = strtok(remain, ' '); %# get second index (voice)
voc = str2num(token);
i = sqrt(-1);
j = 1;
while isempty(remain)~=1
    [token, remain] = strtok(remain, ' '); %# get real part
    X = str2num(token);
    [token, remain] = strtok(remain, ' '); %# get imaginary part
    Y = str2num(token);
    C(j) = X + i*Y; %# get complex coefficient
    j=j+1;
end
fclose(fid);
