Importing ASCII data with mixed-in headers

I have an old program that creates ASCII files with data in a format similar to the following:
Header line
Set 1 Parameter 1 1.0 Parameter 2 1.0 Parameter 3 1.1
0.0000 1.0000 2.0000 3.0000
0.1000 1.0001 2.0001 3.0001
0.2000 1.0002 2.0002 3.0002
Set 2 Parameter 1 2.0 Parameter 2 2.0 Parameter 3 2.1
0.0000 1.0005 2.0005 3.0005
0.1000 1.0006 2.0006 3.0006
0.2000 1.0007 2.0007 3.0007
This pattern repeats for several more sets, and the real files are generally much larger than the simplified form shown here.
I would like to extract the data from each set, as well as the parameter values for each set, but am having trouble doing so. I could go line by line with fgetl, but that doesn't seem particularly efficient here. All the bulk data readers I can think of have problems with the mixed format created by the parameter lines. Is there a way to extract the information I want without reading the file one line at a time?
Walter Roberson on 6 Jun 2018
Does the output have to be broken up by block, or can it be all together as if the Set lines were not there?
Bob Thompson on 6 Jun 2018
It would be ideal to have the outputs broken by block, but I could probably manage that separately if necessary.
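If the blocks do not have to be kept separate (the all-together option raised in the first comment), textscan on its own may be enough. The following is a minimal sketch, assuming the four-column layout of the sample and that every parameter line begins with the word Set. The 'CommentStyle' option tells textscan to ignore everything from "Set" to the end of that line, so only the numeric rows are read and stacked into one matrix. The in-memory string S is a stand-in for fileread('YourFile.txt').

```matlab
% Sample text copied from the question; in practice use S = fileread('YourFile.txt')
S = sprintf(['Header line\n' ...
    'Set 1 Parameter 1 1.0 Parameter 2 1.0 Parameter 3 1.1\n' ...
    '0.0000 1.0000 2.0000 3.0000\n' ...
    '0.1000 1.0001 2.0001 3.0001\n' ...
    '0.2000 1.0002 2.0002 3.0002\n' ...
    'Set 2 Parameter 1 2.0 Parameter 2 2.0 Parameter 3 2.1\n' ...
    '0.0000 1.0005 2.0005 3.0005\n' ...
    '0.1000 1.0006 2.0006 3.0006\n' ...
    '0.2000 1.0007 2.0007 3.0007\n']);

% Skip the first (header) line and treat "Set ..." lines as comments,
% so textscan reads only the numeric rows in a single pass
C = textscan(S, repmat('%f', 1, 4), 'HeaderLines', 1, ...
             'CommentStyle', 'Set', 'CollectOutput', true);
allData = C{1};   % one matrix holding every data row; set boundaries are discarded
disp(size(allData))
```

The trade-off is that the block boundaries and parameter values are lost, which is why splitting on the Set lines first is the better fit when per-block output is wanted.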

Accepted Answer

Walter Roberson on 6 Jun 2018
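S = fileread('YourFile.txt');                         % whole file as one char vector
bpos = regexp(S, '^Set', 'start', 'lineanchors');     % start index of every line beginning with "Set"
blocks = mat2cell(S, 1, diff([1 bpos length(S)+1]));  % piece sizes must sum to length(S), hence the +1
blocks(1) = [];                                       % discard the header text before the first Set line
fmt = repmat('%f', 1, 4);                             % four numeric columns per data row
data = cellfun(@(B) cell2mat(textscan(B, fmt, 'HeaderLines', 1, 'CollectOutput', 1)), ...
               blocks, 'UniformOutput', false);       % skip each Set line, then read its numbers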
data is now a cell array of numeric arrays.
The code does assume that each block has the same number of columns, but it does not assume that the blocks have the same number of rows.
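The question also asks for each set's parameter values, which the code above does not extract. One way to get them in a single pass is a regexp 'tokens' search over the Set lines. This is a sketch that assumes the "Parameter 1 / Parameter 2 / Parameter 3" labels of the simplified sample; the real labels differ, so the pattern would need adjusting. Because both this search and the block split follow file order, row k of params should line up with data{k}.

```matlab
% Sample text copied from the question; in practice use S = fileread('YourFile.txt')
S = sprintf(['Header line\n' ...
    'Set 1 Parameter 1 1.0 Parameter 2 1.0 Parameter 3 1.1\n' ...
    '0.0000 1.0000 2.0000 3.0000\n' ...
    '0.1000 1.0001 2.0001 3.0001\n' ...
    '0.2000 1.0002 2.0002 3.0002\n' ...
    'Set 2 Parameter 1 2.0 Parameter 2 2.0 Parameter 3 2.1\n' ...
    '0.0000 1.0005 2.0005 3.0005\n' ...
    '0.1000 1.0006 2.0006 3.0006\n' ...
    '0.2000 1.0007 2.0007 3.0007\n']);

% Assumed label layout (from the sample): capture the value after each label
pat = ['^Set\s+\d+\s+Parameter 1\s+(\S+)' ...
       '\s+Parameter 2\s+(\S+)\s+Parameter 3\s+(\S+)'];
tok = regexp(S, pat, 'tokens', 'lineanchors');  % 1-by-nSets cell, each a 1-by-3 cell of char

% Stack the per-set token cells and convert the text to numbers
params = str2double(vertcat(tok{:}));           % nSets-by-3 double, one row per Set line
disp(params)
```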
Bob Thompson on 7 Jun 2018
Cool, that took care of it. I had to do a bit of finessing to make it fit my actual file format, but it got me right on track.
Is it possible to pull multiple parameters with different labels using a single regexp call? I didn't see an example like that in the documentation. Here is an example of what I mean:
pstr = regexp(blocks, '(?<=ParamA=\s+)\S+(\s+?<=ParamB=\s+)\S+', 'match');
As I understand it, this would find both numbers but keep them in the same output, which I can see getting messy. Alternatively,
pstr = regexp(blocks, '(?<=ParamA=\s+)\S+','(\s+?<=ParamB=\s+)\S+', 'match');
would search for the two parameter entries separately and, ideally, produce two separate results stored in a matrix.
Walter Roberson on 7 Jun 2018
I suggest you look at the regexp named token facility and the 'names' option.
ptokens = regexp(blocks, '(?<=ParamA=\s+)(?<ParamA>\S+).*(?<=ParamB=\s+)(?<ParamB>\S+)', 'names');
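Because blocks is a cell array, ptokens is a cell array with one entry per block; each entry is a struct with fields ParamA and ParamB holding the matched text as character vectors, which str2double can convert to numbers.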
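To go from the named tokens to numeric vectors, something along these lines may work. The sample blocks and their "ParamA=" / "ParamB=" labels are hypothetical, modelled on the follow-up question. The sketch adds 'once' so each block yields a single scalar struct, and it matches the labels directly rather than through lookbehinds; the named tokens still capture only the values.

```matlab
% Hypothetical blocks whose Set lines use "ParamA=" / "ParamB=" labels
blocks = {sprintf('Set 1 ParamA= 1.5 ParamB= 7\n0.0 1.0\n'), ...
          sprintf('Set 2 ParamA= 2.5 ParamB= 8\n0.0 2.0\n')};

% One named token per parameter; 'once' returns one scalar struct per block
pat = 'ParamA=\s*(?<ParamA>\S+).*?ParamB=\s*(?<ParamB>\S+)';
ptokens = regexp(blocks, pat, 'names', 'once');  % 1-by-2 cell of structs

% Merge into a struct array and convert the captured text to numbers
p = [ptokens{:}];
paramA = str2double({p.ParamA});
paramB = str2double({p.ParamB});
disp([paramA; paramB])
```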