How can I determine the number of Headerlines for varying, non-rectangular text files so that I can parse it with textscan?

I would like to use textscan to read in the tabular integer and floating point data, keying off of the *NODE line. This line can be anywhere in the file with other, non-comment strings and integer lines in there as well. How can I find the varying number of headerlines for any given input file?
My code and an example input file are below. Thanks!
fid4 = fopen('E:\scratch\ANSYS_macro\MATLAB dyna beams\sample.k');
g = textscan(fid4,'%d %f %f %f','Delimiter','\n','headerlines',15);
celldisp(g);
fclose(fid4);
*KEYWORD
*TITLE
*DATABASE_FORMAT
$ 1IFORM 2IBINARY
0
$
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$ NODE DEFINITIONS $
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$
*NODE
$ 1NID 2X 3Y 4Z 5TC 6RC
1 0.141746 0.55315 -0.00592088
2 0.141746 0.538028 -0.00592088
3 0.126928 0.55315 -0.00669746
4 0.126926 0.538027 -0.00669757
5 0.112141 0.55315 -0.00747244
6 0.112138 0.538025 -0.00747256
7 0.0973459 0.55315 -0.0082478
8 0.0973435 0.538024 -0.00824792
9 0.0825538 0.55315 -0.00902302
10 0.0825514 0.538022 -0.00902315
11 0.0677682 0.55315 -0.0097979
$

Answers (1)

per isakson on 30 Apr 2014
Edited: per isakson on 1 May 2014
I'm not sure I understand whether your file contains one or more blocks of numerical data. Here is a function that handles both cases.
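function g = ccsm()
    % Read the whole file into one character vector
    str = fileread( 'cssm.txt' );
    % Capture the text between each *NODE line and the next *KEYWORD (or end of file)
    cac = regexp(str,'(?<=\*NODE\s+).+?(?=((\*KEYWORD)|($)))','match');
    g = cell( 1, length( cac ) );
    for jj = 1 : length( cac )
        % Lines starting with $ are skipped as comments
        g{jj} = textscan(cac{jj},'%d%f%f%f', 'CommentStyle','$');
    end
end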
returns a cell array g, where
>> g{3}
ans =
[11x1 int32] [11x1 double] [11x1 double] [11x1 double]
and where cssm.txt contains three copies of the text you included in your question.
Comments:
  • the entire text file must fit in memory to use this approach
  • it is not possible to read and parse the file in one step with textscan
  • it is safer to use a definition of the file format than guess based on one sample
  • regexp is powerful and fast, but the expression I used assumes that blocks of numerical data are enclosed by "*NODE" and "*KEYWORD", or by "*NODE" and the end of the file.
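If the file is too large to hold in memory, or if the header count itself is what you need, a line-by-line scan is one alternative. The following is only a sketch: the function name read_node_block is hypothetical, and it assumes the keyword line reads exactly *NODE (ignoring surrounding whitespace) and that only the first such block is wanted.

```matlab
function [g, nHeader] = read_node_block(filename)
% Hypothetical helper, not part of the answer above.
% Counts the lines up to and including the first *NODE line, then lets
% textscan skip that many header lines.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs

nHeader = 0;
found = false;
while true
    tline = fgetl(fid);
    if ~ischar(tline)                  % fgetl returns -1 at end of file
        break;
    end
    nHeader = nHeader + 1;
    if strcmp(strtrim(tline), '*NODE') % assumption: exact keyword match
        found = true;
        break;
    end
end
if ~found
    error('No *NODE line found in %s', filename);
end

frewind(fid);
% Lines starting with $ are skipped as comments; textscan should stop at the
% first line that does not fit the format, such as the next keyword line.
g = textscan(fid, '%d %f %f %f', 'HeaderLines', nHeader, 'CommentStyle', '$');
end
```

Called as [g, n] = read_node_block('sample.k'), this returns the same kind of cell array as textscan in the question, plus the header count n that could be passed to 'headerlines' directly.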
  1 Comment
Cedric on 1 May 2014
Edited: Cedric on 1 May 2014
Ah, a regexp challenge, I take it! ;-)
I'd propose the following:
blocks = regexp( content, '\*NODE(.*?\n){2}([\s\d\-\.]+)', 'tokens' ) ;
if the block doesn't always end with a $ character, and
blocks = regexp( content, '\*NODE(.*?\n){2}([^$]+)', 'tokens' ) ;
if it does. Then blocks{1} (the only cell if there is just one block) is a cell array whose first cell contains the header and whose second cell contains the data.
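To turn the captured data text into numbers, each second token can be passed to textscan. This is a minimal sketch: the inline content string is a shortened stand-in for the real file (normally it would come from fileread), and the variable names dataText and nodes are illustrative only.

```matlab
% Shortened stand-in for the file content; in practice use fileread.
content = sprintf([ ...
    '*KEYWORD\n*NODE\n$ 1NID 2X 3Y 4Z\n', ...
    '1 0.141746 0.55315 -0.00592088\n', ...
    '2 0.141746 0.538028 -0.00592088\n$\n']);

blocks = regexp(content, '\*NODE(.*?\n){2}([\s\d\-\.]+)', 'tokens');

nodes = cell(1, numel(blocks));
for k = 1:numel(blocks)
    dataText = blocks{k}{2};                  % second token: the numeric rows
    c = textscan(dataText, '%d %f %f %f');
    % Cast the int32 ID column to double so the matrix keeps full precision
    nodes{k} = [double(c{1}), c{2}, c{3}, c{4}];
end
```

Each nodes{k} is then an N-by-4 double matrix with one row per node. Note that the character class in this pattern would not match values written in exponent notation.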
