Reading nth value in a line starting with a word
Alex Dworzanczyk
on 10 Apr 2020
Commented: Walter Roberson
on 11 Apr 2020
I am trying to read a text file that's designed more for human than for computer reading, i.e. it has large sections of copyright information, full sentences, etc. But there is one line I want to read that has this structure:
Ivac, m/s 3 4
The "3" and "4" are placeholders for real numerical values. I want to extract the value in Column 4 from this line starting with "Ivac," but don't know how. I have tried to look up how to do it, but I can't really understand any of the information about how to do this with regular expressions, and the other methods fail on the first line of my text file, which is just blank (right before a lot of copyright info).
Any assistance in resolving this problem would be greatly appreciated.
Accepted Answer
Walter Roberson
on 10 Apr 2020
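% Read the whole file into one character vector.
S = fileread('FilenameGoesHere.txt');
% Capture the two whitespace-delimited columns that follow "Ivac, m/s".
parts = regexp(S, 'Ivac,\s+m/s\s+(?<col3>\S+)\s+(?<col4>\S+)', 'names', 'once');
col3 = str2double(parts.col3);
col4 = str2double(parts.col4);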
This code does not assume integers, but it does assume that there are no stray characters immediately adjacent. For example,
Ivac, m/s 3° 17.2µm
would fail.
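To make that caveat concrete, here is a small sketch on a made-up line. The sample text and the ASCII unit suffixes "deg" and "um" are assumptions standing in for the symbols above. The pattern itself still matches, because \S+ accepts any run of non-whitespace characters; it is the str2double conversion that comes back as NaN.

```matlab
% Hypothetical sample line with units glued onto the numbers.
S = 'Ivac, m/s 3deg 17.2um';
parts = regexp(S, 'Ivac,\s+m/s\s+(?<col3>\S+)\s+(?<col4>\S+)', 'names', 'once');

% The captured text includes the stray characters...
disp(parts.col3);   % 3deg
disp(parts.col4);   % 17.2um

% ...so str2double cannot convert it and returns NaN.
col3 = str2double(parts.col3);
col4 = str2double(parts.col4);
disp(isnan([col3 col4]));
```

Checking isnan on the results is therefore a cheap way to notice that the file contained something other than plain numbers in those columns.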
2 Comments
Walter Roberson
on 11 Apr 2020
\s does not mean space, exactly: it means any kind of "whitespace", which includes space, tab, newline, formfeed. \s+ means one or more whitespace characters are to be matched.
\S means anything that is not whitespace, so \S+ means one or more non-whitespace characters. Basically \S+ matches a column of space (or tab) delimited data.
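As a quick illustration of the two classes, the following sketch splits a made-up line (an assumption, with a mix of spaces and a tab between the columns) into its non-whitespace columns and the whitespace runs between them.

```matlab
% Hypothetical line: one space, then three spaces, then a tab as separators.
txt = sprintf('Ivac, m/s   3\t4');

cols = regexp(txt, '\S+', 'match');   % runs of non-whitespace: the columns
gaps = regexp(txt, '\s+', 'match');   % runs of whitespace: the separators

fprintf('%d columns, %d gaps\n', numel(cols), numel(gaps));
disp(cols{4});   % the fourth column as text
```

Because \s+ matches a run of any length and any mix of spaces and tabs, the pattern does not care how the columns were aligned in the file.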
'once' as an option means that MATLAB should stop searching once the pattern has been found once. It also has some effect on how output is returned when the user has asked for matched text to be returned as character vectors: with 'once' it returns one character vector for each input character vector, whereas without 'once', it returns a cell array of character vectors for each input character vector, even if only one result is found.
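The difference in return type is easiest to see side by side. This is a minimal sketch on a made-up string; the variable names are only for illustration.

```matlab
txt = 'a1 b22 c333';   % hypothetical input with three numbers

allHits  = regexp(txt, '\d+', 'match');           % every match, as a cell array
firstHit = regexp(txt, '\d+', 'match', 'once');   % first match only, as a char vector

disp(class(allHits));    % cell
disp(class(firstHit));   % char
disp(numel(allHits));    % 3
```

With 'once' the result can be passed straight to str2double; without it, the cell array has to be indexed first.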
regexp() uses the syntax (?SOMETHING) for a number of advanced features. The ? does not act as a quantifier for (?SOMETHING) syntax.
In particular, (?<NAME>PATTERN) means that what is matched by the given PATTERN is to be stored in a struct field named NAME. So (?<col3>\S+) means to match an entire non-blank column, and to store what was found in a struct field named col3. Thus, the effect of this regexp is to request that a struct be returned that has a field named col3 that matches the column after "Ivac, m/s" and a field named col4 that matches the column after that.
'names' as an option tells it to return the structure mentioned before. When 'names' is requested and 'once' is not present in the options, then the struct returned can be a non-scalar struct; with 'once' it will either be an empty struct or a scalar struct.
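A short sketch of those cases, using made-up text in which the line occurs twice (the sample text and variable names are assumptions):

```matlab
% Hypothetical text containing two lines that fit the pattern.
txt = sprintf('Ivac, m/s 3 4\nIvac, m/s 5 6\n');
pat = 'Ivac,\s+m/s\s+(?<col3>\S+)\s+(?<col4>\S+)';

% Without 'once': in MATLAB this is a 1-by-2 struct array, one element per match.
everyMatch = regexp(txt, pat, 'names');

% With 'once': a scalar struct describing the first match only.
firstMatch = regexp(txt, pat, 'names', 'once');
disp(firstMatch.col3);   % 3
disp(firstMatch.col4);   % 4

% With 'once' and no match: an empty struct, which isempty() detects.
noMatch = regexp('no such line here', pat, 'names', 'once');
```

Using 'once' keeps the later parts.col3 and parts.col4 accesses simple, since each then refers to a single character vector rather than a list of values.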
You can tell whether there was any match by checking isempty(parts):
S = fileread('FilenameGoesHere.txt');
parts = regexp(S, 'Ivac,\s+m/s\s+(?<col3>\S+)\s+(?<col4>\S+)', 'names', 'once');
if isempty(parts)
    % the line is not present, do something appropriate
else
    col3 = str2double(parts.col3);
    col4 = str2double(parts.col4);
end
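Since the question title asks for the nth value, one possible generalization is to grab the whole line that starts with the word, split it on whitespace, and index into the result. This is only a sketch and not part of the accepted answer: the sample text standing in for the file contents, and the names word, n, hit, cols and value, are all assumptions.

```matlab
% Hypothetical stand-in for fileread(...): blank first line, prose, then the data line.
S = sprintf('\nCopyright notice text.\nIvac, m/s   3.5   4.25\nMore text.\n');
word = 'Ivac,';   % the word the wanted line starts with
n = 4;            % which whitespace-delimited column to read

% 'lineanchors' lets ^ match at the start of every line;
% 'dotexceptnewline' stops .* at the end of that line.
pat = ['^[ \t]*' regexptranslate('escape', word) '.*'];
hit = regexp(S, pat, 'match', 'once', 'lineanchors', 'dotexceptnewline');

if isempty(hit)
    value = NaN;                        % the line is not present
else
    cols = regexp(hit, '\S+', 'match'); % split the line into columns
    if numel(cols) >= n
        value = str2double(cols{n});    % NaN if that column is not numeric
    else
        value = NaN;                    % the line has fewer than n columns
    end
end
disp(value);
```

Counting columns this way treats "Ivac," and "m/s" as columns 1 and 2, which is the numbering the question uses when it asks for Column 4.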