Read a textfile in MATLAB

Gautam Marwaha on 1 Oct 2013
Commented: Cedric on 5 Oct 2013
I am using MATLAB to call another application which generates a huge text file of the results I want. I'm looking for a way to scan for lines which contain specific strings, retrieve specific numeric values from those lines and store them in a numeric array.
I've been using the following code:
fid = fopen('filename.txt');
var1 = textscan(fid,'%s','delimiter','\n');
lines = var1{1,1};
Since textscan creates a cell which has the whole text file as the first entry in that cell, I declare lines as a secondary variable to get the text in separate lines.
Thereafter, I use the following code:
if strfind(lines{i}, 'String Sequence')== 1
line2 = char(lines{i});
This converts the lines{i} cell to a string. I need to retrieve specific numeric values from this string:
1. I know the position where the values I need are located in this string.
2. I know the string sequence which precedes these values.
For example the string retrieved could read:
line2 =
The fare for a 7% ROI = 343.24 DOLLARS/PASS
For multiple runs, the structure of the string remains the same; only the numeric value (343.24 in this case) varies. Can someone suggest a way to implement the search logic described above?
My program generates a text file with some 20,000 lines at each optimization run. Is there a better (faster) way of implementing the text search?
  4 Comments
Matt Kindig on 1 Oct 2013
Can you clarify what data you are trying to extract? Just the number in front of DOLLARS/PASS (as Cedric's answer below), or additional information?
Gautam Marwaha on 1 Oct 2013
Yes, the number before DOLLARS/PASS is the value I'm looking for. However it is just one of the numbers I need. In a more general output, there would be several (132 or more) lines which have the same phrase and I need all of them.


Accepted Answer

Cedric on 1 Oct 2013
Edited: Cedric on 2 Oct 2013
Could you copy/paste e.g. 20 lines of a typical file? If the lines that you are looking for are spread among a bunch of other types of lines, you can go for a REGEXP-based solution, e.g.
content = fileread( 'filename.txt' ) ;
values = str2double( regexp( content, '[\d\.]+ (?=DOLLARS/PASS)', 'match' )) ;
which assumes that only these lines contain the literal "DOLLARS/PASS". If that is not the case, the pattern could be made more specific.
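To see what the lookahead does, here is a minimal sketch on the single sample line from the question. The variable names are only for illustration. The `(?=DOLLARS/PASS)` part checks that the literal follows but does not consume it, so the match holds just the number plus the trailing space, which str2double should tolerate.

```matlab
% Sample line taken from the question.
line2 = 'The fare for a 7% ROI = 343.24 DOLLARS/PASS' ;

% The "7" is not matched because it is followed by '%', not by ' DOLLARS/PASS'.
match = regexp( line2, '[\d\.]+ (?=DOLLARS/PASS)', 'match' ) ;
value = str2double( match ) ;
```

On a whole file read with fileread, the same call returns one value per matching line.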
Note that if you were interested in this 7 in the "7% ROI" part, we could extract it as well.
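A possible sketch for getting both the ROI percentage and the fare with two tokens. The pattern and the second sample line are my own assumptions (only the first line comes from the question), so adapt them to the real file.

```matlab
% First line is from the question; the second one is made up for illustration.
content = sprintf( '%s\n', ...
    'The fare for a 7% ROI = 343.24 DOLLARS/PASS', ...
    'The fare for a 10% ROI = 360.00 DOLLARS/PASS' ) ;

% Token 1: number before "% ROI". Token 2: number before "DOLLARS/PASS".
pattern = '([\d\.]+)% ROI\s*=\s*([\d\.]+)\s+DOLLARS/PASS' ;
tokens  = regexp( content, pattern, 'tokens' ) ;

% One row per matching line: [ROI, fare].
roiAndFare = reshape( str2double( [tokens{:}] ), 2, [] ).' ;
```

Each element of tokens is a 1x2 cell, so concatenating them and reshaping to 2 rows keeps the two numbers of a line together before the transpose.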
On the other hand, if there is a well defined structure in your files, we should be able to avoid regular expressions and focus on a more basic approach.
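As a rough sketch of what such a basic approach could look like (no regular expression for the extraction itself), assuming each line of interest contains "DOLLARS/PASS" and that the value is the first number after the last '=' on that line. The file content below is hypothetical.

```matlab
% Hypothetical file content; in practice: content = fileread( 'filename.txt' ) ;
content = sprintf( '%s\n', ...
    'Some unrelated line', ...
    'The fare for a 7% ROI = 343.24 DOLLARS/PASS', ...
    'Another unrelated line', ...
    'The fare for a 7% ROI = 298.10 DOLLARS/PASS' ) ;

% Split into lines and keep only those containing the key phrase.
lines = regexp( content, '\r?\n', 'split' ) ;
isHit = ~cellfun( @isempty, strfind( lines, 'DOLLARS/PASS' ) ) ;
hits  = lines(isHit) ;

% Read the first number that follows the last '=' on each kept line.
% Assumption: every kept line contains at least one '='.
values = zeros( numel(hits), 1 ) ;
for k = 1 : numel(hits)
    eqPos     = find( hits{k} == '=', 1, 'last' ) ;
    values(k) = sscanf( hits{k}(eqPos+1:end), '%f', 1 ) ;
end
```

Reading the file once and filtering lines before parsing avoids running the numeric conversion on the thousands of lines that are not of interest.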
EDIT 1
Ok, these files look quite regular but they are not easy to process, so I'll write a bit more than just hints. Here is how I would do it if I had to quickly find a solution. We start by reading the file:
content = fileread( 'outtie.txt' ) ;
For direct operating costs, assuming that you need both numbers (the per-trip cost and the other one), I would use a pattern with two tokens and convert the outcome into a two-column numeric array with the per-trip cost as first column and the second number as second column:
pattern = ' DIRECT OPERATING COSTS\s+\(\$\s*([\d\.]+)/TRIP\)\s+([\d\.]+)' ;
tokens = regexp( content, pattern, 'tokens' ) ;
directOperatingCosts = reshape( str2double( [tokens{:}] ), 2, [] ).' ;
After running this, we get:
>> directOperatingCosts(:,1)
ans =
20364
18621
18621
28099
8361
23549
23513
28314
26232
28099
18621
30449
9526
10514
10514
>> directOperatingCosts(:,2)
ans =
269.7050
247.1580
247.1580
253.1300
233.5890
250.5370
250.5120
253.2440
252.1060
253.1240
247.1560
254.3680
236.2350
238.0660
238.0660
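If you want to try the pattern without the full file, here is a small self-contained sketch. The two text lines are a made-up layout that I inferred from the pattern (the real file may be spaced differently); the numbers are the first two rows of the output above.

```matlab
% Hypothetical excerpt; spacing is an assumption, values are from the output above.
content = sprintf( '%s\n', ...
    ' DIRECT OPERATING COSTS  ($ 20364/TRIP)     269.705', ...
    ' DIRECT OPERATING COSTS  ($ 18621/TRIP)     247.158' ) ;

pattern = ' DIRECT OPERATING COSTS\s+\(\$\s*([\d\.]+)/TRIP\)\s+([\d\.]+)' ;
tokens  = regexp( content, pattern, 'tokens' ) ;   % 1x2 cell of 1x2 cells

% [tokens{:}] flattens to a 1x4 cell of strings; reshape to 2 rows keeps each
% pair together, and the transpose gives one row per match.
directOperatingCosts = reshape( str2double( [tokens{:}] ), 2, [] ).' ;
```

The parentheses and the dollar sign are escaped because they are special characters in a pattern, whereas the unescaped parentheses define the two tokens.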
For the detailed flight summaries, we first get blocks of data associated with each segment. Then we extract all weights (1 number per data entry), and we finally store all other fields in their own array, because each of the 13 other columns holds 2 numbers per data entry.
tokens = regexp( content, 'SEGMENT\s+\d+.+?ENG PAR([\d\.\-\s]+)', 'tokens' ) ;
nSummaries = numel(tokens) / 3 ;
detailedFlightSummary = cell( nSummaries, 3 ) ;
for sId = 1 : nSummaries
    for phaseId = 1 : 3 % Climb, cruise, descent.
        tokenId = (sId-1)*3 + phaseId ;
        buffer = reshape( sscanf( tokens{tokenId}{1}, '%f' ), 27, [] ) ;
        S.weight = buffer(1,:).' ;
        S.otherFields = reshape( buffer(2:end,:), 13, [] ).' ;
        detailedFlightSummary{sId,phaseId} = S ;
    end
end
Note that you don't need to build a cell array of structs to store data; I proposed this just to illustrate the general approach. After running this, you get e.g. for the first plane, phase 2 (cruise):
>> detailedFlightSummary{1,2}
ans =
weight: [6x1 double]
otherFields: [12x13 double]
All weights:
>> detailedFlightSummary{1,2}.weight
ans =
183869
183000
180000
177000
174000
171644
Altitude/energy for 1st weight.
>> detailedFlightSummary{1,2}.otherFields(1:2,1)
ans =
40000
49321
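The two reshape calls are the part that is easiest to get wrong, so here is a toy sketch of the same logic on made-up data: 2 entries, each with 1 leading number followed by 2 rows of 3 fields (7 numbers per entry instead of 27). All names and numbers here are hypothetical.

```matlab
% Toy block: a leading value, then 2 rows of 3 fields, repeated for 2 entries.
txt = sprintf( '%s\n', '100', '1 2 3', '4 5 6', '200', '7 8 9', '10 11 12' ) ;

nFields   = 3 ;                 % Fields per row (13 in the real file).
nPerEntry = 1 + 2 * nFields ;   % Numbers per entry (27 in the real file).

% One column per entry.
buffer = reshape( sscanf( txt, '%f' ), nPerEntry, [] ) ;

% First number of each entry.
leading = buffer(1,:).' ;

% Remaining numbers, one row per text row: rows 1:2 belong to entry 1,
% rows 3:4 to entry 2.
otherFields = reshape( buffer(2:end,:), nFields, [] ).' ;
```

This works because sscanf returns the numbers in reading order and reshape fills column by column, so each column of buffer is one complete entry. If a block does not contain a multiple of nPerEntry numbers, reshape throws an error, which is a useful sanity check on the pattern.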
Play a bit with this material and let me know if there is anything that you don't understand. Also, I give no guarantee that it is working, so it is up to you to understand and check.
  4 Comments
Gautam Marwaha on 5 Oct 2013
It works! I used regexp before but I wasn't sure how to select different values. The second part doesn't always work but I'm playing around to get the results I need. Just below the 'Direct Operating Costs' section in the file, there's an OBJ/VAR summary, which has the variable names in one line and the values in the line after that. I need to select values for GW, AR, Thrust, W/S and T/W. How would you recommend selecting these, considering they don't follow the same layout as the operating costs?
Cedric on 5 Oct 2013
Edited: Cedric on 5 Oct 2013
The regexp engine is most efficient with short/simple patterns, so when you have to extract values from complex tables, it is often more efficient to use regexp to extract the whole tables as blocks and to post-process the output (sometimes using a second regexp), rather than to try a one-shot operation with the most beautiful/elaborate pattern.
For your last question, I would actually get relevant lines with numeric values using something like
pattern = '#OBJ/VAR/CONSTR SUMMARY.+?T/W(.+?)DESIGN' ;
tokens = regexp( content, pattern, 'tokens' ) ;
% Convert lines/tokens to a numeric array of data.
buffer = cellfun( @(c)sscanf( c{1}, '%f' ), tokens, ...
'UniformOutput', false ) ;
data = [buffer{:}].' ;
and then select relevant columns, e.g.
thrusts = data(:,10) ;
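If you would rather not hard-code column numbers, one option is to look columns up by name from the header line. This is only a sketch: it assumes the names and values are whitespace-separated and in the same order, and the two lines below (including the numbers) are made up, not taken from your file.

```matlab
% Hypothetical two-line excerpt: names on one line, values on the next.
headerLine = '   GW      AR    THRUST    W/S     T/W' ;
valueLine  = ' 183869   9.5   27000    120.3   0.294' ;

names  = regexp( headerLine, '\S+', 'match' ) ;   % 1x5 cell of names
values = sscanf( valueLine, '%f' ).' ;            % 1x5 numeric row

% Locate the requested names among the header names.
wanted = { 'GW', 'THRUST', 'T/W' } ;
[found, col] = ismember( wanted, names ) ;
assert( all(found), 'Some requested variables are missing.' ) ;

selected = values(col) ;   % Values in the order given in "wanted".
```

The same col indices can be applied to the columns of data once you have checked that the header in your file really lines up with the values.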

