Skip Lines (other than the Header) when Importing CSV File

I have read a couple of entries about skipping header information when importing CSV files. While I don't fully understand them yet, I know that I'll also need to skip lines of text interspersed with my data. How would I import a CSV file that has a header but also includes lines of text between "blocks" of data?

For instance, in the attached file, lines 1-45 can be considered the "header" and are easily skipped. Lines 46-74 contain the actual data, lines 75-76 should be skipped, and lines 77-105 contain the next "block" of data. This pattern repeats and, depending on the length of the file, could repeat a couple of thousand times (that is, around 2,000 "blocks" of data).

I would like to import only the data blocks so that I can do math (summing, averaging, max and min values) on specific "blocks" of data. I could do this in Excel, but I don't know how to automate the process without using MATLAB. Any suggestions would be appreciated. Thank you.

Answers (2)

per isakson on 10 Feb 2014
Edited: per isakson on 11 Feb 2014
Are there any string values that can be used as the "Begin" and "End" of the blocks?
[The following day]
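Try this:
% Read the whole file into one character vector
str = fileread('cssm.txt');
% Lookbehind: the match must follow a "Frame N" line (not part of the match)
look_behind = '(?<=Frame \d{1,3}\s*\n)';
% Lookahead: the match must be followed by the next "Frame N" line or the end of the text
look_ahead = '(?=(\s*Frame \d{1,3}\s*)|(\s*$))';
% Lazily match digits, periods and whitespace, i.e. the numeric block itself
expr2match = '[0-9\.\s]+?';
cac = regexp( str, [look_behind,expr2match,look_ahead], 'match' );
cac{3} % display the third block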
where cssm.txt contains
Frame 1
11.1 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 2
22.2 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 99
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
returns
ans =
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Understanding this might take hours (or more) of reading and experimenting with regular expressions, especially "Lookaround Assertions". However, it is worth the effort.
Use textscan to convert the text to numeric values:
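As a smaller starting point for experimenting with lookaround assertions, here is a minimal sketch on a made-up one-line string (the string and the variable name tok are assumptions for illustration, not part of the file above). The lookbehind and lookahead only constrain where a match may start and stop; the text they describe is not included in the result.

```matlab
% Hypothetical one-line stand-in for the file contents
str = 'Frame 1 11.1 22.2 Frame 2 33.3';

% (?<=Frame \d )  : the match must be preceded by 'Frame <digit> '
% [0-9\. ]+?      : lazily match digits, periods and spaces
% (?= Frame|$)    : the match must be followed by ' Frame' or the end of the text
tok = regexp(str, '(?<=Frame \d )[0-9\. ]+?(?= Frame|$)', 'match');

% Show what was matched between the "Frame" markers
disp(tok)
```

The "Frame" labels themselves should not appear in tok, only the numbers between them.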
buf = textscan( transpose(cac{3}), '%f%f%f%f', 'CollectOutput',true );
and
>> buf{1}
ans =
33.3000 2.0000 3.0000 13.0000
5.0000 11.0000 10.0000 8.0000
9.0000 7.0000 6.0000 12.0000
4.0000 14.0000 15.0000 1.0000
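To get the per-block math the question asks for (sum, average, max, min), the same textscan call can be run over every cell of cac in a loop. This is only a sketch: the two small blocks below are made-up stand-ins for the regexp result, the name stats is hypothetical, and each block's character vector is passed to textscan directly.

```matlab
% Hypothetical stand-in for the cell array of text blocks returned by regexp
cac = {sprintf('1 2 3 4\n5 6 7 8\n'), sprintf('10 20 30 40\n50 60 70 80\n')};

nBlocks = numel(cac);
stats = zeros(nBlocks, 4);   % one row per block: [sum, mean, max, min]

for k = 1:nBlocks
    % Convert the text of block k to a numeric matrix with four columns
    buf = textscan(cac{k}, '%f%f%f%f', 'CollectOutput', true);
    block = buf{1};
    % Reduce over all elements of the block
    stats(k, :) = [sum(block(:)), mean(block(:)), max(block(:)), min(block(:))];
end

disp(stats)
```

Preallocating stats keeps the loop cheap even with a couple of thousand blocks; column-wise results could be had by dropping the (:) and storing each statistic in its own matrix.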
  3 Comments
per isakson on 10 Feb 2014
Edited: per isakson on 11 Feb 2014
Here is an alternative value of look_behind, which is 'cleaner':
look_behind = '(?<=Frame \d{1,3}\s+)';
I had trouble making \d{1,3} match as many digits as possible, i.e. making it greedy. Next try:
look_behind = '(?<=Frame \d++\s+)';
\d++ matches all consecutive digits at that position; the possessive quantifier ++ does not give any of them back.
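A small sketch of how a possessive quantifier differs from an ordinary greedy one, using a made-up string (the variable names greedy and possessive are only for illustration). A greedy \d+ takes all the digits but gives one back if the rest of the pattern needs it; the possessive \d++ keeps them all, so a pattern that needs a digit afterwards finds nothing.

```matlab
% Hypothetical test string consisting only of digits
s = '1234';

% Greedy: \d+ first takes '1234', then backtracks one digit so that '4' can match
greedy = regexp(s, '\d+4', 'match');

% Possessive: \d++ takes '1234' and never gives a digit back, so '4' cannot match
possessive = regexp(s, '\d++4', 'match');

disp(greedy)
disp(isempty(possessive))
```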
per isakson on 10 Feb 2014
Edited: per isakson on 11 Feb 2014
I didn't study Kelly's answer. However, length is a built-in MATLAB function.



joe on 11 Feb 2014
Ended up doing this:
fid = fopen(filename); % open the file (filename must already be defined)
r = 1; % next row of M to fill
tline = fgets(fid); % read the first line of the file
while ischar(tline) % fgets returns -1 (not a char) at end of file
    % keep only data lines: they start with a digit or with 'B,'
    if isstrprop(tline(1), 'digit') || strncmp(tline, 'B,', 2)
        tline = strrep(tline, 'B', '0'); % replace every 'B' with 0 for later mathematical manipulation
        eval(['x = {' strtrim(tline) '};']); % evaluate the comma-separated line into the cell array x
        M(r,:) = x; % store x as row r of the cell array M
        r = r + 1; % advance to the next row of M
    end
    tline = fgets(fid); % read the next line
end
fclose(fid); % close the file
I'll admit I got some help with the while loop and the "eval" line.
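For comparison, the same line filter can be written without eval. This is a sketch under the assumption that every field on a data line is numeric once the B's are replaced; the sample lines below are made up and stand in for what fgets would return, and M is built as a numeric matrix rather than a cell array.

```matlab
% Hypothetical sample lines standing in for the lines read from the file
lines = {'Header text', '1,2,B,4', 'B,5,6,7', 'Notes between blocks', '8,9,10,11'};

M = [];   % numeric matrix, one row per data line
for k = 1:numel(lines)
    tline = lines{k};
    % A data line starts with a digit or with 'B,'
    isData = ~isempty(tline) && (isstrprop(tline(1), 'digit') || strncmp(tline, 'B,', 2));
    if isData
        tline = strrep(tline, 'B', '0');   % treat B as zero, as in the loop above
        % Split on commas and convert each field to a number
        M(end+1, :) = str2double(strsplit(strtrim(tline), ','));
    end
end

disp(M)
```

str2double returns NaN for any field it cannot convert, so unexpected text shows up in the result instead of being executed as code.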
