Skip Lines (other than the Header) when Importing CSV File
I have read a couple of entries about skipping header information when importing CSV files. While I don't fully understand them yet, I know that I'll also need to skip lines of text interspersed with my data.

How would I import a CSV file that has a header but also includes lines of text between "blocks" of data? For instance, in the attached file, Lines 1-45 can be considered the "Header" and are easily skipped over. Lines 46-74 contain the actual data, Lines 75-76 should be skipped, and Lines 77-105 contain the next "block" of data. This pattern repeats and, depending on the length of the file, could repeat a couple of thousand times (so there could be around 2K "blocks" of data).

I would like to import only the data blocks so that I can do math (summing, averaging, max and min values) on specific "blocks" of data. I could do this in Excel, but I don't know how to automate the process without using Matlab. Any suggestions would be appreciated. Thank you.
Answers (2)
per isakson
on 10 Feb 2014
Edited: per isakson
on 11 Feb 2014
Are there any string values that can be used as the "Begin" and "End" of the blocks?
[The following day]
Try this
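% Read the whole file into one character vector
str = fileread('cssm.txt');
% Look-behind: the match must be preceded by a "Frame N" line
look_behind = '(?<=Frame \d{1,3}\s*\n)';
% Look-ahead: the match must be followed by the next "Frame N" line or the end of the text
look_ahead = '(?=(\s*Frame \d{1,3}\s*)|(\s*$))';
% Lazily match digits, periods, and whitespace (the numeric block)
expr2match = '[0-9\.\s]+?';
cac = regexp( str, [look_behind,expr2match,look_ahead], 'match' );
% Display the third block
cac{3}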
where cssm.txt contains
Frame 1
11.1 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 2
22.2 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 99
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
returns
ans =
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Understanding this may take hours (or more) of reading and experimenting with regular expressions, especially "Lookaround Assertions". However, it is worth the effort.
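As a starting point for experimenting, here is a minimal sketch of the two lookaround forms used above, applied to a made-up one-line string (the text and the variable names are illustrative only, not taken from the attached file):

```matlab
% Hypothetical sample text, used only to illustrate lookaround assertions
txt = 'Frame 7 holds 42 values, 30 cm apart';

% Look-behind (?<=...): match digits only if "Frame " comes directly before.
% The text "Frame " itself is not part of the match.
afterFrame = regexp(txt, '(?<=Frame )\d+', 'match');

% Look-ahead (?=...): match digits only if " cm" comes directly after.
% The text " cm" itself is not part of the match.
beforeCm = regexp(txt, '\d+(?= cm)', 'match');

disp(afterFrame)   % the digits that follow "Frame "
disp(beforeCm)     % the digits that precede " cm"
```

In both cases the assertion only checks the surrounding text and does not consume it, which is why the "Frame N" lines do not appear in the matched blocks.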
Use textscan to convert the text to numeric values:
buf = textscan( transpose(cac{3}), '%f%f%f%f', 'CollectOutput',true );
and
>> buf{1}
ans =
33.3000 2.0000 3.0000 13.0000
5.0000 11.0000 10.0000 8.0000
9.0000 7.0000 6.0000 12.0000
4.0000 14.0000 15.0000 1.0000
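The question asks for the sum, average, max, and min of each block. The following is a rough sketch of one way to get there, not the method used above: it swaps the lookaround expression for regexp with the 'split' option and converts each block with sscanf. It assumes that every block is preceded by a "Frame N" line and has four numeric columns, as in cssm.txt. The sample text is built inline so the snippet runs on its own, and the names parts and stats are made up for illustration.

```matlab
% Inline sample text (assumption: same layout as cssm.txt, shortened to 2 rows per block)
str = sprintf(['Frame 1\n11.1 2 3 13\n5 11 10 8\n', ...
               'Frame 2\n22.2 2 3 13\n5 11 10 8\n', ...
               'Frame 99\n33.3 2 3 13\n5 11 10 8\n']);

% Split the text at every "Frame N" line; the pieces in between are the blocks
parts = regexp(str, 'Frame \d+\s*\n', 'split');

% Drop empty or whitespace-only pieces (e.g. the piece before the first "Frame")
parts = parts(~cellfun(@(s) isempty(strtrim(s)), parts));

nBlocks = numel(parts);
stats = zeros(nBlocks, 4);   % columns: sum, mean, max, min

for k = 1:nBlocks
    % sscanf fills column by column, so read 4-by-N and transpose to N-by-4
    data = sscanf(parts{k}, '%f', [4, Inf]).';
    stats(k, :) = [sum(data(:)), mean(data(:)), max(data(:)), min(data(:))];
end

disp(stats)
```

For a real file, str would come from fileread, and the split expression would need to be adapted to whatever text actually separates the blocks.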
3 Comments
per isakson
on 10 Feb 2014
Edited: per isakson
on 11 Feb 2014
Here is an alternative value of look_behind, which is 'cleaner':
look_behind = '(?<=Frame \d{1,3}\s+)';
I had trouble making \d{1,3} match as many digits as possible, i.e. making it greedy. Next attempt:
look_behind = '(?<=Frame \d++\s+)';
\d++ uses a possessive quantifier: it matches all consecutive digits at that position and does not give any of them back.
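To see the difference between the quantifier modes outside of a look-behind, here is a small sketch on made-up strings. It only illustrates greedy, lazy, and possessive matching as described in the MATLAB regular-expression documentation; it does not reproduce the look-behind problem mentioned above.

```matlab
% Hypothetical sample text
txt = 'Frame 123';

% Greedy (default): take as many digits as possible, up to three
greedy = regexp(txt, '\d{1,3}', 'match');

% Lazy (trailing ?): take as few digits as possible, so one digit per match
lazy = regexp(txt, '\d{1,3}?', 'match');

% Possessive (trailing +): take all consecutive digits and never give any back
possessive = regexp(txt, '\d++', 'match');

% The difference from greedy shows when the rest of the pattern needs a digit:
% greedy \d+ backtracks one digit so that the final "3" can match ...
withBacktrack = regexp('123', '\d+3', 'match');
% ... while possessive \d++ keeps all three digits, so the final "3" cannot match
noBacktrack = regexp('123', '\d++3', 'match');

disp(greedy); disp(lazy); disp(possessive);
disp(withBacktrack); disp(isempty(noBacktrack));
```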
per isakson
on 10 Feb 2014
Edited: per isakson
on 11 Feb 2014
I didn't study Kelly's answer. However, note that length is a built-in MATLAB function.