Skip Lines (other than the Header) when Importing CSV File

Question

joe on 10 Feb 2014

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/115490-skip-lines-other-than-the-header-when-importing-csv-file

Answered: joe on 11 Feb 2014

I have read a couple of entries about skipping header information when importing CSV files. While I don't fully understand them yet, I know that I'll also need to skip lines of text interspersed with my data.

How would I import a CSV file that has a header but also includes lines of text between "blocks" of data? For instance, in the attached file, Lines 1-45 can be considered the "Header" and are easily skipped. Lines 46-74 contain the actual data, Lines 75-76 should be skipped, and Lines 77-105 contain the next "block" of data. This pattern repeats and, depending on the length of the file, could repeat a couple of thousand times (around 2K "blocks" of data).

I would like to import only the data blocks so that I can do math (summing, averaging, max and min values) on specific "blocks" of data. I could do this in Excel, but I don't know how to automate the process without using Matlab. Any suggestions would be appreciated. Thank you.

Answer 1

per isakson on 10 Feb 2014

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/115490-skip-lines-other-than-the-header-when-importing-csv-file#answer_123813

Edited: per isakson on 11 Feb 2014

Are there any string values that can be used as the "Begin" and "End" of the blocks?

See: Can REGEXP or TEXTSCAN be used to split 2 distinct data sets from a single text file?

[The following day]

Try this

    str = fileread('cssm.txt');
    look_behind = '(?<=Frame \d{1,3}\s*\n)';
    look_ahead  = '(?=(\s*Frame \d{1,3}\s*)|(\s*$))';
    expr2match  = '[0-9\.\s]+?';
    cac = regexp( str, [look_behind,expr2match,look_ahead], 'match' );
    cac{3}

where cssm.txt contains

    Frame 1
         1     2     3    13
         5    11    10     8
         9     7     6    12
         4    14    15     1
    Frame 2
         2     2     3    13
         5    11    10     8
         9     7     6    12
         4    14    15     1
    Frame 99
         3     2     3    13
         5    11    10     8
         9     7     6    12
         4    14    15     1

returns

    ans =
         3     2     3    13
         5    11    10     8
         9     7     6    12
         4    14    15     1

Understanding this might take hours (or more) of reading about and experimenting with regular expressions, especially "Lookaround Assertions". However, it is worth the effort.
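As a starting point for those experiments, here is a minimal sketch of the two kinds of lookaround used above. The strings are made up for illustration and are not taken from cssm.txt, and the variable names frameNo and word are mine. A look-behind `(?<=...)` requires some text before the match and a look-ahead `(?=...)` requires some text after it; neither becomes part of the match.

```matlab
% Look-behind: return the digits that follow 'Frame ', without 'Frame ' itself.
frameNo = regexp('Frame 12 data', '(?<=Frame )\d+', 'match', 'once');

% Look-ahead: return the letters that are followed by a digit, without the digit.
word = regexp('abc123', '[a-z]+(?=\d)', 'match', 'once');

disp(frameNo)   % 12
disp(word)      % abc
```

The expression in the answer combines both ideas: the look-behind anchors the match just after a "Frame" line and the look-ahead stops it just before the next one.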

Use textscan to convert the text to numeric

    buf = textscan( transpose(cac{3}), '%f%f%f%f', 'CollectOutput',true );

and

    >> buf{1}
    ans =
       33.3000    2.0000    3.0000   13.0000
        5.0000   11.0000   10.0000    8.0000
        9.0000    7.0000    6.0000   12.0000
        4.0000   14.0000   15.0000    1.0000
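
Since the goal is to do math on each block, the same conversion can be put in a loop. This is only a sketch under stated assumptions: the two-element cac below is a made-up stand-in for the output of regexp, the names blockSum, blockMean, blockMax and blockMin are mine, and the block text is passed to textscan directly as a row character vector.

```matlab
% Stand-in for the cell array returned by regexp (two small blocks of text).
cac = {sprintf('1 2 3 13\n5 11 10 8\n'), sprintf('2 2 3 13\n5 11 10 8\n')};

nBlocks   = numel(cac);
blockSum  = zeros(nBlocks, 1);
blockMean = zeros(nBlocks, 1);
blockMax  = zeros(nBlocks, 1);
blockMin  = zeros(nBlocks, 1);

for k = 1:nBlocks
    % Convert the text of block k to a numeric matrix with four columns.
    buf  = textscan(cac{k}, '%f%f%f%f', 'CollectOutput', true);
    data = buf{1};

    % Statistics over all values in the block.
    blockSum(k)  = sum(data(:));
    blockMean(k) = mean(data(:));
    blockMax(k)  = max(data(:));
    blockMin(k)  = min(data(:));
end
```

With a couple of thousand blocks the loop is still cheap, because each block is small; the preallocated vectors keep the per-block results in file order.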

3 Comments

per isakson on 10 Feb 2014

Edited: per isakson on 11 Feb 2014

Here is an alternative value of look_behind, which is 'cleaner':

    look_behind = '(?<=Frame \d{1,3}\s+)';

I had trouble making \d{1,3} match as many digits as possible, i.e. making it greedy. Next, try

    look_behind = '(?<=Frame \d++\s+)';

\d++ matches all the consecutive digits at that position; the second + makes the quantifier possessive, so it does not give any digits back.
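
The difference between the greedy \d+ and the possessive \d++ can be seen on a small made-up string. This is a sketch that uses nothing beyond regexp itself; the variable names greedy and possessive are mine. The greedy version gives back a digit when the rest of the pattern needs it, while the possessive version does not.

```matlab
% Greedy: \d+ first takes '123', then gives back '3' so the literal 3 can match.
greedy = regexp('123', '\d+3', 'match');

% Possessive: \d++ takes '123' and keeps it, so the literal 3 has nothing left to match.
possessive = regexp('123', '\d++3', 'match');

% greedy holds one match ('123'); possessive is an empty cell array.
```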

per isakson on 10 Feb 2014

Edited: per isakson on 11 Feb 2014

I didn't study Kelly's answer. However, note that length is a built-in MATLAB function.

Answer 2

joe on 11 Feb 2014

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/115490-skip-lines-other-than-the-header-when-importing-csv-file#answer_124025

Ended up doing this

    % Assumes the variable filename already holds the path of the CSV file.
    fid = fopen(filename);      % open the file
    r = 1;                      % next row of M to fill
    tline = fgets(fid);         % read the first line of the file
    while ischar(tline)         % fgets returns -1 (not a char) at end of file
        % keep the line only if it starts with a digit or with 'B,'
        if isstrprop(tline(1), 'digit') || (tline(1) == 'B' && tline(2) == ',')
            tline = strrep(tline, 'B', '0');    % replace every 'B' with 0 for later math on the data
            eval(['x = {' tline '};']);         % parse the comma-separated line into the cell row x
            M(r,:) = x;                         % store x as row r of the cell array M
            r = r + 1;                          % advance to the next row of M
        end
        tline = fgets(fid);     % read the next line
    end
    fclose(fid);                % close the file

I'll admit I got some help with the while loop and the "eval" line.
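
For comparison, here is a sketch of the same line filter without eval, which also records which block each row belongs to so that per-block sums can be computed afterwards. It is not the code I actually ran: rawLines is a made-up stand-in for the lines of the file, the names blockId, inBlock and blockSums are hypothetical, and it assumes every data row has the same number of purely numeric fields once 'B' is replaced.

```matlab
% Made-up lines standing in for the file contents.
rawLines = {'Header text', '1,2,3', 'B,5,6', 'Text between blocks', '7,8,9', '10,11,12'};

M       = [];      % numeric data, one row per data line
blockId = [];      % block number of each row in M
block   = 0;       % current block counter
inBlock = false;   % true while consecutive data lines are being read

for k = 1:numel(rawLines)
    tline = rawLines{k};
    % A data line starts with a digit or with 'B,'.
    isData = ~isempty(tline) && ...
             (isstrprop(tline(1), 'digit') || strncmp(tline, 'B,', 2));
    if isData
        if ~inBlock                    % first data line after text starts a new block
            block = block + 1;
        end
        tline = strrep(tline, 'B', '0');                   % same replacement as above
        M(end+1, :) = str2double(strsplit(tline, ','));    % split on commas, convert to numbers
        blockId(end+1, 1) = block;
    end
    inBlock = isData;
end

% Sum of all values in each block, one entry per block.
blockSums = accumarray(blockId, sum(M, 2));
```

To use it on a real file, the loop over rawLines would be replaced by the fgets loop above; the filtering and block counting stay the same.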
