How to remove header in .txt file while retaining format of data

Hello everyone,
I have a .txt file with 12 lines of a header that is useless information to me. Is there a way to remove the header so that I am left with only data?
dlmread does not work because I have dates to read, csvread is producing an error, and textscan seems to provide the data in an altered format.
This is the code I am using to retrieve the data so far:
A = textscan(FID,'%s','headerLines',12)
This is what the data looks like when it is output:
'3298602123321'
'1'
'2011-09-20'
'01:00:00'
'10.8554401397705'
'3298602123321'
'1'
'2011-09-20'
'01:15:00'
'10.8555603027344'
This is what I need the data to look like:
3298602123321 1 2011-09-20 01:00:00 10.8554401397705
3298602123321 1 2011-09-20 01:15:00 10.8555603027344
Any help will be greatly appreciated, thanks for taking the time to consider it. Thanks in advance for any posts!
~Sarah (:
  1 Comment
Sarah on 30 Sep 2011
What I need the data to look like is the way that it is in the .txt file before accessing it. I only want the header gone and the data to remain unchanged.
Thanks again!


Answers (3)

Fangjun Jiang on 30 Sep 2011
To read the data directly from the text file in the format you want, you can do:
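FID = fopen('test.txt','rt');
% One conversion per column: sensor ID, point ID, date, time, value
A = textscan(FID,'%f %d %s %s %f');
fclose(FID);
format long   % show more digits when the numeric columns are displayed
Sensor_ID = A{1}
Point_ID = A{2}
SampleDate = A{3}
SampleTime = A{4}
SampleValue = A{5}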
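Or, if the data has already been read in as a single column of strings, you can use reshape() to regroup it into rows of five fields: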
A={'3298602123321'
'1'
'2011-09-20'
'01:00:00'
'10.8554401397705'
'3298602123321'
'1'
'2011-09-20'
'01:15:00'
'10.8555603027344'}
B=reshape(A,5,[])';
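A minimal sketch of applying the same reshape() idea to the output of the textscan call from the question. It assumes a hypothetical file demo_header.txt (written by the sketch itself) with 12 header lines and space-separated records of five fields each. Note that textscan returns a 1x1 cell whose only element is the column of strings, so the inner cell has to be extracted before reshaping; calling reshape() on the outer cell is one possible cause of an error.

```matlab
% Hypothetical demo file: 12 header lines followed by two data rows.
fname = 'demo_header.txt';
fid = fopen(fname,'wt');
for k = 1:12
    fprintf(fid,'header line %d\n',k);
end
fprintf(fid,'3298602123321 1 2011-09-20 01:00:00 10.8554401397705\n');
fprintf(fid,'3298602123321 1 2011-09-20 01:15:00 10.8555603027344\n');
fclose(fid);

% Read every whitespace-separated token after the header as a string.
fid = fopen(fname,'rt');
A = textscan(fid,'%s','HeaderLines',12);   % A is a 1x1 cell
fclose(fid);

tokens = A{1};                % N-by-1 cell array of strings
B = reshape(tokens,5,[])';    % one row per record, five columns
```

Each row of B then holds one record, for example B{2,4} is the time string of the second record. The reshape only works if the number of tokens is a multiple of five.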
  4 Comments
Sarah on 6 Oct 2011
Thanks for the updated code. The format long option, however, is giving me empty matrices, and I cannot think of a reason for this. The reshape option is still returning an error as well. Thanks for all your help. (:
Fangjun Jiang on 6 Oct 2011
Edited: Walter Roberson on 31 Mar 2016
What do you mean? Copy the four lines below to test.txt and run the code.
3298602123321 1 2011-09-20 01:00:00 10.8554401397705
3298602123321 1 2011-09-20 01:15:00 10.8555603027344
3298602123321 1 2011-09-20 01:30:00 10.8555603027344
3298602123321 1 2011-09-20 01:45:00 10.8560495376587
I got:
Sensor_ID =
1.0e+012 *
3.298602123321000
3.298602123321000
3.298602123321000
3.298602123321000
Point_ID =
1
1
1
1
SampleDate =
'2011-09-20'
'2011-09-20'
'2011-09-20'
'2011-09-20'
SampleTime =
'01:00:00'
'01:15:00'
'01:30:00'
'01:45:00'
SampleValue =
10.855440139770501
10.855560302734400
10.855560302734400
10.856049537658700



Walter Roberson on 30 Sep 2011
Corrected:
A = textscan(FID,'%[^\n]','headerLines',12);
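For context, a sketch of how that call could sit in a complete script that drops the header and writes the remaining lines to a new file. The file names demo_header.txt and demo_noheader.txt and the sample contents are assumptions made up for illustration, not part of the original question.

```matlab
% Hypothetical demo file: 12 header lines followed by two data rows.
fname = 'demo_header.txt';
fid = fopen(fname,'wt');
for k = 1:12
    fprintf(fid,'header line %d\n',k);
end
fprintf(fid,'3298602123321 1 2011-09-20 01:00:00 10.8554401397705\n');
fprintf(fid,'3298602123321 1 2011-09-20 01:15:00 10.8555603027344\n');
fclose(fid);

% Skip the header and read each remaining line as one string.
fid = fopen(fname,'rt');
A = textscan(fid,'%[^\n]','HeaderLines',12);
fclose(fid);
dataLines = A{1};             % cell array, one element per data line

% Write the data lines, without the header, to a new file.
fid = fopen('demo_noheader.txt','wt');
fprintf(fid,'%s\n',dataLines{:});
fclose(fid);
```

Here dataLines{1} is the first data line as a single string, so nothing is split into fields or converted to numbers.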
  4 Comments
Sarah on 6 Oct 2011
Thank you Walter for the update, the returned information is formatted well line by line. However, I am having trouble getting the information to be parsed as 5 separate variables while they retain the correlations with the other variables. I don't know if I am explaining myself well, but my goal is to be able to access the information either independently or as a group. Thanks for all your help! :)
Walter Roberson on 6 Oct 2011
This contradicts what you wrote earlier,
"What I need the data to look like is the way that it is in the .txt file before accessing it. I only want the header gone and the data to remain unchanged."
The code I supplied skips the header and reads in everything else as strings *exactly* the same way, space for space, character for character, as appears in the file.
If you want the data in the file split up into different variables, then you need to tell us what data type you want for each column. You also need to understand that, unless you want to split into strings only, the whitespace (blanks) might change, which would leave data that is *not* "the way it is in the text file before accessing it".
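As an illustration only, here is a sketch of one possible choice of types: double for the sensor ID and the value, int32 for the point ID, and strings for the date and time. The file name, sample contents and variable names are assumptions. The columns stay related through their shared row index, so record k can be accessed field by field or as a group.

```matlab
% Hypothetical demo file: 12 header lines followed by two data rows.
fname = 'demo_header.txt';
fid = fopen(fname,'wt');
for k = 1:12
    fprintf(fid,'header line %d\n',k);
end
fprintf(fid,'3298602123321 1 2011-09-20 01:00:00 10.8554401397705\n');
fprintf(fid,'3298602123321 1 2011-09-20 01:15:00 10.8555603027344\n');
fclose(fid);

% Skip the header and convert each column to an assumed type.
fid = fopen(fname,'rt');
C = textscan(fid,'%f %d %s %s %f','HeaderLines',12);
fclose(fid);

sensorID    = C{1};   % double
pointID     = C{2};   % int32
sampleDate  = C{3};   % cell array of strings
sampleTime  = C{4};   % cell array of strings
sampleValue = C{5};   % double

% Optional: combine date and time into serial date numbers.
t = datenum(strcat(sampleDate,{' '},sampleTime),'yyyy-mm-dd HH:MM:SS');

% Access one record as a group through its row index.
k = 2;
fprintf('%.0f %d %s %s %.13f\n', ...
    sensorID(k), pointID(k), sampleDate{k}, sampleTime{k}, sampleValue(k));
```

The printed line is rebuilt from the converted values, so its spacing and number of digits depend on the fprintf format rather than on the original file.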



lpetley on 31 Mar 2016
When using an ASCII file with several lines of headers, the best approach is to use the importdata() function. You can tell it how many lines of the file comprise the header, and it will load subsequent data in a numerical format.
  1 Comment
Walter Roberson on 31 Mar 2016
That would not meet the requirement the user had for "the data to remain unchanged." Also, the data is not all in numeric format: there are two fields which contain date and time strings. importdata() would return a scalar struct that would have to be examined for its 'data' field and its 'textdata' field, and it would be necessary to figure out how importdata() handles such things when the text fields might occur in the middle of a line. It is not clear that that is "best".
It would make more sense to use readtable() from R2013b onward, especially from R2014b onward (when it gained datetime handling), as that does not change the order of fields and is well defined as to how the various field types are handled.
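A sketch of what the readtable() route might look like, assuming R2014b or later, a space-delimited file with 12 header lines, and the same assumed column types as the textscan format used earlier in this thread. The file name, sample contents and variable names are made up for illustration.

```matlab
% Hypothetical demo file: 12 header lines followed by two data rows.
fname = 'demo_header.txt';
fid = fopen(fname,'wt');
for k = 1:12
    fprintf(fid,'header line %d\n',k);
end
fprintf(fid,'3298602123321 1 2011-09-20 01:00:00 10.8554401397705\n');
fprintf(fid,'3298602123321 1 2011-09-20 01:15:00 10.8555603027344\n');
fclose(fid);

% Read the data rows into a table, skipping the header.
T = readtable(fname,'FileType','text','Delimiter',' ', ...
    'HeaderLines',12,'ReadVariableNames',false, ...
    'Format','%f %d %s %s %f');
T.Properties.VariableNames = ...
    {'SensorID','PointID','SampleDate','SampleTime','SampleValue'};

% Combine the date and time strings into a datetime column.
T.Timestamp = datetime(strcat(T.SampleDate,{' '},T.SampleTime), ...
    'InputFormat','yyyy-MM-dd HH:mm:ss');
```

The table keeps the fields in file order, and a whole record can be pulled out with T(k,:) or a single column with, for example, T.SampleValue.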
