Reading in data from an irregular text file

John on 23 Apr 2014
Commented: Benjamin Wells on 12 Oct 2018
Hi
I have a text file in the following format:
##########################################################
# Query made at 04/23/2014 14:04:07 UTC
# Time interval: from 04/23/2014 00:00 to 04/23/2014 23:59 UTC
##########################################################
##########################################################
# CYTZ, Toronto Island Airport Automated Reporting Station (Canada)
# WMO index: 71265
# Latitude 43-38N. Longitude 079-24W. Altitude 77 m.
##########################################################
###################################
# METAR/SPECI from CYTZ
###################################
201404231300 METAR CYTZ 231300Z AUTO 33016G23KT 310V010 9SM OVC020 04/M00
A2990 RMK MAX WND 32018KT AT 1201Z
SLP127=
201404231200 METAR CYTZ 231200Z AUTO 32015G28KT 300V010 9SM OVC018 03/00
A2987 RMK MAX WND 32020KT AT 1153Z
SLP118=
201404231100 METAR CYTZ 231100Z AUTO 33019G27KT 9SM BKN016 BKN045 OVC060
03/01 A2983 RMK MAX WND 32020KT AT
1059Z SLP102=
201404231047 SPECI CYTZ 231047Z AUTO 33016G26KT 9SM BKN016 BKN043 OVC049
03/01 A2982 RMK MAX WND 32019KT AT
1034Z SLP100=
201404231045 SPECI CYTZ 231045Z AUTO 33017G26KT 9SM SCT016 BKN043 OVC049
03/01 A2982 RMK MAX WND 32019KT AT
1034Z SLP100=
201404231000 METAR CYTZ 231000Z AUTO 32015G22KT 9SM -RA OVC041 04/01 A2979
RMK MAX WND 33018KT AT 0933Z SLP091=
201404230928 SPECI CYTZ 230928Z AUTO 32017G24KT 9SM -RA FEW017 OVC038
04/01 A2978 RMK MAX WND 33018KT AT
0913Z SLP086=
201404230900 METAR CYTZ 230900Z AUTO 33012G21KT 9SM SCT017 BKN038 BKN048
OVC065 04/01 A2976 RMK MAX WND 32018KT
AT 0844Z SLP081=
201404230800 METAR CYTZ 230800Z AUTO 33014G27KT 9SM FEW019 OVC060 04/01
A2974 RMK MAX WND 32020KT AT 0750Z
SLP073=
201404230749 SPECI CYTZ 230749Z AUTO 32016G26KT 9SM FEW019 BKN060 OVC076
04/01 A2974 RMK MAX WND 33019KT AT
0704Z SLP071=
201404230732 SPECI CYTZ 230732Z AUTO 33014G22KT 9SM -RA FEW018 SCT028
BKN048 OVC065 04/01 A2973 RMK MAX
WND 33019KT AT 0704Z SLP069=
201404230700 METAR CYTZ 230700Z AUTO 32015G21KT 9SM FEW018 OVC030 05/02
A2973 RMK MAX WND 33019KT AT 0643Z
SLP068=
The file can contain any number of entries. I want a vector holding the timestamp (year, month, day and time) of each entry.
I also want a second vector, aligned with the first, holding the value that follows "RMK MAX WND". For the first entry this would be 32018KT, but without the KT.
Ideally I would end up with two vectors whose first six elements are:
date =
201404231300
201404231200
201404231100
201404231047
201404231045
201404231000
wind =
32018
32020
32020
32019
32019
NaN
  5 Comments
dpb on 23 Apr 2014
Are the line breaks in the file as shown here, or are they an artifact of the forum formatting? If they are as shown, parsing is more complicated because each observation spans a variable number of lines.
John on 23 Apr 2014
I have attached the original file if you want to look into the line breaks.


Answers (2)

Joshua Hrisko on 15 Feb 2018
If you're reading in METAR data, I recommend using MATLAB's 'webread' to pull the HTML directly from your specific METAR site. I wrote an entire blog article about it here:
When you import the METAR data directly from the database, it keeps its shape, so you don't have to worry about the irregular columns and rows you have in your case.
  1 Comment
Benjamin Wells on 12 Oct 2018
I love what you did there. Great work. The scale is odd, though: it makes the dew point look higher than the temperature, especially in very dry areas, because the scale on the right does not match the scale on the left.



Kelly Kearney on 23 Apr 2014
Edited: Kelly Kearney on 23 Apr 2014
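If every entry always ends with the =, then that makes things much easier to parse. I think you should be able to grab what you need via regular expressions:
txt = fileread('~/Downloads/METAR.txt');
txt = regexprep(txt, '.*#', '');          % Remove header (everything up to the last #)
txt = regexp(txt, '=', 'split');          % Split entries at the terminating =
notxt = regexp(txt, '^\s*$');             % Find whitespace-only pieces (e.g. after the last =)
txt(~cellfun('isempty', notxt)) = [];     % ... and drop them
ds = regexp(txt, '\d+', 'match', 'once'); % First run of digits in each entry is the timestamp
dn = datenum(ds, 'yyyymmddHHMM');         % Serial date numbers, if you need them
ds = str2double(ds);                      % Timestamps as numbers (str2double avoids the eval inside str2num)
numtmp = regexp(txt, 'RMK\s*MAX\s*WND\s*(\d+)', 'tokens'); % \s* also spans the line breaks
isemp = cellfun('isempty', numtmp);       % Entries with no max-wind remark
tmp = cellfun(@(x) x{1}{1}, numtmp(~isemp), 'uni', 0);
num = nan(size(numtmp));                  % NaN where the remark is missing
num(~isemp) = str2double(tmp);
[ds' num']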
---------
ans =
201404231300 32018
201404231200 32020
201404231100 32020
201404231047 32019
201404231045 32019
201404231000 33018
201404230928 33018
201404230900 32018
201404230800 32020
201404230749 33019
201404230732 33019
201404230700 33019
201404230623 32018
201404230618 33017
201404230605 NaN
201404230600 NaN
201404230538 NaN
201404230519 NaN
201404230500 31020
201404230400 NaN
201404230300 NaN
201404230257 NaN
201404230200 NaN
201404230126 NaN
201404230100 31018
201404230014 31018
201404230000 30021
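
If you want to try the idea without the attached file, here is a minimal self-contained sketch. It is a variation on the code above, not the same code: instead of stripping the header, it anchors each observation on its 12-digit timestamp and reads up to the closing =. The variable names (obs, stamp, wind, result) are my own, the first two sample entries are copied from the question, and the third entry is invented purely to exercise the NaN case.

```matlab
% Sample text: two header lines, two real entries from the question,
% and one INVENTED entry (the last one) that has no MAX WND remark.
txt = sprintf([ ...
    '# METAR/SPECI from CYTZ\n' ...
    '###################################\n' ...
    '201404231300 METAR CYTZ 231300Z AUTO 33016G23KT 310V010 9SM OVC020 04/M00\n' ...
    'A2990 RMK MAX WND 32018KT AT 1201Z\n' ...
    'SLP127=\n' ...
    '201404231100 METAR CYTZ 231100Z AUTO 33019G27KT 9SM BKN016 BKN045 OVC060\n' ...
    '03/01 A2983 RMK MAX WND 32020KT AT\n' ...
    '1059Z SLP102=\n' ...
    '201404230605 SPECI CYTZ 230605Z AUTO 31012KT 9SM OVC020 04/01 A2970 RMK\n' ...
    'SLP060=\n']);

% One match per observation: a 12-digit timestamp, then everything up to '='.
% [^=] also matches newlines, so wrapped entries stay in one piece.
obs = regexp(txt, '\d{12}[^=]*=', 'match');

% The timestamp is the first 12 characters of each observation.
stamp = cellfun(@(s) s(1:12), obs, 'UniformOutput', false);
dateNum = str2double(stamp);   % 12-digit integers are represented exactly in a double

% Max-wind group, without the KT suffix; \s+ spans spaces and line breaks.
tok = regexp(obs, 'RMK\s+MAX\s+WND\s+(\d+)KT', 'tokens', 'once');
hasWind = ~cellfun(@isempty, tok);
wind = nan(size(obs));         % NaN where the remark is missing
wind(hasWind) = cellfun(@(t) str2double(t{1}), tok(hasWind));

% One row per observation: [timestamp, max wind]
result = [dateNum(:) wind(:)];
fprintf('%12.0f  %g\n', result.');
```

Anchoring on the timestamp means header lines never need to be removed, but it does assume that no header line contains a run of 12 digits and that every observation ends with =.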
