Increase speed to read text file and parse date time data.

I'm looking for a faster way to read this file and convert the date and time to serial date numbers.
Datum Tid Värde
2015-10-12 00:02:16 23.399999619
2015-10-12 00:07:16 23.399999619
2015-10-12 00:12:16 23.399999619
2015-10-12 00:17:16 23.399999619
2015-10-12 00:22:17 23.399999619
2015-10-12 00:27:17 23.399999619
2015-10-12 00:32:17 23.399999619
2015-10-12 00:37:17 23.399999619
...
The text files contain from a few hundred up to several thousand lines. I've tested five alternative solutions on R2016a. I attach the script and the text file. The results with tic/toc and profile are consistent.
The best code, "sscanf", is nearly twice as fast as the "standard".
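For readers without the attached script, the sscanf approach can be sketched roughly as follows. This is a minimal sketch and not the attached code: it assumes the layout shown above (one header line, then date, time and value on each row), and the sample file and the variable names (fname, txt, num, sdn, val) are made up for illustration.

```matlab
% Create a small sample file in the layout shown above (illustrative only)
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Datum Tid Värde\n');
fprintf(fid, '2015-10-12 00:02:16 23.399999619\n');
fprintf(fid, '2015-10-12 00:07:16 23.399999619\n');
fclose(fid);

% Read the whole file into one string and skip the header line
txt = fileread(fname);
pos = find(txt == char(10), 1, 'first');   % first line feed ends the header

% Parse all fields numerically in one call; each column of num holds
% [year; month; day; hour; minute; second; value] for one row of the file
num = sscanf(txt(pos+1:end), '%d-%d-%d %d:%d:%d %f', [7, Inf]);

% datenum accepts an N-by-6 date vector matrix
sdn = datenum(num(1:6, :)');
val = num(7, :)';

delete(fname);   % remove the sample file
```

The point of this variant is that no date strings are ever passed to datenum; the six date and time fields arrive as numbers.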
The File Exchange (FEX) contribution DateStr2Num by Jan Simon is really fast. However, I have not found a fast way to arrange the input data to fit the function. The line
str = strcat( cac{1}(:,1), repmat({' '},[length(cac{1}(:,2)),1]), cac{1}(:,2) );
ruins the performance. There must be a better way!
Question: What options are there to increase the speed further?
ADDENDUM 2016-08-31
text2sdn_2 (attached) is adapted to runperf. It contains two new cases. The summary result of runperf('text2sdn_2.m') is
    Name                          GroupCount    mean_MeasuredTime
    __________________________    __________    _________________
    text2sdn_2/Standard           4             1.0608
    text2sdn_2/DateStr2Num        4             0.7491
    text2sdn_2/sscanf             4             0.41834
    text2sdn_2/fscanf             4             0.59519
    text2sdn_2/datetime           4             1.2143
    text2sdn_2/DateStr2Num_19c    4             0.3142
    text2sdn_2/dtstr2dtnummx      4             1.0475
In production, a text file is read once. In these tests, the file is first read during warm-up and from then on is available in the system cache.
text2sdn_2/DateStr2Num_19c is more than three times faster than text2sdn_2/Standard and more than twice as fast as text2sdn_2/DateStr2Num. One reason is that the format '%19c%f' keeps date and time together in a single 19-character field. DateStr2Num doesn't distinguish between tab and space.
cac = textscan( fid, '%19c%f', 'Headerlines',1, 'CollectOutput',true );
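In context, that textscan call might be used as in the sketch below. It calls datenum with an explicit format in place of DateStr2Num so that it runs without the File Exchange download; the sample file and the variable names (fname, sdn, val) are assumptions for illustration, not taken from the attached script.

```matlab
% Create a small sample file in the layout shown above (illustrative only)
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Datum Tid Värde\n');
fprintf(fid, '2015-10-12 00:02:16 23.399999619\n');
fprintf(fid, '2015-10-12 00:07:16 23.399999619\n');
fclose(fid);

% Read date and time as one 19-character field, followed by the value
fid = fopen(fname, 'r');
cac = textscan( fid, '%19c%f', 'Headerlines',1, 'CollectOutput',true );
fclose(fid);

% cac{1} is an N-by-19 char matrix, cac{2} an N-by-1 double column
% (char and double outputs are not merged by 'CollectOutput')
sdn = datenum(cac{1}, 'yyyy-mm-dd HH:MM:SS');   % stand-in for DateStr2Num
val = cac{2};

delete(fname);   % remove the sample file
```

Passing the format string explicitly spares datenum from guessing the format; DateStr2Num would take the place of the datenum call in the measured case.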
text2sdn_2/dtstr2dtnummx is only slightly faster than text2sdn_2/Standard because this test is based on a column of 480 timestamps. With a single timestamp the relative difference is much larger.

Accepted Answer

Yair Altman on 30 Aug 2016
@Per - try to use dtstr2dtnummx(), as explained here: http://undocumentedmatlab.com/blog/datenum-performance
