Find the filename with the biggest number?

Jacqueline on 24 Jul 2013
Hi, so I have a folder with a bunch of files that come in each day. They look like this...
20130721_SPLBRENT3_140554.mat
20130721_SPLBRENT3_160554.mat
20130721_SPLBRENT3_180554.mat
20130722_SPLBRENT3_075651.mat
20130722_SPLBRENT3_095651.mat
20130723_SPLBRENT3_075949.mat
20130723_SPLBRENT3_102025.mat
So, for example, 20130722_SPLBRENT3_095651.mat is from 7/22/2013 and the data in the file was gathered at 9:56am. I am trying to write code that finds the latest data (20130723_SPLBRENT3_102025.mat), NOT the last file uploaded (because all the files are uploaded at once, and one from the 21st may come in before one from the 23rd). How do I search for the file with the latest date and time in the file name?
  3 Comments
Jacqueline on 24 Jul 2013
The latest date and time overall, out of all the files.
Jan on 24 Jul 2013
+1: This is a nice example to demonstrate different techniques to improve code. Sorry, Jacqueline, I know that this was not your intention. But at least a fast, faster and fastest solution is still a solution :-)


Answers (3)

Jan on 24 Jul 2013
Edited: Jan on 24 Jul 2013
Some simplifications to Azzi's code:
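s = {'20130721_SPLBRENT3_140554.mat'; ...
'20130721_SPLBRENT3_160554.mat'; ...
'20130721_SPLBRENT3_180554.mat'; ...
'20130722_SPLBRENT3_075651.mat'; ...
'20130722_SPLBRENT3_095651.mat'; ...
'20130723_SPLBRENT3_075949.mat'; ...
'20130723_SPLBRENT3_102025.mat'};
% Split each name at '_' and '.' -> {date, label, time, extension}
a = regexp(s, '_|\.', 'split');
b = cat(1, a{:});
% A time-only format makes DATENUM fill in 1 January of the current year,
% so keep only the fractional day (REM) before adding it to the date.
date = datenum(b(:,1), 'yyyymmdd') + rem(datenum(b(:,3), 'HHMMSS'), 1);
[max_date, idx] = max(date);
latest_file = s{idx};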
When a function such as REGEXP or DATENUM operates on cell arrays directly, wrapping it in CELLFUN is much slower, especially in combination with anonymous functions. When s contains 10,000 distinct strings, omitting CELLFUN reduces the runtime from 6.7 seconds to 0.16 seconds (R2009a/64/Win7). In addition, the leaner code is less prone to typos and easier to understand and debug.
Of course, the runtime most likely does not matter here, because the number of files is probably small. But the same idea can be useful for other problems where equivalent solutions apply.
Btw., this reduces the runtime by a further 50%:
c = CStrCatStr(b(:, 1), 'T', b(:, 3));
date = DateStr2Num(c, 30);
See FEX: CStrCatStr and FEX: DateStr2Num. But be aware that downloading and compiling them would take much more time than you could ever save on such small problems. It can be useful, though, when working with millions of files or with 1000 files in real time.
And a last thought about efficient programs: I've shown different methods to perform the same operations faster. But exploiting the fact that the chronological order equals the alphabetical order is again 4 times faster than the C-Mex monsters. Recognizing such useful patterns in the data is usually much more important than multi-cores, Gigas (Hz or Bytes) or sophisticated vectorizations. So the person who decided to use these nice names has already solved the problem most efficiently.
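The CELLFUN-versus-direct comparison described in this answer can be reproduced roughly as follows. This is only a sketch: the file names are synthetic test data generated for illustration (n and the start date are arbitrary assumptions), and the measured times will differ from the figures quoted above depending on the MATLAB release and machine.

```matlab
% Synthetic file names in the thread's naming scheme (hypothetical test data).
n = 1000;
stamp = datenum(2013, 7, 1, 10, 20, 25) + (0:n-1)';   % one file per day
s = strcat(strrep(cellstr(datestr(stamp, 30)), 'T', '_SPLBRENT3_'), '.mat');

% Variant 1: CELLFUN with anonymous functions, one call per file name.
tic;
a1 = cellfun(@(x) regexp(x, '_|\.', 'split'), s, 'UniformOutput', false);
d1 = cellfun(@(x) datenum([x{1}, ' ', x{3}], 'yyyymmdd HHMMSS'), a1);
t_cellfun = toc;

% Variant 2: REGEXP, STRCAT and DATENUM applied to the whole cell array at once.
tic;
a2 = regexp(s, '_|\.', 'split');
b2 = cat(1, a2{:});
d2 = datenum(strcat(b2(:, 1), 'T', b2(:, 3)), 'yyyymmddTHHMMSS');
t_direct = toc;

fprintf('cellfun: %.3f s, direct: %.3f s\n', t_cellfun, t_direct);
```

Both variants should produce the same date numbers; only the elapsed times differ. For a more reliable measurement, each variant could be repeated several times and the timings averaged.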
  3 Comments
Cedric on 25 Jul 2013
Edited: Cedric on 25 Jul 2013
It splits file names using either '_' or '.' as a separator:
>> s = regexp('20130722_SPLBRENT3_075651.mat', '_|\.', 'split')
s =
'20130722' 'SPLBRENT3' '075651' 'mat'
The pipe | means "or", and the . has to be escaped with a backslash because it has a special meaning in regular expressions ('\.' matches a literal dot, whereas '.' is a wildcard for any character).
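To see why the backslash matters, the same split can be tried with and without it. A small sketch (the variable names parts and wrong are made up for this illustration):

```matlab
name = '20130722_SPLBRENT3_075651.mat';

% Escaped dot: split at underscores and at the literal '.' only.
parts = regexp(name, '_|\.', 'split');

% Unescaped dot: '.' matches every character, so the whole name is consumed
% by delimiters and only empty strings remain between them.
wrong = regexp(name, '_|.', 'split');

disp(parts);
disp(all(cellfun('isempty', wrong)));
```

With the unescaped pattern nothing useful survives the split, which is an easy mistake to overlook because no error is raised.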



Azzi Abdelmalek on 24 Jul 2013
Edited: Azzi Abdelmalek on 24 Jul 2013
s = {'20130721_SPLBRENT3_140554.mat'
'20130721_SPLBRENT3_160554.mat'
'20130721_SPLBRENT3_180554.mat'
'20130722_SPLBRENT3_075651.mat'
'20130722_SPLBRENT3_095651.mat'
'20130723_SPLBRENT3_075949.mat'
'20130723_SPLBRENT3_102025.mat'};
% Split each name at '_' and '.' -> {date, label, time, extension}
a = cellfun(@(x) regexp(x, '_|\.', 'split'), s, 'UniformOutput', false);
% Convert the date and time fields of each name to a serial date number
date = cellfun(@(x) datenum([x{1} ' ' x{3}], 'yyyymmdd HHMMSS'), a);
[max_date, idx] = max(date);
latest_file = s{idx}   % The latest file
latest_date = datestr(max_date, 'dd-mm-yyyy HH:MM:SS')

Jan on 24 Jul 2013
Congratulations! If all names follow the format "20130723_SPLBRENT3_102025", with fixed-width date and time fields and the same middle part, the alphabetical order equals the chronological order. Then this is sufficient:
list = dir(fullfile(FolderName, '*.mat'));
name = {list.name};
sorted = sort(name);
latest = sorted{end};
In all cases I have seen so far, the output of dir is already sorted alphabetically. But as long as this is not documented, I'd rely on explicit sorting.
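If the folder might also contain .mat files that do not follow the naming scheme, or files with a different middle part, the sort can be restricted to the date and time fields. The following is a minimal sketch under that assumption; the example list (including the stray 'notes.mat') and the pattern are made up for illustration, and in practice name would come from dir as shown above.

```matlab
% Example names; in practice: list = dir(fullfile(FolderName, '*.mat'));
%                             name = {list.name};
name = {'20130723_SPLBRENT3_075949.mat', 'notes.mat', ...
        '20130723_SPLBRENT3_102025.mat', '20130721_SPLBRENT3_180554.mat'};

% Capture the date and time fields; non-matching names yield an empty cell.
tok = regexp(name, '^(\d{8})_.*_(\d{6})\.mat$', 'tokens', 'once');
valid = ~cellfun('isempty', tok);
name = name(valid);
tok = tok(valid);
if isempty(name)
    error('No file name matches the expected pattern.');
end

% Fixed-width digit strings sort alphabetically in chronological order.
key = cellfun(@(t) [t{1}, t{2}], tok, 'UniformOutput', false);
[sortedKey, order] = sort(key);
latest = name{order(end)};
disp(latest);
```

This keeps the idea of relying on alphabetical order but sorts only the 14 digits that carry the timestamp, so unrelated files cannot end up last in the list.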
