Identify and build a list of unique combinations found in filenames using regexp?

I have the file names of four files stored in a cell array called F2000. These files are named:
L14N_2009_2000MHZ.txt
L8N_2009_2000MHZ.txt
L14N_2010_2000MHZ.txt
L8N_2009_2000MHZ.txt
Each file consists of an m-by-n matrix, where m is the same but n varies from file to file. I'd like to store each of the L14N files and each of the L8N files in two separate cell arrays so I can use dlmread in a for loop to store each text file as a matrix in an element of the cell array. To do this, I wrote the following code:
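% true where the file name matches the 2009 pattern (note the ~ and the escaped dot)
idx2009 = ~cellfun('isempty', regexp(F2000, 'L\d{1,2}N_2009_2000MHZ\.txt'));
F2000_2009 = F2000(idx2009);
% with only two years present, everything else is 2010
idx2010 = ~idx2009;
F2000_2010 = F2000(idx2010);
% preallocate one cell per file, then read each file into its own matrix
cell2009 = cell(size(F2000_2009));
cell2010 = cell(size(F2000_2010));
for k = 1:numel(F2000_2009)
    cell2009{k} = dlmread(F2000_2009{k});
end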
and repeated a similar "for" loop for F2000_2010. So far so good.
However, my real data set is much larger than just four files. The total number of files will vary, but there will be five years of data for each L\d{1,2}N (so, for instance, L8N_2009, L8N_2010, L8N_2011, L8N_2012, L8N_2013). I won't know the number of files ahead of time (although I do know it will be between 50 and 100), and I won't know the file names, but they will always follow the same L\d{1,2}N format.
In addition to what's already working, I want to count the number of files that have unique combinations of numbers in the portion of the filename that says L\d{1,2}N so I can further break down F2000_2010 and F2000_2009 in the above example to F2000_2010_L8N and F2000_2009_L8N before I start the dlmread loop.
Can I use regexp to build a list of all of my unique L\d{1,2}N occurrences? Next, can I easily change these list elements to strings to parse the original file names and create a new file name to the effect of L14N_2009, where 14 comes from \d{1,2}? I am sure this is a beginner question, but I discovered regexp yesterday! Any help is much appreciated!

Answers (1)

Jos (10584) on 7 Mar 2014
Regexp is nice but also a little cumbersome. Here is an alternative approach:
Files = {'L14N_2009_2000MHZ.txt' ;
'L8N_2009_2000MHZ.txt' ;
'L14N_2010_2000MHZ.txt' ;
'L8N_2009_2000MHZ.txt' }
FileYear = cellfun(@(S) sscanf(S,'L%*fN_%f_2000MHZ.txt'), Files)
FileType = cellfun(@(S) sscanf(S,'L%fN_%*f_2000MHZ.txt'), Files)
AllYears = unique(FileYear)
AllTypes = unique(FileType)
for YearIter = 1:numel(AllYears)
    for TypeIter = 1:numel(AllTypes)
        % element-wise & is needed here: && only accepts scalar operands
        tf = FileYear == AllYears(YearIter) & ...
             FileType == AllTypes(TypeIter)
        tmpFiles = Files(tf)
        % processing all files of this Year and Type
        for FileIter = 1:numel(tmpFiles)
            % your code for a single file here
        end
    end % TypeIter
end % YearIter
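
If you would rather stay with regexp, as the question asks, the sketch below is one possible way to do it. It assumes every file name matches the L\d{1,2}N_YYYY_2000MHZ.txt pattern exactly. The variable names (tok, Keys, UniqueKeys, Counts, Groups) are made up for illustration and are not part of the code above. It captures the digits and the year as tokens, rebuilds labels such as L14N_2009 as strings, and uses unique to list and count the combinations.

```matlab
% Same example file names as above
Files = {'L14N_2009_2000MHZ.txt' ;
         'L8N_2009_2000MHZ.txt' ;
         'L14N_2010_2000MHZ.txt' ;
         'L8N_2009_2000MHZ.txt' };

% Capture the 1-2 digit number and the 4-digit year as tokens.
% With 'once', each element of tok is a 1-by-2 cell of strings.
% Assumption: every name matches; a non-matching name gives an empty
% element and would break the vertcat call below.
tok = regexp(Files, '^L(\d{1,2})N_(\d{4})_2000MHZ\.txt$', 'tokens', 'once');
tok = vertcat(tok{:});                          % N-by-2 cell: {digits, year}

% Rebuild string labels from the tokens
Types = strcat('L', tok(:,1), 'N');             % e.g. 'L14N'
Keys  = strcat(Types, '_', tok(:,2));           % e.g. 'L14N_2009'

% Unique lists, plus an index telling which group each file belongs to
UniqueTypes = unique(Types);
[UniqueKeys, ~, grp] = unique(Keys);

% Number of files per unique type/year combination
Counts = accumarray(grp, 1);

% Collect the file names of each combination in its own cell
Groups = cell(size(UniqueKeys));
for k = 1:numel(UniqueKeys)
    Groups{k} = Files(grp == k);
end
```

Each Groups{k} can then be passed through the same dlmread loop as in the question, with UniqueKeys{k} serving as the label for that group.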
