How to inclusively extract rows of a large cell array between cells given start and end patterns?

Hello Folks,
I am searching for the most efficient method to parse a large text file (typically 2-4 GB) for occurrences of a message. I have to search ~100 large files for dozens of messages, so efficiency matters a great deal. I have attached a sample_input.txt containing two occurrences of the message specified in the considerations below.
Considerations:
1) start of the message is: 'Hello_Message.pdf'
2) end of the message is: '&&&'
3) store all lines of each occurrence of the message in an array within a structure
4) hoping to avoid for loops by filtering with a function such as extractBetween, contains, regexpPattern, or other function(s)
5) all messages have a header pattern '.*\.[a-zA-Z]{3}\n\r' and end with pattern '&&&\n\r'
The code below does not work but hopefully it provides an idea of what I was thinking...
clear
close all
clc
Input_fid = fopen('sample_input.txt');
ftext = textscan(Input_fid,'%s','Delimiter','\n\r');
fclose(Input_fid);
% I want to inclusively capture the start of the message 'Hello_Message.pdf' and the end
% of the message '&&&' along with all rows between the start and end of each occurrence
% of the message
for check = 1:height(ftext{1})
HelloMsgs.Occurrences(check) = extractBetween(ftext{1},regexpPattern('Hello_Message.pdf.*\n\r'),regexpPattern('&&&\n\r'));
end
Desired Output:
HelloMsgs.Occurrences(1) <--- cell array of all lines of the first occurrence of the Hello_Message in its own row cell
HelloMsgs.Occurrences(2) <--- cell array of all lines of the second occurrence of the Hello_Message in its own row cell
HelloMsgs.Occurrences(3) <--- cell array of all lines of the third occurrence of the Hello_Message in its own row cell
Thank you in advance for your time. I am new to posting a coding question in a forum, so hopefully I explained the problem well enough.

4 Comments

There are anomalies in the file that keep this approach from working correctly.
My attempt —
type('sample_input.txt')
Hello_Message.pdf 2341234342 3214234 ert 2341234342 3214234 abc 2341234342 3214234 Some_ting 23453425 Blah_bleh Sadf_5 Ouch 4 TEST Asdff: sdf_sdf Is_sdf: asdf IS_ssg: sadf NJ_T: adfgh Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> toaster</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds &&& Hello_Message.txt 2341234342 3214234 ert 2341234342 3214234 abc 2341234342 3214234 Some_ting 23453425 Blah_bleh Sadf_5 Ouch 4 TEST Asdff: sdf_sdf Is_sdf: asdf IS_ssg: sadf NJ_T: adfgh Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> thisdata</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds &&& Bye_Message.pdf 2341234342 3214234 ert 2341234342 3214234 abc 2341234342 3214234 Some_ting 23453425 Blah_bleh Sadf_5 Ouch 4 TEST Asdff: sdf_sdf Is_sdf: asdf IS_ssg: sadf NJ_T: adfgh Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> sadfsdfdsfasdf</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds &&& Hello_Message.pdf 2341234342 3214234 ert 2341234342 3214234 abc 2341234342 3214234 Some_ting 23453425 Blah_bleh Sadf_5 Ouch 4 TEST Asdff: sdf_sdf Is_sdf: asdf IS_ssg: sadf NJ_T: adfgh Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> iron </Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds\ &&&
fidi = fopen('sample_input.txt','rt')
fidi = 3
k = 1;
while ~feof(fidi)
Line{k,:} = fgetl(fidi);
k = k+1;
end
fclose(fidi);
k
k = 92
Line
Line = 91×1 cell array
{0×0 char } {'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> toaster</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds'} {'&&&' } {0×0 char } {0×0 char } {0×0 char } {'Hello_Message.txt' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' }
for k1 = 1:k-1
if ~isempty(Line{k1,:})
if strmatch(Line{k1,:},'Hello_Message.pdf')
% Start(k1) = 1
sprintf('Start = %2d',k1)
end
if strmatch(Line{k1}, '&&&')
% End(k1) = 1;
sprintf('End = %2d',k1)
end
end
end
ans = 'Start = 2'
ans = 'End = 21'
ans = 'End = 44'
ans = 'End = 65'
ans = 'Start = 72'
ans = 'End = 91'
You specifically stated:
start of the message is: 'Hello_Message.pdf'
so that is all I considered. If you want to get all of them, there are ways to do that, for example using the extractBefore function and then comparing only the part up to the end of the file prefix. I changed it in my posted Answer.
Hi Star Strider,
Thank you very much for your time and patience with me. Looks like I could have done better with how I explained the problem. I am reviewing your solution.
Thank you.
I substituted extractBetween for extractBefore since that gives the appropriate result in my ‘Extract’ cell array.


 Accepted Answer

fidi = fopen('sample_input.txt','rt');
k = 1;
while ~feof(fidi)
Line{k,:} = fgetl(fidi);
k = k+1;
end
fclose(fidi);
k
k = 92
% Line
for k1 = 1:k-1
if ~isempty(Line{k1,:})
Lc = strfind(extractBetween(Line{k1,:},'_','.'),'Message');
if ~isempty(Lc)
Start(k1) = 1;
% sprintf('Start = %2d',k1)
end
if strfind(Line{k1}, '&&&')
End(k1) = 1;
% sprintf('End = %2d',k1)
end
end
end
StartIdx = find(Start)
StartIdx = 1×4
2 25 46 72
EndIdx = find(End)
EndIdx = 1×4
21 44 65 91
for k = 1:numel(StartIdx)
Extract{k,:} = Line(StartIdx(k):EndIdx(k));
end
Extract{1}
ans = 20×1 cell array
{'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> toaster</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds'} {'&&&' }
Extract{end}
ans = 20×1 cell array
{'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> iron </Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds\'} {'&&&' }
EDIT — (18 Oct 2023 at 03:42)
I am a bit lost with respect to ‘start key’ and ‘stop key’. My code defines ‘StartIdx’ and ‘EndIdx’ as the indices of the ‘Message’ and ‘&&&’ entries. The ‘Extract’ cell arrays contain those lines and all the lines between them.
My initial approach was to use the fileread function and then do ‘logical indexing’, however that failed so the loop was the only other available option.
My code here is the same code I posted as a Comment, changed to test for all the ‘Message’ lines and not only ‘Hello_Message.pdf’ that was initially specified.
The regexp approach is not specific enough for this requirement.
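For readers who still want to try a whole-text search, the following is a minimal sketch, not the method used in this answer. It assumes the file fits in memory, that the header and terminator each occupy a complete line, and that the text has already been read into one char vector (for example with fileread). The function name extractByRegexp and its arguments are hypothetical.

```matlab
function Extract = extractByRegexp(txt, startKey, stopKey)
% extractByRegexp  Hypothetical helper, not part of the thread's code.
%   txt      : whole file content as one char vector (e.g. from fileread)
%   startKey : literal header line, e.g. 'Hello_Message.pdf'
%   stopKey  : literal terminator line, e.g. '&&&'
txt  = strrep(txt, char(13), '');      % drop carriage returns so \n is the only line break
expr = ['^', regexptranslate('escape', startKey), '\n', ...
        '.*?', ...                     % lazy: stop at the first terminator line
        '^', regexptranslate('escape', stopKey), '$'];
% 'lineanchors' lets ^ and $ match at line boundaries,
% 'dotall' lets . run across line breaks inside one message.
blocks  = regexp(txt, expr, 'match', 'lineanchors', 'dotall');
Extract = cell(numel(blocks), 1);
for k = 1:numel(blocks)
    Extract{k} = regexp(blocks{k}, '\n', 'split').';   % one line per cell, as a column
end
end
```

Because the keys are anchored to whole lines, 'Hello_Message.txt' and 'Bye_Message.pdf' blocks are skipped when startKey is 'Hello_Message.pdf'. Note that strrep creates a second copy of the text, so this sketch is only reasonable for files well below the available RAM.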

6 Comments

Hi Star Strider. I reviewed your code above to make certain I understand what is happening. I am not sure how to modify the line with Lc so that it only picks up the 'Hello_Message.pdf' messages; everything I have tried still extracts all the messages.
I made a modified version, code below, that only pulls out the 'Hello_Message.pdf' messages. I greatly appreciate your time! The code below looks inefficient: after the first message is found (StartIdx = 2, EndIdx = 21), the loop "for k1 = 1:k-1" goes back to k1 = 3 instead of continuing from k1 = 22.
I would like to get the results of the code below, but using the method from your code above.
clear
clc
fidi = fopen('sample_input.txt','rt');
k=1;
while ~feof(fidi)
Line{k,1} = fgetl(fidi);
k = k+1;
end
fclose(fidi);
for k1 = 1:k-1
if strfind(Line{k1,1}, 'Hello_Message.pdf')
Start(k1)=1;
StartIdx = find(Start);
for k2 = k1+1:k-1
if strfind(Line{k2,1}, '&&&')
End(k2) = 1;
EndIdx = find(End);
break
end
end
end
end
for k = 1:numel(StartIdx)
Extract{k,1} = Line(StartIdx(k):EndIdx(k)); % contains the solution I am looking for
end
Filtered_Msgs_StartEnd = {StartIdx,EndIdx}
Cnt_Filtered_Msgs = numel(Filtered_Msgs_StartEnd)
Extract{:}
I just now ran this and it seems to do what you want.
What specifically would you want to change?
clear
clc
fidi = fopen('sample_input.txt','rt');
k=1;
while ~feof(fidi)
Line{k,1} = fgetl(fidi);
k = k+1;
end
fclose(fidi);
for k1 = 1:k-1
if strfind(Line{k1,1}, 'Hello_Message.pdf')
Start(k1)=1;
StartIdx = find(Start);
for k2 = k1+1:k-1
if strfind(Line{k2,1}, '&&&')
End(k2) = 1;
EndIdx = find(End);
break
end
end
end
end
for k = 1:numel(StartIdx)
Extract{k,1} = Line(StartIdx(k):EndIdx(k)); % contains the solution I am looking for
end
Filtered_Msgs_StartEnd = {StartIdx,EndIdx}
Filtered_Msgs_StartEnd = 1×2 cell array
{[2 72]} {[21 91]}
Cnt_Filtered_Msgs = numel(Filtered_Msgs_StartEnd)
Cnt_Filtered_Msgs = 2
Extract{:}
ans = 20×1 cell array
{'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> toaster</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds'} {'&&&' }
ans = 20×1 cell array
{'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> iron </Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds\'} {'&&&' }
The output of the code above is correct, but the way I am iterating is a bit off and I would like to make it more efficient. The first instance of 'Hello_Message.pdf' is on lines 2-21 of sample_input.txt, and the second instance is on lines 72-91.
On the first iteration of the outer for loop the first message is extracted as intended. However, at the start of the second iteration k1 is back at line 3. I can't figure out how to avoid repeating lines 3-21 and instead proceed from the line after the first extracted message, so that no time is spent reparsing lines that belong to it. In other words, for iteration 2 of the outer loop I would like to start from line 22 rather than line 3 and continue searching for the next StartIdx from there.
Perhaps it would be better to just modify the Lc variable from your response (on 18 Oct 2023 at 2:12), but nothing I have tried produces the same output as the code in my most recent response.
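One way to resume the search after each terminator is sketched below. In MATLAB, assigning to the loop variable inside a for loop has no lasting effect, because the variable is reset from the loop range at the start of the next iteration, so a while loop is used instead. The function name findBlocksSkipping is hypothetical, and the sketch assumes Lines is a cell array of char vectors as produced by the fgetl loop above.

```matlab
function [StartIdx, EndIdx] = findBlocksSkipping(Lines, startKey, stopKey)
% findBlocksSkipping  Hypothetical helper, not part of the thread's code.
% Returns the line indices of each startKey line and of the first
% stopKey line that follows it, without revisiting lines inside a block.
n        = numel(Lines);
StartIdx = zeros(1, 0);
EndIdx   = zeros(1, 0);
k1 = 1;
while k1 <= n
    if ~isempty(strfind(Lines{k1}, startKey))
        % Scan forward for the terminator of this message
        k2 = k1 + 1;
        while k2 <= n && isempty(strfind(Lines{k2}, stopKey))
            k2 = k2 + 1;
        end
        if k2 > n
            break                  % header without a terminator: stop searching
        end
        StartIdx(end+1) = k1;      %#ok<AGROW>
        EndIdx(end+1)   = k2;      %#ok<AGROW>
        k1 = k2 + 1;               % resume on the line after the terminator
    else
        k1 = k1 + 1;
    end
end
end
```

Each line is then inspected at most twice (once as a header candidate, once as a terminator candidate), and the blocks can be extracted with Line(StartIdx(k):EndIdx(k)) as before.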
Without altering my previous code significantly, I added an extra for loop that finds the first ‘EndIdx’ value greater than each ‘StartIdx’ value, and then saves those values (initially assigned to ‘NextEnd’) to ‘EndIdx’ afterwards. That produces one matching pair of ‘StartIdx’ and ‘EndIdx’ values for each of the two sections.
fidi = fopen('sample_input.txt','rt');
k = 1;
while ~feof(fidi)
Line{k,:} = fgetl(fidi);
k = k+1;
end
fclose(fidi);
k
k = 92
% Line
for k1 = 1:k-1
if ~isempty(Line{k1,:})
Lc = strfind(Line{k1,:}, 'Hello_Message.pdf');
% Lc = strfind(extractBetween(Line{k1,:},'_','.'),'Message');
if ~isempty(Lc)
Start(k1) = 1;
% sprintf('Start = %2d',k1)
end
if strfind(Line{k1}, '&&&')
End(k1) = 1;
% sprintf('End = %2d',k1)
end
end
end
StartIdx = find(Start);
EndIdx = find(End);
for k = 1:numel(StartIdx)
NextEnd(k) = EndIdx(find(EndIdx > StartIdx(k), 1));
end
StartIdx
StartIdx = 1×2
2 72
EndIdx = NextEnd
EndIdx = 1×2
21 91
for k = 1:numel(StartIdx)
Extract{k,:} = Line(StartIdx(k):EndIdx(k));
end
Extract{1}
ans = 20×1 cell array
{'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> toaster</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds'} {'&&&' }
Extract{end}
ans = 20×1 cell array
{'Hello_Message.pdf' } {'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> iron </Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds\'} {'&&&' }
This approach (adding an additional loop) is likely the most efficient way to choose the correct ‘EndIdx’ for each ‘StartIdx’.
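As an illustration only, the line-by-line detection could also be written with logical masks over the whole cell array, keeping a loop just for pairing each start with the next end. This is a sketch under the assumptions that every header and terminator line matches its key exactly after trimming whitespace, and that Lines contains only char vectors. The function name extractMessages is hypothetical.

```matlab
function Extract = extractMessages(Lines, startKey, stopKey)
% extractMessages  Hypothetical helper, not part of the thread's code.
%   Lines    : cell array of char vectors, one per line of the file
%   startKey : exact text of the header line, e.g. 'Hello_Message.pdf'
%   stopKey  : exact text of the terminator line, e.g. '&&&'
Lines    = Lines(:);
trimmed  = strtrim(Lines);                   % tolerate stray blanks around the keys
StartIdx = find(strcmp(trimmed, startKey));  % rows holding a header
StopIdx  = find(strcmp(trimmed, stopKey));   % rows holding a terminator
Extract  = cell(numel(StartIdx), 1);
nFound   = 0;
for k = 1:numel(StartIdx)
    % First terminator located after this header
    nextStop = StopIdx(find(StopIdx > StartIdx(k), 1));
    if isempty(nextStop)
        break                                % header without terminator: ignore it
    end
    nFound = nFound + 1;
    Extract{nFound} = Lines(StartIdx(k):nextStop);   % inclusive of both key lines
end
Extract = Extract(1:nFound);
end
```

The strcmp calls replace the per-line strfind tests, so the only remaining loop runs once per message rather than once per line.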
Thank you Star Strider, your solution works great. Thanks again for your time and patience with me.


More Answers (1)

Why do you want to avoid loops? Reading the file completely in order to apply vectorized methods requires 8 GB of contiguous free RAM for a 4 GB file (16 bits per char). I would choose such an approach only on computers with >= 32 GB RAM, while a loop method is less demanding concerning RAM. In addition, filtering while reading avoids keeping the complete text in RAM.
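The memory figure can be reproduced with a short calculation. This is only a rough estimate: it assumes the file uses a single-byte encoding, so that each byte on disk becomes one char, and it ignores any temporary copies made while processing.

```matlab
% Rough RAM estimate for holding a whole text file as one MATLAB char array.
% Assumption: single-byte file encoding, so 1 byte on disk becomes 1 char.
fileBytes    = 4 * 2^30;                   % 4 GiB file, the upper size named in the question
bytesPerChar = 2;                          % MATLAB stores each char as a 16-bit code unit
ramBytes     = fileBytes * bytesPerChar;   % bytes needed for the char array alone
ramGiB       = ramBytes / 2^30;
fprintf('About %.0f GiB of contiguous RAM for a %.0f GiB file\n', ramGiB, fileBytes / 2^30);
```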
S = ParseFile("sample_input.txt");
S{1}
ans = 18×1 cell array
{'2341234342 3214234 ert' } {'2341234342 3214234 abc' } {'2341234342 3214234' } {'Some_ting' } {'23453425' } {'Blah_bleh' } {'Sadf_5' } {'Ouch 4' } {'TEST' } {' ' } {' ' } {' ' } {'Asdff: sdf_sdf' } {'Is_sdf: asdf' } {'IS_ssg: sadf' } {'NJ_T: adfgh' } {0×0 char } {'Some_data_: 4 sadf sadf asdf 45676578675 sdaf sadf asdf asdf sadf 4365436546 sdfdsf 0 sadfsdffds 0 <Item> toaster</Item> dsfasdf sadfdsakfdsfklj sdafsdafdsa fds'}
function S = ParseFile(File)
% Collect the lines between each startKey line and the following stopKey line.
startKey = "Hello_Message.pdf";
stopKey  = "&&&";
fid = fopen(File, 'r');
assert(fid > 0, "Cannot open file: %s", File);
bS = 1000;               % Pre-allocate output in blocks
nS = bS;
iS = 0;
S  = cell(1, nS);
buffer  = cell(20, 1);   % Grows iteratively at first
ibuffer = 0;
doGrab  = false;
while ~feof(fid)
    Line = fgetl(fid);
    if startsWith(Line, startKey)
        buffer(:) = {[]};            % Clear the buffer
        ibuffer = 0;
        doGrab  = true;              % Start grabbing in next line
    elseif doGrab && startsWith(Line, stopKey)
        % Only a stopKey that closes a grabbed message stores a result;
        % terminators of other messages are ignored.
        doGrab = false;              % Stop grabbing
        iS = iS + 1;                 % Expand output S in blocks on demand
        if iS > nS
            nS = nS + bS;
            S{nS} = [];
        end
        S{iS} = buffer(1:ibuffer);   % Store the buffer
    elseif doGrab
        ibuffer = ibuffer + 1;
        buffer{ibuffer} = Line;
    end
end
fclose(fid);
if doGrab   % Store the last buffer if the final stopKey is missing
    iS = iS + 1;
    S{iS} = buffer(1:ibuffer);
end
S = S(1:iS);   % Crop pre-allocated output cells
end

1 Comment

Hi Jan,
With regard to my reason for wanting to avoid for loops, I "assumed" there could be a more resource/time efficient way to accomplish what I was trying to do. The input files are maintained on a network and not stored locally on the machine (64 GB RAM) where MATLAB is being executed.
I do like your approach a lot and will be looking at it in detail so that I understand what is happening...
How would your code be modified so that the startKey and stopKey of each message are included in the cell arrays captured by S?
Perhaps the startKey would need to be defined as regexpPattern('.*\.[a-z]{3}'), with a filter then applied for the messages whose first line is equal to "Hello_Message.pdf"?
Thank you for your time.
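One possible answer to the question above is sketched here. It is a hypothetical variant, not code posted by Jan: the function name ParseFileInclusive and the choice to pass the keys as arguments are assumptions, and the header is matched as a whole line with strcmp rather than with a pattern. For brevity the output grows dynamically instead of in pre-allocated blocks.

```matlab
function S = ParseFileInclusive(File, startKey, stopKey)
% ParseFileInclusive  Hypothetical variant of ParseFile that keeps the
% header line and the terminator line in each captured message.
fid = fopen(File, 'r');
assert(fid > 0, 'Cannot open file: %s', File);
closer = onCleanup(@() fclose(fid));    % close the file even if an error occurs
S      = {};
buffer = {};
doGrab = false;
while true
    Line = fgetl(fid);
    if ~ischar(Line)
        break                           % fgetl returns -1 at end of file
    end
    if strcmp(Line, startKey)
        buffer = {Line};                % start a new message with its header
        doGrab = true;
    elseif doGrab
        buffer{end+1, 1} = Line;        %#ok<AGROW>
        if strncmp(Line, stopKey, numel(stopKey))
            S{end+1, 1} = buffer;       %#ok<AGROW> store message including terminator
            doGrab = false;
        end
    end
end
end
```

A call such as S = ParseFileInclusive('sample_input.txt', 'Hello_Message.pdf', '&&&') would then be expected to return one cell per 'Hello_Message.pdf' block. A message whose terminator is missing at the end of the file is dropped in this sketch.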


Release: R2022b

Asked: on 17 Oct 2023

Commented: on 20 Oct 2023
