Implementing ellipsis (also known as dot dot dot or "...") for line continuation in a regular expression statement

So this is driving me nuts. The MATLAB documentation says "dot dot dot" (the ellipsis) is treated like a space, but it obviously isn't, and it's driving me crazy. I'm sure it's easy to figure out for an experienced MATLAB programmer, which I clearly am not. I appreciate your help on this matter.
parts = regexp(filtered, '(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ...
(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)', 'names')
I've tried ending the single quotes on the first part and then wrapping the second part of the expression in its own quotes. I've tried placing the comma on the second part, and combinations of the comma inside the quotes. MATLAB says the ellipsis is treated like a space, so the above should technically work. Well, it doesn't. I need help. Thank you for your time on this piece of MATLAB code.
Stephen23 on 2 Mar 2022
Edited: Stephen23 on 2 Mar 2022
"Matlab documentation says "dot dot dot" or ellipsis is treated like a space..."
For character vectors, the MATLAB documentation actually states "Build a long character vector by concatenating shorter vectors together... The start and end quotation marks for a character vector must appear on the same line" and proceeds to give examples.
Your code does not follow what the MATLAB documentation specifies.
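A minimal sketch of the documented approach, using an illustrative pattern (the names pattern, oneLine, major and minor are made up for this example and are not from the question): each quoted piece opens and closes on its own line, and the square brackets join the pieces.

```matlab
% Each quoted piece starts and ends on the same line. The brackets do the
% concatenation; the ellipsis only continues the statement onto the next line.
pattern = ['(?<major>\d+)\.' ...   % digits before the dot
           '(?<minor>\d+)'];       % digits after the dot

% The same pattern written as a single literal, for comparison.
oneLine = '(?<major>\d+)\.(?<minor>\d+)';

% Both forms should be the same 1-by-N character vector.
sameText = isequal(pattern, oneLine);
```

Because this uses only single-quoted character vectors, it should not depend on the newer string type.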

Accepted Answer

Steven Lord on 2 Mar 2022
Instead of trying to split a long char vector across multiple lines, why not write your regular expression as a series of string arrays that you concatenate across multiple lines with +? That way each section is self-contained, you can't forget the ] at the end of a potentially long series of lines because you don't need one.
filtered = "The quick brown fox jumped over the lazy dog"; % Random text
regexpPattern = "(?<TNT>d+\.(\d)+), " + ...
"(?<T>\w*), " + ...
"(?<refTm>\d+), " + ... % Looking for trademark symbols?
"(?<P>\w+), " + ...
"(?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), " + ...
"(?<rbncntrl>[^w]\w+), " + ...
"(?<cntrlStatus>\d+, " + ...
"(?<satsTrk>\d+), " + ...
"(?<lastRbUpdt>\d+)" % Leaving off the semicolon so you can check the assembly
regexpPattern = "(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), (?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)"
parts = regexp(filtered, regexpPattern, 'names')
parts =
  0×0 empty struct array with fields:
    TNT
    T
    refTm
    P
    tmSpyRefmns
    rbncntrl
    satsTrk
    lastRbUpdt
This has an added benefit that you can add a comment after the ellipsis to explain what each part of your regular expression means (like I did on the line with refTm). This will help someone else reading your code (or you reading your code six months from now) to understand its purpose.
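As a small sketch of the same idea with input that actually matches (the sample text and the field names major and minor are hypothetical), the pieces below are string scalars joined with +. This assumes a release that supports double-quoted strings, R2017a or later.

```matlab
% Hypothetical input text, chosen so that the pattern matches.
txt = "version 9.12";

% Each piece is a self-contained string; + joins them, and the text after
% each ellipsis is treated as a comment.
pattern = "(?<major>\d+)\." + ...  % digits before the dot
          "(?<minor>\d+)";         % digits after the dot

% One struct with fields major and minor is expected for this input.
parts = regexp(txt, pattern, 'names');
```

On older releases the double quotes are not valid syntax, so the bracketed single-quote form is the one to use there.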
jimmy zubiate on 5 Mar 2022
Edited: jimmy zubiate on 5 Mar 2022
I accepted prematurely. I'm using MATLAB R2010a. Double quotes were implemented much later (4-5 years ago?). Either way, the solution didn't work for me. I tried using single quotes and removing the "+" between elements of the expression. I finally ended up using single quotes and removing the "+" from each line. See the following code. For some reason it still doesn't work.
The returned output is an empty structure array. The regexpPattern gets created correctly, but once it is used in the line that assigns parts, it doesn't work. Any ideas?
regexpPattern = ['(?<TNT>d+\.(\d)+), ' ...
'(?<T>\w*), ' ...
'(?<refTm>\d+), ' ...
'(?<P>\w+), ' ...
'(?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ' ...
'(?<rbncntrl>[^w]\w+), ' ...
'(?<cntrlStatus>\d+, ' ...
'(?<satsTrk>\d+), ' ...
'(?<lastRbUpdt>\d+)']
parts = regexp(filtered, regexpPattern,'names')
[m,n] = size(parts)
for j=1:m
    for k=1:n
        MJD =str2double({parts{j,k}.MJD});
        format long
        disp(MJD)
    end
end
jimmy zubiate on 5 Mar 2022
Edited: jimmy zubiate on 5 Mar 2022
Disregard the above. It was cross-contamination from the original "regexpPattern". The problem has been fixed. Enclosing the parts of the expression within square brackets ([ ]) was the glue that eventually made it work. I like this code structure; it is how I feel it should be written. Credit given. Thanks, Steven Lord!

More Answers (2)

Voss on 2 Mar 2022
Edited: Voss on 2 Mar 2022
When you use an ellipsis to continue a character array onto another line, you have to end the character array on that line, start a new one on the next line, and concatenate the different parts. In this case, that might look like this (check that the pattern in regexp is accurate):
parts = regexp(filtered, ['(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ' ... not sure if the space belongs inside the pattern or not
'(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)'], 'names')

jimmy zubiate on 2 Mar 2022
I've tried the following too. No dice. MATLAB doesn't like it. The reason I want to use dot dot dot is to wrap the code so that it reads better.
parts = regexp(filtered, '(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+),'...
'(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)', 'names')
Stephen23 on 2 Mar 2022
Edited: Stephen23 on 15 Nov 2024
"I've tried the following too. No dice. Matlab doesn't like it."
Because you built two separate character vectors without joining them together as the MATLAB documentation shows.
As Voss stated, you are missing the square brackets.
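For illustration, here is a sketch of the bracketed form placed directly inside the regexp call. The sample record and the field names mjd, source and status are hypothetical and much simpler than the pattern in the question. Without the brackets, the two quoted pieces sit side by side as separate arguments with no comma between them, which MATLAB rejects.

```matlab
% Hypothetical comma-separated record; not the asker's data.
filtered = '59640.5, GPS, 120, LOCKED';

% The square brackets turn the quoted pieces into one character vector,
% so regexp receives a single pattern argument.
parts = regexp(filtered, ['(?<mjd>\d+\.\d+), ' ...   % decimal number
                          '(?<source>\w+), ' ...     % word
                          '(?<refTm>\d+), ' ...      % integer
                          '(?<status>\w+)'], ...     % word
               'names');

% Named tokens come back as text, so convert numeric fields explicitly.
mjd = str2double(parts.mjd);
```

Note that the separating ", " is kept inside each quoted piece, so the joined pattern has the same spacing as the one-line version.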
