Implementing ellipsis, also known as dot dot dot or "..." for line continuation in a regular expression statement
So this is driving me nuts. The MATLAB documentation says "dot dot dot" (ellipsis) is treated like a space, but that is obviously not the case here, and it's driving me crazy. I'm sure it's something easy to figure out for an experienced MATLAB programmer, which clearly I'm not. I appreciate your help on this matter.
parts = regexp(filtered, '(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ...
(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)', 'names')
I've tried ending the single quotes on the first part and then wrapping the second part of the expression in its own quotes. I've tried placing the comma on the second part, and combinations of the comma inside the quotes. MATLAB says ellipsis is treated like a space, so the above should technically work. Well, it doesn't. I need help. Thank you for your time on this piece of MATLAB code.
1 Comment
Stephen23
on 2 Mar 2022
Edited: Stephen23
on 2 Mar 2022
"Matlab documentation says "dot dot dot" or ellipsis is treated like a space..."
For character vectors, the MATLAB documentation actually states "Build a long character vector by concatenating shorter vectors together... The start and end quotation marks for a character vector must appear on the same line" and proceeds to give examples.
Your code does not follow what the MATLAB documentation specifies.
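To illustrate the rule quoted above, here is a minimal sketch with a deliberately short pattern. The pattern, the token names major and minor, and the input text are made up for illustration and are not from the question.

```matlab
% Every quoted piece opens and closes on the same line. The square
% brackets concatenate the pieces, and ... continues the statement.
% Anything after ... on a line is treated as a comment.
pattern = ['(?<major>\d+)\.', ...   digits before the dot
           '(?<minor>\d+)'];        % digits after the dot

% Hypothetical input, chosen only to exercise the pattern
parts = regexp('version 12.34', pattern, 'names');
disp(parts)
```

The ellipsis sits outside the quotes here. Inside a quoted character vector, three dots are just three literal characters, which is why the original statement fails.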
Accepted Answer
Steven Lord
on 2 Mar 2022
Instead of trying to split a long char vector across multiple lines, why not write your regular expression as a series of strings that you concatenate across multiple lines with +? That way each section is self-contained, and you can't forget the ] at the end of a potentially long series of lines because you don't need one.
filtered = "The quick brown fox jumped over the lazy dog"; % Random text
regexpPattern = "(?<TNT>d+\.(\d)+), " + ...
    "(?<T>\w*), " + ...
    "(?<refTm>\d+), " + ... % Looking for trademark symbols?
    "(?<P>\w+), " + ...
    "(?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), " + ...
    "(?<rbncntrl>[^w]\w+), " + ...
    "(?<cntrlStatus>\d+, " + ...
    "(?<satsTrk>\d+), " + ...
    "(?<lastRbUpdt>\d+)" % Leaving off the semicolon so you can check the assembly
parts = regexp(filtered, regexpPattern, 'names')
This has the added benefit that you can add a comment after the ellipsis to explain what each part of your regular expression means (like I did on the line with refTm). This will help someone else reading your code (or you reading your code six months from now) to understand its purpose.
More Answers (2)
Voss
on 2 Mar 2022
Edited: Voss
on 2 Mar 2022
When you use an ellipsis with a character array, you have to end the character array on that line, start it again on the next line, and concatenate the different parts. In this case, that might look like this (check that the pattern in regexp is accurate):
parts = regexp(filtered, ['(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ' ... not sure if the space belongs inside the pattern or not
'(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)'], 'names')
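On checking the pattern itself: as posted, it appears to have d+ where \d+ was probably meant, [^w] where [^\w] was probably meant, and two named tokens (tmSpyRefns and cntrlStatus) with no closing parenthesis. The sketch below is only a guess at the intended pattern under those assumptions, with a space assumed after every comma. The sample line is hypothetical, since the real contents of filtered are not shown in the thread.

```matlab
% Hypothetical sample line; the real 'filtered' data is not shown in the thread.
filtered = '123.45, abc, 100, P1, -12, +34, -56, 1, 8, 999';

% Assumed corrections: \d instead of d, [^\w] instead of [^w], a closing
% parenthesis for every named token, and ', ' between all tokens.
pattern = ['(?<TNT>\d+\.\d+), ', ...
           '(?<T>\w*), ', ...
           '(?<refTm>\d+), ', ...
           '(?<P>\w+), ', ...
           '(?<tmSpyRefns>[^\w]\w+), ', ...
           '(?<tmSpyRefmns>[^\w]\w+), ', ...
           '(?<rbncntrl>[^\w]\w+), ', ...
           '(?<cntrlStatus>\d+), ', ...
           '(?<satsTrk>\d+), ', ...
           '(?<lastRbUpdt>\d+)'];

parts = regexp(filtered, pattern, 'names');
disp(parts)
```

Putting one named token per line makes a missing parenthesis much easier to spot than in a single long line.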
jimmy zubiate
on 2 Mar 2022