Thread Subject:
Help using textscan or sscanf

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 16 Jan, 2013 17:30:08

Message: 1 of 22

Hello,

My problem is very simple, but I cannot find an efficient way to accomplish it. Here is an excerpt of my array (which is one column of a dataset):

Subset001.AccountNumber(1:6,:)

ans =

'154045001'
'926665001'
'615017'
'1976936151'
'801700'
'4506702001'

I am trying to create a new account number in the form of a string. I need to prepend '4412' to the last 9 digits of the above account numbers, so that the result is:

'4412154045001'
'4412926665001' - Account Numbers have 9 digits so this is easy
'4412615017' - Only has 6 digits so take the whole account number
'4412976936151' - Has 10 digits so drop first and append to '4412'
etc.

So my problem is difficult in that I need to use sscanf to find the last 9 digits of the account number (even if it has fewer or more) and then use strcat to put '4412' at the beginning. A few poor attempts:

strcat('4412',sscanf(Subset001.AccountNumber{6},'%*c%9c',1))
strcat('4412',sscanf(Subset001.AccountNumber{6},'%s',[1,9]))

These obviously do not work, and I can't seem to find a metacharacter in the sscanf documentation that says to start at the end of the string, count back 8 characters, and return the result. Does sscanf always have to read left to right? Any help would be much appreciated. Thanks.

Kevin Ellis

Subject: Help using textscan or sscanf

From: dpb

Date: 16 Jan, 2013 18:18:45

Message: 2 of 22

On 1/16/2013 11:30 AM, Kevin Ellis wrote:
> Hello,
>
> My problem is very simple, but I cannot find an efficient way to
> accomplish it. Here is an excerpt of my array (which is one column of a
> dataset):
>
> Subset001.AccountNumber(1:6,:)
>
> ans =
> '154045001'
> '926665001'
> '615017'
> '1976936151'
> '801700'
> '4506702001'
>
> I am trying to create a new account number in the form of a string. I
> need to prepend '4412' to the last 9 digits of the above account
> numbers, so that the result is:
>
> '4412154045001'
> '4412926665001' - Account Numbers have 9 digits so this is easy
> '4412615017' - Only has 6 digits so take the whole account number
> '4412976936151' - Has 10 digits so drop first and append to '4412'
> etc.
>
> So my problem is difficult in that I need to use sscanf to find the last
> 9 digits of the account number...

Nope...

 >> s=strvcat('154045001','926665001','615017','1976936151','801700','4506702001');
 >> [repmat('4412',6,1) s(:,end-9:end)]
ans =
4412154045001
4412926665001
4412615017
44121976936151
4412801700
44124506702001
 >>

I'd suggest in the end it _might_ be better to "normalize" the account
no's by prepending zeros for the short strings. Then they would sort,
etc., more nearly normally (for at least one definition of "normal",
anyway).
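
A minimal sketch of the zero-padding idea, assuming the account numbers are held in a cell array of digit strings and that a fixed width of 9 is wanted (the names `acct`, `nDigits`, `last9` and `padded` are made up for illustration):

```matlab
% Sample account numbers from the first post.
acct = {'154045001'; '926665001'; '615017'; '1976936151'; '801700'; '4506702001'};

nDigits = 9;  % assumed fixed width

% Keep at most the last nDigits characters of each string.
last9 = cellfun(@(s) s(max(1, numel(s)-nDigits+1):end), acct, ...
    'UniformOutput', false);

% Left-pad the short strings with zeros so every entry has the same length.
padded = cellfun(@(s) [repmat('0', 1, nDigits-numel(s)) s], last9, ...
    'UniformOutput', false);

% With equal-length strings, a plain text sort agrees with numeric order.
sorted = sort(padded);
disp(sorted)
```

Whether the padding is worth having still depends on what is done with the numbers afterwards.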

--

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 16 Jan, 2013 21:21:08

Message: 3 of 22

dpb <none@non.net> wrote in message <kd6qtq$nu3$1@speranza.aioe.org>...
> On 1/16/2013 11:30 AM, Kevin Ellis wrote:
> > Hello,
> >
> > My problem is very simple, but I cannot find an efficient way to
> > accomplish it. Here is an excerpt of my array (which is one column of a
> > dataset):
> >
> > Subset001.AccountNumber(1:6,:)
> >
> > ans =
> > '154045001'
> > '926665001'
> > '615017'
> > '1976936151'
> > '801700'
> > '4506702001'
> >
> > I am trying to create a new account number in the form of a string. I
> > need to prepend '4412' to the last 9 digits of the above account
> > numbers, so that the result is:
> >
> > '4412154045001'
> > '4412926665001' - Account Numbers have 9 digits so this is easy
> > '4412615017' - Only has 6 digits so take the whole account number
> > '4412976936151' - Has 10 digits so drop first and append to '4412'
> > etc.
> >
> > So my problem is difficult in that I need to use sscanf to find the last
> > 9 digits of the account number...
>
> Nope...
>
> >> s=strvcat('154045001','926665001','615017','1976936151','801700','4506702001');
> >> [repmat('4412',6,1) s(:,end-9:end)]
> ans =
> 4412154045001
> 4412926665001
> 4412615017
> 44121976936151
> 4412801700
> 44124506702001
> >>
>
> I'd suggest in the end it _might_ be better to "normalize" the account
> no's by prepending zeros for the short strings. Then they would sort,
> etc., more nearly normally (for at least one definition of "normal",
> anyway).
>
> --
dpb,

Thanks for attempting the solution. For those that may have the same problem, I found the following solution. It is ugly, but works. I am working with datasets so I first add another column to the dataset counting the length of each element in the account numbers column using:

Subset.Count = (cellfun('length',Subset.AccountNumber));

Then, because I have a dataset, I can split it into two distinct datasets: one for when count is at most 9 (account numbers with 9 digits or fewer) and one for when count is greater than 9 (account numbers with more than 9 digits):

Subset9DigL = Subset(Subset.Count <= 9,{'Type','AccountNumber',etc.});

Subset9DigH = Subset(Subset.Count > 9,{'Type','AccountNumber',etc.});

Then just a couple lines using cellfun to create the new account number for each case:

Subset9DigL.DocNo = cellfun(@(x) strcat('4412',sscanf(x,'%s',1),'001'),...
Subset9DigL.AccountNumber,'UniformOutput',false);

And

Expr = arrayfun(@(x) strcat('%*',num2str((x-9)),'c%9c'),Subset9DigH.Count,...
'UniformOutput',false);

Subset9DigH.DocNo = cellfun(@(x,y) strcat('4412',sscanf(x,y,1),'001'),...
Subset9DigH.AccountNumber,Expr,'UniformOutput',false);

Then I just merge the two datasets once more:

Subset = cat(1,Subset9DigL,Subset9DigH);

And remove the Count column:

Subset.Count = [];

Again, a terrible way to do this, but if anyone can think of anything more innovative, that would be great.
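
One possibly simpler route, sketched under the assumption that the account numbers are a cell array of digit strings: index from the end of each string directly, so no length column, split, or merge is needed. The names `acct`, `prefix`, `nKeep`, `lastN` and `docNo` are illustrative only, and the '001' suffix is left out here.

```matlab
% Sample account numbers from the first post.
acct = {'154045001'; '926665001'; '615017'; '1976936151'; '801700'; '4506702001'};

prefix = '4412';
nKeep  = 9;   % number of trailing characters to keep

% Take the last nKeep characters, or the whole string if it is shorter.
lastN = @(s) s(max(1, numel(s)-nKeep+1):end);

% Prepend the prefix to each trimmed account number.
docNo = cellfun(@(s) [prefix lastN(s)], acct, 'UniformOutput', false);
disp(docNo)
```

The `max(1, ...)` guard is what handles the short account numbers: the start index never drops below 1.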

Kevin

Subject: Help using textscan or sscanf

From: dpb

Date: 16 Jan, 2013 22:01:15

Message: 4 of 22

On 1/16/2013 3:21 PM, Kevin Ellis wrote:
...

> Thanks for attempting the solution....

What was wrong with it, specifically?

--

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 16 Jan, 2013 22:11:10

Message: 5 of 22

dpb <none@non.net> wrote in message <kd77v0$rlu$1@speranza.aioe.org>...
> On 1/16/2013 3:21 PM, Kevin Ellis wrote:
> ...
>
> > Thanks for attempting the solution....
>
> What was wrong with it, specifically?
>
> --

Adding zeros to the shorter account numbers is easy to do, but I didn't understand how the zeros would "be sorted out" and what specific function would do that (sscanf?,textscan?).

Subject: Help using textscan or sscanf

From: dpb

Date: 16 Jan, 2013 22:33:49

Message: 6 of 22

On 1/16/2013 4:11 PM, Kevin Ellis wrote:
> dpb <none@non.net> wrote in message <kd77v0$rlu$1@speranza.aioe.org>...
>> On 1/16/2013 3:21 PM, Kevin Ellis wrote:
>> ...
>>
>> > Thanks for attempting the solution....
>>
>> What was wrong with it, specifically?
>>
>> --
>
> Adding zeros to the shorter account numbers is easy to do, but I didn't
> understand how the zeros would "be sorted out" and what specific
> function would do that (sscanf?,textscan?).

You wouldn't have to, that was just an optional suggestion.

Whether it is a good idea or not is dependent on what you do w/ the
account numbers later on--if they're just arbitrary numbers and nothing
is done with them in database searches or somesuch then it may make no
difference--often, however, folks do things like sort if there is some
sort of a key that caused a given number to be generated or the like.
It would be entirely dependent upon the application. There's some
symmetry in having a consistent length--again whether it's of any value
for reports, etc., is application-dependent.

The solution I gave reproduced your values for your sample set afaik. I
was wondering what was wrong with it that you weren't apparently
satisfied and went to something else you said you didn't like.

--

Subject: Help using textscan or sscanf

From: dpb

Date: 17 Jan, 2013 15:18:31

Message: 7 of 22

On 1/16/2013 4:33 PM, dpb wrote:
....

> The solution I gave reproduced your values for your sample set...

OBTW, NB "the trick" in the above solution--horzcat() or shorthand of
[s1 s2] where s1,s2 are character arrays will transparently "eat"
intervening blanks to create the shorter lengths automagically you're
working so hard to create..."use the (hidden) force, Luke..." :)

That's what eliminates the need for all the string parsing and just lets
it pick the last N desired characters--when they were blank-filled and
right-adjusted in strvcat(), then when the concatenation of the
prepended string occurs, the spurious blanks go away on their own w/ no
additional effort.

I was simply noting elsewhere it _might_ be convenient to have a fixed
length for reporting purposes or for sorting by prepending the zeros in
place of blanks so the numeric order would be preserved and columns
would be same width, etc., etc., etc., ... Again, that would be purely
application-specific as to what is actually being done w/ them.

HTH...

--

Subject: Help using textscan or sscanf

From: dpb

Date: 17 Jan, 2013 19:23:49

Message: 8 of 22

On 1/17/2013 9:18 AM, dpb wrote:
> On 1/16/2013 4:33 PM, dpb wrote:
> ....
>
>> The solution I gave reproduced your values for your sample set...
>
> OBTW, NB "the trick" in the above solution--horzcat() or shorthand of
> [s1 s2] where s1,s2 are character arrays will transparently "eat"
> intervening blanks to create the shorter lengths automagically you're
> working so hard to create..."use the (hidden) force, Luke..." :)
>
> That's what eliminates the need for all the string parsing and just lets
> it pick the last N desired characters--when they were blank-filled and
> right-adjusted in strvcat(), then when the concatenation of the
> prepended string occurs, the spurious blanks go away on their own w/ no
> additional effort.
...

And, if you have cell strings, then just use

strcat(c1,strtrim(c2))

where the c1,c2 are the two cell strings to squish together w/o
intervening spaces.
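
A small sketch of that, assuming c2 holds right-justified strings with leading blanks (the sample values are made up for illustration):

```matlab
% c1: the prefix, repeated once per row; c2: strings with leading blanks.
c1 = {'4412'; '4412'};
c2 = {'   615017'; '154045001'};

% strtrim removes the leading/trailing blanks from each cell before
% strcat joins the two cell arrays element by element.
out = strcat(c1, strtrim(c2));
disp(out)
```

Note that strcat does not strip blanks from cell array inputs by itself, which is why the strtrim call is needed.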

--

Subject: Help using textscan or sscanf

From: dpb

Date: 19 Jan, 2013 21:10:18

Message: 9 of 22

On 1/16/2013 11:30 AM, Kevin Ellis wrote:
...

> Subset001.AccountNumber(1:6,:)
>
> ans =
> '154045001'
> '926665001'
> '615017'
> '1976936151'
> '801700'
> '4506702001'
>
> I am trying to create a new account number in the form of a string. I
> need to prepend '4412' to the last 9 digits of the above account
> numbers, so that the result is:
>
> '4412154045001'
> '4412926665001' - Account Numbers have 9 digits so this is easy
> '4412615017' - Only has 6 digits so take the whole account number
> '4412976936151' - Has 10 digits so drop first and append to '4412'
> etc.
...

I only got access to Statistics Toolbox within the last couple of weeks so
have virtually no proficiency w/ the DATASET object but...you can do
essentially my previous solution (which was correct except I forgot and
used (end-9:end) instead of (end-8:end) for the length subscript to get
9 characters to append) from a dataset as--

 >> a={'154045001' '926665001' '615017' '1976936151' '801700' '4506702001'}';
 >> D=dataset({a,'acct'});
 >> c=char(D.acct);
 >> D.newaccdt=cellstr([repmat('4412',6,1) c(:,end-8:end)])
D =
     acct            newaccdt
     '154045001'     '441254045001'
     '926665001'     '441226665001'
     '615017'        '441215017'
     '1976936151'    '4412976936151'
     '801700'        '441201700'
     '4506702001'    '4412506702001'
 >>

I'm pretty sure if you worked at it and w/ more proficiency in the
dataset methods you could probably do it w/o the temporary explicit
array c above.

But clearly you don't need two subsets and sscanf() and all the
rigamarole you're presently going thru...

--

Subject: Help using textscan or sscanf

From: dpb

Date: 19 Jan, 2013 21:23:20

Message: 10 of 22

On 1/19/2013 3:10 PM, dpb wrote:
...

> I only got access to Statistics Toolbox within the last couple of weeks so
> have virtually no proficiency w/ the DATASET object but...you can do
> essentially my previous solution (which was correct except I forgot and
> used (end-9:end) instead of (end-8:end) for the length subscript to get
> 9 characters to append) from a dataset as--
...

Ewww....I see the problem--the catenation leaves the whole array w/
length 10 in this case so cut off what don't need. OK, need to do a
strjust() first, then pick up the last 9 and then concatenate.

I have to run at the moment but that's the key...if I get a chance
tomorrow I'll try to clean it up....sorry I forgot/missed the obvious
(altho if you caught it in your previous reply it would have been good
to have pointed it out :) ).

--

Subject: Help using textscan or sscanf

From: dpb

Date: 19 Jan, 2013 21:44:19

Message: 11 of 22

On 1/19/2013 3:10 PM, dpb wrote:
...

> >> a={'154045001' '926665001' '615017' '1976936151' '801700'
> '4506702001'}';
> >> D=dataset({a,'acct'});
> >> c=char(D.acct);
> >> D.newaccdt=cellstr([repmat('4412',6,1) c(:,end-8:end)])
...

Oooops....I forgot about the length being augmented to that of the longest.

Need to right-justify, then pick the last 9 char's and then concatenate...

 >> c=strjust(char(D.acct));
 >> D.newaccdt=cellstr( ...
    [repmat('4412',6,1) strjust(c(:,end-8:end),'left')])
D =
     acct            newaccdt
     '154045001'     '4412154045001'
     '926665001'     '4412926665001'
     '615017'        '4412615017'
     '1976936151'    '4412976936151'
     '801700'        '4412801700'
     '4506702001'    '4412506702001'
 >>


OK, now we got the right last N (=9) characters.
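
To make each step visible, here is the same idea on a plain cell array, without the dataset. This is only a sketch: the variable names are illustrative, and size(a,1) stands in for the hard-coded 6.

```matlab
a = {'154045001'; '615017'; '1976936151'};

c = char(a);                  % 3x10 char matrix, short rows padded with trailing blanks
r = strjust(c, 'right');      % move the padding to the left of each row
last9 = r(:, end-8:end);      % the last 9 columns are now the last 9 characters

% Left-justify again so the blanks end up trailing, prepend the prefix,
% and let cellstr drop the trailing blanks.
new = cellstr([repmat('4412', size(a,1), 1) strjust(last9, 'left')]);
disp(new)
```

The order matters: taking end-8:end before right-justifying would cut digits off the long rows' neighbours instead of the padding.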

--

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 20 Jan, 2013 06:48:07

Message: 12 of 22

dpb <none@non.net> wrote in message <kdf43h$5pl$1@speranza.aioe.org>...
> On 1/19/2013 3:10 PM, dpb wrote:
> ...
>
> > >> a={'154045001' '926665001' '615017' '1976936151' '801700'
> > '4506702001'}';
> > >> D=dataset({a,'acct'});
> > >> c=char(D.acct);
> > >> D.newaccdt=cellstr([repmat('4412',6,1) c(:,end-8:end)])
> ...
>
> Oooops....I forgot about the length being augmented to that of the longest.
>
> Need to right-justify, then pick the last 9 char's and then concatenate...
>
> >> c=strjust(char(D.acct));
> >> D.newaccdt=cellstr( ...
> [repmat('4412',6,1) strjust(c(:,end-8:end),'left')])
> D =
> acct newaccdt
> '154045001' '4412154045001'
> '926665001' '4412926665001'
> '615017' '4412615017'
> '1976936151' '4412976936151'
> '801700' '4412801700'
> '4506702001' '4412506702001'
> >>
>
>
> OK, now we got the right last N (=9) characters.
>
> --

I feel bad. I knew this was a challenging problem, but I'm impressed. The solution you came up with significantly reduces the runtime. One thing I should mention is that I have had to copy this section of code 12 times, once for each month of the year. I call the dataset using datenum for each month ('001' for October all the way to '012' for September), and I append that number onto the end of the number that your code made above.

So you were able to accomplish the task in far fewer lines of code, which is very impressive. Who knew it could be so simple? Too bad there isn't a built-in function in MATLAB where you can just read strings right to left. That would make things much easier. Again, I appreciate the dedication to the task. I only recently started using datasets because the data I have is so large that cell arrays and structures become difficult to manage. The dataset is a perfect medium.

Kevin

Subject: Help using textscan or sscanf

From: dpb

Date: 20 Jan, 2013 14:39:52

Message: 13 of 22

On 1/20/2013 12:48 AM, Kevin Ellis wrote:
...

> I feel bad. I knew this was a challenging problem, but I'm impressed.
> The solution you came up with significantly reduces the runtime. One
> thing I should mention is that I have had to copy the section of code 12
> times for each month of the year. I call the dataset using datenum for
> each month ('001' for October all the way to '012' for September).I
> append that number onto the end of the number that your code made above.

Why would you replicate the code 12 times???? Wrap it in a function and
pass the month or somesuch...

> So you were able to accomplish the task in much fewer lines of code
> which is very impressive. Who knew it could be so simple. Too bad there
> isn't a built-in function in MatLab where you can just read strings
> right to left. That would make things much easier....

Well, again, if you want that function, write one that does it. A very
high fraction of Matlab functionality is just that: m- or p-files that
add the higher-level abstraction to the base language. Using the same
ideas for user code simplifies development in a specific application domain.
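
A rough sketch of what such a wrapper might look like, written as anonymous functions so it runs as a plain script. The names `lastN` and `makeDocNo`, and the choice to pass the month as an integer 1..12, are assumptions for illustration; the '4412' prefix and 3-digit month suffix follow the description above.

```matlab
% Last n characters of s, or all of s if it is shorter than n.
lastN = @(s, n) s(max(1, numel(s)-n+1):end);

% Build document numbers: prefix + last 9 digits + zero-padded month code.
makeDocNo = @(acct, monthCode) cellfun( ...
    @(s) ['4412' lastN(s, 9) sprintf('%03d', monthCode)], ...
    acct, 'UniformOutput', false);

acct = {'154045001'; '615017'; '1976936151'};

% One call per month replaces twelve pasted copies of the same code.
docNos = cell(12, 1);
for m = 1:12
    docNos{m} = makeDocNo(acct, m);
end
disp(docNos{1})
```

In a larger project the same logic would more naturally live in its own function file.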

--

Subject: Help using textscan or sscanf

From: james bejon

Date: 20 Jan, 2013 15:58:08

Message: 14 of 22

% Apologies if this has already been solved elsewhere--haven't had time to read through the thread, but...
S = {'154045001', '926665001', '615017', '1976936151', '801700', '4506702001'};
regexprep(S, '^.*?([0-9]{1,9})$', '4412$1')
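
To see what the pattern captures, the token can be pulled out with regexp before doing the replacement. This is a sketch only; `tok` and `out` are illustrative names and '4412' is the prefix asked for in the first post.

```matlab
S = {'154045001', '615017', '1976936151'};
pat = '^.*?([0-9]{1,9})$';

% The lazy '.*?' swallows as few leading characters as possible, so the
% group '([0-9]{1,9})' ends up holding the last 9 digits (or all of them
% when there are fewer than 9).
tok = regexp(S, pat, 'tokens', 'once');

% '$1' in the replacement refers to that captured group.
out = regexprep(S, pat, '4412$1');
disp(out)
```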

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 20 Jan, 2013 23:44:06

Message: 15 of 22

"james bejon" wrote in message <kdh46g$p9$1@newscl01ah.mathworks.com>...
> % Apologies if this has already been solved elsewhere--haven't had time to read through the thread, but...
> S = {'154045001', '926665001', '615017', '1976936151', '801700', '4506702001'};
> regexprep(S, '^.*?([0-9]{1,9})$', '4412$1')

James,

Your solution is perfect for what I need to do with the dataset arrays I have. Using regexprep in this way is very innovative in my mind. I'm pretty impressed. However, I was hoping to get your help once more. To make things more complicated now, is there a way to use regexprep to take as input the following account number and return the following result?

S = '09837381019'

regexprep(S, No idea what to put here, '4412$1')

4412373810

Again, I need to pull out the digits '373810' from the account number and prepend '4412'. I have never been able to use regexprep or regexpi to remove specific digits, but it seems like you may know how to do that. Any help would be much appreciated.

Kevin

Subject: Help using textscan or sscanf

From: james bejon

Date: 21 Jan, 2013 00:19:08

Message: 16 of 22

Shouldn't be too hard. What's the general idea here though? Is it "Whenever you can find 373810, then do such-and-such a thing"? Or whenever the string's a certain length, or what?

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 21 Jan, 2013 01:42:08

Message: 17 of 22

"james bejon" wrote in message <kdi1hs$5ev$1@newscl01ah.mathworks.com>...
> Shouldn't be too hard. What's the general idea here though? Is it "Whenever you can find 373810, then do such-and-such a thing"? Or whenever the string's a certain length, or what?

I have 411,169 transactions across 7 database extracts. Really all I am trying to do is match transactions between databases using a document number that I define myself. The problem is that some transactions always match and some don't, so I am constantly needing to refine the process. Right now I create a 23-digit "document number" to match transactions and then use the data to forecast energy usage. The document number needs to be that specific because I am using the "join" function for datasets on a key variable to accomplish the task. If the number is not "unique" enough, then I get multiple transactions being matched (which I see a lot), and if I use this data I will overestimate energy usage. So, in answer to your questions, the account number is always different (anywhere from 6 to 16 digits), so I would like a process that uses a certain number of digits to create the matching document number. Hope this answers your questions.

Kevin

Subject: Help using textscan or sscanf

From: james bejon

Date: 21 Jan, 2013 19:09:09

Message: 18 of 22

Sorry, I meant more specifically:

What's the rule for getting '09837381019' to '4412373810', e.g., "If there are 11 digits, hack off the first 3 and the last 2 and stick a 4412 on the front"?

Subject: Help using textscan or sscanf

From: Kevin Ellis

Date: 22 Jan, 2013 17:11:09

Message: 19 of 22

"james bejon" wrote in message <kdk3ol$ngp$1@newscl01ah.mathworks.com>...
> Sorry, I meant more specifically:
>
> What's the rule for getting '09837381019' to '4412373810', e.g., "If there are 11 digits, hack off the first 3 and the last 2 and stick a 4412 on the front"?

Yes, that is exactly right. I was hoping you could show me that, so that I could figure out how you wrote your code and change it when I need to. Essentially, I want to be able to use regexprep so that I can pull out a certain number of digits anywhere in the account number and prepend '4412'. So for right now, remove the first 3 and the last 2 and stick a '4412' on the front. Thanks.

Subject: Help using textscan or sscanf

From: james bejon

Date: 22 Jan, 2013 23:38:14

Message: 20 of 22

% What's the rule for getting '09837381019' to '4412373810', e.g., "If there are 11 digits, hack off the first 3 and the last 2 and stick a 4412 on the front"?

S = '09837381019';
regexprep(S, '^[0-9]{3}([0-9]{6})[0-9]{2}$', '4412$1')

% Seems like you have a lot of different cases and rules. If you list all of them, I'm sure we could come up with a regexp that will handle them all; otherwise, there's a risk that the regexps will mess each other up
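
One way to keep several rules from interfering, sketched here with only the two patterns seen so far in this thread: hold them in an ordered list and stop at the first pattern that matches. The rule order and the names `rules`, `S` and `out` are assumptions for illustration, not a complete rule set.

```matlab
% Each row is {pattern, replacement}, most specific rule first.
% Row 1: exactly 11 digits, drop the first 3 and the last 2.
% Row 2: anything else, keep the last 9 digits (or fewer).
rules = { ...
    '^[0-9]{3}([0-9]{6})[0-9]{2}$', '4412$1'; ...
    '^.*?([0-9]{1,9})$',            '4412$1'};

S = {'09837381019'; '615017'; '1976936151'};
out = S;
for k = 1:numel(S)
    for r = 1:size(rules, 1)
        if ~isempty(regexp(S{k}, rules{r,1}, 'once'))
            out{k} = regexprep(S{k}, rules{r,1}, rules{r,2});
            break   % first matching rule wins; later rules never see the result
        end
    end
end
disp(out)
```

Always applying the rules to the original string, never to an already rewritten one, is what stops them from messing each other up.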

Subject: Help using textscan or sscanf

From: Doug Schwarz

Date: 23 Jan, 2013 17:19:26

Message: 21 of 22

In article <kdn7t6$ci1$1@newscl01ah.mathworks.com>,
 "james bejon" <jamesbejon@yahoo.co.uk> wrote:

> % What's the rule for getting '09837381019' to '4412373810', e.g., "If there
> are 11 digits, hack off the first 3 and the last 2 and stick a 4412 on the
> front"?
>
> S = '09837381019';
> regexprep(S, '^[0-9]{3}([0-9]{6})[0-9]{2}$', '4412$1')
>
> % Seems like you have a lot of different cases and rules. If you list all of
> them, I'm sure we could come up with a regexp that will handle them all;
> otherwise, there's a risk that the regexps will mess each other up

Rather than '[0-9]', use '\d':

  regexprep(S, '^\d{3}(\d{6})\d{2}$', '4412$1')

--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

Subject: Help using textscan or sscanf

From: james bejon

Date: 23 Jan, 2013 21:02:09

Message: 22 of 22

Yes, that wd be better--thanks. (At home, I use Octave, which doesn't seem to accept "\d", so this was a bit of a quick fix)
