Thread Subject:
Scaling of parameters for fminunc?

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 4 Jul, 2012 13:52:08

Message: 1 of 27

Hello everyone,

I'm still trying to make the best use of fminunc.
The solution I'm looking for is something like:
x(1:10)=~10;
x (11:20)=~30;
x(21:30) = ~1e4;
x(31:40) = 1e-2;
For some reason if I don't do anything about it and I use fminunc as is I get in output what all sign points to be the correct solution.
I think that I should use fminunc with all parameters of the same order of magnitude. Or, even better, with all parameters O(1)
First question, is my supposition correct? Is there anything wrong in having parameters that span 10^6 orders of magnitude for fminunc?

Second question: I can define newX as a scaled version of my parameters i.e.: newX = x./scale. Does this help? In which way? Faster convergence? Higher precision (I doubt this second one)?
Now... If I do this I pass this "scale" to my objective function calculator and in the first line I do oldX = newX .* scale and then I go on computing everything in respect to oldX.
Then I compute my gradient with respect to oldX and then I do grad = grad ./scale.
It looks so dumb that I think I couldn't have got it wrong!


Nonetheless if I use this "scale" and I compute the gradient in this way... The minimizer stops after 3 iterations and looks like he didn't almost move any parameter from the inizialization values.
I would expect that, at worst, I would get the same results of the case without scaling.

Can anybody give me any suggestion?

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 4 Jul, 2012 18:18:14

Message: 2 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jt1hq8$23$1@newscl01ah.mathworks.com>...
>
> First question, is my supposition correct? Is there anything wrong in having parameters that span 10^6 orders of magnitude for fminunc?
==============

It depends on how the parameters interact in the objective function. If you're optimizing sum(x), then adding large numbers to small numbers makes the result more vulnerable to precision problems. If you were using a low-precision data type (e.g. singles) then 6 orders of magnitude difference could mean the small numbers have no influence at all.

Scaling isn't always the best way to handle this, though. Sometimes translation is better. Usually, when people try to improve the scaling of their problem, it's not the relative magnitude of the optimal x that they worry about. It's the relative magnitude of the second derivatives. Generally, you want the condition numbers of the Hessian at and near the optimal solution to be close to 1 and the scaling of the parameters will affect this. Since you're now having problems post-scaling, it is possible that you made the conditioning of your Hessian worse than before.

Bear in mind also that FMINUNC tries to correct for bad scaling by using its own estimate of the Hessian. You would have to have a very large scale difference for it not to be robust against this. 6 orders of magnitude doesn't seem very large, at least not if you're using double precision.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 5 Jul, 2012 09:11:06

Message: 3 of 27

"Matt J" wrote in message <jt21d6$ls8$1@newscl01ah.mathworks.com>...

> It depends on how the parameters interact in the objective function. If you're optimizing sum(x), then adding large numbers to small numbers makes the result more vulnerable to precision problems. If you were using a low-precision data type (e.g. singles) then 6 orders of magnitude difference could mean the small numbers have no influence at all.
--------

I'm minimizing the sum of squared differences between a set of experimental data and a function describing them. This function is the sum of many gaussians + constant backgrounds.
So my ~10 are sigmas, my ~30 are peak locations, my ~10e4 are amplitudes etc....
Because of this I can't predict easily how a change in each parameter affects my objective function.
I mean... If I increase the amplitudes by 1 I can easilly predict that my sum of squared differences will increase by a factor 2* the number of gaussian* the integral of a gaussian with peak 1.
If I change my sigma by 1... How much will my objective function change? I probably know it, as I computed analytically the gradient, but it's not so immediate as in the case of a sum of all the parameters.
I would guess that the same relative changes in a parameter would give the same changes in the objective function.

======
> Scaling isn't always the best way to handle this, though. Sometimes translation is better. Usually, when people try to improve the scaling of their problem, it's not the relative magnitude of the optimal x that they worry about. It's the relative magnitude of the second derivatives. Generally, you want the condition numbers of the Hessian at and near the optimal solution to be close to 1 and the scaling of the parameters will affect this. Since you're now having problems post-scaling, it is possible that you made the conditioning of your Hessian worse than before.
------------------

Naively I would think that translation has no meaning in my specific problem. Relative errors should be what I'm looking for!
The fun thing that I notice is that if I analyze the gradient of my objective function at the last iteration I see that it is still clearly pointing in the correct direction.
But fminunc stops by claiming that the size of the step is smaller than TolX.

-------
> Bear in mind also that FMINUNC tries to correct for bad scaling by using its own estimate of the Hessian. You would have to have a very large scale difference for it not to be robust against this. 6 orders of magnitude doesn't seem very large, at least not if you're using double precision.
=====
Ok, I'll keep that in mind.

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 5 Jul, 2012 13:42:08

Message: 4 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jt3lna$h67$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <jt21d6$ls8$1@newscl01ah.mathworks.com>...
>
> > It depends on how the parameters interact in the objective function. If you're optimizing sum(x), then adding large numbers to small numbers makes the result more vulnerable to precision problems. If you were using a low-precision data type (e.g. singles) then 6 orders of magnitude difference could mean the small numbers have no influence at all.
> --------
>
> I'm minimizing the sum of squared differences between a set of experimental data and a function describing them. This function is the sum of many gaussians + constant backgrounds.
> So my ~10 are sigmas, my ~30 are peak locations, my ~10e4 are amplitudes etc....
> Because of this I can't predict easily how a change in each parameter affects my objective function.
============

If all the Gaussians are parameterized the same way, I would imagine that they are all well-scaled relative to each other. So, it seems fine. Because your problem has only 40 unknowns, though, it should be a simple matter to compute the Hessian and experiment with the effect of different scalings on its condition number.

 

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 5 Jul, 2012 14:25:07

Message: 5 of 27

"Matt J" wrote in message <jt45jg$e3k$1@newscl01ah.mathworks.com>...

----------
> If all the Gaussians are parameterized the same way, I would imagine that they are all well-scaled relative to each other. So, it seems fine. Because your problem has only 40 unknowns, though, it should be a simple matter to compute the Hessian and experiment with the effect of different scalings on its condition number.
>

Exactly, they are very similarly scaled. Not only, I added to the objective function some penalties on having all the related parameters (all the amplitudes, all the sigmas etc...) to behave "well". (i.e.: All the parameters of the nearest gaussians should be similar).

Anyway... My parameters are not 40, sadly. They are 4 (parameters for each gaussian+background) * (31*31) = 3844 (31*31 is the number of gaussians I have)

Subject: Scaling of parameters for fminunc?

From: Sargondjani

Date: 5 Jul, 2012 15:15:12

Message: 6 of 27

Sorry to interfere Luca, but i have a question to Matt:

"Generally, you want the condition numbers of the Hessian at and near the optimal solution to be close to 1 and the scaling of the parameters will affect this."

What are 'the condition numbers'? And how do I check if there are close to one?

And maybe you have a good reference (book) to look up such things? (optimization in general). Preferably a practical book that is not too hard to read for a non-mathematician (economist) :-)

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 5 Jul, 2012 15:55:25

Message: 7 of 27

"Sargondjani" wrote in message <jt4b20$7o8$1@newscl01ah.mathworks.com>...
> Sorry to interfere Luca, but i have a question to Matt:
>
> "Generally, you want the condition numbers of the Hessian at and near the optimal solution to be close to 1 and the scaling of the parameters will affect this."
>
> What are 'the condition numbers'? And how do I check if there are close to one?
==============

Matlab's COND command will compute the condition number of a given matrix.
The concept of the condition number is discussed a bit here

http://en.wikipedia.org/wiki/Condition_number

but not its relationship to optimization. Basically, the condition number affects speed of convergence in a known way for several classical algorithm types: gradient descent, conjugate gradient, etc...



 
> And maybe you have a good reference (book) to look up such things? (optimization in general). Preferably a practical book that is not too hard to read for a non-mathematician (economist) :-)
===========

A lot of people seem to like Nocedal and Wright

http://users.eecs.northwestern.edu/~nocedal/book/num-opt.html

I also like "Nonlinear Programming" by Bertsekas.

Subject: Scaling of parameters for fminunc?

From: Steven_Lord

Date: 5 Jul, 2012 17:58:58

Message: 8 of 27



"Matt J " <mattjacREMOVE@THISieee.spam> wrote in message
news:jt4ddd$h8p$1@newscl01ah.mathworks.com...
> "Sargondjani" wrote in message <jt4b20$7o8$1@newscl01ah.mathworks.com>...
>> Sorry to interfere Luca, but i have a question to Matt:
>>
>> "Generally, you want the condition numbers of the Hessian at and near the
>> optimal solution to be close to 1 and the scaling of the parameters will
>> affect this."
>>
>> What are 'the condition numbers'? And how do I check if there are close
>> to one?
> ==============
>
> Matlab's COND command will compute the condition number of a given matrix.
> The concept of the condition number is discussed a bit here
>
> http://en.wikipedia.org/wiki/Condition_number
>
> but not its relationship to optimization. Basically, the condition number
> affects speed of convergence in a known way for several classical
> algorithm types: gradient descent, conjugate gradient, etc...

I'd also recommend section 9 of the Linear Equations chapter of Cleve's
"Numerical Computing with MATLAB":

http://www.mathworks.com/moler/index.html

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 6 Jul, 2012 10:02:12

Message: 9 of 27

I've tought a lot yesterday afternoon and this morning and there's something I don't get.
How can fminunc be giving me much worse results when scaling my X vector?
I mean, it should work as well, at least! Or with very small differences. Otherwise the question becomes: "why is it working without scaling?"
And if I'm not sure about why my software is working with some hesoterical specific settings sooner or later I'll get into some troubles!

I mean... I have my objective function that is the sum of a SSD and a penalty term. I have checked both terms of my analytical gradients against central finite differences gradients in different points and they're compatible to a very high precision. (both when scaling and when not)

So I have reasons to think that all my calculations are correct, therefore I can exclude the most usual problem. :-) :-)

Now, how can my problem become ill conditioned?
I see that without scaling the step norm is like: 0.02, 0.01 0.1 0.001 and so on until the default TolX is reached.
When I scale my variables the step norm are (in order) 0.00001, 1 ,1 ,1,1,1,1, 1e-5 and then it stops (still far from the correct solution) saying that the next step size is less than TolX (it says that il would be 1e-9)

How can this be?
I think I'm missing something about how the next step is calculated and how this can be influenced badly by scaling my parameters.

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 6 Jul, 2012 13:52:07

Message: 10 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jt6d34$337$1@newscl01ah.mathworks.com>...
>
> Now, how can my problem become ill conditioned?
> I see that without scaling the step norm is like: 0.02, 0.01 0.1 0.001 and so on until the default TolX is reached.
> When I scale my variables the step norm are (in order) 0.00001, 1 ,1 ,1,1,1,1, 1e-5 and then it stops (still far from the correct solution) saying that the next step size is less than TolX (it says that il would be 1e-9)
>
> How can this be?
> I think I'm missing something about how the next step is calculated and how this can be influenced badly by scaling my parameters.
=================

Too little information to know, but here are some thoughts:

1. When you scale your variables in the objective function, do you scale your initial starting guess in the same way? If not, you could effectively be starting farther away from the solution, and getting stuck at a local min.

2. fminunc can output the gradient and Hessian at the final point. Did you analyze them to see if you are stopping at a local minimum? Did you run COND on the hessian to see how the conditioning has changed due to the scaling?

3. Scaling can cause fminunc to take smaller steps because now a small change in x(i) can produce a large change in your objective function - enough of a change for fminunc to think it's made significant progress in an iteration. If fminunc is taking smaller steps, it will be easier for it to trip the TolX stopping criterion.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 6 Jul, 2012 14:17:08

Message: 11 of 27

"Matt J" wrote in message <jt6qi7$lq8$1@newscl01ah.mathworks.com>...

> Too little information to know, but here are some thoughts:

I'm willing to give lots of information, but I don't know where to start from!
 If you can point me somewhere I will be glad.

> 1. When you scale your variables in the objective function, do you scale your initial starting guess in the same way? If not, you could effectively be starting farther away from the solution, and getting stuck at a local min.
--------

Yes, absolutely.
I compute the same initial guess (inParam), then I define my scale
Scale = ones(size(inParam));
Scale (1:100) = 10;
Scale (101:200) = 1e4;
Scale (201:300) = 1;

inParam = inParam./scale.
I pass scale to my objective function and its first line is:
X = X.*scale;
So my objective function is working on exactly the same values.
On top of this I see that the initial value at the first iteration are the same that before scaling.

------------
> 2. fminunc can output the gradient and Hessian at the final point. Did you analyze them to see if you are stopping at a local minimum? Did you run COND on the hessian to see how the conditioning has changed due to the scaling?
=========

I inspected the gradient. Without scaling I see a lot of "small" numbers randomly distributed around zero.
When I scale the gradient points strongly towards the direction of the correct solution!
I tried to get the Hessian in output but it takes forever. (It's a 3814x3814). Cond(hess) gives Inf when working with scaled parameters. I know that this is a problem but... I would like to understand why I get to this point!
I would like to break up the hessian in its different components but I'm not really sure about how to reshape it.

-------
> 3. Scaling can cause fminunc to take smaller steps because now a small change in x(i) can produce a large change in your objective function - enough of a change for fminunc to think it's made significant progress in an iteration. If fminunc is taking smaller steps, it will be easier for it to trip the TolX stopping criterion.
====
For sure. But you can look at it the other way round. Before scaling an increase of 1 on an amplitude parameter (average: 3e4) produced almost no change in the objective function, while a change of 1 on a sigma parameter (average: 10) produce a massive change in the objective function. Now a unit change on each parameter should produce the same change in the objective function, therefore TolX should intervene at the same level of precision on each parameter!

But I'm probably missing something about how the step norm is computed.

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 6 Jul, 2012 14:49:15

Message: 12 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jt6s14$rpk$1@newscl01ah.mathworks.com>...
>
> I inspected the gradient. Without scaling I see a lot of "small" numbers randomly distributed around zero.
> When I scale the gradient points strongly towards the direction of the correct solution!
> I tried to get the Hessian in output but it takes forever. (It's a 3814x3814). Cond(hess) gives Inf when working with scaled parameters. I know that this is a problem but... I would like to understand why I get to this point!
============

Looks like you scaled it the wrong way and made the conditioning worse as a result. When the gradient points in the direction of the correct solution, it means your problem is well-conditioned and cond(Hessian) is close to 1.


> Now a unit change on each parameter should produce the same change in the objective function,
===============

That seems doubtful considering that your Hessian is inf. What did you do to test this?


>therefore TolX should intervene at the same level of precision on each parameter!
==============

Not sure that matters, in any case. FMINUNC expects TolX to represent an insignificant step norm in the problem data supplied it. If step norms less than TolX are still producing significant changes in the objective function, then TolX is badly tuned.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 6 Jul, 2012 15:04:05

Message: 13 of 27

"Matt J" wrote in message <jt6ttb$62g$1@newscl01ah.mathworks.com>...
-----
> Looks like you scaled it the wrong way and made the conditioning worse as a result. When the gradient points in the direction of the correct solution, it means your problem is well-conditioned and cond(Hessian) is close to 1.
====

I know that!! But I would like to understand it, and I'm not!!
The gradient of my scaled parameters point toward the correct solution, so the minimizer is not using it correctly. But still my gradient is computed correctly.
I mean... I can't see why on earth the minimization of f(10^4*X) should differ from the minimization of f(X).

> > Now a unit change on each parameter should produce the same change in the objective function,
> ===============
>
> That seems doubtful considering that your Hessian is inf. What did you do to test this?

I've written that it should. I don't have any idea about how to test my hessian. I can't even make up my mind about how to reshape the hessian to be able to display it.

-------
> Not sure that matters, in any case. FMINUNC expects TolX to represent an insignificant step norm in the problem data supplied it. If step norms less than TolX are still producing significant changes in the objective function, then TolX is badly tuned.
====
You're right, but I'm not getting what the step norm is. Documentation is confusing. Is it the euclidean distance between the previous and the next point? Because if it is so the values of the step norm that are written at each iteration look extremely dubious!

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 6 Jul, 2012 15:30:16

Message: 14 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jt6up5$9hm$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <jt6ttb$62g$1@newscl01ah.mathworks.com>...
> -----
> > Looks like you scaled it the wrong way and made the conditioning worse as a result. When the gradient points in the direction of the correct solution, it means your problem is well-conditioned and cond(Hessian) is close to 1.
> ====

OK. I misread your post. So, in the scaled domain things are well-conditioned.
Well then the problem is TolX. You must figure out what a significant step magnitude in you scaled domain is and you must set TolX accordingly. Or, alternatively, now that you've scaled the x(i) relative to each other, you may need an additional global scaling c*X to ensure that a unit change in X produces approximately a unit change in f(X).


> I mean... I can't see why on earth the minimization of f(10^4*X) should differ from the minimization of f(X).
==============

Well, they're very different in terms of deciding when to stop iterating. For one thing, the gradient wet X of f(10^4*X) is quite a bit larger, so if the code is looking for a stopping threshold on the gradient magnitude, it will treat the two functions very differently. Similar ideas apply to TolX, which I think is the issue you're encountering.



> You're right, but I'm not getting what the step norm is. Documentation is confusing. Is it the euclidean distance between the previous and the next point?
===========

I think it's the max norm, not the Euclidean norm.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 9 Jul, 2012 09:37:07

Message: 15 of 27

"Matt J" wrote in message <jt70a8$g2b$1@newscl01ah.mathworks.com>...

------
> Well, they're very different in terms of deciding when to stop iterating. For one thing, the gradient wet X of f(10^4*X) is quite a bit larger, so if the code is looking for a stopping threshold on the gradient magnitude, it will treat the two functions very differently. Similar ideas apply to TolX, which I think is the issue you're encountering.
=====
I understant this, and I knew it, but I don't know how to deal with it.
I'm working on PET images and, without any specific reason, I'm reading my values in kBq/cc. Therefore all of my pixel values are 1e4. But I could switch and read it in MBq/cc and the pixel values would change to 10, or to SUV values (~1) or to mCi/pint (~1e-3... if it ever made sense ;-) ). Therefore the X dealing with intensities could vary by orders of magnitudes while in fact I would be dealing with exactly the same image and I should be able to describe it with the same level of accuracy!
My other parameters are image independent (tipically: position of something, units: "pixel units"). That's another reason that I had for scaling my parameters. So that the accuracy of my parametrization becomes independent of the particular units I use to measure the activity concentration.

Why if the gradient of the first N elements of X is 10e4 bigger convergence of all the parameters is better? That's really puzzling me!


> > You're right, but I'm not getting what the step norm is. Documentation is confusing. Is it the euclidean distance between the previous and the next point?
> ===========
>
> I think it's the max norm, not the Euclidean norm.

What is max norm? max(abs(X-Y)) ?
If it is so... I have to tell you that I find the step values extremely weird, even in the case where the minimization works perfectly!!!

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 9 Jul, 2012 13:51:06

Message: 16 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jte8o3$t4b$1@newscl01ah.mathworks.com>...
>
> Why if the gradient of the first N elements of X is 10e4 bigger convergence of all the parameters is better? That's really puzzling me!
==============

Here's the classic example. Try minimizing the following using steepest descent starting at x=1,y=1.


f(x,y)=x^2 + 10e4 y^2

It's something you can do by hand or write a very short MATLAB code to do for you. You will see that convergence is very slow. The main problem is that the negative gradient -g=[-2x, -20e4 y] never points very accurately toward the solution x=0, y=0 at any iteration.

However, if you the rescale the first variable as x=sqrt(10e4) z.^2, the problem becomes

f(z,y)=10e4(z^2 + y^2)

and the negative gradient -g=-20e4*[x,y] point directly at the solution from any x,y. Steepest descent will minimize this in 1 iteration.


 
> What is max norm? max(abs(X-Y)) ?
=========

yes.

> If it is so... I have to tell you that I find the step values extremely weird, even in the case where the minimization works perfectly!!!
=========

Well, I'm just guessing. You can check what the step values are manually by using OutputFcn to save all of the iterations.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 9 Jul, 2012 15:09:12

Message: 17 of 27

"Matt J" wrote in message <jtenka$p8j$1@newscl01ah.mathworks.com>...

> Here's the classic example. Try minimizing the following using steepest descent starting at x=1,y=1.
>
[cut]
> and the negative gradient -g=-20e4*[x,y] point directly at the solution from any x,y. Steepest descent will minimize this in 1 iteration.

And that's exactly what's making me mad!!
My original problem is of the kind 1e4*x^2 + y^2, and the minimizer finds the minimum correctly, even if it takes ~200 iterations.
Then I say: "this doesn't make any sense, it could be working much better. I'll scale the parameters avoid this situation".
And it stops working!


> Well, I'm just guessing. You can check what the step values are manually by using OutputFcn to save all of the iterations.

That I will do for sure!! Good idea.

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 9 Jul, 2012 15:24:20

Message: 18 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jtes6o$ffv$1@newscl01ah.mathworks.com>...
>
> And that's exactly what's making me mad!!
> My original problem is of the kind 1e4*x^2 + y^2, and the minimizer finds the minimum correctly, even if it takes ~200 iterations.
> Then I say: "this doesn't make any sense, it could be working much better. I'll scale the parameters avoid this situation".
> And it stops working!
==========

Well, we've already discussed why. Your TolX needs to be adjusted.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 9 Jul, 2012 16:23:06

Message: 19 of 27

"Matt J" wrote in message <jtet34$jdk$1@newscl01ah.mathworks.com>...

>
> Well, we've already discussed why. Your TolX needs to be adjusted.

Yes, but I don't understand how. (Taking into account that I don't think that "empirically" is a good choiche).
If my parameters assume values between 0.5 and 5 and I need for each parameter a precision of 0.1 how much should I set TolX to?
Now it's set to 1e-6 and it stops far away from convergence.
Should I decrease it? How much? 1e-9? 1e-12? Using which criteria?

In the meantime... I will do the X saving at each iteration to see what's the deal with the step size. Without scaling, with parameters ranging from 2 to 5*1e4 the biggest step size reported by fminunc is about 0.01. While each of the 3'000 parametes is changed much more than 0.01, at least during the first iterations!!

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 9 Jul, 2012 17:52:07

Message: 20 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jtf0ha$4bj$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <jtet34$jdk$1@newscl01ah.mathworks.com>...
>
> >
> > Well, we've already discussed why. Your TolX needs to be adjusted.
>
> Yes, but I don't understand how. (Taking into account that I don't think that "empirically" is a good choiche).
> If my parameters assume values between 0.5 and 5 and I need for each parameter a precision of 0.1 how much should I set TolX to?
============

If your desired precision is 0.1 then I would set TolX safely below that, e.g. 1e-4. However, this doesn't include the effect of the scaling you're doing. If 1e-4 is an insignificant change in x, then an insignificant change in x/scale is 1e-4/scale.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 10 Jul, 2012 09:40:13

Message: 21 of 27

"Matt J" wrote in message <jtf5o7$qof$1@newscl01ah.mathworks.com>...

------------
> If your desired precision is 0.1 then I would set TolX safely below that, e.g. 1e-4. However, this doesn't include the effect of the scaling you're doing. If 1e-4 is an insignificant change in x, then an insignificant change in x/scale is 1e-4/scale.
========

I would like my scaled values (all of them near 1) to have a precision of 0.1. but still TolX = 1e-6 is not enough!
Anyway I've opened a separate thread to understand the step-sizes.

Coming back to my scaling problem I've discovered what's happening. If I do not rescale the gradient on very large parameters (1e4) are much smaller (by a factor 1e4, quite reasonably) than those on "small" (~1) parameters. Therefore only small parameters are fitted, the "big" ones remains fixed to the initial guess!
If I scale my parameters all the terms of the gradient become of the same order of magnitued and therefore fminunc changes each parameter.
Apparently in this situation I get stuck in a local minimum.
Uncorrect scaling instead makes my fit more robust!
Very interesting indeed.
It won't be trivial to deal with this!! (Or should I just forget about it and hope everything keeps working?)

Subject: Scaling of parameters for fminunc?

From: Bruno Luong

Date: 10 Jul, 2012 10:39:12

Message: 22 of 27

See the general guideline to setup the optimization here:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/311760

When you scale the parameters, you affects the (1) non-linearity AND (2) the conditioning of the cost function. You can't overlook either aspect.

Bruno

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 10 Jul, 2012 11:24:18

Message: 23 of 27

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message
> When you scale the parameters, you affects the (1) non-linearity AND (2) the conditioning of the cost function. You can't overlook either aspect.

Thank you, very interesting.
I have to note that even without formal training on the topic I got right most of the points!
I have regularization, I spend lots of time in making a first guess which is "almost the solution", I compute the gradient (I took me a bit to get it correct :-p ) and so on.
The reason why the whole topic started is stated in this point in your old post:
"At east, the correct rescaling should be carried out on the unknown space so that the variation among unknowns must affect comparably on the cost function. Do not mix brutally parameters with different physical scales together."

The optimization worked correctly before scaling the parameters. But I knew that it wasn't correct to mix such different parameters, affecting much differently the cost function. Therefore I scaled my parameters hoping to improve the performance of the minimization. But it turned out the other way round! If I scale the parameters such that each unknown affects the cost function comparably... I get stuck in local minima!! (Worse to me than my initial guess!)
Without scaling some unknown are just untouched by the minimizator, they remain at the initial guess... And everything works better and local minima are avoided.

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 10 Jul, 2012 14:20:12

Message: 24 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jth3d2$5p8$1@newscl01ah.mathworks.com>...
>
>
> The optimization worked correctly before scaling the parameters. But I knew that it wasn't correct to mix such different parameters, affecting much differently the cost function. Therefore I scaled my parameters hoping to improve the performance of the minimization. But it turned out the other way round! If I scale the parameters such that each unknown affects the cost function comparably... I get stuck in local minima!! (Worse to me than my initial guess!)
> Without scaling some unknown are just untouched by the minimizator, they remain at the initial guess... And everything works better and local minima are avoided.
=============

There's something strange and inconsistent about what you've been saying. At the beginning of the paragraph, you say that the optimization worked correctly without scaling the parameters. Later, you say that without the scaling, some parameters get stuck at their initial values (but somehow that's okay). Surely, it can't be correct behavior that some parameters are untouched by the minimizer. The result can only be a solution if the initial guess for those parameters was already optimal.

In any case, all of this scaling you've been attempting is expected to make little improvement. As I said earlier, FMINUNC already does internally the very same scaling you're attempting now. That's what the Hessian computations are for.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 10 Jul, 2012 14:57:07

Message: 25 of 27

"Matt J" wrote in message <jthdms$hgv$1@newscl01ah.mathworks.com>...

>
> There's something strange and inconsistent about what you've been saying.

Well... It's not in what I'm saying, it's in what's happening! I don't know where to bang my head anymore!

-----
>At the beginning of the paragraph, you say that the optimization worked correctly without scaling the parameters. Later, you say that without the scaling, some parameters get stuck at their initial values (but somehow that's okay). Surely, it can't be correct behavior that some parameters are untouched by the minimizer. The result can only be a solution if the initial guess for those parameters was already optimal.
=======
To be precise I meant that: "the results that I get without scaling are what I have reasons to believe to be the correct solution" (I don't have a ground truth to compare to).

------
> In any case, all of this scaling you've been attempting is expected to make little improvement. As I said earlier, FMINUNC already does internally the very same scaling you're attempting now. That's what the Hessian computations are for.
====
And here my doubts come in. The behaviour of fminunc changes dramaticaly when I scale my parameters. For once the solution is for sure wrong, and also the minimizer stops in 3 or 4 iterations instead of ~200. Even with TolX set to be extremely small. Setting it to smaller values at best gives me and additional iteration with a ridiculus step-size.
Than you said that fminunc scales using the Hessian. But you cited the example of x^2+1e4y^2, that is difficult to minimize along the gradient direction. That should not happen if the Hessian scaled everything correctly, correct? (and what I'm doing with scaling is making the gradient of similar magnitudes in each direction)

Subject: Scaling of parameters for fminunc?

From: Matt J

Date: 10 Jul, 2012 15:29:07

Message: 26 of 27

"Luca " <l.presottoRE@MOVE.campus.unimib.NOTit> wrote in message <jthfs3$qoi$1@newscl01ah.mathworks.com>...
>
>
> >At the beginning of the paragraph, you say that the optimization worked correctly without scaling the parameters. Later, you say that without the scaling, some parameters get stuck at their initial values (but somehow that's okay). Surely, it can't be correct behavior that some parameters are untouched by the minimizer. The result can only be a solution if the initial guess for those parameters was already optimal.
> =======
> To be precise I meant that: "the results that I get without scaling are what I have reasons to believe to be the correct solution" (I don't have a ground truth to compare to).
============

If you believe you're getting the correct solution without scaling, why are you trying to fix it?

In any case, what is the basis of your belief that the solution is correct? What have you done to verify that the solution you get without scaling is indeed a local/global minimum? There are various things you can do to check, like plot the objective function through the final point along the direction of the gradient (and maybe a along few other directions as well like the axis of your "untouched parameters"). You could also run the algorithm on simulated data, where you do know ground truth.




> Than you said that fminunc scales using the Hessian. But you cited the example of x^2+1e4y^2, that is difficult to minimize along the gradient direction. That should not happen if the Hessian scaled everything correctly, correct? (and what I'm doing with scaling is making the gradient of similar magnitudes in each direction)
===========

Yes, but FMINUNC doesn't take steps in the direction of the gradient, so making the gradient magnitudes comparable in all directions doesn't necessarily do any good.
The direction that FMINUNC is trying to step in is the Newton direction:

 stepdir = -inv(Hessian)*gradient

This is superior to what steepest descent does. If you apply stepdir to the problem x^2+1e4y^2 you will see that it reaches the minimum in 1 iteration, as opposed to steepest descent which might take hundreds of iterations.

The reason you might be getting "dramatically different" behavior with scaling is partly because we're not sure how the scaling you're doing is playing with fminunc's stopping parameters like TolX (you say that it's really a local min, but I don't know what you've done to check this). It also might be affecting the finite difference computations used by fminunc, making them either better or worse. Probably worse, since you're getting poorer results... You could try making FMINUNC compute the Hessian analytically, both with and without scaling, to experiment with the effect.

Subject: Scaling of parameters for fminunc?

From: Luca

Date: 10 Jul, 2012 15:58:07

Message: 27 of 27

"Matt J" wrote in message <jthho3$5d1$1@newscl01ah.mathworks.com>...

-----
> If you believe you're getting the correct solution without scaling, why are you trying to fix it?
===

I know that "if it ain't broke, don't fix it". But still I tought that making my results independent of a parameter that has no meaning was a good thing. (Now I have a 1e4 because for no specific reason my matrix has values in a specific physical units of measurement. If I changed it I could be dealing with 1e-6 or whatever. Then why not scaling in a way that with every unit my algorithms works in the same way?)


-----
> In any case, what is the basis of your belief that the solution is correct? What have you done to verify that the solution you get without scaling is indeed a local/global minimum? There are various things you can do to check, like plot the objective function through the final point along the direction of the gradient (and maybe a along few other directions as well like the axis of your "untouched parameters"). You could also run the algorithm on simulated data, where you do know ground truth.
====
I'm writing a segmentation algorithm. The parameters I'm getting define the surface of an object I have in my image. If I superimpose the extraced surface on the original image I can distinguish, as far as eyes can see, at least between three conditions: "absolutely perfect", "looks correct with some minor problems", "blatantly wrong".
Without scaling I always get "absolutely perfect", as far as I can distinguish. With scaling I get: "blatantly wrong". My surface is totally mispositioned.

 
------------
> stepdir = -inv(Hessian)*gradient
>
> This is superior to what steepest descent does. If you apply stepdir to the problem x^2+1e4y^2 you will see that it reaches the minimum in 1 iteration, as opposed to steepest descent which might take hundreds of iterations.
>
> The reason you might be getting "dramatically different" behavior with scaling is partly because we're not sure how the scaling you're doing is playing with fminunc's stopping parameters like TolX (you say that it's really a local min, but I don't know what you've done to check this). It also might be affecting the finite difference computations used by fminunc, making them either better or worse. Probably worse, since you're getting poorer results... You could try making FMINUNC compute the Hessian analytically, both with and without scaling, to experiment with the effect.
==========
Ok, got it. I see that in the first iterations fminunc is calling my function once or twice per iteration. How can it approximate the hessian from so few function calls? (that's off topic, just out of curiosity)

To undestrand what happens to TolX (which, I stress, is at least 1e-7 lower than my required precision even on scaled parameters) I opened a topic about how is the step-size reported calculated. But I did not get any answer.

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
numerical sensitiv... Matt J 4 Jul, 2012 14:29:03
variable scaling Luca 4 Jul, 2012 09:54:17
fminunc Luca 4 Jul, 2012 09:54:17
rssFeed for this Thread

Contact us