Regression analysis in Matlab

Aseri T
Aseri T on 28 Dec 2013
Edited: dpb on 1 Jan 2014
How can I fit a model to predict a response variable (y) from a set of regressor variables (i.e., x1, x2, x3, x4, x5, x6)? The model may or may not be linear. A sample of the simulation data:
x1=[263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2=[323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3=[10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4=[0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5=[0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6=[0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y=[257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];
Please advise.
T. Aseri
  1 Comment
dpb
dpb on 28 Dec 2013
If you have the Statistics Toolbox, see
doc regress
Without it, see
doc slash % NB: the backslash operator '\'


Answers (2)

dpb
dpb on 28 Dec 2013
Edited: dpb on 29 Dec 2013
Now having Matlab open and convenient, to amplify on the above...
Stat Toolbox ...
>> b1=regress(y',[x1' x2' x3' x4' x5' x6'])'
b1 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
Base Matlab backslash operator...
>> b2=[[x1' x2' x3' x4' x5' x6']\y']'
b2 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
>>
Remarkable similarity, wot? :)
Now, as you might expect, the Toolbox solution has some more interesting outputs...
>> [b,bint,r]=regress(y',[x1' x2' x3' x4' x5' x6']);
>> [b bint]
ans =
1.0102 0.9968 1.0237
-0.0005 -0.0085 0.0075
-0.0090 -0.0404 0.0223
-6.8343 -9.6170 -4.0516
-0.2722 -1.2170 0.6726
-13.6140 -37.7253 10.4972
>> sqrt(sum(r.*r)/length(r))
ans =
0.6206
>> [b,bint,r]=regress(y',[x1' x2' x4']);
>> [b bint]
ans =
1.0095 0.9980 1.0210
-0.0024 -0.0091 0.0043
-7.2257 -9.7197 -4.7316
>> sqrt(sum(r.*r)/length(r))
ans =
0.6663
>> [b,bint,r]=regress(y',[x1' x4']);
>> sqrt(sum(r.*r)/length(r))
ans =
0.6786
>>
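Looking at the intervals on the estimated coefficients, only x1 and x4 have intervals that exclude zero; the intervals for x2, x3, x5 and x6 all straddle zero, so those terms are not significant here. A much more parsimonious model is therefore possible w/ essentially the same residual error as blindly including all six (RMS residual 0.6786 with x1 and x4 alone vs. 0.6206 with all six).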
Your mission, should you choose to accept it, is to complete the analysis and judiciously choose the overall best model. I have not considered or looked at any interaction terms you'll note.
ADDENDUM:
Oversight--the above doesn't include the intercept term. Write the model as
b1=regress(y',[ones(size(x1')) x1' x2' x3' x4' x5' x6'])'
or similarly to include it.
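For anyone without the toolbox, the same comparison with the intercept included can be sketched in base Matlab using the backslash operator. This is only a sketch on the 19 sample points from the question; the variable names (Xfull, Xred, sFull, sRed) are mine, and dividing the SSE by the residual degrees of freedom instead of by length(r) is my choice so that the larger model isn't favored merely for having more terms.

```matlab
% Sample data from the question (19 observations).
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2 = [323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3 = [10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5 = [0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6 = [0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];

n = numel(y);
Xfull = [ones(n,1) x1' x2' x3' x4' x5' x6'];   % intercept + all six regressors
Xred  = [ones(n,1) x1' x4'];                   % intercept + x1 and x4 only

% Least-squares coefficients; the first element is the intercept.
bFull = Xfull \ y';
bRed  = Xred  \ y';

% Residuals of each fit.
rFull = y' - Xfull*bFull;
rRed  = y' - Xred*bRed;

% Residual standard error: SSE divided by (n - number of coefficients).
sFull = sqrt(sum(rFull.^2) / (n - size(Xfull,2)));
sRed  = sqrt(sum(rRed.^2)  / (n - size(Xred,2)));

fprintf('full model:    %d coefficients, s = %.4f\n', size(Xfull,2), sFull);
fprintf('reduced model: %d coefficients, s = %.4f\n', size(Xred,2),  sRed);
```

An interaction term could be tried the same way by appending a column such as (x1.*x4)' to the design matrix; whether it earns its place is part of the analysis left to you.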
  4 Comments
Aseri T
Aseri T on 31 Dec 2013
Yes, I do have the Statistics Toolbox and I am working with it. I first need to learn it; then I will be able to choose the best-fitting model with the fewest regressors by performing all the needed tests. Thank you for your precious support. I'll be in touch with you.
Aseri T
Aseri T on 31 Dec 2013
Here is the problem, I've entered all data in column format with equal no. of rows (6696):



dpb
dpb on 1 Jan 2014
Edited: dpb on 1 Jan 2014
NB: you created a Matlab dataset object, Datas, so you must reference the values with dot notation to get at the individual variables. (BTW, altho it doesn't matter to Matlab what a variable name is, "data" is the plural from the Latin and the singular is a "datum" point; common US English use has corrupted this terribly.)
Use
Datas.Properties.VarNames
to see the variable names in the Datas object; then you get the actual data by using
Datas.VarName
where "VarName" is the name for the particular variable. Assuming the Excel sheet has headings of the names you've used above, something like
X=[ones(length(Datas),1) Datas.Ta Datas.Tabs ... Datas.eabs];
would appear to be correct. If there are no headers, then the default variable name 'Var1' would have been assigned and it will hold the whole array, which is somewhat simpler to reference:
b=regress(Datas.Var1(:,7), [ones(length(Datas),1) Datas.Var1(:,1:6)]);
Again, note that you must specify the constant term in the model explicitly with regress.
Since you say you have the Statistics Toolbox, I recommend switching to regstats to get the additional statistics you'll want/need to evaluate the quality of the model directly.
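As a rough illustration of what regstats returns, here is a minimal sketch for the two-variable model using x1, x4 and y from the sample data in the question. It needs the Statistics Toolbox; the particular list of requested statistics is just my pick. Unlike regress, the 'linear' model in regstats adds the constant term itself, so no column of ones is supplied.

```matlab
% Requires the Statistics Toolbox. Data: x1, x4 and y from the question.
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293]';
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78]';
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12]';

% No column of ones here: the 'linear' model includes the constant term.
stats = regstats(y, [x1 x4], 'linear', ...
                 {'beta','tstat','rsquare','adjrsquare','mse'});

% One row per coefficient (constant first): estimate, std. error, p-value.
disp([stats.beta stats.tstat.se stats.tstat.pval]);

fprintf('R^2 = %.4f, adjusted R^2 = %.4f, MSE = %.4f\n', ...
        stats.rsquare, stats.adjrsquare, stats.mse);
```

The p-values and adjusted R^2 give a more direct basis for comparing candidate models than eyeballing the confidence intervals.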
See
doc dataset % and related for details on using the dataset object
Alternatively, of course, you could use one of the other methods of reading in the file ( xlsread comes to mind) and return the data into a base Matlab array which would obviate all the dataset stuff which may not be of much real use for your present purposes.
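The xlsread route might look something like the sketch below. The file name simdata.xlsx is hypothetical, and the layout (six regressors in columns 1 to 6, response in column 7) is an assumption carried over from the Var1 example above; adjust both to the actual workbook.

```matlab
fname = 'simdata.xlsx';              % hypothetical file name, replace with yours

if exist(fname, 'file') == 2
    num = xlsread(fname);            % numeric data only; header text is not returned
    % Assumed layout: regressors in columns 1..6, response in column 7.
    X = [ones(size(num,1),1) num(:,1:6)];   % explicit constant term
    y = num(:,7);
    b = X \ y;                       % base Matlab least squares
    disp(b');
else
    fprintf('%s not found; set fname to the actual workbook.\n', fname);
end
```

With the data in a plain array, the regress call shown earlier works unchanged as well.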
