# How to tell if two signals are similar

Asked by Juan P. Viera on 11 Jul 2012

I want to compare two periodic signals. Let's say I have one period of each signal, and I want to know how similar they are. The final goal is that if they are similar enough (for a given criterion) I can stop calculating some other stuff and save a lot of time. But that doesn't really matter; the question is the same: how to tell if two signals are similar.

I first considered the xcorr(x,y) function. The problem I see with cross-correlation is that it gives good information about how to shift one of the signals so that it best matches the other, but it doesn't say how similar they are (or maybe I don't understand it very well, haha). From what I understand and have tested, cross-correlation can be strongly affected by the signals' amplitude, for example:

f1 = 1000*sin(x)

f2 = sin(x)

c = xcorr(f2,f2) is the ideal case, where the correlation is perfect. c2 = xcorr(f1,f2) is the case I am evaluating, and the peak of c2 is bigger than the peak of c, suggesting better correlation? There is a normalization option in MATLAB which I don't really understand. (I do understand it for autocorrelation: with c = xcorr(f2,f2,'coeff') you get a peak correlation of 1 at zero lag, and that is OK. But for cross-correlation between two different functions, I don't understand how this normalization works.)

I don't care about the phase, since I will compare just one period of each signal, so I just need an algorithm to quantify how equal or similar these two signals are. But from what I see, the correlation can sometimes be misleading.

I may add that the signals are discrete, with the same number of elements, same sampling frequency and their frequency is the same, and known, if that helps.

Answer by Honglei Chen on 11 Jul 2012

Perhaps you are looking for a scalar quantity to represent the similarity between the two signals as they are? If that's the case, you basically only need zero lag. Try

```
xcorr(f1,f2,0,'coeff')
```

Answer by Juan P. Viera on 12 Jul 2012
Edited by Juan P. Viera on 12 Jul 2012

Yes, I am looking for a scalar quantity to represent the similarity between two signals, and yes, I only need zero lag. But I don't understand the result when using 'coeff'. How is the normalization done?

I ask this because I see that the final result I get is not sensitive to amplitude changes. For instance, to illustrate my doubt, run this:

```
x = 1 : 0.1 : 10;
f1 = sin(x);
f2 = 2*sin(x);
f3 = 5*sin(x);
figure(1); plot(x,f1,x,f2,x,f3);legend('f1','f2','f3');
[c11,lag11]=xcorr(f1,f1);%,'coeff');
[c12,lag12]=xcorr(f1,f2);%,'coeff');
[c13,lag13]=xcorr(f1,f3);%,'coeff');
figure(2); plot(lag11,c11,lag12,c12,lag13,c13);legend('c11','c12','c13');
[c11norm,lag11]=xcorr(f1,f1,'coeff');
[c12norm,lag12]=xcorr(f1,f2,'coeff');
[c13norm,lag13]=xcorr(f1,f3,'coeff');
figure(3);plot(lag11,c11norm,'o',lag12,c12norm,'x',lag13,c13norm,'+');
legend('c11norm','c12norm','c13norm');
```

If you don't normalize, then figure(2) shows that you get more correlation as your functions have more amplitude, which is expected if you look at the definition of correlation. If you then use 'coeff' to normalize (and I don't know what happens here), f2 and f3 correlate equally with f1, which we know should not be the case.

Conclusion: either I don't know how to use xcorr well, or this will not work for what I need.

Answer by Wayne King on 12 Jul 2012
Edited by Wayne King on 12 Jul 2012

The correlation coefficient is bounded by 1 in absolute value. (Normalized) correlation values always lie between -1 and 1. At either extreme, you are saying that the linear relationship between the two signals is perfect. In other words, one is perfectly linearly predictable from knowledge of the other.

If you use xcorr without the 'coeff' option, then you are obtaining the unnormalized cross-correlation (at zero lag, in your case). That value absolutely depends on the values of the two waveforms, just as covariance (not correlation) depends on the values of the quantities you are measuring. Look at the mathematical expression for the (unnormalized) cross-correlation at zero lag. You are multiplying the two vectors element by element and summing all the values. If you do that, then as you increase or decrease the amplitude, you increase or decrease the value at zero lag accordingly.
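For reference, the two zero-lag quantities being contrasted can be written out as follows. This is a sketch assuming real-valued signals x and y of equal length N; the symbols c_xy and ρ_xy are notation introduced here for illustration and do not come from the thread.

$$
c_{xy}(0) = \sum_{n=1}^{N} x[n]\,y[n],
\qquad
\rho_{xy}(0) = \frac{\sum_{n=1}^{N} x[n]\,y[n]}{\sqrt{\sum_{n=1}^{N} x[n]^2 \;\sum_{n=1}^{N} y[n]^2}}
$$

The 'coeff' option scales the result so that each signal's autocorrelation at zero lag equals 1, which corresponds to the second expression. If y = a·x with a > 0, the factor a appears once in the numerator and once in the denominator and cancels, so ρ_xy(0) = 1 for every such a.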

That is precisely why xcorr() with 'coeff' is easier to interpret for your case. You can immediately tell the strength of the linear relationship. It's the same situation with covariance and correlation in inferential statistics. If I tell you the covariance between two random variables is 10,000, then what does that mean? Does it mean the linear relationship is strong or not? The answer is you don't know based on that number alone. But if I tell you the correlation is 0.95, that tells you something about the strength of the linear relationship between the two.
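A small check of this, reusing f1 and f3 from the example posted earlier in the thread. It is only a sketch: the variable names cXcorr and cManual are made up for illustration, and xcorr requires the Signal Processing Toolbox.

```matlab
% Signals from the example earlier in the thread: f3 is f1 scaled by 5.
x  = 1 : 0.1 : 10;
f1 = sin(x);
f3 = 5*sin(x);

% Zero-lag normalized cross-correlation from xcorr (maxlag = 0).
cXcorr = xcorr(f1, f3, 0, 'coeff');

% The same quantity by hand: inner product divided by the product
% of the two Euclidean norms.
cManual = sum(f1 .* f3) / sqrt(sum(f1.^2) * sum(f3.^2));

fprintf('xcorr: %.6f   manual: %.6f\n', cXcorr, cManual);
```

Both values come out as 1 (up to rounding), because the scale factor 5 cancels in the normalization. The normalized value therefore measures the shape of the signals, not their amplitude.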

Juan P. Viera on 12 Jul 2012

All you say is shown in the example code that I posted. And yes, when I normalize I get the same correlation curve for all 3 functions because they are linearly related:

f3 = a*f1 = b*f2

That is exactly what I mean when I say that xcorr will not work for me to tell whether two signals are different or not, which is the main question I posted.

Any other suggestions? I was thinking of simply subtracting the two signals and evaluating the resulting "difference" vector, or something like that.
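A minimal sketch of that subtraction idea, assuming the two signals are already aligned and sampled on the same grid. The relative-norm measure relErr and the threshold tol are illustrative assumptions; the thread does not settle on a particular metric or tolerance.

```matlab
% Two signals with the same shape but different amplitude.
x  = 1 : 0.1 : 10;
f1 = sin(x);
f2 = 2*sin(x);

% Point-by-point difference, reduced to one scalar by taking its
% Euclidean norm relative to the norm of the reference signal f1.
d      = f1 - f2;
relErr = norm(d) / norm(f1);

% Hypothetical tolerance: treat the signals as "similar enough"
% when the relative difference is below 5 percent.
tol       = 0.05;
isSimilar = relErr < tol;

fprintf('relative difference: %.3f   similar: %d\n', relErr, isSimilar);
```

Here relErr is 1 (the difference is as large as f1 itself), so the signals are flagged as not similar, whereas the 'coeff' correlation would report a perfect match. Unlike the normalized correlation, this measure is sensitive to amplitude.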

Honglei Chen on 12 Jul 2012

What is the similarity you look for? It seems that in your definition, f1, f2, and f3 are not similar. If that's the case, then indeed you are not looking for the correlation between the two.

Answer by Joseph on 25 Jul 2012

Hi Juan, I've got the exact same problem. xcorr simply gets larger as the multiplier on the signal you are comparing gets larger, and if you use the 'coeff' option you get the same value for every multiplier. What I ended up doing is simply dividing one by the other. So: first do xcorr to find the time shift, and then divide to find the amplitude comparison. This makes sense for my situation, but mine's a bit different from yours, so maybe it won't work for you. But I thought I'd throw it out there.
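A rough sketch of this two-step idea (find the shift, then compare amplitudes), with two substitutions that are assumptions of this example and not part of the answer above. First, since each signal is exactly one period, the shift is found with a circular cross-correlation computed via fft in place of xcorr. Second, the amplitude ratio is estimated by a least-squares fit and not by element-wise division, which is ill-conditioned near zero crossings. The test signals and all variable names are made up for illustration.

```matlab
% One full period with n samples; f2 is f1 scaled by 3 and
% circularly delayed by 25 samples.
n  = 200;
t  = (0:n-1) / n;
f1 = sin(2*pi*t);
f2 = 3 * circshift(f1, [0 25]);

% Step 1: circular cross-correlation via the FFT. The index of the
% peak gives the delay of f2 relative to f1.
c        = real(ifft(fft(f2) .* conj(fft(f1))));
[~, idx] = max(c);
shift    = idx - 1;                      % zero-based lag in samples

% Undo the delay so that both signals are aligned.
f2aligned = circshift(f2, [0 -shift]);

% Step 2: least-squares gain g that minimizes norm(f2aligned - g*f1).
gain = (f1 * f2aligned.') / (f1 * f1.');

% Remaining mismatch once shift and gain are accounted for.
residual = norm(f2aligned - gain*f1) / norm(f2aligned);

fprintf('shift: %d samples   gain: %.3f   residual: %.2e\n', ...
        shift, gain, residual);
```

For these test signals the sketch recovers a shift of 25 samples and a gain of 3, with a residual near zero. A gain close to 1 together with a small residual would then indicate that the two signals match in both shape and amplitude.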