Code covered by the BSD License  

File Size: 1.94 KB File ID: #37344

Outlier removal v2

by Paul

 

29 Jun 2012 (Updated 10 Apr 2013)

Rosner's many outlier test, vectorized.

File Information
Description

An updated version of Jaco de Groot's outlier test (http://www.mathworks.com/matlabcentral/fileexchange/11106-outlier-removal). The code is now entirely vectorized and will probably run faster on large data sets.

There was also an issue in the earlier version where the pcrit and lambda were calculated incorrectly. This has been fixed. Here is his original description:

Grubbs' outlier test can be used to remove one outlier (see deleteoutliers.m by Brett Shoelson). If you decide to remove this outlier, you might be tempted to run Grubbs' test again to see if there is a second outlier in your data. However, if you do this, you cannot use the same rejection criteria. Rosner has extended Grubbs' method to detect several outliers in one dataset (Rosner, B., 1983. Percentage points for a generalized ESD many-outlier procedure. Technometrics 25, 165-172).

Rosner's many outliers test is implemented in this Matlab file. For good results, the dataset should be normally distributed after removal of the outliers (this can be tested for by "Pearson Chi Square Hypothesis Test" written by G. Levin). If the dataset is not normally distributed, usually the logarithm of the data will be.
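As a rough illustration of the procedure described above, here is a minimal loop-based sketch of the generalized ESD test. It is not the vectorized code from this submission; the variable names and the sample data are made up for the example, and the critical values use the standard textbook formula for lambda. It assumes tinv from the Statistics Toolbox is available.

```matlab
% Loop-based sketch of Rosner's generalized ESD test (illustrative only).
x     = [2.1 2.3 1.9 2.0 2.2 2.4 1.8 2.0 2.1 9.5];  % made-up data with one obvious outlier
k     = 3;       % assumed upper bound on the number of outliers
alpha = 0.05;    % significance level

n      = numel(x);
remain = 1:n;            % indices of the points still in the sample
R      = zeros(1, k);    % test statistics R_i
lambda = zeros(1, k);    % critical values lambda_i
cand   = zeros(1, k);    % index of the point removed at step i

for i = 1:k
    xi = x(remain);
    [dev, j]  = max(abs(xi - mean(xi)));   % most extreme remaining point
    R(i)      = dev / std(xi);
    cand(i)   = remain(j);
    remain(j) = [];                        % remove it before the next step

    % Critical value for step i (t quantile with n-i-1 degrees of freedom)
    p         = 1 - alpha / (2 * (n - i + 1));
    t         = tinv(p, n - i - 1);
    lambda(i) = (n - i) * t / sqrt((n - i - 1 + t^2) * (n - i + 1));
end

% The number of outliers is the largest i for which R_i exceeds lambda_i.
nOut = find(R > lambda, 1, 'last');
if isempty(nOut)
    nOut = 0;
end
outliers = cand(1:nOut);   % indices into x of the detected outliers
```

Note that the decision uses the largest i with R_i > lambda_i rather than stopping at the first non-significant step; this is what distinguishes the procedure from simply repeating Grubbs' test.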

I did not use the original reference by Rosner; instead I used a paper that is available for download as a PDF: Rong Chun Yu, Hee Wen Teh, Peter A. Jaques, Constantinos Sioutas, John R. Froines, "Quality control of semi-continuous mobility size-fractionated particle number concentration data", Atmospheric Environment 38 (2004) 3341-3348.

Acknowledgements

Outlier Removal inspired this file.

Required Products Statistics Toolbox
MATLAB release MATLAB 7.13 (R2011b)
Tags for This File
outlier, outlier removal, probability, rosner, statistics
Comments and Ratings (4)
04 Jul 2012 Paul

Pasco:

Thanks for the advice! Somehow when I tested it with a bunch of 0s and 1s the -NaN thing seemed to work, dunno why. I've changed it at least temporarily to be -Inf. I'll look into modifying how the mean/std is done at some point, since this is clearly a placeholder.

04 Jul 2012 Pasco Alquim

Paul,

I haven't studied your program closely enough to propose alternatives, but my comment concerns the fact that:

1- There is no such thing as a "-NaN". NaNs are not positive or negative. They are just ... NaN

2- yy(yy == -NaN) = 0; will never work because by definition all numbers are different from a NaN, including the NaN itself (just try (NaN == NaN)). But you can get what you want with

yy(isnan(yy)) = 0;
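The two points above can be checked directly with a small sketch; the matrix here is made up for illustration.

```matlab
% NaN never compares equal to anything, including itself.
yy = [1 NaN 3; NaN 5 6];          % illustrative data
nMatchesEq = nnz(yy == NaN);      % comparison with == finds no NaNs
nMatchesIs = nnz(isnan(yy));      % isnan finds both of them
negIsNaN   = isnan(-NaN);         % "-NaN" is just NaN

yy(isnan(yy)) = 0;                % the replacement that actually works
```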

Regarding the use of NaN instead of zeros, perhaps you can

1- count them and remove them temporarily.
2- compute the mean and standard deviation taking into account the true number of non-zero values (that is surely what nanmean does when it ignores the NaNs)
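One way to sketch this suggestion without any NaN or -Inf placeholder is to keep a logical mask of the valid entries and divide by the per-column counts. The example below assumes the values of interest sit in the upper triangle (including the diagonal) of a square matrix; the matrix and the variable names are hypothetical and not taken from the submitted file. Genuine zeros in the data are kept, and the matrix shape is preserved.

```matlab
% Column-wise mean and std over the upper triangle only, using a mask.
yy   = [4 0 7; 9 2 0; 5 1 3];        % illustrative matrix; note the genuine zeros
mask = triu(true(size(yy)));         % true on and above the diagonal

yu        = yy;
yu(~mask) = 0;                       % blank out the lower triangle
cnt       = sum(mask, 1);            % number of valid entries per column
mu        = sum(yu, 1) ./ cnt;       % mean over valid entries only

dev        = bsxfun(@minus, yy, mu); % deviations from the column means
dev(~mask) = 0;                      % ignore entries outside the mask
sd         = sqrt(sum(dev.^2, 1) ./ max(cnt - 1, 1));  % sample std; 0 for a single entry
```

bsxfun is used instead of implicit expansion so the sketch also runs on older releases such as R2011b.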

03 Jul 2012 Paul

Pasco: Do you have an alternative way to do that? I'd be very interested to know, because I couldn't find anything good that still vectorizes the code.

The key is that I make an upper-triangular matrix there, and I need the mean and standard deviation of the upper triangular half. However, if I leave zeros in the bottom half, they are included in the mean. So the only way I saw to do it is to use nanmean and nanstd and set those zeros to NaN instead of 0.

The problem, of course, is that on the rare occasions where you have zeros in your original matrix, those will end up being set to NaN during the triangular matrix step.

Using logical indexing on the yy matrix doesn't preserve the matrix shape, and thus cannot be used. This seems like the fastest way, though I will admit that if you have -NaN in your data for some reason, it will cause problems (but honestly, wouldn't it cause problems anyway?)

03 Jul 2012 Pasco Alquim

It is a bit difficult to trust a file that does things like:

ys(ys == 0) = -NaN;

or

yy(yy == -NaN) = 0;

Updates
02 Jul 2012

Fixed a bug that occurred when k = 1.

05 Jul 2012

Changed -NaN code to -Inf. Will eventually work out a more reasonable solution.

10 Apr 2013

The original Jaco de Groot script was not listed in the acknowledgement box originally.
