Hi Niall,
Winsorization is a method of robust statistics. Given a set of samples
S = {s1,s2,...,sN}
sorted in ascending order, Winsorization consists of replacing the k lowest and the k highest samples (putative outliers) with their nearest neighbors s(k+1) and s(N-k), respectively.
Note that this is very different from the usual
trimming procedure, which would just discard the 2*k extreme values at both ends of the sequence. Instead of rejecting data in a single step, Winsorization tries to see what would happen if the supposed outliers would have values inside the supposedly valid range of values. It is a more refined way to reject data, and is particularly well suited for its implementation as an iterative procedure in the context of pixel rejection.
In ImageIntegration, Winsorization is used to compute robust estimates of the mean and standard deviation of a pixel stack. These are called the Winsorized mean and Winsorized sigma, respectively. The basic process has been described by Huber (Huber, Peter J. and Ronchetti, E.,
Robust Statistics, 2nd Ed., Wiley 2009) and can be summarized as follows:
1. We begin with the median and standard deviation of the ordered set of pixel values. Note that I use the median instead of the mean as the initial estimate of the central value. This is different from customary practice but improves robustness in my opinion. Call these values m and s.
2. Calculate:
t0 = m - c*s
t1 = m + c*s
In my implementation, I have set c=1.5 following Huber's recommendations.
3. Replace all pixel values < t0 with t0, and all pixel values > t1 with t1. This forms a Winsorized set.
4. Compute the mean and standard deviation of the Winsorized set of pixel values. Call them m1 and s1, respectively.
5. Set s2 = 1.134*s1. The constant 1.134 assumes a normal distribution of values in the data set.
6. If
|s2 - s|/s > q
then
Set m = m1
Set s = s2
Go to step 2
Otherwise, stop iteration. Use m=m1 and s=s2 as robust estimates of the central value and dispersion of the pixel stack, respectively.
I use q=0.0005 as the convergence limit for iteration. This value is probably too small, but the process converges quickly so it doesn't affect performance.
With the values of m and s resulting from the above iterative procedure, we implement a normal sigma-clipping procedure. This is Winsorized sigma clipping rejection as I have implemented it in ImageIntegration.
Hope this will help to clarify concepts
