Author Topic: Image Integration - Question about Winsorized Sigma Clipping (Read 9211 times)

Niall Saunders · « **on:** 2009 December 29 16:26:08 »

Hi Juan,

In the Winsorized Sigma Clipping method, you do not provide a parameter to control the upper and lower percentile points for the actual Winsorization.

Does it in fact use the parameter values set in the Percentile sliders, even though they are 'greyed out' when Winsorized Sigma Clipping is selected? Or, have you fixed the clipping points at some arbitrary percentile point that you may already have established empirically?

Cheers,

Juan Conejero · « **Reply #1 on:** 2010 January 03 12:04:39 »

Quote

In the Winsorized Sigma Clipping method, you do not provide a parameter to control the upper and lower percentile points for the actual Winsorization.

For simplicity, the Winsorized sigma clipping algorithm uses the same clipping point parameters as regular sigma clipping, Sigma low and Sigma high, respectively for dark and bright pixels (w.r.t. the median value of the current pixel stack).

There's no "fixed" thing here; everything (relevant) is configurable.

Niall Saunders · « **Reply #2 on:** 2010 January 03 12:18:39 »

Hi Juan,

That is exactly what I assumed you would have done (I am beginning to be able to 'tune in' to your wavelength

)

However, you have an unfortunate 'glitch' in your code, in that you have 'greyed out' the Percentile Sliders when Winsorized Sigme Clipping has been selected. This leaves the user 'confused' as to whether they 'can' adjust the Percentile sliders.

I was fairly sure that the workaround would just have been a case of 'select Percentile Clipping, adjust the Percentile sliders, select Winsorized Sigma Clipping, adjust the Sigma sliders', but you have to admit that this is not 'intuitive'.

Should we assume that this is just one of many tasks that you will be adding to your "if only I had more time" pile

[You might want to consider modifying the pop-up tooltip text as well]

Cheers,

Juan Conejero · « **Reply #3 on:** 2010 January 03 12:40:09 »

Niall,

Quote

That is exactly what I assumed you would have done (I am beginning to be able to 'tune in' to your wavelength )

Be careful as my wavelengths are usually outside the visible/audible spectra 8)

Quote

you have 'greyed out' the Percentile Sliders when Winsorized Sigme Clipping has been selected. This leaves the user 'confused' as to whether they 'can' adjust the Percentile sliders.

I don't understand where's the problem here. The percentile sliders are grayed because Winsorized sigma clipping doesn't use those parameters. Why do you think this is wrong? Or am I missing something here... Perhaps the "problem" you see happens on Windows only (I don't have a Windoze machine at hand now, only Linux and everything seems to work well in this regard). Or if you feel I am way too slow this afternoon (which would be no surprise considering the amount of food/drink ingested by me), a screenshot could help...

Niall Saunders · « **Reply #4 on:** 2010 January 03 13:30:56 »

Quote

I don't understand where's the problem here

Sorry Juan,

In my learning process about Winsorization, I understood that there was a 'clipping point', beyond which the data was replaced with the value actually found AT the 'clipping point'

I therefore understood that the data would first be analysed with respect to the clipping points, the necessary data would be substituted, and ONLY then would the statistical analysis begin, resulting in a 'more robust' (first estimate) value for the Mean and Standard Deviation.

Then, given the first iteration values for Mean and SD, the Sigma parameter would be multiplied with the SD, and the data would then be more 'formally' clipped - with data outside the (Sigma x SD) range being excluded from the next iteration.

From here on, the iteration would continue, with no further consideration being given to the original 'Winsorization Clipping Points'

But, having now retuned to the ultra-narrowband gamma-ray frequencies of your mind, I think I now see what happens . . .

You calculate Mean and SD for the FULL data set, then apply the (Sigma x SD) clip range, and it is at THIS point that the Winsorization 'substitution' takes place for the first iteration. Which has the huge advantage that the number of data values in the dataset is NEVER reduced during further iterations.

I think that I ought not to 'delve any deeper' - your code is obviously exquisitely 'clever', and you might want to

delete this response

before the competition reads it !!

Cheers,

Juan Conejero · « **Reply #5 on:** 2010 January 03 16:08:25 »

Hi Niall,

Winsorization is a method of robust statistics. Given a set of samples

S = {s1,s2,...,sN}

sorted in ascending order, Winsorization consists of replacing the k lowest and the k highest samples (putative outliers) with their nearest neighbors s(k+1) and s(N-k), respectively.

Note that this is very different from the usual trimming procedure, which would just discard the 2*k extreme values at both ends of the sequence. Instead of rejecting data in a single step, Winsorization tries to see what would happen if the supposed outliers would have values inside the supposedly valid range of values. It is a more refined way to reject data, and is particularly well suited for its implementation as an iterative procedure in the context of pixel rejection.

In ImageIntegration, Winsorization is used to compute robust estimates of the mean and standard deviation of a pixel stack. These are called the Winsorized mean and Winsorized sigma, respectively. The basic process has been described by Huber (Huber, Peter J. and Ronchetti, E., Robust Statistics, 2nd Ed., Wiley 2009) and can be summarized as follows:

1. We begin with the median and standard deviation of the ordered set of pixel values. Note that I use the median instead of the mean as the initial estimate of the central value. This is different from customary practice but improves robustness in my opinion. Call these values m and s.

2. Calculate:

   t0 = m - c*s
   t1 = m + c*s

   In my implementation, I have set c=1.5 following Huber's recommendations.

3. Replace all pixel values < t0 with t0, and all pixel values > t1 with t1. This forms a Winsorized set.

4. Compute the mean and standard deviation of the Winsorized set of pixel values. Call them m1 and s1, respectively.

5. Set s2 = 1.134*s1. The constant 1.134 assumes a normal distribution of values in the data set.

6. If

   |s2 - s|/s > q

then

   Set m = m1
   Set s = s2
   Go to step 2

Otherwise, stop iteration. Use m=m1 and s=s2 as robust estimates of the central value and dispersion of the pixel stack, respectively.

I use q=0.0005 as the convergence limit for iteration. This value is probably too small, but the process converges quickly so it doesn't affect performance.

With the values of m and s resulting from the above iterative procedure, we implement a normal sigma-clipping procedure. This is Winsorized sigma clipping rejection as I have implemented it in ImageIntegration.

Hope this will help to clarify concepts

Niall Saunders · « **Reply #6 on:** 2010 January 03 16:44:59 »

Ahahhh, you had your 'mind-cloaking' facility still enabled

But, at last I understand your implementation (I think $:-\$ )

Fundamentally, you have three 'fixed constants':
c = 1.5
k = 1.134 (my choice of variable name, for the sake of clarity)
and
q = 0.0005

You also allow the user to control the following two 'variables':
SigmaHi
and
SigmaLo

But, I don't see any reference to the sigma variables in your pseudocode algorithm - unless the 'c' variable is, in fact, supposed to be the sigma variable, set individually for each end of the data range.

Is this the case?

Also, when you set k = 1.134 for 'normal' distribution - should I read 'normal' as 'Gaussian' distribution?
In which case, would there be an alternative k-value that would suit 'Poisson' distribution (perhaps more suitable for smaller datasets?), and could this be under user control?

Or should I give up on statistics (which I NEVER studied in school or university, having been brought up to understand that there are only three kinds of data, "Lies, Damn Lies, and Statistics"

) and stick to nothing more complicated than the 'AVERAGE' function

Cheers,

Juan Conejero · « **Reply #7 on:** 2010 January 04 07:33:07 »

Quote

Fundamentally, you have three 'fixed constants':
c = 1.5
k = 1.134 (my choice of variable name, for the sake of clarity)
and
q = 0.0005

Correct. These are fixed parameters that control the iterative Winsorization / robust estimation process.

Quote

You also allow the user to control the following two 'variables':
SigmaHi
and
SigmaLo

Yes, these are the two user-definable sigma clipping points, for low (< central value) and high (> central value) pixel values, respectively.

Quote

But, I don't see any reference to the sigma variables in your pseudocode algorithm - unless the 'c' variable is, in fact, supposed to be the sigma variable, set individually for each end of the data range.

Sigma clipping points (sigmaLo and sigmaHi) are applied in the sigma clipping part of the algorithm. c=1.5 is used for Winsorization only. Huber says that any value 1 <= c <= 2 is a good choice, and recommends c=1.5 as a good compromise. The c factor defines the limit, in sigma units, below and above which observed values (pixel values in our case) are to be replaced with pseudo-observations for robust estimation of parameters at each iteration.

The goal of this Huber/Winsorization procedure is to provide robust (=immune to outliers) estimates for the central value and the dispersion of values in the pixel stack. Pixel rejection is executed for each element of the stack of integrated images. Hence, each element is a stack of pixel values. The robust estimate of the central value of a stack is a value representative of the "true" pixel value at the stack coordinates. Similarly, the robust estimate of dispersion is representative of the "true" variation of pixel values in the stack. Both values are crucial to achieve an optimal rejection of outliers.

Quote

Also, when you set k = 1.134 for 'normal' distribution - should I read 'normal' as 'Gaussian' distribution?

Yes, normal and Gaussian are synonyms here.

k=1.134 compensates for the fact that we are replacing all values below and above 1.5*sigma at each iteration. This clearly leads to an underestimated value of standard deviation. The 1.134 factor comes from assuming a Gaussian distribution of pixel values in the stack and from the c=1.5 limit in sigma units. 0.134 is approximately the fraction of the area below the tails of a Gaussian function cut at +/- 1.5*sigma.

Quote

In which case, would there be an alternative k-value that would suit 'Poisson' distribution (perhaps more suitable for smaller datasets?), and could this be under user control?

Probably yes, but it wouldn't make too much sense. Bear in mind that this k refers to the distribution of pixel values in a stack of pixels being integrated (at the same coordinates); it has nothing to do with the distribution of the noise in the images. A Gaussian model doesn't seem too wrong here since it is a plausible model for variables that tend to concentrate around a central value, which is just what happens with stacked pixels. Anyway, a sigma clipping rejection should always be applied to relatively large data sets, say 10 images or more, as it requires (to be useful) that the standard deviation be a meaningful estimator of dispersion.

Niall Saunders · « **Reply #8 on:** 2010 January 04 07:59:12 »

Juan, thank-you

I think that (series of) explanation(s) is just what was needed.

Of course, most folks do NOT need to know what is happening 'behind the sliders' - but it IS useful to know the details when you are trying to extract the last possible useful ADU value out of a 'challenged' data-set (which is nearly always the case for me)

And that was the background behind this enquiry. A prospective PixInsight user was struggling to find a method to eliminate aircraft trails for a VERY SMALL set of VERY LONG subs - naturally, the initial recommendation would have been to tell him just to eliminate the 'bad' sub. However, that sub was ONLY BAD where the passing aircraft had left it's doo-doo behind. Presumably the remainder of the image would have been most welcome in helping to contribute to the remainder of the statistical analysis.

We are now waiting for a mutually convenient time for he and I to work with his data - using the excellent TeamViewer 'Remote Control' application - and I felt that I would be best able to help him if I really had as much knowledge of the ImageIntegration process as possible. (Again, I will get him to start his 30-day PixInsight trial accordingly)

I am really glad that you were able to take the time to explain the algorithm in such detail.

Cheers

Jack Harvey · « **Reply #9 on:** 2010 January 04 09:27:15 »

Niall Please report back on your experience with the removal of the airplane trails using the different rejection settings. I have been dealing with dead columns and some very severe vignetting on the 1.3 meter scope we occasionally get to use and have used Windsorized and Percentile Clipping. Both seem to do a good job and I am unable to tell if one is better than the other. Thanks and looking forward to your results.

Niall Saunders · « **Reply #10 on:** 2010 January 04 09:49:49 »

Hi Jack,

I can only see a possible advantage if the images that you acquire have been significantly 'dithered', such that a 'dead column' appears at random x-positions throughout the data set.

If you didn't 'dither' between lights, then the dead column will remain 'stationary' and will therefore become 'statistically significant' throughout the stack of aligned subs. None of the clipping algorithms have been designed to eliminate such 'significant' data.

The same argument, in my mind anyway, would seem to apply to the vignetting issue. The vignetting is spatially common to ALL subs, however dithering would tend to change the location of a vignetted pixel from image to image - once the images are star-aligned.

That said, I cannot see how an anomally as 'large in scale' as vignetting could be entirely eliminated by any form of data-clipping. Not, that is, unless your data samples were significantly large.

I have seen, in the past, a situation where a stack of planetary images of Mars - each with a nasty 'dust-donut' - were able to 'self-heal' during nothing more than a Median Combine. But this was because there was a MASSIVE number (>1,500) of subs, and there was NO GUIDING whatsoever (so the 'dithering' at f/20 on my 8" LX90 was huge

). I am sure if I re-ran THOSE images through a clipping process, the image would be better than I managed using just Registax (over three years ago now, and still my 'best planetary' to date).

Presumably you just don't have appropriate Flats/FlatDarks to otherwise calibrate out the vignetting?

And, as for a 'dead column', I would have been looking for a 'Gaussian Blur' tool to run down the offending column at the end of the exercise - perhaps one area where the 'painting' abilities of PhotoShop are still needed, pending the native ability of PixInsight to achieve the same (yes, I know, I will wash my mouth out with soap

)

HTH

Cheers,

This forum is closed since 5 March 2020

PixInsight Forum is now available at:

https://pixinsight.com/forum/

News:

Author Topic: Image Integration - Question about Winsorized Sigma Clipping (Read 9211 times)

Niall Saunders

Image Integration - Question about Winsorized Sigma Clipping

Juan Conejero

Re: Image Integration - Question about Winsorized Sigma Clipping

Niall Saunders

Re: Image Integration - Question about Winsorized Sigma Clipping

Juan Conejero

Re: Image Integration - Question about Winsorized Sigma Clipping

Niall Saunders

Re: Image Integration - Question about Winsorized Sigma Clipping

Juan Conejero

Re: Image Integration - Question about Winsorized Sigma Clipping

Niall Saunders

Re: Image Integration - Question about Winsorized Sigma Clipping

Juan Conejero

Re: Image Integration - Question about Winsorized Sigma Clipping

Niall Saunders

Re: Image Integration - Question about Winsorized Sigma Clipping

Jack Harvey

Re: Image Integration - Question about Winsorized Sigma Clipping

Niall Saunders

Re: Image Integration - Question about Winsorized Sigma Clipping