Author Topic: Confidence Clipping - a proposal  (Read 3006 times)

Offline spokeshave

  • Newcomer
  • Posts: 3
Confidence Clipping - a proposal
« on: 2016 October 06 15:44:34 »
Greetings. This is my first post to this forum, so if I have put it in the wrong subforum, please feel free to move it. I would like to offer a suggestion for a more effective pixel rejection routine than those currently offered. As a Health Physicist, part of my background is radiation detection and measurement. The challenges in that field are similar to those in astrophotography: we deal with Poisson statistics, we try to separate the desired signal from statistical and systematic noise, and so on. In particular, we deal heavily in counting statistics that establish detection thresholds. Much of our detection thresholding is derived from the work of Currie [1], which led to a critical-level determination approach based on statistical confidence intervals. This approach assumes that the background radiation is Poisson in nature, so the standard deviation (sigma) is simply the square root of the mean. The confidence interval is selected from a standard normal z-score table, and the resulting calculation establishes the threshold at which a count can be declared outside the background distribution with the confidence implied by the chosen z-score.

This approach, I believe, is directly applicable to pixel rejection, provided the pixel stack is appropriately normalized. Pixels that belong to the target distribution are analogous to the background radiation; outlier pixels (cold and hot pixels, satellite trails, etc.) are analogous to source radiation. I adapted the equation to our astrophotography needs as follows:

delta = z*sqrt(2*N*mean)/N

Where delta represents the confidence interval about the mean, N is the number of pixels in the stack, and z is the z-score for the desired confidence level. Of course, we also need the mean of the target distribution, and we don't have it. The mean of the pixel stack is a poor estimate because it includes the outliers. However, the median turns out to be a good estimator, particularly if the stack is large (>20), and it is largely unaffected by outliers as long as their number is small compared to the size of the target distribution. As an example of how this is applied, consider a 20-pixel stack with a median value of 500. I would like to select a confidence interval of 95%, meaning that if a pixel value falls within the interval, there is a 95% probability that it represents the target distribution. Of course, any confidence level can be selected; this is simply an example. For a two-tailed 95% confidence interval, the z-score is 1.96. Thus, delta is calculated as:

delta = 1.96*sqrt(2*20*500)/20 = 13.8

Therefore, the confidence interval for pixel rejection is 500 +/- 13.8. Pixel rejection is then a simple matter of rejecting pixels with values less than 486.2 or greater than 513.8. I worked through an example with a contrived dataset on Cloudy Nights and contrasted the result with sigma clipping and Winsorized sigma clipping here:

http://www.cloudynights.com/topic/552356-a-little-test-with-the-asi1600-and-sparse-dithering/#entry7472106
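
To make this concrete, here is a rough Python sketch of the rejection step (illustrative only, not PixInsight code; the function name and the use of SciPy to turn the confidence level into a z-score are just for this example):

import numpy as np
from scipy.stats import norm

def confidence_clip(stack, confidence=0.95):
    """Keep only the pixels of a normalized stack that fall inside the
    confidence interval about the stack median (Poisson assumption)."""
    stack = np.asarray(stack, dtype=float)
    n = stack.size
    z = norm.ppf(0.5 + confidence / 2.0)        # two-tailed z-score (1.96 for 95%)
    median = np.median(stack)                   # robust estimate of the target mean
    delta = z * np.sqrt(2.0 * n * median) / n   # half-width of the confidence interval
    keep = np.abs(stack - median) <= delta      # reject everything outside median +/- delta
    return stack[keep], median, delta

# Check against the worked example above: N = 20, median 500, 95% confidence
print(1.96 * np.sqrt(2 * 20 * 500) / 20)        # ~13.86 DN, the half-width used in the example

The sketch assumes the stack has already been normalized, as noted above; in an integration tool the same calculation would simply be repeated for each pixel position.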

I think you'll find that this extremely simple approach is robust, easily adjustable by selecting the desired confidence level, very fast, and that it automatically adjusts to the number of subs taken. I believe it will prove to be a better, yet simpler, approach than sigma clipping or Winsorized sigma clipping. It needs only one input from the user (the confidence level), does not require separate sigma-high and sigma-low settings, and would use trivial computational resources since it involves no iteration. I call it "confidence clipping".

My intention is to write a more thorough white paper on this approach and would very much like to test it on some images, though I lack the programming skills to do so. In the meantime, I offer this to you for free and open use and hopefully for incorporation into PI as an alternative pixel rejection technique.

I welcome any comments or questions you may have.

Tim

[1] Currie LA. Limits for qualitative detection and quantitative determination: application to radiochemistry. Analytical Chemistry 40(3):586-593; 1968.

Offline mschuster

  • PTeam Member
  • PixInsight Jedi
  • Posts: 1087
Re: Confidence Clipping - a proposal
« Reply #1 on: 2016 October 06 21:14:25 »
Tim,

I like your proposal.

Please consider incorporating detector gain into your formula. Example: median 500 DN, detector gain 0.5 e-/DN:

sqrt(500) = 22 DN

sqrt(0.5 * 500) / 0.5 = 32 DN

The second formula better estimates the noise standard deviation in DN: convert the signal to electrons, take the square root, and then convert back to DN.
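
In code form the conversion might look like this (just a sketch; the function and variable names are only for illustration):

import math

def poisson_sigma_dn(signal_dn, gain_e_per_dn):
    """Poisson noise standard deviation in DN: convert the signal to electrons,
    take the square root, then convert back to DN by dividing by the gain."""
    signal_e = signal_dn * gain_e_per_dn
    return math.sqrt(signal_e) / gain_e_per_dn

print(poisson_sigma_dn(500, 1.0))   # ~22 DN (gain of 1 e-/DN, i.e. the first formula)
print(poisson_sigma_dn(500, 0.5))   # ~32 DN with gain 0.5 e-/DN (the second formula)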

IMO detector gain as a parameter would improve your technique.

Thanks,
Mike

PS: Also, PI uses a noise estimator for various scaling and weighting purposes. However, the estimators suffer from target confusion, e.g., unresolved stars and galaxies compromise the estimate. On my Ha integrations of dense star fields, noise estimates regularly fail to scale with sqrt(N) for this reason, N being the number of subs.

Knowing a candidate signal level (e.g., the median), the detector gain, and the independent detector read noise, I think a better noise estimate could be found in a similar manner.
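
For example, a candidate estimate might look something like this (only a sketch of a simple shot-noise plus read-noise model, not anything that exists in PI; the read-noise figure is made up for illustration):

import math

def expected_sigma_dn(signal_dn, gain_e_per_dn, read_noise_e):
    """Expected per-pixel noise in DN from a simple model: shot noise on the
    signal (in electrons) plus independent read noise, converted back to DN."""
    signal_e = signal_dn * gain_e_per_dn
    sigma_e = math.sqrt(signal_e + read_noise_e ** 2)
    return sigma_e / gain_e_per_dn

# Example with made-up numbers: median 500 DN, gain 0.5 e-/DN, read noise 3 e-
print(expected_sigma_dn(500, 0.5, 3.0))   # ~32 DN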



Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • Posts: 7111
    • http://pixinsight.com/
Re: Confidence Clipping - a proposal
« Reply #2 on: 2016 October 06 23:20:43 »
Hi Tim,

Welcome to PixInsight Forum. Thank you so much for your suggestion. Tim, Mike, and everybody: please feel free to contribute your algorithms by extending the ImageIntegration tool if you think you can improve on my work. This tool has been released as an open-source product, as has our entire development framework:

https://github.com/PixInsight

Direct link to the source code of ImageIntegration:

https://github.com/PixInsight/PCL/tree/master/src/modules/processes/ImageIntegration

I am currently working hard on a new version of PixInsight, which includes improved versions of ImageIntegration (further parallelization of statistics calculations) and other tools, so your contributions would arrive just in time.

Please bear in mind that I am completely alone on the development front, so any help is always very welcome.
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline eganz

  • Member
  • Posts: 50
    • Eric Ganz Flickr
Re: Confidence Clipping - a proposal
« Reply #3 on: 2016 October 12 11:31:59 »
Tim,

Your suggestion above is interesting, but I think you would benefit from more exposure to real astrophotography data sets.

In particular, it is not clear how much experience you already have using PixInsight on real astrophotography data sets.
The data set that you posted on Cloudy Nights is very unrealistic, with far too many strong outliers. A more typical data set will have random noise around a typical mean, with some number of outliers or spurious data. Sometimes the data itself will be weaker, with stronger noise. Sometimes a satellite will pass overhead…

In any case, the user will typically adjust the high and low rejection points interactively to exclude the noise in difficult situations.
It makes sense to have two settings, since the high and low rejections often come from different causes, may be independent, or may scale differently.

I suppose the right question to ask is not exactly how to set the cutoffs, but rather whether the statistical models being used are good enough for the job.

I have to say that my experience so far with PixInsight stacking, based on the statistical models it uses, has been very positive. One does need to adjust the default cutoff settings to accommodate different data sets.

Eric

PS: The ImageIntegration tool also includes a large number of options and variations that you can explore.
« Last Edit: 2016 October 12 11:53:17 by eganz »

Offline spokeshave

  • Newcomer
  • Posts: 3
Re: Confidence Clipping - a proposal
« Reply #4 on: 2016 October 23 11:49:38 »
Eric:

Thanks for your thoughts. I tried to make it clear in the examples on CN that the dataset was contrived and over-simplified. I have been using PI for about two years and have a fair amount of experience with astrophotography datasets. Your comment:

"A more typical data set will have Random noise around a typical mean, with some  number of outliers or spurious data."

is precisely what my approach relies upon. It would allow the user to set a confidence interval about that typical mean, outside of which outliers and spurious data would be rejected.

Finally, I am not saying that there is anything wrong with the approaches offered by PI. They work quite well when properly dialed in. I'm simply proposing another tool for the toolbox.

Tim