Perhaps some confusion has arisen due to my choice of a poor title for this thread. Actually, the most important part of the new rejection algorithm is the
linear fit, not RANSAC. The use of the RANSAC algorithm is an implementation detail; the core of the method is that we fit all the pixels in a stack to a straight line. In fact, I am considering the possibility of not using RANSAC at all, in favor of other robust algorithms that are more efficient.
The difference between linear fit clipping and sigma clipping is best explained with a graphical example. Consider the following figure.
Suppose we are integrating N images. In the figure, we have represented a stack of N pixels at the same coordinates, whose values have been sorted in ascending order and plotted as circles on the graphic. The horizontal axis is the ordered sample number, and the vertical axis represents the available range of pixel values.
What we pursue with pixel rejection is to exclude those pixels that have
spurious values, such as pixels pertaining to cosmic ray impacts, plane and satellite trails, and other abnormally high or low pixels that are not valid data for some reason. We want to reject those pixels to make sure that they won't enter the integrated image.
The symbol m represents the
median of the distribution. The median is the value of the central element in the ordered sequence, that is, the value that splits the distribution into two halves of equal size. Here, the median is being used as a
robust estimate of the true value of the pixel, or more realistically,
the most representative value of the pixel that we have available. Robustness here means
unaffected by outliers. An outlier is an abnormally low or high value, such as the two extreme values in the graphic. Outliers are precisely those values that we want to exclude, or reject. By contrast, valid pixel values or
inliers are those that we want to keep and integrate as a single image to achieve better signal-to-noise ratio. Our goal is to reject
only really spurious pixels. We need an accurate method able to distinguish between valid and invalid data in a smart way, adapted to the variety of problems that we have in our images. Ideally, no valid data should be rejected at all.
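To make the robustness point concrete, here is a small Python sketch (the pixel values are made up for illustration) comparing the mean and the median of a stack contaminated by two outliers:

```python
import statistics

# Hypothetical stack of N = 9 normalized pixel values at one coordinate,
# sorted in ascending order; the two extremes are outliers (e.g. a dead
# pixel and a cosmic ray hit).
stack = [0.02, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.95]

mean = statistics.mean(stack)      # pulled toward the outliers
median = statistics.median(stack)  # m: unaffected by the extremes

print(round(mean, 3))   # 0.294
print(median)           # 0.24
```

The mean is dragged well above the bulk of the values by the single high outlier, while the median stays right at the center of the consistent data, which is why it serves as the robust reference value for clipping.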
With two horizontal lines we have represented the
clipping points for high and low pixels, which are the pixels above and below the median of the stack, respectively. Pixels that fall outside the range defined by both clipping points are considered as outliers, and hence rejected. In the sigma clipping algorithm, as well as in its Winsorized variant, clipping points are defined by two multipliers in
sigma units, namely KH and KL in the figure. Sigma is an estimate of the
dispersion in the distribution of pixel values. In the standard sigma clipping algorithms, sigma is taken as the standard deviation of the pixel stack. These algorithms are
iterative: the clipping process is repeated until no more pixels can be rejected.
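As a rough Python sketch of this iterative scheme (not PixInsight's actual implementation; the kl and kh parameters mirror the KL and KH multipliers in the figure, and the data are made up):

```python
import statistics

def sigma_clip(stack, kl=2.0, kh=2.0):
    # Iterative sigma clipping: reject pixels more than kl sigmas below
    # or kh sigmas above the median, and repeat until no more pixels
    # can be rejected.
    pixels = sorted(stack)
    while len(pixels) > 2:
        m = statistics.median(pixels)   # robust central value
        s = statistics.stdev(pixels)    # sigma = standard deviation
        kept = [p for p in pixels if m - kl * s <= p <= m + kh * s]
        if len(kept) == len(pixels):    # nothing rejected: stop
            break
        pixels = kept
    return pixels

# Two outliers among nine otherwise consistent values:
print(sigma_clip([0.02, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.95]))
# [0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27]
```

Note that both extremes are gone after two iterations: rejecting the high outlier shrinks sigma, which then exposes the low outlier on the next pass.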
In the figure, rejected pixels have been represented as void circles. Six pixels have been rejected, three at each end of the distribution. The question is, are these pixels
really outliers? Under normal conditions, the answer is most likely yes. Normal conditions include that the N images have flat illumination profiles. Now suppose that some of the images have slight additive sky gradients of different intensities and orientations: for example, one of the images is slightly brighter toward its upper half, while another is somewhat brighter on its lower half. Due to these varying gradients, the sorted distribution of pixels can show tails of
apparent outliers at both ends. This may lead to the incorrect rejection of valid pixels with a sigma clipping algorithm.
Enter linear fit clipping:
In this figure we have the same set of pixels as before. However, the linear fit clipping algorithm is being used instead of sigma clipping. The new algorithm fits a straight line to the set of pixels. It tries to find the best possible fit in the sense of minimizing
average deviation. Average deviation is computed as the mean of the distances of all pixel values to the line. When we find a line that minimizes average deviation, what we have found is the actual
tendency in the set of pixel values. Note that a linear fit is more robust than a sigma clipping scheme, in the sense that the fitted line does not depend to such a large extent on all the images having flat illumination profiles. If all the images are flat (and have been normalized as part of the rejection procedure), then the fitted line will tend to be horizontal. If there are illumination variations, then the line's slope will tend to grow. In practice, a linear fit is more tolerant of slight additive gradients than sigma clipping.
The net result is that fewer false outliers are rejected, and hence more pixels will enter the final integrated image, improving the SNR. In the figure, r is the fitted line and d represents the average deviation. KL and KH, as before, are two multipliers for low and high pixels, in average deviation units. However, instead of classifying pixels with respect to a fixed value (the median in the case of sigma clipping), in the linear fit algorithm low and high pixels are those that lie below and above the fitted line, respectively. Note that in the linear fit case only two pixels have been rejected as outliers, while the set of inlier pixels fits the straight line really well.
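The idea can be sketched in Python as follows. To keep the sketch short I use an ordinary least-squares line as a stand-in for the minimum average deviation fit (and no RANSAC), so this only illustrates the structure of the algorithm, not the actual implementation; kl and kh play the role of KL and KH, d is the average absolute deviation around the fitted line r, and the data are made up:

```python
def linear_fit_clip(stack, kl=2.0, kh=2.0):
    pixels = sorted(stack)
    while len(pixels) > 3:
        n = len(pixels)
        xs = range(n)
        # Ordinary least-squares line y = a + b*x through the
        # (sample index, pixel value) pairs.
        mx = (n - 1) / 2.0
        my = sum(pixels) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, pixels))
             / sum((x - mx) ** 2 for x in xs))
        a = my - b * mx
        # Signed distances of each pixel value to the fitted line.
        resid = [y - (a + b * x) for x, y in zip(xs, pixels)]
        d = sum(abs(r) for r in resid) / n  # average deviation
        kept = [y for y, r in zip(pixels, resid)
                if -kl * d <= r <= kh * d]
        if len(kept) == len(pixels):        # nothing rejected: stop
            break
        pixels = kept
    return pixels

# Made-up stack: eight values along a slight gradient, plus one low
# and one high outlier.
print(linear_fit_clip(
    [0.620, 0.112, 0.118, 0.132, 0.138, 0.152, 0.158, 0.172, 0.178, 0.010]))
# [0.112, 0.118, 0.132, 0.138, 0.152, 0.158, 0.172, 0.178]
```

Both outliers are rejected, while the sloped run of inliers, which a sigma clipping scheme could start eating into at both ends, is kept intact because the fitted line follows its tendency.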
The remaining question is: is linear fit clipping the best rejection algorithm? I wouldn't say that. Linear fit rejection requires a relatively large set of images to work optimally. With fewer than 15 images or so, its performance may be similar to or worse than Winsorized sigma clipping's. To fit a line around which pixels can be rejected with low uncertainty, many points are required to optimize it in the minimum average deviation sense. The more images, the better the fit, and hence the better the new algorithm will perform. It's a matter of testing it and seeing whether it yields better results than the other rejection algorithms available, for each particular case.