PixInsight 1.6.1 - New ImageIntegration Features

Juan Conejero

PixInsight Staff
Staff member
Hi all,

The latest version 1.6.1 of PixInsight comes with a couple of new features implemented in the ImageIntegration tool. This is a brief description of both, in which I'll borrow some figures and text from other posts to put together a sort of official description.

Linear Fit Clipping Pixel Rejection Algorithm

The new Linear Fit Clipping rejection method fits the best possible straight line, in the twofold sense of minimizing average deviation and maximizing inliers, to the set of pixel values in each pixel stack. At the same time, all pixels that don't agree with the fitted line (to within user-defined tolerances) are considered outliers and rejected, so they are excluded from the final integrated image.

The new pixel rejection algorithm is controlled, as usual, with two dedicated threshold parameters in sigma units: Linear fit low and Linear fit high, respectively for low and high pixels. The screenshot below shows the new version of ImageIntegration and some results of the new rejection method.

ImageIntegration03_tn.jpg

The difference between linear fit clipping and sigma clipping is best explained with a graphical example. Consider the following figure.

sigma-clipping.png

Suppose we are integrating N images. In the figure, we have represented a stack of N pixels at the same coordinates, whose values have been sorted in ascending order and plotted as circles on the graphic. The horizontal axis is the ordered sample number, and the vertical axis represents the available range of pixel values.

What we pursue with pixel rejection is to exclude those pixels that have spurious values, such as pixels pertaining to cosmic rays, plane and satellite trails, and other abnormally high or low pixels that are not valid data for some reason. We want to reject those pixels to make sure that they won't enter the integrated image.

The symbol m represents the median of the distribution. The median is the value of the central element in the ordered sequence, which in statistical terms is a robust estimate of the central value of the distribution. Here, the median is being used as a robust estimate of the true value of the pixel, or more realistically, the most representative value of the pixel that we have available. Robustness here means insensitivity to outliers. An outlier is an abnormally low or high value, as are both extreme values in the graphic, for example. Outliers are precisely the values that we want to exclude, or reject. By contrast, valid pixel values, or inliers, are those that we want to keep and integrate into a single image to achieve a better signal-to-noise ratio. Our goal is to reject only truly spurious pixels. We need a very accurate method able to distinguish between valid and invalid data in a smart way, adapted to the variety of problems that we have in our images. Ideally, no valid data should be rejected at all.

With two horizontal lines we have represented the clipping points for high and low pixels, which are the pixels above and below the median of the stack, respectively. Pixels that fall outside the range defined by the two clipping points are considered outliers, and hence rejected. In the sigma clipping algorithm, as well as in its Winsorized variant, clipping points are defined by two multipliers in sigma units, namely KH and KL in the figure. Sigma is an estimate of the dispersion in the distribution of pixel values. In the standard sigma clipping algorithms, sigma is taken as the standard deviation of the pixel stack. These algorithms are iterative: the clipping process is repeated until no more pixels can be rejected.

In the figure, rejected pixels have been represented as void circles. Six pixels have been rejected, three at each end of the distribution. The question is, are these pixels really outliers? The answer is most likely yes, under normal conditions. Normal conditions include that the N images have flat illumination profiles. Suppose that some of the images have slight additive sky gradients of different intensities and orientations; for example, one of the images is slightly more illuminated toward its upper half, while another image is somewhat more illuminated on its lower half. Or suppose that some of the images have been acquired under worse transparency conditions. Due to varying gradients and illumination levels, the sorted distribution of pixels can show tails of apparent outliers at both ends. This may lead to the incorrect rejection of valid pixels with a sigma clipping algorithm.

Enter linear fit clipping:

linear-fit-clipping.png

In this figure we have the same set of pixels as before. However, the linear fit clipping algorithm is being used instead of sigma clipping. The new algorithm fits a straight line to the set of pixels. It tries to find the best possible fit in the sense of minimizing average deviation. Average deviation is computed as the mean of the distances of all pixel values to the line. When we find a line that minimizes average deviation, what we have found is the actual tendency in the set of pixel values. Note that a linear fit is more robust than a sigma clipping scheme, in the sense that the fitted line does not depend as strongly on all the images having flat illumination profiles. If all the images are flat (and have been normalized, as part of the rejection procedure), then the fitted line will tend to be horizontal. If there are illumination variations, then the line's slope will tend to grow. In practice, a linear fit is more tolerant of slight additive gradients than sigma clipping.

The neat result is that fewer false outliers are rejected, and hence more pixels will enter the final integrated image, improving the SNR. In the figure, r is the fitted line and d represents average absolute deviation. KH and KL, as before, are two multipliers for high and low pixels, but this time they are expressed in average absolute deviation units. Instead of classifying pixels with respect to a fixed value (the median in the case of sigma clipping), in the linear fit algorithm low and high pixels are those that lie below and above the fitted line, respectively. Note that in the linear fit case only two pixels have been rejected as outliers, while the set of inlier pixels fits a straight line really well. In addition, the employed tolerance is more restrictive in the linear fit case; compare d with sigma in the two preceding figures.
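
To make the procedure concrete, here is a minimal, self-contained sketch of linear fit clipping for a single pixel stack. PixInsight's actual implementation minimizes average absolute deviation directly; this sketch approximates that fit with a few iteratively reweighted least-squares passes, and all names (FitLineL1, LinearFitClip, etc.) are illustrative, not taken from the PixInsight source:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct LinearFit { double a, b; }; // the fitted line: y = a + b*x

// Fit y = a + b*x minimizing (approximately) the mean absolute deviation,
// via iteratively reweighted least squares. The stack is indexed by sorted
// sample number, as in the figures above. Assumes at least 3 pixels.
static LinearFit FitLineL1( const std::vector<double>& y )
{
   const std::size_t n = y.size();
   std::vector<double> w( n, 1.0 ); // first pass = ordinary least squares
   LinearFit f = { 0, 0 };
   for ( int it = 0; it < 10; ++it )
   {
      double Sw = 0, Swx = 0, Swy = 0, Swxx = 0, Swxy = 0;
      for ( std::size_t i = 0; i < n; ++i )
      {
         double x = double( i );
         Sw += w[i]; Swx += w[i]*x; Swy += w[i]*y[i];
         Swxx += w[i]*x*x; Swxy += w[i]*x*y[i];
      }
      f.b = (Sw*Swxy - Swx*Swy)/(Sw*Swxx - Swx*Swx);
      f.a = (Swy - f.b*Swx)/Sw;
      for ( std::size_t i = 0; i < n; ++i ) // reweight by 1/|residual| (L1)
         w[i] = 1/std::max( 1e-8, std::fabs( y[i] - (f.a + f.b*i) ) );
   }
   return f;
}

// Reject pixels whose signed distance to the fitted line exceeds the
// tolerances KL ("Linear fit low") and KH ("Linear fit high"), expressed
// in units of the average absolute deviation d.
static std::vector<bool> LinearFitClip( const std::vector<double>& y,
                                        double KL, double KH )
{
   LinearFit f = FitLineL1( y );
   double d = 0;
   for ( std::size_t i = 0; i < y.size(); ++i )
      d += std::fabs( y[i] - (f.a + f.b*i) );
   d /= y.size();
   std::vector<bool> rejected( y.size() );
   for ( std::size_t i = 0; i < y.size(); ++i )
   {
      double r = y[i] - (f.a + f.b*i); // signed residual
      rejected[i] = r < -KL*d || r > KH*d;
   }
   return rejected;
}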

The remaining question is: is linear fit clipping the best rejection algorithm? I wouldn't say that as a general statement. Linear fit rejection requires a relatively large set of images to work optimally. With fewer than 15 images or so, its performance may be worse than or similar to Winsorized sigma clipping. To reject pixels around a fitted line with low uncertainty, many points are required to optimize the fit in the least average absolute deviation sense. The more images, the better the linear fit, and hence the better the new algorithm will perform. It's a matter of testing it and seeing whether it yields better results than the other available rejection algorithms in each particular case.

Rejection Maps and the Slope Map

The linear fit clipping rejection algorithm generates, when the Generate rejection maps option is enabled, a slope map image along with the standard low and high rejection maps.

Pixels in a rejection map have values proportional to the number of rejected pixels in each pixel stack. For example, if a rejection map pixel is white (with a value of 1.0), all pixels have been rejected at its image coordinates in the set of integrated images. A black rejection map pixel (= 0.0) means that no pixel has been rejected, and intermediate values 0 < x < 1 represent the fraction of rejected pixels. Rejection maps have many interesting applications. The most obvious is evaluating the quality and suitability of a pixel rejection process. For example, it is easy to know whether some plane trails or cosmic rays have been properly rejected by just inspecting the high rejection map at the appropriate locations.
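
In code, the mapping is straightforward. This tiny sketch (with hypothetical names, not PixInsight's) just converts per-coordinate rejection counts into map values:

#include <cstddef>
#include <vector>

// For each image coordinate, the map value is the fraction of images whose
// pixel was rejected there: 0 = none rejected, 1 = the whole stack rejected.
std::vector<float> MakeRejectionMap( const std::vector<int>& rejectedCount,
                                     int numberOfImages )
{
   std::vector<float> map( rejectedCount.size() );
   for ( std::size_t i = 0; i < map.size(); ++i )
      map[i] = float( rejectedCount[i] )/float( numberOfImages );
   return map;
}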

A slope map represents the variations in each pixel stack, as evaluated by the linear fit clipping algorithm. A pixel in a slope map has a value proportional to the slope of the fitted line for the corresponding pixel stack. Recall that the slope of a line that passes through two points {x1,y1} and {x2,y2} on the plane is the quotient between the vertical and horizontal increments, that is: slope = (y2-y1)/(x2-x1).

Thus a slope map value of 0 means that all the pixels in the corresponding stack have the same value (on average), since the fitted line is horizontal (slope = 0) in this case. A slope value of 0.5 corresponds to a slope angle of 45 degrees, which is very high (you will rarely see it). A slope value of 1.0 would correspond, in theory, to infinite slope, or a fitted line that would form an angle of 90 degrees with respect to the X axis, which is impossible, for obvious reasons.
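
These correspondences (0 for a horizontal line, 0.5 for 45 degrees, 1.0 for a vertical line) suggest that the map encodes the fitted line's inclination angle normalized to the [0,1] range. The one-liner below implements that mapping; it is an inference from the stated values, not an excerpt from the actual source code:

#include <cmath>

// Map a line slope to [0,1] as the normalized inclination angle:
// slope 0 -> 0, slope 1 (45 degrees) -> 0.5, vertical line -> 1.
double SlopeMapValue( double slope )
{
   const double halfPi = std::acos( 0.0 ); // pi/2
   return std::atan( std::fabs( slope ) )/halfPi;
}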

In the screenshot at the beginning of this post, which I'll repeat below for your convenience, you can see a slope map generated after integration of 19 images with the new linear fit clipping rejection algorithm. It is the image with the 'slope' identifier.

ImageIntegration03_tn.jpg

Along with the slope map, you can also see the standard 'rejection_low' and 'rejection_high' maps. Note that the three map images have active screen transfer functions (STF) applied, so their actual values are greatly exaggerated in the screenshot.

In the set of 19 integrated images, approximately one half have additive sky gradients that are brighter toward the top of the frame, and the other half have the opposite gradients, brighter toward the bottom of the frame. Note how the slope map has captured these variations, as both the top and bottom parts of the map image are brighter on the sky background. Note also that the main objects M81 and M82, as well as most bright stars, are represented as having relatively strong variations on the slope map. This is normal and happens quite frequently. It means that in the set of integrated images, there are significant variations due to varying atmospheric conditions, different exposure times, and other factors.

Slope maps represent a new tool available in PixInsight for controlling the properties of the image integration process. We haven't designed any specific application for these maps yet, but following the design principles and philosophy that govern PixInsight, we provide you with new tools and resources that maximize your control over everything that happens with your data. This is the PixInsight way that you love ;)

Persistent File Cache

Another important new feature of ImageIntegration is a persistent file cache. ImageIntegration has had an efficient file cache since its first version. The cache stores a set of statistical data for each integrated image, which are necessary to perform the integration task, such as the extreme pixel values, the average value, the median, some deviations, noise estimates, etc. Computing all of these data is quite expensive. By storing them in a cache, ImageIntegration can run much faster: if you repeat the integration procedure with the same images, for example to try out new pixel rejection parameters, ImageIntegration can retrieve the necessary data from the cache, avoiding a new calculation.
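
Purely as an illustration of the kind of record involved (the field names are hypothetical; the actual PixInsight cache item surely differs), a per-image cache entry along these lines would cover the statistics listed above:

// Hypothetical per-image cache entry; fields follow the statistics
// enumerated in the text, not the actual PixInsight implementation.
struct IntegrationCacheItem
{
   double minimum, maximum; // extreme pixel values
   double mean, median;     // central value estimates
   double avgDev, stdDev;   // dispersion estimates
   double noise;            // noise estimate
};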

In previous versions of the ImageIntegration tool, the cache was volatile, that is, it was lost when you terminated the current PixInsight session. The new cache is persistent, which means that it is remembered across PixInsight sessions. In the screenshot below you can see the new dialog box that allows you to control the persistent cache with several options.

ImageIntegration04.jpg

A similar cache will also be available for StarAlignment in one of the next 1.6.x versions. In this case, the persistent cache will store all stars detected in each image, which will speed up the image registration process very significantly.
 
See, I read this and I don't understand people who write that somehow Juan can't write documentation because he writes code. Nonsense. Juan writes excellent documentation. It just needs to be organized and made accessible.
 
Nocturnal said:
See, I read this and I don't understand people who write that somehow Juan can't write documentation because he writes code. Nonsense. Juan writes excellent documentation. It just needs to be organized and made accessible.
I concur with you. There are lots of texts like this lost in the forum that are really interesting.
I think it would be more helpful if Juan would publish this documentation in the Resources section of the main site or in the wiki. Then a simple notification in the forum would suffice.
 
I fully agree with Sander! Juan is an excellent documentation writer...

The other thing I would like to highlight about PI: most processes have a strong theoretical background, and can be justified on these grounds. ImageIntegration with linear fit clipping is an example of this. This is quite different from what many other tools provide, which are somehow "magically" supposed to improve images. You never know what you get and why you get it in these "magical" approaches to image processing.


Georg
 
Hi all - and thanks to Juan for a clear and concise explanation - as always.

On my first read through two questions immediately spring to mind - both concerning the Winsorization process, when associated with 'pixel clipping'.

The first question is basically "Could the Linear Fit algorithm also be further improved by applying 'Winsorization' as well?" - in which case would it not just be simpler to offer <Winsorization> as a RadioButton on the dialogue GUI, whenever it would be appropriate (Sigma Clipping and Linear-Fit Sigma Clipping)?

Secondly - I know Juan, that you spent some of your valuable time explaining Winsorization to me in an earlier thread - however, with the new graphed 'fit' image that you have introduced in this thread, perhaps now would be the time to show how Winsorization takes these fundamental ideas a step further. Perhaps you could also explain to others what actually happens when the 'pixel clipping' algorithm is allowed to iterate, showing how a new, 'better' Median is calculated each time, and how this gradually (hopefully!!) 'iterates' to the 'best' value for the 'Median' - which is, after all, the value that we are looking for each time, i.e. the ADU value that this pixel is going to be set to in the final, integrated, image.

You might also then be able to clearly explain how 'Winsorization' improves the Median calculation, by not just simply 'excluding' the outliers from all further calculation (thus reducing the count value of 'n', which is the number of samples in the data set), but - instead - how Winsorization 'replaces' all the outlier values WITH the ADU value actually AT the cut-off (or 'clipping') boundary, thus keeping the same number of 'n' throughout the iteration process.

(Of course, it goes without saying that I 'assume' that my understanding of the process is correct ::) :yell: :'()

Then, to complete this little 'tutorial', we need to show, by example, how to 'tweak' the clipping sliders so that 'just enough' of the outliers are disposed of. Just enough to clip the bright aircraft, meteor, satellite and cosmic-ray trails; and just enough to clip the dim 'noise' - and, in both cases, how to use the Clipping Images to make sure that no (or the very minimum) structural detail appears in the Clipping Images. And finally, demonstrate how to use the Console Output to monitor just 'how much' (in terms of overall percentage of image pixels) HAS actually hit the editing-room floor.

ImageIntegration really is one area where, in my opinion, PixInsight now stands head and shoulders above the competition.

Cheers,
 
Niall Saunders said:
ImageIntegration really is one area where, in my opinion, PixInsight now stands head and shoulders above the competition.

I agree, absolutely!

Thanks Juan for the detailed explanation. A minor question: does the persistent file cache use much disk space? (say, comparable to the images?)

Regards,

Enzo.
 
Would the output of the planned alignment cache be able to be fed into the integration module?
If we are ever able to couple these processes it might speed things up.

Max
 
Today I tried Linear Fit Clipping with a Red channel I have been integrating and it worked wonders!! Incredible.
I had 2 very bright satellite trails and they are just gone!

:)

Thank you for such a great process!
E.
 
Thanks guys! :)

Some assorted answers to your questions follow.

Would the output of the planned alignment cache be able to be fed into the integration module?

No, a cache has nothing to do with connecting or chaining processes. A cache stores temporary working data that are private to a particular process or procedure.
You're talking about a super task to perform a preprocessing pipeline. This can be implemented with a script. There are active threads about this topic on the development board where I plan to participate.

does the persistent file cache use much disk space? (say, comparable to the images?)

No, not at all. Caches are much smaller in data size than images. In the case of ImageIntegration, currently each cache item (there's one item per integrated image) requires about 200 bytes. The cache for StarAlignment will be much larger, but still way smaller than an image. For StarAlignment, each cache item will store a list of detected stars, because the star detection phase is the actual bottleneck of the StarAlignment process. Each star is a small structure (code snippet excerpted from StarAlignment's source code):

struct Star
{
  FPoint   pos;    // Centroid position in pixels, image coordinates
  unsigned count;  // Number of pixels
  float    total;  // Integrated flux (pixel values normalized to the [0,1] range)
};

which requires just 16 bytes in binary format. Typically there can be from 50 to 100000 stars per image, which requires a maximum of about 1.6 MB in the worst practical case. A StarAlignment cache item also stores a cryptographic digest of the parameters used for star detection, for cache validation, which requires another 16 bytes.
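
As a quick sanity check of the 16-byte figure, assuming FPoint is simply two 32-bit floats (which the total implies), the layout packs with no padding on typical platforms:

struct FPoint { float x, y; }; // assumed: two 32-bit floats, 8 bytes

struct Star
{
   FPoint   pos;    // 8 bytes
   unsigned count;  // 4 bytes
   float    total;  // 4 bytes
};

static_assert( sizeof( Star ) == 16, "unexpected padding" );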

In the case of ImageIntegration the cache is so small that it is stored in the global settings file (PixInsight.ini or PixInsight.conf). For StarAlignment, a different strategy will be necessary to avoid bloating the global settings file, which would have an adverse impact on overall performance. There will be a little stub per cache item in the global settings file, but the actual data will be stored as independent files.

Sorry for the lengthy story, but I've been working on these caches recently and I had to tell all this stuff to someone :)

Could the Linear Fit algorithm also be further improved by applying 'Winsorization' as well?

Not really. The sigma clipping and linear fit clipping algorithms are very different in their design and principles. Winsorization is a robust statistics technique. It does a very good job robustifying sigma clipping because this algorithm is too sensitive to random outliers. Linear fit clipping is based on minimizing average absolute deviation, which is already a robust estimate. Winsorization cannot help in this case; it would just add more complexity without any benefit.

Perhaps you could also explain to others what actually happens when the 'pixel clipping' algorithm is allowed to iterate, showing how a new, 'better' Median is calculated each time, and how this gradually (hopefully!!) 'iterates' to the 'best' value for the 'Median'

An iterative k-sigma clipping procedure is convergent and stable. Convergence refers to the property of an algorithm of reaching a solution after a finite number of iterations. Stability refers to the property of reaching a solution that does not vary as the number of iterations tends to infinity.

At each iteration, k-sigma clipping rejects those pixels that differ by more than k, in sigma units, from the median. The median is taken as the central value of the distribution of pixels in the stack being integrated. In this sense, the median is taken as the best estimate of the true value of the pixel at each iteration, and k defines the maximum deviation that we can tolerate, relative to the distribution's dispersion. Rejected pixels are excluded from the working data set, so the length of the working data set decreases with successive iterations. The process naturally converges to a set where no more pixels can be rejected. No matter how many outliers are in the initial set, convergence is guaranteed because the dispersion of the working set decreases by successive iterations, so eventually a subset will be found where no pixel differs by more than k, in sigma units, from the median of the subset's distribution.
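
Here is a minimal sketch of that iteration for a single pixel stack. The helper and function names are illustrative, not PixInsight's; note how the working set can only shrink, which is what guarantees termination:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

static double Median( std::vector<double> v )
{
   std::sort( v.begin(), v.end() );
   std::size_t n = v.size();
   return (n & 1) ? v[n/2] : 0.5*(v[n/2 - 1] + v[n/2]);
}

static double StdDev( const std::vector<double>& v, double center )
{
   double s = 0;
   for ( double x : v )
      s += (x - center)*(x - center);
   return std::sqrt( s/v.size() );
}

// Iterate until no more pixels can be rejected: at each pass, reject pixels
// that differ by more than kLow/kHigh sigmas from the current median.
static std::vector<double> SigmaClip( std::vector<double> v,
                                      double kLow, double kHigh )
{
   for ( ;; )
   {
      double m = Median( v ), sigma = StdDev( v, m );
      std::vector<double> kept;
      for ( double x : v )
         if ( x >= m - kLow*sigma && x <= m + kHigh*sigma )
            kept.push_back( x );
      if ( kept.size() == v.size() || kept.size() < 3 )
         return kept; // converged, or too few pixels left to keep iterating
      v.swap( kept );
   }
}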

The Winsorization process adds an elegant form of "intelligence" to the basic sigma clipping procedure. Instead of removing outliers at each iteration, Winsorization looks to see what would happen if the outliers had plausible values. Rather than just removing the outliers, Winsorization tries to reuse them (by replacing them with their nearest valid values), because an outlier might carry some useful information. For example, an outlier that is too high in a distribution of observations might reveal a true trend of the observed phenomenon, even if the outlier by itself is "wrong" in the statistical sense.
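
Under the same caveats as the sketch above (and reusing its Median and StdDev helpers), a single Winsorization pass would look like this: outliers are clamped to the current clipping boundaries instead of being discarded, so the sample size never shrinks:

// One Winsorization pass: clamp every value to the current clipping
// boundaries instead of removing it, so n stays constant across iterations.
static void WinsorizePass( std::vector<double>& v, double kLow, double kHigh )
{
   double m = Median( v ), sigma = StdDev( v, m );
   double lo = m - kLow*sigma, hi = m + kHigh*sigma;
   for ( double& x : v )
      x = std::min( std::max( x, lo ), hi ); // outlier -> nearest boundary
}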

Hope this helps!
 
Juan Conejero said:
Thanks guys! :)

Some assorted answers to your questions follow.

Would the output of the planned alignment cache be able to be fed into the integration module?

No, a cache has nothing to do with connecting or chaining processes. A cache stores temporary working data that are private to a particular process or procedure.
You're talking about a super task to perform a preprocessing pipeline. This can be implemented with a script. There are active threads about this topic on the development board where I plan to participate.

does the persistent file cache use much disk space? (say, comparable to the images?)

Thanks Juan

We miss your input on this. Progress appears to have stalled without your help.

Max
 