Hi Adam,
Here is a formal description of our current goodness of fit estimator for the DynamicPSF tool:
$$ G = \left( \frac{1}{n} \sum_{i=1}^{n} w_i \right)^{-1}, $$

where $n$ is the number of sampled pixels, and the components of the vector $x$ are the absolute differences between sampled pixel values and their corresponding PSF estimates:

$$ x_i = \left| I_i - \mathrm{PSF}_i \right|, \qquad i = 1, \ldots, n, $$

where $I$ symbolizes the image region being sampled.
The Winsorized difference $w_i$ is given by:

$$ w_i = \begin{cases} x_k & \text{if } x_i < x_k \\ x_i & \text{if } x_k \le x_i \le x_{n-k} \\ x_{n-k} & \text{if } x_i > x_{n-k}, \end{cases} $$

where $x_k$ and $x_{n-k}$ are, respectively, the $k$th and $(n-k)$th order statistics of the set of $x$ components. For the DPSF tool, a 20% Winsorization is being applied, so we have $k = 0.2\,n$.
The Winsorization process replaces the $k$ smallest and the $k$ largest values with their corresponding nearest neighbors in the ordered set of absolute differences. This is what makes the goodness of fit estimate robust: any outliers are rejected and replaced with plausible values.
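In case a concrete reference helps, here is a minimal NumPy sketch of the estimator as described above. The function name, the argument names, and the exact indexing convention for the order statistics are illustrative; this is not the actual DPSF implementation.

```python
import numpy as np

def winsorized_gof(sampled, psf_estimates, winsorization=0.2):
    """Goodness of fit: reciprocal of the Winsorized mean of absolute
    differences between sampled pixel values and their PSF estimates."""
    x = np.abs(np.asarray(sampled, dtype=float) -
               np.asarray(psf_estimates, dtype=float))
    n = x.size
    k = int(winsorization * n)      # 20% Winsorization: k = 0.2 n
    xs = np.sort(x)                 # ordered set of absolute differences
    lo = xs[k]                      # nearest neighbor of the k smallest values
    hi = xs[n - k - 1]              # nearest neighbor of the k largest values
    w = np.clip(x, lo, hi)          # Winsorized differences w_i
    mean_w = w.mean()               # Winsorized mean absolute difference
    return np.inf if mean_w == 0.0 else 1.0 / mean_w
```

Larger values mean better fits; a model that reproduced the sampled pixels exactly would come out infinitely good.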
Evaluating the quality of a fitted PSF model (or, for that matter, of any functional fit of sampled pixel data) is not a trivial task. We need a robust estimator to make the process immune to noise and spurious data, but at the same time we need an efficient estimator, one able to use the largest possible subset of the sampled data for evaluation. This is why we cannot use MAD (robust, but inefficient) or the standard deviation (efficient, but non-robust) here.
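To make that trade-off concrete, here is a small synthetic comparison (illustrative numbers only, not DPSF data): a handful of spurious residuals inflates the standard deviation, the median absolute deviation is stable but statistically inefficient, and the 20% Winsorized mean is both stable and based on all of the samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic absolute residuals: mostly small fitting errors plus a few
# spurious samples (hot pixels, cosmic ray hits). Illustrative values only.
resid = np.abs(rng.normal(0.0, 0.01, size=200))
resid[:5] = 0.5                                     # gross outliers

std = resid.std()                                   # efficient, but inflated by outliers
mad = np.median(np.abs(resid - np.median(resid)))   # robust, but statistically inefficient
k = int(0.2 * resid.size)
xs = np.sort(resid)
wmean = np.clip(resid, xs[k], xs[-k - 1]).mean()    # robust and still uses every sample

print(f"std dev = {std:.4f}  MAD = {mad:.4f}  Winsorized mean = {wmean:.4f}")
```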
EDIT: Obviously, the goodness of fit evaluates to the reciprocal of the Winsorized mean of absolute differences (i.e., a PSF that reproduces the sampled pixel values exactly would be infinitely good). However, the values shown on the DPSF interface as MAD (meaning mean absolute difference here) are the Winsorized means themselves. In this way the best fits are always at the top of the list when it is sorted by MAD.
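For illustration, since the goodness of fit is the reciprocal of the listed MAD, sorting the list by MAD in ascending order is the same as sorting by goodness of fit in descending order (the star names and values below are hypothetical):

```python
# Hypothetical Winsorized mean absolute differences for four fitted stars.
fits = {"star A": 0.0012, "star B": 0.0007, "star C": 0.0031, "star D": 0.0005}

# Goodness of fit is the reciprocal of the listed MAD, so ascending MAD
# order puts the best fits (largest goodness of fit) at the top.
for name, mad in sorted(fits.items(), key=lambda item: item[1]):
    print(f"{name}: MAD = {mad:.4f} -> goodness of fit = {1.0 / mad:.0f}")
```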