fixed WBPP 2.8.2: ImageRegistration is rejecting good images without justification

johnpane

In this CFA dataset with 110 subframes, prior instances of WBPP ImageRegistration successfully registered all 330 monochrome subframes (three debayered channels per subframe).

With WBPP 2.8.2, seven seemingly random ones fail without explanation beyond "PCL exception". Here is a log file excerpt from one of those seven instances.

Code:
[2024-12-31 06:08:19] * Loading target file: /Users/pane/NOT BACKED UP/vdb/debayered/Light_BIN-1_9576x6388_EXPOSURE-120.00s_FILTER-NoFilter_CFA/vdb_15_Light_120_secs_107_c_d_B.xisf
[2024-12-31 06:08:19] Loading image: w=9576 h=6388 n=1 Gray Float32
[2024-12-31 06:08:19] 5 image properties
[2024-12-31 06:08:19] 95 FITS keyword(s) extracted
[2024-12-31 06:08:19] Noise reduction:
[2024-12-31 06:08:19] Structure map:
[2024-12-31 06:08:19] Local maxima map:
[2024-12-31 06:08:19] Detecting stars: done
[2024-12-31 06:08:19] Fitting 8066 stars:
[2024-12-31 06:08:19] 7300 PSF fits.
[2024-12-31 06:08:19] Minimum star size: 1 px
[2024-12-31 06:08:19] * Computing global linear transformation.
[2024-12-31 06:08:19] * Reference image: limiting to 2000 brightest stars.
[2024-12-31 06:08:19] * Target image: limiting to 2000 brightest stars.
[2024-12-31 06:08:19] Matching stars: done
[2024-12-31 06:08:19] 1846 putative star pair matches.
[2024-12-31 06:08:19] Performing RANSAC: done
[2024-12-31 06:08:19] 1838 star pair matches in 168 RANSAC iterations.
[2024-12-31 06:08:19] * Summary of model properties:
[2024-12-31 06:08:19] Inliers       : 0.996
[2024-12-31 06:08:19] Overlapping   : 1.000
[2024-12-31 06:08:19] Regularity    : 0.993
[2024-12-31 06:08:19] Quality       : 0.976
[2024-12-31 06:08:19] Root mean square error:
[2024-12-31 06:08:19] delta_RMS     :  0.159 px
[2024-12-31 06:08:19] RMS error deviation:
[2024-12-31 06:08:19] sigma_RMS     :  0.072 px
[2024-12-31 06:08:19] Peak errors:
[2024-12-31 06:08:19] delta_x_max   :  0.851 px
[2024-12-31 06:08:19] delta_y_max   :  0.718 px
[2024-12-31 06:08:19] translation   :      1.48 px
[2024-12-31 06:08:19] translation_x :     -1.27 px
[2024-12-31 06:08:19] translation_y :     -0.76 px
[2024-12-31 06:08:19] rotation      :     +0.00 deg
[2024-12-31 06:08:19] scale         : 1.0000
[2024-12-31 06:08:19] scale_x       : 1.0000
[2024-12-31 06:08:19] scale_y       : 1.0000
[2024-12-31 06:08:19] * Projective transformation matrix:
[2024-12-31 06:08:19]      +1.000027     -0.000068     -1.271693
[2024-12-31 06:08:19]      +0.000069     +0.999985     -0.764688
[2024-12-31 06:08:19]      +0.000000     -0.000000     +1.000000
[2024-12-31 06:08:19] * Computing distortion correction model.
[2024-12-31 06:08:19] * Star matching tolerance: 3 px
[2024-12-31 06:08:19] Iteration #1: 7034 star pair matches, residual = 0.0337 px
[2024-12-31 06:08:19] Building DDM thin plate splines:
[2024-12-31 06:08:19] X: eps = 0.24 px, n = 3709
[2024-12-31 06:08:19] Y: eps = 0.25 px, n = 3939
[2024-12-31 06:08:19] Iteration #2: 7035 star pair matches, residual = 0.0073 px
[2024-12-31 06:08:19] Building DDM thin plate splines:
[2024-12-31 06:08:19] X: eps = 0.23 px, n = 3751
[2024-12-31 06:08:19] Y: eps = 0.24 px, n = 3390
[2024-12-31 06:08:19] Iteration #3: 7035 star pair matches, residual = 0.0434 px
[2024-12-31 06:08:19] * Distortion correction: Converged after 3 iterations.
[2024-12-31 06:08:19] * Best distortion model with 7035 star pair matches and residual = 0.0073 px
[2024-12-31 06:08:19]
[2024-12-31 06:08:19] *** Error:
[2024-12-31 06:08:19] *** PCL Exception:
[2024-12-31 06:08:19]
[2024-12-31 06:08:19] * Applying error policy: Continue on error.
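
As a sanity check on the excerpt above, the summary values can be recomputed from the printed transformation matrix. This is only a sketch: it assumes the usual decomposition of a near-affine matrix (translation from the last column, rotation and scale from the upper-left 2x2 block), which may not be exactly how ImageRegistration derives its figures.

```python
import math

# Projective transformation matrix copied from the log excerpt above.
H = [
    [+1.000027, -0.000068, -1.271693],
    [+0.000069, +0.999985, -0.764688],
    [+0.000000, -0.000000, +1.000000],
]

# Translation: last column of the matrix (assumed convention).
translation_x, translation_y = H[0][2], H[1][2]
translation = math.hypot(translation_x, translation_y)

# Rotation and scale: taken from the upper-left 2x2 block (assumed convention).
rotation_deg = math.degrees(math.atan2(H[1][0], H[0][0]))
scale_x = math.hypot(H[0][0], H[1][0])
scale_y = math.hypot(H[0][1], H[1][1])

print(f"translation : {translation:.2f} px")
print(f"rotation    : {rotation_deg:+.2f} deg")
print(f"scale_x     : {scale_x:.4f}")
print(f"scale_y     : {scale_y:.4f}")
```

Under those assumptions the numbers agree with the summary block in the log, so the registration model itself looks valid right up to the exception.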
 
seems like a bug of some sort, can you winnow down the set of subs to a small one that fails and upload the images so juan can look at it?

of course could just be more rosetta2 issues, but worth looking at i hope, if it can be reproduced.
 
I'm currently rerunning the full WBPP process with a few settings tweaks.

It will be interesting to see if the same images fail again this run. Registration is running as I type this.
 
As suspected, these failures are random and not data dependent. There are no correlations between which 7 subframes failed yesterday and which 34 failed today.

By all evidence, this is random. Even an essentially perfect registration with 0.000 residual can fail. In my run today, there were 34 failures versus 7 yesterday. The changes to WBPP parameters all affect processes after registration. The specific subframes that failed can be seen in the attached Failures.pdf.
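
For anyone who wants to repeat this comparison from the logs alone, here is a minimal sketch. It assumes the log layout shown in the excerpt above (a `* Loading target file:` line per frame, followed by `*** PCL Exception:` when the frame fails). The function name and the shortened sample logs are made up for illustration.

```python
import re

# Matches the path on a "* Loading target file: <path>" line, as in the excerpt above.
TARGET_RE = re.compile(r"\* Loading target file:\s*(.+?)\s*$")


def failed_targets(log_text):
    """Return the set of target paths whose registration hit a PCL exception."""
    failed = set()
    current = None
    for line in log_text.splitlines():
        match = TARGET_RE.search(line)
        if match:
            current = match.group(1)
        elif "*** PCL Exception" in line and current is not None:
            failed.add(current)
    return failed


# Hypothetical two-frame logs, shortened to the lines the parser looks at.
run_1 = """\
[2024-12-31 06:08:19] * Loading target file: /data/frame_107_B.xisf
[2024-12-31 06:08:19] *** Error:
[2024-12-31 06:08:19] *** PCL Exception:
[2024-12-31 06:08:20] * Loading target file: /data/frame_108_B.xisf
[2024-12-31 06:08:21] Generating registered image
"""
run_2 = """\
[2025-01-01 06:08:19] * Loading target file: /data/frame_107_B.xisf
[2025-01-01 06:08:20] Generating registered image
[2025-01-01 06:08:20] * Loading target file: /data/frame_108_B.xisf
[2025-01-01 06:08:21] *** Error:
[2025-01-01 06:08:21] *** PCL Exception:
"""

failed_1 = failed_targets(run_1)
failed_2 = failed_targets(run_2)
common = failed_1 & failed_2
print("run 1:", sorted(failed_1))
print("run 2:", sorted(failed_2))
print("failed in both:", sorted(common))
```

An empty or near-empty intersection across runs on identical input is what you would expect if the failures do not depend on the data.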

Screenshot 2025-01-01 at 15.38.27.png


Screenshot 2025-01-01 at 15.09.04.png
 


so it seems there must still be something wrong with the emitted x64 code in 1.9.2. not promising. i haven’t tried 1.9.2 ‘in anger’ yet, i’ve only done some messing around with MGC. so i can’t say if i see the same thing on my M1 machines.
 
"PCL Exception" is a particularly unhelpful report, isn't it!

... is there really no more context it can give us?
 
These are clearly I/O errors. We cannot reproduce them on our working and testing machines on any supported platform, including up-to-date macOS 14 and 15. We'll keep trying, of course, but given the amount, intensity, and extent of the tests we are doing (especially on macOS), I doubt we'll be able to reproduce this problem with our machines and test data sets.

The "PCL Exception" error message does not provide additional information because our code expects to gather that information from the system, but there is no error code (e.g., through the errno global variable), and no additional data is being received in an unexpected way. This is an abnormal situation that cannot be reproduced on our machines. From my experience, this is, with a high probability, a conflict with some third-party application, which is out of our control.
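
For readers unfamiliar with the mechanism described above: when a system call fails, the operating system normally leaves an error code in `errno`, and the application turns it into a readable message. Below is a rough Python sketch of that pattern. It is not PixInsight's actual code, and `describe_failure` is a hypothetical helper. It also shows the degenerate case where no code is available and only a bare generic message can be produced.

```python
import errno
import os
import tempfile


def describe_failure(exc):
    """Build a message from the OS error code, or fall back to a generic one."""
    code = getattr(exc, "errno", None)
    if code:
        return f"I/O error {errno.errorcode.get(code, code)}: {os.strerror(code)}"
    # No error code available: nothing more specific can be reported.
    return "Exception:"


# Normal case: the OS reports why the call failed (here, a missing directory).
with tempfile.TemporaryDirectory() as scratch:
    missing = os.path.join(scratch, "no-such-directory", "output.xisf")
    try:
        open(missing, "rb")
    except OSError as exc:
        with_code = describe_failure(exc)

# Abnormal case: an exception that carries no error code at all.
without_code = describe_failure(RuntimeError())

print(with_code)
print(without_code)
```

The second case mirrors what the log shows: an exception was raised, but there is nothing to print after the colon.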

To understand what happens, we need a data set where this can be reproduced, along with your exact WBPP configuration. If you can upload it, we'll be glad to repeat the process on our machines. If it cannot be reproduced, then it is machine-specific.
 
so it seems there must still be something wrong with the emitted x64 code in 1.9.2. not promising. i haven’t tried 1.9.2 ‘in anger’ yet, i’ve only done some messing around with MGC. so i can’t say if i see the same thing on my M1 machines.
This is pure speculation unless proven, and it does not help build confidence in our work and our software in a generalized way without justification. If there were invalid executable code in our binaries, we would have detected it in our intensive tests at this point, and it would be a severe bug in Apple's Clang C++ compiler. This is not what is happening here by any means.
 
Juan, I will try to isolate this further and if not productive can upload the requested 14 GB of data.

Can you tell me generally what the code is doing between these two ImageRegistration outputs? In particular, are operations performed on data in the swap directories?

Code:
[2025-01-02 05:21:07] * Best distortion model with 9898 star pair matches and residual = 0.0081 px
[2025-01-02 05:21:07] Generating registered image
 
This is pure speculation unless proven, and it does not help build confidence in our work and our software in a generalized way without justification. If there were invalid executable code in our binaries, we would have detected it in our intensive tests at this point, and it would be a severe bug in Apple's Clang C++ compiler. This is not what is happening here by any means.

oh really? like you released 1.9.0 and 1.9.1 with a similar problem and then proceeded to mark all the bugs as “cannot reproduce”, gaslighting your customers as usual?

you guys can ignore user reports all day and blame “machine specific” problems but in your heart i know you know there are real problems that for some reason, you are uninterested in solving.
 
It sure looks to me like PI isn't fully compatible with MacOS and/or Mac hardware. It works if you have just the right setup, but can go wrong in a lot of different ways if you don't have things exactly right. And the trend over the last few years is that stability has deteriorated in Macs, even as PI itself has steadily improved and has remained very robust in both Linux and Windows.

I'm guessing that you, like me, start reading a post about certain problems, and the first thing that you think is "ah... I'll bet they're running on a Mac". And then see that confirmed by the end of the post. (Ignoring, of course, the regular operator error issues associated with things like Gaia configuration.)
 
Juan, I will try to isolate this further and if not productive can upload the requested 14 GB of data.

Can you tell me generally what the code is doing between these two ImageRegistration outputs? In particular, are operations performed on data in the swap directories?

Code:
[2025-01-02 05:21:07] * Best distortion model with 9898 star pair matches and residual = 0.0081 px
[2025-01-02 05:21:07] Generating registered image

Nothing relevant happens between these two messages besides the code that checks for successful image registration and writes the messages to the console. After the "Generating..." message, the process interpolates the registered image and writes it to the output directory. This is where the problem happens, and we cannot reproduce it. Have you tried selecting a different output directory, BTW?
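
One way to test the output directory independently of PixInsight is to write files of similar size there and read them back. This is a minimal sketch that assumes nothing about how PixInsight itself writes files. `write_and_verify` is a hypothetical helper, and the demo uses small files in a temporary directory so that it runs anywhere.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes, count):
    """Write `count` files of `size_bytes` each, read them back, return failures."""
    failures = []
    payload = os.urandom(size_bytes)
    expected = hashlib.sha256(payload).hexdigest()
    for index in range(count):
        path = os.path.join(directory, f"io_test_{index:03d}.bin")
        try:
            with open(path, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # force the data out to the device
            with open(path, "rb") as handle:
                actual = hashlib.sha256(handle.read()).hexdigest()
            if actual != expected:
                failures.append((path, "checksum mismatch"))
        except OSError as exc:
            failures.append((path, f"errno {exc.errno}: {exc.strerror}"))
        finally:
            # Clean up the test file whether or not the check succeeded.
            if os.path.exists(path):
                os.remove(path)
    return failures


# Demo with small files in a temporary directory. For a real check, point
# `directory` at the WBPP output folder and use the frame size from the log
# (9576 * 6388 * 4 bytes for one Float32 channel, about 245 MB).
with tempfile.TemporaryDirectory() as scratch:
    problems = write_and_verify(scratch, size_bytes=1024 * 1024, count=5)
print(problems)
```

If a loop like this reports sporadic errors on one volume and none on another, that would point to the storage path rather than to registration.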

Rest assured that if we manage to find a bug in our code causing these errors, we'll do our best to solve it as diligently and efficiently as possible. Contrary to what is being said here, we take our work and our responsibilities seriously on all supported platforms, including macOS, of course.
 
When the error occurs the "Generating..." message does not appear in the log. Does buffered log content get lost when there is an exception?
 
It sure looks to me like PI isn't fully compatible with MacOS and/or Mac hardware. It works if you have just the right setup, but can go wrong in a lot of different ways if you don't have things exactly right. And the trend over the last few years is that stability has deteriorated in Macs, even as PI itself has steadily improved and has remained very robust in both Linux and Windows.

I'm guessing that you, like me, start reading a post about certain problems, and the first thing that you think is "ah... I'll bet they're running on a Mac". And then see that confirmed by the end of the post. (Ignoring, of course, the regular operator error issues associated with things like Gaia configuration.)

i really don't think there are any actual mac hardware or software issues. like most other people here with macs have reported, they are incredibly stable with other software. i never, ever, have to reboot my machines except to install security updates. application hangs are very unusual and when they do happen the OS is not affected by them. there's one vendor for the bios. there's one vendor for the CPU (well, two, but that's rapidly coming to an end). there's one vendor for the system. there's one vendor for 99% of drivers. these systems are incredibly stable.

more likely one big problem lies in the fact that by necessity, PI is built on top of a 3rd party GUI toolkit. any bugs there manifest in the application. (and as i've seen over the years, there have been many of them, some of which juan just had to work around)

a 2nd complication is of course the rosetta2 binary translation layer. for the most part this just works, but clearly it can have problems too. it doesn't implement AVX2/FMA, forcing juan to make an exception while building the macosx version of PI. but at this point every other piece of software i use is now ARM-native, except for one piece of security camera abandonware. PI really should be ARM native by now but as we know that hasn't happened either.
 
When the error occurs the "Generating..." message does not appear in the log. Does buffered log content get lost when there is an exception?
If the buffered log content is being generated in a running thread, it can be lost, especially if the exception is unexpected and not correctly handled, as seems to be the case here.
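
To illustrate the general effect (this is a toy model, not PixInsight's console code; `BufferedLog` and `register` are hypothetical names): a worker thread buffers its messages and only flushes them on the success path, so an exception raised before the flush takes the buffered lines with it unless the error handler flushes first.

```python
import threading


class BufferedLog:
    """Collects messages in memory; nothing reaches the console until flush()."""

    def __init__(self, console):
        self.pending = []
        self.console = console

    def write(self, message):
        self.pending.append(message)

    def flush(self):
        self.console.extend(self.pending)
        self.pending.clear()


def register(console, flush_on_error):
    log = BufferedLog(console)
    try:
        log.write("Generating registered image")
        # Stands in for whatever fails while the output file is written.
        raise OSError("simulated write failure")
    except OSError:
        if flush_on_error:
            log.flush()  # preserve the buffered context before reporting
        console.append("*** Error:")


def run(flush_on_error):
    console = []
    thread = threading.Thread(target=register, args=(console, flush_on_error))
    thread.start()
    thread.join()
    return console


lost = run(flush_on_error=False)
kept = run(flush_on_error=True)
print(lost)
print(kept)
```

In the first run only the error line survives, which matches a log where "Generating registered image" never appears even though that step had already started.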
 
i really don't think there are any actual mac hardware or software issues. like most other people here with macs have reported, they are incredibly stable with other software. i never, ever, have to reboot my machines except to install security updates. application hangs are very unusual and when they do happen the OS is not affected by them. there's one vendor for the bios. there's one vendor for the CPU (well, two, but that's rapidly coming to an end). there's one vendor for the system. there's one vendor for 99% of drivers. these systems are incredibly stable.

more likely one big problem lies in the fact that by necessity, PI is built on top of a 3rd party GUI toolkit. any bugs there manifest in the application.(and as i've seen over the years, there have been many of them, some of which juan just had to work around)

a 2nd complication is of course the rosetta2 binary translation layer. for the most part this just works, but clearly it can have problems too. it doesn't implement AVX2/FMA, forcing juan to make an exception while building the macosx version of PI. but at this point every other piece of software i use is now ARM-native, except for one piece of security camera abandonware. PI really should be ARM native by now but as we know that hasn't happened either.
My point is that Mac users are the most impacted. I'd guess that the user base by numbers is Windows > Mac > Linux. But once obvious operator error is removed, Mac issues seem way out of proportion to other platforms. I agree that these issues are likely a consequence of third-party tools. (We already know of a number of PI coding changes that were required to work around Qt problems.) But that doesn't change my observation that PI compatibility with Macs is rather poor. To the end user, it doesn't really matter where in the code the problem lies.
 

yes the 10,000 foot view is correct, i just don't think this is a fault in OSX. i think they are solvable problems in PI. also i think getting an ARM-native version would go a long way toward helping things, but there are definitely intel mac users that have reported problems as well.

rob
 
Oh, I'm not saying it's the fault of OSX. Not really the "fault" of anybody. Just a consequence of a combination of things that comes down mostly on Mac users.
 
John Pane has been kind enough to upload the entire data set to one of our working servers. We have been carrying out many tests today with this data set on a MacBook Pro 16-inch, 2021, Apple M1 Max, 64 GB, macOS 15.2 (24C101).

We have experienced zero issues. This data set can be preprocessed without problems with WBPP 2.8.2 on PixInsight 1.9.2 Lockhart:

Screenshot 2025-01-04 at 02.20.03.jpg


Note that the only image that cannot be registered is of very poor quality. Here is the relevant section of the log file:

Code:
* Loading target file: /Volumes/src/test/02/debayered/Light_BIN-1_9576x6388_EXPOSURE-120.00s_FILTER-NoFilter_CFA/vdb_15_Light_120_secs_047_c_d_R.xisf
[2025-01-03 19:52:41] Loading image: w=9576 h=6388 n=1 Gray Float32
[2025-01-03 19:52:41] 5 image properties
[2025-01-03 19:52:41] 95 FITS keyword(s) extracted
[2025-01-03 19:52:41] Noise reduction:
[2025-01-03 19:52:41] Structure map:
[2025-01-03 19:52:41] Local maxima map:
[2025-01-03 19:52:41] Detecting stars: done
[2025-01-03 19:52:41] Fitting 3434 stars:
[2025-01-03 19:52:41] 1105 PSF fits.
[2025-01-03 19:52:41] Minimum star size: 25 px
[2025-01-03 19:52:41] * Computing global linear transformation.
[2025-01-03 19:52:41] * Reference image: limiting to 1381 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 470 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #2
[2025-01-03 19:52:41] * Reference image: limiting to 250 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 250 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 86 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #3
[2025-01-03 19:52:41] * Reference image: limiting to 125 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 125 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 44 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #4
[2025-01-03 19:52:41] * Reference image: limiting to 60 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 60 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 25 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #5
[2025-01-03 19:52:41] * Reference image: limiting to 30 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 30 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 8 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #6
[2025-01-03 19:52:41] * Reference image: limiting to 15 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 15 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] ** 0 star pair matches found - need at least 8 matched stars.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #7
[2025-01-03 19:52:41] * Reference image: limiting to 8 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 8 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] ** 0 star pair matches found - need at least 8 matched stars.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #8
[2025-01-03 19:52:41] * Reference image: limiting to 500 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 500 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 163 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #9
[2025-01-03 19:52:41] * Reference image: limiting to 1000 brightest stars.
[2025-01-03 19:52:41] * Target image: limiting to 1000 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 338 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #10
[2025-01-03 19:52:41] * Reference image: limiting to 1381 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 470 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #11
[2025-01-03 19:52:41] * Reference image: limiting to 1381 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 470 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #12
[2025-01-03 19:52:41] * Reference image: limiting to 1381 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 470 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41] * Previous attempt failed - this is try #13
[2025-01-03 19:52:41] * Reference image: limiting to 1381 brightest stars.
[2025-01-03 19:52:41] Matching stars: done
[2025-01-03 19:52:41] 470 putative star pair matches.
[2025-01-03 19:52:41] Performing RANSAC: done
[2025-01-03 19:52:41] ** RANSAC: Unable to find a valid set of star pair matches.
[2025-01-03 19:52:41]
[2025-01-03 19:52:41] *** Error:
[2025-01-03 19:52:41] *** Error: Unable to find an initial linear transformation.
[2025-01-03 19:52:42]
[2025-01-03 19:52:42] * Applying error policy: Continue on error.

This is a comparison of the detected stars on the registration reference frame and the frame that has failed:

Desktop2.jpg


So the failure is perfectly justified. There is an obvious calibration problem in the red channel of this data set, besides the fact that the frame in question has poor tracking and focus. We have repeated this test twice on macOS 15.2 and also twice on one of our Linux servers limited to 80 processor cores, with the same result.

These problems are machine-specific and cannot be reproduced. I suggest trying with the latest build 1633 that we released a few hours ago. If there are issues with Qt 6.8.1 on macOS (as the recent reports seem to indicate), the problems experienced here could be a weird consequence.
 