Topic: Reproducible crash in DrizzleIntegration

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Reproducible crash in DrizzleIntegration
« on: 2019 January 06 07:46:45 »
PixInsight Core 01.08.06.1448 Ripley (x64)

DrizzleIntegration has crashed three times today while I was trying to integrate 182 subframes. The PixInsight UI disappears completely, so I did not see what the status was the first two times. The third time, I ran a screen recorder, and the last frame before the UI disappeared showed that DrizzleIntegration was 34% finished "integrating pixels" of frame 106 of 182. A screenshot is attached.

Subsequently, I integrated a smaller set of subframes including frame 106 and there was no crash, suggesting it is not a corruption of that one file.

I saved two crash logs and they did not show identical traces.

One reported:
    Crashed Thread:        0  CrBrowserMain  Dispatch queue: com.apple.main-thread

    Exception Type:        EXC_BAD_ACCESS (SIGABRT)
    Exception Codes:       EXC_I386_GPFLT
    Exception Note:        EXC_CORPSE_NOTIFY

The other:
    Crashed Thread:        25  Dispatch queue: sync queue: vRefNum = 0

    Exception Type:        EXC_BAD_ACCESS (SIGABRT)
    Exception Codes:       KERN_INVALID_ADDRESS at 0x00007fa21f9b7388
    Exception Note:        EXC_CORPSE_NOTIFY
 
Full crash dumps attached.
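
For readers comparing crash reports in the same way, here is a minimal sketch that pulls the header fields quoted above out of two excerpts and lists which ones differ. The helper names (`parse_crash_header`, `differing_fields`) are hypothetical, and only the header lines quoted in this post are used, not the full dumps.

```python
# Header fields of interest, as they appear in the excerpts quoted above.
FIELDS = ("Crashed Thread", "Exception Type", "Exception Codes", "Exception Note")


def parse_crash_header(text):
    """Return a dict of the known header fields found in a crash report excerpt."""
    header = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            header[key.strip()] = value.strip()
    return header


def differing_fields(a, b):
    """List the fields whose values differ between two parsed headers."""
    return [f for f in FIELDS if a.get(f) != b.get(f)]


LOG_ONE = """\
Crashed Thread:        0  CrBrowserMain  Dispatch queue: com.apple.main-thread
Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       EXC_I386_GPFLT
Exception Note:        EXC_CORPSE_NOTIFY
"""

LOG_TWO = """\
Crashed Thread:        25  Dispatch queue: sync queue: vRefNum = 0
Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0x00007fa21f9b7388
Exception Note:        EXC_CORPSE_NOTIFY
"""

one = parse_crash_header(LOG_ONE)
two = parse_crash_header(LOG_TWO)
print(differing_fields(one, two))  # → ['Crashed Thread', 'Exception Codes']
```

Crashes that land in a different thread with a different exception code on each run are consistent with the "not identical traces" observation above; the sketch says nothing about the cause.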

And, here is some information about the computer and OS:

Model Name:   MacBook Pro
  Model Identifier:   MacBookPro15,1
  Processor Name:   Intel Core i7
  Processor Speed:   2.2 GHz
  Number of Processors:   1
  Total Number of Cores:   6
  L2 Cache (per Core):   256 KB
  L3 Cache:   9 MB
  Memory:   32 GB
System Software Overview:
  System Version:   macOS 10.14.2 (18C54)
  Kernel Version:   Darwin 18.2.0

More than 50% of the 32GB memory was reported to be free throughout the integration process, according to a tool I run in my menubar.


Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #1 on: 2019 January 06 08:04:00 »
I forgot to include the instance source code in the prior post. It is attached here.

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Reproducible crash in DrizzleIntegration
« Reply #2 on: 2019 January 06 10:19:21 »
Hi John,

EXC_I386_GPFLT is a general protection fault, which is a very vague concept. Unfortunately, this does not help at all since this error may be caused by many things. The backtraces don't help either, since they are completely incoherent.

I can't reproduce anything similar to this with our test data sets. I need a data set where this is reproducible. Can you upload it to something like Dropbox for example? I realize this is a very large data set, but please realize that I need a way to reproduce the same issue in order to understand and fix it. Sorry for the inconvenience!
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #3 on: 2019 January 06 10:32:53 »
Ok, I am uploading the xdrz files and the image files they reference. Will that be sufficient?

That will take 3-4 hours and I'll send a link after the upload is complete.

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Reproducible crash in DrizzleIntegration
« Reply #4 on: 2019 January 06 10:54:13 »
John, thank you so much. I understand this is a real pain, but believe me we have no other way to analyze these problems.

Yes, I think the .drz and image files will be enough. The data you are going to upload must be able to reproduce the problem. If I can reproduce it, then you can be sure I'll work as hard as necessary to understand and fix it.

By the way, the other issue with white balancing of DSLR raw frames (along with other problems with the new RAW module) is now completely fixed. The fix will be included in the next version of PixInsight, which I'll release in a few days.
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #5 on: 2019 January 06 14:35:14 »
Great news about the white balance in the new RAW module!

You should be able to access the data to reproduce this problem at:
    https://drive.google.com/open?id=1OvLhAoyb03JKN9N-6c-DWWxU1u9iD5F1

Please let me know once you have downloaded them so I can recover the storage space.

Thanks,
John

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Reproducible crash in DrizzleIntegration
« Reply #6 on: 2019 January 06 16:05:50 »
Hi John,

Thank you for uploading this data set. Bug confirmed: I have been able to reproduce the issue on macOS, so I'll work to fix it immediately. Tomorrow I'll try to reproduce it on Linux. You can remove the files from your Google Drive account when you want.
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #7 on: 2019 January 06 16:31:07 »
Thanks so much, Juan!

Offline Andres.Pozo

  • PTeam Member
  • PixInsight Padawan
  • ****
  • Posts: 927
Re: Reproducible crash in DrizzleIntegration
« Reply #8 on: 2019 January 07 02:37:35 »
Quote from johnpane: "Thanks so much, Juan!"

Thanks to you, John, for helping Juan. This is really useful for the rest of us PI users.  :)

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Reproducible crash in DrizzleIntegration
« Reply #9 on: 2019 January 07 08:52:37 »
Hi John,

I've got good and bad news. The good news first: There is no bug. I have carried out this drizzle integration of 182 CFA frames without problems several times on the following machines:

- Red Hat Enterprise Linux 7.4 workstation (Xeon E5-2695 v2 @ 2.40GHz, 64 GB RAM)

- MacBook Pro (15-inch, 2018) with macOS 10.14.2 (Core i9-8950HK @ 2.90GHz, 32 GB RAM)

- iMac 27" 5K late 2015 with Mac OS X 10.11.6 (Core i7-6700K @ 4.00GHz, 32 GB RAM)

- iMac 27" 5K late 2015 with Windows 10 Pro running on a BootCamp partition (Core i7-6700K @ 4.00GHz, 32 GB RAM)

- iMac 27" late 2012 with macOS 10.14.1 (Core i7-3770 @ 3.40GHz, 32 GB RAM)

In all cases the same process has been performed without any problems with the data you have uploaded. However, the MacBook Pro has been problematic. When I ran the process yesterday it failed in a way very similar to what you have reported. The same happened this morning, when I noticed that the machine became quite hot during the process, while the fan was running at high speed. That made me suspect a thermal issue. Indeed, overheating seems to be the culprit.

In my previous two failed tests the laptop was placed directly on a wooden table. Obviously, this makes heat dissipation difficult for a machine with such a thin case. As you can see in the attached screenshots, I placed a book under the machine to leave some space for air to circulate. Then the process executed without problems.

Now the bad news. In my opinion, it is rather obvious that the new MacBook Pro models, especially those with powerful processors, have thermal dissipation issues. These issues may become serious during long processes that use all processor cores intensively. Apparently, when excessive heat becomes problematic, some sort of instability arises that may cause very strange problems like the one we have reproduced here. These pretty little machines are definitely not the most appropriate for such heavy tasks.
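
One way to watch for thermal limiting during a long run on macOS is the `pmset -g therm` command, which reports CPU limits imposed by the system. The following is a minimal sketch that parses that kind of output. The sample text is illustrative only and its exact layout is an assumption, the helper names are hypothetical, and the command reports limits, not temperatures.

```python
import re
import subprocess

# Limits assumed to be percentages, where 100 means "no limit applied".
PERCENT_LIMITS = ("CPU_Scheduler_Limit", "CPU_Speed_Limit")


def parse_therm(text):
    """Extract 'NAME = integer' pairs from `pmset -g therm` style output."""
    return {name: int(value)
            for name, value in re.findall(r"(\w+)\s*=\s*(\d+)", text)}


def is_throttled(limits):
    """True if any percentage limit is reported below 100."""
    return any(limits.get(name, 100) < 100 for name in PERCENT_LIMITS)


def read_therm():
    """Run pmset (macOS only) and parse its output; not called in this sample."""
    result = subprocess.run(["pmset", "-g", "therm"],
                            capture_output=True, text=True, check=True)
    return parse_therm(result.stdout)


# Illustrative sample, NOT captured from the machines in this thread.
SAMPLE = """\
CPU Power notify
    CPU_Scheduler_Limit  = 100
    CPU_Available_CPUs   = 12
    CPU_Speed_Limit      = 62
"""

limits = parse_therm(SAMPLE)
print(limits["CPU_Speed_Limit"], is_throttled(limits))  # → 62 True
```

Calling `read_therm()` periodically while the integration runs would show whether the speed limit drops before a crash. A limit that stays at 100 would not rule out overheating.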

If you wish, I can upload a project with the result of drizzle integration performed with your 182 CFA frames. Nice shot of the Heart Nebula, by the way!

So the bottom line is: keep your machines cool!  8)

Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #10 on: 2019 January 07 09:29:17 »
Wow, amazing, and very disappointing regarding the hardware fault. My primary motivation for getting such a powerful laptop was for image processing.

I will experiment with allowing more airflow and adjusting some of the multithreading settings.

In the meantime, yes, I would like a copy of the integrated image since I might not be able to accomplish it easily myself.

Thanks Juan!

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #11 on: 2019 January 07 09:36:56 »
P.S. I wonder if we should file some kind of collective bug report with Apple? If it is just one person it might be hard to convince them there is a problem, but if we document multiple cases maybe Apple will fix it. They have already made some adjustments to the thermal throttling settings (due to excessive throttling when these machines were first released) and they may be able to tune these further. (Otherwise, they should offer a replacement program for affected owners.)

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #12 on: 2019 January 07 18:58:48 »
Hi Juan,

I tried elevating the computer with a book, as you showed in your photo. The integration proceeded further, to about frame 172 of 182, but then it crashed again. I wonder if you have ideas on how best to tweak PI settings? Do you think going from 12 threads (on a 6-core processor) to a smaller number, or decreasing thread priority from "Highest", or something else might slightly reduce the thermal load without compromising performance too much?

Also, I am curious if you think this problem might be related to a small subset of PI processes? It is apparent that DrizzleIntegration parallelizes within subframe (where there may be less I/O or other callouts to the kernel) while many other processes parallelize across subframes. Do you think that can be related to where the problem is most likely to appear?
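
To make the distinction concrete, here is a toy sketch of the two strategies. It is not PixInsight's implementation: `process_row` is a hypothetical stand-in for per-pixel work, and Python threads are used only to show the structure (they do not run CPU-bound code in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def process_row(row):
    """Stand-in for per-pixel work on one image row: here, just a sum."""
    return sum(row)


def process_frame(frame):
    """Process every row of one frame sequentially."""
    return [process_row(row) for row in frame]


def across_frames(frames, workers):
    """Each worker takes whole frames; several frames are in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_frame, frames))


def within_frame(frames, workers):
    """Frames are handled one at a time; rows of each frame go to the workers."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:
            results.append(list(pool.map(process_row, frame)))
    return results


# Five tiny synthetic "frames" of 3 rows x 4 columns each.
frames = [[[f + r + c for c in range(4)] for r in range(3)] for f in range(5)]

# Both strategies compute the same result; only the work distribution differs.
print(across_frames(frames, 4) == within_frame(frames, 4))  # → True
```

In the within-frame layout all workers stay busy on a single frame for its whole duration, which matches the pattern described above; whether that changes the thermal load is the open question.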

Finally, are you aware of anyone else reporting problems with DrizzleIntegration?

Thanks,
John

Offline ppeake

  • Newcomer
  • Posts: 30
Re: Reproducible crash in DrizzleIntegration
« Reply #13 on: 2019 January 08 11:38:16 »
John, my old MacBook Pro died on me, so I will be replacing it with something similar to yours. I was hoping to put this off until they saw sense and offered a version without the touch bar and with a real keyboard, an SD slot, etc. I may opt for the 2.6GHz processor rather than the 2.9 given your experience.

Not much point in having a fast processor if you can't really use it.
The Apple hack of throttling the CPU is just that: a hack. Unfortunately, they seem to be firmly in the form-over-function camp, where thin is the overriding design principle.

I don't think changing the priority is going to help, unless you have something else running.
You don't seem to be able to define CPU thread groups specific to individual processes; they take the approach that the kernel knows best.

It *may* be worth opening a support ticket with Apple.

Offline johnpane

  • PixInsight Enthusiast
  • **
  • Posts: 93
Re: Reproducible crash in DrizzleIntegration
« Reply #14 on: 2019 January 08 12:08:49 »
@ppeake, look at the specs at the bottom of my original post. My machine with this issue has a 2.2 GHz processor.