PixInsight Forum (historical)
PixInsight => Bug Reports => Topic started by: johnpane on 2019 January 06 07:46:45
-
PixInsight Core 01.08.06.1448 Ripley (x64)
DrizzleIntegration has crashed three times today while I was trying to integrate 182 subframes. The PixInsight UI disappears completely so I did not see what the status was the first two times. The third time, I ran a screen recorder and the last frame before the UI disappeared showed that DrizzleIntegration was 34% finished "integrating pixels" of frame 106 of 182. A screen shot is attached.
Subsequently, I integrated a smaller set of subframes including frame 106 and there was no crash, suggesting it is not a corruption of that one file.
I saved two crash logs and they did not show identical traces.
One reported:
Crashed Thread: 0 CrBrowserMain Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGABRT)
Exception Codes: EXC_I386_GPFLT
Exception Note: EXC_CORPSE_NOTIFY
The other:
Crashed Thread: 25 Dispatch queue: sync queue: vRefNum = 0
Exception Type: EXC_BAD_ACCESS (SIGABRT)
Exception Codes: KERN_INVALID_ADDRESS at 0x00007fa21f9b7388
Exception Note: EXC_CORPSE_NOTIFY
Full crash dumps attached.
And, here is some information about the computer and OS:
Model Name: MacBook Pro
Model Identifier: MacBookPro15,1
Processor Name: Intel Core i7
Processor Speed: 2.2 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 9 MB
Memory: 32 GB
System Software Overview:
System Version: macOS 10.14.2 (18C54)
Kernel Version: Darwin 18.2.0
More than 50% of the 32GB memory was reported to be free throughout the integration process, according to a tool I run in my menubar.
-
I forgot to include the instance source code in the prior post. It is attached here.
-
Hi John,
EXC_I386_GPFLT is a general protection fault, which is a very vague concept. Unfortunately, this does not help at all since this error may be caused by many things. The backtraces don't help either, since they are completely incoherent.
I can't reproduce anything similar to this with our test data sets. I need a data set where this is reproducible. Can you upload it to something like Dropbox for example? I realize this is a very large data set, but please realize that I need a way to reproduce the same issue in order to understand and fix it. Sorry for the inconvenience!
-
Ok, I am uploading the xdrz files and the image files they reference. Will that be sufficient?
That will take 3-4 hours and I'll send a link after the upload is complete.
-
John, thank you so much. I understand this is a real pain, but believe me we have no other way to analyze these problems.
Yes, I think the .drz and image files will be enough. The data you are going to upload must be able to reproduce the problem. If I can reproduce it, then you can be sure I'll work as hard as necessary to understand and fix it.
By the way, the other issue with white balancing of DSLR raw frames (along with other problems with the new RAW module) is now completely fixed. The fix will be included in the next version of PixInsight, which I'll release in a few days.
-
Great news about the white balance in the new RAW module!
You should be able to access the data to reproduce this problem at:
https://drive.google.com/open?id=1OvLhAoyb03JKN9N-6c-DWWxU1u9iD5F1
Please let me know once you have downloaded them so I can recover the storage space.
Thanks,
John
-
Hi John,
Thank you for uploading this data set. Bug confirmed: I have been able to reproduce the issue on macOS, so I'll work to fix it immediately. Tomorrow I'll try to reproduce it on Linux. You can remove the files from your Google Drive account when you want.
-
Thanks so much, Juan!
-
Thanks to you John for helping Juan. This is really useful for the rest of us, users of PI. :)
-
Hi John,
I've got good and bad news. The good news first: There is no bug. I have carried out this drizzle integration of 182 CFA frames without problems several times on the following machines:
- Red Hat Enterprise Linux 7.4 workstation (Xeon E5-2695 v2 @ 2.40GHz, 64 GB RAM)
- MacBook Pro (15-inch, 2018) with macOS 10.14.2 (Core i9-8950HK @ 2.90GHz, 32 GB RAM)
- iMac 27" 5K late 2015 with Mac OS X 10.11.6 (Core i7-6700K @ 4.00GHz, 32 GB RAM)
- iMac 27" 5K late 2015 with Windows 10 Pro running on a BootCamp partition (Core i7-6700K @ 4.00GHz, 32 GB RAM)
- iMac 27" late 2012 with macOS 10.14.1 (Core i7-3770 @ 3.40GHz, 32 GB RAM)
In all cases the same process has been performed without any problems with the data you have uploaded. However, the MacBook Pro has been problematic. When I ran the process yesterday it failed in a way very similar to what you have reported. The same happened this morning, when I noticed that the machine became quite hot during the process, while the fan was running at high speed. That made me suspect a thermal issue. Indeed, overheating seems to be the culprit.
In my previous two failed tests the laptop was placed directly over a wooden table. Obviously, this makes heat dissipation difficult for a machine with such a thin case. As you can see in the attached screenshots, I placed a book under the machine in order to allow for some space for the air to circulate. Then the process executed without problems.
So the bad news now. In my opinion, it is rather obvious that the new MacBook Pro models, especially those with powerful processors, have thermal dissipation issues. These issues may become serious during long processes that execute on all processor cores intensively. Apparently, when excessive heat becomes problematic, some sort of instability arises that may cause very strange problems like the one we have reproduced here. Definitely, these little pretty machines are not the most appropriate to perform these heavy tasks.
If you wish, I can upload a project with the result of drizzle integration performed with your 182 CFA frames. Nice shot of the Heart Nebula, by the way!
So the bottom line is: keep your machines cool! 8)
-
Wow, amazing, and very disappointing regarding the hardware fault. My primary motivation for getting such a powerful laptop was for image processing.
I will experiment with allowing more airflow and adjusting some of the multithreading settings.
In the meantime, yes, I would like a copy of the integrated image since I might not be able to accomplish it easily myself.
Thanks Juan!
-
P.S. I wonder if we should file some kind of collective bug report with Apple? If it is just one person it might be hard to convince them there is a problem, but if we document multiple cases maybe Apple will fix it. They have already made some adjustments to the thermal throttling settings (due to excess throttling when these machines were first released) and they may be able to tune these further. (Otherwise, they should offer a replacement program for affected owners.)
-
Hi Juan,
I tried elevating the computer with a book, like you showed in your photo. The integration proceeded further, to about frame 172 of 182, but then it crashed again. I wonder if you have ideas of how best to slightly tweak PI settings? Do you think going from 12 threads (on a 6-core processor) to some lesser number, or decreasing thread priority from "Highest" or something else might slightly reduce the thermal load without compromising performance too much?
Also, I am curious if you think this problem might be related to a small subset of PI processes? It is apparent that DrizzleIntegration parallelizes within subframe (where there may be less I/O or other callouts to the kernel) while many other processes parallelize across subframes. Do you think that can be related to where the problem is most likely to appear?
Finally, are you aware of anyone else reporting problems with DrizzleIntegration?
Thanks,
John
-
John, my old MacBook Pro died on me, so I will be replacing it with something similar to yours. I was hoping to put this off until they saw sense and offered a version without the touch bar, a real keyboard, SD-slot etc. I may opt for the 2.6GHz processor rather than the 2.9 given your experience.
Not much point in having a fast processor if you can't really use it.
The Apple hack of throttling the CPU is just that - a hack. Unfortunately, they seem to be firmly in the form over function camp where thin is the overriding design principle.
I don't think changing the priority is going to help. Unless you have something else running.
You don't seem to be able to define CPU thread groups specific to individual processes, they take the approach that the kernel knows best.
It *may* be worth opening a support ticket with Apple.
-
@ppeake, look at the specs at the bottom of my original post. My machine with this issue has a 2.2 GHz processor.
-
Hi John,
No, this problem has not been reported before. However, I can easily reproduce it on my MBP placed directly on a table and allowing the process to use all available processor cores.
I have just completed a new test. I have integrated the 182 images again without problems after limiting the maximum number of processor cores to 4 (Edit > Global Preferences > Parallel Processing and Threads > Maximum number of processors used = 4, then Apply Global or F6). With just 4 processor cores, the task has required 1 hour and 18 minutes to complete. With 12 cores it required 52 minutes. This is a 50% time increase. During the drizzle integration, the machine has not perceptibly heated up and the fan has operated audibly only at times. I have placed a small wooden wedge of one centimeter in the back to facilitate ventilation. I'll repeat the same test with 6 processor cores (which is the number of physical cores in my machine).
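As a quick check of the figures above, the slowdown can be computed directly. This is only an illustrative Python sketch: the function and variable names are made up for the example and are not part of PixInsight.

```python
def percent_increase(baseline_minutes: float, new_minutes: float) -> float:
    """Return how much longer new_minutes is than baseline_minutes, in percent."""
    return (new_minutes - baseline_minutes) / baseline_minutes * 100.0


# Timings reported above: 52 minutes with 12 cores, 1 h 18 min with 4 cores.
t12 = 52
t4 = 60 + 18

increase = percent_increase(t12, t4)
print(f"{increase:.0f}% longer with 4 cores")  # 50% longer with 4 cores
```

The same function can be reused for any other core setting once its run time is known.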
-
Thank you for running these tests! I will be very curious what you get with 6 cores. Most of the performance penalty should be recovered but maybe specifying 6 (instead of the default of 12) cores will reduce the load enough to let the processor run slightly cooler.
-
Juan,
I just ran DrizzleIntegration with number of processor cores set to 6. The integration completed successfully and I did not do anything to enhance cooling. The computer was just sitting directly on my desk.
I noticed that the CPU frequency was actually higher than it was yesterday when I tried running this with a book propping up the back side. Yesterday, the CPU frequency peaked at about 3.08 GHz for each subframe, then throttled back to about 2.80 GHz. Today, it peaked around 3.55 GHz and throttled back to slightly over 3.00 GHz. In both cases, the temperature peaked at about 100 C each subframe, but today it settled lower once throttling took effect.
The integration completed in 1:08:06, for a rate of 22.45 seconds per frame. I did not formally measure the rate when I had cores set to 12, but had informally estimated it to be about 20 seconds. Therefore, there was not a tremendous speed hit from decreasing cores from 12 to 6.
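For anyone who wants to check the arithmetic, here is a minimal Python sketch. The names are illustrative only, and the 20 seconds per frame figure for 12 cores is the informal estimate mentioned above, not a measurement.

```python
def seconds_per_frame(hours: int, minutes: int, seconds: int, frames: int) -> float:
    """Average processing time per frame for a run of the given duration."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / frames


# 6-core run: 1:08:06 for 182 frames.
rate_6 = seconds_per_frame(1, 8, 6, 182)
print(f"{rate_6:.2f} s/frame")  # 22.45 s/frame

# Informal (unmeasured) estimate for the 12-core run.
estimate_12 = 20.0
slowdown = (rate_6 / estimate_12 - 1) * 100
print(f"about {slowdown:.0f}% slower than the 12-core estimate")  # about 12% slower
```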
I am guessing that when the computation is cpu-bound, using hyperthreading actually increases the thermal load due to the necessity to swap back and forth between the two processes per core. On the other hand, if the computation has an I/O bottleneck this might not be the case. It is possible the performance hit from reducing the cores by half will be seen in other processes that may not be so cpu-bound.
John
-
100C! generally commercial silicon (non-milspec) would be designed for the slow-slow corner to still meet timing at a junction temperature of 85C. obviously there is always some headroom since the silicon is probably more typical than slow-slow, and intel may not follow the same methodology, but i'm not surprised to hear that some path failed @ 100C.
sounds like apple may still have some bugs in the thermal management stuff.
rob
-
Hi folks,
I gave up trying to rely on laptop machines for any serious kind of number-crunching - in my experience they have all failed (eventually) due to thermal issues.
Desktop machines - with plenty of internal air space and lots of forced ventilation - have been what I have been relying on now for almost a decade. Laptops? A total waste of money, and energy, in my opinion - unless your needs are of the simplest variety.
As to raising a help ticket with Apple - I'd love to hear how you get on with that (you are probably 100 times more likely to get a result from Apple than you would from Microsoft - but 100 times 0 is still 0)
-
i have an 18 core desktop that exhibits the same drizzle crash, but not on all drizzle integrations. I'm going to try to throttle to 6 cores and see what happens
Brian
-
UPDATE: reducing PI to 6 cores seems to correct it here too
Brian
-
Hi,
I have the same problem: crashes with LocalNormalization and DrizzleIntegration on 150 frames.
I am running PI on an i7 8700k with 12 logical processors.
It's a desktop PC with water cooling, max temp is 80°C at full load
One thing I have noticed: when I launch LN, the console shows "13 threads" instead of the standard 12
I will try limiting the threads to 10
I can share my dataset if needed
-
you might try using process lasso, which is an application that enables you to easily change CPU assignments, etc.
i should have said 6 logical processors, but i think it's really 3 cores, 6 threads
still works every time, although it's much slower :)
Brian
-
Limited to 10 cores, console shows 12 working threads, crashing
(https://i.postimg.cc/vZ4DDHwV/lnorm-10.jpg)
Limited to 8 cores, console shows 10 working threads, it works through the end
(https://i.postimg.cc/rmsqZfyt/8cores.jpg)
I believe some optimizations are needed; I hope I didn't buy this PC just to run it at reduced potential...
-
I understand, believe me.
I built a computer just to do PixInsight; it's the #1 fastest on the PixInsight Benchmark, PI Core 01.08.06.1457 (x64) (105)
Sometimes it requires a little patience :) at least we have a workaround for now
Brian
-
have you tried prime95 to see if that crashes your machine as well? PI is very taxing on a computer's resources.
how about memtest86+ ? maybe you have a bad dimm or something?
rob
-
Yes, and OCCT as well. I have no crashes with either.
-
This is the dataset if anyone wants to test it:
https://www.dropbox.com/s/ku2e0chmfux52hr/LNORM%20CRASH.rar?dl=0
-
This case is interesting because it generalizes the problem beyond macOS, laptops, and high CPU temperatures. It begins to suggest there may be an issue with the i7 processor itself.
-
yes the strange thing is that i have very suboptimal cooling on my i7-8700k and cpu temps regularly reach 90C under load and yet i have never had a crash.
rob