[Cannot reproduce] Image integration efficiency with huge stacks

I am having the same issue as post #39. As we speak, I have been running a ~3200-image set from an OSC ASI294MC Pro. WBPP with all defaults except split RGB channels. The script runs acceptably fast until it gets to the first normalization (the R channel). From there, it takes a day or so. When it finally gets to integration, and after three days, the console shows R channel integration at 21%, CPU usage at about 5%, and disks at 0%. RAM usage is very low too. At that point PI is frozen and I have to kill it. The PC is not frozen though.

My PC specs:
Dell Precision, Win 10 Pro
Dual Xeon E5 v2, 2 × 12 cores, 48 threads total
128 GB ECC RAM
All SSD disks
Swap folders on separate NVMe drives
PI benchmark: ~15400 total

 
You definitely need a more powerful machine to preprocess 3200 large OSC frames, especially if you want to include all features available in the WBPP script. Try with the following settings (Lights panel):

- Subframe Weighting: disabled.
- Local Normalization: disabled.
- Image Integration -> Pixel rejection algorithm = Winsorized sigma clipping.
- Image Registration -> Maximum stars = 500, distortion correction = disabled.

Other parameters with default values. This will apply a simplified preprocessing pipeline, similar to what is normally available in other applications. You'll get suboptimal but reasonable results.
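
For reference, the Winsorized sigma clipping rejection recommended above works per pixel position, on the stack of values at that position. The following is a minimal generic sketch of the idea in C++ (my own illustration, not PixInsight's actual implementation; the default thresholds and the convergence test are assumptions):

```cpp
// Generic sketch of Winsorized sigma clipping for one pixel stack.
// Not PixInsight code; it only illustrates the idea behind the setting above.
#include <algorithm>
#include <cmath>
#include <vector>

static double median(std::vector<double> v)
{
   std::sort(v.begin(), v.end());
   size_t n = v.size();
   return (n % 2) ? v[n/2] : 0.5*(v[n/2 - 1] + v[n/2]);
}

static double stddev(const std::vector<double>& v, double m)
{
   if (v.size() < 2) return 0;
   double s = 0;
   for (double x : v) s += (x - m)*(x - m);
   return std::sqrt(s/(v.size() - 1));
}

// Returns the mean of the stack values that survive rejection.
// kLow/kHigh are the clipping thresholds in sigma units (assumed defaults).
double winsorizedSigmaClip(const std::vector<double>& stack, double kLow = 4, double kHigh = 3)
{
   std::vector<double> w = stack;            // working (Winsorized) copy
   double m = median(w), s = stddev(w, m);
   for (int it = 0; it < 20 && s > 0; ++it)  // iterate until sigma stabilizes
   {
      double lo = m - kLow*s, hi = m + kHigh*s;
      for (double& x : w)                    // Winsorize: clamp outliers instead of dropping them
         x = std::min(std::max(x, lo), hi);
      double m1 = median(w), s1 = stddev(w, m1);
      bool done = std::fabs(s1 - s) < 1e-6*s;
      m = m1; s = s1;
      if (done) break;
   }
   // Average the original values that fall inside the converged bounds.
   double sum = 0; int n = 0;
   for (double x : stack)
      if (x >= m - kLow*s && x <= m + kHigh*s) { sum += x; ++n; }
   return (n > 0) ? sum/n : m;
}
```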
 
Actually, I disabled Local Normalization and Image Integration in WBPP, with the rest the same, on the same machine, and everything went fast and smooth. As I am writing this, I am running NSG on the green channel; the R channel has already been integrated.

For me, speed/time to integration is not the problem. Stability/freezing and not completing the task is. My machine has processed these large stacks many times before. I just wanted to give your new algorithms a chance.

I see plenty of complaints from paying customers about the same problem: the task not completing, even slowly. You and your team might want to look into those a little deeper. You might have heard the phrase "where there is smoke, there is fire".

Blaming customers is never a good thing in business.

Thank you.
 
I am trying the trial version of PixInsight and see the same behavior. Stacking ~2000 images took over 60 hours with a CPU usage of just 5%. It seems that the problem occurs with the combination of Windows and AMD Ryzen CPUs. Is there already a solution?
 
It would be very helpful if you could write more about your computer system.
Then we can narrow down the issue.

Cheers
Tom
 
I would like to add this thread here:
as it's also about processing large amounts of images, and Juan suggests disabling
Local Normalization there, so I assume LN is eating up a large amount of processing time.
There seems to be some smoke, as Miguel said. ;)

I'm still happy to help improve this great piece of software if I can.

Best
Christof
 
It would be very helpful if you could write more about your computer system.
Then we can narrow down the issue.

AMD Ryzen 9 5900HX, 16 GB RAM, NVIDIA GeForce RTX 3050 Ti and a 1 TB SSD.

It's not about the processing time itself, but somehow during stacking it only uses <5% of CPU power. It seems there is a bug in PI, as there are a lot of threads where Ryzen users complain. :)
 
But somehow during stacking it only uses <5% of CPU power. It seems there is a bug in PI, as there are a lot of threads where Ryzen users complain. :)
I observed the same with LN. It starts with higher CPU usage but after some time I see only a few percent utilization.

As I wrote in the other thread, it can become a challenge to handle larger numbers of parallel threads. In my own company we learned that with data sets >128 GB and workstations with more than 64 cores and therefore more than 128 threads. There can be conflicts in accessing or allocating memory; we used special memory allocation strategies besides the standard operating system routines, …. We developed our software under Windows, macOS and Linux, but this was mainly a Windows issue. Mac did not play a role here, as the core counts haven't been so high, and Linux seems to handle large core counts better than Windows. Under Windows you have to program in a special way to even be able to use more than 64 logical processors.
I'm sure the PI team knows about all this.

If complaints arise mainly from AMD CPUs, it is maybe related to the on average larger number of cores they have. It would be interesting to know whether it is also a problem on Intel multi-CPU systems with similarly large core counts.
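
To make the "more than 64 cores under Windows" point concrete: by default a process's threads are scheduled within a single group of at most 64 logical processors, and an application has to assign group affinities itself to use the rest. Here is a minimal sketch of that mechanism using the documented Win32 calls (a generic illustration under my own assumptions, not how PixInsight actually schedules its threads):

```cpp
// Sketch: spreading worker threads across Windows processor groups.
// On machines with more than 64 logical processors, Windows splits them into
// groups of up to 64; threads run in one group unless told otherwise.
#define _WIN32_WINNT 0x0601   // processor-group APIs need Windows 7 or later
#include <windows.h>
#include <thread>
#include <vector>

static void pinCurrentThreadToGroup(WORD group)
{
   GROUP_AFFINITY ga = {};
   ga.Group = group;
   DWORD n = GetActiveProcessorCount(group);   // logical processors in this group
   ga.Mask = (n >= 64) ? ~KAFFINITY(0) : ((KAFFINITY(1) << n) - 1);
   SetThreadGroupAffinity(GetCurrentThread(), &ga, nullptr);
}

int main()
{
   WORD groups = GetActiveProcessorGroupCount();
   DWORD total = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
   std::vector<std::thread> workers;
   for (DWORD i = 0; i < total; ++i)
   {
      WORD group = static_cast<WORD>(i % groups);   // round-robin across groups
      workers.emplace_back([group]
      {
         pinCurrentThreadToGroup(group);
         // ... per-thread work would go here ...
      });
   }
   for (auto& t : workers)
      t.join();
}
```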
 
If complaints arise mainly from AMD CPUs, it is maybe related to the on average larger number of cores they have. It would be interesting to know whether it is also a problem on Intel multi-CPU systems with similarly large core counts.
Could be, but I don't have another system with Intel running. I have found about 5 or 6 threads complaining since 2019 (I think there was an update from x.8 to x.11 or so), and all are exclusively running AMD Ryzen CPUs on Windows. I am wondering why nobody from the PI team is debugging this error.
 
Could be, but I don't have another system with Intel running. I have found about 5 or 6 threads complaining since 2019 (I think there was an update from x.8 to x.11 or so), and all are exclusively running AMD Ryzen CPUs on Windows. I am wondering why nobody from the PI team is debugging this error.
We don't know they are not. Nor that this should really be characterized as an "error". But that said... a lot of PI processes would be better served by GPU support than by added CPU cores, so maybe there's more focus there? And realistically, 8-16 core machines represent the overwhelming majority of users, so in prioritizing tasks, support for a large number of threads (which might be very difficult to implement) is perhaps a lower priority than important new features? I know I'd much rather see enhancements to WBPP, better photometry tools, and a new background removal system (all of which have been stated to be in the works) than I would support for efficient use of many threads. Pleiades isn't Microsoft, with a development staff of thousands!
 
AMD and Intel processors have different architectures. Windows provides a multi-core, multi-thread API that "virtualises" (hides) the details of these hardware-specific interfaces. Assuming PixInsight uses the standard Windows API, this could be a Windows/AMD issue, not a PI/AMD issue.
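
For illustration, this is roughly how an application sees the CPU through that standard API, with no AMD- or Intel-specific code involved (a generic sketch; whether PixInsight queries topology this way is the same assumption made above):

```cpp
// Sketch: querying CPU topology through the vendor-neutral Windows API.
#define _WIN32_WINNT 0x0601
#include <windows.h>
#include <cstdio>
#include <vector>

int main()
{
   // First call reports the required buffer size, second call fills it.
   DWORD bytes = 0;
   GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &bytes);
   std::vector<char> buffer(bytes);
   if (!GetLogicalProcessorInformationEx(RelationProcessorCore,
          reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.data()), &bytes))
      return 1;

   int cores = 0, smtCores = 0;
   for (DWORD offset = 0; offset < bytes; )
   {
      auto* info = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.data() + offset);
      ++cores;
      if (info->Processor.Flags & LTP_PC_SMT)   // this core exposes more than one logical processor
         ++smtCores;
      offset += info->Size;
   }
   std::printf("physical cores: %d (SMT-capable: %d), logical processors: %lu\n",
               cores, smtCores, (unsigned long)GetActiveProcessorCount(ALL_PROCESSOR_GROUPS));
   return 0;
}
```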
 
… than I would support for efficient use of many threads.
My company has been in the same situation. Features over fundamental architectural improvements were the preferred choice; however, sooner or later architectural shortcomings hit you even harder. In our case we luckily reacted soon enough, and that was at a time when we also didn't have thousands of developers, but 10. In our case, typical data set sizes started at 2 GB, when we still used 32-bit hardware only. Then data sets became 16 GB and finally 128 GB in size, and customers wanted to process several of them together. Things are similar here. Not too long ago we came from 1 Mpixel, and now we see 60 Mpixel or even 100 Mpixel sensors marketed as successors of the famous 16803 cameras, and, even more of a problem, we see shorter and shorter integration times per frame and therefore larger numbers of frames. If 8-16 cores is the standard today, users will buy more cores with their next hardware update very soon, as this is what we have seen over many years.
Processing already takes longer than generating the data. Nobody wants to wait 48 hours to process the data he generates in one night. So I would not wait too long before looking into the architecture.
 
I agree that the shift will primarily involve changes in hardware more than anything. But whether we actually see wide adoption of core counts above 16 or so remains to be seen. These very high core machines are intended mainly for servers, and only provide limited improvements for the vast majority of desktop applications (which usually don't parallelize very well). I wouldn't be surprised to see the focus continue to be on GPU utilization for such tasks, meaning that large core count CPUs may never be very common. Time will tell.
 
I don't care if they are using all cores in parallel or just a single one, but stacking 2000 images with just 5% of the CPU takes ages.
 
If complaints arise mainly from AMD CPUs, it is maybe related to the on average larger number of cores they have. It would be interesting to know whether it is also a problem on Intel multi-CPU systems with similarly large core counts.

My system is a Dell T7610 with dual 12-core Intel Xeon E5-2697 v2 CPUs @ 2.70 GHz, 128 GB of ECC RAM, and an NVIDIA GeForce GTX 1660 SUPER running CUDA 11.8. Both PI and Win 10 are fully up to date.
It has been almost a year since I posted here, and the problem persists. This weekend (running as we speak) I am trying to integrate a stack of 1400 images. Everything goes well, but in the middle of LN, after 4 hours, WBPP crashes PI with no error message and nothing about it in the log file.

Earlier this week, I integrated two other projects with ~800 images on the same computer and it all worked just fine.

In both cases, RAM usage never exceeded 50 GB out of the 128 GB installed.
 
I now have 500 GB of ECC RAM installed, and the problem persists with Image Integration even on the latest version. LN seems to have been fixed.
 
I don't have anything to add other than that I'm struggling with this too! I've noticed it on both AMD and Intel large-core-count systems with large numbers of subs.
 