I've been working on some CPU-intensive projects in PixInsight recently:
processing shots from 16 or even 31 MP cameras, mosaics built from 10 MP color shots, etc.
My previous rig, built around an Intel Core 2 Quad Q9650 (4 cores, no hyperthreading, 3 GHz, 8 GB of memory, Win 7 64-bit), felt "slow" on such tasks, so I decided to upgrade to a CPU that would bring more massive multiprocessing without bankrupting me...
A couple of weeks ago I upgraded to a system built around:
Intel i7 970 (6 cores with hyperthreading, i.e. 12 threads or "cores" available simultaneously, spun up to 3.9 GHz),
12 GB DDR3 RAM,
Win 7 64-bit.
First, without any magic, I measured PixInsight (PI hereafter) performance on the new system; I had grabbed some "test" results from the Q9650 in advance for comparison.
The new i7 970 system was faster by... only ~20-25%.

... 12 cores vs 4 cores, higher speed, more memory running at higher speed...

The advantage was mainly due to the higher clock speed of the i7 970...
Second, I decided to run some tests to understand how PI's performance changes as more CPU cores become available for processing.
For this I measured the time needed to perform 6 typical processing actions, using the same standard image set(s) as input.
I controlled the number of cores available to PI via the
Edit - Preferences - Parallel Processing and Threads - Maximum number of processors used parameter.
All other "Enable.." options on that preference sheet were set to ON, and priority was set to "Time Critical".
The time spent by each process was measured with 1, 2, 3, 4, 5, 6 and then 9 and 12 cores (threads) enabled.
The real load on the CPU cores was checked with the Win 7 Resource Monitor. Typically, PI used as many cores as it was allowed...
The results can be found in the graph below.
For each process, it shows the ratio of the time spent with several CPU cores in use to the time spent with 1 core.
A value below 1 means that PI spent less time and did the calculations faster using several cores; a value above 1 means that it spent
MORE time in the multi-core run compared to the baseline run on a single CPU core...
The red line represents the "ideal scalability (1/N)" curve: the ideal case where calculations and other operations are distributed equally among all available cores, i.e. ideal parallelism. It's unachievable in practice 'cause there are some read/write operations, etc. that are not parallelized, but cache, etc. has to help even here, right?
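For anyone who wants to reproduce the normalization used in the graph, here is a minimal Python sketch. The function names and the sample timings are made up for illustration only; they are not my measured values.

```python
def relative_times(timings):
    """Map {cores: seconds} to {cores: seconds / seconds_on_1_core}."""
    baseline = timings[1]  # the single-core run is the reference
    return {n: timings[n] / baseline for n in sorted(timings)}


def ideal_curve(core_counts):
    """Ideal 1/N scalability: N cores finish in 1/N of the single-core time."""
    return {n: 1.0 / n for n in core_counts}


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT measured values.
    sample = {1: 100.0, 2: 95.0, 4: 60.0, 12: 110.0}
    rel = relative_times(sample)
    ideal = ideal_curve(sample)
    for n in rel:
        print(f"{n:2d} cores: relative {rel[n]:.2f}, ideal {ideal[n]:.2f}")
```

A relative value above 1.0 is a multi-core run that was slower than the single-core baseline.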

The results are disappointing and strange, to say the least...

There is almost no advantage in going from a single core to two cores...
There is a strange process (ImageIntegration) that spends more time as the number of cores used grows...
There are some other strange processes (MaskedStretch, FFT, HDRW) that first gain from an increase in the number of cores, then suddenly start to lose this advantage, so that adding more cores leads to an increase in execution time...
At the same time, there are some "normal" processes (StarAlignment, LocalHistogramEqualization) that, as expected and as desired, consistently benefit from more and more cores being added.
That's it.
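To sort a process into one of the patterns above from its own timings, something like the following Python sketch could be used. It assumes the timings are kept as a {cores: seconds} dictionary; the helper names and the numbers are hypothetical, not my measurements.

```python
def best_core_count(timings):
    """Return the core count that gave the shortest run time."""
    return min(timings, key=timings.get)


def regressions(timings):
    """Return (fewer, more) core-count pairs where adding cores took longer."""
    counts = sorted(timings)
    return [(a, b) for a, b in zip(counts, counts[1:]) if timings[b] > timings[a]]


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT measured values.
    sample = {1: 100.0, 2: 80.0, 4: 60.0, 6: 70.0, 12: 90.0}
    print("fastest with", best_core_count(sample), "cores")
    print("slower after adding cores:", regressions(sample))
```

An empty list from regressions() corresponds to the "normal" case; a non-empty one marks where extra cores started to hurt.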

I don't know if this is something easily fixable via settings and preferences (please give me some tips) or a real issue with multicore scalability.
Anybody can run his/her own experiments, measuring the time PI spends on favorite processes with different numbers of cores enabled in PI.
For me it's a real issue 'cause the investment in the new rig hasn't brought the expected gain in performance.
Speed is important 'cause faster execution lets you spend less time processing, or try more combinations of process parameters, more processing scenarios, etc.
Parallel processing optimization is important for (astronomical) image processing software.
I believe that the skilled PI team will be able to solve this puzzle and deliver at least a 5...7 times speedup for a 12-thread CPU vs. a single core...
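To put the 5...7 times figure in perspective, here is a rough Python sketch based on Amdahl's law, S(N) = 1 / ((1 - p) + p/N), where p is the fraction of the work that runs in parallel. This is a simplified model brought in for illustration: it assumes all 12 threads are equally capable (hyperthreaded threads are not) and ignores memory and disk bottlenecks. The function names are my own.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def required_parallel_fraction(target_speedup, cores):
    """Solve 1/S = (1 - p) + p/N for p, the parallel fraction needed."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for target in (5, 7):
        p = required_parallel_fraction(target, 12)
        print(f"{target}x on 12 threads needs about {p:.1%} parallel work")
```

Under these assumptions, a 5x speedup on 12 threads needs roughly 87% of the run time to be parallelized, and 7x needs roughly 93.5%.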
