New Computer System: Ryzen 9750 and Benchmarks

Newbie question, I am also considering a new desktop computer optimized for working with PixInsight. What attributes are most important? More memory? More cores? Where is the marginal benefit in processing speed? Does the graphics card matter?
I bought a Dell computer in 2017 which worked great with PI. I just got StarXTerminator and it brings my computer to its knees. Many minutes to remove stars from a single frame. Am I being too hard on myself and expecting too much?
I'd welcome any feedback.
Thanks,
Jay Landis

StarNet and the XTerminator tools take minutes to run even on a fast CPU, but with a CUDA-capable GPU you can cut that down to half a minute. It doesn't even have to be the latest and greatest; check out the capability link in my tutorial:
https://rikutalvio.blogspot.com/2023/02/pixinsight-cuda.html

You kind of need it all for PixInsight; a fast CPU will help you the most, but for preprocessing you also need lots of RAM and fast storage (nvme) if you have large datasets.
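A quick way to check whether a machine already has a CUDA-capable NVIDIA GPU (before spending days on the acceleration setup) is to query `nvidia-smi`, which ships with the NVIDIA driver. A minimal sketch, assuming only that the driver is installed; this is a generic check, not anything PixInsight-specific:

```python
import shutil
import subprocess

def cuda_gpu_name():
    """Return the name of the first NVIDIA GPU, or None if nvidia-smi
    is unavailable (i.e. no NVIDIA driver is installed)."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        return None
    names = [line.strip() for line in out.stdout.splitlines() if line.strip()]
    return names[0] if names else None

print(cuda_gpu_name())
```

If this prints `None` there is no usable NVIDIA GPU (or no driver), and the CUDA setup in the tutorial above won't help.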
 
RT, thank you. I've spent the better part of three days trying various process steps, in different order, to get GPU acceleration working. I followed your workflow, and it worked perfectly. I was trying to use the newest version of each of the files. Not a good idea. When I used the versions from your blog, it worked. Success on my desktop. Not so much on my laptop. Strange, the GPU performance in Task Manager shows it is working, but for some reason, there is no change in speed for StarXTerminator at all. Are there other possible files, linkages, etc. to fix in Pixinsight?
Jay Landis
 
Newbie question, I am also considering a new desktop computer optimized for working with PixInsight. What attributes are most important? More memory? More cores? Where is the marginal benefit in processing speed? Does the graphics card matter?
I bought a Dell computer in 2017 which worked great with PI. I just got StarXTerminator and it brings my computer to its knees. Many minutes to remove stars from a single frame. Am I being too hard on myself and expecting too much?
I'd welcome any feedback.
Thanks,
Jay Landis
As RT mentioned, PI requires a bit of everything throughout the workflow. It really depends on what types of projects you like to do. Early image-integration tasks tend to benefit from lots of cores and RAM. Post integration, it varies, but high clock speeds are likely more valuable. For now, only StarXTerminator uses the GPU.

I will say that the PI benchmark isn't challenging enough for these large-core-count CPUs. It barely gets all 64c/128t spun up on my EPYC before that part of the bench is over, and then the single-core clocks are lower, so scores tend to be suppressed relative to the value it brings to the early image-integration tasks.
 
RT, thank you. I've spent the better part of three days trying various process steps, in different order, to get GPU acceleration working. I followed your workflow, and it worked perfectly. I was trying to use the newest version of each of the files. Not a good idea. When I used the versions from your blog, it worked. Success on my desktop. Not so much on my laptop. Strange, the GPU performance in Task Manager shows it is working, but for some reason, there is no change in speed for StarXTerminator at all. Are there other possible files, linkages, etc. to fix in Pixinsight?
Jay Landis
What laptop, CPU, and GPU? A slow GPU might not be any faster than a fast CPU. Also, did you plug it in? Most laptops have a fraction of their performance when not plugged in, especially current Intel systems.
 
Newbie question, I am also considering a new desktop computer optimized for working with PixInsight. What attributes are most important? More memory? More cores? Where is the marginal benefit in processing speed? Does the graphics card matter?
I bought a Dell computer in 2017 which worked great with PI. I just got StarXTerminator and it brings my computer to its knees. Many minutes to remove stars from a single frame. Am I being too hard on myself and expecting too much?
I'd welcome any feedback.
Thanks,
Jay Landis

for now the graphics card doesn't matter much - there are 3 or 4 processes (all 3rd-party - StarNet (v1/v2) and Russ Croman's ...Xterminator modules) that can use CUDA for acceleration. so i guess an NVidia card is what you want, but no reason to really stress out about how many cores, etc.

i think the amount of memory you need depends a lot on what camera you are using and if it is an OSC. these huge, new CMOS sensors really tax main memory during WBPP processing. personally i have 128GB.

more threads/cores is always good, but there's a limit of course. my last machine had 12Core/24 threads and my current one has 20 cores, single threaded. you can look at the pixinsight benchmark database to see what performance is like by thread count and see if it's worth the money to you for high thread count machines:


the newer intel processors have this efficiency/performance core architecture and it's not clear what really happens there. it looks like PI threads get dispatched to the efficiency cores by Windows, but it's not clear whether that only happens when all the performance cores are subscribed, or the OS just treats them all as equals.
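The diminishing return from more threads that rob describes can be put in rough numbers with Amdahl's law: if a fraction p of a job parallelizes, n threads give a speedup of 1/((1-p) + p/n). A quick sketch; the 90% figure below is purely illustrative, not a measured PixInsight number:

```python
def amdahl_speedup(p, n):
    """Speedup on n threads when fraction p of the work parallelizes
    perfectly and the rest (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Assume 90% of a WBPP-like job is parallel (an illustrative guess).
for n in (4, 8, 16, 32, 64):
    print(f"{n:2d} threads: {amdahl_speedup(0.9, n):.2f}x")
```

With that assumption the speedup is capped at 10x no matter how many threads you buy, which is why going from 32 to 64 threads buys far less than going from 4 to 8 - the benchmark database scatter by thread count shows the same shape.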

depending on what graphics card you have you might be able to set up GPU acceleration for StarX even on your current computer. it's worth looking into. it is however a DIY thing; i don't think the RC astro tools support it out of the box. i don't know what's necessary since i'm using OSX and russ supports GPU/Neural Engine on OSX automatically. this thread on cloudynights does seem to have some useful info:


rob
 
For the record, I got a 3070 ti card for the system for my birthday to run a videogame (Microsoft Flight Simulator). It did not affect the benchmarks.
 
I just built a Ryzen 7950x server for PI. I run Ubuntu 22.04, not Windows. I bought 128 GB of fast RAM and a pair of fast NVMe4 SSDs. The machine cost just over $2000, and benchmarks a little over 50,000 total in PI 1.8.9-1.

I included a low-end GPU (3060) just for the Russ Croman XTerminator utilities, and sure enough those few utilities are 5x to 10x faster with CUDA acceleration. You could skip the GPU and build this machine for $1650, about the price of a minimally-specced MacBook Pro.

Normally I use Macs. I have an older Intel desktop Mac in my study with a nice big 5K monitor, and a MacBook Air that I carry around. I'm running the new machine as a server in the basement. No monitor, keyboard, mouse, or desk. I don't need to see it or hear it. It's protected by a UPS and my big image archives are served by a NAS right next to the new server. I have it on a TP-Link power plug so I don't even need to go down there to toggle power. I just remote into the server from the Macs using NoMachine, which seems to work just fine over gigabit in-home ethernet.

The best news isn't the benchmarks, it's the performance on WBPP! I just processed an image this morning with 827 subs collected on an ASI 2600 MM PRO. I chose the "no compromise" options in WBPP. All in, soup-to-nuts from 827 huge raw frames to finished masters was 95 minutes wall clock. This is at least 10x faster than my Intel desktop Mac.

WBPP.jpg
 
@airscottdenning This is interesting, as I have a 7950x-based machine too, and the Autocrop phase is always the one that takes the longest, yet yours is way faster. Any thoughts on why this might be?
 
Just ran a small set and here is my report. Wonder if I have some setting not quite right???
 

Attachments

  • Screenshot 2023-07-03 135518.jpg
Just ran a small set and here is my report. Wonder if I have some setting not quite right???
You're just processing a small number of frames, not even as many per filter as the number of threads in the CPU. So the parallel steps are extremely fast. I bet if you do 1000 frames the big time sinks are measurement, alignment, LNorm, and integration. That's just where the arithmetic lives.
 
You're just processing a small number of frames, not even as many per filter as the number of threads in the CPU. So the parallel steps are extremely fast. I bet if you do 1000 frames the big time sinks are measurement, alignment, LNorm, and integration. That's just where the arithmetic lives.
Sorry yes I get that but what I meant was your autocrop takes 26 seconds and mine took over two minutes. This is for three files in both cases so for that stage I'd assume the actual time taken would be similar?
 
Sorry yes I get that but what I meant was your autocrop takes 26 seconds and mine took over two minutes. This is for three files in both cases so for that stage I'd assume the actual time taken would be similar?
Ah I see. Autocrop needs to plate solve each integrated image. Do you have your Gaia XPSD star catalogs stored locally? I put mine (DR3 and DR3/SP) on a fast SSD in the Ryzen server, about 100 GB total. Maybe yours are on the network someplace and PixInsight has to fetch the data through a slow pipe?
 
Hi, nope they're on the same SSD as the data and the app. It's a Samsung 980 Pro as it happens. I'll double check the config but they're either there or nowhere :)
 
Autocrop needs to plate solve each integrated image.
I confess I don't know how Autocrop works (I don't see how plate solving would help). It could just estimate the cropping boundary on the basis of low SNR at the edges of the (final integrated) image, or it could process each registered sub, accurately mapping the "zero" areas left after registration and accumulating a crop that covers them all. One of these processes one image; the other processes all n images, but is likely to be much more robust. A third alternative would be for StarAlignment to generate a cropping region for each registered image and store it in the image metadata. Each of these options would have quite different processing statistics.
 
Ah I see. Autocrop needs to plate solve each integrated image. Do you have your Gaia XPSD star catalogs stored locally? I put mine (DR3 and DR3/SP) on a fast SSD in the Ryzen server, about 100 GB total. Maybe yours are on the network someplace and PixInsight has to fetch the data through a slow pipe?
I have a smaller subset of the files, probably 20GB worth so i wonder if that has any bearing on things?
 
I confess I don't know how Autocrop works (I don't see how plate solving would help). It could just estimate the cropping boundary on the basis of low SNR at the edges of the (final integrated) image, or it could process each registered sub, accurately mapping the "zero" areas left after registration and accumulating a crop that covers them all. One of these processes one image; the other processes all n images, but is likely to be much more robust. A third alternative would be for StarAlignment to generate a cropping region for each registered image and store it in the image metadata. Each of these options would have quite different processing statistics.
Of course I have no idea how it works, but on the assumption it uses all three files, then in the latter case mine and airscottdenning's autocrop times should be similar, which they're not. If it's the former, and given we have "similar" machines, then his autocrop should take longer, which it doesn't.
I can't help feeling it's a config issue on my side, so I guess I need to download all of the DR3 files (rather than my subset, which I believed to be an acceptable choice) to find out.
 
I doubt if the number of DR3 / DR3/SP files will make much difference to execution times. One obvious essential for benchmark comparisons is that the WBPP cache is purged before each run.
 
I confess I don't know how Autocrop works (I don't see how plate solving would help). It could just estimate the cropping boundary on the basis of low SNR at the edges of the (final integrated) image, or it could process each registered sub, accurately mapping the "zero" areas left after registration and accumulating a crop that covers them all. One of these processes one image; the other processes all n images, but is likely to be much more robust. A third alternative would be for StarAlignment to generate a cropping region for each registered image and store it in the image metadata. Each of these options would have quite different processing statistics.
Or it just uses the maximum +/- x and y deviations to the reference frame from alignment.
 
Or it just uses the maximum +/- x and y deviations to the reference frame from alignment.
Remember, alignment can rotate and distort each registered frame - it's not just an x, y shift. Also, I don't know if autocrop (a post-integration process) has access to the alignment data. StarAlignment is certainly the best place to determine an accurate crop for each aligned image, but how this can be passed to the later Autocrop process I don't know. Perhaps someone more familiar with the design of the WBPP workflow can tell us.
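To put a number on why rotation matters: even a small roll angle moves all four corners of a frame, so the valid region is not just the reference frame shifted in x and y. A small sketch (the sensor dimensions are only an example) computing the axis-aligned bounding box of a frame rotated about its center:

```python
import math

def rotated_bbox(width, height, angle_deg):
    """Axis-aligned bounding box of a width x height frame rotated
    about its center by angle_deg (counterclockwise)."""
    a = math.radians(angle_deg)
    cx, cy = width / 2.0, height / 2.0
    xs, ys = [], []
    for x, y in [(0, 0), (width, 0), (width, height), (0, height)]:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * math.cos(a) - dy * math.sin(a))
        ys.append(cy + dx * math.sin(a) + dy * math.cos(a))
    return min(xs), min(ys), max(xs), max(ys)

# A 1-degree roll on a 6248x4176 sensor shifts the corners by dozens
# of pixels even though the center barely moves at all.
print(rotated_bbox(6248, 4176, 1.0))
```

The corners end up outside the original frame boundary, so any per-frame crop has to account for the full transform, not just the maximum x/y offsets.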
 
When in doubt, look at the code:
Code:
// ----------------------------------------------------------------------------
/**
 * Returns the auto crop region for an image.
 * The autocrop region is computed by analyzing the low rejection map, and the crop region represents
 * the largest rectangle that includes pixels with low rejection values greater than 0.5.
 * The method assumes the low rejection map is stored in the image file; if this is not the case
 * it attempts to load the file with the postfix "_low_rejection". If none is found then no crop region
 * is computed.
 *
 * @param {*} filePath the file path of the image
 * @param {*} keepTheImageOpen if TRUE, the main image window remains open once the function returns
 * @returns the Rect defining the crop region of the image, undefined in case of errors.
 */
I recall now that this is why the rejection maps are automatically enabled if you select Autocrop. I can't see any configuration-based reason why this should run atypically slowly.
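For the curious, finding "the largest rectangle that includes pixels with low rejection values greater than 0.5" is the classic maximal-rectangle problem over a boolean mask. Here's a sketch of one standard way to solve it (the row-histogram/stack method); I'm not claiming this is PixInsight's actual implementation, just an illustration of the kind of work the step involves:

```python
def largest_true_rectangle(mask):
    """Return (area, row, col, height, width) of the largest rectangle
    of True cells in a 2D boolean mask, via the histogram/stack method."""
    if not mask:
        return (0, 0, 0, 0, 0)
    ncols = len(mask[0])
    heights = [0] * ncols   # per-column run length of True ending at row r
    best = (0, 0, 0, 0, 0)
    for r, row in enumerate(mask):
        for c in range(ncols):
            heights[c] = heights[c] + 1 if row[c] else 0
        stack = []  # (start_col, height), heights strictly increasing
        for c, h in enumerate(heights + [0]):  # sentinel 0 flushes the stack
            start = c
            while stack and stack[-1][1] >= h:
                s, sh = stack.pop()
                area = sh * (c - s)
                if area > best[0]:
                    best = (area, r - sh + 1, s, sh, c - s)
                start = s
            stack.append((start, h))
    return best

# Toy 4x5 mask: True = low-rejection value > 0.5 (pixels worth keeping).
mask = [
    [False, True,  True,  True,  False],
    [True,  True,  True,  True,  False],
    [True,  True,  True,  True,  True ],
    [False, True,  True,  False, False],
]
print(largest_true_rectangle(mask))
```

This runs in O(rows x cols), so on a full-frame rejection map the cost is a single pass over the pixels, which is consistent with the observation that there's no obvious configuration reason for it to be slow.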
 