Author Topic: PixInsight Benchmark (Read 64246 times)

chemstock1 · « **Reply #75 on:** 2014 June 01 15:05:43 »

Good test, maybe the reason why my coffee does not get cold on some processes.

I have three i7 computers I have tried this on, with some surprising results (at least to me).
The machine results, all within the Windows world (sorry, my machines have to do other tasks as well):
S/N T6LKPZ1VLR7V42LSKZC640623447J74V: i7 950, 8 processors, SSD, 12GB RAM; overall 3149 (the oldest of the three).
S/N R5X52Z6E6KA6LQ4F563847423G0SIRW3: i7 3540M, 4 processors, SSD, 8GB RAM; overall 2353.
S/N B839X51JGV68W5STK1Q02UYM57NJDI9J: i7 Q740, 8 processors, SSD, 16GB RAM; overall 1225 (near the bottom of the list).

The last two are Dell Precision workstations that I use on the road. I was able to tweak the slow machine by adding another swap drive, which raised the overall benchmark to 1589: better, but still wanting. I added another swap drive (2 Barracudas in RAID 0) to the i7 950 and found that the results degraded.

Looking into some of the detail (on the overall list as well), not all SSDs are created equal, and i7 performance is not just a function of clock speed.

I note that there is no GPU processing capability within PI. With current video cards carrying a few GB of GDDR5, using them would be a big help, though maybe that is not easy to implement.

PI uses parallel processors within a chip; can it take advantage of distinct parallel chips? There are several motherboards out there that allow two or more processors, and for not that much coin.

I process much larger files - I assume that this will put more emphasis on the swap drives.

Next time I build a PC, I will have to do a bit more research into the processor and the drive architecture.
CAH

slang · « **Reply #76 on:** 2014 June 01 16:35:07 »

Hi.

Quote from: Bob Andersson on 2014 June 01 02:07:54

Quote from: NGC7789 on 2014 May 17 04:05:12
... ram disk is based on the assumption that the OS and/or the application are not using the ram effectively. A ram disk helps swap performance but you are taking the ram away from the OS and the app.
The only reason I can think of for PI to perform better when available RAM is sectioned off as a RAM Disk is that PI is not using RAM effectively. Is this because one of the target OSes has restrictions on how much, or even how, an application can use available RAM with the result that all target OSes have to be limited by the common code base of PI?

Bob.

I guess with any operating system design, there are tradeoffs. CPU, memory, Disk, I/O speed all mean different things to different workloads. Whilst an application can understand the broad nature of what it requires (and PI now uses extended instructions to speed up math stuff), even then PI is going to have different requirements depending on number and resolution of images. I think it's unreasonable to expect a cross-platform application to be able to compensate for _every_ possible operating system variant, as the core functionality must come first.

That said, any hard configuration or partitioning of RAM as disk cache has the potential to limit future use and impede a not yet known use case. A soft-config (where amount of RAM can dynamically be allocated as disk cache, as a secondary priority to application requirements) would appear to help mitigate issues of speed and RAM for application use.

Just my opinion...

Cheers -

NGC7789 · « **Reply #77 on:** 2014 June 01 21:22:57 »

Quote from: slang on 2014 June 01 16:35:07

Hi.

Quote from: Bob Andersson on 2014 June 01 02:07:54
Quote from: NGC7789 on 2014 May 17 04:05:12
... ram disk is based on the assumption that the OS and/or the application are not using the ram effectively. A ram disk helps swap performance but you are taking the ram away from the OS and the app.
The only reason I can think of for PI to perform better when available RAM is sectioned off as a RAM Disk is that PI is not using RAM effectively. Is this because one of the target OSes has restrictions on how much, or even how, an application can use available RAM with the result that all target OSes have to be limited by the common code base of PI?

Bob.

I guess with any operating system design, there are tradeoffs. CPU, memory, Disk, I/O speed all mean different things to different workloads. Whilst an application can understand the broad nature of what it requires (and PI now uses extended instructions to speed up math stuff), even then PI is going to have different requirements depending on number and resolution of images. I think it's unreasonable to expect a cross-platform application to be able to compensate for _every_ possible operating system variant, as the core functionality must come first.

That said, any hard configuration or partitioning of RAM as disk cache has the potential to limit future use and impede a not yet known use case. A soft-config (where amount of RAM can dynamically be allocated as disk cache, as a secondary priority to application requirements) would appear to help mitigate issues of speed and RAM for application use.

Just my opinion...

Cheers -

What does a RAM disk in is that, unless you have a lot of RAM and several additional fast swap drives, you are severely limiting your total swap. As my 4GB RAM disk experiment demonstrated, 8GB of total swap (RAM disk plus a single SSD) was sufficient to perform well on the benchmark but not in the real world.

Also an OS like Mavericks has different priorities. I think that to Apple it's important that the system still feels nice if someone decides to check their mail or look at Facebook so they reserve system resources to be sure this is possible. Maximum performance for one app is not the priority.

Note, however, what happens for me on Fedora 20. I finally got it running on my Hackintosh and just posted benchmark N16GYTM632IDJALP1O105LSU0WDXQKH7. Looking at the difference from 28RAU3770CO79MD1OW8O0BJ3TG11D914 (in both CPU and swap), it looks like I'll have to rejigger my system to use Fedora for PI. When comparing these results, note that the Mavericks configuration is overclocked but the Fedora one is not (I couldn't get it stable), and Fedora still outperforms it in CPU.

-Josh

fulatoro · « **Reply #78 on:** 2014 July 04 12:46:54 »

I would like to address a couple of things. I have seen several mentions of 2500MiB/s+ of swap performance using client SATA SSDs on Linux. I would like to respectfully state that there are NO SATA/SAS SSDs that can hit that kind of write performance. The SATA interface maxes out at 6Gb/s, which is less than 750MB/s even assuming maximum link utilisation. So no dice: 2GB/s on a single SATA SSD is not possible (unless you use a RAID array of 4-8 drives, and even then you will probably only reach about 1.5GB/s).
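
As a rough check of the SATA ceiling argued above, here is a minimal Python sketch. It assumes the standard SATA III figures (6 Gb/s line rate with 8b/10b encoding, so only 8 of every 10 line bits carry data); the function and variable names are made up for illustration, and protocol overhead beyond the encoding is ignored.

```python
import math

# SATA III signalling rate in bits per second (raw line rate).
SATA3_LINE_RATE_BITS = 6_000_000_000


def sata_payload_ceiling_mb_s(line_rate_bits=SATA3_LINE_RATE_BITS,
                              data_bits=8, line_bits=10):
    """Upper bound on payload throughput in MB/s (10^6 bytes per second).

    Assumes 8b/10b encoding; real drives land below this figure because
    of command and framing overhead, which is not modelled here.
    """
    payload_bits = line_rate_bits * data_bits // line_bits
    return payload_bits / 8 / 1_000_000


# Raw line rate expressed in bytes, before encoding overhead.
raw_mb_s = SATA3_LINE_RATE_BITS / 8 / 1_000_000
# Ceiling after encoding overhead.
ceiling_mb_s = sata_payload_ceiling_mb_s()

# The 2500 MiB/s figure quoted in the thread, converted to MB/s.
claimed_mb_s = 2500 * 1024 * 1024 / 1_000_000
# Minimum number of perfectly striped SATA links needed to carry it.
links_needed = math.ceil(claimed_mb_s / ceiling_mb_s)

print(f"raw: {raw_mb_s:.0f} MB/s, payload ceiling: {ceiling_mb_s:.0f} MB/s")
print(f"claimed: {claimed_mb_s:.0f} MB/s needs at least {links_needed} links")
```

Even under these idealised assumptions, a single SATA link cannot carry the quoted rate, so a number that high points to something other than the drive.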

Some PCIe SSDs when doing large block sequential IO can hit 2GB/s+ in certain conditions. By contrast, a Samsung EVO 840 1TB writes at a maximum bandwidth of around 500MB/s for 2MB blocks in a sequential write scenario. When you switch to small random writes (4KB), it drops to less than 50MB/s (see http://www.storagereview.com/samsung_840_evo_ssd_review )

A couple of observations:

1-In Linux, the default config of PixInsight writes the swap files to /tmp. On many distributions /tmp is a tmpfs filesystem, which is essentially a RAM disk, hence the very high write bandwidth. The Windows temp directory and Mac OS X /tmp, on the other hand, are actually on disk, hence the discrepancy in swap performance. I was very puzzled because I used a PCIe SSD that can write at 1GB/s and saw that my puny Intel SSD was outperforming it by a 2x margin. That is when I remembered that /tmp is not going to hit the drive (it's not that simple, but for all intents and purposes it is just RAM...).

2-As has been stated before, Linux uses its RAM as a page cache where filesystem writes are buffered (not always; it depends on how the file is opened, see the O_DIRECT flag). This essentially means that the data is flushed to disk periodically rather than immediately. Somebody posted some sysctl parameters which were essentially delaying the flush.

One thing you can try to confirm the above: change the location of the swap files in Edit > Global Preferences to NOT point to /tmp but to some location on /home/username//...... You will notice a significant decrease in swap performance in Linux.
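
To see whether a given swap directory would land on tmpfs, one can look up its mount point in /proc/mounts. The following is a minimal Python sketch of that lookup; the function name and the sample mount table are invented for illustration, and the parser ignores the escaping that /proc/mounts uses for mount points containing spaces.

```python
import posixpath

# Made-up sample in /proc/mounts format: device, mount point, type, options...
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
/dev/sda3 /home ext4 rw,relatime 0 0
"""


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `path` must be absolute. The longest matching mount point wins; on a
    tie the later entry wins, since a later mount shadows an earlier one.
    Symbolic links are not resolved in this sketch.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /tmp does not match /tmpfoo.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


print(filesystem_type("/tmp/swap", SAMPLE_MOUNTS))
print(filesystem_type("/home/username/swap", SAMPLE_MOUNTS))
```

On a real Linux machine, pass the contents of /proc/mounts (for example `open("/proc/mounts").read()`) instead of the sample text. A result of `tmpfs` means swap files written there live in RAM, not on the drive.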

The conclusion is that RAM is still king for swapping. If you cannot afford a lot of RAM, using an SSD is still a good bet, as you get good sequential write performance. However, if you use the multi-swap setup in PixInsight, I suspect that what you will see is a decrease in performance, as you will get smaller random writes...

Some results below:

Swap on Sata SSD with PixInsight configured to swap on /tmp
http://pixinsight.com/benchmark/benchmark-report.php?sn=47VTKXF14E50FJ7B3P8T3R7V692WB3S4

Swap on Sata SSD with PixInsight configured to swap on /home
http://pixinsight.com/benchmark/benchmark-report.php?sn=YVE7U2X3MOOOQL74LH6E2T4YS3EVILYS

Swap on PCIe SSD - Not using /tmp swapping on actual drive
http://pixinsight.com/benchmark/benchmark-report.php?sn=WMGIY4SUY5TC66OPRLHZ7W754W3H9Y4K

Moussa

Juan Conejero · « **Reply #79 on:** 2014 July 05 11:29:09 »

Hi Moussa,

Quote

One thing you can try to confirm the above, change the location of the swap files in Edit->global Preferences to NOT point to /tmp but to some location on /home/username//...... You will notice a significant decrease in swap performance in Linux.

Yes, but the Linux kernel is still much better at caching massive file I/O operations, especially sequential file access operations. It depends on the amount of RAM available. See the following benchmarks on a Linux workstation with 64 GB of RAM. In all cases the swap directories have been configured on normal filesystems (ext4), not under /tmp (approximate transfer rates between parentheses).

A single Samsung SSD EVO 840 1TB (1600 MiB/s):
http://pixinsight.com/benchmark/benchmark-report.php?sn=GR2PN8NW232241U6OPJ0W8YS8U0F0P7B

A single HGST Ultrastar 7K4000 4TB (rotational disk) (1600 MiB/s):
http://pixinsight.com/benchmark/benchmark-report.php?sn=ZMX5FLSFDJCU7DDEA260XMBNHTCRTM1M

Two Samsung SSD EVO 840 1TB configured for parallel swap I/O (2600 MiB/s):
http://pixinsight.com/benchmark/benchmark-report.php?sn=BX6Y7B6WI3PP131V5RMH96PRXPEJ46IS

Two Samsung SSD EVO 840 1TB + one HGST Ultrastar 7K4000 4TB for parallel swap I/O (3100 MiB/s):
http://pixinsight.com/benchmark/benchmark-report.php?sn=58HJNNWO9CEN3P0VY6QO0H634IS3XNP4

The first two benchmarks show that about the same transfer speeds can be achieved with solid-state and rotational disks, since most file I/O operations are being cached in RAM. The third and fourth benchmarks demonstrate the performance benefits of using two or more physical disk drives (with an adapter capable of the necessary bandwidth) configured for parallel swap I/O in PixInsight. Finally, this is the "normal" benchmark using tmpfs on the same machine:

A single Samsung SSD EVO 840 1TB with the default swap directory set to /tmp (2400 MiB/s)
http://pixinsight.com/benchmark/benchmark-report.php?sn=2IR809WZRVO3I37IG5GH4VM2VT9653T5

fulatoro · « **Reply #80 on:** 2014 July 05 22:46:21 »

Juan,

I agree totally with you. The Linux kernel uses the memory much better for caching file accesses. My main point is that people trying to speed up their IO should not confuse the high transfer speeds afforded by the Linux page cache with the actual drive performance. Obviously, depending on how much RAM you have, you will see more or less benefit from the caching. For most people, an SSD will be cheaper than upgrading RAM from, let's say, 8GB to 64GB.
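
One way to separate page cache speed from drive speed is to time the same write with and without forcing the data to the device. The sketch below is a rough Python illustration of that idea, not a proper benchmark: the function name, file size, and defaults are arbitrary choices, and `os.fsync` only asks the OS to flush, so drive-internal caches can still flatter the result.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, size_mib=64, sync=True):
    """Write `size_mib` MiB to a temporary file in `directory`.

    With sync=True the timing includes os.fsync(), so it reflects data
    handed to the device; with sync=False it mostly measures how fast
    the page cache can absorb the writes.
    """
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mib):
                f.write(chunk)
            if sync:
                f.flush()
                os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    # Guard against a zero interval on very coarse timers.
    return size_mib / max(elapsed, 1e-9)


if __name__ == "__main__":
    target = tempfile.gettempdir()
    print("cached:", round(write_throughput_mib_s(target, sync=False)), "MiB/s")
    print("synced:", round(write_throughput_mib_s(target, sync=True)), "MiB/s")
```

Pointing `target` at a directory on the drive under test, with a file size well beyond what the cache absorbs, should bring the synced figure closer to what the hardware can sustain.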

It would be interesting, however, to break down the transfer speed into READ and WRITE. The read speed will be much more indicative of the drive performance, especially on first access, than the writes. In addition, not sure if this is the place to ask, but I noticed that in the BPP process or any file-intensive operation, it seems like FITS files are read in a single thread. It would be awesome if this could be done multi-threaded so that you can really take advantage of the SSD read speeds. I was kind of disappointed to see that during BPP my PCIe SSD (capable of up to 3.2 GB/s reads) would register just about 50MB/s reads, which is consistent with single-threaded file IO.

Again, I am in total agreement with you. I use Linux on my work machine and, being employed by an SSD maker, we deal a lot with trying to squeeze every MB/s out of our SSDs. I was just surprised at the numbers being quoted for drives that I know cannot generate such bandwidth. Being a big PixInsight fan, I was trying to evaluate the impact of drive performance on the actual application performance, which is where I saw that the read bandwidth was underutilized.

Thanks for all the hard work. Though relatively new to PixInsight, I have been enjoying it immensely.

Moussa

geomcd1949 · « **Reply #81 on:** 2014 July 10 19:24:33 »

Quote from: slang on 2014 May 12 18:36:35

Hi.

I noticed this as well. I managed to solve it by unchecking the 'secure connections' option (as well as selecting the force image input download).

I suspect that the file may be available over http, but not https, or something like that ;-)

Cheers -

Could you please say exactly where to uncheck the 'secure connections' option and how to select the force image input download? Thank you very much.

~George

Juan Conejero · « **Reply #82 on:** 2014 July 11 10:33:31 »

Hi Moussa,

Quote

It would be interesting however to break down the transfer speed into READ and WRITE.

Indeed. My intention is to add this feature to the next version of the official PixInsight benchmark.

Quote

I noticed that in the BPP process or any file intensive operation, it seems like FITS files are read in a single thread. It would be awesome if this could be done multi-threaded such that you can really take advantage of the SSD read speeds.

I completely agree with you, this is an important pending task. Large applications like PixInsight have a relatively long development history. We have been here since 2004, which is an eternity in PixInsight's relativistic time scale (you'll understand this soon, as you become a PixInsight freak). SSDs have been available as affordable components for only a couple of years, perhaps less if we consider SSDs of considerable storage capacity, necessary for effective storage of large imaging projects. With rotational disks, parallel I/O access on the same physical device is not feasible. However, this is the way to go with solid state devices. One problem is that many third-party format support libraries are not thread-safe (in fact, some of them can be considered "legacy code"), which complicates the transition considerably. But it is clear that we need to overcome this limitation to implement parallel I/O at least for the FITS format.
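
For readers curious what parallel file reads look like in general, here is a minimal Python sketch using a thread pool. It is not how PixInsight reads FITS files and does no FITS decoding at all: it only reads raw bytes from several files concurrently, and the function names and defaults are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file_size(path, block_size=4 * 1024 * 1024):
    """Read a file sequentially in blocks and return the bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    return total


def read_files_parallel(paths, workers=4):
    """Read several files at once, one file per worker thread.

    Each file is still read by a single thread, so no shared state is
    touched; the parallelism comes from having several reads in flight.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(read_file_size, paths))
    return dict(zip(paths, sizes))
```

Calling `read_files_parallel(list_of_frames, workers=8)` keeps up to eight reads outstanding, which is the access pattern a solid state device can exploit and a rotational disk cannot. A decoding library that is not thread-safe would still need one independent reader per thread or a lock around the decode step.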

Note that PixInsight has used parallel I/O access for swap files since version 1.4 (some 7 years ago).

Juan Conejero · « **Reply #83 on:** 2014 July 11 10:41:29 »

Quote from: geomcd1949 on 2014 July 10 19:24:33

Quote from: slang on 2014 May 12 18:36:35
I noticed this as well. I managed to solve it by unchecking the 'secure connections' option (as well as selecting the force image input download).

I suspect that the file may be available over http, but not https, or something like that ;-)

Could you please say exactly where to uncheck the 'secure connections' option and how to select the force image input download? Thank you very much.

Both options are readily available on the Benchmark's main dialog window.

The Benchmark source image is available over both HTTP and HTTPS. Note that if you have to disable secure connections to download this image, then there's probably something wrong in your machine's network configuration. It should work without problems on all platforms.

geomcd1949 · « **Reply #84 on:** 2014 July 11 11:11:59 »

Quote from: Juan Conejero on 2014 July 11 10:41:29

Quote from: geomcd1949 on 2014 July 10 19:24:33
Quote from: slang on 2014 May 12 18:36:35
I noticed this as well. I managed to solve it by unchecking the 'secure connections' option (as well as selecting the force image input download).

I suspect that the file may be available over http, but not https, or something like that ;-)

Could you please say exactly where to uncheck the 'secure connections' option and how to select the force image input download? Thank you very much.

Both options are readily available on the Benchmark's main dialog window.

The Benchmark source image is available over both HTTP and HTTPS. Note that if you have to disable secure connections to download this image, then there's probably something wrong in your machine's network configuration. It should work without problems on all platforms.

Thanks very much, Juan. Unchecking "Secure connections" and checking "Force input image downloads" worked perfectly! If you have the time and inclination, could you suggest what could be the problem in the machine's network configuration? It works perfectly in every other operation.

~Geo.

RobF2 · « **Reply #85 on:** 2014 November 19 04:19:12 »

Adding an 8GB RAM disk really made a huge difference for my Win8.1 machine (32GB total RAM).
It took swap time down from 48 secs to 5 secs, after a suggestion from an Aussie local. Adding an additional 3 threads sped things up a bit more too.
http://www.iceinspace.com.au/forum/showthread.php?t=128487

NGC7789 · « **Reply #86 on:** 2014 November 19 06:28:19 »

While RAM disks are great for speed, remember that you may be limiting total swap space. Your total swap space is limited to the size of your smallest swap source times the number of swap sources. So if you have a 120GB SSD for swap and add an 8GB RAM disk, you will be much faster but limited to 16GB of swap (not to mention essentially wasting 112GB of SSD!). Depending on the size of your files this may never be a problem, but if you run out of swap then I believe PI will have problems and/or become much slower.
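
The capacity rule stated above can be written out as a short Python sketch. It simply encodes the rule as described in this post (smallest swap source times the number of sources); it has not been checked against PixInsight itself, and the function names are made up.

```python
def usable_swap_gb(source_sizes_gb):
    """Total usable swap under the 'smallest source times count' rule."""
    if not source_sizes_gb:
        return 0
    return min(source_sizes_gb) * len(source_sizes_gb)


def unused_capacity_gb(source_sizes_gb):
    """Capacity left idle because every source is capped at the smallest."""
    if not source_sizes_gb:
        return 0
    cap = min(source_sizes_gb)
    return sum(size - cap for size in source_sizes_gb)


# The example from the post: a 120GB SSD plus an 8GB RAM disk.
print(usable_swap_gb([120, 8]), unused_capacity_gb([120, 8]))

# The same SSD with a 16GB RAM disk instead.
print(usable_swap_gb([120, 16]), unused_capacity_gb([120, 16]))
```

Under this rule, doubling the RAM disk doubles the usable swap, which is why the size of the smallest source matters more than the size of the largest.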

If you have 32GB you may want to consider a larger ram disk. I also have 32GB total ram and use a 16GB ram disk along with an SSD. Things run very nicely and I have not seen any ill effects of running out of swap space.

-Josh

mcgillca · « **Reply #87 on:** 2014 November 25 05:20:35 »

Quote from: fulatoro on 2014 July 05 22:46:21

In addition, not sure if this is the place to ask, but I noticed that in the BPP process or any file intensive operation, it seems like FITS files are read in a single thread. It would be awesome if this could be done multi-threaded such that you can really take advantage of the SSD read speeds. I was kind of disappointed to see that during the BPP that my PCIe SSD (Capable of up to 3.2 GB/s reads) would register just about 50MB/s reads which is consistent with single threaded file IO.

Hi - I was considering buying a Mushkin Scorpion Deluxe to boost my BPP performance, since BPP often takes a significant time to work through, but it sounds as though I may not get the speed boost I was expecting?

This PCIe SSD has a RAID controller on board. Does that overcome the single-thread constraint, since it should transparently read data at 2GB/s?

Colin

pja · « **Reply #88 on:** 2015 January 30 12:04:38 »

Excited by the RAM disk performance boost, I did a quick test, but found that a higher benchmark score has nothing to do with my real-world performance.

So I tested on two machines, A: Intel core i5 4 cores + 16G ram with Win7 and most recent PI, and B: Intel Xeon E5506 2.13GHz 2 cpu 4 cores each + 24G ram with Win7 and recent PI.

Machine A benchmark result= Total:1895, CPU: 4043, Swap: 594, Trans: 107.25 MiB/s

Machine B has 8G RAM disk (softperfect) and the benchmark results:

No Ram disk= Total:1672, CPU: 3975, Swap: 494, Trans: 89 MiB/s
8GB Ram disk, listed x1 as Swap space = Total:4081, CPU: 4020, Swap: 4368, Trans: 788 MiB/s
8GB Ram disk, listed x2 as Swap space = Total:4369, CPU: 4017, Swap: 6848, Trans: 1236 MiB/s
8GB Ram disk, listed x4 as Swap space = Total:4610, CPU: 4148, Swap: 8550, Trans: 1544 MiB/s
8GB Ram disk, listed x6 as Swap space = Total:4448, CPU: 3962, Swap: 9060, Trans: 1636 MiB/s

So it seems that Machine B should easily outperform Machine A, by an increasing margin as the RAM disk is added and the parallel swap space feature is used. However, when I did a real-world test with my own image (an Ha frame shot with a QSI683wsg), just doing normal MLT, multi-iteration HDR, and StarMask, I found that Machine A ALWAYS finished the task faster (e.g. 3s vs 5s, 7s vs 9s, 10s vs 13s, 5s vs 9s), whether or not I used parallel swap space. So the real-world test matches the benchmark ranking obtained when NO RAM disk was added.
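
The pattern in the Machine B numbers can be made explicit with a few ratios. This small Python sketch uses only the scores listed above; the variable names are arbitrary, and it makes no assumption about how the benchmark combines CPU and swap into the total.

```python
# Machine B scores copied from the results listed above.
runs = {
    "no RAM disk": {"total": 1672, "cpu": 3975, "swap": 494},
    "RAM disk x1": {"total": 4081, "cpu": 4020, "swap": 4368},
    "RAM disk x2": {"total": 4369, "cpu": 4017, "swap": 6848},
    "RAM disk x4": {"total": 4610, "cpu": 4148, "swap": 8550},
    "RAM disk x6": {"total": 4448, "cpu": 3962, "swap": 9060},
}

baseline = runs["no RAM disk"]

# How much the CPU score moves across all five configurations.
cpu_scores = [r["cpu"] for r in runs.values()]
cpu_spread = (max(cpu_scores) - min(cpu_scores)) / min(cpu_scores)

# Best swap and total scores relative to the run without a RAM disk.
best_swap_gain = max(r["swap"] for r in runs.values()) / baseline["swap"]
best_total_gain = max(r["total"] for r in runs.values()) / baseline["total"]

print(f"CPU score spread: {cpu_spread:.1%}")
print(f"best swap gain:   {best_swap_gain:.1f}x")
print(f"best total gain:  {best_total_gain:.2f}x")
```

The CPU score barely changes while the swap score grows many times over, so the higher totals come almost entirely from swap. Tasks that rarely touch swap would not be expected to speed up.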

Since I was about to build another new machine, I was hoping to use this test as a guideline for the CPU, memory, and SSD decisions, but now I am really confused.

Does that mean that at Core i5 speed, CPU performance is still dominating? And that the benchmark may not be really relevant for my configuration?

georg.viehoever · « **Reply #89 on:** 2015 January 31 02:01:26 »

Please see http://pixinsight.com/forum/index.php?topic=7083.msg48021#msg48021 . A benchmark measures the ability to run the benchmark, nothing else. How this translates into real-world performance depends on what you are doing. Many of PI's operations are dominated by CPU performance. Others, such as ImageIntegration or Undo, benefit a lot from I/O...
Georg

This forum is closed since 5 March 2020

PixInsight Forum is now available at:

https://pixinsight.com/forum/
