Author Topic: PixInsight Benchmark  (Read 64244 times)

Offline slang

  • Member
  • *
  • Posts: 60
Re: PixInsight Benchmark
« Reply #45 on: 2014 May 18 05:02:17 »
Hey.

Quote
I see there are several configuration options with varying cost/benefit. I'd love to hear the bigwigs weigh in on the best way to go.

1. Give all ram to OS and PI. Swap to SSD.

2. Use some ram for swap ram disk along with SSD.

3. If we are upgrading ram should we give it all to OS and PI or devote some to ram disk

4. If we are upgrading should we be adding second SSD or ram for ram disk (or OS/PI).

I'm no big-wig at all, but see my other post. From my point of view, RAM is king. Get some more, then some more, and top it off with some more.
A RAM disk should help a lot, although I don't know how PI handles a RAM disk filling up, or how it spills over to the slower physical disks. Configuring a system so that files stay in RAM (without a RAM disk) seems a good approach, but a little risky.
A good SSD (I didn't document the difference) has hugely better read/write performance than a spinning disk and is well worth it, but good ones are expensive, and really cheap ones are just not quick.
Filesystem tuning can make a material difference: turning off features (or using a more efficient filesystem type) can help a fair bit.

If I had a recommendation, it would be for as much RAM as you can get on a mobo (at least 12GByte), and use ram disks/kernel tuning. Next would be filesystem tuning (it's free after all!), then SSDs (they're not free, and generally cost more than mobo+RAM if you get a few good ones).

One thing that I have not yet tested (and now won't bother with) is getting some really quick USB 3.0 flash drives. Some good ones (not _that_ cheap) can have read/write speeds far superior to those of physical disks. I mean, aliexpress has some 16GByte USB 3.0 flash drives for US$18 each.... 3 or 4 of them could help. Anyone else want to try that option?

Cheers -
--
Mounts: Orion Atlas 10 eq-g, Explore Scientific G11-PMC8
Scopes: GSO RC8, Astrophysics CCDT67, ES FCD100-80, TSFLAT2
Guiding: ST80/QHY OAG/QHY5L-II-M
Cameras: Canon EOS 450D (IR Mod), QHY8L, QHY163m/QHYFW2-US/Astronomik LRGBHaSiiOii

Offline GaryP

  • Member
  • *
  • Posts: 72
    • Astroimaging Log
Re: PixInsight Benchmark
« Reply #46 on: 2014 May 18 12:47:37 »
Quote
Hi Gary, not sure if I understand well, but have added a note in the text of the video.

Remember that you can choose only one folder for each physical unit.

Saludos. Alejandro.

Alejandro, thank you. The text was below the bottom of my screen and I missed it entirely.
PI 01.08.01.1092 on 4GB iMac w. Mavericks, Canon T1i DSLR, William Optics 110mm APO FL770, WO focal reducer (at 73.5 mm), CGEM

Offline NGC7789

  • PixInsight Old Hand
  • ****
  • Posts: 391
Re: PixInsight Benchmark
« Reply #47 on: 2014 May 18 13:59:30 »
Quote
If I had a recommendation, it would be for as much RAM as you can get on a mobo (at least 12GByte), and use ram disks/kernel tuning.

On OS X I don't think kernel tuning is an option. Or if it is, I don't know how and would love to hear about it if someone knows. I googled a bit and couldn't find anything. If it were possible I would imagine the info would be easy to find.

I will have to do some "real world" testing of my own to see how far my little 4gb ram disk goes. It's still not clear to me what the better investment would be. Add 16GB as a ram disk ($140) or a second SSD ($90).

-Josh

Offline GaryP

  • Member
  • *
  • Posts: 72
    • Astroimaging Log
Re: PixInsight Benchmark
« Reply #48 on: 2014 May 18 15:06:30 »
Would that second SSD be internal or external, and if external, USB II or III, Firewire, or Thunderbolt? Not that I could answer your question in any case, but it must make a large difference.


Offline NGC7789

  • PixInsight Old Hand
  • ****
  • Posts: 391
Re: PixInsight Benchmark
« Reply #49 on: 2014 May 18 16:53:48 »
You are right of course. In my case the second SSD would be SATA III like the first. While drive speed is important to overall swap speed, I think the point here is that a ram disk will be MUCH faster than even the fastest drive.

The question is the trade-off: that ram is no longer available for other uses, and the ram disk is limited in size relative to physical swap drives.

Offline GaryP

  • Member
  • *
  • Posts: 72
    • Astroimaging Log
Re: PixInsight Benchmark
« Reply #50 on: 2014 May 18 17:27:46 »
That helps to put things in perspective. Thanks.

Offline NGC7789

  • PixInsight Old Hand
  • ****
  • Posts: 391
Re: PixInsight Benchmark
« Reply #51 on: 2014 May 19 09:57:44 »
Here are my preliminary test results. I say preliminary because, as you will see, I got some inconsistent results.

I ran the test twice: once with a 4GB ram disk and once without. Each test was after a fresh reboot. I ran a large project through the BPP script all the way through integration. This included 100 bias, 100 dark, 60 flats and 80 lights. Without the ram disk this ran in 61 minutes. With the ram disk this ran in 49 minutes. This was good news for me since this improvement (49 vs. 61 minutes, about 20% less time) corresponds nicely with the improvement seen in the benchmark (overall score 6983 with the ram disk vs. 5656 without, a difference of about 19%). This would seem to indicate that even without any ram upgrade from my current 16GB I would benefit from keeping the 4GB ram disk.

Also of note is that Activity Monitor had the disk cache increasing to ~10GB without the ram disk and 1.5GB of page swapping. The swapping would indicate that more ram devoted to the OS/PI would be of benefit.

With the ram disk, the disk cache was reported as ~12GB with no page swapping. I don't really understand these numbers, since with a 4GB ram disk I only have 12GB available and PI itself is using 3.5GB. That doesn't even count the OS and other consumers (like the integrated graphics using 1GB). How can the cache get that big without swapping? Why would I swap out 1.5GB with 16GB available but not swap with only 12GB?

I couldn't keep using this large test effectively because of the long runtime, so I reduced it to 10 files each of bias, dark, flat and light. Since I didn't really care about the quality of the result I thought this would still be a good test. But here is where the surprises began. This smaller test ran in 769 seconds with the ram disk and 607 without. Now the ram disk was performing 27% WORSE!?! Why would this be?
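
The percentages quoted in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `percent_change` is made up for illustration, and the choice of baseline for each comparison is an assumption picked to match the figures given above.

```python
def percent_change(baseline, measured):
    """Relative change of `measured` versus `baseline`, in percent.

    A negative result means `measured` is smaller than `baseline`.
    """
    return 100.0 * (measured - baseline) / baseline


# Large BPP run: 61 minutes without the RAM disk, 49 minutes with it.
large_run = percent_change(baseline=61, measured=49)

# Benchmark score: 6983 with the RAM disk, 5656 without it.
# Using the RAM-disk score as the baseline reproduces the ~19% figure.
benchmark = percent_change(baseline=6983, measured=5656)

# Small BPP run: 607 seconds without the RAM disk, 769 seconds with it.
small_run = percent_change(baseline=607, measured=769)

print(f"large run (time):  {large_run:+.1f}%")
print(f"benchmark (score): {benchmark:+.1f}%")
print(f"small run (time):  {small_run:+.1f}%")
```

The large run and the benchmark both move by roughly 19 to 20% in favor of the RAM disk, while the small run takes about 27% longer with it, which is the inconsistency described above.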

I will have to do more testing to see if this smaller test was an aberration (I sure hope I didn't flip my results!).

My first thought is that for the benchmark to have value we must be able to infer that it at least roughly corresponds to the real world. If adding a ram disk improves the benchmark we should be able to infer that this is a better configuration in the real world. If we cannot, then what value is the benchmark?

Second, even if I am able to confirm that performance improved with the 4GB ram disk it doesn't answer my real questions. Would the situation improve even more with a larger ram disk (that is, should I invest in 16GB more ram)? Would I see similar (or better) improvement with a second SSD (which is slightly cheaper than the ram)? What about both a second SSD and a 4GB ram disk? Or even a second SSD and 16GB ram disk?

Lastly, maybe chasing OS X performance is not time/money well spent. Should I just be adding a dual boot Linux and run PI that way?

It seems that if I really want to know I’ll have to just do it and find out.

What do others think?

-Josh

Offline GaryP

  • Member
  • *
  • Posts: 72
    • Astroimaging Log
Re: PixInsight Benchmark
« Reply #52 on: 2014 May 19 10:21:51 »
I can't answer any of your questions, but it makes interesting reading and provides a description of what performance can be obtained with 16 GB with or without a RAM disk and varying sizes of batches. It is useful in anticipating how long a process might take.

Offline Andres.Pozo

  • PTeam Member
  • PixInsight Padawan
  • ****
  • Posts: 927
Re: PixInsight Benchmark
« Reply #53 on: 2014 May 19 10:40:36 »
I think that although swap performance is very important for PI, its weight in the final score of the benchmark is too big. On my computer (which has an SSD) the benchmark returns similar times for the CPU (73 sec) and swap (63 sec). I doubt that in normal use PI spends as much time writing to the swap area as it does on CPU work.

I would suggest modifying the benchmark to reduce the weight of the swap part so that it is more representative of day-to-day operations.

Also, the benchmark doesn't measure the speed of the data disk. Nobody has the images in a RAM disk, and few have the money to use an SSD for storing many gigabytes of images ;).

Offline NGC7789

  • PixInsight Old Hand
  • ****
  • Posts: 391
Re: PixInsight Benchmark
« Reply #54 on: 2014 May 19 11:14:09 »
It's certainly true that the weight of the swap benchmark is driving my interest. Because it has such an impact on the benchmark I am looking to optimize it. This is based on the assumption that optimizing the benchmark will optimize real world performance. That is of course a big assumption. But if the benchmark is not a good simulation of real world performance then what is? And if something else is a better simulation of real world performance why wouldn't that be the benchmark?

I have used BPP as my "real world" test mostly because it is easy for me to implement and repeat. I have no idea how BPP compares with the benchmark or if it's a good test. What's interesting about pre-processing (especially in batch) is that while it is time consuming it is not the critical performance point (in my opinion). I'm happy to set pre-processing going and then do something else for an hour. But when I am trying different parameter iterations during post processing that is when I really care about performance because it's interactive. Perhaps I should be testing my ram disk using TGVDenoise! After all it is TGVDenoise performance (or lack thereof) that caused me to dump my iMac and build a Hackintosh in the first place.

Offline pfile

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 4729
Re: PixInsight Benchmark
« Reply #55 on: 2014 May 19 18:27:57 »
i'd guess that TGVDenoise is almost 100% cpu-bound rather than io bound…

rob

Offline NGC7789

  • PixInsight Old Hand
  • ****
  • Posts: 391
Re: PixInsight Benchmark
« Reply #56 on: 2014 May 19 21:40:05 »
i'd guess that TGVDenoise is almost 100% cpu-bound rather than io bound…

rob

And you would be correct. I just completed some tests and saw almost no difference, but what difference there was favored having the ram disk (2% faster).

Still trying to understand the real world benefits of a ram disk vs more OS/app ram vs dual SSD swap.

Although I am now working on making my hackintosh dual boot with Fedora to see if Linux makes the issue moot.

-Josh

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: PixInsight Benchmark
« Reply #57 on: 2014 May 20 02:18:45 »
Quote
I see there are several configuration options with varying cost/benefit. I'd love to hear the bigwigs weigh in on the best way to go.

1. Give all ram to OS and PI. Swap to SSD.

2. Use some ram for swap ram disk along with SSD.

3. If we are upgrading ram should we give it all to OS and PI or devote some to ram disk

4. If we are upgrading should we be adding second SSD or ram for ram disk (or OS/PI).

This depends on your hardware and processing requirements. Some points worth noting:

- When you configure several disks for parallel swap storage in PixInsight, each swap file is spread on all disks in equal chunks (to be more precise, swap files are divided into equal chunks larger than 4 KiB). Let Sm be the size of the smallest swap disk (in terms of available disk space), and N be the number of swap disks. Then the total amount of swap data that can be stored by an instance of the PixInsight Core application is N*Sm. This is relevant to using RAM disks for swap file storage, since RAM is generally a scarce resource.

- In practice, you may need 32 or 64 GiB of RAM (depending on the complexity of your projects) to use RAM disks for swap file storage effectively in real-world processing work. For example, with 64 GiB of RAM, you can use a RAM disk of 32 GiB and two disk drives for a total of 96 GiB of swap storage, while still leaving 32 GiB of RAM for the application. This benchmark is an example with a 16 GiB RAM disk configured in parallel with one SSD drive and a rotational drive (SATA 6 Gb/s), for testing purposes, where I achieved 3391 MiB/s on Linux. The same machine with a single SSD drive achieves 2072 MiB/s. With two SSDs and one rotational drive, it achieves 2751 MiB/s. In these examples, the SSD drives are connected to the motherboard's SATA interface (Intel X79 chipset) and the rotational drive to a dedicated hardware RAID PCIe card (this is important for the reasons described in the next point).

- Several fast hard drives connected to motherboard SATA ports may not perform as well as expected for parallel I/O. The total bandwidth of the SATA controller has to be divided by the number of drives. When the controller gets saturated, there is no benefit in adding more disks.

- For the best performance, a dedicated RAID PCIe card with several SSD drives is the best option in my opinion. For example, in the workstation used for the benchmarks linked above, I opted for an LSI MegaRAID SAS 9271-4i, which has 4 SATA/SAS internal ports and a transfer rate of 6Gb/s per port. Right now this card is being used for two huge RAID 1 arrays so unfortunately I can't use it for benchmarks with SSDs, but as soon as I can I'll make some tests.

- Each operating system poses its own tradeoffs. On relatively powerful machines, Linux is IMHO the most efficient platform for PixInsight. The benchmarks expose (confirming our day-to-day experience) that the Linux kernel is hard to beat in terms of disk cache management. On Mac OS X and Windows, RAM disks and parallel swap disks provide more performance benefits.

- Irrespective of all of the above, SSDs and more RAM are always good to improve your user experience with PixInsight. The benchmark attempts to provide you with objective insights about the best ways to optimize your hardware resources.
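
The N*Sm rule from the first point above can be sketched in a few lines of Python. This is an illustration only, not PixInsight code: the function name `total_swap_capacity` is hypothetical, and the 500 GiB free-space figures for the two drives are made-up values chosen so that the 32 GiB RAM disk is the smallest swap disk, as in the 64 GiB example.

```python
def total_swap_capacity(free_space_gib):
    """Usable parallel swap storage under the N * Sm rule.

    free_space_gib: free space, in GiB, on each configured swap disk.
    Each swap file is split into equal chunks across all disks, so the
    smallest disk limits how much every other disk can contribute.
    """
    if not free_space_gib:
        return 0
    n = len(free_space_gib)    # N: number of swap disks
    sm = min(free_space_gib)   # Sm: smallest available space
    return n * sm


# A 32 GiB RAM disk in parallel with two drives (hypothetical free space).
print(total_swap_capacity([32, 500, 500]))  # 96

# A 4 GiB RAM disk in parallel with one SSD (hypothetical free space).
print(total_swap_capacity([4, 200]))        # 8
```

The second call shows why a small RAM disk is restrictive: a 4 GiB RAM disk in parallel with one SSD caps the total swap storage at 8 GiB, however much space the SSD has free.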
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: PixInsight Benchmark
« Reply #58 on: 2014 May 20 02:58:15 »
Quote
I think that although swap performance is very important for PI, its weight in the final score of the benchmark is too big. On my computer (which has an SSD) the benchmark returns similar times for the CPU (73 sec) and swap (63 sec). I doubt that in normal use PI spends as much time writing to the swap area as it does on CPU work.

This all depends on the type and complexity of the user's projects. For example, working on a 4x4 frame RGB mosaic (say 16Kx16K RGB pixels) is not the same as working on a single CCD image. The differences between these two projects in terms of machine requirements, and especially in terms of swap storage requirements, are enormous.

I have designed this benchmark to reproduce a rather complex image processing scenario, where the user has to work with very large images applying many CPU-intensive tasks successively. It is true that this is probably not the most common scenario for a majority of users, but it is IMO the best way to provide useful information on the performance of a machine. An intensive benchmark tends to expose weak points better, while a less intensive one would be more prone to masking them.

Quote
Also, the benchmark doesn't measure the speed of the data disk. Nobody has the images in a RAM disk...

Typically, the tasks of loading raw images and writing processed images are performed at the beginning and end of a processing work, respectively. In the context of a complex and large processing work, they are practically irrelevant when compared to the total time required to read and write swap data.

Offline Andres.Pozo

  • PTeam Member
  • PixInsight Padawan
  • ****
  • Posts: 927
Re: PixInsight Benchmark
« Reply #59 on: 2014 May 20 03:17:00 »
However, the data disk speed is very important for the BatchPreprocess script, which is one of the slowest steps in processing an image. In my usual workflow the two slowest steps are BatchPreprocess and TGVDenoise. The swap speed is irrelevant in both processes.

Reading this thread, it seems that using a RAM disk is a recommended practice since it greatly improves the benchmark, but in most cases it will do more harm than good. The problem is that the benchmark does not reflect the usual workload of a typical PI user. I think that the swap operations are overrepresented.