Author Topic: PixInsight Benchmark  (Read 64218 times)

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: PixInsight Benchmark
« Reply #60 on: 2014 May 20 03:30:38 »
Quote
...However, the data disk speed is very important for the BatchPreprocess script which is one of the slowest steps processing an image. In my usual workflow the two slowest steps are BatchPreprocess and TGVDenoise. The swap speed is irrelevant in both processes. ...

I think this is a problem with any benchmark (and any metric): it measures the capability to run the benchmark, nothing else. How this translates into the concrete situations you encounter in real life needs some intelligent interpretation of the results. This is not a problem specific to this PI benchmark; it is inherent to anything you try to measure. You simply measure what you measure, nothing else.

I think the current PI benchmark may be suitable for getting an idea of the speed of certain CPU-intensive operations, and of the swap speed. It is probably less helpful for judging the speed of reading/writing FITS files (a major component of BPP), or of processes that do not use all CPU cores.

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: PixInsight Benchmark
« Reply #61 on: 2014 May 20 03:41:04 »
Quote
However, the data disk speed is very important for the BatchPreprocess script which is one of the slowest steps processing an image.

That's true, the current benchmark tells little about a machine's performance for batch preprocessing tasks. However, measuring performance for these tasks requires a completely different benchmark process. We definitely should have more benchmarks, and the best candidate for a new one is a batch preprocessing benchmark. I'll work on this, time permitting (right now we are working on drizzle and TGV restoration, which are priorities).
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline Juan Conejero

Re: PixInsight Benchmark
« Reply #62 on: 2014 May 20 03:43:21 »
Quote
I think this is a problem with any benchmark (and any metric): it measures the capability to run the benchmark, nothing else. How this translates into the concrete situations you encounter in real life needs some intelligent interpretation of the results. This is not a problem specific to this PI benchmark; it is inherent to anything you try to measure. You simply measure what you measure, nothing else.

Absolutely!
Juan Conejero

Offline slang

  • Member
  • *
  • Posts: 60
Re: PixInsight Benchmark
« Reply #63 on: 2014 May 20 03:44:13 »
Hi.

Quote
- In practice, you may need 32 or 64 GiB of RAM (depending on the complexity of your projects) to use RAM disks for swap file storage effectively in real-world processing work. For example, with 64 GiB of RAM, you can use a RAM disk of 32 GiB and two disk drives for a total of 96 GiB of swap storage, while still leaving 32 GiB of RAM for the application. This benchmark is an example with a 16 GiB RAM disk configured in parallel with one SSD drive and a rotational drive (SATA 6 Gb/s), for testing purposes, where I achieved 3391 MiB/s on Linux. The same machine with a single SSD drive achieves 2072 MiB/s. With two SSDs and one rotational drive, it achieves 2751 MiB/s. In these examples, the SSD drives are connected to the motherboard's SATA interface (Intel X79 chipset) and the rotational drive to a dedicated hardware RAID PCIe card (this is important for the reasons described in the next point).

That is one impressively powerful machine, and an extremely nice CPU. \me want ;-)

One question - have you tried any of the kernel tuning like I described in http://pixinsight.com/forum/index.php?topic=7083.msg47979#msg47979 ? That approach (apart from being a little dangerous if no UPS...) seems a good balance between maximising use of speedy RAM without constraining RAM available for applications by specifically allocating to a RAM disk.

I've managed a swap transfer rate of 3385.359 MiB/s with considerably simpler, cheaper hardware. I've even repeated this test with one, two and three drives, with generally little difference, as most files sit entirely in RAM most of the time.

I know that a different processing load will behave quite differently, and any system can be tuned to maximise any specific benchmark score and be not so good for other purposes, but a repeatable benchmark like this is fantastic for helping people (like me) understand the differences in performance from system optimisation. Thank you for this (and everything else w.r.t. PI), yet more to learn and understand, but I now know that I will be spending a lot less time processing than I was before...

Quote
Typically, the tasks of loading raw images and writing processed images are performed at the beginning and end of a processing work, respectively. In the context of a complex and large processing work, they are practically irrelevant when compared to the total time required to read and write swap data.

I agree completely with this (not that I am an expert or anything). I keep all my files on a Gigabit-connected NAS. Whilst image loads are slower than from local disk, the performance penalty isn't that great compared to the overall processing time. Given that I use Linux, I have the NAS NFS-mounted in an appropriate local directory, and I get ~300-400 Mbit/s (it is a relatively low-end NAS...). BPP does grind a little bit, but once images are registered, I don't often need to go back and repeat. However, processing and reprocessing and re-integrating do get repeated a lot in my world...
--
Mounts: Orion Atlas 10 eq-g, Explore Scientific G11-PMC8
Scopes: GSO RC8, Astrophysics CCDT67, ES FCD100-80, TSFLAT2
Guiding: ST80/QHY OAG/QHY5L-II-M
Cameras: Canon EOS 450D (IR Mod), QHY8L, QHY163m/QHYFW2-US/Astronomik LRGBHaSiiOii

Offline Juan Conejero

Re: PixInsight Benchmark
« Reply #64 on: 2014 May 20 04:06:39 »
Quote
One question - have you tried any of the kernel tuning like I described in http://pixinsight.com/forum/index.php?topic=7083.msg47979#msg47979 ? That approach (apart from being a little dangerous if no UPS...) seems a good balance between maximising use of speedy RAM without constraining RAM available for applications by specifically allocating to a RAM disk.

Your approach is great and the performance improvement is terrific. Thanks for that post; I've enjoyed learning new things about Linux, especially from such a practical perspective. Of course all of our computers are powered by a UPS, but to be honest, I don't dare do this kind of tuning on this machine. This is the machine where we now have most of our working source code, test images and projects, development virtual machines, ... :) As soon as I can, I'll try your tuning on another (older) workstation. This is the main reason why I love Linux and FreeBSD: they allow you to really control your machine.

Quote
Thank you for this (and everything else w.r.t. PI), yet more to learn and understand, but I now know that I will be spending a lot less time processing than I was before...

None of this would ever happen without our users' support, so thank you. Glad to know this benchmark project is proving useful.
Juan Conejero

Offline NGC7789

  • PixInsight Old Hand
  • ****
  • Posts: 391
Re: PixInsight Benchmark
« Reply #65 on: 2014 May 20 06:01:12 »
Quote
When you configure several disks for parallel swap storage in PixInsight, each swap file is spread on all disks in equal chunks

This is very useful info and is the death knell of the small RAM disk concept. A 4 GB RAM disk and one SSD means only 8 GB of swap!
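For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes, following the quoted description, that every swap file is split into equal chunks across all configured devices, so the smallest device limits the usable total. The function name and the drive sizes are made up for illustration; this is not PixInsight code.

```python
def usable_parallel_swap(device_sizes):
    """Usable swap when each file is spread in equal chunks over all devices.

    Assumption (from the quoted description): every device receives the same
    share of each swap file, so the smallest device fills up first.
    """
    if not device_sizes:
        return 0
    return len(device_sizes) * min(device_sizes)


# A 4 GB RAM disk plus one large SSD: the RAM disk caps the total.
small = usable_parallel_swap([4, 256])
# A 32 GiB RAM disk plus two larger drives, as in the 64 GiB RAM example.
large = usable_parallel_swap([32, 256, 1000])
print(small, large)  # 8 96
```

Under this assumption, a bigger SSD does not help until the RAM disk itself grows.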

Quote
Several fast hard drives connected to motherboard SATA ports may not perform as well as expected for parallel I/O. The total bandwidth of the SATA controller has to be divided by the number of drives. When the controller gets saturated, there is no benefit in adding more disks.

This is also extremely useful info. If and when I add more SSDs or drives for swap I will also add a PCI SATA controller.
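A very simplified way to picture the saturation effect, as a Python sketch. The model (drives add up until a fixed controller ceiling is reached) and all the numbers are assumptions for illustration, not measurements of any real controller.

```python
def aggregate_throughput(drive_rates, controller_limit):
    """Simplified model: drives share one controller with a fixed ceiling."""
    return min(sum(drive_rates), controller_limit)


# Hypothetical: 500 MiB/s drives behind a controller limited to 1200 MiB/s.
for n in range(1, 5):
    total = aggregate_throughput([500] * n, 1200)
    print(n, "drive(s):", total, "MiB/s")
```

In this toy model the third drive only adds 200 MiB/s and the fourth adds nothing, which is the situation where a separate controller card would start to pay off.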

I'm continuing to get my dual boot Fedora set up. The benchmark making the advantages of Linux so apparent has also been a great help.

As always thanks Juan!

-Josh

Offline Carlos Milovic

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2172
  • Join the dark side... we have cookies
    • http://www.astrophoto.cl
Re: PixInsight Benchmark
« Reply #66 on: 2014 May 20 07:06:41 »
I'm very puzzled by my results, which show big differences in CPU and swap speeds between Win8.1 and Fedora 19/20:

AMD Phenom(tm) II X6 1055T Processor (5) [2887]
#  Serial Number                     UTC                  Version  Platform  Threads  RAM (GiB)  Total  CPU   Swap  Transfer (MiB/s)  Total time (s)
1  VBGNW50KTOS62JQQYH7O06LENV80704V  2014/05/19 14:53:14  1.00.07  Windows   6        11.998     2887   3661  1545  278.95            162.94
2  Q99TO8T6O1RH5JPMWB3UK053198WSV02  2014/05/20 01:16:07  1.00.07  Windows   6        11.998     2439   3677  1022  184.45            192.90
3  VE0VQY1LLJ8B0KR0YQOIN4O8U292DZYV  2014/05/19 15:48:13  1.00.07  Linux     6        11.735     1090   899   8958  1617.37           431.46
4  E97CSG980I3T2ENE7XU33YP0353TJ1KL  2014/05/20 00:55:18  1.00.07  Linux     6        11.734     1078   891   8008  1445.95           436.38
5  3PK7X9C893816MRIRU3S9287QKN7P1Y2  2014/05/12 18:40:58  1.00.07  Linux     6        11.735     948    781   7945  1434.42           496.41

I have a mild OC setup that Linux refuses to see, and it is scaling the frequency down to the factory default 2.8 GHz. On Windows, I get 3.4 GHz. Anyhow, I don't understand such a great impact on the run times. There is also a huge difference in the swap speed, even though the main disk is the same SSD (and I have set 16 GB of virtual memory).

Any insights?
Regards,

Carlos Milovic F.
--------------------------------
PixInsight Project Developer
http://www.pixinsight.com

Offline georg.viehoever

Re: PixInsight Benchmark
« Reply #67 on: 2014 May 20 07:42:38 »
Hi Carlos,

Frankly, I also find this difference in swap speed between Windows and Linux a bit surprising (see earlier posts). I googled around for messages that might indicate similar results for other applications, but I did not really find anything.

Maybe someone with a suitable system can run IOMeter or IOZone benchmarks (https://www.ibm.com/developerworks/community/blogs/anthonyv/entry/performance_and_bench_marking_tools?lang=en) on Windows and Linux, and see whether the PI results can somehow be reproduced with these tools.

Georg

Offline Carlos Milovic

Re: PixInsight Benchmark
« Reply #68 on: 2014 May 20 07:53:36 »
I'm more puzzled by the CPU differences... CPU times are 4x longer on Linux. That's huge. I cannot understand that much degradation.
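
To put numbers on it, here is a small Python sketch comparing the best Windows run (row 1) with the best Linux run (row 3) from the table posted earlier. The dictionary layout is just for illustration.

```python
# Scores and times copied from rows 1 (Windows) and 3 (Linux) of the table.
windows = {"cpu": 3661, "swap": 1545, "total_time_s": 162.94}
linux = {"cpu": 899, "swap": 8958, "total_time_s": 431.46}

cpu_ratio = windows["cpu"] / linux["cpu"]      # Windows CPU score vs Linux
swap_ratio = linux["swap"] / windows["swap"]   # Linux swap score vs Windows
time_ratio = linux["total_time_s"] / windows["total_time_s"]

print(f"CPU score: Windows is {cpu_ratio:.1f}x higher")      # 4.1x
print(f"Swap score: Linux is {swap_ratio:.1f}x higher")      # 5.8x
print(f"Total time: Linux run is {time_ratio:.1f}x longer")  # 2.6x
```

So the CPU score gap is about 4x, while the swap score goes the other way by almost 6x.
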
Regards,

Carlos Milovic F.

Offline georg.viehoever

Re: PixInsight Benchmark
« Reply #69 on: 2014 May 21 03:14:20 »
Quote
I'm more puzzled by the CPU differences... CPU times are 4x longer on Linux. That's huge. I cannot understand that much degradation.

Yes, this is indeed strange. If you look through the collection of results, the CPU time difference between Windows and Linux is usually only around 10% for otherwise comparable platforms. The BIG difference is usually in swap I/O.
Georg

Offline slang

Re: PixInsight Benchmark
« Reply #70 on: 2014 May 21 03:21:56 »
Quote
I'm more puzzled by the CPU differences... CPU times are 4x longer on Linux. That's huge. I cannot understand that much degradation.
Yes, this is indeed strange. If you look through the collection of results, the CPU time difference between Windows and Linux is usually only around 10% for otherwise comparable platforms. The BIG difference is usually in swap I/O.
Georg

Yeah, it's very, very odd indeed. Maybe try something like vmstat 2 in a Linux terminal window while the benchmark is running. Useful things to note are swap in/out (the si and so columns) and the wa column under cpu (where the CPU is waiting, typically on I/O).

It's possible that the particular Linux drivers for your system (SATA?) leave something to be desired and are hogging CPU, or something.
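
If the output gets long, a few lines of Python can average the CPU columns for you. This is only a sketch: the function name is made up, the sample numbers below are invented rather than taken from a real run, and it assumes the usual vmstat layout with a column-name line containing us, sy, id and wa.

```python
def summarize_vmstat(text):
    """Return the mean of the us/sy/id/wa columns from `vmstat` output."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "us" in fields and "wa" in fields:
            header = fields  # the column-name line
        elif header and fields[0].isdigit() and len(fields) >= len(header):
            rows.append(dict(zip(header, map(int, fields))))
    # vmstat's first report gives averages since boot, so drop it if possible.
    samples = rows[1:] if len(rows) > 1 else rows
    if not samples:
        raise ValueError("no vmstat samples found")
    return {c: sum(r[c] for r in samples) / len(samples)
            for c in ("us", "sy", "id", "wa")}


# Invented sample, only to show the expected input format.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 800000 100000 900000    0    0    30    30  500  700  8  2 90  0
 4  0      0 790000 100000 900000    0    0     0    40 5000 1400 60  2 18 20
 4  0      0 780000 100000 900000    0    0     0    60 5200 1300 50  2 18 30
"""
print(summarize_vmstat(SAMPLE))
```

A high average in wa would point at I/O, while a low us with a high id would suggest the CPU is simply not being kept busy.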

Cheers -

Offline Carlos Milovic

Re: PixInsight Benchmark
« Reply #71 on: 2014 May 22 08:28:13 »
This is very interesting... from the vmstat output, it seems that the CPU usage never goes beyond 20%. There must be a service or something creating the bottleneck.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0    104 8910120 153208 969652    0    0    34    31  554  705  7  2 90  0
 0  0    104 8906656 153208 969648    0    0     0     0  537  611  0  0 99  0
 0  0    104 8906648 153208 969648    0    0     8     0  808 1085  1  1 98  0
 1  0    104 8843764 153216 1012184    0    0    24    10 2681 3211 13  2 85  0
 1  0    104 8841152 153216 1012188    0    0     0   100 1485  462 20  0 79  0
 1  0    104 8834688 153216 1012188    0    0     0    20 1387  298 20  0 80  0
 1  0    104 8834804 153216 1012188    0    0     0     0 1247  243 20  0 80  0
 1  0    104 8834680 153224 1012188    0    0     0     6 1249  233 20  0 80  0
 1  0    104 8849452 153224 1012188    0    0     0     0 1270  298 20  0 80  0
 1  0    104 8865460 153224 1012188    0    0     0    14 1403  387 21  0 79  0
 1  0    104 8859092 153224 1012188    0    0     0     0 1275  302 20  0 80  0
 1  0    104 8859092 153224 1012188    0    0     0     0 1206  236 20  0 80  0
 1  0    104 8859092 153224 1012188    0    0     0     0 1184  217 20  0 80  0
 1  0    104 8873220 153224 1012188    0    0     0     0 1250  295 20  0 80  0
 1  0    104 8862804 153224 1012188    0    0     0     0 1306  367 20  0 80  0
 1  0    104 8859580 153224 1012188    0    0     0    14 1251  289 20  0 80  0
 1  0    104 8859572 153224 1012188    0    0     0     0 1206  240 20  0 80  0
 1  0    104 8859588 153224 1012188    0    0     0     0 1180  218 20  0 80  0
 1  0    104 8861688 153232 1031260    0    0     0    24 1523  638 20  1 79  0
 1  0    104 8845700 153240 1059608    0    0     0     8 1643  773 20  1 80  0
 1  0    104 8737448 153240 1069164    0    0     0     0 1279  346 20  0 80  0
Regards,

Carlos Milovic F.

Offline Carlos Milovic

Re: PixInsight Benchmark
« Reply #72 on: 2014 May 22 08:30:17 »
Found it... Silly me! I had parallel processing disabled :P I did that a couple of weeks ago when working on a development module that was behaving oddly, and forgot to turn it on again!

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0    104 8528892 155256 1036280    0    0    35    29  537  669  8  2 90  0
 0  0    104 8530504 155256 1036280    0    0     0     0 1651 1891  1  1 98  0
 0  0    104 8511256 155264 1037028    0    0     0    28 2836 4232  3  2 95  0
 1  0    104 8530256 155272 1037052    0    0     0    10 1357 1492  1  1 98  0
 1  0    104 8456552 155272 1079660    0    0    78     4 5994 3510 68  2 29  1
 5  0    104 8459576 155272 1079660    0    0     0     0 5457 1442 89  1 10  0
 5  0    104 8469932 155280 1079652    0    0     0     6 5396 1321 90  1  9  0
 5  0    104 8459040 155280 1079660    0    0     0    52 5440 1433 90  1  9  0
 5  0    104 8387368 155280 1136424    0    0     0     0 6073 3115 72  2 26  0
 5  0    104 7813992 155280 1136636    0    0     0     0 5112 1015 93  2  5  0
 5  0    104 7108436 155280 1923072    0    0     0     0 4873 1678 75  5 20  0
 5  0    104 7335972 155280 1972368    0    0    14     0 5200 1467 91  2  7  0
 5  0    104 6735324 155280 1972256    0    0     0     0 5359 1057 97  3  0  0
 5  0    104 6214028 155288 1972248    0    0     0     6 4769  994 85  2 13  0
 5  0    104 5624888 155288 1972256    0    0     0     4 5266  977 98  2  0  0
 5  0    104 5071876 155288 1972256    0    0     0    96 4907  987 89  3  9  0
 5  0    104 4476676 155288 1972256    0    0     0     0 5231  940 98  2  0  0
 5  0    104 1307716 155292 5117840    0    0     0    10 4468 1485 18 15 67  0
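
A rough way to read these numbers: vmstat's CPU percentages are relative to all logical CPUs together, so the us column can be turned into an approximate count of busy cores. A minimal Python sketch, assuming 6 logical CPUs (the Phenom II X6); the function name is just for illustration.

```python
def busy_cores(user_percent, logical_cpus):
    """Approximate number of fully busy cores implied by vmstat's us value."""
    return user_percent / 100 * logical_cpus


serial = busy_cores(20, 6)    # about what a run stuck at us = 20 implies
parallel = busy_cores(90, 6)  # about what a run at us = 90 implies
print(f"{serial:.1f} vs {parallel:.1f} busy cores")  # 1.2 vs 5.4
```

So a run stuck at 20% is roughly one core doing all the work, which is what you would expect with parallel processing disabled.
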
Regards,

Carlos Milovic F.

Offline Carlos Milovic

Re: PixInsight Benchmark
« Reply #73 on: 2014 May 22 09:26:41 »
#  Serial Number                     UTC                  Version  Platform  Threads  RAM (GiB)  Total  CPU   Swap  Transfer (MiB/s)  Total time (s)
1  479CUT3MUX55926Y6MJ5Y28W41V0251O  2014/05/22 15:36:53  1.00.07  Linux     5        11.734     4064   3583  9136  1649.58           115.74
2  LG1IW49TR40GPW42438V7K765P1SYRG9  2014/05/22 15:45:11  1.00.07  Linux     5        11.734     4046   3568  9078  1639.09           116.27
3  65K0UBK0P4V5N3BZ10KIDMVB2MHZK27D  2014/05/22 16:21:19  1.00.07  Linux     5        11.734     4002   3544  8627  1557.64           117.53
4  VBGNW50KTOS62JQQYH7O06LENV80704V  2014/05/19 14:53:14  1.00.07  Windows   6        11.998     2887   3661  1545  278.95            162.94
5  Q99TO8T6O1RH5JPMWB3UK053198WSV02  2014/05/20 01:16:07  1.00.07  Windows   6        11.998     2439   3677  1022  184.45            192.90
6  VE0VQY1LLJ8B0KR0YQOIN4O8U292DZYV  2014/05/19 15:48:13  1.00.07  Linux     6        11.735     1090   899   8958  1617.37           431.46


As you can see, thanks to Linux's better swap management, it is now on top :) The CPU is still a bit faster on Windows. It could be the frequency scaling... but in my 3rd test (see table above) I disabled the BIOS's Cool & Quiet mode, which enables frequency management by the OS, and it did not impact the result. Anyway, I can live with that. :)
Regards,

Carlos Milovic F.

Offline Bob Andersson

  • Member
  • *
  • Posts: 67
Re: PixInsight Benchmark
« Reply #74 on: 2014 June 01 02:07:54 »
Quote
... ram disk is based on the assumption that the OS and/or the application are not using the ram effectively. A ram disk helps swap performance but you are taking the ram away from the OS and the app.

The only reason I can think of for PI to perform better when available RAM is sectioned off as a RAM Disk is that PI is not using RAM effectively. Is this because one of the target OSes has restrictions on how much, or even how, an application can use available RAM with the result that all target OSes have to be limited by the common code base of PI?

Bob.
TEC 140 'scope, FLI ML16803 camera, ASA DDM60 Pro mount.