Author Topic: Gradient HDR compression, take 2  (Read 24213 times)

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #15 on: 2011 June 20 11:21:25 »
...My module, with the whole process activated, took ~3secs to process the Belgium House (http://www.cs.huji.ac.il/~danix/hdr/pages/belgium.html).

Hi Carlos,

Good and bad news from testing your solver_dct3.h solver:

- The good news first: the PI-based solver produces results that are almost identical to the FFTW-based one. The maximum difference between both images is <6e-8.
- The bad news: on a 10 MPixel image, the PI-based solver needs 32.9 seconds. The FFTW-based one is done in 3.72 seconds (all measured in my Linux virtual machine).

I think the performance of the PI-based solver is not yet optimal. A factor of 10 between both solvers indicates that there is room for improvement.

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline Carlos Milovic

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2172
  • Join the dark side... we have cookies
    • http://www.astrophoto.cl
Re: Gradient HDR compression, take 2
« Reply #16 on: 2011 June 20 12:02:18 »
Yes, I agree. It may be optimized a lot, but I doubt that we'll reach those 3.72 seconds, especially with the current solver.

So, the room for improvement comes down to the following main issues:
- Lack of parallelization in certain key operations. They are O(N), but may be creating some sort of bottleneck. The 1D FFT operations should be fast enough.
- To calculate the DCT I had to double the size of the row (or column) vector internally. This means I'm calculating an FFT of O(2N log 2N), which is (of course) slower... There should be more efficient ways to implement DCTs, and I'm sure the FFTW guys are doing that.
The problem is that the DCT is calculated with a basis that has more frequencies than the ones used in the FFT... I found an algorithm that used only N points, but it yielded artifacts around sharp edges.
- Other minor optimizations of the code. Maybe we can rearrange loops, or use pointers to speed things up a bit.
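For readers following along, the doubled-vector trick described above can be sketched like this. This is an illustrative reconstruction, not the actual module code: a naive O(N^2) DFT stands in for the library FFT, since only the mirroring and phase correction are the point here.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT; in the real module this would be the library's FFT.
static std::vector<std::complex<double>> Dft( const std::vector<std::complex<double>>& x )
{
   const std::size_t N = x.size();
   std::vector<std::complex<double>> X( N );
   for ( std::size_t k = 0; k < N; ++k )
      for ( std::size_t n = 0; n < N; ++n )
         X[k] += x[n]*std::exp( std::complex<double>( 0, -2*M_PI*double( n*k )/double( N ) ) );
   return X;
}

// DCT-II of x (length N) computed from a transform of the doubled,
// mirrored sequence of length 2N: y = { x[0..N-1], x[N-1..0] }.
// The even extension makes the 2N-point DFT reduce to cosine sums,
// up to the phase factor exp(-i*pi*k/(2N)) removed below.
std::vector<double> DctViaDoubledFft( const std::vector<double>& x )
{
   const std::size_t N = x.size();
   std::vector<std::complex<double>> y( 2*N );
   for ( std::size_t n = 0; n < N; ++n )
      y[n] = y[2*N-1-n] = x[n]; // mirrored (even) extension
   std::vector<std::complex<double>> Y = Dft( y );
   std::vector<double> c( N );
   for ( std::size_t k = 0; k < N; ++k )
      c[k] = 0.5*std::real( std::exp( std::complex<double>( 0, -M_PI*double( k )/(2.0*N) ) )*Y[k] );
   return c;
}
```

Note how the transform length is 2N and only N of the outputs are kept, which is exactly the memory/operations penalty discussed in this thread.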

Anyway, it isn't that bad :D Thanks for making the comparison.


Changing the subject, I wanted to ask you some things about your implementation:
- How do the gradient clipping parameters work with the images? Not the algorithm, but the results... I see why clipping low gradients may yield posterization, but what happens with the high clipping? Is it really necessary?
- Now a bit more about the algorithm... about your powering operation. You said that gradients are operated on in the Pow(gradient, value) fashion. What happens if the gradient is negative and you are taking the square root? Or the square power? You change its sign...
In the paper by Fattal et al., they use a factor:
alpha/Abs(gradient) * (Abs(gradient)/alpha)^beta
which is almost the same idea, but operating on the magnitude of the gradient instead of the individual values, and hence it acts as an amplification or attenuation factor.
- Have you analyzed the gradient field of an HDR astronomical image? Do you have any idea how we may develop a new factor field? I was thinking of stealing some basic notions from the HDRWT transform and operating on the gradient... but I have not formalized that yet. Other ideas?
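For reference, the Fattal et al. attenuation factor quoted above is straightforward to sketch in code. This is just the formula as written, not anyone's module code; the zero-gradient guard is my own assumption to avoid a division by zero:

```cpp
#include <cmath>

// Fattal et al. (2002) gradient attenuation factor, as quoted above:
//   phi(g) = (alpha/|g|) * (|g|/alpha)^beta,  with 0 < beta < 1.
// Acting on the magnitude, it attenuates gradients stronger than alpha
// and amplifies gradients weaker than alpha, independently of sign.
double FattalFactor( double magnitude, double alpha, double beta )
{
   if ( magnitude <= 0 )
      return 1.0; // leave zero gradients untouched (avoids division by zero)
   return (alpha/magnitude)*std::pow( magnitude/alpha, beta );
}
```

Because the factor multiplies the gradient vector (newGradient = phi(|g|)*g), the sign is preserved automatically, which sidesteps the negative-base pow() issue raised above.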
Regards,

Carlos Milovic F.
--------------------------------
PixInsight Project Developer
http://www.pixinsight.com

Offline Harry page

  • PTeam Member
  • PixInsight Jedi Knight
  • *****
  • Posts: 1458
    • http://www.harrysastroshed.com
Re: Gradient HDR compression, take 2
« Reply #17 on: 2011 June 20 12:46:03 »
Hi

I am watching this thread with great interest, haven't got a clue what you are talking about mind you :'(

Harry
Harry Page

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #18 on: 2011 June 20 13:32:31 »
Hi Carlos,
just some quick answers before going to bed.

...
Anyway, isn't that bad :D Thanks for making the comparison.
Changing the subject, I wanted to ask you some things about your implementation:
...

FFTW is known to be fast. I don't think that parallelization is an issue here, because I am running on a single-core virtual machine (and did not link in the threaded FFTW version). Memory may be an issue, either size or memory bandwidth. Many of these algorithms are memory-bound, and if you need to handle twice the number of points, this increases runtime as well. I wonder what Intel MKL would give us on top of it; maybe I will try some time...

Regarding your questions:
- High clipping and pow() work on abs(gradient), but always restore the sign of the result. For example: if (gradient < 0) newGradient = -pow(-gradient, exp); else newGradient = pow(gradient, exp). In effect, both pow() and high clipping have similar effects, and I do not yet know which is better. High clipping is just faster to compute.
- High clipping and pow() both reduce the influence of high abs(gradient) values, thus emphasizing faint gradients. That's why I am seeing so much more detail when doing high clipping. Stars tend to vanish (because they have high gradients), while faint nebula details are emphasized (because they have tiny gradients).
- I am not sure what you mean by "Have you analyzed the gradient field of an HDR astronomical image". The M42 image is an astronomical one, and I am operating on the gradients (dx, dy).
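The two sign-preserving operations Georg describes can be sketched as small helpers; function names are mine, and this is a minimal reconstruction of the description above, not the module source:

```cpp
#include <algorithm>
#include <cmath>

// Sign-preserving power: operate on abs(gradient) and restore the sign,
// so pow() never sees a negative base.
double PowGradient( double g, double e )
{
   return (g < 0) ? -std::pow( -g, e ) : std::pow( g, e );
}

// High clipping: limit abs(gradient) to maxGrad, keeping the sign.
// Cheaper than pow(), with a broadly similar effect on strong gradients
// (stars), while weak gradients (faint nebulosity) pass through unchanged.
double ClipGradient( double g, double maxGrad )
{
   return (g < 0) ? -std::min( -g, maxGrad ) : std::min( g, maxGrad );
}
```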

Have a nice evening,

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #19 on: 2011 June 20 13:33:55 »
...I am watching this thread with great interest  , haven't got a clue what you are talking about  mind you  :'(

Hi Harry,
it's all about numerics, performance, and the best use of gradient-domain information. I hope we get useful modules out of this...
Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #20 on: 2011 June 21 03:18:47 »
Regarding the use of FFTW3 as implemented in Intel MKL: the documentation states that "Multidimensional r2r transforms" are not implemented (http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/appendices/mkl_appG_FFTW3_Using_Wrappers.htm) :'( Just what my implementation would have needed.

Carlos: The 1D r2r transforms are there, so you may try if you like...

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Gradient HDR compression, take 2
« Reply #21 on: 2011 June 21 04:20:02 »
You don't need FFTW because PCL provides you with all you need. For two-dimensional FFT of real data:

http://pixinsight.com/developer/pcl/doc/html/classpcl_1_1GenericRealFFT2D.html

Maybe not the 'fastest in the west', but PCL FFT support is based on the KISS FFT library, which is very fast and accurate. KISS FFT is released under a BSD license that won't taint your module; with FFTW, you simply cannot publish your work as a PixInsight module except for your own private use. And no, I won't pay 7000 USD for that library. I support the idea of open source in general, but open source Talibanism is a serious obstacle for cultural and technological development.
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #22 on: 2011 June 21 04:30:32 »
Hi Juan,

You don't need FFTW because PCL provides you with all you need....

As shown earlier in this thread, I did test Carlos' implementation of a solver using the PI FFT tools, and it turns out to be a factor of 10 slower than the FFTW implementation (plus it uses more memory). I am not really an FFT expert, but maybe you can help Carlos tune his implementation further. Maybe an additional interface or something like that is all that is missing.

BTW: I wouldn't call a license fee for software "Talibanism". Good work deserves good money, and everyone should be free to give his software away for free, or to charge a license fee.

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Gradient HDR compression, take 2
« Reply #23 on: 2011 June 21 04:43:09 »
Unfortunately I don't have the time now to review Carlos' code, but here are a few quick notes to help improve efficiency:

- Complex FFTs are about twice as slow as real FFTs.

- Unoptimized matrix dimensions severely slow down FFT performance in PCL.

- For 2D transforms, internal PCL code (GenericRealFFT2D<>) has been parallelized and is quite efficient. It shouldn't be replaced with a 'manual' implementation of a 2D transform.

- Maybe you are using a different algorithm that is more efficient per se?

Quote
Good work deserves good money, and everyone should be free to give his software away for free, or to charge a license fee.

Of course, I agree completely. But do you think 7000+ USD is reasonable in this case? In my opinion, charging such an amount of money from everybody without distinction, no matter whether you are M$ or a tiny emerging company, looks like using an open-source license to hide a de facto software patent. So you're right, Talibanism isn't the right word in this case.
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline Carlos Milovic

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2172
  • Join the dark side... we have cookies
    • http://www.astrophoto.cl
Re: Gradient HDR compression, take 2
« Reply #24 on: 2011 June 21 07:04:07 »
Thanks Georg for your explanation on the parameters.
Right now, I'm wondering if what we need isn't something like a band-pass filter, where we enhance medium gradients and reduce small and large ones. I've toyed with some functions in Maple, and found a family of Gaussian-like ones that may do the trick.
Another approach would be a multiscale varying power parameter. It is quite easy to implement from my current algorithm, but the interface would need to use code from ATWT. The idea is to emphasize small-scale gradients and reduce large-scale ones. It would be fully user-controllable (or we may use an analytical dependency on the scale, with a turning point at the gradient scales that should be preserved as they are).
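One possible shape for such a "band-pass" weight on gradient magnitude is a Gaussian bump over a floor. This is purely hypothetical (Carlos has not published his Maple functions); all parameter names here are mine:

```cpp
#include <cmath>

// Hypothetical band-pass-like weight on gradient magnitude:
// a Gaussian bump centered at g0 boosts medium gradients, while the
// floor 'base' keeps small and large gradients attenuated but nonzero.
//   weight(g0)        = base + boost
//   weight(far away)  -> base
double BandPassWeight( double magnitude, double g0, double sigma,
                       double boost, double base )
{
   double d = (magnitude - g0)/sigma;
   return base + boost*std::exp( -0.5*d*d );
}
```

As with the Fattal factor, the weight would multiply the gradient vector, so signs are preserved.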

The reason I asked whether you had analyzed gradients in an astronomical image is that we need to understand how the gradients behave, and which ones we want to target (i.e. do a good segmentation). HDRCompression is based on the fact that illumination changes are linked to high-frequency components (just like in the HomomorphicFilter), since it is aimed at daylight images. I guess that this approach should be perfect for lunar HDR, but DSO or widefield imaging is another thing. We should target large-scale gradients and, at the same time, recover or emphasize small-scale gradients. Of course, stars are another problem, and we may just try to reduce them too... and that's a problem at the small/medium scale. That's why I was wondering about selective functions.
Anyway, I think that this needs a bit more discussion before blindly writing the code.
Regards,

Carlos Milovic F.
--------------------------------
PixInsight Project Developer
http://www.pixinsight.com

Offline Carlos Milovic

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2172
  • Join the dark side... we have cookies
    • http://www.astrophoto.cl
Re: Gradient HDR compression, take 2
« Reply #25 on: 2011 June 21 07:22:50 »
Juan
Quote
You don't need FFTW because PCL provides you with all you need.
I don't fully agree. If we had the DCT and DST as part of the FFT implementation, we would not be having this discussion. Right now I'm manually calculating them from the FFT. There are much better ways to do that.

Quote
Unfortunately I don't have the time now to review Carlos' code, but here are a few quick notes to help improve efficiency:

You never have ;) And the code is accumulating there, rusting.

Quote
- Complex FFTs are about twice slower than real FFTs.

I found no way to perform the inverse FFT with real data, so in that case I'm using the complex one. Maybe I misunderstood the 1D FFT header...

What would be perfect is a Cos-FFT transform, i.e. calculating the real part of the FFT of real data, and vice versa. Then we could use that to get the DCT (and its inverse) with much less penalty.

Quote
- Unoptimized matrix dimensions severely slow down FFT performance in PCL.
Again, I'm always using optimized dimensions. The problem is that for the DCT calculation I had to double the size of each vector, and hence there is a memory/operations penalty. The DCT uses only 1/4 of the output data, so it is really a waste of time. I've read somewhere that it is possible to calculate the DCT using both the real and imaginary parts of a trimmed FFT. Anyway, that involves messing with the FFT implementation and cutting things down there, and that's something I would not do.

Quote
- For 2D transforms, internal PCL code (GenericRealFFT2D<>) has been parallelized and is quite efficient. It shouldn't be replaced with a 'manual' implementation of a 2D transform.

I'm not sure whether I can reproduce all the intermediate steps to get the DCT in a 2D scheme. The recommendation on the site describing the procedure was to get the 2D DCT by separating it into 1D row and column transforms.
Regards,

Carlos Milovic F.
--------------------------------
PixInsight Project Developer
http://www.pixinsight.com

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #26 on: 2011 June 21 07:34:15 »
Thanks Georg for your explanation on the parameters.
Right now, I'm wondering if what we need isn't something like a band-pass filter, where we enhance medium gradients and reduce small and large ones. I've toyed with some functions in Maple, and found a family of Gaussian-like ones that may do the trick.
...
The reason I asked whether you had analyzed gradients in an astronomical image is that we need to understand how the gradients behave, and which ones we want to target (i.e. do a good segmentation). HDRCompression is based on the fact that illumination changes are linked to high-frequency components (just like in the HomomorphicFilter), since it is aimed at daylight images.
...

Hi Carlos,

my implementation never had the goal of working for daylight images. Astronomical images are my target here. Maybe HDR compression is not the right term for the method I have chosen?!

I don't think the method I implemented is similar to band-pass filters. I am not really enhancing/reducing certain frequencies; it's about the size of the changes. I am not sure there is an equivalent among traditional filters. The goal of the method is to hear the birds sing while filtering out the noisy aeroplanes based on the amount of sound they make, and not based on the frequency of the noise.

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: Gradient HDR compression, take 2
« Reply #27 on: 2011 June 21 13:53:33 »
Hi Carlos,

Quote
Quote
You don't need FFTW because PCL provides you with all you need.
I don't fully agree. If we had the DCT and DST as part of the FFT implementation, we would not be having this discussion. Right now I'm manually calculating them from the FFT. There are much better ways to do that.

I was referring to Georg's module. He uses FFTs of real data, and this can be done without problems with PCL. Unless I am missing something else, for the tasks he is carrying out PCL routines should require little more than a drop-in replacement. I'll try to post some example code later.

For DCT transforms, take a look at Ooura's source code:

http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html

I am considering embedding this code in PCL. I think you can integrate Ooura's routines quite easily with your code to fit your needs. These routines are fast, from what I've read. Here is a comparison (pretty outdated but still useful):

http://www.kurims.kyoto-u.ac.jp/~ooura/fftbmk.html

Quote
You never have ;) And the code is accumulating there, rusting.

I do what I can; too many things lie on my shoulders, and it is getting worse. Instead of accumulating a large amount of code, perhaps you should focus on completely finishing one tool at a time, and on writing your code in a more organized and efficient way. That's a known way to prevent rustiness ;)
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline Harry page

  • PTeam Member
  • PixInsight Jedi Knight
  • *****
  • Posts: 1458
    • http://www.harrysastroshed.com
Re: Gradient HDR compression, take 2
« Reply #28 on: 2011 June 21 13:59:11 »

Quote
Quote
You don't need FFTW because PCL provides you with all you need.
Instead of accumulating a large amount of code, perhaps you should focus on finishing one tool at a time completely and writing your code in a more organized and efficient way. That's a known way to prevent rustiness ;)

Oh dear am I going to mention a pot and kettle  >:D

Harry
Harry Page

Offline georg.viehoever

  • PTeam Member
  • PixInsight Jedi Master
  • ******
  • Posts: 2132
Re: Gradient HDR compression, take 2
« Reply #29 on: 2011 June 21 21:22:27 »
Hi Juan,

...I was referring to Georg's module. He uses FFTs of real data, and this can be done without problems with PCL. Unless I am missing something else, for the tasks he is carrying out PCL routines should require little more than a drop-in replacement. I'll try to post some example code later.
....

In fact, I am using the DCT/IDCT from the FFTW code, not a full FFT. For the moment, I can live with FFTW (since I am doing this purely non-commercially), or I can use Carlos' solver as a workaround, at the cost of giving up much of the performance that makes the real-time preview possible. I also had a quick look at Ooura's code: this might be an option, but you have to do much more padding (it can handle power-of-2 sizes only).

Georg
Georg (6 inch Newton, unmodified Canon EOS40D+80D, unguided EQ5 mount)