Yes, I agree. It may be optimized a lot, but I doubt that we'll reach those 3.72 secs, especially with the current solver.
So, the room for improvement boils down to the following main issues:
- Lack of parallelization in certain key operations. They are O(N), but they may still be acting as a bottleneck. The 1D FFT operations should be fast enough.
- To calculate the DCT I had to double the size of the row (or column) vector internally. This means I'm computing an FFT of 2N points, i.e. O(2N log 2N), which is (of course) slower... There should be more efficient ways to implement DCTs, and I'm sure the FFTW guys are doing that.
The problem is that the DCT is calculated with a basis that has more frequencies than the one used in the FFT... I found an algorithm that used only N points, but it yielded artifacts around sharp edges.
- Other minor optimizations of the code. Maybe we can rearrange loops, or use pointers to speed up things a bit.
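
For reference, here is a minimal sketch of the "doubled vector" idea from the DCT point above. It assumes the doubling is done by mirroring the row (x followed by its reverse) and that an unnormalized DCT-II is wanted; that may not be exactly what my code does internally, and the function names are only illustrative. The direct O(N^2) version is there just to check the result.

```python
import numpy as np

def dct2_via_fft(x):
    """Unnormalized DCT-II of a 1D vector using one 2N-point FFT.

    Assumption: the vector is doubled by mirroring (x followed by x reversed).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    y = np.concatenate([x, x[::-1]])      # length 2N, even-symmetric extension
    Y = np.fft.fft(y)[:n]                 # only the first N bins are needed
    k = np.arange(n)
    # A half-sample phase shift turns the FFT bins into cosine coefficients.
    return 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * n)) * Y)

def dct2_direct(x):
    """Reference O(N^2) DCT-II: X[k] = sum_m x[m] * cos(pi * (m + 1/2) * k / N)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (m + 0.5) * k / n) @ x

row = np.array([1.0, 2.0, 4.0, 8.0, 3.0, 0.5])
print(np.allclose(dct2_via_fft(row), dct2_direct(row)))
```

The cost here is that of a 2N-point complex FFT per row, which is the overhead mentioned above; a dedicated real-even transform avoids the redundant half.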
Anyway, it isn't that bad.
Thanks for making the comparison.
Changing the subject, I wanted to ask you some things about your implementation:
- How do the gradient clipping parameters work with the images? Not the algorithm, but the results... I see why clipping low gradients may yield posterization, but what happens with the high clipping? Is it really necessary?
- Now a bit more about the algorithm... about your power operation. You said that the gradients are transformed in the Pow(gradient, value) fashion. What happens if the gradient is negative and you take the square root? Or the square? You change its sign...
In the paper by Fattal et al., they use a factor:
alpha/Abs(gradient) * (Abs(gradient)/alpha)^beta
which is almost the same idea, but it operates on the magnitude of the gradient instead of the individual values, and hence acts as an amplification or attenuation factor.
- Have you analyzed the gradient field of an HDR astronomical image? Do you have any idea how we might develop a new factor field? I was thinking of borrowing some basic notions from the HDRWT transform and operating on the gradient... but I have not formalized that yet. Any other ideas?
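
To make the question about the power operation concrete, here is a minimal sketch of the two options. The first function is only my guess at a sign-preserving version of Pow(gradient, value) applied to each component; I don't know whether your code does this. The second is the factor quoted above from Fattal et al., computed from the gradient magnitude. The function names and the small `eps` guard against division by zero are my own assumptions.

```python
import numpy as np

def signed_power(g, exponent):
    """Per-component power that keeps the sign of each gradient value.

    Hypothetical variant: a plain g ** 0.5 gives NaN for negative g,
    and g ** 2 silently makes every value positive.
    """
    g = np.asarray(g, dtype=float)
    return np.sign(g) * np.abs(g) ** exponent

def fattal_factor(gx, gy, alpha, beta, eps=1e-12):
    """alpha/Abs(gradient) * (Abs(gradient)/alpha)^beta on the gradient magnitude.

    eps is an assumed guard so that flat regions do not divide by zero.
    """
    mag = np.maximum(np.hypot(gx, gy), eps)
    return (alpha / mag) * (mag / alpha) ** beta

gx = np.array([3.0, 6.0, -0.3])
gy = np.array([4.0, -8.0, 0.4])
phi = fattal_factor(gx, gy, alpha=5.0, beta=0.5)
print(signed_power(gx, 0.5))   # component-wise, signs preserved
print(phi * gx, phi * gy)      # both components scaled by the same factor
```

With beta < 1 the factor is below 1 where the magnitude exceeds alpha and above 1 where it is smaller, and because both components are multiplied by the same number the direction of the gradient is kept. The component-wise power changes gx and gy by different ratios, so the direction is generally not preserved.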