Sander,
I have been following the GPGPU arena in my professional life as a software developer.
- It has a lot of potential, in particular if your algorithm is compute bound and has fairly simple control flow (as many image processing algorithms do).
- Memory bound algorithms can benefit as well, provided they have simple access patterns (essentially linear, sequential access).
- An important bottleneck at the moment is the data transfer between the graphics card and main memory. If the computation you offload to the GPU is short, this transfer overhead can kill any speedup.
- In some cases it is necessary to design your implementation specifically for GPUs. Algorithms that work well on normal CPUs (often tuned to optimize L2-cache access) sometimes don't work well on GPUs.
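To make these points concrete, here is a minimal CUDA sketch (the kernel, names, and sizes are my own illustration, not from any particular project): a kernel with trivial control flow and linear, coalesced memory access, plus the two host-device copies whose cost can dominate for small problems.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Trivial kernel: no real branching, each thread touches exactly
// one consecutive element -> linear (coalesced) memory access.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, bytes);

    // These two copies across the PCIe bus are the transfer
    // bottleneck mentioned above; for a kernel this cheap they
    // can easily dominate the total runtime.
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("%f\n", host[0]);

    cudaFree(dev);
    free(host);
    return 0;
}
```

Compiled with nvcc, this is roughly the smallest complete CUDA program; timing the two cudaMemcpy calls against the kernel launch makes the transfer overhead visible directly.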
If you want to experiment with GPUs, I would stick to CUDA for the moment. The first OpenCL implementations lack maturity, and at the current stage OpenCL does not even mandate double-precision floating point support (doubles are an optional extension; see the spec, or
http://de.wikipedia.org/wiki/OpenCL). Also, CUDA is sufficiently similar to OpenCL that you should not have much trouble moving to OpenCL later.
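To illustrate how close the two are, here is a simple scaling kernel in CUDA, with its OpenCL C counterpart shown in comments (identifiers are my own; this is a sketch, not from either spec):

```cuda
// CUDA version of the kernel:
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// The OpenCL C version is nearly identical -- essentially only
// the qualifiers and the thread-index query differ:
//
//   __kernel void scale(__global float *data, float factor, int n)
//   {
//       int i = get_global_id(0);
//       if (i < n)
//           data[i] *= factor;
//   }
```

The host-side APIs differ more (OpenCL's setup is considerably more verbose), but the kernel code itself ports almost mechanically.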
Georg