by dpJudas » Tue Aug 07, 2018 2:40 pm
I did not rule out the possibility that texture sampling could be competitive with the array lookup, mostly because texture sampling is so common that GPUs have dedicated silicon and instructions for precisely this purpose. Very old GPUs in particular were terrible at anything involving integers. There, texture sampling could very well beat the array version, as it can be done entirely with floating point instructions.
It is sort of like comparing the performance of clamp(x, 0.0, 1.0) and max(x, 0.0). The clamp version is faster, especially on older hardware, because there's a dedicated saturate instruction. The same kind of thing can apply to texture sampling. But nowadays, with compute shaders and CUDA, I don't think texture sampling can beat the static array unless the array is very large (too large to fit into local workgroup memory). Even if it still does beat it, the speed difference would most likely be insignificant. The static array version is much easier to maintain and thus wins by default if all other things are in the same ballpark.
About the static array initialization: make sure you don't use that second static variable as an input to the first one (change it to a define). I wouldn't put it past the dumbest compilers to conclude that it isn't a constant expression and then initialize the table on each invocation.
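To illustrate the define suggestion, here is a minimal sketch in plain C rather than in the shader language under discussion. The names GAMMA_TABLE_SIZE, gamma_table and lookup, as well as the table values, are made up for the example and are not taken from the code in this thread. The point is only that the table size comes from a macro, so every compiler sees a literal constant in the array declaration.

```c
/* Hypothetical table size. A macro expands to a literal, so it is a
   constant expression for every compiler, unlike a second static variable. */
#define GAMMA_TABLE_SIZE 4

/* Hypothetical lookup table; the values are placeholders. */
static const float gamma_table[GAMMA_TABLE_SIZE] = { 0.0f, 0.25f, 0.5f, 1.0f };

/* Clamp the index into the valid range, then read the table. */
static float lookup(int i)
{
    if (i < 0) i = 0;
    if (i > GAMMA_TABLE_SIZE - 1) i = GAMMA_TABLE_SIZE - 1;
    return gamma_table[i];
}
```

The same pattern should carry over to a shader: keep the size as a define and the table as a constant array with a literal initializer, so there is nothing left for the compiler to evaluate at run time.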