Fuzz style Software is slow. A theoretical cheaper version.


Post by RuscoIstar » Wed Dec 22, 2021 11:15 pm

The "Software" fuzz style is great, especially when the effect is scaled up. Although we will never have a perfect recreation of the "Vanilla Doom" software look on the hardware-accelerated engine, it comes really close. There is one problem though: it's so "expensive" on less capable hardware (iGPU and mobile) that on legacy builds the fuzz setting defaults to Shadow (I know there are other technical reasons to pick Shadow too, as explained at viewtopic.php?p=1112677#p1112677, but I digress).

It always seemed curious to me that on the same old hardware the effect is trivial and cheap in software mode, so I took a quick look at the code found in https://github.com/coelckers/gzdoom/blo ... oftware.fp and I think I know what's holding it up.

Code:
// Fuzz effect as rendered by the software renderer

#define FUZZTABLE 50
#define FUZZ_RANDOM_X_SIZE 100
#define FRACBITS 16
#define fixed_t int

int fuzz_random_x_offset[FUZZ_RANDOM_X_SIZE] = int[]
(
   75, 76, 21, 91, 56, 33, 62, 99, 61, 79,
   95, 54, 41, 18, 69, 43, 49, 59, 10, 84,
   94, 17, 57, 46,  9, 39, 55, 34,100, 81,
   73, 88, 92,  3, 63, 36,  7, 28, 13, 80,
   16, 96, 78, 29, 71, 58, 89, 24,  1, 35,
   52, 82,  4, 14, 22, 53, 38, 66, 12, 72,
   90, 44, 77, 83,  6, 27, 48, 30, 42, 32,
   65, 15, 97, 20, 67, 74, 98, 85, 60, 68,
   19, 26,  8, 87, 86, 64, 11, 37, 31, 47,
   25,  5, 50, 51, 23,  2, 93, 70, 40, 45
);

int fuzzoffset[FUZZTABLE] = int[]
(
    6, 11,  6, 11,  6,  6, 11,  6,  6, 11,
    6,  6,  6, 11,  6,  6,  6, 11, 15, 18,
   21,  6, 11, 15,  6,  6,  6,  6, 11, 6,
   11,  6,  6, 11, 15,  6,  6, 11, 15, 18,
   21,  6,  6,  6,  6, 11,  6,  6, 11, 6
);

vec4 ProcessTexel()
{
   vec2 texCoord = vTexCoord.st;
   vec4 basicColor = getTexel(texCoord);

   // Ideally fuzzpos would be a uniform and differ per sprite so that overlapping demons won't get the same shade for the same pixel
   int next_random = int(abs(mod(timer * 35.0, float(FUZZ_RANDOM_X_SIZE))));
   int fuzzpos = (/*fuzzpos +*/ fuzz_random_x_offset[next_random] * FUZZTABLE / 100) % FUZZTABLE;

   int x = int(gl_FragCoord.x);
   int y = int(gl_FragCoord.y);

   fixed_t fuzzscale = (200 << FRACBITS) / uViewHeight;
   int scaled_x = (x * fuzzscale) >> FRACBITS;
   int fuzz_x = fuzz_random_x_offset[scaled_x % FUZZ_RANDOM_X_SIZE] + fuzzpos;
   fixed_t fuzzcount = FUZZTABLE << FRACBITS;
   fixed_t fuzz = ((fuzz_x << FRACBITS) + y * fuzzscale) % fuzzcount;
   float alpha = float(fuzzoffset[fuzz >> FRACBITS]) / 32.0;

   return vec4(0.0, 0.0, 0.0, basicColor.a * alpha);
}
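To make clear what the fixed-point block is doing, here is the same index math transcribed to plain C (just a sketch; `view_height` stands in for the `uViewHeight` uniform, and the function name is mine). At a view height of 200 the table index simply advances one entry per screen row and wraps every FUZZTABLE rows; at 400, two screen rows share an entry, which is the scaling at work:

```c
#include <assert.h>

#define FUZZTABLE 50
#define FRACBITS 16
typedef int fixed_t;

/* The shader's 16.16 fixed-point index math for one pixel. */
static int fuzz_index(int y, int fuzz_x, int view_height)
{
    /* Maps screen rows back onto the 200-line virtual screen. */
    fixed_t fuzzscale = (200 << FRACBITS) / view_height;
    fixed_t fuzzcount = FUZZTABLE << FRACBITS;
    fixed_t fuzz = ((fuzz_x << FRACBITS) + y * fuzzscale) % fuzzcount;
    return fuzz >> FRACBITS; /* index into fuzzoffset[], in [0, FUZZTABLE) */
}
```

Running this for a few rows shows the wrap-around and the two-rows-per-entry behaviour at 400 lines directly.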


The first mod operation seems to help the shader update at the correct framerate and is thus not easily negotiable, but the following modulus operators are just for array lookups and coordinates. I found during testing on my project Demake Shaders that these can become a bottleneck on low-powered devices. On shader hardware the % operator seemingly becomes more expensive the bigger the divisor is (for example, the 64-level dithering algorithm I have been using is way slower than the 16-level version).
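For what it's worth, the usual cheap alternative doesn't directly apply here, because 50 and 100 are not powers of two, but it may explain the 16- vs 64-level difference: for a power-of-two divisor, integer modulus of a non-negative value collapses into a single bit mask, so padding a table out to 64 or 128 entries (if the pattern tolerated it) would make the lookup index nearly free. A minimal illustration:

```c
#include <assert.h>

/* For non-negative x and power-of-two n, x % n equals x & (n - 1).
 * This is only valid when n is a power of two. */
static int mod_pow2(int x, int n)
{
    return x & (n - 1);
}
```

Good compilers already do this rewrite for constant power-of-two divisors; the point is that 16, 64, etc. are fundamentally cheaper divisors than 50 or 100.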

One solution I haven't been able to implement yet in Demake Shaders (due to health issues) is to use a Bayer matrix texture with the precomputed error-diffusion values baked in, since a texture lookup on the GPU is way faster than an array lookup (as reported by PixelEater in viewtopic.php?p=1067486#p1067486 ), and that got me thinking: wouldn't such an approach also help the fuzz effect's slowdown? The textures would need to cycle and "animate" to replace the offsets (on the X axis and over time), sure, so it would take more memory, but such is the nature of precomputing: trading memory usage for execution time.
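To sketch what I mean (the function name and layout are just mine, not anything in GZDoom): both tables plus the per-frame offset could be collapsed CPU-side into one alpha image that a repeat-wrapped sampler reads back, leaving the shader with a single texture fetch per fragment. The `frame` parameter below stands in for the time-based fuzzpos cycling:

```c
#include <assert.h>

#define FUZZTABLE 50
#define FUZZ_RANDOM_X_SIZE 100

static const int fuzz_random_x_offset[FUZZ_RANDOM_X_SIZE] = {
    75, 76, 21, 91, 56, 33, 62, 99, 61, 79,
    95, 54, 41, 18, 69, 43, 49, 59, 10, 84,
    94, 17, 57, 46,  9, 39, 55, 34,100, 81,
    73, 88, 92,  3, 63, 36,  7, 28, 13, 80,
    16, 96, 78, 29, 71, 58, 89, 24,  1, 35,
    52, 82,  4, 14, 22, 53, 38, 66, 12, 72,
    90, 44, 77, 83,  6, 27, 48, 30, 42, 32,
    65, 15, 97, 20, 67, 74, 98, 85, 60, 68,
    19, 26,  8, 87, 86, 64, 11, 37, 31, 47,
    25,  5, 50, 51, 23,  2, 93, 70, 40, 45
};

static const int fuzzoffset[FUZZTABLE] = {
     6, 11,  6, 11,  6,  6, 11,  6,  6, 11,
     6,  6,  6, 11,  6,  6,  6, 11, 15, 18,
    21,  6, 11, 15,  6,  6,  6,  6, 11,  6,
    11,  6,  6, 11, 15,  6,  6, 11, 15, 18,
    21,  6,  6,  6,  6, 11,  6,  6, 11,  6
};

/* Bake one FUZZ_RANDOM_X_SIZE x FUZZTABLE alpha image per animation
 * frame; the shader would then replace all the array/modulus math with
 * one wrapped texture fetch and cycle frames over time. */
static void bake_fuzz_texture(float out[FUZZTABLE][FUZZ_RANDOM_X_SIZE], int frame)
{
    for (int y = 0; y < FUZZTABLE; y++)
        for (int x = 0; x < FUZZ_RANDOM_X_SIZE; x++) {
            int fuzz_x = fuzz_random_x_offset[x] + frame;
            out[y][x] = (float)fuzzoffset[(fuzz_x + y) % FUZZTABLE] / 32.0f;
        }
}
```

Every baked value lands between 6/32 and 21/32, the same alpha range the shader produces, so visually nothing should change.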

I'd do the experiment myself but I am not familiar enough with what this shader is doing exactly. When I look at the fuzz effect in the Unity port (for a high-res "vanilla" example) I see a repeating pattern in the animated noise. Would it maybe be possible to generate such a texture beforehand (an animated, error-diffusion-like one, I mean)? It doesn't need to be exact, just convincing.
RuscoIstar
 
Joined: 15 Sep 2019
Location: Mexico City
Discord: RuscoIstar#2063
Twitch ID: rodylg
Github ID: rodylg
Operating System: Windows 10/8.1/8/201x 64-bit
OS Test Version: No (Using Stable Public Version)
Graphics Processor: Intel with Vulkan Support

Re: Fuzz style Software is slow. A theoretical cheaper version.

Post by dpJudas » Thu Dec 23, 2021 12:22 am

As you mention yourself, the performance could be improved a lot if it used a texture for the offset table. The math you list is basically me porting the code directly from the software renderer. Since it ran okay on my computer I didn't really bother trying to optimize it any further. Most of it could be done using floating-point math:

Code:
vec4 ProcessTexel()
{
    float fuzz_random_x_size = float(FUZZ_RANDOM_X_SIZE);
    float next_random = fract(timer * 35.0 / fuzz_random_x_size);
    // Time-based table offset, normalized to a fraction of the fuzz table
    float fuzzpos = float(fuzz_random_x_offset[int(next_random * fuzz_random_x_size)]) / 100.0;
    float fuzzscale = 200.0 / uViewHeight;
    float x = floor(gl_FragCoord.x);
    float y = floor(gl_FragCoord.y);
    float scaled_x = fract(x * fuzzscale / fuzz_random_x_size);
    float fuzz_x = float(fuzz_random_x_offset[int(scaled_x * fuzz_random_x_size)]);
    float fuzzcount = float(FUZZTABLE);
    // fuzzpos shifts the table start per frame, as in the fixed-point version
    float fuzz = fract((fuzz_x + y * fuzzscale) / fuzzcount + fuzzpos);
    float alpha = float(fuzzoffset[int(fuzz * fuzzcount)]) / 32.0;
    return vec4(0.0, 0.0, 0.0, getTexel(vTexCoord.st).a * alpha);
}

Note that if you change the tables to floating-point tables, some of the int-to-float casts can be removed as well.

RuscoIstar wrote:I'd do the experiment myself but I am not familiar enough with what this shader is exactly doing. When I look at the fuzz effect in the unity port (for a high res "vanilla" example) I see a repeating pattern in the animated noise. Is it maybe possible to generate such a texture beforehand (like an animated error-diffusion-like one, I mean)? It doesn't need to be exact, but convincing.

I haven't ever tried the Unity version, but note that I personally believe any port running the original fuzz effect as-is at higher resolutions is rendering the effect wrong. By wrong I mean it doesn't look the way the effect was intended when they originally decided to use it (the way it looks at 320x200). The original effect fuzzed only in the Y direction with a fixed offset. When you go to a higher resolution the fuzzing pattern becomes visually apparent because it was never scaled up in either direction: the fuzz changes way too often in the Y direction, and the spill-over offset values used for the next column of pixels change way too often in the X direction, too.
dpJudas
 
 
 
Joined: 28 May 2016

Re: Fuzz style Software is slow. A theoretical cheaper version.

Post by RuscoIstar » Thu Dec 23, 2021 4:15 am

Coming from a C/C++ background I was very surprised to learn that type casting is a tiny bit expensive in GLSL. Then again, GPU hardware is optimized for floating-point operations (because vectors), so floats are the preferred data type and int operations are slower (not that noticeable unless you do a lot of them, though), since they end up going through the float-oriented silicon anyway. I've read that "a good shader compiler should fix these kinds of optimization problems on its own", but driver implementations vary so wildly, especially in the low-spec world... 🙄

That snippet will probably help enough, though. I only have an Intel HD Graphics 620 and a Mali-G71 MP8 as my "low range" test hardware, but I think any improvement could be better profiled with more than just a couple of devices. Hopefully we can get some more people on board to fix this for good. If I have time I'll report back with any significant gains to add to this profiling. It's funny, because I have even measured a dip in the fps count on high-end hardware: kill a mob of Spectres on stairs and then get real close to them from below (or crouch near them) until they fill the screen. That's my test case.

About the Unity version..... Yeeeaaahhh... I see what you are saying now. I didn't know it was rendering it in a buggy way, but then again the effect, as written by John, was most likely never generalized to resolutions higher than 320x200. In the Unity port, even at 720p, the columns become visible on your weapon when you have partial invisibility, and they even change and shift as you turn left and right, and that's probably not right.

