
### Re: [gizdoom] Lazy palette shader

One other thing that could probably improve the algorithm a little bit would be to convert the colors to linear space first before calculating the distance. Doom uses sRGB, which has a gamma of 2.2, but a fast approximation is 2.0. The BestColor function thus ends up like this:

Code: Select all
```
int BestColor (const uint32 *pal_in, int r, int g, int b, int first, int num)
{
	// Convert search color to linear (using 2.0 gamma instead of 2.2 for speed reasons):
	r = r * r;
	g = g * g;
	b = b * b;

	const PalEntry *pal = (const PalEntry *)pal_in;
	int bestcolor = first;
	uint32_t bestdist = 0xffffffff;

	for (int color = first; color < num; color++)
	{
		int x = r - (pal[color].r * (int)pal[color].r);
		int y = g - (pal[color].g * (int)pal[color].g);
		int z = b - (pal[color].b * (int)pal[color].b);
		uint32_t dist = (uint32_t)(x*x) + (uint32_t)(y*y) + (uint32_t)(z*z);
		if (dist < bestdist)
		{
			if (dist == 0)
				return color;

			bestdist = dist;
			bestcolor = color;
		}
	}
	return bestcolor;
}
```

Edit: removed the divide by 255 because it only causes precision loss for this algorithm.
dpJudas

Joined: 28 May 2016

### Re: [gizdoom] Lazy palette shader

Well - again, you're right.

If you want to test this diff, the cvar has been renamed to "gl_palette_tonemap_algorithm", but I think you already know what the results are going to be.

I had an idea about writing an HSV checker instead of an RGB checker. I think the code for the conversion was already available in v_palette.cpp, maybe? However, doing this for 256k colors will be a bit of a hit on the processing time, I'm guessing. Everything else in the algorithm would stay the same; it would just take and compare HSV values instead.
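Something along these lines is what I have in mind - just a sketch, mind you; the conversion and the hue scaling here are my own guesses, not anything actually in v_palette.cpp:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct HSV { float h, s, v; }; // h in [0,360), s and v in [0,1]

// Plain RGB -> HSV conversion (a sketch; v_palette.cpp may have its own)
static HSV RGBtoHSV(int r8, int g8, int b8)
{
	float r = r8 / 255.f, g = g8 / 255.f, b = b8 / 255.f;
	float mx = std::max({r, g, b}), mn = std::min({r, g, b});
	float d = mx - mn;
	HSV out;
	out.v = mx;
	out.s = (mx > 0.f) ? d / mx : 0.f;
	if (d == 0.f)     out.h = 0.f;
	else if (mx == r) out.h = 60.f * std::fmod((g - b) / d + 6.f, 6.f);
	else if (mx == g) out.h = 60.f * ((b - r) / d + 2.f);
	else              out.h = 60.f * ((r - g) / d + 4.f);
	return out;
}

// Squared distance in HSV space; hue wraps around, so take the shorter arc.
static float HSVDist(const HSV &a, const HSV &b)
{
	float dh = std::fabs(a.h - b.h);
	if (dh > 180.f) dh = 360.f - dh;
	dh /= 180.f; // bring hue roughly onto the same scale as s and v
	float ds = a.s - b.s, dv = a.v - b.v;
	return dh * dh + ds * ds + dv * dv;
}
```

The inner loop of BestColor would stay exactly the same shape - only the values fed into the distance function would change.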

Also in this diff I went ahead and implemented the table clear on CVar change, so a restart is no longer required; it automatically rebuilds the table. What I would like to see is at least higher precision in selecting the hue and luminance values; I'm not *as* worried about saturation. It seems my last-value comparison made my algorithm inadvertently correct, but only because of the ordering of the Doom palette. It's the same algorithm I use to generate custom COLORMAPs and TINTTABs (Heretic/Hexen), though, and I've used it so much in the past 18 years that I have it memorized.

Rachael
Webmaster

Joined: 13 Jan 2004
Discord: Rachael#3767
Operating System: Windows 10/8.1/8/201x 64-bit
OS Test Version: No (Using Stable Public Version)
Graphics Processor: nVidia with Vulkan support

### Re: [gizdoom] Lazy palette shader

Take a look at SLADE's colorimetry options. You can make its color matcher check in RGB, HSV, or Lab spaces. Honestly I feel the RGB matching is what works best for Doom.
Gez

Joined: 06 Jul 2007

### Re: [gizdoom] Lazy palette shader

dpJudas wrote:One other thing that could probably improve the algorithm a little bit would be to convert the colors to linear space first before calculating the distance. Doom uses sRGB, which has a gamma of 2.2, but a fast approximation is 2.0. The BestColor function thus ends up like this:

Oh wow, I didn't even see that post. That implementation has errors (I suspect overflow), so I am going to try and fix it and then try it with GZDoom. If that works it may even work better in software mode, too.

Gez wrote:Take a look at SLADE's colorimetry options. You can make its color matcher check in RGB, HSV, or Lab spaces. Honestly I feel the RGB matching is what works best for Doom.

I'll take your word for it.

Rachael

### Re: [gizdoom] Lazy palette shader

Okay, I fixed the new version of BestColor using doubles instead of uint32_t's. We're dealing with 255 to the power of 4 - while technically that fits inside a uint32_t, it does not allow for much else.
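The shape of the fix looks roughly like this - a sketch of the idea, not the exact code in the attached diff, and the byte order of the palette entries here is an assumption:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a double-based distance loop. With the gamma-2.0 squaring,
// a per-channel difference can reach 255*255 = 65025, so the sum of
// three squared differences can reach ~1.27e10 and no longer fits in
// a uint32_t. Doubles sidestep the overflow entirely.
int BestColorDouble(const uint32_t *pal_in, int r, int g, int b, int first, int num)
{
	// Convert the search color to (approximately) linear space:
	double rr = (double)r * r, gg = (double)g * g, bb = (double)b * b;
	int bestcolor = first;
	double bestdist = 1e300;
	for (int color = first; color < num; color++)
	{
		// Assuming 0x00RRGGBB packing for the palette entries:
		uint32_t c = pal_in[color];
		int pr = (c >> 16) & 0xff, pg = (c >> 8) & 0xff, pb = c & 0xff;
		double x = rr - (double)pr * pr;
		double y = gg - (double)pg * pg;
		double z = bb - (double)pb * pb;
		double dist = x * x + y * y + z * z;
		if (dist < bestdist)
		{
			if (dist == 0.0)
				return color;
			bestdist = dist;
			bestcolor = color;
		}
	}
	return bestcolor;
}
```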

For now, I kept all the code for comparison, selectable via CVars, but that's not going to be suitable for a pull request. If I submit one, I will most likely remove these CVars and leave only one algorithm available.

Now - onto the implementation:

I like it, but it does have problems of its own. Mostly - it gets the primary colors correct, but when there are colors that are really distant from the palette (have higher/lower saturations than what's available) it doesn't seem to always pick the best "looking" color.

Best places to try this out: Doom 1, E3M3 - do "warp -800 150" and use "gl_lightmode 8" - also, pretty much all of E3M7.

Here's the diff with changes, along with a pre-compiled exe for others to try (since devbuilds are behind):

https://mega.nz/#!9JkjABwZ!Fh8ci7vKUPZc ... drE4T7Qi5k (reuploaded, was missing gzdoom.pk3 which also changed since latest devbuild)

And here's just the diff:
gzdoom-custom-algorithms.diff.gz

To test this implementation, please use
Code: Select all
`gzdoom +set gl_palette_tonemap_algorithm 2 +set r_colormatcher_algorithm 1`

This only needs to be done once - the CVars will save to your config.

If you change r_colormatcher_algorithm, the tables do not automatically get rebuilt in either Software or GL mode; the "restart" ccmd will fix that. gl_palette_tonemap_algorithm does not have this problem and takes effect immediately. These CVars are in place for comparison and notation purposes.

With all that said - maybe there's a happy medium somewhere? A slightly lower gamma ramp, if available, maybe?

Also - I really do not like the idea of setting a "max" distance. The reason I initialize the distance on the first iteration of the loop is that it lets me alter the algorithm any way I choose, without any maximum other than what the number types themselves support. Notice how even you had to set a new max after putting in the gamma ramp? If we fix the v_palette.cpp code, I would really prefer the first iteration to initialize the numbers, rather than doing it outside the loop as before. I think the code looks cleaner, and it is more flexible. Correct me if I am wrong.

One more thing - if we do put in a floating exponent gamma ramp (such as 2.2 or 1.4 or whatever), you can still retain the processing speed simply by first applying the exponent to each value 0-255 and storing the results in an array, since those are the only inputs you will ever use. Then you only read the array instead of recalculating 768k times, which will speed up processing tremendously. Doom's original gamma correction system used hard-coded precalculated arrays for exactly this reason.
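Something like this is what I mean (hypothetical names, just to illustrate the table idea):

```cpp
#include <cassert>
#include <cmath>

// Precompute pow(i, gamma) once for every possible channel value (0-255),
// instead of calling pow() inside the 768k-iteration loop.
static double gammatable[256];

static void BuildGammaTable(double gamma)
{
	for (int i = 0; i < 256; i++)
		gammatable[i] = std::pow((double)i, gamma);
}

// Inside BestColor the conversion then becomes a table read:
//     r_lin = gammatable[r];
// instead of:
//     r_lin = pow((double)r, gamma);
```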

Rachael

### Re: [gizdoom] Lazy palette shader

Eruanna wrote:Okay, I fixed the new version of BestColor using doubles instead of uint32_t's. We're dealing with 255 to the power of 4 - while technically that fits inside a uint32_t, it does not allow for much else.

Oops. Yes, my code did overflow a 32-bit integer. For a final version I would probably have divided each difference by two (a very tiny loss of precision in this case), but your double version works too, of course.
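To spell out what I mean by dividing by two (a sketch, not code I actually wrote):

```cpp
#include <cassert>
#include <cstdint>

// With gamma-2.0 values each difference can reach 255*255 = 65025, so
// x*x is ~4.23e9 and the sum of three squares overflows a uint32_t.
// Halving each difference first caps the worst case at 3 * 32512^2
// (~3.17e9), which fits - at the cost of one bit of precision.
static uint32_t DistSquaredHalved(int x, int y, int z)
{
	x /= 2; y /= 2; z /= 2;
	return (uint32_t)(x * x) + (uint32_t)(y * y) + (uint32_t)(z * z);
}
```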

Eruanna wrote:I like it, but it does have problems of its own. Mostly - it gets the primary colors correct, but when there are colors that are really distant from the palette (have higher/lower saturations than what's available) it doesn't seem to always pick the best "looking" color.

My knowledge of color theory is a bit too poor to suggest anything other than the linear-space distance calculation I already did. I know that linear colors are very important for light calculations, but for a 'best color' algorithm? I honestly don't know.

I can create an SSE2-accelerated version of the function (since apparently the original version needed speed badly enough to get an MMX version), but this only makes sense if you'd rather have this color match algorithm over the original.

Eruanna wrote:With all that said - maybe there's a happy medium somewhere? A slightly lower gamma ramp, if available, maybe?

You can certainly experiment with that. Just change the "r = r*r" parts to "r = pow(r, gamma)" and see what effects you get. A gamma of 2.2 gives a linear-space comparison, and 1.0 is a straight sRGB comparison, the same as the original algorithm.

Eruanna wrote:Also - I really do not like the idea of setting a "max" distance.

I normally would write the code the way you do too, but in this case I think the "max" distance is there to get rid of an extra comparison in the speed-critical inner loop.

I agree that for general code one should always go for a cleaner, more readable, and more flexible layout - the only exception I'd make is when you can no longer afford the luxury, which is often the case in any part of zdoom where there's suddenly MMX or SSE involved. Whoever wrote the original function found the entire C function to be too slow and replaced it with MMX.

Eruanna wrote:One more thing - if we do put in a floating exponent gamma ramp (such as 2.2 or 1.4 or whatever) - you can still retain the speed in processing simply by first going ahead and applying exponents to (0-255) and storing the results in an array, since these are the only numbers you are going to use, anyway. Then you only have to read the array to get the proper results, rather than recalculating it 768k times. Will speed up processing tremendously. Doom's original Gamma correction system did hard-coded precalculated arrays for exactly this reason.

If that optimization alone makes it meet the speed requirements, yes. Not sure why this function needs to be so fast, but apparently it does. If it still needs to be faster you can apply the table pre-process, "max" distance, and SSE all at the same time.
dpJudas

### Re: [gizdoom] Lazy palette shader

dpJudas wrote:Not sure why this function needs to be so fast, but apparently it does.

Potentially 67.1~ million loops. That's just for the tonemap part. You know, I am amazed it works as fast as it does with that many iterations.

Here are some other uses of that function:

ZDoom's internal rgb555 table: 8.4~ million iterations
ZDoom creates a new colormap/fogmap (live while playing): 2~ million iterations
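For anyone curious where those figures come from - each table entry runs the full 256-entry inner loop of BestColor (the colormap breakdown of 256 colors times 32 light levels is my guess, but the totals line up):

```cpp
#include <cassert>

// Iteration counts: table entries times the 256-entry inner loop.
const long tonemap_iters  = 262144L * 256;   // 18-bit (64*64*64) color space
const long rgb555_iters   = 32768L * 256;    // 15-bit color space
const long colormap_iters = 256L * 32 * 256; // 256 colors, 32 light levels (my guess at the breakdown)
// tonemap_iters  = 67,108,864  (~67.1 million)
// rgb555_iters   =  8,388,608  (~8.4 million)
// colormap_iters =  2,097,152  (~2 million)
```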

I'll try and get something working later on. I'm probably going to remove the CVars from my working copy and just use the defaults I suggested earlier - if it does seem like we need to improve ZDoom's original algorithm, I would prefer my code to be submittable soon.

Rachael

### Re: [gizdoom] Lazy palette shader

Eruanna wrote:Potentially 67.1~ million loops. That's just for the tonemap part. You know, I am amazed it works as fast as it does with that many iterations.

You make that sound like it is a lot. That's about 33 frames of pixels at 1920x1080, something zdoom does at 200+ FPS on my computer - on a single core. For a one time calculation of a table this is not a big deal. If that was its only usage I'd personally not optimize the function unless someone noticed slow boot times.

Eruanna wrote:ZDoom creates a new colormap/fogmap (live while playing): 2~ million iterations

Now this is a much better reason. Microstuttering while playing sucks.
dpJudas

### Re: [gizdoom] Lazy palette shader

You may already know this - and sorry if you do, it's not my intent to explain something you already know.

If you want to test what it "feels" like, CPU-wise, on older systems, Windows Vista and later includes an option to dial down the CPU frequency directly in the power options. Just set the CPU frequency to something like 5% minimum / 5% maximum (it will only go as low as your CPU actually allows, don't worry), and then force ZDoom to run on a single core (from cmd.exe, you can type "start /affinity 0x1 zdoom.exe" to do this). If ZDoom still runs decently after you do this, you've probably optimized it enough.

The TestFade and TestColor CCMDs are great for testing this function, because they do those "live" iterations I mentioned. If you notice a delay after using them, then others may notice it too.

It's not a "perfect" emulation, per se - I am sure you already know this - because modern CPUs have a lot of optimizations and better caches than older ones, but it does help you get a feel for what's going on.

Rachael

### Re: [gizdoom] Lazy palette shader

People with older systems are used to waiting anyway. More seriously, because the old function used MMX I would personally include a SSE intrinsics version for any updated version of it - with the assumption that it was time critical in the past and therefore might still be.

If my goal was to improve zdoom's boot times, though, I'd focus my attention on areas where there is far more to gain, like the JPEG, PNG, and pk3 loaders.
dpJudas

### Re: [gizdoom] Lazy palette shader

I think you're right - the oldest system running ZDoom right now probably isn't going to notice much difference between any ASM algorithm and any C algorithm as far as this function goes, simply because of how little it is actually used during live play.

However, I will still pre-calculate the exponent table, because there's no way in hell I am running a floating exponent calculation millions of times for a single table, especially with numbers that are guaranteed to repeat a number of times through said calculations.

Rachael

### Re: [gizdoom] Lazy palette shader

For Engoo I only deal with generation at load time. It's still somewhat playable on a Pentium, though the 18-bit lookup generation on first startup takes long enough to make it a bit prohibitive on lower specs (486s/P5s).

It would be faster if that 18bpp table were cached to a file per palette lump's checksum, and then all the map-load color table builds went through that table instead of bestcolor.

FmodEx and all the new actor code already make Zdoom inappropriate for the low Pentium end anyway, and the more that's brought up, the more useful legacy performance features get deprecated in spite of it *cough*r_detail*cough*

leileilol
ダークエルフ!!!!!!!!!!

Joined: 30 May 2004
Location: GNU/Hell

### Re: [gizdoom] Lazy palette shader

I didn't get nearly such slow times, even when I crippled my CPU. If that does become a problem we can definitely cache the tables, but I don't think it will be needed.

However, doing pow()'s without tables did increase the load times a lot. It took an additional 2-4 seconds during V_Init, and then another 10 or so seconds upon entering the game (during tonemap generation). This was on a pretty powerful CPU, too (about 2~ years old). Keep in mind, without pow() it was nearly instant before, which is why I think such caching shouldn't be necessary.

I'm going to make it create the pow() table only the first time the function is executed, and then reuse it on every further call. That should bring load times back to normal, I think.
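Roughly this shape (hypothetical, just to show the build-on-first-use idea):

```cpp
#include <cassert>
#include <cmath>

// Build the exponent table the first time it is asked for, then reuse
// it on every later call (rebuilding only if the gamma value changes).
static double ExpTableLookup(int i, double gamma)
{
	static double table[256];
	static double built_gamma = -1.0;
	if (built_gamma != gamma)
	{
		for (int v = 0; v < 256; v++)
			table[v] = std::pow((double)v, gamma);
		built_gamma = gamma;
	}
	return table[i];
}
```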

Rachael

### Re: [gizdoom] Lazy palette shader

Alright. Done. This really looks the best so far, in my opinion. Still uses BestColor. @Lei - if you see this, can you test this with any of your older systems you may have available and let me know if the load times are acceptable? (You will have to compile 32-bit builds yourself, though, sorry)

Diff:
patch3.diff.gz

Precompiled:
https://mega.nz/#!RMV2ST7L!bvefG5ivawEM ... 9hrZiOVgp0

Rachael

### Re: [gizdoom] Lazy palette shader

Can't atm; I'll mention PCem though, since it's a little more reliable for canonical instruction cycle timings than most PC emulators/VMs. The only problem is the usual setup and ROM hunts.

IIRC in Engoo, making an rgb555 table takes 2 seconds on a Pentium 166, and an rgb666 table takes a bit over 20, and that's with BestColor pulled out of qlumpy. On an AM5x86-160 (one of the fastest 486s) this process takes at least a minute.

leileilol
