I've patched ZDoom's software renderer to use true color (32bpp) for output. The GitHub fork is at https://github.com/dpjudas/zdoom.
The patch does the following:
[*] Changes the GetBuffer() format from BYTE to uint32_t. This changes the render target format from palette indices to BGRA8.
[*] Introduces two new rendering variables (dc_light + ds_light) that allow dc_colormap to point at the original palette instead of a light remapping from GETPALOOKUP. This effectively allows light levels to be rendered with fixed_t precision.
[*] Updates all the software renderers (R_DrawSpanP_C, vlinec4, etc.) to output RGBA8 instead of a palette index. This means the transparency functions do true alpha blending, and the opaque functions use dc_light/ds_light for true color shading.
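To illustrate the idea, here is a rough sketch of what an opaque column drawer does once the light level is applied to the color instead of being baked into a remapped palette. This is not the code from the patch: shade_bgra8, draw_column_sketch, the 0..256 light range and the palette layout are all simplifications I'm assuming for the example (the real variables are dc_light/ds_light with fixed_t precision).

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the patch.
// Shades one color packed as 0xAARRGGBB (bytes B, G, R, A in little-endian
// memory, i.e. BGRA8) by a light level in 0..256, where 256 is full bright.
inline uint32_t shade_bgra8(uint32_t color, uint32_t light)
{
    uint32_t r = (((color >> 16) & 0xff) * light) >> 8;
    uint32_t g = (((color >> 8) & 0xff) * light) >> 8;
    uint32_t b = ((color & 0xff) * light) >> 8;
    return 0xff000000u | (r << 16) | (g << 8) | b; // output is always opaque
}

// Hypothetical column drawer: 'source' holds palette indices (as GetColumn
// returns today), 'palette' is the unshaded palette in BGRA8, and 'pitch'
// is the distance between rows in pixels.
inline void draw_column_sketch(uint32_t* dest, int pitch, int count,
                               const uint8_t* source, const uint32_t* palette,
                               uint32_t light)
{
    for (int i = 0; i < count; i++)
    {
        // One palette lookup for the base color, then shade in true color.
        *dest = shade_bgra8(palette[source[i]], light);
        dest += pitch;
    }
}
```

Since the shading happens per channel after the palette lookup, darker light levels are no longer limited to the nearest palette entry.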
I'd like to add the following things before doing a pull request:
[*] Support for the other targets. The patch currently only works with the Direct3D 9 target, because the GetBuffer format change affects the platform specific frame buffer code.
[*] Replace all the assembly render functions with SSE2 compiler intrinsics. My current patch only tests this for two of the most critical functions.
[*] Maybe evict the palette completely from the renderer by changing the format of GetColumn from BYTE to uint32_t as well. This would enable true color textures, but I'm a bit cautious about this because I'm not sure if Doom is doing any kind of 'trickery' with dc_colormap beyond GETPALOOKUP.
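To give an idea of what I mean by SSE2 compiler intrinsics, here is a minimal sketch that shades four BGRA8 pixels at once. It is not one of the functions from the patch: shade4_sse2 and the 0..256 light range are assumptions made up for the example, and it only builds on x86 targets with SSE2.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper, not taken from the patch.
// Shades four pixels packed as 0xAARRGGBB by a light level in 0..256
// (256 = full bright) and forces the output alpha to 0xff.
inline void shade4_sse2(uint32_t* dest, const uint32_t* src, uint16_t light)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i mlight = _mm_set1_epi16(static_cast<short>(light));
    const __m128i alpha = _mm_set1_epi32(static_cast<int>(0xff000000u));

    // Load four pixels (unaligned load, so no alignment requirement on src).
    __m128i pixels = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));

    // Widen the 8-bit channels to 16 bits, two pixels per register.
    __m128i lo = _mm_unpacklo_epi8(pixels, zero);
    __m128i hi = _mm_unpackhi_epi8(pixels, zero);

    // channel * light fits in 16 bits (at most 255 * 256), then shift down.
    lo = _mm_srli_epi16(_mm_mullo_epi16(lo, mlight), 8);
    hi = _mm_srli_epi16(_mm_mullo_epi16(hi, mlight), 8);

    // Narrow back to 8-bit channels and make every pixel opaque.
    __m128i result = _mm_or_si128(_mm_packus_epi16(lo, hi), alpha);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dest), result);
}
```

The compiler takes care of register allocation and scheduling here, which is the part I wouldn't trust myself to do better by hand in assembly.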
Since there's a lot of work involved in completely finishing the patch I'm wondering what ZDoom's stance is on all of this. In particular:
[*] Replacing the assembly render functions with compiler SSE intrinsics. My assembly skills aren't remotely good enough to beat a compiler, but I do know how to write SSE2 intrinsics that should beat the performance of any assembly I'd personally be able to write. As a bonus, such a removal should make it significantly easier to refactor the render global variables and functions into classes.
[*] There is a slight performance cost to using 32 bpp for output, in particular for the Direct3D 9 target, where the final palette lookup is done by a GPU shader. I'm not sure there will be any real performance loss in the end if GetColumn is also changed to uint32_t, because then the software renderer does no palette lookups at all.
[*] My primary reason for doing all this was to get rid of the banding effects caused by GETPALOOKUP and its ugly palette colors at darker levels. This makes the game look a lot prettier in my opinion (a bit more like GZDoom, but with diminishing light intact). That said, if someone wants the game to look exactly like DOS Doom, they might not like the change.
So what do you think? What would it take to get such changes accepted - if at all?
