[UPDATED] ZDoom32 2.8.6a (ZDoom is undead)

drfrag
Vintage GZDoom Developer
Posts: 3141
Joined: Fri Apr 23, 2004 3:51 am
Location: Spain

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

I've released a new ZDoom32 version with the OpenGL renderer from GZDoom 1.9.1 plus later fixes and additions; it has shaders for GL 2.0 cards. I've also switched to FMOD Ex 4.36 for sound.
BTW, one of the goals of this project is to preserve the old codebase.
I've also updated ZDoom LE for even older hardware. See the first post for the download and detailed info.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

Out of curiosity I've tried ZDoom32 on a Willamette P4, and in truecolor it's even slower than on the 1 GHz P3.
I get only 24 fps without SSE2 and 37 with SSE2 (I got 32 fps on the P3). There are some graphics artifacts in truecolor with that old crappy ATI driver (Radeon 9200 SE).
In GL mode I get the 320x200 resolution with that card and on my 'modern' Intel as well. Low-res modes even work with GZDoom 3.1.0 on my laptop. This was a surprise, since even with the ancient GL 1.2 driver on my GeForce2 I only get 640 and up.
dpJudas
 
 
Posts: 3037
Joined: Sat May 28, 2016 1:01 pm

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by dpJudas »

The NetBurst architecture used by the P4 is famous for being quite poor. It wouldn't surprise me too much if the P3 could beat it in some situations. For example, a quick Google search got me this quote from a different forum:
P3 vs P4 depends on the instruction set that is being used. The P4 introduced new instructions that could operate much more quickly than the set of instructions that you had to use on a P3. Additionally, because of the much much bigger pipeline, unexpected branches are much more costly. Take for example using MPEG video encoding as a benchmark. When the P4 was released the P3 kicked its ass. Then Intel rewrote the encoding to optimize it and it was faster (though not by a huge margin) on the P4 than the P3.
Edit: By the way, if your goal is to improve performance on the low end, you could try changing the loads and stores from unaligned to aligned, i.e. _mm_loadu_si128 to _mm_load_si128. The catch is that the screen frame buffer needs to be 16-byte aligned, which I actually think Randi made sure it already is (see the alignment code in DSimpleCanvas::Resize). It could provide as much as a 30% performance boost in the right situations. I mainly didn't do this because I had no way of testing it, as the i7 Haswell architecture no longer raises an exception if you use _mm_load_si128 on an unaligned address.
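To illustrate the difference between the two intrinsics, here is a minimal sketch, assuming an x86 compiler with SSE2 enabled. The helper names is_aligned16 and copy4_pixels are made up for this example and are not part of ZDoom. It only takes the aligned path when both addresses sit on a 16-byte boundary and falls back to the unaligned instructions otherwise:

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: true if the pointer is on a 16-byte boundary.
static bool is_aligned16(const void *p)
{
	return (reinterpret_cast<uintptr_t>(p) & 15) == 0;
}

// Hypothetical helper: copies four 32-bit pixels (16 bytes).
// Returns true if the aligned load/store instructions were used.
static bool copy4_pixels(uint32_t *dest, const uint32_t *src)
{
	if (is_aligned16(dest) && is_aligned16(src))
	{
		// Aligned variants: may fault if the address is not 16-byte aligned.
		__m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(src));
		_mm_store_si128(reinterpret_cast<__m128i*>(dest), v);
		return true;
	}
	// Unaligned variants: work for any address.
	__m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
	_mm_storeu_si128(reinterpret_cast<__m128i*>(dest), v);
	return false;
}
```

A real drawer would not branch per pixel group like this; the point of aligning the frame buffer is that the check becomes unnecessary and the aligned variants can be used unconditionally.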

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

Thanks, I've tried to do what you said but it crashes on my machine (Athlon 64), so no way.
Those performance numbers were for the default resolution (640). Those Willamettes were very slow; I guess it's no problem for Northwoods anyway.
However, I've found a bad merge of v_video.cpp (I guess the TortoiseGit merge did something strange again, since a blank was removed there), but fortunately it did no harm. I've silently updated the release though.

Edit: it was a 1.5 GHz Willamette; next I will try a 1.8 GHz Northwood.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

I've swapped the 1.5 GHz Willamette for a 2.4 GHz Northwood and now I get 36 fps for the non-SSE2 version and 47 fps with SSE2. It's not that great compared to the P3.

@dpJudas: As I mentioned, I disabled capped skies for truecolor (I got no sky since there were no drawers).
I don't know how to make them work in truecolor, even if they only look the same as in paletted mode. Any suggestions?
Thanks.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by dpJudas »

drfrag wrote: As I mentioned, I disabled capped skies for truecolor (I got no sky since there were no drawers).
I don't know how to make them work in truecolor, even if they only look the same as in paletted mode. Any suggestions?
I'm not sure I fully understand the problem. You have a working palette skycap drawer, but lost its companion truecolor drawer? If so, where did you get the palette one from?

By the way, it might be well worth the effort to make sure DSimpleCanvas::Resize allocates 16-byte aligned memory and pitch for the screen frame buffer. All the 4col drawing is already aligned to 4 pixels, which for 8 bit means a dword and for 32 bit a 16-byte aligned position. The only thing required to make sure this is true is that the initial memory address chosen by DSimpleCanvas::Resize must start at a 16-byte alignment, and the pitch chosen must be 4-pixel aligned.

A simple implementation of it might look like this:

Code: Select all

#include <cstdint> // uint8_t
#include <cstdlib> // std::aligned_alloc, std::free
#include <cstring> // memset

void DSimpleCanvas::Resize(int width, int height)
{
	Width = width;
	Height = height;

	if (MemBuffer)
	{
		std::free(MemBuffer);
		MemBuffer = nullptr;
	}

	int bytes_per_pixel = Bgra ? 4 : 1;

	// Randi commented that making the pitch a power of 2 is very bad for performance.
	// Not sure which CPU architectures that applies to. But let's just add one extra quad
	// of pixels to the alignment to make it less likely to happen.
	//
	// This pitch will be 4-byte aligned for 8 bit and 16-byte aligned for 32 bit.
	Pitch = (width + 3) / 4 * 4 + 4;

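	// Base alignment: 4 bytes for 8 bit and 16 bytes for 32 bit. std::aligned_alloc wants the
	// size to be a multiple of the alignment, which holds here because Pitch is a multiple of 4.
	// Note: std::aligned_alloc is C++17 and is not provided by MSVC, where
	// _aligned_malloc/_aligned_free would be needed instead.
	size_t size = (size_t)Pitch * height * bytes_per_pixel;
	MemBuffer = (uint8_t*)std::aligned_alloc(4 * bytes_per_pixel, size);
	if (MemBuffer)
		memset (MemBuffer, 0, size);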
}

DSimpleCanvas::~DSimpleCanvas ()
{
	if (MemBuffer)
	{
		std::free(MemBuffer);
		MemBuffer = nullptr;
	}
}
This should allow you to use the aligned loads and stores for the SSE drawers. I'd test it myself, but I don't have a CPU old enough to crash if the memory is not aligned.
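To check the pitch arithmetic in isolation, here is a small standalone sketch. The function names aligned_pitch and rows_are_aligned are hypothetical and only exist for this example; it assumes a 32-bit frame buffer (4 bytes per pixel) and takes the base address as a plain integer so no allocation is needed:

```cpp
#include <cstdint>

// Same pitch formula as in Resize above: round the width up to a multiple
// of 4 pixels, then add one extra quad so the pitch is not a power of 2.
static int aligned_pitch(int width)
{
	return (width + 3) / 4 * 4 + 4;
}

// Hypothetical check: does every row of a 32-bit buffer with the given
// base address and pitch (in pixels) start on a 16-byte boundary?
static bool rows_are_aligned(uintptr_t base, int pitch, int height)
{
	for (int y = 0; y < height; y++)
	{
		uintptr_t row = base + static_cast<uintptr_t>(y) * pitch * 4;
		if (row & 15)
			return false;
	}
	return true;
}
```

With a 16-byte aligned base, a width of 321 gives a pitch of 328 pixels and every row stays aligned, while using the raw width of 321 as the pitch would already misalign the second row.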

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

After merging truecolor ZDoom with the later ZDoom, capped skies were there; you added them to ZDoom with the 'Add support for capping sky with a solid color' commit from 10/19/2016. But in truecolor mode there was no sky, just HOM, so I disabled them. Then you added them for truecolor and softpoly in 'Capped sky rendering' from 12/03/2016, but by then already using LLVM. I'm using the old C++ drawers.

On the alignment issue:
Cool, thanks very much, I will give it a try. But only for the SSE2 drawers? I replaced all occurrences in r_draw* and tried the SSE2 version when I got the crash. Increasing performance for the non-SSE2 version is even more interesting.
What about DataL1LineSize? Will removing it hurt performance on the P6 architecture (P2, P3, PM and original Core)? Because ZDoom32 is optimized for P6 CPUs.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by dpJudas »

drfrag wrote: After merging truecolor ZDoom with the later ZDoom, capped skies were there; you added them to ZDoom with the 'Add support for capping sky with a solid color' commit from 10/19/2016. But in truecolor mode there was no sky, just HOM, so I disabled them. Then you added them for truecolor and softpoly in 'Capped sky rendering' from 12/03/2016, but by then already using LLVM. I'm using the old C++ drawers.
I'm almost sure you can find the companion truecolor version of the skycap C++ drawers somewhere in the QZDoom repository. At least it would surprise me if I only wrote a palette version of them - I don't really play Doom with the 8-bit stuff myself after all. :)
drfrag wrote: Cool, thanks very much, I will give it a try. But only for the SSE2 drawers? I replaced all occurrences in r_draw* and tried the SSE2 version when I got the crash. Increasing performance for the non-SSE2 version is even more interesting.
What about DataL1LineSize? Will removing it hurt performance on the P6 architecture (P2, P3, PM and original Core)? Because ZDoom32 is optimized for P6 CPUs.
Yes, just the drawers. As for the DataL1LineSize stuff, I have no idea how this will affect the older architectures. I suppose you could try profiling it on those machines if you have them around. Otherwise, you could make it use the new alignment code only on machines that have SSE2 support (there's a bSSE2 field in that CPU global variable, afair).

If it still crashes after you've made those changes, try changing it so that only the 4col drawers use aligned loads and stores.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

Sorry, but I'm afraid there's no such thing. I've found two more commits from the QZDoom side: the first one adds LLVM truecolor sky drawers, while the second one adds the paletted drawers. The C++ drawers had actually been removed before that.

QZ 10/04/16 * Remove C++ and SSE drawers
QZ 10/15/16 * Move true color sky drawing to its own drawers and change r_stretchsky to false as the new drawers can fade to a solid color
QZ 10/19/16 * Palette version of sky drawers

So I'd need to know how to use the paletted drawers in truecolor mode, even if they wouldn't look that great.

About aligned loads and stores: if it's only for SSE2, I could just use the USE_SSE define in DSimpleCanvas::Resize.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by dpJudas »

To port a palette drawer to true color, the dc_dest and dc_source pointers must be cast from uint8_t* to uint32_t*. This is because both the textures and the framebuffer are in BGRA format. You should be able to see how those casts are done by looking at the other RGBA drawers. Besides this, you no longer need to look up the actual BGRA value via the GPalette global. Roughly put, the drawer's main loop would look something like this:

Code: Select all

uint32_t *dest = (uint32_t*)_dest; // Framebuffer in 32 bit bgra format
uint32_t *source = (uint32_t*)_source; // Texture in 32 bit bgra format
int pitch = _pitch;
int count = _count;
fixed_t fracpos = _fracpos;
fixed_t fracstep = _iscale;
while (count > 0)
{
    uint32_t fg = source[fracpos >> FRACBITS]; // Read pixel from texture

    int alpha = 127; // To do: calculate this same way as the pal8 version does
    int inv_alpha = 256 - alpha;

    // Grab the color components from the pixel
    uint32_t red = RPART(fg);
    uint32_t green = GPART(fg);
    uint32_t blue = BPART(fg);

    // Alphablend texture pixel with capcolor:
    red = (red * alpha + capcolor_red * inv_alpha + 127) >> 8;
    green = (green * alpha + capcolor_green * inv_alpha + 127) >> 8;
    blue = (blue * alpha + capcolor_blue * inv_alpha + 127) >> 8;

    // Store result in frame buffer
    *dest = 0xff000000 | (red << 16) | (green << 8) | blue;

    // Move to next write and sampling positions
    fracpos += fracstep;
    dest += pitch;
    count--;
}
The palette versions of those drawers do almost exactly the same as the code snippet I showed. The only difference is that they use the RGB32k table to map a truecolor alpha-blended value back to a palette index. You could start by using them as a base, changing the int8 pointers to int32 and removing the GPalette + RGB32k table lookups.
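As a rough illustration of that difference, here is a self-contained sketch of the two paths. The function names are hypothetical, and the layout of the 15-bit index (red in the high bits, 5 bits per channel) is an assumption made for this example rather than a statement about how RGB32k is actually indexed in ZDoom:

```cpp
#include <cstdint>

// Blend one 8-bit channel: alpha is 0..256, where 256 means "all texture".
static uint32_t blend_channel(uint32_t tex, uint32_t cap, int alpha)
{
	return (tex * alpha + cap * (256 - alpha)) >> 8;
}

// Truecolor path: blend a texture pixel with a cap color and return an
// opaque pixel (0xAARRGGBB when read as a 32-bit value).
static uint32_t blend_bgra(uint32_t fg, uint32_t cap, int alpha)
{
	uint32_t r = blend_channel((fg >> 16) & 255, (cap >> 16) & 255, alpha);
	uint32_t g = blend_channel((fg >> 8) & 255, (cap >> 8) & 255, alpha);
	uint32_t b = blend_channel(fg & 255, cap & 255, alpha);
	return 0xff000000u | (r << 16) | (g << 8) | b;
}

// Palette path, extra step: reduce the blended color to 5 bits per channel
// to form an index into a 32*32*32 table of palette entries.
// The r/g/b ordering of the index is an assumption for this sketch.
static int rgb555_index(uint32_t bgra)
{
	int r = (bgra >> 16) & 255;
	int g = (bgra >> 8) & 255;
	int b = bgra & 255;
	return ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3);
}
```

In the truecolor drawer the result of blend_bgra is stored straight into the frame buffer; only the palette drawer needs the final quantization step and table lookup.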

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

WOW! Thanks very much again. I'll get into it one of these days.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

@dpJudas: Sorry, but as expected I'm not capable of doing it; the code you've posted is very different from the paletted drawers.
I've tried to do what you said, and at least now I get a screwed-up sky instead of nothing, and it doesn't crash. There are vertical black stripes and nothing over the top of the sky. Could you please have a look at it? Thanks.

Code: Select all

void R_DrawSingleSkyCol1_rgba(uint32_t solid_top, uint32_t solid_bottom)
{
	uint32_t *dest = (uint32_t*)dc_dest;
	int count = dc_count;
	int pitch = dc_pitch;
	const uint32_t *source0 = (uint32_t*)bufplce[0];
	int textureheight0 = bufheight[0];

	int32_t frac = vplce[0];
	int32_t fracstep = vince[0];

	int start_fade = 2; // How fast it should fade out

	int solid_top_r = RPART(solid_top);
	int solid_top_g = GPART(solid_top);
	int solid_top_b = BPART(solid_top);
	int solid_bottom_r = RPART(solid_bottom);
	int solid_bottom_g = GPART(solid_bottom);
	int solid_bottom_b = BPART(solid_bottom);

	for (int index = 0; index < count; index++)
	{
		uint32_t sample_index = (((((uint32_t)frac) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
		uint32_t fg = source0[sample_index];

		int alpha_top = MAX(MIN(frac >> (16 - start_fade), 256), 0);
		int alpha_bottom = MAX(MIN(((2 << 24) - frac) >> (16 - start_fade), 256), 0);

		if (alpha_top == 256 && alpha_bottom == 256)
		{
			*dest = fg;
		}
		else
		{
			int inv_alpha_top = 256 - alpha_top;
			int inv_alpha_bottom = 256 - alpha_bottom;

			int c_red = RPART(fg);
			int c_green = GPART(fg);
			int c_blue = BPART(fg);
			c_red = (c_red * alpha_top + solid_top_r * inv_alpha_top) >> 8;
			c_green = (c_green * alpha_top + solid_top_g * inv_alpha_top) >> 8;
			c_blue = (c_blue * alpha_top + solid_top_b * inv_alpha_top) >> 8;
			c_red = (c_red * alpha_bottom + solid_bottom_r * inv_alpha_bottom) >> 8;
			c_green = (c_green * alpha_bottom + solid_bottom_g * inv_alpha_bottom) >> 8;
			c_blue = (c_blue * alpha_bottom + solid_bottom_b * inv_alpha_bottom) >> 8;
			*dest = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;
		}

		frac += fracstep;
		dest += pitch;
	}
}

void R_DrawSingleSkyCol4_rgba(uint32_t solid_top, uint32_t solid_bottom)
{
	uint32_t *dest = (uint32_t*)dc_dest;
	int count = dc_count;
	int pitch = dc_pitch;
	const uint32_t *source0[4] = { (uint32_t*)bufplce[0], (uint32_t*)bufplce[1], (uint32_t*)bufplce[2], (uint32_t*)bufplce[3] };
	int textureheight0 = bufheight[0];
	int32_t frac[4] = { (int32_t)vplce[0], (int32_t)vplce[1], (int32_t)vplce[2], (int32_t)vplce[3] };
	int32_t fracstep[4] = { (int32_t)vince[0], (int32_t)vince[1], (int32_t)vince[2], (int32_t)vince[3] };
	uint32_t output[4];

	int start_fade = 2; // How fast it should fade out

	int solid_top_r = RPART(solid_top);
	int solid_top_g = GPART(solid_top);
	int solid_top_b = BPART(solid_top);
	int solid_bottom_r = RPART(solid_bottom);
	int solid_bottom_g = GPART(solid_bottom);
	int solid_bottom_b = BPART(solid_bottom);
	uint32_t solid_top_fill = solid_top;
	uint32_t solid_bottom_fill = solid_bottom;
	solid_top_fill = (solid_top_fill << 24) | (solid_top_fill << 16) | (solid_top_fill << 8) | solid_top_fill;
	solid_bottom_fill = (solid_bottom_fill << 24) | (solid_bottom_fill << 16) | (solid_bottom_fill << 8) | solid_bottom_fill;

	// Find bands for top solid color, top fade, center textured, bottom fade, bottom solid color:
	int fade_length = (1 << (24 - start_fade));
	int start_fadetop_y = (-frac[0]) / fracstep[0];
	int end_fadetop_y = (fade_length - frac[0]) / fracstep[0];
	int start_fadebottom_y = ((2 << 24) - fade_length - frac[0]) / fracstep[0];
	int end_fadebottom_y = ((2 << 24) - frac[0]) / fracstep[0];
	for (int col = 1; col < 4; col++)
	{
		start_fadetop_y = MIN(start_fadetop_y, (-frac[0]) / fracstep[0]);
		end_fadetop_y = MAX(end_fadetop_y, (fade_length - frac[0]) / fracstep[0]);
		start_fadebottom_y = MIN(start_fadebottom_y, ((2 << 24) - fade_length - frac[0]) / fracstep[0]);
		end_fadebottom_y = MAX(end_fadebottom_y, ((2 << 24) - frac[0]) / fracstep[0]);
	}
	start_fadetop_y = clamp(start_fadetop_y, 0, count);
	end_fadetop_y = clamp(end_fadetop_y, 0, count);
	start_fadebottom_y = clamp(start_fadebottom_y, 0, count);
	end_fadebottom_y = clamp(end_fadebottom_y, 0, count);

	// Top solid color:
	for (int index = 0; index < start_fadetop_y; index++)
	{
		*((uint32_t*)dest) = solid_top_fill;
		dest += pitch;
		for (int col = 0; col < 4; col++)
			frac[col] += fracstep[col];
	}

	// Top fade:
	for (int index = start_fadetop_y; index < end_fadetop_y; index++)
	{
		for (int col = 0; col < 4; col++)
		{
			uint32_t sample_index = (((((uint32_t)frac[col]) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
			uint32_t fg = source0[col][sample_index];

			int alpha_top = MAX(MIN(frac[col] >> (16 - start_fade), 256), 0);
			int inv_alpha_top = 256 - alpha_top;
			int c_red = RPART(fg);
			int c_green = GPART(fg);
			int c_blue = BPART(fg);
			c_red = (c_red * alpha_top + solid_top_r * inv_alpha_top) >> 8;
			c_green = (c_green * alpha_top + solid_top_g * inv_alpha_top) >> 8;
			c_blue = (c_blue * alpha_top + solid_top_b * inv_alpha_top) >> 8;

			output[col] = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;
			frac[col] += fracstep[col];
		}
		*((uint32_t*)dest) = *((uint32_t*)output);
		dest += pitch;
	}

	// Textured center:
	for (int index = end_fadetop_y; index < start_fadebottom_y; index++)
	{
		for (int col = 0; col < 4; col++)
		{
			uint32_t sample_index = (((((uint32_t)frac[col]) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
			output[col] = source0[col][sample_index];

			frac[col] += fracstep[col];
		}

		*((uint32_t*)dest) = *((uint32_t*)output);
		dest += pitch;
	}

	// Fade bottom:
	for (int index = start_fadebottom_y; index < end_fadebottom_y; index++)
	{
		for (int col = 0; col < 4; col++)
		{
			uint32_t sample_index = (((((uint32_t)frac[col]) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
			uint32_t fg = source0[col][sample_index];

			int alpha_bottom = MAX(MIN(((2 << 24) - frac[col]) >> (16 - start_fade), 256), 0);
			int inv_alpha_bottom = 256 - alpha_bottom;
			int c_red = RPART(fg);
			int c_green = GPART(fg);
			int c_blue = BPART(fg);
			c_red = (c_red * alpha_bottom + solid_bottom_r * inv_alpha_bottom) >> 8;
			c_green = (c_green * alpha_bottom + solid_bottom_g * inv_alpha_bottom) >> 8;
			c_blue = (c_blue * alpha_bottom + solid_bottom_b * inv_alpha_bottom) >> 8;
			output[col] = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;

			frac[col] += fracstep[col];
		}
		*((uint32_t*)dest) = *((uint32_t*)output);
		dest += pitch;
	}

	// Bottom solid color:
	for (int index = end_fadebottom_y; index < count; index++)
	{
		*((uint32_t*)dest) = solid_bottom_fill;
		dest += pitch;
	}
}
Edit: okay, I made a minor modification and now the top is red, but there are still stripes everywhere and some HOM above.
I could upload the changes to GitHub if that helps.

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

I know the vertical stripes are due to only one byte of four being displayed; what I don't know is how to fix it. Close, but not there.
As _mental_ suggested, I've uploaded the preliminary drawers to a new branch at https://github.com/drfrag666/gzdoom/commits/tccapsky
@dpJudas: I think you could fix this in a few minutes since this time it's your code, so I've added you as a collaborator through GitHub. I'd be grateful, of course when you have time. I could fix compilation later. :3:
To any other devs: if someone wants to go ahead and fix this, you're welcome as well. Thanks. :)

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by dpJudas »

Sorry, forgot to reply on this as I was at work at the time I read it.

The reason you're getting the black stripes is because it is writing to a temporary output buffer and then only the first column is copied to the destination. Basically, change it from this:

Code: Select all

      for (int col = 0; col < 4; col++)
      {
         ...
         output[col] = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;
         ...
      }
      *((uint32_t*)dest) = *((uint32_t*)output);
To this:

Code: Select all

      for (int col = 0; col < 4; col++)
      {
         ...
         ((uint32_t*)dest)[col] = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;
         ...
      }
As for the 'nothing over the top of the sky', I assume you mean it isn't filling it with the correct solid cap color. That is most likely because solid_top is a palette index. You can fix that either by assigning a BGRA color instead where solid_top is assigned, or you can cheat a bit by looking up the BGRA color for the palette index: GPalette.BaseColors[solid_top].d.
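The stripe problem can be reproduced in isolation. The sketch below uses made-up helper names and assumes a 32-bit frame buffer: in the palette drawer one 32-bit store presumably covered four 8-bit pixels at once, but once each pixel is a whole uint32_t the same store only touches the first column. The last helper applies the same idea to a solid band, assuming the color passed in is already a BGRA value:

```cpp
#include <cstdint>

// What the single 32-bit copy amounts to in truecolor: only column 0 is
// written, the other three columns keep whatever was there before.
static void store_first_only(uint32_t *dest, const uint32_t output[4])
{
	*dest = output[0];
}

// Truecolor fix: write each of the four columns individually.
static void store4_rgba(uint32_t *dest, const uint32_t output[4])
{
	for (int col = 0; col < 4; col++)
		dest[col] = output[col];
}

// Same idea for a solid band: store the 32-bit color in every column.
// Assumes 'color' is a BGRA value, not a palette index.
static void fill4_rgba(uint32_t *dest, uint32_t color)
{
	for (int col = 0; col < 4; col++)
		dest[col] = color;
}
```

Whether the solid bands in the posted drawer need this treatment as well depends on how solid_top and solid_bottom are assigned, which is not shown in the thread.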

Re: [RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Post by drfrag »

Thanks again, it's mostly fixed now, with the exception of the top part.
I don't think solid_top is a palette index; GPalette.BaseColors[solid_top].d causes a crash. It's the solid top part that is wrong. I get vertical stripes again (one column out of four is bad this time) and, as you can see in the image, there's some HOM.
[Attached image: Screenshot_Doom_20170924_012033.jpg]

Code: Select all

void R_DrawSingleSkyCol1_rgba(uint32_t solid_top, uint32_t solid_bottom)
{
	uint32_t *dest = (uint32_t*)dc_dest;
	int count = dc_count;
	int pitch = dc_pitch;
	const uint32_t *source0 = (uint32_t*)bufplce[0];
	int textureheight0 = bufheight[0];

	int32_t frac = vplce[0];
	int32_t fracstep = vince[0];

	int start_fade = 2; // How fast it should fade out

	int solid_top_r = RPART(solid_top);
	int solid_top_g = GPART(solid_top);
	int solid_top_b = BPART(solid_top);
	int solid_bottom_r = RPART(solid_bottom);
	int solid_bottom_g = GPART(solid_bottom);
	int solid_bottom_b = BPART(solid_bottom);

	for (int index = 0; index < count; index++)
	{
		uint32_t sample_index = (((((uint32_t)frac) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
		uint32_t fg = source0[sample_index];

		int alpha_top = MAX(MIN(frac >> (16 - start_fade), 256), 0);
		int alpha_bottom = MAX(MIN(((2 << 24) - frac) >> (16 - start_fade), 256), 0);

		if (alpha_top == 256 && alpha_bottom == 256)
		{
			*dest = fg;
		}
		else
		{
			int inv_alpha_top = 256 - alpha_top;
			int inv_alpha_bottom = 256 - alpha_bottom;

			int c_red = RPART(fg);
			int c_green = GPART(fg);
			int c_blue = BPART(fg);
			c_red = (c_red * alpha_top + solid_top_r * inv_alpha_top) >> 8;
			c_green = (c_green * alpha_top + solid_top_g * inv_alpha_top) >> 8;
			c_blue = (c_blue * alpha_top + solid_top_b * inv_alpha_top) >> 8;
			c_red = (c_red * alpha_bottom + solid_bottom_r * inv_alpha_bottom) >> 8;
			c_green = (c_green * alpha_bottom + solid_bottom_g * inv_alpha_bottom) >> 8;
			c_blue = (c_blue * alpha_bottom + solid_bottom_b * inv_alpha_bottom) >> 8;
			*dest = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;
		}

		frac += fracstep;
		dest += pitch;
	}
}

void R_DrawSingleSkyCol4_rgba(uint32_t solid_top, uint32_t solid_bottom)
{
	uint32_t *dest = (uint32_t*)dc_dest;
	int count = dc_count;
	int pitch = dc_pitch;
	const uint32_t *source0[4] = { (uint32_t*)bufplce[0], (uint32_t*)bufplce[1], (uint32_t*)bufplce[2], (uint32_t*)bufplce[3] };
	int textureheight0 = bufheight[0];
	int32_t frac[4] = { (int32_t)vplce[0], (int32_t)vplce[1], (int32_t)vplce[2], (int32_t)vplce[3] };
	int32_t fracstep[4] = { (int32_t)vince[0], (int32_t)vince[1], (int32_t)vince[2], (int32_t)vince[3] };

	int start_fade = 2; // How fast it should fade out

	int solid_top_r = RPART(solid_top);
	int solid_top_g = GPART(solid_top);
	int solid_top_b = BPART(solid_top);
	int solid_bottom_r = RPART(solid_bottom);
	int solid_bottom_g = GPART(solid_bottom);
	int solid_bottom_b = BPART(solid_bottom);
	uint32_t solid_top_fill = solid_top;
	uint32_t solid_bottom_fill = solid_bottom;
	solid_top_fill = (solid_top_fill << 24) | (solid_top_fill << 16) | (solid_top_fill << 8) | solid_top_fill;
	solid_bottom_fill = (solid_bottom_fill << 24) | (solid_bottom_fill << 16) | (solid_bottom_fill << 8) | solid_bottom_fill;

	// Find bands for top solid color, top fade, center textured, bottom fade, bottom solid color:
	int fade_length = (1 << (24 - start_fade));
	int start_fadetop_y = (-frac[0]) / fracstep[0];
	int end_fadetop_y = (fade_length - frac[0]) / fracstep[0];
	int start_fadebottom_y = ((2 << 24) - fade_length - frac[0]) / fracstep[0];
	int end_fadebottom_y = ((2 << 24) - frac[0]) / fracstep[0];
	for (int col = 1; col < 4; col++)
	{
		start_fadetop_y = MIN(start_fadetop_y, (-frac[0]) / fracstep[0]);
		end_fadetop_y = MAX(end_fadetop_y, (fade_length - frac[0]) / fracstep[0]);
		start_fadebottom_y = MIN(start_fadebottom_y, ((2 << 24) - fade_length - frac[0]) / fracstep[0]);
		end_fadebottom_y = MAX(end_fadebottom_y, ((2 << 24) - frac[0]) / fracstep[0]);
	}
	start_fadetop_y = clamp(start_fadetop_y, 0, count);
	end_fadetop_y = clamp(end_fadetop_y, 0, count);
	start_fadebottom_y = clamp(start_fadebottom_y, 0, count);
	end_fadebottom_y = clamp(end_fadebottom_y, 0, count);

	// Top solid color:
	for (int index = 0; index < start_fadetop_y; index++)
	{
		*((uint32_t*)dest) = solid_top_fill;
		dest += pitch;
		for (int col = 0; col < 4; col++)
			frac[col] += fracstep[col];
	}

	// Top fade:
	for (int index = start_fadetop_y; index < end_fadetop_y; index++)
	{
		for (int col = 0; col < 4; col++)
		{
			uint32_t sample_index = (((((uint32_t)frac[col]) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
			uint32_t fg = source0[col][sample_index];

			int alpha_top = MAX(MIN(frac[col] >> (16 - start_fade), 256), 0);
			int inv_alpha_top = 256 - alpha_top;
			int c_red = RPART(fg);
			int c_green = GPART(fg);
			int c_blue = BPART(fg);
			c_red = (c_red * alpha_top + solid_top_r * inv_alpha_top) >> 8;
			c_green = (c_green * alpha_top + solid_top_g * inv_alpha_top) >> 8;
			c_blue = (c_blue * alpha_top + solid_top_b * inv_alpha_top) >> 8;

			((uint32_t*)dest)[col] = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;
			frac[col] += fracstep[col];
		}
		dest += pitch;
	}

	// Textured center:
	for (int index = end_fadetop_y; index < start_fadebottom_y; index++)
	{
		for (int col = 0; col < 4; col++)
		{
			uint32_t sample_index = (((((uint32_t)frac[col]) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
			((uint32_t*)dest)[col] = source0[col][sample_index];

			frac[col] += fracstep[col];
		}
		dest += pitch;
	}

	// Fade bottom:
	for (int index = start_fadebottom_y; index < end_fadebottom_y; index++)
	{
		for (int col = 0; col < 4; col++)
		{
			uint32_t sample_index = (((((uint32_t)frac[col]) << 8) >> FRACBITS) * textureheight0) >> FRACBITS;
			uint32_t fg = source0[col][sample_index];

			int alpha_bottom = MAX(MIN(((2 << 24) - frac[col]) >> (16 - start_fade), 256), 0);
			int inv_alpha_bottom = 256 - alpha_bottom;
			int c_red = RPART(fg);
			int c_green = GPART(fg);
			int c_blue = BPART(fg);
			c_red = (c_red * alpha_bottom + solid_bottom_r * inv_alpha_bottom) >> 8;
			c_green = (c_green * alpha_bottom + solid_bottom_g * inv_alpha_bottom) >> 8;
			c_blue = (c_blue * alpha_bottom + solid_bottom_b * inv_alpha_bottom) >> 8;
			((uint32_t*)dest)[col] = 0xff000000 | (c_red << 16) | (c_green << 8) | c_blue;

			frac[col] += fracstep[col];
		}
		dest += pitch;
	}

	// Bottom solid color:
	for (int index = end_fadebottom_y; index < count; index++)
	{
		*((uint32_t*)dest) = solid_bottom_fill;
		dest += pitch;
	}
}