[RELEASED] ZDoom32 2.8.2 (ZDoom is undead)

Postby drfrag » Fri Sep 08, 2017 4:30 am

I've released a new ZDoom32 version with the OpenGL renderer from GZDoom 1.9.1, plus later fixes and additions; it has shaders for GL 2.0 cards. I've also switched to FMOD Ex 4.36 for sound.
BTW, one of the goals of this project is to preserve the old codebase.
I've also updated ZDoom LE for even older hardware. See the first post for downloads and detailed info.
drfrag
ZDoom32 developer.
Joined: 23 Apr 2004

Postby drfrag » Tue Sep 12, 2017 6:07 pm

Out of curiosity I tried ZDoom32 on a Willamette P4, and in truecolor it's even slower than on the 1 GHz P3.
I get only 24 fps without SSE2 and 37 with SSE2 (I got 32 fps on the P3). There are some graphics artifacts in truecolor with that old, crappy ATI driver (Radeon 9200 SE).
In GL mode I get the 320x200 resolution with that card, and on my 'modern' Intel as well. Low-res modes even work with GZDoom 3.1.0 on my laptop. This was a surprise, since even with the ancient GL 1.2 driver on my GeForce2 I only get 640 and up.

Postby dpJudas » Tue Sep 12, 2017 7:02 pm

The NetBurst architecture used by the P4 is notorious for being quite poor, so it doesn't surprise me much that the P3 can beat it in some situations. For example, a quick Google search turned up this quote from a different forum:

P3 vs P4 depends on the instruction set that is being used. The P4 introduced new instructions that could operate much more quickly than the set of instructions that you had to use on a P3. Additionally, because of the much, much bigger pipeline, unexpected branches are much more costly. Take for example using MPEG video encoding as a benchmark. When the P4 was released the P3 kicked its ass. Then Intel rewrote the encoding to optimize it, and it was faster (though not by a huge margin) on the P4 than the P3.


Edit: By the way, if your goal is to improve performance on the low end, you could try changing the loads and stores from unaligned to aligned, i.e. _mm_loadu_si128 to _mm_load_si128. The catch is that the screen frame buffer needs to be 16-byte aligned, which I think Randi already made sure of (see the alignment code in DSimpleCanvas::Resize). It could give as much as a 30% performance boost in the right situations. I mainly didn't do this because I had no way of testing it, as the i7 Haswell architecture no longer raises an exception if you use _mm_load_si128 on an unaligned address.
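A minimal sketch of the aligned/unaligned distinction, assuming an SSE2-capable x86 compiler. `copy_pixels_sse2` and `is_aligned16` are hypothetical helpers, not ZDoom drawer code: the aligned forms are used only when both pointers sit on a 16-byte boundary, which is exactly the condition that must hold before `_mm_loadu_si128` can safely become `_mm_load_si128`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2: _mm_load_si128, _mm_loadu_si128, _mm_store_si128, _mm_storeu_si128

// True when p lies on a 16-byte boundary, which the aligned SSE2
// load/store forms (MOVDQA) require.
inline bool is_aligned16(const void *p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15) == 0;
}

// Hypothetical helper: copies 'count' 32-bit pixels four at a time.
void copy_pixels_sse2(std::uint32_t *dest, const std::uint32_t *src, std::size_t count)
{
    assert(count % 4 == 0); // whole quads of pixels only

    if (is_aligned16(dest) && is_aligned16(src))
    {
        // Aligned path: valid only because both addresses were checked above.
        // Stepping by 4 pixels (16 bytes) keeps them aligned on every iteration.
        for (std::size_t i = 0; i < count; i += 4)
        {
            __m128i v = _mm_load_si128(reinterpret_cast<const __m128i *>(src + i));
            _mm_store_si128(reinterpret_cast<__m128i *>(dest + i), v);
        }
    }
    else
    {
        // Unaligned fallback: safe for any address.
        for (std::size_t i = 0; i < count; i += 4)
        {
            __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
            _mm_storeu_si128(reinterpret_cast<__m128i *>(dest + i), v);
        }
    }
}
```

In a real drawer the goal would be to guarantee the alignment up front through the frame buffer allocation, so the per-call check disappears; the branch here only shows where the requirement comes from.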
dpJudas
Joined: 28 May 2016

Postby drfrag » Wed Sep 13, 2017 5:59 am

Thanks, I tried what you suggested, but it crashes on my machine (Athlon 64), so no luck.
Those performance numbers were for the default resolution (640). Those Willamettes were very slow; I guess Northwoods will be no problem anyway.
However, I found a bad merge in v_video.cpp (I guess TortoiseGit's merge did something strange again, since a blank was removed there). Fortunately it did no harm, but I've silently updated the release anyway.

Edit: it was a 1.5 GHz Willamette; next I will try a 1.8 GHz Northwood.

Postby drfrag » Sun Sep 17, 2017 4:03 am

I've swapped the 1.5 GHz Willamette for a 2.4 GHz Northwood, and now I get 36 fps with the non-SSE2 version and 47 fps with SSE2. That's not that great compared to the P3.

@dpJudas: As I mentioned, I disabled capped skies for truecolor (I got no sky since there were no drawers).
I don't know how to make them work in truecolor, even just looking the same as in paletted mode. Any suggestions?
Thanks.

Postby dpJudas » Sun Sep 17, 2017 7:18 am

drfrag wrote: As I mentioned, I disabled capped skies for truecolor (I got no sky since there were no drawers).
I don't know how to make them work in truecolor, even just looking the same as in paletted mode. Any suggestions?

I'm not sure I fully understand the problem. You have a working palette skycap drawer, but lost its companion truecolor drawer? If so, where did you get the palette one from?

By the way, it might be well worth the effort to make sure DSimpleCanvas::Resize allocates 16-byte aligned memory and an aligned pitch for the screen frame buffer. All the 4col drawing is already aligned to 4 pixels, which is a dword for 8 bit and a 16-byte aligned position for 32 bit. The only requirements for this to hold in memory are that the initial address chosen by DSimpleCanvas::Resize starts on a 16-byte boundary and that the pitch is a multiple of 4 pixels.

A simple implementation of it might look like this:

Code:
#include <cstdint>  // uint8_t
#include <cstdlib>  // std::aligned_alloc, std::free
#include <cstring>  // memset

void DSimpleCanvas::Resize(int width, int height)
{
   Width = width;
   Height = height;

   if (MemBuffer)
   {
      std::free(MemBuffer);
      MemBuffer = nullptr;
   }

   int bytes_per_pixel = Bgra ? 4 : 1;

   // Randi commented that making the pitch a power of 2 is very bad for performance.
   // Not sure which CPU architectures that applies to. But let's just add one extra quad
   // of pixels to the alignment to make it less likely to happen.
   //
   // This pitch will be 4-byte aligned for 8 bit and 16-byte aligned for 32 bit.
   Pitch = (width + 3) / 4 * 4 + 4;

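   // std::aligned_alloc (C++17) needs a size that is a multiple of the alignment,
   // which the 4-pixel pitch guarantees here. MSVC does not provide it; use
   // _aligned_malloc/_aligned_free there instead.
   MemBuffer = (uint8_t*)std::aligned_alloc(4 * bytes_per_pixel, Pitch * height * bytes_per_pixel);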
   memset (MemBuffer, 0, Pitch * height * bytes_per_pixel);
}

DSimpleCanvas::~DSimpleCanvas ()
{
   if (MemBuffer)
   {
      std::free(MemBuffer);
      MemBuffer = nullptr;
   }
}

This should allow you to use the aligned loads and stores for the SSE drawers. I'd test it myself, but I don't have a CPU old enough to crash if the memory is not aligned.
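To check the arithmetic behind the pitch formula, here is a small standalone sketch (C++17 for `std::aligned_alloc`, so not MSVC). `compute_pitch` and `rows_are_aligned` are hypothetical names; the pitch expression is copied from the Resize above. Because the pitch is a multiple of 4 pixels, each row starts `pitch * bytes_per_pixel` bytes after the previous one, a multiple of 4 bytes for 8 bit and 16 bytes for 32 bit, so a single aligned base address keeps every row aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Pitch in pixels, same expression as the Resize() above: round the width up
// to a multiple of 4 pixels, then add one extra quad.
int compute_pitch(int width)
{
    return (width + 3) / 4 * 4 + 4;
}

// Allocates a width x height canvas the same way and reports whether every
// row start lands on a (4 * bytes_per_pixel)-byte boundary.
bool rows_are_aligned(int width, int height, int bytes_per_pixel)
{
    assert(bytes_per_pixel == 1 || bytes_per_pixel == 4);
    const int pitch = compute_pitch(width);
    const std::size_t alignment = 4 * static_cast<std::size_t>(bytes_per_pixel);
    const std::size_t row_bytes = static_cast<std::size_t>(pitch) * bytes_per_pixel;
    const std::size_t size = row_bytes * static_cast<std::size_t>(height);

    // size is a multiple of alignment because pitch is a multiple of 4.
    auto *buffer = static_cast<std::uint8_t *>(std::aligned_alloc(alignment, size));
    if (buffer == nullptr)
        return false;

    bool ok = true;
    for (int y = 0; y < height; y++)
    {
        const std::uint8_t *row = buffer + row_bytes * static_cast<std::size_t>(y);
        if (reinterpret_cast<std::uintptr_t>(row) % alignment != 0)
            ok = false;
    }
    std::free(buffer);
    return ok;
}
```

For example, a 640-pixel-wide canvas gets a pitch of 644 pixels, which is not a power of 2 but is still a whole number of quads.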

Postby drfrag » Sun Sep 17, 2017 11:37 am

After merging truecolor ZDoom with later ZDoom, capped skies were there: you added them to ZDoom with the 'Add support for capping sky with a solid color' commit from 10/19/2016. But in truecolor mode there was no sky, just HOM, so I disabled them. Then you added them for truecolor and softpoly in 'Capped sky rendering' from 12/03/2016, but that already used LLVM. I'm using the old C++ drawers.

On the alignment issue:
Cool, thanks very much, I'll give it a try. But should it be only for the SSE2 drawers? I replaced all occurrences in r_draw*, and the crash happened when I tried the SSE2 version. Increasing performance for the non-SSE2 version is even more interesting.
What about DataL1LineSize? Will removing it hurt performance on the P6 architecture (P2, P3, PM and the original Core)? ZDoom32 is optimized for P6 CPUs.

Postby dpJudas » Sun Sep 17, 2017 3:25 pm

drfrag wrote: After merging truecolor ZDoom with later ZDoom, capped skies were there: you added them to ZDoom with the 'Add support for capping sky with a solid color' commit from 10/19/2016. But in truecolor mode there was no sky, just HOM, so I disabled them. Then you added them for truecolor and softpoly in 'Capped sky rendering' from 12/03/2016, but that already used LLVM. I'm using the old C++ drawers.

I'm almost sure you can find the companion truecolor version of the skycap C++ drawers somewhere in the QZDoom repository. At least it would surprise me if I only wrote a palette version of them; I don't really play Doom with the 8-bit stuff myself, after all. :)

drfrag wrote: Cool, thanks very much, I'll give it a try. But should it be only for the SSE2 drawers? I replaced all occurrences in r_draw*, and the crash happened when I tried the SSE2 version. Increasing performance for the non-SSE2 version is even more interesting.
What about DataL1LineSize? Will removing it hurt performance on the P6 architecture (P2, P3, PM and the original Core)? ZDoom32 is optimized for P6 CPUs.

Yes, just the drawers. As for the DataL1LineSize stuff, I have no idea how this will affect the older architectures. I suppose you could try profiling it on those machines if you have them around. Otherwise, you could make it use the new alignment code only on machines that have SSE2 support (there's a bSSE2 field in that CPU global variable, AFAIR).

If it still crashes after you've made those changes, try restricting the aligned loads and stores to the 4col drawers only.
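A sketch of how the two suggestions above could be combined, under loud assumptions: `CpuInfo` is a hypothetical stand-in for the engine's CPU global (this thread only mentions its `bSSE2` and `DataL1LineSize` fields), and the rule "width plus one cache line" is an illustrative padding, not ZDoom's actual formula. The idea is to keep a cache-line padding for the P6 machines and round the pitch up to whole quads only when the SSE2 aligned path is in use.

```cpp
#include <cassert>

// Hypothetical stand-in for the engine's CPU global; only the two fields
// mentioned in this thread.
struct CpuInfo
{
    bool bSSE2;         // CPU supports SSE2
    int DataL1LineSize; // L1 data cache line size in bytes, 0 if unknown
};

// Rounds a pixel count up to the next multiple of 4 (one quad of pixels).
constexpr int round_up_to_quad(int pixels)
{
    return (pixels + 3) & ~3;
}

// Keeps a cache-line-based padding for older CPUs. On SSE2 machines the result
// is rounded to whole quads so every row start keeps the buffer's 16-byte
// alignment (for 32-bit pixels).
int padded_pitch(int width, const CpuInfo &cpu)
{
    assert(width > 0);
    const int line = cpu.DataL1LineSize > 0 ? cpu.DataL1LineSize : 32; // assumed fallback
    const int pitch = width + line; // assumed padding rule (in pixels), for illustration only
    return cpu.bSSE2 ? round_up_to_quad(pitch) : pitch;
}
```

Whether the padding still pays off on a P2/P3 is exactly what profiling on those machines would have to show; the sketch only keeps both options open.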

Postby drfrag » Mon Sep 18, 2017 3:35 am

Sorry, but I'm afraid there's no such thing. I've found two more commits on the QZDoom side: the first one adds LLVM truecolor sky drawers, while the second one adds the paletted drawers. The C++ drawers had actually been removed before that.

QZ 10/04/16 * Remove C++ and SSE drawers
QZ 10/15/16 * Move true color sky drawing to its own drawers and change r_stretchsky to false as the new drawers can fade to a solid color
QZ 10/19/16 * Palette version of sky drawers

So I'd need to know how to use the paletted drawers in truecolor mode, even if they wouldn't look that great.

About aligned loads and stores: if it's only for SSE2, I could just use the USE_SSE define in DSimpleCanvas::Resize.
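If the aligned path is limited to the SSE2 build, the allocation can pick its alignment at compile time. A sketch, assuming `USE_SSE` is the existing build switch mentioned here; `canvas_alignment` and `aligned_size` are hypothetical helpers. Rounding the size up matters once 8-bit buffers are also 16-byte aligned, because `std::aligned_alloc` expects the size to be a multiple of the alignment and `Pitch * height` alone no longer guarantees that.

```cpp
#include <cassert>
#include <cstddef>

// Alignment for the canvas allocation. USE_SSE is assumed to be the
// project's existing compile-time switch for the SSE2 build.
inline std::size_t canvas_alignment(int bytes_per_pixel)
{
#ifdef USE_SSE
    // SSE2 build: always 16 bytes, so aligned loads/stores are safe on any canvas.
    (void)bytes_per_pixel;
    return 16;
#else
    // Plain C++ drawers: one quad of pixels (4 bytes for 8 bit, 16 bytes for 32 bit).
    return 4 * static_cast<std::size_t>(bytes_per_pixel);
#endif
}

// Rounds size up to the next multiple of alignment (a power of 2),
// as std::aligned_alloc expects.
inline std::size_t aligned_size(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

In Resize this would become something like `std::aligned_alloc(canvas_alignment(bpp), aligned_size(Pitch * height * bpp, canvas_alignment(bpp)))`, with the memset still clearing only the bytes actually used.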

Postby dpJudas » Tue Sep 19, 2017 4:09 pm

Porting a palette drawer to true color means that the dc_dest and dc_source pointers must be cast from uint8_t* to uint32_t*, because both the textures and the frame buffer are in BGRA format. You can see how those casts are done by looking at the other RGBA drawers. Besides this, you no longer need to look up the actual BGRA value via the GPalette global. Roughly, the drawer's main loop would look something like this:

Code:
uint32_t *dest = (uint32_t*)_dest; // Framebuffer in 32 bit bgra format
uint32_t *source = (uint32_t*)_source; // Texture in 32 bit bgra format
int pitch = _pitch;
int count = _count;
fixed_t fracpos = _fracpos;
fixed_t fracstep = _iscale;
while (count > 0)
{
    uint32_t fg = source[fracpos >> FRACBITS]; // Read pixel from texture

    int alpha = 127; // To do: calculate this same way as the pal8 version does
    int inv_alpha = 256 - alpha;

    // Grab the color components from the pixel
    uint32_t red = RPART(fg);
    uint32_t green = GPART(fg);
    uint32_t blue = BPART(fg);

    // Alphablend texture pixel with capcolor:
    red = (red * alpha + capcolor_red * inv_alpha + 127) >> 8;
    green = (green * alpha + capcolor_green * inv_alpha + 127) >> 8;
    blue = (blue * alpha + capcolor_blue * inv_alpha + 127) >> 8;

    // Store result in frame buffer
    *dest = 0xff000000 | (red << 16) | (green << 8) | blue;

    // Move to next write and sampling positions
    fracpos += fracstep;
    dest += pitch;
    count--;
}

The palette versions of those drawers do almost exactly the same as that code snippet. The only difference is that they use the RGB32k table to map a truecolor alpha-blended value back to a palette index. You could use them as a base: change the uint8_t pointers to uint32_t and remove the GPalette and RGB32k table lookups.
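For reference, here is the loop above turned into a self-contained function; a sketch under stated assumptions, not the actual QZDoom drawer. `SkyCapColumn`, `draw_skycap_column_bgra` and `kFracBits` are hypothetical names, the color channels are unpacked inline with the same bit layout as `RPART`/`GPART`/`BPART` so it compiles without engine headers, and `alpha` is passed in as a constant because the real fade calculation from the pal8 drawer is still the to-do noted in the snippet.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFracBits = 16; // same role as FRACBITS (16.16 fixed point)

// Hypothetical parameter bundle; the real drawer takes these from its own state.
struct SkyCapColumn
{
    std::uint32_t *dest;         // frame buffer, 32-bit BGRA (0xAARRGGBB as an integer)
    const std::uint32_t *source; // texture column, 32-bit BGRA
    int pitch;                   // frame buffer pitch in pixels
    int count;                   // number of pixels to draw
    std::int32_t fracpos;        // texture position, 16.16 fixed point
    std::int32_t fracstep;       // texture step per output pixel, 16.16 fixed point
    std::uint32_t capcolor;      // solid cap color, same layout as the pixels
    int alpha;                   // texture weight 0..256 (placeholder for the real fade)
};

void draw_skycap_column_bgra(const SkyCapColumn &c)
{
    assert(c.alpha >= 0 && c.alpha <= 256);
    const std::uint32_t alpha = static_cast<std::uint32_t>(c.alpha);
    const std::uint32_t inv_alpha = 256 - alpha;

    // Cap color components (same bit positions RPART/GPART/BPART extract).
    const std::uint32_t cap_red = (c.capcolor >> 16) & 0xff;
    const std::uint32_t cap_green = (c.capcolor >> 8) & 0xff;
    const std::uint32_t cap_blue = c.capcolor & 0xff;

    std::uint32_t *dest = c.dest;
    std::int32_t fracpos = c.fracpos;
    for (int count = c.count; count > 0; count--)
    {
        const std::uint32_t fg = c.source[fracpos >> kFracBits]; // sample texture

        // Blend the texture pixel with the cap color, rounding like the snippet above.
        const std::uint32_t red = (((fg >> 16) & 0xff) * alpha + cap_red * inv_alpha + 127) >> 8;
        const std::uint32_t green = (((fg >> 8) & 0xff) * alpha + cap_green * inv_alpha + 127) >> 8;
        const std::uint32_t blue = ((fg & 0xff) * alpha + cap_blue * inv_alpha + 127) >> 8;

        *dest = 0xff000000u | (red << 16) | (green << 8) | blue; // opaque result

        fracpos += c.fracstep; // next texel
        dest += c.pitch;       // next row down
    }
}
```

With `alpha` at 256 the texture passes through unchanged and at 0 the column becomes the solid cap color, which makes the two ends of the fade easy to check visually.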

Postby drfrag » Wed Sep 20, 2017 4:58 am

WOW! Thanks very much again. I'll get into it one of these days.
