Plugin testing software
Cyberman
September 17th, 2001, 05:24
Hello all..
anyone know of PSXemuPro plugin testing software of any sort?
IE polygon sprite memory to GPU GPU to memory
texture testing..etc.
Cyb
Lewpy
September 17th, 2001, 13:52
Err, what's wrong with using it in an emulator? :)
Seriously though, that's the only testing environment I have ever used.
Cyberman
September 17th, 2001, 20:49
That's what I'm using now.. the big impediment is the amount of time I have to wait to see if one function (in this case a Gouraud-shaded textured 3-vertex polygon) works properly.
However I'm not sure what I'm doing wrong with it.
I think texture caching should be my next priority after getting that to work. The thing is rather slow with textures at the moment.
After that come the additional effects like subtractive filtering (shadows) etc. Anyhow.. I was hoping someone had a standard test suite I could use to see if the basics worked right :)
Cyb
Lewpy
September 18th, 2001, 14:09
Believe me, I know where you are coming from :)
The amount of times I've had to load up games, and take them through to a certain point, just so I can check one small item in a DMA Chain :/
From that point-of-view, save-states are a God-send :) You can create a save-state just before the point you want to check, and you can get straight back to it time and time again :) Wonderful stuff.
As to your problems, what is your problem? I may be able to help, and if I can't I am sure Pete is lurking here watching ;)
I always put the emphasis on getting things looking right, before trying to tackle speeding things up. The way I see it, you've got to get the algorithm right before you optimise it. If you try to optimise a half-finished algorithm, you are probably wasting time as you will need to re-write it to get the algorithm correct. Well, something like that, anyway :) My point is, until you know the full scope of what you need to do, be careful spending time optimising what you have, as you may have to alter it later: altering optimised code is much harder than un-optimised ... in general.
Cyberman
September 18th, 2001, 17:34
Hmmm good idea.. I need a game I can test that won't conflict with my actual game playing states ;)
My current problem is Gouraud shading a texture.
I am modifying Duddie's old soft GPU and I have the textures mostly working now. The biggest problem I have is with shading the textures.
Pete Bernert
September 18th, 2001, 18:57
Mmm... g-shaded textured polys are not much harder to do than solid colored
textured ones...
You just have to calculate the final pixel out of the g-shaded color and the
texel color instead of the solid color and the texel... that's all!?!
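A minimal sketch of the calculation Pete describes, for illustration only. It assumes 15 bit texels with red in the low bits (the layout used later in this thread) and 8 bit shade components where 0x80 leaves the texel unchanged; that neutral value is an assumption about the PSX, not something stated in this post. The names `shade_channel` and `shade_texel` are made up.

```c
#include <stdint.h>

/* Hypothetical helper: scale one 5 bit channel by an 8 bit shade value.
   0x80 is assumed to mean "leave the texel as it is". */
static unsigned shade_channel(unsigned c5, unsigned shade8)
{
    unsigned v = (c5 * shade8) >> 7;   /* divide by 128 */
    return v > 0x1F ? 0x1F : v;        /* saturate to 5 bits */
}

/* Combine a 15 bit texel with the colour for this pixel. For a flat poly
   r/g/b are constant; for a g-shaded one they are the interpolated values. */
uint16_t shade_texel(uint16_t texel, uint8_t r, uint8_t g, uint8_t b)
{
    unsigned cr = shade_channel(texel & 0x1F, r);
    unsigned cg = shade_channel((texel >> 5) & 0x1F, g);
    unsigned cb = shade_channel((texel >> 10) & 0x1F, b);
    /* keep bit 15 of the texel untouched */
    return (uint16_t)((texel & 0x8000) | (cb << 10) | (cg << 5) | cr);
}
```

The only difference between the flat and the g-shaded case is where r, g and b come from, which is Pete's point.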
Cyberman
September 19th, 2001, 02:30
Only if you don't have the vertices all backwards.. I'm getting transparent edges too, that isn't good. However I fixed the thing. Still too dark but it works.. VERY SLOW (sigh) and some of the textures are just wrong. Oh well, time will fix that stuff. I noticed that if the textures go off the edge of the screen then there is a problem (i.e. the clipping area).
One thing at a time I guess.
Next is to fix the artifacts in the clipping area.
NickK
September 19th, 2001, 04:31
Good luck in software cyb! I can give you some tips if you need 'em :D
Cyberman
September 19th, 2001, 19:59
My current problem is semi transparency mode.
I know how to read the mode specified for the textures but I don't know how to tell whether it is on or off, i.e. do I have to apply semi transparency to EVERYthing?
There doesn't seem to be an OFF transparency mode I just have the following listed in my docs:
ABR   Semi transparency mode
0     0.5 x B + 0.5 x F
1     1.0 x B + 1.0 x F
2     1.0 x B - 1.0 x F
3     1.0 x B + 0.25 x F
Yet there is no NON transparency mode.. it's like it's always on.. this normal or am I missing something? :)
NickK
September 19th, 2001, 20:19
For some primitive headers, there will be flags to tell if the primitive uses semi trans mode or not. You should check that out. You'll also need to do a little bit of working inside of the primitive code too :)
NickK
September 19th, 2001, 20:21
This should help!
--------------------------------------------------------------------------
Command Packets, Data Register.
--------------------------------------------------------------------------
Primitive command packets use an 8 bit command value which is present in
all packets. They contain a 3 bit type block and a 5 bit option block of
which the meaning of the bits depend on the type. Layout is as follows:
Type:
000 GPU command
001 Polygon primitive
010 Line primitive
011 Sprite primitive
100 Transfer command
111 Environment command
Configuration of the option blocks for the primitives is as follows:
Polygon:
| 7 6 5 | 4 | 3 | 2 | 1 | 0 |
| 0 0 1 |IIP|3/4|Tme|Abe|Tge|
Line:
| 7 6 5 | 4 | 3 | 2 | 1 | 0 |
| 0 1 0 |IIP|Pll| 0 |Abe| 0 |
Sprite:
| 7 6 5 | 4 3 | 2 | 1 | 0 |
| 0 1 1 | Size |Tme|Abe| 0 |
IIP   0   Flat shading
      1   Gouraud shading
3/4   0   3 vertex polygon
      1   4 vertex polygon
Tme   0   Texture mapping off
      1   on
Abe   0   Semi transparency off
      1   on
Tge   0   Brightness calculation at time of texture mapping on
      1   off. (draw texture as is)
Size  00  Free size (Specified by W/H)
      01  1 x 1
      10  8 x 8
      11  16 x 16
Pll   0   Single line (2 vertices)
      1   Polyline (n vertices)
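As a rough illustration of the polygon layout in the doc above, the option bits can be pulled out of the command byte like this. The struct and the function name `decode_prim` are invented for the example, and the field meanings only apply as written to polygon packets (lines and sprites reuse some of the bits differently).

```c
#include <stdint.h>

/* Field names follow the doc; the struct itself is illustrative. */
typedef struct {
    unsigned type;  /* bits 7-5: 1 = polygon, 2 = line, 3 = sprite */
    unsigned iip;   /* bit 4: 1 = Gouraud shading */
    unsigned quad;  /* bit 3: 1 = 4 vertex polygon */
    unsigned tme;   /* bit 2: 1 = texture mapping on */
    unsigned abe;   /* bit 1: 1 = semi transparency on */
    unsigned tge;   /* bit 0: 1 = draw texture as is */
} PrimFlags;

PrimFlags decode_prim(uint8_t cmd)
{
    PrimFlags f;
    f.type = cmd >> 5;
    f.iip  = (cmd >> 4) & 1;
    f.quad = (cmd >> 3) & 1;
    f.tme  = (cmd >> 2) & 1;
    f.abe  = (cmd >> 1) & 1;
    f.tge  = cmd & 1;
    return f;
}
```

So the answer to "is there an OFF mode" is the Abe bit: the ABR value only matters when `abe` is 1 for the primitive being drawn.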
NickK
September 20th, 2001, 00:01
You might also want the rest of the GPU docs :D
NickK
September 20th, 2001, 00:03
I'm actually going through now and documenting all of the possible combinations that can take place... Flat, abr: on ... Flat, abr: off, etc.. It takes a bit of time :)
Cyberman
September 20th, 2001, 00:48
Geeze I feel dumb
Oh well it's encoded into the instruction not the texture info.
this thing is going to be REALLY slow because of all the overhead.
I can see why Pete said it's pretty hard to emulate the PSX GPU with graphics cards made for the PC.
I'm still amazed at how fast Kazzuya's GPU is :)
Cyb
NickK
September 20th, 2001, 00:53
It's not really that hard. In fact, it can be done fast if you set up everything right. For example, I assume that you're using the DMA chain procedure and the general primitive setup that was found in Pete's framework and Duddie's soft GPU, right? Well, all of those numbers and function call tables only have to be modified slightly here and there to get the desired effect. This is OK for most primitives and commands, but you'll have to work something special into that processing to get the poly lines to work correctly!
Good luck, cyb! :)
NickK
September 20th, 2001, 04:41
Um, guess I didn't completely think about what I was meaning to say again :D
You CAN check out to see if the different states are set! The command (the last byte of the first dword) contains the same exact information that's in the header!
If you notice, 0x20 - A flat triangle, will have these bits set:
+---+-+-+-+-+-+
|001|0|0|0|0|0| = 0x20
+---+-+-+-+-+-+
Bits 5 - 7 = 001 (Polygon)
Bit 4 = 0 (Flat shading)
Bit 3 = 0 (3 points)
Bit 2 = 0 (No texture)
Bit 1 = 0 (No semi trans mode)
Bit 0 = 0 (Use original color, of course!)
Well I had an idea to have a function call for different states, which would be faster, but much messier and take up some extra memory that could be used for other things. So I've decided to go with the bit operations instead. Shouldn't be too bad on most systems! :)
* Update! *
As a matter of fact, I've been using the command data all of this time. Haha. I make myself laugh soo hard sometimes because I forget these little things :D
Have fun with this!
Cyberman
September 20th, 2001, 06:51
Yeah.. I have transparency working somewhat but it looks weird in some cases (heh)
As for doing a function for each of the primitives in that table.
Yeah.. I'm adding them in that way but now I know why Kazzuya's and Pete's soft gpu's are so BIG hehehe lots of things to put in there.
Fortunately it's not so bad adding new things. I add the name to the table and alias it to an existing function with a #define; when I add the actual code I remove the define and it works, i.e.
#define primPolyGTS3 primPolyGT3
#ifndef primPolyGTS3
/* the real primPolyGTS3 goes here once it is written */
#endif
When I have the code done I just comment out the #define there.
And poof, it's in there. (S for semi transparency).
It seems I need to add a lot of functions still
more fun :)
As for the polyline.. that shouldn't be TOO hard to implement (looks) unless you are doing the shaded one (grin).
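The table-plus-alias trick described in this post could look roughly like the sketch below. Only `primPolyGT3` and `primPolyGTS3` come from the post; the table, `primNone`, `init_prim_table`, `run_prim` and the return values are made up so the example can be checked.

```c
#include <stdint.h>

typedef int (*PrimFunc)(const uint32_t *packet);

/* Placeholder bodies: the return value just identifies which one ran. */
static int primNone(const uint32_t *packet)    { (void)packet; return 0; }
static int primPolyGT3(const uint32_t *packet) { (void)packet; return 1; }

/* Not written yet, so borrow the non-transparent version for now.
   Comment this define out once the real function exists. */
#define primPolyGTS3 primPolyGT3
#ifndef primPolyGTS3
static int primPolyGTS3(const uint32_t *packet) { (void)packet; return 2; }
#endif

static PrimFunc primTable[256];

void init_prim_table(void)
{
    for (int i = 0; i < 256; i++)
        primTable[i] = primNone;
    primTable[0x34] = primPolyGT3;   /* Gouraud, textured triangle */
    primTable[0x36] = primPolyGTS3;  /* same, with semi transparency */
}

/* The command byte is the top byte of the first word of the packet. */
int run_prim(const uint32_t *packet)
{
    return primTable[packet[0] >> 24](packet);
}
```

While the define is active, command 0x36 silently falls back to the 0x34 routine, so the game still draws something while the proper version is being written.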
Cyberman
September 21st, 2001, 01:27
I am having trouble with some of the transparency modes, it looks like. Namely some textures don't seem to be very transparent or have edge problems. In fact all my polys have edge problems.
It seems a popular choice for lighting special effects is a 4-sided poly that's textured, Gouraud shaded and semi transparent.. This is a killer to process unfortunately.
The image is horribly dark too.. not much you can do there at the moment, so here they are, from Tomb Raider 4: The Last Revelation. The first one is my GPU, the second is from Kaz's.
Cyberman
September 21st, 2001, 01:29
This is with Kazzuya's GPU
Almost identical position in the game
NickK
September 21st, 2001, 02:31
Make sure you're processing each pixel's color correctly. Here's some code for doing some (slow) semi transparency:
// colDest points to the RGB color being drawn (F); the result is written back here
// colSrc points to the RGB color already on the screen (B)
// COL_R, COL_G and COL_B are the byte offsets of each channel, defined elsewhere
void processSemiTrans(unsigned char sTransMode, unsigned char *colDest, unsigned char *colSrc)
{
    int r, g, b; //For saturation (up/down) SLOW (but not with MMX :)
    //Float -> Color value
    //1.0f -> 255
    //0.5f -> 128
    //0.25f -> 64
    //Where B is the source (screen) and F is current color that you
    //want to apply transparency to.
    r = colDest[COL_R];
    g = colDest[COL_G];
    b = colDest[COL_B];
    switch (sTransMode)
    {
    case 0: //0.5 x B + 0.5 x F
        r >>= 1;
        g >>= 1;
        b >>= 1;
        r += colSrc[COL_R] >> 1;
        g += colSrc[COL_G] >> 1;
        b += colSrc[COL_B] >> 1;
        break;
    case 1: //1.0 x B + 1.0 x F
        r += colSrc[COL_R];
        g += colSrc[COL_G];
        b += colSrc[COL_B];
        break;
    case 2: //1.0 x B - 1.0 x F
        r = colSrc[COL_R] - r;
        g = colSrc[COL_G] - g;
        b = colSrc[COL_B] - b;
        break;
    case 3: //1.0 x B + 0.25 x F
        r = colSrc[COL_R] + (r >> 2);
        g = colSrc[COL_G] + (g >> 2);
        b = colSrc[COL_B] + (b >> 2);
        break;
    }
    //Simulated saturation
    if (r > 255)
        r = 255;
    else if (r < 0)
        r = 0;
    if (g > 255)
        g = 255;
    else if (g < 0)
        g = 0;
    if (b > 255)
        b = 255;
    else if (b < 0)
        b = 0;
    //Store result to be used when drawing to the screen
    colDest[COL_R] = (unsigned char)r;
    colDest[COL_G] = (unsigned char)g;
    colDest[COL_B] = (unsigned char)b;
}
So I hope this gives you a better idea of what you should do? :)
I could also code this in ASM using MMX :D (Faster!)
Xeven
September 21st, 2001, 02:44
wow, each time i read these emu author posts, i realize how dumb i am... :(
NickK
September 21st, 2001, 03:26
Aww, don't feel too bad Xeven. :( Just pretend it's not there? ... And stop staring at me with those big glassy eyes! :p
NickK
September 21st, 2001, 03:49
Here's a little treat from me Xeven. Please accept this candy, and don't gobble it up all at once :D
Xeven
September 21st, 2001, 03:56
great! (uhh what flavor is it?) see.. im suckling slowly...:D
NickK
September 21st, 2001, 04:04
Well, if I knew what eye candy tasted like... :) Maybe it just tastes like sugar frosted sugar balls? :D
Cyberman
September 21st, 2001, 04:23
You are using 24 bit color?
I'm only using the PSX's 15 bit color model .. maybe that's my problem.
// individual lines done this way for me.
switch(textABR)
{
case 0: // B * 0.5 + F * 0.5
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = ((back & 0x1f)+(color & 0x1f))>>1;
cG = ((back & 0x3e0)+(color & 0x3e0))>>1;
cB = ((back & 0x7C00)+(color & 0x7C00))>>1;
psxVuw[i*1024+j] = cR|(cG & 0x3E0)|(cB & 0x7C00);
}
posX+=difX;
posY+=difY;
}
break;
case 1: // B + F
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = (back & 0x1f)+(color & 0x1f);
if (cR > 0x1f) cR = 0x1f;
cG = (back & 0x3e0)+(color & 0x3e0);
if (cG > 0x3E0) cG = 0x3E0;
cB = (back & 0x7C00)+(color & 0x7C00);
if (cB > 0x7C00) cB= 0x7C00;
psxVuw[i*1024+j] = cR| cG | cB;
}
posX+=difX;
posY+=difY;
}
break;
case 2: // B - F
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
if ((back & 0x1f)> (color &0x1f))
cR = (back & 0x1f)-(color & 0x1f);
else
cR = 0;
if ((back & 0x3e0) > (color & 0x3E0))
cG = (back & 0x3e0)-(color & 0x3e0);
else
cG = 0;
if ((back & 0x7C00) > (back & 0x7C00))
cB = (back & 0x7C00)-(color & 0x7C00);
else
cB = 0;
psxVuw[i*1024+j] = cR | cG | cB;
}
posX+=difX;
posY+=difY;
}
break;
case 3: // B + F/4
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = (back & 0x1f)+(color & 0x1f)>>2;
if (cR > 0x1f) cR = 0x1f;
cG = (back & 0x3e0)+(color & 0x3e0)>>2;
if (cG > 0x3E0) cG = 0x3E0;
cB = (back & 0x7C00)+(color & 0x7C00)>>2;
if (cB > 0x7C00) cB= 0x7C00;
psxVuw[i*1024+j] = cR| (cG & 0x3E0 )| (cB & 0x7C00);
}
posX+=difX;
posY+=difY;
}
break;
}
I first get a color from the texture buffer, then read the background color. I split the colors up and add, subtract etc. They are still in 15 bit BGR tuples. I wonder if that is making everything so darn dark on me too...
Cyb
Xeven
September 21st, 2001, 04:26
AND NOW, BACK TO OUR REGULAR PROGRAMMING.... :D
NickK
September 21st, 2001, 04:44
You're using the 16-bit R5G5B5 color format. I've made some changes to the code. There were some problems, I think, with the bit modifying and addition as well as the color mixing code at the bottom of each case. So I've fixed it (I hope!) Please try this modified code and see if it solves your problems! :)
// individual lines done this way for me.
switch(textABR)
{
case 0: // B * 0.5 + F * 0.5
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = ((back & 0x1f)+(color & 0x1f)) >> 1;
cG = (((back >> 5) & 0x1F) + ((color >> 5) & 0x1F)) >> 1;
cB = (((back >> 10) & 0x1F) +((color >> 10) & 0x1F)) >> 1;
if (cR > 0x1F)
cR = 0x1F;
if (cB > 0x1F)
cB = 0x1F;
if (cG > 0x1F)
cG = 0x1F;
psxVuw[i*1024+j] = (cB << 10) | (cG << 5) | cR;
}
posX+=difX;
posY+=difY;
}
break;
case 1: // B + F
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = (back & 0x1f)+(color & 0x1f);
if (cR > 0x1f)
cR = 0x1f;
cG = ((back >> 5) & 0x1F) + ((color >> 5) & 0x1F);
if (cG > 0x1F)
cG = 0x1F;
cB = ((back >> 10) & 0x1F) + ((color >> 10) & 0x1F);
if (cB > 0x1F)
cB = 0x1F;
psxVuw[i*1024+j] = (cB << 10) | (cG << 5) | cR;
}
posX+=difX;
posY+=difY;
}
break;
case 2: // B - F
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = (back & 0x1F) - (color & 0x1F);
if (cR < 0)
cR = 0;
cG = ((back >> 5) & 0x1F) - ((color >> 5) & 0x1F);
if (cG < 0)
cG = 0;
cB = ((back >> 10) & 0x1F) - ((color >> 10) & 0x1F);
if (cB < 0)
cB = 0;
psxVuw[i*1024+j] = ((cB & 0x1F) << 10) | ((cG & 0x1F) << 5) | (cR & 0x1F);
}
posX+=difX;
posY+=difY;
}
break;
case 3: // B + F/4
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[(posY>>16)*512+(posX>>16)];
if(color)
{
back = psxVuw[i*1024+j];
cR = (back & 0x1F) + ((color & 0x1F) >> 2);
if (cR > 0x1F)
cR = 0x1F;
cG = ((back >> 5) & 0x1F) + (((color >> 5) & 0x1F) >> 2);
if (cG > 0x1F)
cG = 0x1F;
cB = ((back >> 10) & 0x1F) + (((color >> 10) & 0x1F) >> 2);
if (cB > 0x1F)
cB = 0x1F;
psxVuw[i*1024+j] = (cB << 10) | (cG << 5) | cR;
}
posX+=difX;
posY+=difY;
}
break;
}
NickK
September 21st, 2001, 04:50
This code is the same as above, but it uses shifts instead of muls.
// individual lines done this way for me.
switch(textABR)
{
case 0: // B * 0.5 + F * 0.5
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[((posY>>16) << 9) + (posX>>16)];
if(color)
{
back = psxVuw[(i << 10) + j];
cR = ((back & 0x1f)+(color & 0x1f)) >> 1;
cG = (((back >> 5) & 0x1F) + ((color >> 5) & 0x1F)) >> 1;
cB = (((back >> 10) & 0x1F) +((color >> 10) & 0x1F)) >> 1;
if (cR > 0x1F)
cR = 0x1F;
if (cB > 0x1F)
cB = 0x1F;
if (cG > 0x1F)
cG = 0x1F;
psxVuw[(i << 10) + j] = (cB << 10) | (cG << 5) | cR;
}
posX+=difX;
posY+=difY;
}
break;
case 1: // B + F
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[((posY>>16) << 9) + (posX>>16)];
if(color)
{
back = psxVuw[(i << 10) + j];
cR = (back & 0x1f)+(color & 0x1f);
if (cR > 0x1f)
cR = 0x1f;
cG = ((back >> 5) & 0x1F) + ((color >> 5) & 0x1F);
if (cG > 0x1F)
cG = 0x1F;
cB = ((back >> 10) & 0x1F) + ((color >> 10) & 0x1F);
if (cB > 0x1F)
cB = 0x1F;
psxVuw[(i << 10) + j] = (cB << 10) | (cG << 5) | cR;
}
posX+=difX;
posY+=difY;
}
break;
case 2: // B - F
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[((posY>>16) << 9) + (posX>>16)];
if(color)
{
back = psxVuw[(i << 10) + j];
cR = (back & 0x1F) - (color & 0x1F);
if (cR < 0)
cR = 0;
cG = ((back >> 5) & 0x1F) - ((color >> 5) & 0x1F);
if (cG < 0)
cG = 0;
cB = ((back >> 10) & 0x1F) - ((color >> 10) & 0x1F);
if (cB < 0)
cB = 0;
psxVuw[(i << 10) + j] = ((cB & 0x1F) << 10) | ((cG & 0x1F) << 5) | (cR & 0x1F);
}
posX+=difX;
posY+=difY;
}
break;
case 3: // B + F/4
for(j=Xmin[i];j<Xmax[i];j++)
{
color = textBuf[((posY>>16) << 9) + (posX>>16)];
if(color)
{
back = psxVuw[(i << 10) + j];
cR = (back & 0x1F) + ((color & 0x1F) >> 2);
if (cR > 0x1F)
cR = 0x1F;
cG = ((back >> 5) & 0x1F) + (((color >> 5) & 0x1F) >> 2);
if (cG > 0x1F)
cG = 0x1F;
cB = ((back >> 10) & 0x1F) + (((color >> 10) & 0x1F) >> 2);
if (cB > 0x1F)
cB = 0x1F;
psxVuw[(i << 10) + j] = (cB << 10) | (cG << 5) | cR;
}
posX+=difX;
posY+=difY;
}
break;
}
I think I'm giving you too much help, eh? ;)
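For comparison, the four cases above can be folded into one per-pixel helper so the scanline loop only has to be written once. This is just a sketch under the same assumptions as the code above (15 bit pixels, red in the low five bits); `blend_channel` and `blend15` are made-up names, and it is not necessarily faster than the unrolled version.

```c
#include <stdint.h>

/* Blend one 5 bit channel. b = background (screen), f = foreground (texel). */
static int blend_channel(int b, int f, int abr)
{
    int v;
    switch (abr & 3) {
    case 0:  v = (b + f) >> 1; break;   /* 0.5 x B + 0.5 x F  */
    case 1:  v = b + f;        break;   /* 1.0 x B + 1.0 x F  */
    case 2:  v = b - f;        break;   /* 1.0 x B - 1.0 x F  */
    default: v = b + (f >> 2); break;   /* 1.0 x B + 0.25 x F */
    }
    if (v > 0x1F) v = 0x1F;             /* saturate upwards   */
    if (v < 0)    v = 0;                /* saturate downwards */
    return v;
}

/* Blend two 15 bit pixels using semi transparency mode abr (0..3). */
uint16_t blend15(uint16_t back, uint16_t fore, int abr)
{
    int r = blend_channel(back & 0x1F, fore & 0x1F, abr);
    int g = blend_channel((back >> 5) & 0x1F, (fore >> 5) & 0x1F, abr);
    int b = blend_channel((back >> 10) & 0x1F, (fore >> 10) & 0x1F, abr);
    return (uint16_t)((b << 10) | (g << 5) | r);
}
```

Inside the loop this would be used as `psxVuw[(i << 10) + j] = blend15(back, color, textABR);`, at the cost of a switch per pixel instead of one per line.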
Cyberman
September 22nd, 2001, 18:05
One can never be TOO helpful..
Anyhow about sentence fragments... ;)
Brightness: this has been fixed, as you can see in the attached image. Now I have to fix the semi transparency modes.
What was wrong with the brightness is that brightness goes from 0 to 0x7F, not 0 to 0xFF. If it's 0x80 it says it's carried on from the last, I'm not sure what that means (sigh). Anyhow things aren't as dark but I think I have a problem with blue and red as you can see in the pic.
I'm not sure what primitives are being used in my test game for spurious lighting (streaked) but I know this much: it doesn't look like Pete's or Kazzuya's output so it's likely wrong :) I also noticed that the edges of the polygons are visible; in reality they should blend together, not show their edges.. I'm puzzled. Also the polygons aren't rendering properly on small surfaces (look at the legs). The blue problem is obvious from the yellow cast beneath the person, so it's likely in the semi transparency area.
NickK
September 22nd, 2001, 18:28
Well, I noticed that you're modifying Duddie's Soft GPU. To tell you the truth, the only good thing about that plugin (in my opinion) was the DMA chain code :) You're on your own with texture caching, since there are different techniques to accomplish that. You should do some research on how texture pages and such work before jumping into it. I found it useful, because there were a lot of things I didn't know when I started.
Also, here's my opinion (again):
It might be easier to start a new plugin than to try to improve on someone else's. It's much easier for you to develop one if you know how all of the code works, and it's good experience.
Well, if you need any help, just ask! :)
gerbilcannon
September 22nd, 2001, 19:46
Hehe, reading all of this is making me want to start coding again. Ayei!!! (stares off into the bleaking future of no sleep and forgetting to go to class again ...)
Cyberman
September 23rd, 2001, 03:32
Texture caching will be a real challenge, especially considering the various modes etc. that are used with them.
Some of the algorithms are easy to implement, some aren't.
I'm not sure, but there are probably a few bugs in the texture and Gouraud shading algos used (the edge problem for example).
Anyhow it works and compiles (what I originally set out to find out was whether it compiled or not). :)
The PSX texture cache is kind of odd.. it makes sense but it is weird.
I was thinking of 'pre' expanding the cached textures and looking for repeated use of a location in the cache, so as to have pre-expanded textures to use in drawing. However it's probably not that simple :)
Cyb
NickK
September 23rd, 2001, 03:36
That might not be a good idea because of the palettized texture pages which use varying palettes on the same page. Oh well, if you could get something like that to work, well then, um, good work :D
Cyberman
September 23rd, 2001, 05:00
Certainly would be a memory hog. Probably need a constant expiring method for it :)
Nice thing though is you can fairly easily integrate it with the primitives because you call the texture generator first, that can put the pointer needed.
Unfortunately 4CLUT and 8CLUT are kind of messy; this is what I meant by pre-expanding. If you generate the palette already in a 15 bit expanded mode then it will take less time to "load" if it exists already. Theoretically this should work.. :)
Other things like all the blasted bit shifting (that's annoying), colors for Gouraud shading are probably wrong (really :) ).
For you .. it's grabbing textures and telling OGL what it's rendering, then caching them so you don't have to have 30 million of them :)
Cyb
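To make the "pre-expanding" idea concrete, here is a small sketch of turning 4 bit CLUT texels into ready-to-draw 15 bit pixels. It assumes four texels are packed into each 16 bit VRAM word with the leftmost texel in the lowest nibble; the function name `expand_clut4` and its parameters are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand `words` 16 bit VRAM words of 4 bit texels into 15 bit pixels.
   Assumes four texels per word, lowest nibble first.
   `out` must have room for 4 * words entries. */
void expand_clut4(const uint16_t *src, size_t words,
                  const uint16_t clut[16], uint16_t *out)
{
    for (size_t i = 0; i < words; i++) {
        uint16_t w = src[i];
        for (int n = 0; n < 4; n++) {
            *out++ = clut[w & 0xF];   /* look the index up in the palette */
            w >>= 4;                  /* move on to the next texel */
        }
    }
}
```

Because the same page can be drawn with different palettes, a cache of expanded data like this would presumably have to be keyed on the CLUT as well as on the texture location, which is where the memory use and the expiry scheme come in.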
NickK
September 23rd, 2001, 05:05
Yeah, it's impossible to tell what palettes the 4/8-bit pages will use. If you want to "pre expand" something, it would only work as a 4-bit -> 8-bit expansion; since you can't predict the colors for 8-bit anyway, that'd be the logical thing to do :)
And you'd be surprised how much easier programming a software GPU is (after you get the polygon rendering code down) as far as texture caching is concerned, compared to a hardware GPU... It sure took me a while to figure out what I was doing, and I'm still working on it! :)
Pete Bernert
September 23rd, 2001, 06:45
Hehe, btw, my soft gpu doesn't use texture caching at all... if you consider that, the speed isn't sooo bad :)
I never bothered to do it in the soft one, because the main cause for its existence is (beside compatibility tests) to have easy offscreen drawing funcs for the hw/accel. ones... and since the offscreen drawing funcs are only kicking in if they are needed (well, if some drawing occurs outside the currently used screen areas... therefore 'offscreen drawing', Lewpy TM), to take care if a soft texture cache is valid would be too costly, I think... well, maybe someday I'll do it.
But in a hw/accel. gpu the texture caching is one of the most important things to get speed... and it's no easy task to do a nicely working one, right.
Cyberman
September 24th, 2001, 21:58
Well yeah.. vhat a mess :)
I noticed Duddie's driver uses Intel MMX instructions for copying data to the display. I suppose later I should create a 'universal' MMX approach but right now ..
I'm looking at Intel's MMX instruction set.. what a mess they have in documentation. Badly organized. You have to read their WHOLE stupid assembly manual and thumb through it to find MMX instructions. What a waste of my time. They would have been far better off releasing a separate manual with an MMX instruction reference with all the niceties in it instead.
I think if I want to improve my speed for texture mapping and Gouraud shading, using MMX is it. (AMD will have to be later).
For shading etc. I think a line by line approach is the only thing that is going to work also. Unfortunately there are only 8 MMX registers; this might make things hard for masking and merging and multiplying etc. Lots of moving to and from memory annoys me performance wise. Additionally everything needs to be 64 bit aligned; it's not a terrible thing but it does add some difficulties.
I think my approach will be line by line for each operation; this will make it easier to support MMX (of all varieties) without making implementation overly complex.
One thing is that there aren't really enough MMX registers to be able to do enough; it's an annoyance but I guess one has to live with it. So lots of careful hand tooled MMX assembly will need to be used to even implement Gouraud shading for example. Add in textures, well it will be a pain, that's all I can say (grin). It's doable.
I wonder if AMD's MMX style instructions are any better?
Lewpy
September 25th, 2001, 01:16
Don't get confused :) All recent CPUs (Intel & AMD) support MMX.
It is only when you get to the more floating-point orientated instruction sets (SSE[2] & [Extended] 3DNow! [Professional]) that the two makes (Intel & AMD) diverge.
Having said which, the new Athlon-MP supports SSE.
Just remember the golden rule with MMX: don't mix MMX and FP instructions ;)
Personally, I think that if you are going to write an MMX software renderer, you should really think about having a 32bit colour VRAM image. This would make life soooo much easier, since the colour components would be 8bit, allowing immediate usage of packed-byte MMX instructions.
The only real issue is dealing with the following:-
1) VRAM Uploads from CPU
2) VRAM Downloads to CPU
3) Rendering being used as 4/8bit texture
4) 24bit MDECs
What I think you should do is this: have both a 16bit PSX VRAM image, and a 32bit VRAM image. All VRAM uploads/downloads are done to the 16bit VRAM image, so a suitable sync'ing method between the two image banks can be done.
This can be done as follows [very rough idea!!]: a flag bit is used per PSX Tpage in each image, to signify if that image holds the current [correct] data. One or the other bank must contain the most recent VRAM changes, and it is possible for them both to be valid.
So, when a VRAM upload is done, you need to check: is the 16bit VRAM image the most current for that TPage? If not, then copy the 32bit image to the 16bit image prior to VRAM upload. Either way though, unset the flag for the 32bit VRAM TPage (it is no longer current).
The same mechanism works for VRAM downloads, except you don't need to mark the 32bit VRAM TPage as invalid, since both the 16bit TPage and 32bit TPage are in-sync, since you've copied the 32bit TPage to the 16bit TPage.
When any primitive is rendered, the corresponding 16bit VRAM TPage needs marked invalid. This can probably be done on the drawing area rather than per primitive, to make things quicker/simpler.
For VRAM rendering to 4/8bit textures, again you need to check the validity of the 16bit VRAM TPage before selecting it as a texture source. If it is valid, use it. Otherwise, copy the 32bit TPage to the 16bit TPage first.
For 16bit texture rendering, I would suggest using the 32bit VRAM image as the source, as that makes re-using rendered images simpler, faster and at higher quality.
As for 24bit MDECs, since you only convert from 16bit to 32bit when required, you shouldn't be wasting time wrongly converting 24bit MDEC data to the 32bit VRAM image.
Ideally, the precision of the synchronisation between the 16bit and 32bit VRAM image would be higher, maybe per line of the TPage, or something. A balance needs struck between making the checking too complicated and the amount of data converted too high.
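The flag scheme above might be sketched like this. It is only the bookkeeping: the actual 16bit/32bit conversions are replaced by a counter, all the names (`VramSync`, `before_upload` and so on) are invented, the count of 32 TPages assumes a 1024x512 VRAM cut into 64x256 pages, and bringing the 32bit page up to date before rendering into it is an inference from the description rather than something spelled out.

```c
#include <stdbool.h>

#define NUM_TPAGES 32   /* assumption: 1024x512 VRAM in 64x256 pages */

typedef struct {
    bool valid16[NUM_TPAGES];  /* 16bit bank holds current data */
    bool valid32[NUM_TPAGES];  /* 32bit bank holds current data */
    int  conversions;          /* bank-to-bank copies, counted for illustration */
} VramSync;

void sync_init(VramSync *s)
{
    for (int i = 0; i < NUM_TPAGES; i++) {
        s->valid16[i] = true;   /* start with everything in the 16bit bank */
        s->valid32[i] = false;
    }
    s->conversions = 0;
}

static void need16(VramSync *s, int tp)
{
    if (!s->valid16[tp]) {
        /* a real plugin would convert the 32bit TPage to 16bit here */
        s->conversions++;
        s->valid16[tp] = true;
    }
}

static void need32(VramSync *s, int tp)
{
    if (!s->valid32[tp]) {
        /* a real plugin would convert the 16bit TPage to 32bit here */
        s->conversions++;
        s->valid32[tp] = true;
    }
}

/* CPU -> VRAM: make 16bit current, then the 32bit copy goes stale. */
void before_upload(VramSync *s, int tp)   { need16(s, tp); s->valid32[tp] = false; }
/* VRAM -> CPU: make 16bit current; both banks stay valid. */
void before_download(VramSync *s, int tp) { need16(s, tp); }
/* Rendering goes to the 32bit bank, so the 16bit copy goes stale. */
void before_render(VramSync *s, int tp)   { need32(s, tp); s->valid16[tp] = false; }
```

A page that is only ever uploaded to and read back (24bit MDEC data, say) never triggers a conversion, which is the property being argued for.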
Anyway, it's late and I'm rambling :) I've had ideas for an MMX software renderer for _ages_, just never the time to commit those ideas down, let alone try and start coding them ;)
Also, if you are going to use Duddie's software GPU as a source, PLEASE PLEASE PLEASE junk the use of edge-buffers!!
Scan-convert the triangle dynamically, and remember this: the per-pixel texture steps (du/dx and dv/dx) are CONSTANT for every scanline in the triangle. So calculate them once, not per-scan line. Then you only need to iterate x, u & v down the left-hand side, and x down the right. We are talking non-perspective correct texturing, a la PSX ;)
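A hedged sketch of the "calculate it once" part: for non-perspective (affine) texturing, u and v are linear in screen x and y, so their horizontal steps can be solved from the three vertices up front. The `Vertex` layout and the name `tex_gradients` are made up; results are in 16.16 fixed point to match the `>>16` style used elsewhere in this thread.

```c
#include <stdint.h>

typedef struct { int x, y, u, v; } Vertex;   /* illustrative layout */

/* Compute the per-pixel horizontal texture steps for a whole triangle.
   Returns 0 for a degenerate (zero area) triangle, otherwise 1 and
   writes du/dx and dv/dx as 16.16 fixed point values. */
int tex_gradients(const Vertex *a, const Vertex *b, const Vertex *c,
                  int32_t *dudx, int32_t *dvdx)
{
    int32_t den = (b->x - c->x) * (a->y - c->y) - (a->x - c->x) * (b->y - c->y);
    if (den == 0)
        return 0;

    int32_t nu = (b->u - c->u) * (a->y - c->y) - (a->u - c->u) * (b->y - c->y);
    int32_t nv = (b->v - c->v) * (a->y - c->y) - (a->v - c->v) * (b->y - c->y);

    /* multiply instead of shifting so negative numerators stay well defined */
    *dudx = (int32_t)(((int64_t)nu * 65536) / den);
    *dvdx = (int32_t)(((int64_t)nv * 65536) / den);
    return 1;
}
```

With these two values in hand, each scanline only needs its starting x, u and v from the left edge and its ending x from the right edge.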
Cyberman
September 25th, 2001, 19:42
Now that would be interesting
24 bit color soft GPU (yikes that could be strange if it was as fast as a 16 bit soft GPU)
I found a good MMX reference. It's the AMD data book on it; it's more thorough and, more importantly, everything is in ONE PLACE! No mixing of XMMX-QOTY or whatever ;)
you would still have the problem of converting BGR555 components to RGBA8888 components (textures colors etc.) lots of shifting I suppose there. As for handling the semi transparency modes the 32 bit color would be easier though (in MMX move data from VRAM to MMX0 move texture data to MMX1 add and it automagically supports saturation so it's either 0 or FF for the RGB pixel) and you can do 2 pixels at once.
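For anyone following along without MMX to hand, this is what the saturating packed-byte add (PADDUSB) works out to per 32 bit pixel, written as plain portable C. It is a stand-in to show the arithmetic, not MMX code, and `add_sat_u8x4` is a made-up name.

```c
#include <stdint.h>

/* Per-byte unsigned saturating add of two packed 32 bit pixels:
   each 8 bit channel is added on its own and clamps at 0xFF. */
uint32_t add_sat_u8x4(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF);
        if (s > 0xFF)
            s = 0xFF;              /* clamp instead of wrapping around */
        out |= s << shift;
    }
    return out;
}
```

An MMX register holds eight bytes, so the real instruction does this for two such pixels in one go, which is the "2 pixels at once" above.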
Lets see expanding BGR555 to RGB888 would be something like
XOR EBX,EBX
MOV BX,[ESI]
SHL EBX,3
MOV EAX,0FFh
AND EAX,EBX
SHL EAX,16
SHL EBX,3
AND BX,0F800h
OR AX,BX
SHR EBX,11
OR EAX,EBX
might be a bit slow as data is loaded into the VRAM texture memory for that. but it is likely doable. Anyhow .. I digress
One would still have to do something about caching 4/8 bit textures as much as possible as well. Those are a pain to deal with :)
I ramble.. yeah it's doable but I'm not sure what performance gain you get. Then also comes the question: do you put the 32bit VRAM data to the display or the 16bit VRAM to the display? I suppose you could decide that on the current display mode. :)
So the edge buffers just get in your way is what you are saying?
Does this happen for both Gouraud and texture rendering?
How Duddie does the textures is to 'draw' lines into a virtual buffer and then use the max and mins from that buffer to get the endpoints of each line and the intensity endpoints. So what you are saying is that for a given triangle (or 4 point poly) the ends of these are dynamically calculable and don't need this extra processing step? Or are you talking about the changes in the position in the texture buffer when you change the position you are reading from the texture buffer? (which might be causing my polygon edges to show up too)
I get this bizarre squishing of textures on the display. I suppose that's where it's coming from, since it would change the diffX and diffY depending on the edges of the display (hmm) :)
Anyhow so all you really need is Xmin Xmax and diffX and diffY to be calculated once, and TXmax TYmax TXmin TYmin are basically a waste of time to make.
Lewpy
September 25th, 2001, 23:52
I don't know if the 32bit system would be faster, but it would make the inner routines more elegant :) And the quality of the output image should be noticeably better (yes, always output from the 32bit image).
As for the colour expansion: why do one pixel at a time? ;) Either load 2 with a full 32bit load (I don't like the mix of 32bit and 16bit operations you have there, btw), or even better use MMX and load 4 pixels at a time, and write back 2 pixels on each write.
Also, your expansion is not precise enough :) For one thing, you have only expanded BGR555 to RGB888, not ABGR1555 to ARGB8888. The other is that you haven't used the full precision of the RGB888 fields: you have taken bbbbbgggggrrrrr, and converted it into rrrrr000ggggg000bbbbb000. So if BGR was 0xffff, you would convert it to RGB 0xf8f8f8, when it should be 0xffffff. Do you follow? The lower 3 bits of the 8bit colour channels will never be used. To counter this, you take the top 3 bits of the colour channel, shift them down, and OR them back in to the colour. Then the full dynamic range of the 8bit colour channel is used.
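In C, for a single pixel, the "OR the top 3 bits back in" idea comes out roughly as below. This is a plain sketch, not the optimised routine: it assumes red sits in the low five bits of the 16 bit pixel, and turning the top (STP) bit into a fully set alpha byte is just one possible mapping. The names `expand5` and `abgr1555_to_argb8888` are made up.

```c
#include <stdint.h>

/* Expand a 5 bit channel to 8 bits, copying the top 3 bits into the
   low 3 bits so that 0x1F becomes 0xFF rather than 0xF8. */
static uint32_t expand5(uint32_t c5)
{
    return (c5 << 3) | (c5 >> 2);
}

/* ABGR1555 (red in the low bits) to ARGB8888. */
uint32_t abgr1555_to_argb8888(uint16_t p)
{
    uint32_t r = expand5(p & 0x1F);
    uint32_t g = expand5((p >> 5) & 0x1F);
    uint32_t b = expand5((p >> 10) & 0x1F);
    uint32_t a = (p & 0x8000) ? 0xFFu : 0x00u;   /* assumed alpha mapping */
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

With this, 0xffff expands to 0xffffffff, and a mid value such as 0x10 in a channel becomes 0x84 instead of 0x80.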
I am not saying this is perfect, but here is my code for colour expanding the PSX CLUT for use in the Voodoo palette, so it is basically ABGR1555 expanded to ARGB8888, with a tweak on the chroma-key colour.
// I did some timing, and this routine creates a 256 entry palette 33% faster than using
// a 64k lookup table. And it doesn't damage the cache like a lookup table does :)
// [tests done on a P2-333 512kb L2]
//
// I've done some optimising, but there is more that could be done to get the pairing right
// at the end of each iteration. The loop is only un-rolled 2 times.
void CreatePaletteEntries (int OffsetAmount, unsigned int NumEntries)
{
__asm
{
push ebx
push esi // Save ESI, according to C rules
push edi // Save EDI, according to C rules
mov edx, OffsetAmount
lea edi, CLUTDownload.data
// mov edi, PalBuf
shl edx, 2
mov ecx, NumEntries
mov esi, PalStart
add edi, edx
shr ecx, 1
OuterLoop:
mov eax, [esi] // Retrieve double CLUT entry
add esi, 4
push ecx
push eax // Store double CLUT entry
mov ebx, eax // Let EBX be R1
mov edx, eax // Let EDX be B1
shl ebx, 19 // Shift R1 into position
and eax, 03e0h // Mask G1
shl eax, 6 // Shift G1 into position
and ebx, 0f80000h // Mask R1
shr edx, 7 // Shift B1 into position
or eax, ebx // Add R1 to G1 making RGx1
and edx, 00f8h // Mask B1
pop ebx // Recover double CLUT entry
or eax, edx // Add B1 to RGx1 making RGB1
mov ecx, ebx // Let ECX be double CLUT entry
mov edx, eax // Copy RGB1 to EDX
shl ecx, 3 // Shift R2 into position
and edx, 0e0e0e0h // Mask top 3 bits of each colour
shr edx, 5 // Shift these right by 5, to line them up with the missing bits in each colour
and ecx, 0f80000h // Mask R2
or eax, edx // Add these extra bits to the existing bits
mov edx, ebx // Let EDX be B2
test ebx, 08000h // Check the STP bit
jz AlphaDone1
or eax, 0ff000000h // Set all the Alpha bits
test eax, 000ffffffh // Check for ChromaKey fix
jnz AlphaDone1
mov eax, GlideDisplay.ChromaKeyFixValue // ChromaKey fix value
AlphaDone1:
mov [edi], eax // Save the double pixel
mov eax, ebx
shr edx, 23 // Shift B2 into position
and ebx, 03e00000h // Mask G2
and edx, 00f8h // Mask B2
shr ebx, 10 // Shift G2 into position
or edx, ecx // Add R2 to B2 to make RxB2
pop ecx // Recover the loop counter
or ebx, edx // Add RxB2 to G2 to make RGB2
mov edx, ebx // Copy RGB2 to EDX
and edx, 0e0e0e0h // Mask top 3 bits of each colour
shr edx, 5 // Shift these right by 5, to line them up with the missing bits in each colour
or ebx, edx // Add these extra bits to the existing bits
test eax, 080000000h // Check the STP bit
jz AlphaDone2
or ebx, 0ff000000h // Set all the Alpha bits
test ebx, 000ffffffh // Check for ChromaKey fix
jnz AlphaDone2
mov ebx, GlideDisplay.ChromaKeyFixValue // ChromaKey fix value
AlphaDone2:
mov [edi+4], ebx // Save the double pixel
add edi, 8
dec ecx
jnz OuterLoop
pop edi // Restore EDI, according to C rules
pop esi // Restore ESI, according to C rules
pop ebx
}
}
It was designed with the pentium in mind (as that was the only architecture I knew back then!), so there is thought given to u/v pairing rules. I've read up on the P6 architecture, so there are things I would do differently now: conditional loads in place of the test functions would be nice, etc. Of course, an MMX routine would probably be much better ;)
Anyway, that was really to show the extra bits you need to look at: alpha channel, and full usage of the 8bit colour channels.
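For readers who find the assembly hard to follow, here is a rough plain-C sketch of what the routine above does for a single CLUT entry. It is not the original code: the function name and the chroma_fix parameter (standing in for GlideDisplay.ChromaKeyFixValue) are made up for illustration, and no attempt is made at pairing or speed.

```c
#include <stdint.h>

/* Expand one PSX ABGR1555 pixel (R in bits 0-4, G in bits 5-9,
   B in bits 10-14, STP in bit 15) to ARGB8888.
   chroma_fix is a hypothetical stand-in for the
   GlideDisplay.ChromaKeyFixValue used by the assembly version. */
static uint32_t abgr1555_to_argb8888(uint16_t p, uint32_t chroma_fix)
{
    uint32_t rgb = ((uint32_t)(p & 0x001F) << 19)   /* R -> bits 19-23 */
                 | ((uint32_t)(p & 0x03E0) << 6)    /* G -> bits 11-15 */
                 | ((uint32_t)(p & 0x7C00) >> 7);   /* B -> bits 3-7   */

    /* Copy the top 3 bits of each channel into its empty low 3 bits. */
    rgb |= (rgb & 0x00E0E0E0u) >> 5;

    if (p & 0x8000) {                     /* STP set: fill the alpha byte */
        rgb |= 0xFF000000u;
        if ((rgb & 0x00FFFFFFu) == 0)     /* black with STP set */
            rgb = chroma_fix;             /* apply the chroma-key tweak */
    }
    return rgb;
}
```

Calling it with 0x7FFF gives 0x00FFFFFF rather than 0x00F8F8F8, which is the "full usage of the 8bit colour channels" point.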
I am not sure what you have against the 4/8bit texture modes. All I see is you have to make one more look-up to actually retrieve the texture data, which doesn't seem to be too bad. Am I missing something crucial?
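To make the "one more look-up" concrete, here is a minimal sketch of a 4bit texel fetch. It assumes VRAM is held as a flat 1024x512 array of 16bit halfwords and that the texture page and CLUT origins are already known in VRAM coordinates; the function and parameter names are invented for the example.

```c
#include <stdint.h>

#define VRAM_W 1024   /* VRAM width in 16-bit halfwords */
#define VRAM_H 512    /* VRAM height in lines */

/* Fetch one texel from a 4-bit texture.
   (tex_x, tex_y) is the texture page origin and (clut_x, clut_y) the
   palette origin, both in VRAM halfword coordinates (assumed inputs). */
static uint16_t fetch_texel_4bit(const uint16_t *vram,
                                 int tex_x, int tex_y,
                                 int clut_x, int clut_y,
                                 int u, int v)
{
    /* Four 4-bit indices are packed per halfword, lowest nibble first. */
    uint16_t packed = vram[(tex_y + v) * VRAM_W + tex_x + (u >> 2)];
    int index = (packed >> ((u & 3) * 4)) & 0xF;

    /* The extra look-up: the index selects an entry in the CLUT. */
    return vram[clut_y * VRAM_W + clut_x + index];
}
```

The 8bit case is the same idea with two indices per halfword (u >> 1, and a shift of (u & 1) * 8 with mask 0xFF).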
Have a look at this document (http://lewpy.psxemu.com/fatmap.txt). Although it was done 5 years ago, it is still pretty relevant to a PSX renderer I think. It explains the details of the optimisations you can make. Of course, it is also targetting the pentium, which is not so relevant these days. But the details are the same :)
Cyberman
September 26th, 2001, 23:36
I think it may be doable... Though I'm not all too sure what the deal with ARGB 8888 and ARGB 1555 means ;)
Here is a simple MMX routine that converts BGR555 to ARGB8888. This is what I was thinking for a nice looking soft GPU (I even added the color boost of +7 just for you ;) )
long BOOST_FILL[2] = {0x30303,0x30303};
long R5MSK = 0x1F001F;
void BGR555_to_ARGB8888(long Count, unsigned *_src, long *_dest)
{
asm {
PUSH EDI
PUSH ESI
PUSH EAX
PUSH ECX
MOV ECX,Count
MOV ESI,_src
MOV EDI,_dest
MOVQ MM7,R5MSK ; Red
MOVQ MM6,MM7
PSLLQ MM6,5 ; Green
MOVQ MM5,MM7
PSLLQ MM5,10 ; Blue
MOVQ MM4,BOOST_FILL
XOR EAX,EAX
LOOP_RGB:
MOVD MM0,[ESI] ; read 2 16 bit values
MOVQ MM1,MM0
MOVQ MM2,MM0
PAND MM0,MM5 ; Blue
PAND MM1,MM6 ; Green
PAND MM2,MM7 ; Red
PUNPCK MM0,EAX ; zero fill Blue to 32 bit
PUNPCK MM1,EAX ; zero fill Green to 32 bit
PUNPCK MM2,EAX ; zero fill Red to 32 bit
PSRLQ MM0,7 ; 8 - 15 ; Blue
PSLLQ MM1,6 ; 16 - 10 ; Green
PSLLQ MM2,19; 24 - 5 ; Red
POR MM0,MM1
POR MM0,MM2
POR MM0,MM4 ; fill lower bits of each color to max
MOVQ [EDI],MM0 ; Store 2 32 bit values
ADD ESI,4
ADD EDI,8
SUB ECX,2
JNZ LOOP_RGB
EMMS
POP ECX
POP EAX
POP ESI
POP EDI
}
}
It works with 2 values at a time (a small limitation, perhaps, for now). Having it work with an odd number of values is possible; it will just require some additional code.
This would be good for quick conversion of the BGR palette values for the 32 bit VRAM display. Now as for MDEC decoding, I am not sure how to let that work with the display personally. However just in doing this simple operation alone it's very efficient.
Unless the game uses the 24 bit color mode (that would be a headache) this looks to work Ok dokey for the 16 bit BGR mode.
With MMX a lot of the semi transparency modes would work a LOT faster (heh). As for overall speed I think it would be comparable to 16 bit RGB in a soft GPU. Might be worth doing just for the purtyness of it ;)
By the way, what does ePSXe use to transfer the MDEC output to the module? I would like to fix the displaying of the MDEC data so it doesn't look like RGB hell :)
I haven't been able to think of a big downside yet, probably will take time ;)
Caveats:
Reading any data from the display buffer (VRAM to CPU memory operation) does have a downside.
Writing data to the display buffer might require calling the above routine to fill the 32bit VRAM.
Handling odd numbered/sized textures might result in some problems.
Would have to check for 32 bit color mode for display.
Would have to check if the current mode is 32 bit color then ask the user if switching modes is OK if it isn't.
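As a rough illustration of the first caveat: a VRAM to CPU memory read from a 32 bit buffer would have to pack each pixel back down to 16 bits. A minimal sketch, assuming the ARGB8888 layout discussed above and treating the top alpha bit as the STP bit (the function name is made up):

```c
#include <stdint.h>

/* Pack one ARGB8888 pixel back into PSX ABGR1555.
   Assumption: only the top alpha bit is kept, as the STP bit. */
static uint16_t argb8888_to_abgr1555(uint32_t c)
{
    uint16_t r   = (c >> 19) & 0x1F;   /* top 5 bits of red   */
    uint16_t g   = (c >> 11) & 0x1F;   /* top 5 bits of green */
    uint16_t b   = (c >> 3)  & 0x1F;   /* top 5 bits of blue  */
    uint16_t stp = (c >> 31) & 1;      /* top alpha bit       */
    return (uint16_t)((stp << 15) | (b << 10) | (g << 5) | r);
}
```

Anything stored in the low 3 bits of a channel is thrown away on the way back, so this only round-trips cleanly if the 32 bit pixel came from a 16 bit source in the first place.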
Anyhow some more thinking will be involved.
I'm also perplexed at the textures, I'm trying to calculate the textX and textY values once, not sure what to use for that.
NickK
September 26th, 2001, 23:41
Use the Texture Page data globally when it's not specified in a primitive. Texture Page X and Y are stored in the 16-bit tpage data. Use TexCoordU + tPageX and TexCoordV + tPageY to get the actual position of the texel within VRAM.
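A small sketch of that calculation, assuming the tpage bit layout as it is commonly documented (bits 0-3 = X base in steps of 64 halfwords, bit 4 = Y base in steps of 256 lines, bits 7-8 = colour depth). The struct and function names are made up. Note that the plain U + tPageX sum only holds as-is for 15bit textures; for 4bit and 8bit textures U has to be scaled down first, because several texels share one halfword.

```c
#include <stdint.h>

typedef struct { int x, y; } VramPos;   /* hypothetical helper type */

/* Locate the VRAM halfword that holds texel (u, v) of a texture page.
   Bit layout of tpage is an assumption based on common PSX GPU docs. */
static VramPos texel_vram_pos(uint16_t tpage, int u, int v)
{
    VramPos p;
    int base_x = (tpage & 0x0F) * 64;        /* bits 0-3: X base */
    int base_y = ((tpage >> 4) & 1) * 256;   /* bit 4: Y base, 0 or 256 */
    int depth  = (tpage >> 7) & 3;           /* 0 = 4bit, 1 = 8bit, 2 = 15bit */
    int shift  = (depth == 0) ? 2 : (depth == 1) ? 1 : 0;

    p.x = base_x + (u >> shift);   /* texels per halfword: 4, 2 or 1 */
    p.y = base_y + v;
    return p;
}
```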
Cyberman
September 26th, 2001, 23:59
actually the values for boost fill should be
long BOOST_FILL[2] = {0x70707,0x70707};
I might spend some time making an MMX version of the whole GPU; it might make for some interesting discoveries in performance (maybe not but HEY :) )
Cyb
Cyberman
September 27th, 2001, 00:07
Ahh.. that's what he was referring to.. I still need to make a bunch of optimizations. I also need to add on ePSXe extensions too :)
Now comes an interesting thought.
If, say, the thing is in 32 bit color mode and we are using 32 bit VRAM (instead of 16 bit), we probably need to have a 16/32 bit switch in the area where VRAM gets copied to the display. I don't think it would have much overhead, but it would make it easy to select between 2 optimized routines to convert the display ;)
Why does dudie calculate diffTX each line anyhow? Isn't the number of pixels per scanline semi invariant? I think this might be causing the odd texture distortions I've been having.
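For what it's worth, with plain affine (non-perspective) mapping the per-pixel texture step is the same on every scanline of a single triangle, so it can be worked out once per triangle from the three vertices. The sketch below is the standard plane-gradient formula, not anything taken from dudie's code; the Vertex struct and function name are invented for the example.

```c
typedef struct { float x, y, u, v; } Vertex;   /* hypothetical vertex type */

/* Compute the constant horizontal texture steps (du/dx, dv/dx) for one
   affine-mapped triangle. Returns 0 for a zero-area triangle. */
static int texture_gradients(const Vertex *a, const Vertex *b, const Vertex *c,
                             float *dudx, float *dvdx)
{
    float denom = (b->x - a->x) * (c->y - a->y)
                - (c->x - a->x) * (b->y - a->y);
    if (denom == 0.0f)
        return 0;

    *dudx = ((b->u - a->u) * (c->y - a->y)
           - (c->u - a->u) * (b->y - a->y)) / denom;
    *dvdx = ((b->v - a->v) * (c->y - a->y)
           - (c->v - a->v) * (b->y - a->y)) / denom;
    return 1;
}
```

A 4 point poly would need this done per triangle after splitting it in two, since the two halves generally have different gradients.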
Ciao
Cyb
Lewpy
September 27th, 2001, 14:06
Texture distortion is normal for non-perspective correct texture mapping, hence the use of perspective correct texturing ;)
PSX doesn't do perspective correct texturing, so it's just something we have to live with: no depth data means no perspective calculations are possible :(
As for your "boost" fill, you kinda missed the point ;) What my code was doing was taking the top 3 bits of the 5 bit colour value, and using it as the bottom 3 bits of the 8 bit colour value.
So the channel goes from
abcde
to
abcdeabc
This means colour 0x0000 goes to 0x000000, while 0xffff goes to 0xffffff, with a linear conversion in-between. Precision, baby, precision ;)
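To see the difference on a single 5 bit channel, here is a small side-by-side sketch of the three approaches discussed in this thread: plain shift, the OR-with-7 "boost", and bit replication. The function names are made up for the example.

```c
#include <stdint.h>

/* abcde -> abcde000: white tops out at 0xF8 */
static uint8_t expand_zero_fill(uint8_t c5)
{
    return (uint8_t)(c5 << 3);
}

/* abcde -> abcde111: white is 0xFF, but black becomes 0x07 */
static uint8_t expand_or7(uint8_t c5)
{
    return (uint8_t)((c5 << 3) | 7);
}

/* abcde -> abcdeabc: black stays 0x00 and white reaches 0xFF */
static uint8_t expand_replicate(uint8_t c5)
{
    return (uint8_t)((c5 << 3) | (c5 >> 2));
}
```

Only the replicated version maps both ends of the range correctly: 0 stays 0x00 and 31 becomes 0xFF.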
And dammit, if they have their desktop set to sucky 16bit colour, just go full-screen 32bit ;)
Alternatively, use D3D and treat the 32bit VRAM image as a texture source, and texture it to the 16bit display surface ;)
Cyberman
September 28th, 2001, 05:08
Fine <sniff> actually I use mine currently in 16 bit mode. I wonder if the other soft GPUs have 24 bit color textures etc... hmmms.
I have a sucky Diamond Stealth 3d 2000 :)
My other computer has a Voodoo 3 2000 though 32m
Anyhow.. I think I'll mess with a faster soft GPU implementation using MMX. I'm thinking I should support the extensions to ePSXe too ;)
Cyb