PCSX Development


jivera
October 16th, 2003, 17:29
As I am sure you all know, Linuzappz posted about a month ago to PCSX.net that he's given up on maintaining the project in favour of being able to work on PCSX2. I contacted him via IRC and he approved of letting me work on the project.

Apparently he has also added two other programmers, yokota and imrtechnology, to the sourceforge.net account, but I haven't been able to get in contact with them yet.

I have recently uploaded the PCSX 1.6 development tarball into an arch (http://gnuarch.org) archive (http://www.flame.org/~jivera/projects.html) and made the bug fixes needed for it to compile with MinGW. I have a few ideas I would like to get integrated into the code, but I just noticed today that this bulletin board is still active, and thought I would first ask if anyone else has made efforts to reformalize the development process so our work doesn't need to be redundant.

Thank you,
-jivera

zenogais
October 17th, 2003, 06:09
I've actually been looking over the source code, and if you wouldn't mind I would like to recommend a few improvements that you might make to PCSX. There are a few more optimizations that can be done to the recompiler to increase its speed even more, here's a list of possible ideas:


Caching of pre-compiled x86 code so that when loops are being processed the code doesn't have to be repeatedly regenerated.
You could also consider using circular buffers for loop-processing so that when a jump instruction is executed to previous code that has already been recompiled it will be immediately available in this recompiled form, this may take some r3000a binary analysis though.
Just a note, keeping track of these would require a 2D array that is written/read from based on the memory location of the translated instruction. This way you can check to see if you have already compiled the code at that location thus avoiding recompiling it again if it is not necessary.


These are just some thoughts from looking through the source code. Best of luck to you, and hope you make PCSX better than it already is. :thumb:

-zenogais

Xeven
October 17th, 2003, 06:34
instruction is executed to previous code that has already been recompiled it will be immediately available in this recompiled form

that's one of the major precepts of binary translation or "recompilation" in the first place. Otherwise, you'd have to translate each "block" every time it is accessed, which will make your "recompiler" much slower than the interpreter. So in short, it's already there.

zenogais
October 17th, 2003, 06:42
that's one of the major precepts of binary translation or "recompilation" in the first place. Otherwise, you'd have to translate each "block" every time it is accessed, which will make your "recompiler" much slower than the interpreter. So in short, it's already there.

Sorry, Thanks Xeven. :D I'm still a "newbie" for the most part to dynarec, thanks for the info.

FLaRe85
October 17th, 2003, 09:26
I have a fairly simple suggestion. It'd be great to finally have some sort of command-line support in PCSX. :)

jivera
October 17th, 2003, 18:28
ShADoWFLaRe85: I have a fairly simple suggestion. It'd be great to finally have some sort of command-line support in PCSX. :)

Can you be a little more specific with regard to what command-line options you would like to see? I believe I saw commented out code that would deal with command line stuff, but I haven't touched it yet.

Most of the features I'm interested in adding include better debugging (memory viewer, instruction stepper, breakpoints, etc.), but I'll try to add any requested features that are within my coding abilities.

zenogais, I took a look at your NeoPSX website and under progress research is listed as 100% -- what research have you done? I have found a document named "Everything You Have Always Wanted to Know about the Playstation But Were Afraid to Ask" and it does a good job at explaining the PSX, but leaves a lot out.

Possibly our projects can benefit from some cross-pollination of code (for example, I can see if there are any PCSX-specific dependencies in the XA and mdec decoding routines that can be eliminated so they could be used as library routines).

-jivera

FLaRe85
October 17th, 2003, 20:07
Can you be a little more specific with regard to what command-line options you would like to see? I believe I saw commented out code that would deal with command line stuff, but I haven't touched it yet.
Sure. I think it'd be beneficial, especially for frontends, if you could perform basic launching options from the command-line that would otherwise only be accessible from the file menu. For example, 'pcsx.exe -loadbin c:\path\to\iso.bin' would auto-start an iso and 'pcsx.exe -runcd' would immediately launch the game in the CD-ROM drive at the time. I'm actually borrowing the syntax of those commands from ePSXe and PSXeven, respectively, but I'm sure you get the idea. :)

jivera
October 17th, 2003, 20:51
ShADoWFLaRe85: Sure. I think it'd be beneficial, especially for frontends, if you could perform basic launching options from the command-line that would otherwise only be accessible from the file menu. For example, 'pcsx.exe -loadbin c:\path\to\iso.bin' would auto-start an iso and 'pcsx.exe -runcd' would immediately launch the game in the CD-ROM drive at the time. I'm actually borrowing the syntax of those commands from ePSXe and PSXeven, respectively, but I'm sure you get the idea. :)

Okay, those sound like fair requests. Right now I'm trying to find a way to profile the dynamic recompiler, but it so far seems I can only do that for the interpreter -- I probably need to figure out what special code gcc inserts when given the -pg flag so I can do the same inside the dynarec code.

I had to reinstall Windows a few weeks ago and lost all the plugins I had for PCSX. I downloaded some zip file with dozens of different plugins, but now cannot seem to find it (and not having a working joypad plugin kinda makes things difficult). If anyone knows where I can find that plugin collection again, I would appreciate it.

-jivera

FLaRe85
October 17th, 2003, 21:17
ShADoWFLaRe85: Sure. I think it'd be beneficial, especially for frontends, if you could perform basic launching options from the command-line that would otherwise only be accessible from the file menu. For example, 'pcsx.exe -loadbin c:\path\to\iso.bin' would auto-start an iso and 'pcsx.exe -runcd' would immediately launch the game in the CD-ROM drive at the time. I'm actually borrowing the syntax of those commands from ePSXe and PSXeven, respectively, but I'm sure you get the idea. :)

Okay, those sound like fair requests. Right now I'm trying to find a way to profile the dynamic recompiler, but it so far seems I can only do that for the interpreter -- I probably need to figure out what special code gcc inserts when given the -pg flag so I can do the same inside the dynarec code.

I had to reinstall Windows a few weeks ago and lost all the plugins I had for PCSX. I downloaded some zip file with dozens of different plugins, but now cannot seem to find it (and not having a working joypad plugin kinda makes things difficult). If anyone knows where I can find that plugin collection again, I would appreciate it.

-jivera

Great :D
You could download Aldo's plugin pack at http://aldostools.mysite4now.com/psemu/plugins.zip

jivera
October 17th, 2003, 22:02
Okay, I just wrote some (very fragile) proof of concept code and implemented the -runcd command line argument, so it shouldn't be difficult to write the rest of the code. Windows and Linux (or ANSI C rather) use different idioms for passing command line arguments (Windows has a single string, while Linux/ANSI C breaks the string at space boundaries).
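The difference between the two idioms can be sketched in C. This is a minimal, hypothetical example, not the code that was committed to PCSX: the `LaunchOptions` struct and the `split_command_line` and `parse_args` helpers are names made up for illustration, and only the `-runcd` and `-loadbin` spellings come from this thread. The splitter breaks at spaces only, so quoted paths containing spaces are not handled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 16

/* Hypothetical container for the launch options discussed above. */
typedef struct {
    int run_cd;          /* non-zero if -runcd was given */
    const char *image;   /* path following -loadbin, or NULL */
} LaunchOptions;

/* Split a single Windows-style command string in place at spaces,
 * producing an argv-like array. Returns the number of tokens.
 * Quoted arguments are NOT handled in this sketch. */
int split_command_line(char *cmd, char *argv[], int max_args) {
    int argc = 0;
    char *tok = strtok(cmd, " ");
    while (tok != NULL && argc < max_args) {
        argv[argc++] = tok;
        tok = strtok(NULL, " ");
    }
    return argc;
}

/* Interpret an argv-style array (the form an ANSI C main() receives). */
LaunchOptions parse_args(int argc, char *argv[]) {
    LaunchOptions opts = { 0, NULL };
    int i;
    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-runcd") == 0) {
            opts.run_cd = 1;
        } else if (strcmp(argv[i], "-loadbin") == 0 && i + 1 < argc) {
            opts.image = argv[++i];   /* consume the path argument */
        }
    }
    return opts;
}
```

With this split, the Windows entry point would first run its single string through `split_command_line` and then share `parse_args` with the Linux build, which can pass its `argv` (minus the program name) directly.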

When I get out of class and back to my apartment I will try to finish this code and possibly upload a build so people can test it out.

-jivera

Seta-San
October 17th, 2003, 22:12
damned be the evil command line shadowflare. even bill can see it's dead.

Andrew Hruska
October 17th, 2003, 23:05
damned be the evil command line shadowflare. even bill can see it's dead.

Oh, so you hate the CLI? DIE, DIE! Rot! hehe.

FLaRe85
October 17th, 2003, 23:08
damned be the evil command line shadowflare. even bill can see it's dead.
You try writing a useful frontend-like application without a command line. :p

They're more helpful than they seem. ;)

zenogais
October 18th, 2003, 01:29
ShADoWFLaRe85:
zenogais, I took a look at your NeoPSX website and under progress research is listed as 100% -- what research have you done? I have found a document named "Everything You Have Always Wanted to Know about the Playstation But Were Afraid to Ask" and it does a good job at explaining the PSX, but leaves a lot out.

Possibly our projects can benefit from some cross-pollination of code (for example, I can see if there are any PCSX-specific dependencies in the XA and mdec decoding routines that can be eliminated so they could be used as library routines).

-jivera

I'd be glad to help out PCSX. :D I wouldn't say I'm exactly the best person to talk to, but I'll answer what I can. As for the research, most of my research has been through reading PSX Hardware docs, looking through some source code, looking at some PSX binaries, and that's about it. Here are a few docs, besides the one you listed, that I read to help me out with programming an emulator as well as Playstation-specific stuff:

1964 Recompiling Engine Doc (http://www.andrews.edu/~kutnick/public/1964_recompiler.pdf)

Aemulor Recompiler Docs (http://www.aemulor.com/manual/technical.php)

Padua Playstation Resource (http://psx.rules.org/psxrul2.shtml)

PSX Hardware Docs (http://www.runix.ru/html/Documentation%20kit/)

R3000A Hardware Docs (http://decstation.unix-ag.org/docs/ic_docs/3467.pdf)

Also right now for NeoPSX I'm working on the dynarec, I have about half the PSX instructions coded, and I'm also working on doing optimizations with this code such as some of the ones I mentioned earlier. Hope this helps, just email me if you have any more questions/comments.

Sincerely,
zenogais

M.I.K.e7
October 18th, 2003, 05:09
There are a few more optimizations that can be done to the recompiler to increase its speed even more


I have to admit that it's been a while since I took the last look at the PCSX source, but most current dynarecs have a lot of design flaws, most often direct translation (ie. translated code is generated directly after the instruction has been decoded) and, especially on x86, the lack of actual register allocation, no matter if static or dynamic.


Caching of pre-compiled x86 code so that when loops are being processed the code doesn't have to be repeatedly regenerated.


As Xeven already said, it is the basic idea to put generated code in the translation cache, otherwise the dynarec would be slower than an interpreter.
The introduction to Embra shows quite well how it works:
http://www-flash.stanford.edu/Embra/bin.trans.html
But in the example at the top you can also see bad register allocation...

My wish list mainly consists of PowerMac compatibility, and I'd be even willing to do the PowerPC dynarec...
Unfortunately I know more about the processor in my new PowerMac than the API of MacOS X...

M.I.K.e7
October 18th, 2003, 05:41
As for the research, most of my research has been through reading PSX Hardware docs, looking through some source code, looking at some PSX binaries, and that's about it. Here are a few docs, besides the one you listed, that I read to help me out with programming an emulator as well as Playstation-specific stuff:
1964 Recompiling Engine Doc (http://www.andrews.edu/~kutnick/public/1964_recompiler.pdf)


1964 probably isn't the most ideal example either, but I guess that might be just due to the fact that when emulating a 64-bit CPU on x86, one runs out of registers pretty fast...

Here are some other interesting documents:
The DR Emulator (http://devworld.apple.com/technotes/pt/pt_39.html)
Just a short article about the 68K dynarec that was included in MacOS when they switched to the PowerPC. It's written by Eric Traut, who is probably the only dynarec celebrity, since he also was responsible for the dynarecs in Virtual Gamestation and Virtual PC.

Shade (http://www.cs.washington.edu/research/compiler/papers.d/shade.html)
This documentation describes a fast SPARC simulator and a lot of ideas are interesting for dynarecs as well.


Aemulor Recompiler Docs (http://www.aemulor.com/manual/technical.php)


Now I know from where you got the circular buffer idea that you mentioned earlier :-)
There also have been two university projects dealing with ARM emulation via dynarec:
ARMphetamine (http://armphetamine.sourceforge.net/) and Tarmac (http://www.davidsharp.com/tarmac/index.html).
Especially the report on the latter project is highly recommended.

I once wrote a thread here about dynamic recompilation, but it was mostly a combination of well-known facts with a little bit of my own ideas thrown in. If you want to take a look anyway:
Dynamic Recompilation - An Introduction


Padua Playstation Resource (http://psx.rules.org/psxrul2.shtml)
PSX Hardware Docs (http://www.runix.ru/html/Documentation%20kit/)


You shouldn't forget the PSX documentation by Joshua Walker (http://www.zophar.net/tech/psx.html)!


Also right now for NeoPSX I'm working on the dynarec, I have about half the PSX instructions coded, and I'm also working on doing optimizations with this code such as some of the ones I mentioned earlier.


I tried the link in your profile but it doesn't seem to work...

Unfortunately Dynarec.com is no more since Neil Bradley shut off the server, otherwise I could have provided much more information.
But the dynarec mailing list migrated to Yahoo, if you are interested in more discussion about dynamic recompilation:
Dynarec Group (http://groups.yahoo.com/group/dynarec)

I guess that's enough for now...

jivera
October 18th, 2003, 07:30
M.I.K.e7: You shouldn't forget the PSX documentation by Joshua Walker (http://www.zophar.net/tech/psx.html)!

That would be the "Everything You Ever Wanted to Know About The Playstation but were Afraid to Ask" paper that I referenced.

-jivera

zenogais
October 18th, 2003, 09:02
Thanks M.I.K.e7 for the awesome docs. Gotta read up :thumb:

M.I.K.e7
October 19th, 2003, 00:29
M.I.K.e7: You shouldn't forget the PSX documentation by Joshua Walker (http://www.zophar.net/tech/psx.html)!
That would be the "Everything You Ever Wanted to Know About The Playstation but were Afraid to Ask" paper that I referenced.


Yes, it is the same indeed. Sorry, I must have overlooked that you already mentioned it.

jivera
October 19th, 2003, 01:06
Late last night I committed a patch that gives the Windows port command line support for -runcd and -runcdbios. I'll need to do a little refactoring work to add support for any more commands, like specifying which file to use. (If anyone's interested, I can upload a build.)

At this point I must admit some ignorance on my part - I have only used PCSX to emulate games that I actually own the CD for (i.e. -runcd is sufficient for me), so if there are any additional flags I should support, I will need a brief summary of what the equivalent menu action is right now (or, in the case that PCSX does not have the feature at all, an explanation of what needs to be added).

-jivera

M.I.K.e7
October 19th, 2003, 02:13
Correct me if I'm wrong, but I've just taken a look at several MIPS to x86 dynarecs and all seem to have the same unoptimised solution for SLT/SLTI.

All seem to make a comparison between the two registers or the register and the immediate, and then set EAX to the result of the comparison like this:

cmp Rs, Rt
setl al

Since SETL only writes the low byte, to make sure that only one or zero is in EAX they either do a XOR EAX, EAX beforehand or an AND EAX, 1 afterwards.

But since comparisons in computers are just subtractions and SLT/SLTI do signed comparisons, it should be possible to just do a subtraction and shift the sign of the result to the bottom of the register like this:

sub Rs, Rt
shr Rs, 31

Since the logical shift to the right only keeps the former sign we don't have to care about what the register contained in addition, thus saving one instruction.
Mind you, I haven't tested it yet, but I guess it should work.
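The idea can be checked in plain C before committing to it in a code generator. The sketch below is only an illustration with made-up helper names (`slt_compare`, `slt_subtract`); it models the two x86 sequences with 32-bit arithmetic. One caveat it exposes: the subtraction form matches a true signed comparison only when Rs - Rt does not overflow the signed 32-bit range, so a generator would need to know the operands are small enough or fall back to the compare.

```c
#include <stdint.h>

/* Reference behaviour of MIPS SLT: 1 if rs < rt as signed values. */
uint32_t slt_compare(int32_t rs, int32_t rt) {
    return (rs < rt) ? 1u : 0u;
}

/* Models "sub Rs, Rt / shr Rs, 31": subtract with wraparound,
 * then move the sign bit of the difference into bit 0. */
uint32_t slt_subtract(int32_t rs, int32_t rt) {
    uint32_t diff = (uint32_t)rs - (uint32_t)rt;
    return diff >> 31;
}
```

For small operands such as 3 and 5 both functions return 1. For rs = INT32_MIN and rt = 1 the compare returns 1, but the difference wraps around to 0x7FFFFFFF and the subtraction form returns 0.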

Unfortunately it won't work for SLTU or SLTIU, since these perform an unsigned comparison.

Maybe I'm just a little bit pedantic, saving a single instruction, but I think SLT might be used quite frequently and it's probably not the only instruction that could be optimized.

Not to mention that SLT should often occur in combination with one of the branch instructions, and those two could be combined to generate the appropriate condition code instead.

BTW, does PCSX actually use the dynarec?
Even with grep I seem to be unable to locate where the emulator calls the functions declared in the dynarec...

jivera
October 19th, 2003, 02:32
M.I.K.e7: Correct me if I'm wrong, but I've just taken a look at several MIPS to x86 dynarecs and all seem to have the same unoptimised solution for SLT/SLTI.

I've been noticing things like that too - PCSX at least seems to use a direct mapping of MIPS instructions to x86 instructions with a little bit of constant folding. As far as I can tell, it does not make any attempt to make better use of registers for holding temporary values, for example. However, at the same time, I'm curious how beneficial it would be to devote much time to adding optimizations like these.

With regard to your specific optimization about SLT/SLTI, yeah, I'll agree that makes sense.

BTW, does PCSX actually use the dynarec?
Even with grep I seem to be unable to locate where the emulator calls the functions declared in the dynarec...

It never directly calls the dynarec functions - the function pointers are stored in a struct called psxRec. When the recompiler is enabled, the pointer psxCpu is set to point to psxRec and otherwise to psxInt. The end result is something similar to C++'s virtual functions.
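A minimal sketch of that arrangement in C, under a simplified interface: only the names psxCpu, psxRec and psxInt come from the description above. The `CpuInterface` type, its two fields, the counters and `select_cpu` are assumptions made up for illustration and do not reproduce the real PCSX struct.

```c
#include <stddef.h>

/* Hypothetical, cut-down CPU interface: a table of function pointers. */
typedef struct {
    const char *name;
    int  (*Init)(void);
    void (*Execute)(void);
} CpuInterface;

/* Counters so the effect of each backend is observable. */
static int executed_by_int = 0;
static int executed_by_rec = 0;

static int  intInit(void)    { return 0; }
static void intExecute(void) { executed_by_int++; }
static int  recInit(void)    { return 0; }
static void recExecute(void) { executed_by_rec++; }

/* One table per backend, as with psxInt and psxRec. */
CpuInterface psxInt = { "interpreter", intInit, intExecute };
CpuInterface psxRec = { "recompiler",  recInit, recExecute };

/* The rest of the emulator only ever goes through this pointer. */
CpuInterface *psxCpu = NULL;

/* Select a backend; returns the result of its Init function. */
int select_cpu(int use_recompiler) {
    psxCpu = use_recompiler ? &psxRec : &psxInt;
    return psxCpu->Init();
}
```

After `select_cpu(1)`, a call to `psxCpu->Execute()` lands in the recompiler's function without the caller naming it, which is why grepping for direct calls finds nothing.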

One of the things I'm interested in right now is finding a good documentation of the x86 machine code layout for all the instructions. I found a decent one in Windows help format, but it doesn't go into much detail about how registers and immediate values are included in the instruction.

-jivera

zenogais
October 19th, 2003, 03:20
Jivera, here is the best doc I've found for the encodings; it's what I'm using. It's for the NASM assembler:

Nasm Docs (http://home.comcast.net/~fbkotler/nasmdoc0.html)

M.I.K.e7
October 19th, 2003, 03:55
I've been noticing things like that too - PCSX at least seems to use a direct mapping of MIPS instructions to x86 instructions with a little bit of constant folding.


Do you mean by constant folding that it combines a LUI with a following ORI with the same destination register to a single move instruction on x86?


With regard to your specific optimization about SLT/SLTI, yeah, I'll agree that makes sense.


Good, then I'm not totally insane yet.


It never directly calls the dynarec functions - the function pointers are stored in a struct called psxRec. When the recompiler is enabled, the pointer psxCpu is set to point to psxRec and otherwise to psxInt. The end result is something similar to C++'s virtual functions.


Seems I have to take a closer look, thanks.


One of the things I'm interested in right now is finding a good documentation of the x86 machine code layout for all the instructions. I found a decent one in Windows help format, but it doesn't go into much detail about how registers and immediate values are included in the instruction.


Unfortunately instruction encoding in x86 is a real pain, which is one of the reasons why I'd rather do a code generator on the PowerPC.
But you could check out the following:
The IA-32 Intel® Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (http://www.intel.com/design/pentium4/manuals/245471.htm)

jivera
October 19th, 2003, 04:42
M.I.K.e7: Do you mean by constant folding that it combines a LUI with a following ORI with the same destination register to a single move instruction on x86?

More or less - it's a bit more generic than that. Instructions that load a fixed value into a register set a flag that marks that the register's value is known. As it progresses, anytime it finds an instruction that uses either a constant register and an immediate or two constant registers, it simply stores the result in the destination register. Only if one of the operands is unknown does it output instructions.
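As a rough illustration of that bookkeeping, here is a small C sketch. It is not taken from the PCSX source: `RegState`, `rec_reset`, `rec_lui`, `rec_ori` and the `emitted` counter (which stands in for actually writing x86 code) are all hypothetical names, and only LUI and ORI are modelled.

```c
#include <stdint.h>

/* Per-register compile-time state: is the value known, and what is it? */
typedef struct {
    int      is_const;
    uint32_t value;
} RegState;

RegState regs[32];
int emitted = 0;   /* stands in for "x86 instructions written" */

/* Called at the start of every block: nothing is known yet. */
void rec_reset(void) {
    int i;
    for (i = 0; i < 32; i++) {
        regs[i].is_const = 0;
        regs[i].value = 0;
    }
    regs[0].is_const = 1;   /* MIPS $zero always reads as 0 */
    emitted = 0;
}

/* LUI rt, imm: the result is always a known constant. */
void rec_lui(int rt, uint16_t imm) {
    if (rt == 0) return;    /* writes to $zero are discarded */
    regs[rt].is_const = 1;
    regs[rt].value = (uint32_t)imm << 16;
}

/* ORI rt, rs, imm: fold if rs is known, otherwise emit code. */
void rec_ori(int rt, int rs, uint16_t imm) {
    if (rt == 0) return;
    if (regs[rs].is_const) {
        regs[rt].value = regs[rs].value | imm;
        regs[rt].is_const = 1;
    } else {
        emitted++;          /* a real recompiler would generate mov/or here */
        regs[rt].is_const = 0;
    }
}
```

A LUI followed by an ORI on the same register therefore produces no output at all in this sketch; a real recompiler would still have to write the known values back to the emulated register file before leaving the block.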

Unfortunately instruction encoding in x86 is a real pain, which is one of the reasons why I'd rather do a code generator on the PowerPC.

Yeah, so I've heard.

But you could check out the following:
The IA-32 Intel® Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (http://www.intel.com/design/pentium4/manuals/245471.htm)

Hm, I'll check that out.

jivera
October 19th, 2003, 09:59
I just finished playing around with GCC's built-in support for vector instructions (MMX/SSE/etc.), and I think I may rewrite some of the code to take advantage of it (at the recommendation of shadow). It would give the benefit of making it easier to write the code to do different vector and matrix operations and allow GCC to better optimize for them on processors that may take advantage of it (while still generating backwards compatible code for systems that can't).
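For readers who have not seen the feature, this is roughly what GCC's vector extension looks like. It is a minimal sketch, not code from PCSX: the `v4si` typedef and the `madd4` function are names chosen for illustration, and the `vector_size` attribute is a compiler extension, so this will not build with Visual C++.

```c
/* Four 32-bit integers treated as one 16-byte vector (GCC extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise a*b + c on four integers at once. The compiler picks
 * SIMD instructions where the target has them and falls back to
 * scalar code where it does not. */
v4si madd4(v4si a, v4si b, v4si c) {
    return a * b + c;
}
```

The appeal is that the arithmetic is written once with ordinary operators, and the choice between vector and scalar instructions is left to the compiler's target flags.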

I'm wary to make a decision like this that would damage support for Visual C++, but mingw could still provide for the Windows port. (However, at the same time I don't have Visual C++ so I couldn't check changes I make on it anyways, so unless any other developers are interested in compiling using VC++, there's probably no point in me making extra effort on behalf of it.)

-jivera

zenogais
October 19th, 2003, 20:10
I have Visual C++ 6 if you need someone to check the compatibility. While it's not the latest version (.NET), it should work fairly well. I also just wanted to point out something I've noticed in PCSX:

*The Dynarec(Dynamic Recompiler) engine has no support for a Translation Cache, this cache may very well speed up the dynarec by quite a bit.

Basically, in case you don't know, a Translation Cache functions very much like a CPU cache: it checks whether the instruction at a given location in memory has already been translated. In the case of PCSX this is the Playstation's memory. If PCSX does contain a translation cache (I'm only 98% sure it doesn't), can you please tell me where?

jivera
October 19th, 2003, 20:33
zenogais: I have Visual C++ 6 if you need someone to check the compatibility. While it's not the latest version (.NET), it should work fairly well.

Alright, we'll see (I still haven't decided to take advantage of GCC's stuff yet or not, but I'm leaning in favour of it).

*The Dynarec(Dynamic Recompiler) engine has no support for a Translation Cache, this cache may very well speed up the dynarec by quite a bit.

I think you're mistaken. Beginning at line 428, ix86/iR3000A.c defines the function execute which is used to execute a single block of code:

__inline static void execute() {
    void (**recFunc)();
    char *p;

    p = (char*)PC_REC(psxRegs.pc);
    if (p != NULL) recFunc = (void (**)()) (u32)p;
    else { recError(); return; }

    if (*recFunc == 0) {
        recRecompile();
    }
    (*recFunc)();
}

You can see recFunc is a pointer to a function pointer. The first half of the function sets it up to point to where the function pointer should be in memory and if it's zero, then it runs recRecompile to save the function pointer to that memory location. Then, after recRecompile returns, execute may dereference recFunc and call the function pointer.

As has been said before, the dynamic recompiler wouldn't offer much improvement if it didn't save recompilation results.
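The same lookup-then-translate pattern, reduced to a self-contained C sketch. This is not the PCSX implementation: `block_table`, `slot_for`, `recompile_block`, `translated_block`, `execute_block` and the two counters are hypothetical names, the table simply has one slot per 32-bit word of the PSX's 2 MB main RAM, and the "recompiler" returns a fixed stub instead of generating x86 code.

```c
#include <stdint.h>
#include <stddef.h>

#define RAM_SIZE  0x200000u          /* PSX main RAM: 2 MB */
#define NUM_SLOTS (RAM_SIZE / 4)     /* one slot per 32-bit instruction */

typedef void (*BlockFunc)(void);

/* Zero-initialised, so every slot starts out as "not yet translated". */
static BlockFunc block_table[NUM_SLOTS];

int recompile_count = 0;
int run_count = 0;

/* Stands in for a block of generated host code. */
static void translated_block(void) { run_count++; }

/* Stand-in for the real code generator. */
static BlockFunc recompile_block(uint32_t pc) {
    (void)pc;
    recompile_count++;
    return translated_block;
}

/* Map a guest PC to its slot, ignoring the segment bits. */
static BlockFunc *slot_for(uint32_t pc) {
    return &block_table[(pc & (RAM_SIZE - 1)) >> 2];
}

/* Run the block at pc, translating it only on the first visit. */
void execute_block(uint32_t pc) {
    BlockFunc *slot = slot_for(pc);
    if (*slot == NULL) {
        *slot = recompile_block(pc);
    }
    (*slot)();
}
```

Calling `execute_block` twice with the same address translates once and runs twice. The sketch leaves out invalidation, which a real emulator needs whenever the guest overwrites memory that has already been translated.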

-jivera

zenogais
October 19th, 2003, 20:43
Thanks jivera, I was just curious if it did or not. Just let me know if you need VC++.

TheGodfather
October 21st, 2003, 07:25
Hello jivera!
Thanks for continuing with the development of the best PSX emu available.

Also I really would like to read your progress here or somewhere else, please keep us informed about your progress with the emu.

Thanks again
Bye

jivera
October 21st, 2003, 09:07
TheGodfather: Thanks for continuing with the development of the best PSX emu available.

No problem. Thanks for the support. :-)

Also I really would like to read your progress here or somewhere else, please keep us informed about your progress with the emu.

I'll see if I can find some way to easily maintain a list of news articles on my website so I can do just that. For the most part, right now I'm just focusing on learning the ins and outs of the source code as well as reading up on some of the relevant documentation so I can make helpful improvements. Once I start adding improvements, however, I'll be sure to keep updates.

-jivera

jivera
October 21st, 2003, 09:25
In an effort to improve the quality of the dynarec and interpreter, I've been trying to find documentation about some of the peculiarities of the R3000A, particularly load and branch delays. I understand the idea of loads being delayed until after the next operation as well as operations after a branch still being executed, but I can't seem to find any documentation that gives precise explanations about this behaviour. Here are some of my questions (obviously the most important answer to these is whether they're even relevant):

* Do PSX games use dynamic code generation?
The GameBoy Advance has different access speeds for different segments of RAM, so critical loops and code will get copied by the program into this area for execution rather than run from the game's ROM. Is there any idiom that parallels this, or can it safely be assumed (at least in the common case) that writable and executable memory will never overlap? Are there any games known to dynamically generate code?

* Which instructions have delay slots and how exactly do they work?
As far as I know, only branch/jump and load instructions have delays. I played around with PCSpim, the emulator my Digital Systems class uses for learning MIPS, to understand delays, but it only seems to support branch delays. I've even emailed my professor to ask about this, but he doesn't seem to know either and hasn't received a response back from the school responsible for writing the program yet.

* What instructions are valid in a delay slot?
I have read that any instruction that does not have its own delay slot may be placed in a delay slot. This would lead me to believe that an efficient solution to emulating this behaviour would be, whenever there's a delayed operation, to reorder the instructions so the delay slots appear before the current instruction.

Rethinking this idea, however, I cannot imagine multiple consecutive loads would be invalid so maybe that was not a correct explanation. I have not yet thought out the best solution to reordering a series of loads that may be dependent upon each other, but I'll look into it and I'm sure there can't be _too_ many special cases. (I can see a few other flaws with this, but as said, it's still very in the rough.)

Also, what's the rule with multi-cycle instructions or branches in a delay slot?

* Do stores technically have delays too?
A load delay is caused by the fact that an access from memory takes a cycle before it arrives in the register file. Is there a cycle delay before a store affects memory or will it never have a significance? e.g.

sw $t0, 0($t2)
lw $t1, 0($t2)

Is this a valid instruction pattern? If so, afterwards does $t1 hold the original value at 0($t2) or the same value as $t0? How about this:

lw $t0, 0($t1)
sw $t0, 0($t1)

Is this a valid instruction pattern? Would this instruction pattern swap the values stored in $t0 and at 0($t1)? If so, this would force me to add a little extra logic to my idea from earlier about swapping delay slots and their respective instructions (otherwise 0($t1) would simply be updated to hold the value of $t0).

* Branches upon branches
Is a branch instruction valid in another branch's delay slot? I've heard this is invalid, but the more important rule is what's the expected behaviour if it's possible that a PSX game may include code using this?

* Breakpoints
Since one of the features I'm interested in adding is better breakpoint support, I wanted to know if there are any common requirements for placement of breakpoints (such as not being allowed in delay slots)?

I think that'll be enough for now... :-)

-jivera

Edit: Maybe one way to find the "proper" behaviour for these conditions is to write up some test framework that I can run on other highly compatible PSX emulators and just look at what results they give. (Perhaps even better would be if someone could run this test suite on a Net Yaroze or something to guarantee what the correct behaviour is... dunno how feasible this is though.)

M.I.K.e7
October 21st, 2003, 20:49
* Do PSX games use dynamic code generation?


Generally I have no idea. I guess it might depend highly on the specific game, since all code has to be loaded into the RAM to be executed anyway, unlike the GBA where you have the option of the slow ROM or the faster Work-RAM (I'm no GBA programmer but I guess they even use the 16-bit Thumb instruction set in the ROM and the 32-bit ARM instruction set in WRAM, but that's off-topic anyway...).
Maybe Xeven has more information about what specific games do.


* Which instructions have delay slots and how exactly do they work?
As far as I know, only branch/jump and load instructions have delays.


That's correct for the R3000A, but it has changed a little since, which is probably why your professor isn't sure about it and why your simulator behaves differently.
Originally MIPS was designed to be very simple and had no hardware interlocking in the pipeline whatsoever (MIPS actually stands for Microprocessor without Interlocked Pipeline Stages).

One of the side effects of this is that when you try to access the destination register of a load instruction in the next instruction following it, the result is undefined because the value hasn't been loaded into the register yet, ie. the load operation is delayed, not the instruction in the delay slot! Most other processors would produce a so-called interlock until the data has been loaded. In fact from MIPS II onwards the processors actually do have interlocking for load instructions, which is probably the cause of your confusion.
A bit of architecture history: The R3000A belongs to MIPS I, while MIPS II is basically only the less known R6000. MIPS III is the R4000 family (ie. the N64 has no load delay slots), MIPS IV is R10000 and R5000, etc.
It's still good style to program as if there were a load delay slot, because the interlock is basically a pipeline stall, and you surely don't want that! Not that it's important for an emulator anyway...
To make a long story short: I don't think that you have to care about load delay slots in an emulator, because it's the compiler's or assembly programmer's job to make sure that such instruction combinations don't happen. If you want to be on the safe side you could monitor if the destination register of a load is used directly after the instruction, just for debugging purposes.
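Such a debugging monitor could look something like the following C sketch. It is an approximation under stated assumptions, not a complete decoder: `is_load`, `reads_reg` and `load_delay_hazard` are hypothetical helper names, coprocessor instructions are not modelled, and the special pairing of LWL/LWR on the same register is ignored. The opcode numbers are the standard MIPS I primary opcodes.

```c
#include <stdint.h>

/* Primary opcodes of the MIPS I integer loads:
 * LB, LH, LWL, LW, LBU, LHU, LWR (0x20..0x26). */
int is_load(uint32_t insn) {
    uint32_t op = insn >> 26;
    return op >= 0x20 && op <= 0x26;
}

/* Approximate test: does insn read general register reg? */
int reads_reg(uint32_t insn, unsigned reg) {
    uint32_t op = insn >> 26;
    unsigned rs = (insn >> 21) & 31;
    unsigned rt = (insn >> 16) & 31;

    if (reg == 0) return 0;                   /* $zero never matters */
    if (op == 0x02 || op == 0x03) return 0;   /* J, JAL: no registers */
    if (op == 0x0F) return 0;                 /* LUI reads nothing */
    if (op >= 0x10 && op <= 0x13) return 0;   /* COPz: not modelled */
    if (rs == reg) return 1;                  /* rs is a source otherwise */
    /* rt is a source for R-type, BEQ/BNE and the stores. */
    if (op == 0x00 || op == 0x04 || op == 0x05 ||
        (op >= 0x28 && op <= 0x2E)) {
        return rt == reg;
    }
    return 0;
}

/* Non-zero if "next" uses the register that "load" is still loading. */
int load_delay_hazard(uint32_t load, uint32_t next) {
    return is_load(load) && reads_reg(next, (load >> 16) & 31);
}
```

An interpreter or recompiler could call `load_delay_hazard` on each pair of adjacent instructions and log the address when it returns non-zero, for example for lw $t0, 0($t1) (0x8D280000) followed by addu $t2, $t0, $t3 (0x010B5021).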

The other side effect (which was actually intended and thus is still present in all MIPS processors up to now) is that the instruction directly following a branch or jump instruction is executed before the branch is actually performed. Think of it as the branch calculating the branch target or checking the branch condition first, while the next instruction executes, and then the program counter is adjusted.
For emulation purposes you just interpret or generate the code for the instruction in the branch delay slot before you deal with the branch instruction, because that instruction is executed, no matter if the branch is taken or not (it's different with branch-likely, but that's only present from MIPS III onwards). Also the instruction has to be "safe" by definition, or a NOP if a safe instruction cannot be found, ie. you shouldn't have to care that the instruction in the branch delay slot might change the condition for the branch.
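For an interpreter, handling a branch together with its delay slot might look like the C sketch below. The `Cpu` struct and the `exec_simple` and `exec_beq` functions are hypothetical, and only ADDIU and BEQ are modelled. The sketch latches the branch condition before running the slot, which is a conservative choice: it gives the same result as running the slot first whenever the slot is "safe" in the sense above, and does not depend on that assumption holding.

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;
    uint32_t r[32];
} Cpu;

/* Execute one non-branch instruction; only ADDIU is modelled here. */
void exec_simple(Cpu *cpu, uint32_t insn) {
    uint32_t op = insn >> 26;
    uint32_t rs = (insn >> 21) & 31;
    uint32_t rt = (insn >> 16) & 31;
    int32_t imm = (int16_t)(insn & 0xFFFF);   /* sign-extended immediate */
    if (op == 0x09 && rt != 0) {              /* ADDIU rt, rs, imm */
        cpu->r[rt] = cpu->r[rs] + (uint32_t)imm;
    }
}

/* Execute the BEQ located at cpu->pc together with its delay slot. */
void exec_beq(Cpu *cpu, uint32_t branch, uint32_t delay_slot) {
    uint32_t rs = (branch >> 21) & 31;
    uint32_t rt = (branch >> 16) & 31;
    int32_t offset = (int16_t)(branch & 0xFFFF);

    /* Condition and target come from the state BEFORE the slot runs.
     * The target is relative to the address of the delay slot. */
    int taken = (cpu->r[rs] == cpu->r[rt]);
    uint32_t target = cpu->pc + 4 + ((uint32_t)offset << 2);

    exec_simple(cpu, delay_slot);             /* runs whether taken or not */
    cpu->pc = taken ? target : cpu->pc + 8;
}
```

With pc = 0x100, `exec_beq(&cpu, 0x10200003, 0x24420001)` models beq $1, $0 with an offset of 3 words and addiu $2, $2, 1 in the slot: $2 is incremented either way, and pc ends up at 0x110 if the branch is taken or 0x108 if not.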
A bit of processor theory: All processors with a pipeline fetch the instructions following a branch (unless an optional branch predictor tells them otherwise), and if the branch is taken the whole pipeline has to be cleared. The designers of the MIPS wanted to give the instruction in the branch delay slot, which is fetched anyway, some purpose, thus it is executed before the branch finishes.
So why didn't MIPS have branch prediction? Well, it was 1985 and they wanted to create a simple and cheap processor. It doesn't even have a real MMU, just a TLB that throws an exception when a page translation isn't present...
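
For the interpreter case, the branch delay slot handling described above might look something like this toy sketch in C. Everything here is an assumption made for illustration (the `Cpu` struct, `step`, `exec_simple`, and a CPU that only knows ADDIU, BEQ and BNE); it is not how PCSX is actually structured. The sketch latches the branch decision before running the slot instruction, so it also behaves sensibly if the slot instruction turns out not to be "safe".

```c
#include <assert.h>
#include <stdint.h>

enum { OP_BEQ = 0x04, OP_BNE = 0x05, OP_ADDIU = 0x09 };

typedef struct {
    uint32_t pc;          /* byte address of the next instruction */
    uint32_t r[32];       /* general-purpose registers */
    const uint32_t *code; /* program words; code[0] lives at address 0 */
} Cpu;

static uint32_t fetch(const Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0);            /* instructions are word aligned */
    return c->code[addr / 4];
}

/* Executes one non-branch instruction; only ADDIU is modelled here. */
static void exec_simple(Cpu *c, uint32_t insn)
{
    uint32_t rs = (insn >> 21) & 31, rt = (insn >> 16) & 31;
    if ((insn >> 26) == OP_ADDIU && rt != 0)
        c->r[rt] = c->r[rs] + (uint32_t)(int32_t)(int16_t)(insn & 0xFFFF);
}

/* Runs one instruction, or a branch together with its delay slot. */
void step(Cpu *c)
{
    uint32_t insn = fetch(c, c->pc);
    uint32_t op = insn >> 26;
    if (op == OP_BEQ || op == OP_BNE) {
        uint32_t rs = (insn >> 21) & 31, rt = (insn >> 16) & 31;
        int equal = c->r[rs] == c->r[rt];
        int taken = (op == OP_BEQ) ? equal : !equal;
        /* The target is relative to the delay slot address (pc + 4). */
        int32_t offset = (int32_t)(int16_t)(insn & 0xFFFF) * 4;
        uint32_t target = c->pc + 4 + (uint32_t)offset;
        exec_simple(c, fetch(c, c->pc + 4)); /* delay slot always runs */
        c->pc = taken ? target : c->pc + 8;
    } else {
        exec_simple(c, insn);
        c->pc += 4;
    }
}
```

A recompiler would do the equivalent at translation time: emit the code for the slot instruction, then the code that updates the program counter.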

I hope it's a little clearer now.


* What instructions are valid in a delay slot?


Good question, I actually have no real information about that. But I doubt that it would be good style to put a branch instruction in a branch delay slot!
Since processors of the MIPS I architecture probably don't check, I guess everything would be possible in theory, but I don't think that anyone will use anything but "safe" instructions.


Rethinking this idea, however, I cannot imagine multiple consecutive loads would be invalid so maybe that was not a correct explanation.


From my explanation above it should be clear that the only real problem would be the branch delay slot, and you're right that a sequence of load instructions is just fine, as long as one load doesn't use a base register that is loaded just before, because then the result simply is unpredictable.


Also, what's the rule with multi-cycle instructions or branches in a delay slot?


Apart from loads and branches the only real multi-cycle instructions I know of are multiplication and division, but these are simply started and then continue calculating in a separate multiplication unit, even if an exception should occur!
This can be a slight problem; quoting "See MIPS Run" by Dominic Sweetman:

The integer multiplier has its own separate pipeline. Operations started by instructions like mult or div, which take two register operands and feed them into the multiplier machine. The program then issues an mflo instruction (and sometimes also mfhi, for a 64-bit result or to obtain the remainder) to get the results back into a general-purpose register. The CPU stalls on mflo if the computation is not finished; so a programmer concerned with maximising performance will put as much useful work as possible between the two. In most MIPS implementations a multiply takes 10 or more clock cycles, with divide even slower.
The multiply machine is separately pipelined from the regular unit. Once launched, a multiply/divide operation is unstoppable even by an exception. That's not normally a problem, but suppose we have a code sequence like the following where we're retrieving one multiply unit result and then immediately firing off another operation:

mflo $8
mult $9, $10

If we take an exception whose restart address is the mflo instruction, then the first execution of mflo will be nullified under the precise-exception rules and the register $8 will be left as though mflo had never happened. Unfortunately, the mult will have been started too and since the multiply unit knows nothing of the exception will continue to run. Before the exception returns, the computation will most likely have finished and the mflo will now deliver the result of the mult that should have followed it.
We can avoid this problem, on all MIPS CPUs, by interposing at least two harmless instructions between the mflo/mfhi on the one hand and the mult (or any other instruction that starts a multiply unit computation) on the other.

I doubt that it's important for an emulator (especially since you won't be simulating the pipeline), but it's interesting nevertheless.
BTW, according to "MIPS RISC Architecture" by Gerry Kane and Joe Heinrich the cycle timing of multiply/divide instructions for the R3000 is: 12 cycles for MULT/MULTU and 35 cycles for DIV/DIVU. But I also don't think that this is important for an emulator to keep track of, if we assume that the programmer or compiler put enough code between these instructions and MFLO/MFHI for the operation to finish the computation by that time.
A more interesting question for the dynarec would be: When should the multiplication or the division be performed? When the instruction is found that starts the computation or when the result of that computation is referenced?


* Do stores technically have delays too?
A load delay is caused by the fact that an access from memory takes a cycle before it arrives in the register file. Is there a cycle delay before a store affects memory or will it never have a significance? e.g.
sw $t0, 0($t2)
lw $t1, 0($t2)
Is this a valid instruction pattern?


No, stores have no delay slots, and the code example above shouldn't be a problem. Stores simply write the data to the cache and are done. The following load will still find the data in the data cache.


How about this:
lw $t0, 0($t1)
sw $t0, 0($t1)
Is this a valid instruction pattern?


It's not valid, because when the store is executed the value of register $t0 is still undefined.


* Branches upon branches
Is a branch instruction valid in another branch's delay slot? I've heard this is invalid, but the more important rule is what's the expected behaviour if it's possible that a PSX game may include code using this?


I don't think it's totally impossible, but it certainly would be very, very bad code, and I doubt that you'll find that in practical code.


* Breakpoints
Since one of the features I'm interested in adding is better breakpoint support, I wanted to know if there's any common requirements for placement of breakpoints (such as not allowed to be in delay slots)?


Good question, and I have no answer right now...


I think that'll be enough for now... :-)


I hope that at least some of my information was useful to you...

jivera
October 21st, 2003, 21:21
M.I.K.e7: Apart from loads and branches the only real multi-cycle instructions I know of are multiplication and division, but these are simply started and then continue calculating in a separate multiplication unit, even if an exception should occur!

Okay, that makes things nice. I guess if we really wanted to be accurate, we could count the cycles between the mult/div op and the mfhi/mflo op and just add a bit if it doesn't hit the minimum. (Probably just calculate immediately too, for simplicity's sake - I see no advantage in delaying.)
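
A rough C sketch of that idea, computing the result immediately and only charging extra cycles if mflo/mfhi arrives too early. The struct and function names (`MulDiv`, `op_mult`, `op_div`, `op_mflo`, `op_mfhi`) are invented for this example, the latencies are the 12/35 cycle figures quoted earlier in the thread, and the values stored for a division by zero or an overflowing division are placeholders rather than verified R3000A behaviour.

```c
#include <stdint.h>

enum { MULT_CYCLES = 12, DIV_CYCLES = 35 }; /* figures quoted in the thread */

typedef struct {
    uint64_t cycle;    /* emulated cycle counter, advanced by the caller */
    uint64_t ready_at; /* cycle at which HI/LO become valid */
    uint32_t hi, lo;
} MulDiv;

/* MULT: compute the signed 64-bit product now, remember when it is "done". */
void op_mult(MulDiv *m, int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    m->lo = (uint32_t)p;
    m->hi = (uint32_t)((uint64_t)p >> 32);
    m->ready_at = m->cycle + MULT_CYCLES;
}

/* DIV: quotient in LO, remainder in HI. */
void op_div(MulDiv *m, int32_t a, int32_t b)
{
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        /* Undefined in C; placeholder values, NOT checked against hardware. */
        m->lo = 0;
        m->hi = 0;
    } else {
        m->lo = (uint32_t)(a / b);
        m->hi = (uint32_t)(a % b);
    }
    m->ready_at = m->cycle + DIV_CYCLES;
}

/* Charge the stall if the result is requested before it would be ready. */
static void wait_for_result(MulDiv *m)
{
    if (m->cycle < m->ready_at)
        m->cycle = m->ready_at;
}

uint32_t op_mflo(MulDiv *m) { wait_for_result(m); return m->lo; }
uint32_t op_mfhi(MulDiv *m) { wait_for_result(m); return m->hi; }
```

The caller is expected to bump `cycle` for every emulated instruction; if enough instructions sit between the mult/div and the mflo/mfhi, no extra cycles are added.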

Stores simply write the data to the cache and are done.

Ohh, I completely forgot about cache. The reason I was thinking a store might have a delay is because I knew a load delays so it can fetch from memory and I thought a store might need to go through the same delay to write into memory, but if it hits the cache first so the next load gets the right value that would make sense (and make things easier. :-)

To make a long story short: I don't think that you have to care about load delay slots in an emulator, because it's the compiler's or assembly programmer's job to make sure that such instruction combinations don't happen.

Okay, I was under the impression that the value of destination register for a load was unchanged until after the next instruction, not undefined. I agree with just making a warning or something when it has a pipeline hazard or something.

-jivera

linuzappz
October 24th, 2003, 16:34
hey, jivera, goldfinger is coding an API for recompilers which as he told me could be easily added to pcsx, it's still in development but you should contact him, i'm sure that it will be a nice addition for pcsx... of course when it's usable ;).

Cyberman
October 24th, 2003, 18:47
For several months I have been wanting to add a certain something to PCSX: a GNU Debugger-compatible debugging interface. Sort of like an integrated debugger, but GDB compatible. I've found this a great tool for GBA playing at times :)

I doubt this would work too well in Dynamic REC mode. I also doubt it would be particularly fast, though it doesn't need to be. It might be KEY in improving the already good compatibility with the PS1. Small 'program' glitches can be found, and what is going wrong can be detailed, by stepping through a section of code and undoing execution to find what is causing the emulator to trip up.

This would also relieve PCSX of needing a debugging console, because one can use the NETWORK capability of GDB with an existing debugger for the processor (MIPS R3K in this case). I believe CodeInsight supports it.

I haven't been able to find much on the debugging interface PCSX supports. So I'm pretty clueless with the code, though I have managed to add a new plugin type for memcard saves (this might seem insane but there is a method to the madness), in case someone ever makes a PocketStation emu compatible with the memcard interface (now that would be a complete PS1 emulation LOL).

Since these use the ARM Thumb processor, it might take a bit of time to get one of those working. Might be worth a bit of fun :)

Cyb

jivera
October 25th, 2003, 09:29
linuzappz: hey, jivera, goldfinger is coding an API for recompilers which as he told me could be easily added to pcsx, it's still in development but you should contact him, i'm sure that it will be a nice addition for pcsx...

Sounds interesting.

of course when it's usable ;).

:-)

Cyberman: For several months I have been wanting to add a certain something to PCSX: a GNU Debugger-compatible debugging interface. Sort of like an integrated debugger, but GDB compatible. I've found this a great tool for GBA playing at times

You read my mind! That's exactly one of the features I've been considering adding. I loved that about VirtualBoyAdvance (except how the Windows and Linux versions had different feature sets so to do complete debugging I had to use both).

I doubt this would work too well in Dynamic REC mode.

Probably not... we'll see (I'll try to add to the interpreter first though).

This would also relieve PCSX of needing a debugging console, because one can use the NETWORK capability of GDB with an existing debugger for the processor (MIPS R3K in this case). I believe CodeInsight supports it.

Hm... can you elaborate on this?

I haven't been able to find much on the debugging interface PCSX supports.

I don't know that it supports one, but I haven't finished going through all the code yet.

*runs off to find how to interface with GDB*

-jivera

Cyberman
October 27th, 2003, 07:52
Cyberman: For several months I have been wanting to add a certain something to PCSX: a GNU Debugger-compatible debugging interface. Sort of like an integrated debugger, but GDB compatible. I've found this a great tool for GBA playing at times

You read my mind! That's exactly one of the features I've been considering adding. I loved that about VirtualBoyAdvance (except how the Windows and Linux versions had different feature sets so to do complete debugging I had to use both).

Probably not :) I asked linuzappz about it a few times but I know he's busy with PCSX2 now. (which would be cool to have a GDB interface added onto as well might make making demos for the PS2 .. easier if not more interesting ;) ).

I used VBA with Visual Ham and hamlib for some source-level debugging and playing with GBA development (it's pretty cool to watch your code go crazy instead of compile, crash, burn and inspect the pieces).


This would also relieve PCSX of needing a debugging console, because one can use the NETWORK capability of GDB with an existing debugger for the processor (MIPS R3K in this case). I believe CodeInsight supports it.

Hm... can you elaborate on this?

Using the GDB interface allows you to work with source-level/high-level debuggers. Code Insight is one of the tools used for source-level debugging with the SDL interface. Visual Ham supports this by loading an ELF file and loading the program through the debugger interface, via a direct pipe or a network interface (TCP to a host and a socket on that host where the emulator/debugger is running). By running PCSX in debug mode it's entirely possible to watch the boot-up process and other things directly. I've attempted to compile PCSX with debugging enabled and done so successfully, but I've not been able to run it with debugging active (sigh).


I haven't been able to find much on the debugging interface PCSX supports.

I don't know that it supports one, but I haven't finished going through all the code yet.

*runs off to find how to interface with GDB*

-jivera

There are a few articles on how to integrate the GDB interface into something, well, ONE to be exact (yes, kind of vague, I apologize for that); I'll have to look for it again and give you a URL. A good disassembler (I think the GDB interface handles disassembly, not the high-level debugger) would be good for GTE and other PSX-specific opcodes. The memory map can be inspected, read and/or written to, but physical 'device' states probably can't be directly inspected unless they are made readable via the GDB interface.

I babble. Can PCSX as is be compiled with VC++5? 'Tis what I have, because VC++7.0 (which I bought and originally compiled PCSX with) won't install on Win98 (Win2K, where is it, hehehe). Or do I need to use MinGW to compile it? (Makefiles... where did I put them?)

Cyb

jivera
October 27th, 2003, 20:47
Cyberman: Probably not :) I asked linuzappz about it a few times but I know he's busy with PCSX2 now.

Maybe there's a misunderstanding... I meant I would be interested in adding support to PCSX for GDB.

Using the GDB interface allows you to work with source level/high level debuggers [...]

Sorry, I meant can you elaborate on how the GDB network protocol works, but I have found some documentation about it. One thing I am interested in, however, is being able to debug and step through compiled programs without source code. VBA's built-in debugger seemed more effective at this than using the GDB connection, but I didn't spend much time using GDB with it because it seems much more oriented towards use with access to source code.
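
For reference, the basic framing of the GDB remote serial protocol is simple enough to sketch in C: each packet is sent as `$`, the payload, `#`, and a two-digit hex checksum that is the sum of the payload bytes modulo 256. The function names below (`rsp_checksum`, `rsp_frame`, `rsp_unframe`) are made up for this example, and the sketch leaves out the protocol's escaping of special characters, run-length encoding, and the `+`/`-` acknowledgements.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sum of the payload bytes modulo 256, which the protocol uses as checksum. */
static unsigned rsp_checksum(const char *payload)
{
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)payload; *p; ++p)
        sum += *p;
    return sum & 0xFF;
}

/* Writes "$<payload>#<xx>" into out; returns its length, or -1 if too long. */
int rsp_frame(char *out, size_t cap, const char *payload)
{
    int n = snprintf(out, cap, "$%s#%02x", payload, rsp_checksum(payload));
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Checks a framed packet and copies its payload out; returns 0 on success. */
int rsp_unframe(const char *pkt, char *payload, size_t cap)
{
    const char *hash = strrchr(pkt, '#');
    if (pkt[0] != '$' || hash == NULL || strlen(hash) != 3)
        return -1;                      /* need '$', '#', and two hex digits */
    size_t len = (size_t)(hash - pkt - 1);
    if (len >= cap)
        return -1;
    memcpy(payload, pkt + 1, len);
    payload[len] = '\0';
    unsigned want;
    if (sscanf(hash + 1, "%2x", &want) != 1)
        return -1;
    return want == rsp_checksum(payload) ? 0 : -1;
}
```

For example, the debugger's "read registers" request, whose payload is the single character `g`, travels as `$g#67`, and a stub's `OK` reply as `$OK#9a`.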

I babble. Can PCSX as is be compiled with VC++5? 'Tis what I have, because VC++7.0 (which I bought and originally compiled PCSX with) won't install on Win98 (Win2K, where is it, hehehe). Or do I need to use MinGW to compile it? (Makefiles... where did I put them?)

I don't have a personal copy of VC++, but I added a patch to PCSX to allow compilation under MinGW. Attached you can find my latest patched development version.

In the pcsx--mainline--1.6--patch-4 subdirectory run "make -f Makefile.mingw" and it should compile. You need brcc32, however. You can get it from Borland's free compiler download or, alternatively, if you're really desperate I can send you a copy of the file it generates so you can just skip it. Tell me if anything goes wrong.

-jivera