One of the big advantages of dynamic recompilation is that you can actually use the registers of the host machine when an emulated instruction uses registers. This can not only reduce the amount of instructions needed to emulate another instruction but also minimize slow memory accesses for every referenced register.
The process of mapping the emulated registers to the registers of the host machine is called
register allocation.
Unfortunately a lot of dynarecs in emulators still fetch all needed register values from the
register file (just a memory structure where the contents of all registers of the emulated processor) in the beginning of each emulated instruction and write the result back to the register file afterwards. Only a little better are those that don't store the value of the result register if it is used as input in the following instruction.
This really spoils a large part of what dynamic recompilation is about, but still seen quite often in "real world" examples.
There are two different methods of allocating registers:
Static register allocation: This means that in every translated block the same emulated registers are always allocated to the same host registers. When the host machine has enough registers to hold all emulated registers this is the optimal solution, but there are also some advantages even if it has fewer registers (mainly related to timing and translation block handling; I'll cover that later). In the latter case this means that only some of the emulated registers are held in host registers (ideally the most often used ones) and for the remaining ones the register file is accessed.
Dynamic register allocation: In every translation block the registers are allocated differently. Ideally you load the values of the emulated registers into the host registers at the beginning of the block and store those that were modified back to the register file before the block returns control to the dispatcher loop. If there aren't enough registers to hold all registers used in the block you'll have store the value held in a host register to free it for the next one.
Implementation wise static register allocation is rather easy, because you always use the same host registers to hold specific emulated registers and access the memory locations in the register file for all the others. This also means that you always load the same register values and store them to the same memory locations afterwards no matter what translated code block you are executing. So you only need one setup/clean-up code for all blocks which is occasionally called
glue code because it's the thing that connects the emulator and the generated code.
The implementation of dynamic register allocation is a bit more complicated though. First of all there is more bookkeeping to do, since the emulator has to remember these facts:
- which host register holds which emulated register: you need to know if the register is already in use and where to store the value to if you need to free the register
- which emulated register is held in which host register: this tells you if the register is already allocated, and if it is you know which register to use in the generated code
- has the value held in the host register been modified? This isn't really necessary, but it's a good idea not to store a value to the register file that hasn't been changed since you spare another unnecessary memory access.
How does register replacement work when you run out of registers?
There are very many methods that could be applied, and probably only one would be ideal, which basically means that you replace that register which isn't needed in that block anymore. Since you do have the entire block you could actually go through it backwards to find out if there is a register the code doesn't use anymore, but that can be a bit tedious.
That ideal solution reminded me of the best but theoretical solution for a page replacement algorithm in operating systems, so I came up with the idea of using another page replacement algorithm called
second chance, which is similar to LRU (least recently used) but simpler to implement.
For that algorithm you set a reference flag for the host register every time it is referenced during code generation. When you need a register you go through all host registers in a circle (using a modulo operation with the maximal number of host registers), pick the next register register where the reference flag isn't set, while unsetting the reference flag of all registers you have to skip. Probably sounds a bit weird, but it's easy to implement and should lead to good results.
Of course if you come up with an easy implementation of the optimal solution that would be perfect.
I guess that's the most important things about register allocation...
If you want to know how to handle 64-bit registers on the IA-32 then take a look at the 1964 documentation (
http://e64.wwemu.com/emus/1964/1964_...er_doc_101.pdf).
__________________
The crownless again shall be King
Last edited by M.I.K.e7; June 18th, 2002 at 11:31..