Idea for an Xbox 360 Emulator
JoaoHadouken
June 28th, 2012, 17:15
I have been reading some forum threads, and most posts say that CUDA wouldn't help in emulation because the video card would already be stressed by the graphics emulation, so adding CUDA work on top of that would bring no benefit.
Well... basically, my idea is to use 2 video cards.
One video card would emulate the Xbox 360's CPU with CUDA (or OpenCL). That would remove the hardware speed limitation on emulating the Xbox 360 CPU, since a GPU can be up to 14x faster than a CPU!
(Even Intel conceded that: http://blogs.nvidia.com/2010/06/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel/)
And the other video card would handle the graphics emulation.
So the concept, as you can see, is simple (I'm not saying that implementing it is simple too): one video card emulates the Xbox 360's CPU and the other one emulates the Xbox 360's GPU.
I searched some forums and found that you can choose which graphics card runs CUDA and have the other one do something else (in my case, graphics emulation). It's just a matter of configuration.
I have also read that a GPU can't emulate a CPU at all, but I think this approach could still gain some speed by splitting the emulation work between CUDA and the user's CPU.
I do want to build my own emulator. Of course I won't start with the Xbox 360, but that's my main target. So... before I go further and invest money in this project... I want your opinion on this idea... especially from the experienced ones... Thank you in advance! :thumb:
PS: As far as I know, OpenCL is not as mature as CUDA, but comparing the code... they are really similar... so I can start developing the emulator with CUDA and then, once OpenCL is more mature, translate the code. OpenCL has the advantage of running on both ATI and nVidia video cards.
Kaizen
June 28th, 2012, 21:01
Let's be super nice and pretend someone skilled enough wants to code an Xbox 360 emulator in CUDA: they won't be able to do it. The documentation required to emulate the 360 hardware just isn't available. It's not possible at all without some decent information to work with.
JoaoHadouken
June 28th, 2012, 21:50
OK, thanks for the reply. I know that the lack of documentation/information is a big problem... but let's not worry about that right now; I already know the implementation won't be easy. And as I said, I won't start building an Xbox 360 emulator right now, so documentation could appear in the meantime.
My real question is whether this setup would bring benefits in general, especially in speed. Well... thank you anyway.
KrossX
June 28th, 2012, 23:38
In my very limited understanding of things, I think GPGPU is only useful on extremely parallelizable (is that a word?) workloads, like rendering, which is what GPUs do so well.
I can't imagine any part of an emulator (core?) that could benefit from it at all, especially with the latency of moving data back and forth to the GPU.
JoaoHadouken
June 29th, 2012, 00:34
I think I understand your point of view... so, let's analyse it: you're saying that a GPU wouldn't be very useful because CPU emulation is more serial than parallel?
Well, I don't know the techniques for emulating a CPU yet, which is why I came here, but... are you sure there isn't any part of CPU emulation that could be done in parallel? Any kind of translation, calculation, or anything else that needs speed? My idea is not to emulate the CPU with ONLY CUDA, just to offload the heavy parts to CUDA. You also pointed out another possible problem: latency. But is the latency between GPU and CPU really bad enough to cancel out the benefits of this technology? Do you have any real experience to share that shows it's not a good idea to use CUDA?
Thank you! :)
KrossX
June 29th, 2012, 01:11
I've never coded an emulator, and I've never coded GPGPU stuff, so I can't prove anything. Take it for what it is, a mere comment. You'll know your answer once you get your hands dirty. Code a Chip8 emu or something, code some GPGPU stuff, etc...
JoaoHadouken
June 29th, 2012, 01:31
hmm... for a moment I thought you were an emu author... anyway, thanks! :)
Code a Chip8 emu or something...
Yeah... I was thinking of starting with something really small like this. Thank you for helping...
Anybody else have something to add? Any experience, statistics, etc...?
Nintendo Maniac
June 30th, 2012, 01:10
Just an FYI: for the end user, OpenCL is much preferred because it's vendor-neutral.
JoaoHadouken
June 30th, 2012, 01:50
Yeah, thanks for the information; it confirms I was thinking along the right lines.
I also want this emulator to be cross-platform, so even my first emulator (CHIP-8, I think) has to work on the main systems available: Windows, Linux (especially Ubuntu) and Mac OS X. Once I get that working well, I can move on. I don't want to exclude anyone... :)
-Ashe-
June 30th, 2012, 21:25
There are some major issues with your idea of using a GPU to emulate a CPU...
A GPU is good at dealing with parallel tasks in a SIMD fashion (doing the same thing to each element of a vector), and generally speaking it is mostly focused on math computations...
If you disassemble .xex files, you'll see that games don't launch a bunch of threads or do things in parallel; they're like PC games (I guess I could say they *are* PC games): they have a main thread that does a lot of things, and some smaller threads for limited, asynchronous tasks (network, sound, kinect...).
Another problem is the kind of instructions a CPU executes, as opposed to a GPU. Take a simple instruction like RLWINM (Rotate Left Word Immediate then AND with Mask), which is basically a left rotation plus applying a mask: this would be really painful to do with a GPU...
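To make the RLWINM point concrete, here is a minimal C++ sketch of how a CPU interpreter might decode and execute it. It follows the PowerPC encoding (primary opcode 21 with rS, rA, SH, MB and ME fields) and the 64-bit semantics in which the rotated 32-bit word is replicated into both halves before the mask is applied. The `Cpu` struct and `execute` function are hypothetical names for illustration, and the Rc bit (CR0 update) is deliberately ignored. Every step is a short chain of dependent shifts and masks on one value, which is serial bit-twiddling rather than the wide SIMD math a GPU is built for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal register file; a real core holds far more state.
struct Cpu {
    uint64_t gpr[32] = {};
};

// ROTL32 in the Power ISA sense: rotate the low word left by n,
// then replicate the 32-bit result into both halves of a 64-bit value.
static uint64_t rotl32_dup(uint32_t x, unsigned n) {
    n &= 31;
    uint32_t r = (x << n) | (x >> ((32 - n) & 31));
    return (static_cast<uint64_t>(r) << 32) | r;
}

// MASK(mstart, mstop) with IBM bit numbering (bit 0 is the MSB).
// When mstart > mstop the run of ones wraps around.
static uint64_t mask64(unsigned mstart, unsigned mstop) {
    assert(mstart < 64 && mstop < 64);
    uint64_t from_start = ~0ULL >> mstart;        // ones in bits mstart..63
    uint64_t to_stop = ~0ULL << (63 - mstop);     // ones in bits 0..mstop
    return mstart <= mstop ? (from_start & to_stop) : (from_start | to_stop);
}

// Executes one instruction if it is rlwinm (primary opcode 21); returns
// false for anything else. The Rc bit is not handled in this sketch.
bool execute(Cpu& cpu, uint32_t insn) {
    if ((insn >> 26) != 21) return false;
    unsigned rs = (insn >> 21) & 31;
    unsigned ra = (insn >> 16) & 31;
    unsigned sh = (insn >> 11) & 31;
    unsigned mb = (insn >> 6) & 31;
    unsigned me = (insn >> 1) & 31;
    uint64_t rotated = rotl32_dup(static_cast<uint32_t>(cpu.gpr[rs]), sh);
    cpu.gpr[ra] = rotated & mask64(mb + 32, me + 32);
    return true;
}
```

For example, `0x5483403E` encodes `rlwinm r3,r4,8,0,31` (a plain 8-bit rotate), so with `r4 = 0x12345678` the sketch leaves `0x34567812` in `r3`.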
Now obviously there are some things that can be done in parallel. For example, something like static recompilation would heavily benefit from being done in parallel, once you've isolated code. Another place where it could be useful is when dealing with reconstructing the operations that have to be done by the GPU (Xbox 360 games use Direct3D 9, but as a static library; you actually have a single function that receives a buffer containing all the instructions generated by the D3D9 code... including shaders, vertex/texture objects, etc). One last one I can think of (after thinking about it for at least half a second) is decoding the XMA streams, since those are normally decoded on dedicated hardware.
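The static recompilation point can be sketched like this: once a discovery pass has isolated individual guest functions, translating each one depends only on its own instructions, so the translations can run concurrently. In this C++ sketch, `GuestFunction`, `translate_function` and `translate_all` are hypothetical names, and the "translation" is a placeholder string rather than real host code. It uses `std::async` on the CPU to illustrate the parallel structure; a real tool would more likely use a thread pool, and the same shape is what one would try to map onto a GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// A guest function already isolated by a code-discovery pass:
// its start address and its raw instruction words.
struct GuestFunction {
    uint32_t address;
    std::vector<uint32_t> words;
};

// Placeholder translator (hypothetical): a real static recompiler would
// emit host code here. It reads only its own input, so calls are independent.
std::string translate_function(const GuestFunction& fn) {
    std::string out = "fn_" + std::to_string(fn.address) + ":";
    for (uint32_t w : fn.words) out += " " + std::to_string(w);
    return out;
}

// Translate every isolated function concurrently; output order matches input.
std::vector<std::string> translate_all(const std::vector<GuestFunction>& fns) {
    std::vector<std::future<std::string>> jobs;
    jobs.reserve(fns.size());
    for (const auto& fn : fns)
        jobs.push_back(std::async(std::launch::async, [&fn] { return translate_function(fn); }));

    std::vector<std::string> results;
    results.reserve(fns.size());
    for (auto& job : jobs) results.push_back(job.get());  // wait for each job in order
    assert(results.size() == fns.size());
    return results;
}
```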
Anyway, at this point I think it'd be more worthwhile for you to focus on getting something working rather than thinking about optimizations this early in the process. If it works on a CPU, porting it to a GPU won't be that big a deal, not to mention that debugging your emulation is clearly easier on a CPU...
Fadingz
June 30th, 2012, 21:48
A GPU emulator will not work, simply because the algorithms involved are not written for massively multithreaded architectures.
GPU pathfinding looks a lot different from single-threaded A* pathfinding, even though it's derived from A*. The main difference: GPUs offer little or no support for recursion.
In other words, the entire algorithm has to be translated, not just the instructions.
Try running single-threaded code on a GPU and you'll get something like 1/100 of the performance.
A GPU is only fast when you write an algorithm that is inherently parallelizable and executable on a GPU.
For example, you have to rewrite every single recursive function as a loop, if that's even possible.
You also need spin locks or atomic operations to ensure memory consistency.
It's just gonna get ugly forcing a single-threaded algorithm into a massively multithreaded one.
If you port CPU code straight to the GPU and launch multithreaded kernels, you'll end up with either memory hazards or 1/100 of the performance.
Bucket sort is a great example of the performance problem.
If you just sort within each block and use atomic operations to collect the results, performance will be terrible.
You need to combine parallel reduction and bubble sort to get optimal results on a GPU.
An easier example, summing an array, is great for showing the memory hazard.
Say you let each block take one element and add it into a single destination address that starts at zero, without atomics.
The final value in that address will just be whatever the last block to write produced.
Again, you need parallel reduction, which takes log2(n) iterations for an array of size n.
Not to mention that it is hard to keep the entire grid saturated.
Execution order matters.
Even with asynchronous kernel calls, you get dependencies.
The longest path determines the total execution time.
Say you can parallelize 90% of the instructions, with only one serial chain of instructions left.
If that remaining 10% takes longer to execute on slower cores (the GPU's), the CPU is gonna beat the crap out of the GPU in this case.
Lastly, and most importantly, the most time-consuming part of any GPU operation is memory transfer.
You can't just let the GPU do some operation and pop the result back to the CPU.
The CPU and GPU use different memories.
For most operations not related to rendering, by the time you've passed the data to the GPU, the CPU could already have finished the task on its own.
You either do everything on the GPU or everything on the CPU.
All in all, GPUs are awesome at tasks designed specifically for GPUs.
The CPU should handle all the miscellaneous tasks, unless you have all of them well pipelined.
JoaoHadouken
July 1st, 2012, 01:05
Well... OK then... really frustrating, but... those are the facts and I have to deal with them...
-Ashe- and Fadingz... thank you for the advice and information. I hope to find another way to develop an Xbox 360 emulator. I'm going to study, get a better understanding of emulation and how the Xbox 360 works, etc... anyway, I won't give up. :)
lagunareturns
July 1st, 2012, 06:32
You won't understand it better until you code emulators yourself, especially what's involved when documentation is lacking... It has been proven time and time again that, hardware-wise, it's not possible to emulate the Xbox 360 easily or even quickly, even if you offload tasks to video cards, which are no faster than an actual CPU at this kind of work...
Fadingz
July 1st, 2012, 09:19
The Xbox 360 uses a tri-core architecture.
Here are my notes on the Xbox 360 specs from my graduate class:
(taken from Dr. Milo Martin's Lecture Slide, also available in the public domain)
http://forums.ngemu.com/attachment.php?attachmentid=218051&stc=1&d=1341130518
http://forums.ngemu.com/attachment.php?attachmentid=218052&stc=1&d=1341130518
• ISA Extended with VMX-128 operations
• 128 registers, 128-bits each
• Packed “vector” operations
• Example: four 32-bit floating point numbers
• One instruction: VR1 * VR2 → VR3
• Four single-precision operations
• Also supports conversion to Microsoft DirectX data formats
• Similar to Altivec (and Intel’s MMX, SSE, SSE2, etc.)
• Works great for 3D graphics kernels and compression
• Peak performance: ~75 gigaflops
• Gigaflop = 1 billion floating-point operations per second
• Pipelined superscalar processor
• 3.2 GHz operation
• Superscalar: two-way issue
• VMX-128 instructions (four single-precision operations at a time)
• Hardware multithreading: two threads per processor
• Three processor cores per chip
• Result:
• 3.2 * 2 * 4 * 3 = ~77 gigaflops
• ISA: 64-bit PowerPC chip
• RISC ISA
• Like MIPS, but with condition codes
• Fixed-length 32-bit instructions
• 32 64-bit general purpose registers (GPRs)
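To show what "one instruction, four single-precision operations" means for an emulator, here is a minimal C++ sketch of a VMX-128-style packed multiply (VR1 * VR2 → VR3) over a file of 128 registers of 128 bits each, as listed above. `Vec4`, `VectorRegs` and `vmul4` are hypothetical names, not real opcode names, and the sketch ignores the exact rounding, denormal and NaN behaviour a faithful emulation would need. On an x86 host, a compiler can often turn the four-lane loop into a single SSE multiply, which fits the "similar to Altivec (and Intel's MMX, SSE, SSE2, etc.)" note.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One 128-bit vector register viewed as four single-precision lanes.
using Vec4 = std::array<float, 4>;

// 128 vector registers, per the notes above.
struct VectorRegs {
    std::array<Vec4, 128> vr{};
};

// Packed multiply: one guest instruction, four independent lane multiplies.
void vmul4(VectorRegs& regs, std::size_t a, std::size_t b, std::size_t dst) {
    assert(a < 128 && b < 128 && dst < 128);
    for (std::size_t lane = 0; lane < 4; ++lane)
        regs.vr[dst][lane] = regs.vr[a][lane] * regs.vr[b][lane];
}
```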
http://forums.ngemu.com/attachment.php?attachmentid=218053&stc=1&d=1341130518
Each Xenon chip:
• 165 million transistors
• IBM’s 90nm process
• Three cores
• 3.2 GHz
• Two-way superscalar
• Two-way multithreaded
• Shared 1MB cache
http://forums.ngemu.com/attachment.php?attachmentid=218057&stc=1&d=1341130560
Pipeline:
• Four-instruction fetch
• Two-instruction “dispatch”
• Five functional units
• “VMX128” execution “decoupled” from other units
• 14-cycle VMX dot-product
• Branch predictor:
• “4K” G-share predictor
• Unclear if 4KB or 4K 2-bit counters
• Per thread
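For readers unfamiliar with G-share, here is a minimal C++ sketch of the technique: the branch address is XORed with a global history of recent outcomes to index a table of 2-bit saturating counters. Since the slide itself says the "4K" size is unclear, this sketch simply assumes 4096 two-bit counters and a 12-bit history; `GsharePredictor` is a hypothetical name, and "per thread" would mean one instance per hardware thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// G-share sketch, assuming 4096 two-bit counters and 12 bits of history.
class GsharePredictor {
public:
    // Counter values 2 and 3 mean "predict taken".
    bool predict(uint32_t pc) const { return counters_[index(pc)] >= 2; }

    // Train the counter for this branch, then shift the outcome into history.
    void update(uint32_t pc, bool taken) {
        uint8_t& c = counters_[index(pc)];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
        assert(c <= 3);
        history_ = ((history_ << 1) | (taken ? 1u : 0u)) & (kEntries - 1);
    }

private:
    static constexpr uint32_t kEntries = 4096;

    // PowerPC instructions are 4 bytes, so drop the low two PC bits first.
    uint32_t index(uint32_t pc) const { return ((pc >> 2) ^ history_) & (kEntries - 1); }

    std::array<uint8_t, kEntries> counters_{};  // all start at 0 (strongly not taken)
    uint32_t history_ = 0;
};
```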
http://forums.ngemu.com/attachment.php?attachmentid=218056&stc=1&d=1341130560
Memory:
• 128B cache blocks throughout
• 32KB 2-way set-associative instruction cache (per core)
• 32KB 4-way set-associative data cache (per core)
• Write-through, lots of store buffering
• Parity
• 1MB 8-way set-associative second-level cache (per chip)
• Special “skip L2” prefetch instruction
• MESI cache coherence
• Error Correcting Codes (ECC)
• 512MB GDDR3 DRAM, dual memory controllers
• Total of 22.4 GB/s of memory bandwidth
• Direct path to GPU
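The cache figures above translate directly into address-splitting arithmetic, since sets = capacity / (block size × associativity). This C++ sketch is only a worked version of that arithmetic; `CacheGeometry` is a hypothetical name, and a real emulator only needs it if it models cache behaviour or timing. With 128-byte blocks, the 32KB 2-way instruction cache has 128 sets, the 32KB 4-way data cache has 64 sets, and the 1MB 8-way L2 has 1024 sets.

```cpp
#include <cassert>
#include <cstdint>

// Cache geometry from capacity, block size and associativity.
struct CacheGeometry {
    uint32_t capacity_bytes, block_bytes, ways;

    uint32_t sets() const { return capacity_bytes / (block_bytes * ways); }

    // log2 for powers of two only.
    static uint32_t log2u(uint32_t x) {
        assert(x != 0 && (x & (x - 1)) == 0);
        uint32_t n = 0;
        while (x > 1) { x >>= 1; ++n; }
        return n;
    }

    uint32_t offset_bits() const { return log2u(block_bytes); }
    uint32_t index_bits() const { return log2u(sets()); }

    // Set index of an address: drop the block offset, keep the index bits.
    uint32_t set_of(uint64_t addr) const {
        return static_cast<uint32_t>((addr >> offset_bits()) & (sets() - 1));
    }
};
```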
http://forums.ngemu.com/attachment.php?attachmentid=218055&stc=1&d=1341130518
GPU parent die:
• 232 million transistors
• 500 MHz
• 48 unified shader ALUs
• Mini-cores for graphics
http://forums.ngemu.com/attachment.php?attachmentid=218054&stc=1&d=1341130518
GPU daughter die:
• 100 million transistors
• 10MB eDRAM
• “Embedded”
• NEC Electronics
• Anti-aliasing
• Render at 4x resolution, then sample
• Z-buffering
• Track the “depth” of pixels
• 256GB/s internal bandwidth
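The two daughter-die features listed above can be illustrated in software. This C++ sketch is a simplified reading of the slide's description, not how the Xenos hardware is actually programmed: it keeps a buffer with twice the width and height (4x the samples), applies a Z-buffer depth test on every write, and "resolves" by averaging each 2x2 block into one output pixel. `SuperFramebuffer` is a hypothetical name and colors are single grayscale floats for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Framebuffer at 2x width and 2x height, with one depth value per sample.
struct SuperFramebuffer {
    std::size_t w, h;                 // supersampled dimensions
    std::vector<float> color, depth;

    SuperFramebuffer(std::size_t w_, std::size_t h_)
        : w(w_), h(h_), color(w_ * h_, 0.0f),
          depth(w_ * h_, std::numeric_limits<float>::infinity()) {}

    // Z-buffering: keep the new sample only if it is closer than the stored one.
    void write(std::size_t x, std::size_t y, float z, float c) {
        std::size_t i = y * w + x;
        if (z < depth[i]) { depth[i] = z; color[i] = c; }
    }

    // Resolve: average each 2x2 block of samples into one output pixel.
    std::vector<float> resolve() const {
        assert(w % 2 == 0 && h % 2 == 0);
        std::vector<float> out((w / 2) * (h / 2));
        for (std::size_t y = 0; y < h / 2; ++y)
            for (std::size_t x = 0; x < w / 2; ++x) {
                std::size_t i = 2 * y * w + 2 * x;   // top-left sample of the block
                out[y * (w / 2) + x] =
                    (color[i] + color[i + 1] + color[i + w] + color[i + w + 1]) / 4.0f;
            }
        return out;
    }
};
```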
Hard core Rikki
July 1st, 2012, 11:34
Accelerating instruction execution is only viable once the instructions are already properly emulated. CUDA could be useful for accelerating development and research, not execution.
System compatibility would also suffer. What about users of non-NVIDIA GPUs, like ATI? Performance aside, the output couldn't even be guaranteed to match across hardware.
JoaoHadouken
July 1st, 2012, 13:00
@Fadingz, again, thank you. I downloaded the PDF you got this information from (https://www.cis.upenn.edu/~cis501/lectures/11_xbox.pdf). I'll study it :)
hmm... I understand... but do you think this method could solve the emulation speed problem? Of course, it doesn't need to be 100%, but a 20% ~ 30% speed increase would be nice...
blueshogun96
July 4th, 2012, 09:17
Every time I see a post about Xbox 360 emu theory, the most important perspectives are missed... \/_\/
You guys ONLY focus on hardware and think almost nothing of the software aspect. Without that, you have nothing to go by. You can come up with a million and one ideas for emulating the hardware, or even manage to emulate the hardware perfectly, but if you insist on staying clueless about the software side (i.e. BIOS, HDD format, Dashboard, .xex files, etc.), then you're wasting your time altogether, because these things don't magically come documented or available either. Has the .xex file format been properly documented yet? How does the BIOS boot sequence work? What dashboards have been dumped? These are some of the things you have to think about before you can even think of touching anything hardware-related.
Another thing: while I did enjoy scanning Fadingz's post on how the hardware interacts and whatnot (which is also important to know), what good is it if you don't have register-level information on the chipsets? Where are the devices located in the 64GB address space (MMIO), or what ports are they mapped to? Do you have any information on the exclusive instructions the 360's CPU has? Have you thought about how the GPU can directly access video memory (seriously, that's f@#%ing scary!!!) and how you'd possibly emulate that? Do you even know what you need to emulate the vector units, or what instruction set they're based on?? What else is there to emulating sound besides the Sis audio chip (what DSPs are we dealing with)? How does the gamepad interact on a hardware level? Those are just a few obstacles to think about in the beginning stages. You can know what the hardware specs are and still get nowhere fast if you don't know how it works on a byte level; knowing the hardware specs is only 20% of the battle at most. Of course, it's impossible to lay out all of the requirements at the beginning, but eventually you need to ask yourself how you're going to document all this. Some things you'll need right from the start; other things you can reverse engineer later.
As for finding register-level documentation on these chipsets, some are easier to find than others. The GPU is based off of the ATI R600, and its specs are freely available, as are the USB 2.0 specs needed to emulate the gamepads and other input devices. There's even someone who's managed to write homebrew code that interfaces with the GPU at a low level, which can give you something to test on. The CPU is another story. Finding some docs on PPC64 is easy enough, but that's not actually enough to emulate it. Besides the fact that you have to emulate 3 of those suckers, what about the VUs? They use a special instruction set called AltiVec (IIRC). I searched the net for documentation on this, but haven't found s@#%. I did get my hands on a datasheet when I was working for Microsoft 2 years ago, but anything else I had access to (which wasn't very much) was closely guarded, and leaking such information would not only get me fired but even blacklisted. Good, accurate information on the CPU is hard to come by.
And I don't even want to think about threading... argh!
thelittlegumnut
July 4th, 2012, 09:20
holy shiz i got a long way to go till i make an emu.
KrossX
July 4th, 2012, 11:37
Not if you make a chip8 emu. That's the usual starting point.
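For anyone taking that advice, this is roughly what the core of a CHIP-8 interpreter looks like: fetch a 2-byte big-endian opcode, decode its nibbles, execute. This C++ sketch implements only a handful of opcodes (1NNN jump, 6XNN set, 7XNN add without carry, 8XY4 add with carry in VF, ANNN set I) and leaves out the display, timers, input and everything else; the `Chip8` struct is a hypothetical name for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tiny CHIP-8 core: fetch, decode, execute for a few opcodes only.
struct Chip8 {
    std::array<uint8_t, 4096> memory{};
    std::array<uint8_t, 16> v{};   // registers V0..VF
    uint16_t i = 0;                // index register
    uint16_t pc = 0x200;           // programs are loaded at 0x200

    void load(const std::vector<uint8_t>& rom) {
        assert(rom.size() <= memory.size() - 0x200);
        for (std::size_t k = 0; k < rom.size(); ++k) memory[0x200 + k] = rom[k];
    }

    // Executes one instruction; returns false for opcodes not in this sketch.
    bool step() {
        assert(static_cast<std::size_t>(pc) + 1 < memory.size());
        uint16_t op = static_cast<uint16_t>(memory[pc] << 8 | memory[pc + 1]);  // big-endian fetch
        pc += 2;
        uint8_t x = (op >> 8) & 0xF, y = (op >> 4) & 0xF, nn = op & 0xFF;
        uint16_t nnn = op & 0x0FFF;
        switch (op >> 12) {
        case 0x1: pc = nnn; return true;                                  // 1NNN: jump
        case 0x6: v[x] = nn; return true;                                 // 6XNN: VX = NN
        case 0x7: v[x] = static_cast<uint8_t>(v[x] + nn); return true;    // 7XNN: VX += NN, no carry
        case 0x8:
            if ((op & 0xF) == 0x4) {                                      // 8XY4: VX += VY, VF = carry
                unsigned sum = v[x] + v[y];
                v[x] = static_cast<uint8_t>(sum);
                v[0xF] = sum > 0xFF ? 1 : 0;
                return true;
            }
            return false;
        case 0xA: i = nnn; return true;                                   // ANNN: I = NNN
        default: return false;
        }
    }
};
```

Running the bytes `60 F0 61 20 80 14 A3 00 12 00` for five steps sets V0 to 0xF0, V1 to 0x20, adds them (V0 = 0x10 with VF = 1 for the carry), sets I to 0x300 and jumps back to 0x200.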
JoaoHadouken
July 4th, 2012, 13:51
hmmm... I see your point, BlueShogun. Thank you for the information and for the kinds of questions I have to ask myself before I start working on an emu... I know these things are really important... but let's not put the cart before the horse... I don't intend to build an Xbox 360 emulator right now; it was just an idea I had a year ago and wanted to share now... I don't know when I'm going to step into Xbox 360 emulation; perhaps I can help you with Xbox 1 emulation first... I don't know yet. Anyway, your reply was really useful. Thanks! :)
fischkopf
July 4th, 2012, 18:08
Yes, better to start with Xbox 1 emulation, and only once that's working perfectly (probably never?) go on from there.
-Ashe-
July 4th, 2012, 19:19
You guys ONLY focus on hardware and think almost nothing of the software aspect.
Actually it's the other way around, I'm working on a high level emulator, so I don't really care about how the hardware works :)
Besides the fact that you have to emulate 3 of those suckers, what about the VUs? They use a special instruction set called AltiVec (IIRC).
Power ISA reference document, Book 1, chapter 6...
blueshogun96
July 5th, 2012, 08:20
You can't even do HLE if you don't understand certain parts of the hardware. HLE is not a magic solution to avoid the hardware aspect completely (Cxbx is a prime example). I haven't looked into how the command buffers work yet, but if it's anything like push buffers on Xbox1, you'll have to understand a little bit about the GPU in order to effectively decode them.
How far have you gotten with your HLE emu?
Sorry if I sounded angry, but I just get tired of people throwing out ideas without any idea how to implement them themselves, or even knowing whether they're feasible. Not saying you're one of those people, but still, sometimes these threads can get really annoying, and fast... especially when you've laid out your own feasible plans time and time again.
I still stick by my idea of a PPC64 -> x86-64 static recompiler.
-Ashe-
July 5th, 2012, 10:51
How far have you gotten with your HLE emu?
So far I worked on it for about 2 months starting in September last year, then I picked it up again 2 weeks ago... so not very far :)
I can load executables from ISO files and STFS packages (and extract pretty much every single piece of information they have, but that's not really emulation unless I want to write my own dashboard), and execute simple things like a hello world or a spinning triangle...
Reversing the buffer with the GPU commands takes time, and I'm working on the debugger far more than the 3D parts of the emulation at this point, so I'm not really expecting to get anything graphical running anytime soon...
Apart from those simple things, gamepads are working (but those are basically the same xinput calls anyway), creating threads and other basic features like those (mutex and such), some networking (the winsock-ish API, not the Xbox Live stuff, obviously), etc...
blueshogun96
July 5th, 2012, 18:42
Excellent. How are you emulating the CPU? Static-rec, JIT, what?
-Ashe-
July 5th, 2012, 19:05
Oh no, right now it's just an interpreter. I'd rather have some kind of reference point to compare against before moving to static recompilation...
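Using an interpreter as a reference point usually means differential testing: run the same guest code through the simple interpreter and through the faster translated path, then compare the resulting state. This C++ sketch shows the idea on a hypothetical three-instruction toy ISA (not PowerPC). The "translator" here just pre-decodes instructions into closures once, ahead of time; it stands in for real recompilation to host code and is not a recompiler itself. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical toy ISA: op in the top byte, then dst, src, imm bytes.
//   op 1: r[dst] = imm    op 2: r[dst] += r[src]    op 3: r[dst] <<= imm
using Regs = std::array<uint32_t, 4>;
using Op = std::function<void(Regs&)>;

// Reference interpreter step: decodes the instruction every time it runs.
static void exec_one(Regs& r, uint32_t insn) {
    uint32_t op = insn >> 24, dst = (insn >> 16) & 3, src = (insn >> 8) & 3, imm = insn & 0xFF;
    switch (op) {
    case 1: r[dst] = imm; break;
    case 2: r[dst] += r[src]; break;
    case 3: r[dst] <<= (imm & 31); break;
    default: assert(false && "unknown opcode");
    }
}

Regs interpret(const std::vector<uint32_t>& program) {
    Regs r{};
    for (uint32_t insn : program) exec_one(r, insn);
    return r;
}

// Stand-in "translator": decodes once, up front, into closures.
std::vector<Op> translate(const std::vector<uint32_t>& program) {
    std::vector<Op> out;
    for (uint32_t insn : program) {
        uint32_t op = insn >> 24, dst = (insn >> 16) & 3, src = (insn >> 8) & 3, imm = insn & 0xFF;
        switch (op) {
        case 1: out.push_back([dst, imm](Regs& r) { r[dst] = imm; }); break;
        case 2: out.push_back([dst, src](Regs& r) { r[dst] += r[src]; }); break;
        case 3: out.push_back([dst, imm](Regs& r) { r[dst] <<= (imm & 31); }); break;
        default: assert(false && "unknown opcode");
        }
    }
    return out;
}

Regs run_translated(const std::vector<Op>& code) {
    Regs r{};
    for (const auto& op : code) op(r);
    return r;
}

// Differential check: the translated path must match the interpreter exactly.
bool matches_reference(const std::vector<uint32_t>& program) {
    return interpret(program) == run_translated(translate(program));
}
```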
JayFoxRox
July 5th, 2012, 21:36
Static recompilation using llvm should actually be easier than an interpreter. It's a little harder to get started, but once the base is working it would be easy to debug and really fast. You could also detect code patterns more easily.
-Ashe-
July 6th, 2012, 00:20
Yes, but I don't want to use llvm; it's far more fun doing it yourself.
blueshogun96
July 6th, 2012, 05:31
I recommend taking JayFox's advice, but suit yourself.
-Ashe-
July 6th, 2012, 06:11
I've worked on many compilers and recompilers in the past, and it's a hobby, so not sure why I'd use a library to do what I wanted to code this emulator for in the first place ;)
JayFoxRox
July 6th, 2012, 13:42
You could also use llvm just for the optimization passes etc., which are VERY complex. It'd take years to implement even parts of those.
For a project like this, performance is critical, even if you don't want realtime emulation.
llvm is not a drop-in PPC emulator either, so there is still plenty of work, very similar to normal interpreter development (converting your high-level code to llvm code).
But at the end of the day it's obviously up to you.
Good luck with your project anyway.
Is there a repo already? Or do you have any screenshots?
-Ashe-
July 6th, 2012, 19:35
It's not opensource, and I'm not sure a screenshot of a spinning triangle would be very useful :p
blueshogun96
July 6th, 2012, 22:51
I'm not sure a screenshot of a spinning triangle would be very useful :p
It would, actually. It makes your project more believable (not that I'm doubting you). Face it, the uneducated people who still don't believe it's possible can't walk by faith alone. Whether you want to prove a point is up to you.
-Ashe-
July 6th, 2012, 23:13
Yeah, I guess I just don't care enough about what people might think of my personal-boredom-killing-hobby ;)
But sure when I have something more interesting to show (maybe the least technically complex indie game displaying its main menu) I'll post it around here
blueshogun96
July 7th, 2012, 04:54
Yeah, I wouldn't care either. One of the greatest and most interesting points for me in an emulator's life is the day that it shows some of the little stuff like a triangle or a few boxes. I'm just making excuses for you to show me a screen of your project. :)
... please?