Improving Speed


Spacy
December 8th, 2007, 15:48
Hi, I just used AMD's CodeAnalyst to create a performance profile of VBA.

In the lists below, the left column shows the function name and the right column shows how many profiler samples landed in that function, which is roughly proportional to the CPU time spent there.

This means the functions with the highest values are most likely the ones most worth optimizing.
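A rough way to turn these numbers into an estimate is Amdahl's law: if a function accounts for a fraction p of all samples and is made s times faster, the whole program speeds up by at most 1 / ((1 - p) + p/s). The Python sketch below applies it to the sound-disabled profile. It keeps the six largest rows from that table and lumps every other row into one bucket (1280 samples, so the total matches the table's 18240). It assumes sample counts are proportional to time, and it says nothing about how often each function is called.

```python
# Sound-disabled profile: the six largest rows, plus one bucket holding the
# sum of all remaining rows (18240 total - 16960 for the six = 1280).
PROFILE_SOUND_DISABLED = {
    "CPULoop": 7168,
    "armExecute": 3306,
    "thumbExecute": 1975,
    "cpuMasterCodeCheck": 1833,
    "mode0RenderLineAll": 1596,
    "copyImage": 1082,
    "(all other functions)": 1280,
}


def shares(profile):
    """Return (name, fraction of all samples) pairs, largest first."""
    total = sum(profile.values())
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    return [(name, count / total) for name, count in ranked]


def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time becomes s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    for name, p in shares(PROFILE_SOUND_DISABLED):
        if name.startswith("("):
            continue  # the catch-all bucket is not a single optimization target
        print(f"{name:20s} {p:6.1%}  "
              f"2x faster -> {amdahl_speedup(p, 2):.3f}x overall, "
              f"ceiling {amdahl_speedup(p, float('inf')):.3f}x")
```

With these numbers, armExecute holds about 18% of the samples, so making it twice as fast would trim total run time by only about 9%, and even removing it entirely would give at most about a 1.22x speedup.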



Profile with SOUND DISABLED:

CPULoop                             7168
armExecute                          3306
thumbExecute                        1975
cpuMasterCodeCheck                  1833
mode0RenderLineAll                  1596
copyImage                           1082
codeTicksAccessSeq16                 293
mode0RenderLine                      185
dataTicksAccess16                    155
codeTicksAccess16                    138
codeTicksAccessSeq32                 124
codeTicksAccess32                     70
dataTicksAccess32                     60
inflate_fast                          56
doDMA                                 28
CPUUpdateTicks                        27
CPULoadRom                            14
Gba_Pcm_Fifo::timer_overflowed        13
dataTicksAccessSeq32                  12
crc32_little                          11
CPUCheckDMA                           11
CPUUpdateRegister                     10
Blip_Synth<16,1>::offset_resampled     9
Gba_Pcm::update                        9
CPUCompareVCOUNT                       8
agbPrintFlush                          8
CzWINDOWEDFIR::CzWINDOWEDFIR           8
inflate_table                          4
Direct3DDisplay::render                4
soundEvent                             4
BIOS_LZ77UnCompVram                    3
inflate                                2
CPUSwitchMode                          2
floor                                  1
vsprintf_s                             1
AfxGetModuleState                      1
VBA::OnIdle                            1
systemDrawScreen                       1
Gb_Apu::write_register                 1
blip_eq_t::generate                    1
CPUInterrupt                           1
CPUSoftwareInterrupt                   1
CPUUpdateFlags                         1
BIOS_RLUnCompVram                      1
cheatsCheckKeys                        1

Profile with SOUND ENABLED:

CPULoop                             7131
armExecute                          3186
thumbExecute                        1990
cpuMasterCodeCheck                  1822
mode0RenderLineAll                  1625
copyImage                           1117
codeTicksAccessSeq16                 293
mode0RenderLine                      182
codeTicksAccess16                    147
dataTicksAccess16                    146
codeTicksAccessSeq32                 135
dataTicksAccess32                     69
codeTicksAccess32                     65
inflate_fast                          60
doDMA                                 25
CPUUpdateTicks                        25
Blip_Synth<16,1>::offset_resampled    17
CPUUpdateRegister                     16
dataTicksAccessSeq32                  15
CPULoadRom                            14
crc32_little                          12
agbPrintFlush                         10
CPUCheckDMA                            9
CzWINDOWEDFIR::CzWINDOWEDFIR           8
Stereo_Mixer::mix_mono                 7
Gba_Pcm_Fifo::timer_overflowed         7
soundEvent                             6
CPUUpdateCPSR                          6
DirectInput::readDevice                5
Gb_Square::run                         4
Gb_Apu::run_until_                     4
mode0RenderLineNoWindow                4
CPUCompareVCOUNT                       4
BIOS_LZ77UnCompWram                    4
inflate_table                          3
Gb_Noise::run                          3
Gba_Pcm::update                        3
cheatsCheckKeys                        3
memmove                                2
OpenAL::write                          2
Direct3DDisplay::render                2
Gb_Wave::run                           2
Gb_Apu::write_register                 2
BIOS_RLUnCompVram                      2
BIOS_LZ77UnCompVram                    2
floor                                  1
inflate                                1
_VEC_memzero                           1
fastzero_I                             1
CWinThread::Run                        1
DirectInput::readDevices               1
Direct3DDisplay::clear                 1
VBA::OnIdle                            1
systemDrawScreen                       1
Gb_Sweep_Square::clock_sweep           1
Gba_Pcm::apply_control                 1
CPUUpdateFlags                         1
CPUUpdateRender                        1

shashClp
December 8th, 2007, 16:57
Without knowing which game was tested and how many times each function is called, this is quite useless :P

Spacy
December 8th, 2007, 17:09
You're welcome to run CodeAnalyst or another profiler yourself; you can get much more information out of it than I posted here.

This is just for reference, to give the other developers an idea of where to begin.

shashClp
December 8th, 2007, 17:26
Yep, I use VTune quite a lot when optimizing. My point is that this is useless to any serious developer: without the call counts, it amounts to blind optimization, and anyone spending their time on optimization should be aware of that.
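The distinction being drawn here can be illustrated with an instrumenting counter that records, per function, how often it was called and how much time it took in total, so the per-call cost falls out as total / calls. This is a minimal Python sketch of the idea, not how VTune or CodeAnalyst work internally; `instrumented`, `cheap_but_frequent` and `expensive_but_rare` are hypothetical names, and in VBA's C++ code the equivalent would be a counter and a timer around the function body. The timing is inclusive, so it also counts time spent in anything the function calls.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function statistics: number of calls and accumulated wall-clock time.
CALLS = defaultdict(int)
TOTAL_SECONDS = defaultdict(float)


def instrumented(func):
    """Count calls to func and accumulate the (inclusive) time spent in it."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            CALLS[func.__name__] += 1
            TOTAL_SECONDS[func.__name__] += time.perf_counter() - start
    return wrapper


def report():
    """Print calls, total time and time per call, most expensive first."""
    for name in sorted(TOTAL_SECONDS, key=TOTAL_SECONDS.get, reverse=True):
        calls = CALLS[name]
        total = TOTAL_SECONDS[name]
        print(f"{name:20s} calls={calls:8d} total={total:.4f}s "
              f"per_call={total / calls * 1e6:.2f}us")


# Hypothetical stand-ins for emulator functions, for illustration only.
@instrumented
def cheap_but_frequent():
    return sum(range(10))


@instrumented
def expensive_but_rare():
    return sum(range(100_000))


if __name__ == "__main__":
    for _ in range(10_000):
        cheap_but_frequent()
    for _ in range(5):
        expensive_but_rare()
    report()
```

A function with a large total and a huge call count is usually better attacked by calling it less often (inlining, caching, batching), whereas one with a large total and few calls needs a cheaper body; a sample count alone cannot tell these two cases apart.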

Spacy
December 8th, 2007, 17:35
I didn't know about VTune before, but it looks like I'd have to buy it. AMD's CodeAnalyst is free and works on my Intel Core 2 Duo as well.

mudlord
December 8th, 2007, 20:01
Spacy wrote: "AMD's CodeAnalyst is for free and works for my Core2Duo as well."

Hmmm, I'll need to keep that in mind when I buy my new CPU, video card, hard drive and case this week... which is why I haven't done any major SVN commits, and I will most likely be inactive until my new development system is all set up.

On topic though, shash made a valid point: without the number of times each function is called, it's not that helpful. That said, some items in that list didn't surprise me. Blargg's sound code (the Gb_Apu and Blip_Synth entries) barely registers in the profile, which is due to the optimizations already in that code, while components of the graphics core, such as mode0RenderLineAll, are among the highest and should be a focus area.
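To put a rough number on how little the sound code costs, the sketch below adds up the sound-related rows of each profile and divides by that profile's total (18240 samples with sound disabled and 18209 with it enabled, i.e. the sum of every row in each table). Which functions count as "sound code" is a name-based guess (Gb_*, Gba_Pcm*, Blip_*/blip_*, Stereo_Mixer, soundEvent, OpenAL::write), not something the profiler reports.

```python
# Sound-related rows copied from the two profiles. The classification is a
# name-based guess, not something CodeAnalyst reports.
SOUND_ROWS_DISABLED = {
    "Gba_Pcm_Fifo::timer_overflowed": 13,
    "Blip_Synth<16,1>::offset_resampled": 9,
    "Gba_Pcm::update": 9,
    "soundEvent": 4,
    "Gb_Apu::write_register": 1,
    "blip_eq_t::generate": 1,
}
SOUND_ROWS_ENABLED = {
    "Blip_Synth<16,1>::offset_resampled": 17,
    "Stereo_Mixer::mix_mono": 7,
    "Gba_Pcm_Fifo::timer_overflowed": 7,
    "soundEvent": 6,
    "Gb_Square::run": 4,
    "Gb_Apu::run_until_": 4,
    "Gb_Noise::run": 3,
    "Gba_Pcm::update": 3,
    "OpenAL::write": 2,
    "Gb_Wave::run": 2,
    "Gb_Apu::write_register": 2,
    "Gb_Sweep_Square::clock_sweep": 1,
    "Gba_Pcm::apply_control": 1,
}

# Sum of every row in each table (all functions, not just sound).
TOTAL_DISABLED = 18240
TOTAL_ENABLED = 18209


def sound_share(rows, total):
    """Fraction of all samples that landed in the given sound-related rows."""
    return sum(rows.values()) / total


if __name__ == "__main__":
    for label, rows, total in (
        ("sound disabled", SOUND_ROWS_DISABLED, TOTAL_DISABLED),
        ("sound enabled", SOUND_ROWS_ENABLED, TOTAL_ENABLED),
    ):
        print(f"{label:15s} {sum(rows.values()):3d} of {total} samples "
              f"({sound_share(rows, total):.2%})")
```

By this count the sound rows stay well under 1% of the samples in both runs (37 and 59 samples), while CPULoop, armExecute and thumbExecute together account for roughly two thirds. Counts this small are also noisy, so differences such as Gba_Pcm_Fifo::timer_overflowed going from 13 to 7 are probably sampling noise rather than a real change.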