[benchmark] Performance stability and scalability

» Tue Nov 20, 2012 3:58 pm

Hello everyone. I felt some aspects of performance were still untested, so I decided to fill in this missing piece.

Objectives
We already know that native calls suspend your script for one frame (except for non-delayed calls, which do not put your script on hold, and latent calls, which may take longer). Beyond that, I was not interested in comparing the speed of different instructions; that is not the goal here. Instead, I wanted to test how performance behaves in terms of scalability (multiple scripts running at the same time) and stability (whether the results are consistent or not).

Procedure
I only tested loops like the one below. I did not test other operations because I suspect that in Papyrus every elementary operation takes about the same time: the cost of the operation itself is far smaller than the overhead the VM adds to every instruction. (I know there is a benchmark on these forums whose results contradict this, but its error margins exceed 100%.) This is just a guess, though.

int i = 0
int iterations = 10000000 ; 10M
while i < iterations
	i += 1
endwhile
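The guess above can be written as a rough cost model. The symbols are my own notation and nothing in it is measured: each instruction pays a fixed VM overhead on top of the cost of the operation itself.

```latex
t_{\text{instr}} = t_{\text{VM}} + t_{\text{op}}, \qquad
t_{\text{VM}} \gg t_{\text{op}} \;\Longrightarrow\; t_{\text{instr}} \approx t_{\text{VM}}
```

If the guess holds, a loop whose body is just one comparison and one addition should be representative of most elementary instructions.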

* All results are expressed in iterations per frame. To convert them to time, remember that with the default settings Papyrus only runs for 1.5 ms per frame.
* Tests were run while looking at the floor of the Helgen dungeon at the beginning of the game, with no other mods installed. The CPU is a quad-core Core i5 2500K.
* The temporal resolution of GetCurrentRealTime is unknown, but keep in mind that it is probably coarse (my bet is a Win32 timer with a 15 ms resolution, i.e. up to one frame of error).
* Speaking of error margins: since a script may call GetCurrentRealTime at either the beginning or the end of a frame, there is one more frame of uncertainty, for a total of two frames including the timer resolution. Also, the frame rate (ips, frames per second) was not perfectly constant and varied by ±1 ips, a 3% relative error.
* For the stability tests, I ran 20 batches of 100k iterations (about 20 frames each). After each batch I measured and printed the elapsed time.
* For the scalability tests, I used up to 16 different quests, each with its own script (since we know that time slices are per script, not per object), each running 10M iterations (about 2k frames). All scripts were clones of each other. Their code was:

event OnInit()
	RegisterForSingleUpdate(10)
endevent

event OnUpdate()
	int iterations = 10000000 ; 10M
	float start = Utility.GetCurrentRealTime()
	; Run the loop
	int i = 0
	while i < iterations
		i += 1
	endwhile
	; ips = frame rate (frames per second)
	; Subtract 1 because one frame is needed for GetCurrentRealTime()
	float elapsedFrames = (Utility.GetCurrentRealTime() - start) * ips - 1
	float speed = iterations / elapsedFrames
	debug.trace(...)
endevent

* Note that because everything runs in OnUpdate, all scripts start perfectly synchronized at the beginning of a frame.
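For reference, here is a minimal sketch of how the stability test could be written as a quest script, following the same pattern as the scalability script. The script name, local variable names and trace message are hypothetical, and it assumes Papyrus logging is enabled so that Debug.Trace output reaches the Papyrus log. It is not necessarily the exact code behind the numbers below.

```papyrus
Scriptname BenchStabilityQuestScript extends Quest
{Hypothetical sketch: 20 batches of 100k empty loop iterations, timing each batch.}

Event OnInit()
	; Wait 10 seconds so the test starts after the game has settled
	RegisterForSingleUpdate(10.0)
EndEvent

Event OnUpdate()
	int batchCount = 20
	int batchSize = 100000 ; 100k iterations, about 20 frames at ~5k iterations per frame
	int batch = 0
	while batch < batchCount
		float start = Utility.GetCurrentRealTime()
		int i = 0
		while i < batchSize
			i += 1
		endwhile
		; Elapsed real time in seconds; multiply by the frame rate to get frames
		float elapsed = Utility.GetCurrentRealTime() - start
		; The trace happens after the measurement, so it is not part of the timed window
		Debug.Trace("Batch " + batch + ": " + elapsed + " s")
		batch += 1
	endwhile
EndEvent
```

Attached to an otherwise empty quest that starts with the game, this prints one line per batch; converting each duration to iterations per frame is then done by hand using the measured frame rate.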

Results
Regarding stability, the slowest batch ran at 4939 iterations per frame and the fastest at 5336 iterations per frame. The difference amounts to 7% and is within the error margin (two frames is 10% here). All in all, the VM seems to be stable enough.
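The 10% and 7% figures come from simple arithmetic: at about 5,000 iterations per frame, a batch of 100k iterations lasts roughly 20 frames, which gives the two-frame timing uncertainty and the observed spread as:

```latex
\frac{100\,000}{5\,000} = 20\ \text{frames}, \qquad
\frac{2\ \text{frames}}{20\ \text{frames}} = 10\%, \qquad
\frac{5336 - 4939}{5336} = \frac{397}{5336} \approx 7.4\%
```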

These numbers are very low: roughly a 1:1000 ratio, i.e. three orders of magnitude, compared to C/C++/C#. This is Papyrus...
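To put this in time units, take the 1.5 ms per frame that Papyrus runs with the default settings and the roughly 5,000 iterations per frame measured in the stability test, assuming (as in that test) that a single script gets nearly all of that time:

```latex
\frac{1.5\ \text{ms}}{5\,000\ \text{iterations}} = 0.3\ \mu\text{s per iteration},
\qquad \text{i.e.} \qquad
\frac{5\,000\ \text{iterations}}{1.5\ \text{ms}} \approx 3.3 \times 10^{6}\ \text{iterations per second}
```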

Now, what happens when we have different quests, each with its own script, running their loops?
* 1 quest: 5203 iterations per frame.
* 2 quests: 2757 for the slowest quest, 2781 for the fastest quest. Total: 5538
* 4 quests: 1369 to 1371 iterations per frame. Total: 5480
* 8 quests: 693 to 694 iterations per frame. Total: 5544
* 16 quests: 348 iterations per frame. Total: 5568

The big surprise is that Skyrim only uses one core for Papyrus. A single loop cannot be split and spread across several cores, so with one quest only one core could be used. Yet increasing the number of concurrently running quests brought no improvement in total throughput. This means that parallelizing is only a way to avoid the waiting time associated with native calls; it is not a way to increase the available computing power.
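The argument can be made explicit with an idealized comparison (my own notation: R(N) is the total iterations per frame with N quests, and C = 4 is the number of cores). A VM that spread independent scripts across cores with perfect scaling would be expected to reach about min(N, C) times the single-quest rate, whereas the measured total stays flat:

```latex
R_{\text{expected}}(4) \approx \min(4, 4)\, R(1) = 4 \times 5203 \approx 20\,800
\qquad \text{vs.} \qquad
R_{\text{measured}}(4) = 5480 \approx R(1) = 5203
```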

The second observation is that the VM does indeed behave as SmkViper described it: like a multithreaded OS. It did not slow down the loop when there was only one quest: that single loop got nearly all of the available computing power (the small remainder may have been used by other scripts unrelated to my test). Our error margin here is 0.4%; with two quests there was still about a 1% difference between the two rates, and according to the logs all scripts finished at about the same time. This shows that the VM, while not perfect, does a pretty good job of distributing computing power across scripts.
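Both points can be checked against the numbers in the list of results: an even split of the total throughput R(N) gives each of the N quests about R(N)/N, and the gap between the slower and faster of two quests is indeed close to 1%:

```latex
\frac{R(16)}{16} = \frac{5568}{16} = 348, \qquad
\frac{R(8)}{8} = \frac{5544}{8} = 693, \qquad
\frac{2781 - 2757}{2781} = \frac{24}{2781} \approx 0.86\%
```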

» Tue Nov 20, 2012 1:43 pm

Good work!

» Tue Nov 20, 2012 10:21 am

Thank you.

I may run a few more tests to check a few things, such as the available computing power in different places and during fights, or my hypothesis that all operations have about the same cost.

I would also like to hear more about the single-threaded nature of the VM; it really surprised me. Maybe it is one thread per mod? I should test this too.