SkyBoost (topic 4)

Post » Sun May 20, 2012 5:24 am

AMD Phenom II X4 955 BE @ 3.20 GHz:
x87: 3745 ms
sse1: 5008 ms
sse2: 4774 ms

So this basically compares how my CPU handles the different instruction sets, huh? So if a program comes in multiple versions, I should skip the SSE versions and go for the FPU/x87 version instead? If so, this topic has helped me in 2 ways now :)
Kitana Lucas
 
Posts: 3421
Joined: Sat Aug 12, 2006 1:24 pm

Post » Sat May 19, 2012 9:07 pm

Core i5-480M @ 2.66 GHz
testing x87
Starting matrix multiplication loop...
Elapsed time: 2824 ms

testing sse1
Starting matrix multiplication loop...
Elapsed time: 2855 ms

testing sse2
Starting matrix multiplication loop...
Elapsed time: 2606 ms
jessica sonny
 
Posts: 3531
Joined: Thu Nov 02, 2006 6:27 pm

Post » Sun May 20, 2012 7:45 am

Core i5 M520 @ 2.4 GHz (laptop)


---------- testing started ----------
testing x87

Starting matrix multiplication loop...
Elapsed time: 3697 ms
testing sse1

Starting matrix multiplication loop...
Elapsed time: 3635 ms
testing sse2

Starting matrix multiplication loop...
Elapsed time: 3510 ms
---------- testing finished ----------
Christina Trayler
 
Posts: 3434
Joined: Tue Nov 07, 2006 3:27 am

Post » Sun May 20, 2012 9:40 am

Wee!

Intel i7 980X @ 4.2GHz

C:\temp>echo off
---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 1888 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 1919 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 1763 ms
---------- testing finished ----------
carly mcdonough
 
Posts: 3402
Joined: Fri Jul 28, 2006 3:23 am

Post » Sun May 20, 2012 8:38 am

I find it kind of funny how everyone jumped all over them for using x87 code, but that wasn't that bad of a decision to begin with apparently. It appears to be best for AMD machines, and only slightly worse than SSE2 for Intel machines.
Chloé
 
Posts: 3351
Joined: Sun Apr 08, 2007 8:15 am

Post » Sat May 19, 2012 10:14 pm

So, wait, I'm confused. If x87 is the most optimized for my AMD CPU and that is vanilla, then why am I getting a boost from tesval/skyboost?
Beulah Bell
 
Posts: 3372
Joined: Thu Nov 23, 2006 7:08 pm

Post » Sun May 20, 2012 10:19 am

I find it kind of funny how everyone jumped all over them for using x87 code, but that wasn't that bad of a decision to begin with apparently. It appears to be best for AMD machines, and only slightly worse than SSE2 for Intel machines.

SSE2 on my Core2Duo is 27% better than x87 (2215 vs. 3042 ms for the test). That's a lot more than "slightly worse". The situation is different on the i5 and i7 processors, but it's SSE2 all the way for me.
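
The 27% figure follows directly from the two timings quoted in this post. A minimal Python sketch of the arithmetic, where the function name `time_saved_pct` is only an illustrative choice:

```python
def time_saved_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage of run time saved by the candidate relative to the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# Core2Duo timings quoted in the post: x87 = 3042 ms, SSE2 = 2215 ms.
saved = time_saved_pct(3042, 2215)
print(f"SSE2 saves {saved:.1f}% of the x87 run time")
```

Expressed as a ratio instead, 3042 / 2215 is roughly a 1.37x speedup, which is still well short of the 2x some other benchmarks report.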
Leah
 
Posts: 3358
Joined: Wed Nov 01, 2006 3:11 pm

Post » Sun May 20, 2012 8:37 am

So, wait, I'm confused. If x87 is the most optimized for my AMD CPU and that is vanilla, then why am I getting a boost from tesval/skyboost?
Because there is a bunch of other stuff the mod is doing that doesn't involve math.
Vickytoria Vasquez
 
Posts: 3456
Joined: Thu Aug 31, 2006 7:06 pm

Post » Sun May 20, 2012 1:01 am

So, wait, I'm confused. If x87 is the most optimized for my AMD CPU and that is vanilla, then why am I getting a boost from tesval/skyboost?

Inline functions.
carla
 
Posts: 3345
Joined: Wed Aug 23, 2006 8:36 am

Post » Sun May 20, 2012 8:25 am

Don't know if you still need more but here are my results

Intel Core i5-750 @2.67GHz
---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 3884 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 3962 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 3932 ms
---------- testing finished ----------
Elizabeth Falvey
 
Posts: 3347
Joined: Fri Oct 26, 2007 1:37 am

Post » Sat May 19, 2012 6:50 pm

So, wait, I'm confused. If x87 is the most optimized for my AMD CPU and that is vanilla, then why am I getting a boost from tesval/skyboost?
Inline functions.
It's also worth noting that if Bethesda compiled with optimisations enabled -- which include a whole variety of optimisation techniques, not just inlining -- we'd be getting better performance than TESVAL and Skyboost provide for both AMD and Intel platforms.
Helen Quill
 
Posts: 3334
Joined: Fri Oct 13, 2006 1:12 pm

Post » Sun May 20, 2012 12:18 am

AMD Phenom II X4 3GHZ

---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 3963 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 5429 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 5366 ms
Lily Something
 
Posts: 3327
Joined: Thu Jun 15, 2006 12:21 pm

Post » Sun May 20, 2012 5:54 am

I7 740Q @ 1.73GHz.
x87 - 4774
SSE1 - 4353
SSE2 - 4196
Tai Scott
 
Posts: 3446
Joined: Sat Jan 20, 2007 6:58 pm

Post » Sun May 20, 2012 8:23 am

You can't say you didn't get fast feedback..... :)
Stephanie I
 
Posts: 3357
Joined: Thu Apr 05, 2007 3:28 pm

Post » Sun May 20, 2012 1:25 am

i5 2500K - OC (4.7Ghz)

---------- testing started ----------
testing x87

Starting matrix multiplication loop...
Elapsed time: 1498 ms
testing sse1

Starting matrix multiplication loop...
Elapsed time: 1560 ms
testing sse2

Starting matrix multiplication loop...
Elapsed time: 1467 ms
---------- testing finished ----------
Ells
 
Posts: 3430
Joined: Thu Aug 10, 2006 9:03 pm

Post » Sun May 20, 2012 7:28 am

C2D E7400 @ 3 GHz

---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 3588 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 3089 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 2683 ms
---------- testing finished ----------
Vahpie
 
Posts: 3447
Joined: Sat Aug 26, 2006 5:07 pm

Post » Sat May 19, 2012 8:15 pm

Intel Core2 Quad Q8200 3.1 ghz

---------- testing started ----------
testing x87

Starting matrix multiplication loop...
Elapsed time: 3401 ms
testing sse1

Starting matrix multiplication loop...
Elapsed time: 2948 ms
testing sse2

Starting matrix multiplication loop...
Elapsed time: 2574 ms
---------- testing finished ----------
Marina Leigh
 
Posts: 3339
Joined: Wed Jun 21, 2006 7:59 pm

Post » Sun May 20, 2012 12:04 am



There is no SSE3 test amongst them

the tests are

1. x87 FPU (pre SSE)
2. SSE1
3. SSE2.

x87 is apparently fastest on AMD (heh, maybe Intel wasn't wrong with its ICC "optimisations")

Oh, you're right, what the hell was I looking at then? Haha...
GPMG
 
Posts: 3507
Joined: Sat Sep 15, 2007 10:55 am

Post » Sun May 20, 2012 3:30 am

Nice overclock, man!

Thanks, this took me lots of tweaking and lots of testing! :biggrin:

Anyway, here's a load of tests. Taken from computers around the house.

CPU: AMD Athlon 64 X2 4800+ @ 2.5GHz
---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 5141 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 6203 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 5938 ms
---------- testing finished ----------

CPU: Intel Pentium Dual Core E5200 @ 2.5GHz
---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 4218 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 3594 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 3172 ms
---------- testing finished ----------

CPU: Intel Core 2 Duo E8400 @ 3.0GHz
---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 3682 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 3074 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 2652 ms
---------- testing finished ----------

That's it, should help. :)
Charity Hughes
 
Posts: 3408
Joined: Sat Mar 17, 2007 3:22 pm

Post » Sun May 20, 2012 2:48 am

i5 2500K - OC (4.7Ghz)

---------- testing started ----------
testing x87

Starting matrix multiplication loop...
Elapsed time: 1498 ms
testing sse1

Starting matrix multiplication loop...
Elapsed time: 1560 ms
testing sse2

Starting matrix multiplication loop...
Elapsed time: 1467 ms
---------- testing finished ----------

*shakes tiny fist*

i7 2600K @ 4.8GHz (I won't normally run this, my heatsink can't deal with 8 threads transcoding at this in a comfy temperature :P)
---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 1466 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 1513 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 1436 ms
---------- testing finished ----------
Press any key to continue . . .
DeeD
 
Posts: 3439
Joined: Sat Jul 14, 2007 6:50 pm

Post » Sun May 20, 2012 7:41 am

I am starting to think that this CPU test is completely useless - systems that should be VERY NEAR - like i5 750 and i7 920 @3.8-4Ghz with HT disabled are getting completely different results:

for example:
Intel i5 750 4,0 Ghz

---------- testing started ----------
testing x87

Starting matrix multiplication loop...
Elapsed time: 2714 ms
testing sse1

Starting matrix multiplication loop...
Elapsed time: 2745 ms
testing sse2

Starting matrix multiplication loop...
Elapsed time: 2715 ms
---------- testing finished ----------

On i7 920 @3.8Ghz and HT disabled I am getting:

---------- testing started ----------
testing x87
Starting matrix multiplication loop...
Elapsed time: 2091 ms
testing sse1
Starting matrix multiplication loop...
Elapsed time: 2137 ms
testing sse2
Starting matrix multiplication loop...
Elapsed time: 1981 ms
---------- testing finished ----------

The only real difference between these CPUs (with HT disabled) is that the i7 920 has a triple-channel DDR3 memory controller and the i5 has a dual-channel one. I can't run it under VTune, but most probably it is hitting some memory hierarchy / prefetch bottleneck rather than testing compute performance.

Of course, the user could have gotten his results wrong as well :)
Naazhe Perezz
 
Posts: 3393
Joined: Sat Aug 19, 2006 6:14 am

Post » Sun May 20, 2012 7:58 am

I am starting to think that this CPU test is completely useless - systems that should be VERY NEAR - like i5 750 and i7 920 @3.8-4Ghz with HT disabled are getting completely different results:
Something to keep in mind: this is more about seeing the differences between those three results per system, not about comparing systems to one another. The other thing is that the more things are open or going on in the background, the higher the numbers will be. Even closing Firefox with the skyrimnexus page open dropped my numbers by about 100 ms. Again though: this is for comparing the different methods, not the different systems. It is not an [censored] benchmark.
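
One common way to reduce the background-noise effect described here is to repeat the measurement and keep the fastest run. The sketch below is not the test program used in this thread; `best_of` and `matmul_workload` are hypothetical names, and the pure-Python matrix multiply is only a stand-in workload:

```python
import time


def best_of(func, repeats: int = 5) -> float:
    """Run func several times and return the fastest wall-clock time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1000.0)
    # Background activity tends to slow a run down rather than speed it up,
    # so the minimum is usually the least-disturbed measurement.
    return min(timings)


def matmul_workload(n: int = 40) -> list:
    """Hypothetical stand-in workload: naive n x n matrix multiplication."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


print(f"best of 5: {best_of(matmul_workload):.2f} ms")
```

Taking the minimum rather than the mean keeps a single slow run (for example, a browser waking up mid-test) from skewing the comparison between methods.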

ED: wait, e-youknowhat is censored? hahahaha.
Lyndsey Bird
 
Posts: 3539
Joined: Sun Oct 22, 2006 2:57 am

Post » Sat May 19, 2012 10:52 pm

Yeah I don't think we need any more, the test shows:

AMD -> x87

Intel -> SSE2
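
That summary can be checked against the numbers posted in this thread. The Python sketch below uses a subset of the reported timings (copied from the posts, not re-measured), and the helper name `fastest` is an illustrative choice. Note that the stock i5-750 result is an exception to the Intel rule, though the three timings there are within about 2% of each other:

```python
# Timings in ms copied from posts in this thread (a subset, not every result).
results = {
    "AMD Phenom II X4 955 BE @ 3.2 GHz": {"x87": 3745, "sse1": 5008, "sse2": 4774},
    "AMD Athlon 64 X2 4800+ @ 2.5 GHz": {"x87": 5141, "sse1": 6203, "sse2": 5938},
    "Intel Core 2 Duo E8400 @ 3.0 GHz": {"x87": 3682, "sse1": 3074, "sse2": 2652},
    "Intel Core i5-480M @ 2.66 GHz": {"x87": 2824, "sse1": 2855, "sse2": 2606},
    "Intel Core i5-750 @ 2.67 GHz": {"x87": 3884, "sse1": 3962, "sse2": 3932},
}


def fastest(timings: dict) -> str:
    """Return the name of the code path with the lowest elapsed time."""
    return min(timings, key=timings.get)


for cpu, timings in results.items():
    # Positive margin means SSE2 was faster than x87 on that machine.
    margin = (timings["x87"] - timings["sse2"]) / timings["x87"] * 100.0
    print(f"{cpu}: fastest = {fastest(timings)}, SSE2 vs x87 = {margin:+.1f}%")
```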
Stephanie I
 
Posts: 3357
Joined: Thu Apr 05, 2007 3:28 pm

Post » Sun May 20, 2012 8:29 am

Something to keep in mind: this is more about seeing the differences between those three results per system, not about comparing systems to one another. The other thing is that the more things are open or going on in the background, the higher the numbers will be. Even closing Firefox with the skyrimnexus page open dropped my numbers by about 100 ms. Again though: this is for comparing the different methods, not the different systems. It is not an [censored] benchmark.

The flawed micro benchmarks lead to flawed conclusions about which versions to build for which CPUs. It doesn't take an assembly god to google around and find things like this:

http://www.geeks3d.com/20100711/test-simple-x87-vs-sse2-performance-test-with-matrix-multiplication/

Where users are getting double the performance in the same micro benchmark (multiplying matrices) going from x87 to SSE2 on Intel machines. I am not seeing such gains here, and that leads to the conclusion that something is wrong.
Blackdrak
 
Posts: 3451
Joined: Thu May 17, 2007 11:40 pm

Post » Sat May 19, 2012 6:50 pm

The flawed micro benchmarks lead to flawed conclusions about which versions to build for which CPUs. It doesn't take an assembly god to google around and find things like this:

http://www.geeks3d.com/20100711/test-simple-x87-vs-sse2-performance-test-with-matrix-multiplication/

Where users are getting double the performance in the same micro benchmark (multiplying matrices) going from x87 to SSE2 on Intel machines. I am not seeing such gains here, and that leads to the conclusion that something is wrong.
What you're not taking into account in that case is that this mod does more than just change those methods; it also inlines a tonne of extraneous function calls that spend more time getting to and from the function than retrieving the memory location the function itself was written to return. That's what the core of tesval did, and what skyboost extends with further stuff like what he's asking us for feedback on now.

ED: Er sorry wait scratch that, you're not talking about the mod but how the different methods vary.
Eibe Novy
 
Posts: 3510
Joined: Fri Apr 27, 2007 1:32 am
