DariusD: The benefit of using SSE3 and SSE4 is questionable, because:
- As with any other bottleneck, it comes down to timing: people who have very fast CPUs with SSE4 support simply don't need such code improvements.
- SSSE3 gives about a 5% speedup over SSE2 on huge calculations. We don't have those here, and a 1% gain will change nothing.
- To generate SSE3+ from high-level code, icc must be running on a 6th-series Intel CPU such as a C2D (there are many cpuid checks inside the compiler). And since I'm using AMD, I won't be able to test anything beyond SSE3 (which shows zero improvement over SSE2) even if I find a way to force icc to use SSE3+.
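For reference, here is a minimal sketch of a check that looks at the CPUID feature bits themselves rather than the vendor string, so it behaves the same on Intel and AMD. It is not icc's internal logic and not SkyBoost code. The names `simd_features`, `decode_leaf1` and `query_simd_features` are made up for illustration, and the query part assumes GCC or Clang on x86 (`<cpuid.h>`); on other targets it just reports no features.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>          /* GCC/Clang helper header, assumed available */
#define HAVE_CPUID_H 1
#endif

/* Hypothetical container for the feature flags discussed above. */
typedef struct {
    bool sse2, sse3, ssse3, sse41, sse42;
} simd_features;

/* Decode the ECX/EDX values returned by CPUID leaf 1. */
simd_features decode_leaf1(uint32_t ecx, uint32_t edx)
{
    simd_features f;
    f.sse2  = (edx >> 26) & 1u;   /* EDX bit 26: SSE2   */
    f.sse3  = (ecx >> 0)  & 1u;   /* ECX bit 0:  SSE3   */
    f.ssse3 = (ecx >> 9)  & 1u;   /* ECX bit 9:  SSSE3  */
    f.sse41 = (ecx >> 19) & 1u;   /* ECX bit 19: SSE4.1 */
    f.sse42 = (ecx >> 20) & 1u;   /* ECX bit 20: SSE4.2 */
    return f;
}

/* Ask the CPU we are running on; no vendor check involved. */
simd_features query_simd_features(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return decode_leaf1(ecx, edx);
#endif
    return decode_leaf1(0, 0);    /* unknown target: report nothing */
}
```

Calling `query_simd_features()` once at startup and branching on the flags would be enough to pick a code path, whichever vendor made the CPU.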

Currently I have all the dot code rewritten in high-level code, because Arisu's SSE implementation runs 5x slower on AMD than the pure x87 or SSE1 code generated by the compiler.
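To show the difference between the two approaches, here is a rough sketch, assuming "dot code" means 4-component float dot products. Neither function is taken from SkyBoost or from Arisu's code, and the names `dot4_plain` and `dot4_sse` are made up. The first is plain C that the compiler can turn into x87 or SSE as it sees fit; the second is a hand-written SSE1 intrinsics version that falls back to the plain one when SSE is not available at compile time.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>      /* SSE1 intrinsics */
#define HAVE_SSE1 1
#endif

/* High-level version: instruction selection is left to the compiler. */
float dot4_plain(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Hand-written SSE1 version (hypothetical, for comparison only). */
float dot4_sse(const float a[4], const float b[4])
{
#ifdef HAVE_SSE1
    __m128 m  = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* m0 m1 m2 m3 */
    __m128 hi = _mm_movehl_ps(m, m);                          /* m2 m3 m2 m3 */
    __m128 s  = _mm_add_ps(m, hi);                            /* m0+m2, m1+m3, .. */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1 everywhere */
    s = _mm_add_ss(s, s1);                                    /* total in lane 0 */
    return _mm_cvtss_f32(s);
#else
    return dot4_plain(a, b);  /* no SSE at compile time */
#endif
}
```

Which of the two is faster depends on the CPU and on what the compiler generates, so it has to be measured on the target machine.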
Xetrill, newt111: As I wrote before, SSE3 shows zero improvement over SSE2, and SSSE3 gives a 5% gain at most on huge calculations. Comparing the Intel and MSVC builds, the Intel build gets 1 more fps with the SSE2 SkyBoost version on my AMD CPU. Currently there is a single code path. The size is huge because I need to check that every byte matches the original game.
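The byte check mentioned above can be sketched roughly like this. It is only an illustration of the idea, not the actual SkyBoost code: `patch_if_original` and its parameters are hypothetical names, and it assumes the target memory is already writable (in a real process the page protection would have to be changed first, which is left out here).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Overwrite `len` bytes at `target` with `replacement`, but only if the
 * bytes currently there are exactly the ones expected from the original
 * game. Returns false and changes nothing on any mismatch. */
bool patch_if_original(unsigned char *target,
                       const unsigned char *expected,
                       const unsigned char *replacement,
                       size_t len)
{
    if (memcmp(target, expected, len) != 0)
        return false;               /* different game version or already patched */
    memcpy(target, replacement, len);
    return true;
}
```

Because the expected bytes have to be stored next to every replacement, the data for each patch site is roughly doubled, which fits the size problem described above.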
Squall Leonhart: It's not only the dispatcher. The compiler's capabilities are also limited when it is used on AMD.
vejn: How many fps do you get in Skyrim on the vanilla game?