无心人 posted on 2008-4-15 09:52:42

:)

liangbch's code really does have some efficient parts.

liangbch posted on 2011-11-16 19:48:27

51# liangbch

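Here are the results on an Intel dual-core E8500. Compared with the Pentium 4 (PIV), the clock frequency has gone up only modestly, yet integer multiplication is far faster, while the SSE2 and MMX instructions have improved comparatively little.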

Function                  E8500 (3.16 GHz)
UInt480x480_ALU           310.890 ms
UInt480x480_ALU2          385.515 ms

UInt480x480_MMX           304.725 ms
UInt480x480_MMX2          353.785 ms
UInt480x480_MMX3          350.830 ms

UInt480x480_base30_ALU    275.235 ms
UInt480x480_base30_MMX    260.435 ms
UInt480x480_base30_SSE2   197.840 ms

G-Spider posted on 2011-11-16 20:42:38

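Integer multiplication on the Intel dual-core E8500 really has improved a great deal; it beats the Intel Core i5. Keeping several versions is a must, heh.
Tested with the package from the post above: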

UInt480x480_ALU           1392.100 ms
UInt480x480_ALU2          1919.780 ms

UInt480x480_MMX            728.820 ms

UInt480x480_base30_ALU     932.190 ms
UInt480x480_base30_MMX     629.200 ms
UInt480x480_base30_SSE2    663.225 ms

G-Spider posted on 2013-6-8 23:13:08

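In theory, a CPU has multiple execution units, so reordering instructions and reducing dependencies between them should improve speed. However, my earlier experiments showed that reordering instructions to increase parallelism made little difference, and I do not know why.

Another effective way to optimize loops is to reduce control instr ...
liangbch posted on 2008-4-11 10:41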
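1. "All modern x86 processors can execute instructions out of order." This means that reordering instructions by hand generally has little effect.
; Out-of-order execution
mov eax, [mem1]
imul eax, 6
mov [mem2], eax
mov ebx, [mem3]
add ebx, 2
mov [mem4], ebx
With out-of-order execution, if [mem1] is not in the cache, the imul cannot proceed. But the fourth instruction, mov ebx, [mem3], does not depend on the instructions before it, so it has already started executing.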
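
The point about independent instruction chains can be checked with a small dependency analysis. This is only a minimal Python sketch, not a model of real hardware: the memory operands carry the placeholder names mem1 to mem4 (an assumption for illustration), the four memory locations are assumed to be distinct, and only register read-after-write dependencies are tracked.

```python
# Each entry: (instruction text, registers read, registers written).
# mem1..mem4 are placeholder names for the memory operands.
OOO_EXAMPLE = [
    ("mov eax, [mem1]", set(),   {"eax"}),
    ("imul eax, 6",     {"eax"}, {"eax"}),
    ("mov [mem2], eax", {"eax"}, set()),
    ("mov ebx, [mem3]", set(),   {"ebx"}),
    ("add ebx, 2",      {"ebx"}, {"ebx"}),
    ("mov [mem4], ebx", {"ebx"}, set()),
]


def true_dependencies(program):
    """For each instruction, return the indices of the earlier
    instructions whose register result it reads (read-after-write)."""
    last_writer = {}  # register name -> index of its most recent writer
    deps = []
    for i, (_, reads, writes) in enumerate(program):
        deps.append({last_writer[r] for r in reads if r in last_writer})
        for r in writes:
            last_writer[r] = i
    return deps


def stalled_by(program, start):
    """Indices of instructions that directly or transitively wait
    on instruction `start` (including `start` itself)."""
    deps = true_dependencies(program)
    stalled = {start}
    for i in range(start + 1, len(program)):
        if deps[i] & stalled:
            stalled.add(i)
    return stalled


if __name__ == "__main__":
    # Which instructions are held up if the first load misses the cache?
    for i in sorted(stalled_by(OOO_EXAMPLE, 0)):
        print("waits:", OOO_EXAMPLE[i][0])
```

Under these assumptions, only the first three instructions wait on the load into eax; the three instructions that use ebx form a separate chain and are free to start earlier.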

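2. Register renaming
mov eax, [mem1]
imul eax, 6
mov [mem2], eax
mov eax, [mem3]
add eax, 2
mov [mem4], eax
"the CPU is able to use different physical registers for the same logical register eax." So the eax we see is only a logical register; inside the CPU core it is renamed, so there is no need to worry about the two uses interfering with each other during out-of-order execution.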
This means that the above code is changed inside the CPU to a code that uses four different physical registers for eax. The first register is used for the value loaded from [mem1]. The second register is used for the output of the imul instruction. The third register is used for the value loaded from [mem3]. And the fourth register is used for the output of the add instruction.
The use of different physical registers for the same logical register enables the CPU to make the last three instructions in the example independent of the first three instructions.

The CPU must have a lot of physical registers for this mechanism to work efficiently. The number of physical registers is different for different microprocessors, but
you can generally assume that the number is sufficient for quite a lot of instruction reordering.
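
The renaming described above can be imitated with a toy renamer. This is a minimal Python sketch under simplifying assumptions, not how any particular CPU implements it: every register write simply receives a fresh physical register, the names P1, P2, ... are made up for illustration, and mem1 to mem4 are placeholder names for the memory operands.

```python
# Each entry: (instruction text, destination register or None, source registers).
# mem1..mem4 are placeholder names for the memory operands.
RENAMING_EXAMPLE = [
    ("mov eax, [mem1]", "eax", []),
    ("imul eax, 6",     "eax", ["eax"]),
    ("mov [mem2], eax", None,  ["eax"]),
    ("mov eax, [mem3]", "eax", []),
    ("add eax, 2",      "eax", ["eax"]),
    ("mov [mem4], eax", None,  ["eax"]),
]


def rename(program):
    """Map logical registers to physical ones, allocating a fresh
    physical register (P1, P2, ...) for every register write.

    Returns (renamed instructions, number of physical registers used).
    """
    mapping = {}   # logical register -> physical register holding its value
    used = 0
    renamed = []
    for text, dest, sources in program:
        # Sources must be looked up before the destination is remapped,
        # because "imul eax, 6" reads the old eax and writes a new one.
        phys_sources = [mapping[s] for s in sources]
        phys_dest = None
        if dest is not None:
            used += 1
            phys_dest = f"P{used}"
            mapping[dest] = phys_dest
        renamed.append((text, phys_dest, phys_sources))
    return renamed, used


if __name__ == "__main__":
    renamed, used = rename(RENAMING_EXAMPLE)
    for text, dest, sources in renamed:
        print(f"{text:<18} writes {dest}, reads {sources}")
    print("physical registers used for eax:", used)
```

In this sketch the six instructions use four physical registers for eax, and the last three instructions read only P3 and P4, so after renaming they no longer share any register with the first three.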
Full version: 大数运算基选择测试 (Test of base selection for big-number arithmetic)