找回密码
 欢迎注册
楼主: gxqcn

[擂台] 求bit位为1数目不超出2的正整数

[复制链接]
发表于 2009-2-4 16:47:31 | 显示全部楼层
希望能得到GxQ机器的结果
你的P4和我的不同
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
 楼主| 发表于 2009-2-4 17:06:27 | 显示全部楼层
GreaterEqualBit2_24F(314159265)=0x14000000
GreaterEqualBit2_25F(314159265)=0x14000000
GreaterEqualBit2_36F(314159265)=0x14000000
GreaterEqualBit2_57F(314159265)=0x14000000
GreaterEqualBit2_58F(314159265)=0x14000000
GreaterEqualBit2_59F(314159265)=0x14000000
GreaterEqualBit2_75F(314159265)=0x14000000
24# 56449.240 ms
25# 51673.252 ms
36# 49594.482 ms
57# 52741.690 ms
58# 49340.659 ms
59# 52868.823 ms
75# 49191.056 ms

以上是AMD CPU的测试数据,Intel 的等我回家测。。。
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
 楼主| 发表于 2009-2-4 17:07:11 | 显示全部楼层
无心人编译的程序无法在我的机器上运行,
所以自己写代码编译测试(顺带将86# shshsh_0510 的代码一并测试),
结果如下:
GreaterEqualBit2_24F(314159265)=0x14000000
GreaterEqualBit2_25F(314159265)=0x14000000
GreaterEqualBit2_36F(314159265)=0x14000000
GreaterEqualBit2_57F(314159265)=0x14000000
GreaterEqualBit2_58F(314159265)=0x14000000
GreaterEqualBit2_59F(314159265)=0x14000000
GreaterEqualBit2_75F(314159265)=0x14000000
GreaterEqualBit2_86F(314159265)=0x14000000


n: 1-2^31

24#: 25 s
25#: 29 s
36#: 32 s
57#: 26 s
58#: 31 s
59#: 28 s
75#: 24 s
86#: 17 s
Press any key to continue


注:本帖所测代码最后修改于 2009-02-04 20:00

GreaterEqualBit2.zip

3.45 KB, 下载次数: 0, 下载积分: 金币 -1 枚, 经验 1 点, 下载 1 次

编译好的测试程序

毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:08:08 | 显示全部楼层
25#可进一步优化为

_declspec(naked)
UINT32 GreaterEqualBit2_25F(UINT32 n)
{
__asm {
mov eax, [esp+4]
    mov edx, eax
    bsr ecx, edx
    btr edx, ecx
    bsr  ecx, edx
    jz exit25
    btr edx, ecx
    cmp edx, 0
    je exit25
    mov edx, 1
    shl  edx, cl
    sub edx, 1
    or eax, edx
    add eax, 1
exit25:
ret
    }
}

==============================
Core2 Xeon 1.6G测试
24# 24303.416 ms
25# 22223.639 ms
36# 26344.195 ms
57# 22893.317 ms
58# 25275.783 ms
59# 24195.312 ms
75# 23295.180 ms
=====================
已经完全超越其他函数
并测试从1-15的结果均对
=======================
代码OK, 重新更新下载去
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:16:49 | 显示全部楼层
79#更新了

GxQ如果是以79#测试的, 要重新测试

25#应该时间减少
================================
另外, 25#可以输入0, 并正常产生输出, 结果是0
24也是, 36结果不对, 其他么测, 程序就异常了
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:18:09 | 显示全部楼层
49# 的程序我用汇编写了一下
我进行了简单测试,由于比较复杂不知是否正确,但应该差不多吧。
发现之所以慢,是因为功能本身太小,如果代码较长编译器稍稍放松一点就白费劲了。
另外,全部展开太复杂,就偷懒在3,4层将gxq的copy进去了,总之就是想说,不用bsr 未必就不行
第一次写汇编,还请各位多指教
  1. ///////////////////////////////////////////////////////////////////////////////////////////////////////////////
  2. //m 1-2 ,2 bits :mark
  3. //           0: at least 3 ones
  4. //           1: 2 ones
  5. //           2: 1 one
  6. //           3: 0 one
  7. //x 3-12 ,10 bits:
  8. //          when contains at least 2 ones ,then x= upper-one
  9. //          eg : 11001 - 100000 ,1001 - 10000 ,0011-0100
  10. //y 13-22 ,10 bits:
  11. //          when contains  2 ones, y= GreaterEqualBit2(first byte+1)
  12. //          eg   :  1010-1100 ,0011 -0100
  13. //z 23-32 ,10 bits:
  14. //          when contains at least 3 ones , it is y =GreaterEqualBit2(first byte) ,so just return z<<24
  15. //          eg  :  0110010 -1000000,101011 -110000
  16. ///////////////////////////////////////////////////////////////////////////////////////////////////////////////

  17. const UINT32 Data[256] = {
  18.         0xC0000000,0x80000000,0x80000000,0x40401000,0x80000000,0x40801800,0x40802000,0x00800008,
  19.         0x80000000,0x41002800,0x41003000,0x0100000C,0x41004000,0x01000010,0x01000010,0x01000010,
  20.         0x80000000,0x42004800,0x42005000,0x02000014,0x42006000,0x02000018,0x02000018,0x02000018,
  21.         0x42008000,0x02000020,0x02000020,0x02000020,0x02000020,0x02000020,0x02000020,0x02000020,
  22.         0x80000000,0x44008800,0x44009000,0x04000024,0x4400A000,0x04000028,0x04000028,0x04000028,
  23.         0x4400C000,0x04000030,0x04000030,0x04000030,0x04000030,0x04000030,0x04000030,0x04000030,
  24.         0x44010000,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,
  25.         0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,
  26.         0x80000000,0x48010800,0x48011000,0x08000044,0x48012000,0x08000048,0x08000048,0x08000048,
  27.         0x48014000,0x08000050,0x08000050,0x08000050,0x08000050,0x08000050,0x08000050,0x08000050,
  28.         0x48018000,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,
  29.         0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,
  30.         0x48020000,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  31.         0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  32.         0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  33.         0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  34.         0x80000000,0x50020800,0x50021000,0x10000084,0x50022000,0x10000088,0x10000088,0x10000088,
  35.         0x50024000,0x10000090,0x10000090,0x10000090,0x10000090,0x10000090,0x10000090,0x10000090,
  36.         0x50028000,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,
  37.         0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,
  38.         0x50030000,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  39.         0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  40.         0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  41.         0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  42.         0x50040000,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  43.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  44.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  45.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  46.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  47.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  48.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  49.         0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100
  50. };
  51. ///////////////////////////////////////////////////////////////////////////////////////////////////////////////
  52. //        if  (first byte has at least 3 ones  )
  53. //        {
  54. //                 get z and return z<<24
  55. //        }
  56. //        if  (first byte has 2 ones  )
  57. //        {
  58. //                 if ( other bits are  all 0)
  59. //                          return (first byte)<<24
  60. //                 else
  61. //                          get y,  return y<<24
  62. //        }
  63. //        if (first byte has 1 one )
  64. //        {
  65. //                 if  (second byte has at least 2 ones  )
  66. //                         return (first byte)<<24 + (x of second byte)<<16
  67. //                 go to GXQ;
  68. //                          
  69. //        }
  70. //        if (first byte has no one ) {
  71. //                 go to GXQ;
  72. //        }
  73. ///////////////////////////////////////////////////////////////////////////////////////////////////////////
  74. _declspec(naked) int GreaterEqualBit2_SHSHSH(int n)
  75. {
  76.         __asm
  77.         {
  78.                 mov  ecx, dword ptr[esp+0x04]; //save n to ecx
  79.                 mov  eax, ecx;
  80.                 shr  eax, 24; //get first byte
  81.                 shl  eax, 2;  //mul 4
  82.                 mov  edx, dword ptr[Data+eax]; //get table item
  83.                
  84.                 mov  eax, 0xC0000000; //get mark
  85.                 and  eax, edx;

  86.                 cmp  eax,0;  
  87.                 jne   first_byte_l_3_ones;
  88.                 // at least 3 ones in first byte
  89.                 mov  eax, 0x000003FF; //get z 00000000 00000000 00000011 11111111
  90.                 and  eax, edx;
  91.                 shl  eax, 24;
  92.                 ret;

  93. first_byte_l_3_ones:
  94.                 cmp  eax,1;   
  95.                 jne   first_byte_l_2_ones;
  96.                 // exactly 2 ones in first byte
  97.                 mov  eax, ecx;
  98.                 shl  eax, 8; //other data
  99.                 cmp  eax,0;  
  100.                 jne   first_byte_2_ones_plus_sth;
  101.                 mov  eax,ecx;
  102.                 and  eax,0xFF000000;
  103.                 ret;
  104. first_byte_2_ones_plus_sth:
  105.                 mov  eax, 0x000FFC00; //get y 00000000 00001111 11111100 00000000
  106.                 and  eax, edx;
  107.                 shl  eax, 14;
  108.                 ret;
  109. first_byte_l_2_ones:
  110.                 cmp  eax,2;  
  111.                 jne return_hardcase;
  112.                 //  1 one,get second byte
  113.                 mov  eax, 0x00FF0000;
  114.                 and  eax, ecx;
  115.                 shr  eax, 14;
  116.                 mov  edx, dword ptr[Data+eax]; //get table item

  117.                 mov  eax, 0xC0000000; //get mark
  118.                 and  eax, edx;
  119.                
  120.                 cmp  eax, 0;  
  121.                 jne  second_byte_l_3_ones;
  122.                 //second byte has at least 3 ones
  123.                 mov  eax, 0x3FF00000; //get x 00111111 11110000 00000000 00000000
  124.                 and  eax, edx;
  125.                 shr  eax, 4;
  126.                 and  ecx, 0xFF000000;
  127.                 add  eax, ecx;
  128.                 ret;
  129. second_byte_l_3_ones:
  130.                 cmp  eax, 1;  
  131.                 jne  second_byte_l_2_ones;
  132.                 //second byte has 2 ones ,same to above
  133.                 mov  eax, 0x3FF00000; //get x 00111111 11110000 00000000 00000000
  134.                 and  eax, edx;
  135.                 shr  eax, 4;
  136.                 and  ecx, 0xFF000000;
  137.                 add  eax, ecx;
  138.                 ret;
  139. second_byte_l_2_ones:
  140.                 //second byte has at most 1 one
  141.                 jmp return_hardcase;

  142. return_hardcase: //copy from GXQ  :)
  143.         mov     edx, ecx;
  144.         mov     eax, 1;
  145.         bsr     ecx, edx;
  146.         shl     eax, cl;
  147.         push    eax;
  148.         xor     edx, eax;
  149.         bsr     ecx, edx;
  150.         mov     eax, 0;
  151.         setne   al;
  152.         shl     eax, cl;
  153.         lea     ecx, [eax*0x02];
  154.         xor     edx, eax;
  155.         cmovne  eax, ecx;
  156.         pop     edx;
  157.         add     eax, edx;
  158.         ret;
  159.         }
  160. }

复制代码
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:23:13 | 显示全部楼层
怎么说呢

感觉是杂货摊
有点不好啊
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:26:49 | 显示全部楼层
目前来说

24, 25, 57, 58, 75代表了几种不同的实现
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
 楼主| 发表于 2009-2-4 20:05:20 | 显示全部楼层

回复 86# shshsh_0510 的帖子

厉害啊!第一次写就这么强!
在 82# 的测试中一举夺冠!
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 20:21:40 | 显示全部楼层


明白shshsh_0510代码的意思了
呵呵

属于查表法的变形
变形的很巧妙

不过,是否能简化呢?
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
您需要登录后才可以回帖 登录 | 欢迎注册

本版积分规则

小黑屋|手机版|数学研发网 ( 苏ICP备07505100号 )

GMT+8, 2024-3-29 21:45 , Processed in 0.052817 second(s), 17 queries .

Powered by Discuz! X3.5

© 2001-2024 Discuz! Team.

快速回复 返回顶部 返回列表