找回密码
 欢迎注册
楼主: gxqcn

[擂台] 求bit位为1数目不超出2的正整数

[复制链接]
发表于 2009-2-4 16:47:31 | 显示全部楼层
希望能得到GxQ机器的结果 你的P4和我的不同
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
 楼主| 发表于 2009-2-4 17:06:27 | 显示全部楼层
GreaterEqualBit2_24F(314159265)=0x14000000 GreaterEqualBit2_25F(314159265)=0x14000000 GreaterEqualBit2_36F(314159265)=0x14000000 GreaterEqualBit2_57F(314159265)=0x14000000 GreaterEqualBit2_58F(314159265)=0x14000000 GreaterEqualBit2_59F(314159265)=0x14000000 GreaterEqualBit2_75F(314159265)=0x14000000 24# 56449.240 ms 25# 51673.252 ms 36# 49594.482 ms 57# 52741.690 ms 58# 49340.659 ms 59# 52868.823 ms 75# 49191.056 ms 以上是AMD CPU的测试数据,Intel 的等我回家测。。。
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
 楼主| 发表于 2009-2-4 17:07:11 | 显示全部楼层
无心人编译的程序无法在我的机器上运行, 所以自己写代码编译测试(顺带将86# shshsh_0510 的代码一并测试), 结果如下:
GreaterEqualBit2_24F(314159265)=0x14000000 GreaterEqualBit2_25F(314159265)=0x14000000 GreaterEqualBit2_36F(314159265)=0x14000000 GreaterEqualBit2_57F(314159265)=0x14000000 GreaterEqualBit2_58F(314159265)=0x14000000 GreaterEqualBit2_59F(314159265)=0x14000000 GreaterEqualBit2_75F(314159265)=0x14000000 GreaterEqualBit2_86F(314159265)=0x14000000 n: 1-2^31 24#: 25 s 25#: 29 s 36#: 32 s 57#: 26 s 58#: 31 s 59#: 28 s 75#: 24 s 86#: 17 s Press any key to continue
注:本帖所测代码最后修改于 2009-02-04 20:00

GreaterEqualBit2.zip

3.45 KB, 下载次数: 0, 下载积分: 金币 -1 枚, 经验 1 点, 下载 1 次

编译好的测试程序

毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:08:08 | 显示全部楼层
25#可进一步优化为 _declspec(naked) UINT32 GreaterEqualBit2_25F(UINT32 n) { __asm { mov eax, [esp+4] mov edx, eax bsr ecx, edx btr edx, ecx bsr ecx, edx jz exit25 btr edx, ecx cmp edx, 0 je exit25 mov edx, 1 shl edx, cl sub edx, 1 or eax, edx add eax, 1 exit25: ret } } ============================== Core2 Xeon 1.6G测试 24# 24303.416 ms 25# 22223.639 ms 36# 26344.195 ms 57# 22893.317 ms 58# 25275.783 ms 59# 24195.312 ms 75# 23295.180 ms ===================== 已经完全超越其他函数 并测试从1-15的结果均对 ======================= 代码OK, 重新更新下载去
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:16:49 | 显示全部楼层
79#更新了 GxQ如果是以79#测试的, 要重新测试 25#应该时间减少 ================================ 另外, 25#可以输入0, 并正常产生输出, 结果是0 24也是, 36结果不对, 其他么测, 程序就异常了
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:18:09 | 显示全部楼层
49# 的程序我用汇编写了一下 我进行了简单测试,由于比较复杂不知是否正确,但应该差不多吧。 发现之所以慢,是因为功能本身太小,如果代码较长编译器稍稍放松一点就白费劲了。 另外,全部展开太复杂,就偷懒在3,4层将gxq的copy进去了,总之就是想说,不用bsr 未必就不行 第一次写汇编,还请各位多指教
  1. ///////////////////////////////////////////////////////////////////////////////////////////////////////////////
  2. //m 1-2 ,2 bits :mark
  3. // 0: at least 3 ones
  4. // 1: 2 ones
  5. // 2: 1 one
  6. // 3: 0 one
  7. //x 3-12 ,10 bits:
  8. // when contains at least 2 ones ,then x= upper-one
  9. // eg : 11001 - 100000 ,1001 - 10000 ,0011-0100
  10. //y 13-22 ,10 bits:
  11. // when contains 2 ones, y= GreaterEqualBit2(first byte+1)
  12. // eg : 1010-1100 ,0011 -0100
  13. //z 23-32 ,10 bits:
  14. // when contains at least 3 ones , it is y =GreaterEqualBit2(first byte) ,so just return z<<24
  15. // eg : 0110010 -1000000,101011 -110000
  16. ///////////////////////////////////////////////////////////////////////////////////////////////////////////////
  17. const UINT32 Data[256] = {
  18. 0xC0000000,0x80000000,0x80000000,0x40401000,0x80000000,0x40801800,0x40802000,0x00800008,
  19. 0x80000000,0x41002800,0x41003000,0x0100000C,0x41004000,0x01000010,0x01000010,0x01000010,
  20. 0x80000000,0x42004800,0x42005000,0x02000014,0x42006000,0x02000018,0x02000018,0x02000018,
  21. 0x42008000,0x02000020,0x02000020,0x02000020,0x02000020,0x02000020,0x02000020,0x02000020,
  22. 0x80000000,0x44008800,0x44009000,0x04000024,0x4400A000,0x04000028,0x04000028,0x04000028,
  23. 0x4400C000,0x04000030,0x04000030,0x04000030,0x04000030,0x04000030,0x04000030,0x04000030,
  24. 0x44010000,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,
  25. 0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,0x04000040,
  26. 0x80000000,0x48010800,0x48011000,0x08000044,0x48012000,0x08000048,0x08000048,0x08000048,
  27. 0x48014000,0x08000050,0x08000050,0x08000050,0x08000050,0x08000050,0x08000050,0x08000050,
  28. 0x48018000,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,
  29. 0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,0x08000060,
  30. 0x48020000,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  31. 0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  32. 0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  33. 0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,0x08000080,
  34. 0x80000000,0x50020800,0x50021000,0x10000084,0x50022000,0x10000088,0x10000088,0x10000088,
  35. 0x50024000,0x10000090,0x10000090,0x10000090,0x10000090,0x10000090,0x10000090,0x10000090,
  36. 0x50028000,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,
  37. 0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,0x100000A0,
  38. 0x50030000,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  39. 0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  40. 0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  41. 0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,0x100000C0,
  42. 0x50040000,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  43. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  44. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  45. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  46. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  47. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  48. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,
  49. 0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100,0x10000100
  50. };
  51. ///////////////////////////////////////////////////////////////////////////////////////////////////////////////
  52. // if (first byte has at least 3 ones )
  53. // {
  54. // get z and return z<<24
  55. // }
  56. // if (first byte has 2 ones )
  57. // {
  58. // if ( other bits are all 0)
  59. // return (first byte)<<24
  60. // else
  61. // get y, return y<<24
  62. // }
  63. // if (first byte has 1 one )
  64. // {
  65. // if (second byte has at least 2 ones )
  66. // return (first byte)<<24 + (x of second byte)<<16
  67. // go to GXQ;
  68. //
  69. // }
  70. // if (first byte has no one ) {
  71. // go to GXQ;
  72. // }
  73. ///////////////////////////////////////////////////////////////////////////////////////////////////////////
  74. _declspec(naked) int GreaterEqualBit2_SHSHSH(int n)
  75. {
  76. __asm
  77. {
  78. mov ecx, dword ptr[esp+0x04]; //save n to ecx
  79. mov eax, ecx;
  80. shr eax, 24; //get first byte
  81. shl eax, 2; //mul 4
  82. mov edx, dword ptr[Data+eax]; //get table item
  83. mov eax, 0xC0000000; //get mark
  84. and eax, edx;
  85. cmp eax,0;
  86. jne first_byte_l_3_ones;
  87. // at least 3 ones in first byte
  88. mov eax, 0x000003FF; //get z 00000000 00000000 00000011 11111111
  89. and eax, edx;
  90. shl eax, 24;
  91. ret;
  92. first_byte_l_3_ones:
  93. cmp eax,1;
  94. jne first_byte_l_2_ones;
  95. // exactly 2 ones in first byte
  96. mov eax, ecx;
  97. shl eax, 8; //other data
  98. cmp eax,0;
  99. jne first_byte_2_ones_plus_sth;
  100. mov eax,ecx;
  101. and eax,0xFF000000;
  102. ret;
  103. first_byte_2_ones_plus_sth:
  104. mov eax, 0x000FFC00; //get y 00000000 00001111 11111100 00000000
  105. and eax, edx;
  106. shl eax, 14;
  107. ret;
  108. first_byte_l_2_ones:
  109. cmp eax,2;
  110. jne return_hardcase;
  111. // 1 one,get second byte
  112. mov eax, 0x00FF0000;
  113. and eax, ecx;
  114. shr eax, 14;
  115. mov edx, dword ptr[Data+eax]; //get table item
  116. mov eax, 0xC0000000; //get mark
  117. and eax, edx;
  118. cmp eax, 0;
  119. jne second_byte_l_3_ones;
  120. //second byte has at least 3 ones
  121. mov eax, 0x3FF00000; //get x 00111111 11110000 00000000 00000000
  122. and eax, edx;
  123. shr eax, 4;
  124. and ecx, 0xFF000000;
  125. add eax, ecx;
  126. ret;
  127. second_byte_l_3_ones:
  128. cmp eax, 1;
  129. jne second_byte_l_2_ones;
  130. //second byte has 2 ones ,same to above
  131. mov eax, 0x3FF00000; //get x 00111111 11110000 00000000 00000000
  132. and eax, edx;
  133. shr eax, 4;
  134. and ecx, 0xFF000000;
  135. add eax, ecx;
  136. ret;
  137. second_byte_l_2_ones:
  138. //second byte has at most 1 one
  139. jmp return_hardcase;
  140. return_hardcase: //copy from GXQ :)
  141. mov edx, ecx;
  142. mov eax, 1;
  143. bsr ecx, edx;
  144. shl eax, cl;
  145. push eax;
  146. xor edx, eax;
  147. bsr ecx, edx;
  148. mov eax, 0;
  149. setne al;
  150. shl eax, cl;
  151. lea ecx, [eax*0x02];
  152. xor edx, eax;
  153. cmovne eax, ecx;
  154. pop edx;
  155. add eax, edx;
  156. ret;
  157. }
  158. }
复制代码
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:23:13 | 显示全部楼层
怎么说呢 感觉是杂货摊 有点不好啊
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 17:26:49 | 显示全部楼层
目前来说 24, 25, 57, 58, 75代表了几种不同的实现
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
 楼主| 发表于 2009-2-4 20:05:20 | 显示全部楼层

回复 86# shshsh_0510 的帖子

厉害啊!第一次写就这么强! 在 82# 的测试中一举夺冠!
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2009-2-4 20:21:40 | 显示全部楼层
明白shshsh_0510代码的意思了 呵呵 属于查表法的变形 变形的很巧妙 不过,是否能简化呢?
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
您需要登录后才可以回帖 登录 | 欢迎注册

本版积分规则

小黑屋|手机版|数学研发网 ( 苏ICP备07505100号 )

GMT+8, 2024-12-23 23:31 , Processed in 0.026242 second(s), 17 queries .

Powered by Discuz! X3.5

© 2001-2024 Discuz! Team.

快速回复 返回顶部 返回列表