找回密码
 欢迎注册
查看: 54484|回复: 20

[讨论] Toom-Cook 乘法

[复制链接]
发表于 2008-3-28 12:36:04 | 显示全部楼层 |阅读模式

马上注册,结交更多好友,享用更多功能,让你轻松玩转社区。

您需要 登录 才可以下载或查看,没有账号?欢迎注册

×
当数据规模稍大时,Toom-Cook 乘法是比 Karatsuba 乘法更快的算法,当然 相对于Karatsuba 算法,这个算法更难一些。我曾想自己实现一个Toom-3 和Toom-4算法,但限于时间和精力,一直没有来得及动手。相信本版许多人也对Toom-Cook算法感兴趣,先贴出一些资料,欢迎大家补充和讨论。

  以下内容转自 http://everything2.com/index.pl?node=Toom-Cook+multiplication&lastnode_id=346167&searchy.x=38&searchy.y=7
  1. Description
  2. The Toom-Cook algorithm for multiplication of large numbers on computer is a divide and conquer approach that combines features of many other methods. Like Karatsuba multiplication, it operates by dividing the input numbers into limbs of smaller size, and expresses the larger product in terms of calculations made on the smaller pieces. Recursion can be applied to do the calculations on these smaller pieces, again using Toom-Cook, until the individual pieces are small enough to succumb to a more straightforward approach such as the same "long multiplication" performed on pencil and paper.

  3. The method takes the names of its two discoverers. In 1963, Andrei Toom produced a construction that proved that the time complexity of multiplication had an upper bound that was considerably lower than the O(n2) of conventional long multiplication. The proof included an implicit description of the method, which Stephen Cook included, slightly modified, in his 1966 PhD thesis. One feature of the Toom-Cook method is it does not define a single solution; instead, it defines an entire family of methods for efficient multiplication. Implementors have considerable freedom to select a particular Toom-Cook method that makes most sense on a particular computer architecture, such as by making use of powers of 2 which suit the computer's binary notation.

  4. The method has considerable commonality with multiplication using the fast Fourier Transform, in that it works on the same principles of polynomial multiplication. The input numbers are split into limbs of a given size, and each is written as a polynomial, using the limb size as radix. Instead of multiplying the polynomials directly, they are evaluated at a set of points, and the values multiplied together at those points. The product polynomial is then determined, based on the products at those points. Finally, substitution of the radix returns the final answer. The degrees of freedom available to choose an appropriate algorithm are the number of limbs the input is divided into, and the points at which the polynomials are evaluated.

  5. An example
  6. Let us take an example, multiplying 123,456,789 by 987,654,321. We can apply a "Toom-3" algorithm, splitting each number into 3 limbs of length 3 digits, and can choose a set of points, for example, {-2, -1, 0, 1, 2}. The calculations so far look like this:

  7. A(x) = 123x2 + 456x + 789
  8. B(x) = 987x2 + 654x + 321

  9. x= -2  A=  369  B= 2961  AB=  1092609
  10. x= -1  A=  456  B=  654  AB=   298224
  11. x=  0  A=  789  B=  321  AB=   253269
  12. x=  1  A= 1368  B= 1962  AB=  2684016
  13. x=  2  A= 2193  B= 5577  AB= 12230361
  14. We could have used Toom-Cook again to perform the multiplications AB, but this will do for illustrative purposes. There are efficient ways to calculate the values of A and B, such as the method of finite differences, which can be used to calculate the successive values of A and B using only addition. It now remains to find the polynomial P(x)=A(x)B(x) based on the values at the five points. There is a general solution to this problem, but we can do much better. The solution we are looking for is of the form
  15. P(x) = p4x4 + p3x3 + p2x2 + p1x + p0
  16. and we can substitute the values of x at the points to get a set of simultaneous equations that are linear in the unknowns pi:
  17. 16p4 - 8p3 + 4p2 - 2p1 + p0 =  1092609
  18.   p4 -  p3 +  p2 -  p1 + p0 =   298224
  19.                          p0 =   253269
  20.   p4 +  p3 +  p2 +  p1 + p0 =  2684016
  21. 16p4 + 8p3 + 4p2 + 2p1 + p0 = 12230361
  22. That's a few linear equations to solve, with a little algebra. Fortunately many of the terms are easy to separate and we get the solution:
  23. p4 =  121401
  24. p3 =  530514
  25. p2 = 1116450
  26. p1 =  662382
  27. p0 =  253269
  28. which we can finally turn back to a number by substituting x=1000 in the product polynomial P(x). The easiest way to do this is to just write each coefficient with the right place value.
  29. 121,401
  30.     530,514
  31.       1,116,450
  32.             662,382
  33.                 253,269
  34. =======================
  35. 121,932,631,112,635,269
  36. That may have seemed like hard work, but the difficult steps (evaluating the values and solving the equations) turn out to be quite easy for a computer to do, requiring simple operations such as addition, subtraction, shifts, and multiplication and division by small constants. The important part is to note that the body of the problem was reduced to five multiplies, each on numbers about one-third the size of the original inputs.
  37. Variations
  38. The simplest variations on the Toom algorithm rely on judicious choices of the set of points to evaluate all the polynomials at. A good choice will mean that the initial evaluations and the solution of the equations are easy (for the computer!). Some authors make a formal presentation of the method, in terms of matrix manipulations and the linear algebra solution of the equations. However, a good choice of points always supports a more efficient solution at this step.

  39. There are always two of the coefficients which can be computed directly. The numerical coefficient of the product polynomial can be got by multiplying the trailing terms in the inputs (which corresponds to the substitution x=0), and the leading coefficient is likewise the product of the original leading terms (which corresponds to the asymptotic behavior of the function as x approaches infinity). Since these terms required one multiplication each, many authors compute those first and eliminate them from the other equations immediately.

  40. The method generalizes immediately to splits of the number into more pieces. In general, a "Toom-K" algorithm will split the inputs into k limbs of equal width, evaluate the polynomials at (2k-1) points, and solve for the (2k-1) coefficients in the product polynomial. Knuth shows this in action with the choices of points {0, 1, ... 2k-2} or {-(k-1), -(k-2), ... k-2, k-1} in The Art of Computer Programming. The multiplication part then becomes (2k-1) calculations, at 1/k of the original size. A "Toom-2" algorithm can be shown to be equivalent to Karatsuba multiplication.

  41. The gmp library for multiple precision arithmetic uses a Toom-3 algorithm for numbers of tens of thousands of digits, and only switches to a Fourier transform method when the numbers reach hundreds of thousands of digits. It uses the points x=0, 0.5, 1, 2 and the point at infinity to perform simple evaluations and solution of the equations.

  42. Efficiency
  43. The time complexity of a Toom-k algorithm is relatively easy to calculate. Since the algorithm recursively replaces a multiplication by (2k-1) smaller ones, k times smaller than the previous iteration, the theoretical complexity is O(n(log(2k-1))/(log k)). This would suggest a larger k value gives a more efficient algorithm, but for larger k, the evaluation and interpolation steps can become quite hairy. (However, some authors have attempted values of k as large as 16).

  44. Time complexity of some Toom-k sizes

  45. k | Notes       | Complexity for n digits big-Oh notation
  46. ============================================================
  47. 2 | Karatsuba   |                O(n1.585)
  48. 3 | Most common |                O(n1.465)
  49. 4 | Feasible    |                O(n1.404)
  50. ...|             |
  51. 16 | Difficult!  |                O(n1.239)
  52. ============================================================
  53. The table illustrates increased sizes experience diminishing returns, and while the method can approach the O(n log n) of fast Fourier Transform methods with larger values of k, the computation efforts outside of the multiplications can rapidly become significant. Nevertheless, the big-Oh notation hides a constant, and empirical results indicate that a lovingly crafted Toom algorithm will beat an FFT method for all but the most enormous calculations.
复制代码
另一篇文章,Toom-Cook multiplication —— 维客(wiki)

Toom-Cook multiplication
来自 维客
Jump to: navigation, search
Toom-Cook, sometimes known as Toom-3, is a multiplication algorithm, a method of multiplying two large integers.

Given two large integers, a and b, Toom-Cook splits up a and b into k smaller parts each of length l, and performs operations on the parts. As k grows, one may combine many of the multiplication sub-operations, thus reducing the overall complexity of the algorithm. The multiplication sub-operations can then be computed recursively using Toom-Cook multiplication again, and so on. Although the terms "Toom-3" and "Toom-Cook" are sometimes used interchangeably, Toom-3 is only a single instance of the Toom-Cook algorithm, where k = 3.

Toom-3 reduces 9 multiplications to 5, and runs in Θ(nlog(5)/log(3)), about Θ(n1.465). In general, Toom-k runs in Θ(c(k) ne), where e = log(2k-1) / log(k), ne is the time spent on sub-multiplications, and c is the time spent on additions and multiplication by small constants.<ref>Knuth, p. 296</ref> The Karatsuba algorithm is a special case of Toom-Cook, where the number is split into two smaller ones. It reduces 4 multiplications to 3 and so operates at Θ(nlog(3)/log(2)), which is about Θ(n1.585). Ordinary long multiplication is equivalent to Toom-1, with complexity Θ(n2).

Although the exponent e can be set arbitrarily close to 1 by increasing k, the function c unfortunately grows very rapidly.<ref>Knuth, p. 296</ref> <ref>Crandall & Pomerance, p. 474</ref> The growth rate for mixed-level Toom-Cook schemes was still an open research problem in 2005.<ref>Crandall & Pomerance, p. 536</ref> An implementation described by Donald Knuth achieves the time complexity Θ(n 2√(2 log n) log n).<ref>Knuth, p. 302</ref>

Due to its overhead, Toom-Cook is slower than long multiplication with small numbers, and it is therefore typically used for intermediate-size multiplications, before the asymptotically faster Sch&ouml;nhage-Strassen algorithm (with complexity Θ(n log n log log n)) becomes practical.

Andrei Toom first described this algorithm in 1963, while Stephen Cook improved the algorithm and published it in his PhD thesis in 1966. [1] (thesis)

目录
[隐藏]
1 Example
2 Notes
3 References
4 External links

[编辑]Example
A possible implementation of Toom-3, optimal with respect to the number of operations<ref>Bodrato, p.132</ref> is illustrated below. The same idea can be generalized to any Toom-N algorithm, but an efficient sequence of operation is not easy to find.

Assume we wish to multiply A=31,415,926 and B=123,456,789. Each integer is divided into three equal length numbers, l digits wide.

With l=3 we can represent the two operands by two polynomials:

a(x) =  31x^2+415x+926
b(x) = 123x^2+456x+789
where A=a(1,000) and B=b(1,000).

To compute c(x) = a(x) * b(x), we evaluate the two polynomials a and b in five points, and multiply the results, for our example we perform the following:

w4=c(1/0)=a(1/0)*b(1/0)=   31           *   123           =             3813
w3=c(-2) =a(-2) *b(-2) =(4*31-2*415+926)*(4*123-2*456+789)= 220*369 =  81180
w2=c(-1) =a(-1) *b(-1) =  (31-  415+926)*  (123-  456+789)= 542*456 = 247152
w1=c(1)  =a(1)  *b(1)  =  (31+  415+926)*  (123+  456+789)=1372*1368=1876896
w0=c(0)  =a(0)  *b(0)  =            926 *             789 =           730614
Then solve the interpolation problem

w3 <-(w3 - w1)/3       = -598572
w1 <-(w1 - w2)/2       =  814872
w2 <- w2 - w0          = -483462
w3 <-(w2 - w3)/2 + 2*w4=   65181
w2 <- w2 + w1 - w4     =  327597
w1 <- w1 - w3          =  749691
The result can be recomposed with:

c(x) = w4 x4 + w3 x3 + w2 x2 + w1 x + w0 =
C = c(1,000) =
3,813
    65,181
       327,597
           749,691
               730,614
----------------------
3,878,509,347,421,614
[编辑]Notes
<references/>

[编辑]References
D. Knuth. The Art of Computer Programming, Volume 2. Third Edition, Addison-Wesley, 1997.
R. Crandall & C. Pomerance. Prime Numbers - A Computational Perspective. Second Edition, Springer, 2005.
M. Bodrato. Toward Optimal Toom-Cook Multiplication.... In WAIFI'07, Springer, 2007.
[编辑]External links
Toom-Cook 3-Way Multiplication from GMP Documentations
取自"http://www.wiki.cn/wiki/Toom-Cook_multiplication"


最后贴上我收集到的关于Toom-Cook算法的资料。

Toom-cook 算法.rar

431.37 KB, 下载次数: 25, 下载积分: 金币 -1 枚, 经验 1 点, 下载 1 次

毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2008-3-28 13:46:12 | 显示全部楼层
这个算法就是把大数当作多项式
然后求出结果多项式的值
再根据多项式求结果的值

中文资料在TAOCP V2中文版上有
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2008-3-30 10:21:43 | 显示全部楼层
从 V7.x 起,HugeCalc 已实现了Toom3算法,
而且对于平方,还实现了切分成4段、5段的算法,
所以即便不采用任何高级算法(FFT/FNT算法),速度依然可以很快!
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2011-4-13 04:50:38 | 显示全部楼层
好吧,从google绕回来了。

有没有从因子位数长度m确定取多大n的经验公式,即 Toom-Cook-n
说是总效率是C* $(m/n)^(log(2k-1)/log(k))$,但好象单步复杂度C服从$n^2$。
接着该怎么分析?
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2011-6-28 16:52:58 | 显示全部楼层
Toom-4是应该是最好的算法了,只要选择0,无穷大,1,-1,2,2,1/2即可实现,再往上处理的数据位太少了,比Toom-4不经济(Toom-3可达到$2^29$,Toom-4可以达到$2^28$,Toom-6只能达到$2^20$)
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2013-4-10 14:10:51 | 显示全部楼层
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2013-5-5 09:53:22 | 显示全部楼层
好像GMP实现到了Toom6
具体在其源代码的mpn/generic下
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2013-5-5 10:19:48 | 显示全部楼层
刚去读GMP源代码了
在乘法顶层函数里有着超复杂的分类调用
大概是有
toom22, toom32, toom42, toom33, toom43, toom53, toom63, toom44, toom6h, toom8h几种算法
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2013-5-5 16:12:27 | 显示全部楼层
大数乘法最低复杂度的极限人类能达到多少?可否降至O(nLog(n))以下,甚至更低?
除了FFT以外,是否可以找到比FFT更好的计算方法?
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
发表于 2013-5-5 18:10:00 | 显示全部楼层
乘法就是卷积。
我前段时间发现,FFT和Toom-Cook是等价的。
毋因群疑而阻独见  毋任己意而废人言
毋私小惠而伤大体  毋借公论以快私情
您需要登录后才可以回帖 登录 | 欢迎注册

本版积分规则

小黑屋|手机版|数学研发网 ( 苏ICP备07505100号 )

GMT+8, 2024-3-28 20:25 , Processed in 0.055797 second(s), 19 queries .

Powered by Discuz! X3.5

© 2001-2024 Discuz! Team.

快速回复 返回顶部 返回列表