Chudnovsky binary splitting and factoring

Question

In this article, a fast recursive formulation of the Chudnovsky pi formula using binary splitting is given. In python:

C = 640320
C3_OVER_24 = C**3 // 24
def bs(a, b):
    if b - a == 1:
        if a == 0:
            Pab = Qab = 1
        else:
            Pab = (6*a-5)*(2*a-1)*(6*a-1)
            Qab = a*a*a*C3_OVER_24
        Tab = Pab * (13591409 + 545140134*a) # a(a) * p(a)
        if a & 1:
            Tab = -Tab
    else:
        m = (a + b) // 2
        Pam, Qam, Tam = bs(a, m)
        Pmb, Qmb, Tmb = bs(m, b)

        Pab = Pam * Pmb
        Qab = Qam * Qmb
        Tab = Qmb * Tam + Pam * Tmb
    return Pab, Qab, Tab

N = int(digits/DIGITS_PER_TERM + 1)
# Calclate P(0,N) and Q(0,N)
P, Q, T = bs(0, N)
one = 10**digits
sqrtC = sqrt(10005*one, one)
return (Q*426880*sqrtC) // T

This method is already very fast, but it's mentioned that an implementation on the GMP library website, gmp-chudnovsky.c, also factors the numerator and denominator in binary splitting. As the code is optimized and hard for me to understand, what is the general idea behind how this is done? I can't tell if fractions are simplified, numbers are kept in factorized form instead of being entirely multiplied, or both.

Here is a code sample of the binary splitting:

/* binary splitting */
void
bs(unsigned long a, unsigned long b, unsigned gflag, long int level)
{
  unsigned long i, mid;
  int ccc;

  if (b-a==1) {
    /*
      g(b-1,b) = (6b-5)(2b-1)(6b-1)
      p(b-1,b) = b^3 * C^3 / 24
      q(b-1,b) = (-1)^b*g(b-1,b)*(A+Bb).
    */
    mpz_set_ui(p1, b);
    mpz_mul_ui(p1, p1, b);
    mpz_mul_ui(p1, p1, b);
    mpz_mul_ui(p1, p1, (C/24)*(C/24));
    mpz_mul_ui(p1, p1, C*24);

    mpz_set_ui(g1, 2*b-1);
    mpz_mul_ui(g1, g1, 6*b-1);
    mpz_mul_ui(g1, g1, 6*b-5);

    mpz_set_ui(q1, b);
    mpz_mul_ui(q1, q1, B);
    mpz_add_ui(q1, q1, A);
    mpz_mul   (q1, q1, g1);
    if (b%2)
      mpz_neg(q1, q1);

    i=b;
    while ((i&1)==0) i>>=1;
    fac_set_bp(fp1, i, 3);  /*  b^3 */
    fac_mul_bp(fp1, 3*5*23*29, 3);
    fp1[0].pow[0]--;

    fac_set_bp(fg1, 2*b-1, 1);  /* 2b-1 */
    fac_mul_bp(fg1, 6*b-1, 1);  /* 6b-1 */
    fac_mul_bp(fg1, 6*b-5, 1);  /* 6b-5 */

    if (b>(int)(progress)) {
      printf("."); fflush(stdout);
      progress += percent*2;
    }

  } else {
    /*
      p(a,b) = p(a,m) * p(m,b)
      g(a,b) = g(a,m) * g(m,b)
      q(a,b) = q(a,m) * p(m,b) + q(m,b) * g(a,m)
    */
    mid = a+(b-a)*0.5224;     /* tuning parameter */
    bs(a, mid, 1, level+1);

    top++;
    bs(mid, b, gflag, level+1);
    top--;

    if (level == 0)
      puts ("");

    ccc = level == 0;

    if (ccc) CHECK_MEMUSAGE;

    if (level>=4) {           /* tuning parameter */
#if 0
      long t = cputime();
#endif
      fac_remove_gcd(p2, fp2, g1, fg1);
#if 0
      gcd_time += cputime()-t;
#endif
    }

    if (ccc) CHECK_MEMUSAGE;
    mpz_mul(p1, p1, p2);

    if (ccc) CHECK_MEMUSAGE;
    mpz_mul(q1, q1, p2);

    if (ccc) CHECK_MEMUSAGE;
    mpz_mul(q2, q2, g1);

    if (ccc) CHECK_MEMUSAGE;
    mpz_add(q1, q1, q2);

    if (ccc) CHECK_MEMUSAGE;
    fac_mul(fp1, fp2);

    if (gflag) {
      mpz_mul(g1, g1, g2);
      fac_mul(fg1, fg2);
    }
  }

  if (out&2) {
    printf("p(%ld,%ld)=",a,b); fac_show(fp1);
    if (gflag)
      printf("g(%ld,%ld)=",a,b); fac_show(fg1);
  }
}

Peter Taylor · Answer 1 · 2015-11-28T18:56:32.007

The following comments are key:

/*
    g(b-1,b) = (6b-5)(2b-1)(6b-1)
    p(b-1,b) = b^3 * C^3 / 24
    q(b-1,b) = (-1)^b*g(b-1,b)*(A+Bb).
*/

/*
    p(a,b) = p(a,m) * p(m,b)
    g(a,b) = g(a,m) * g(m,b)
    q(a,b) = q(a,m) * p(m,b) + q(m,b) * g(a,m)
*/

/*
          p*(C/D)*sqrt(C)
    pi = -----------------
             (q+A*p)
*/

Observe:

If we scale both p and q by a factor of k in the final division, it doesn't affect the result.
When computing the merge operation which corresponds to the second set of comments, if we were to scale each of p(a,m), g(a,m), and q(a,m) by k then that factor would simply be carried through to p(a,b), g(a,b), q(a,b); similarly if we were to scale each of p(m,b), g(m,b), and q(m,b) then that factor would carry through.
The line
```
fac_remove_gcd(p2, fp2, g1, fg1);
```
is effectively
```
k = gcd(p2, g1);
p2 /= k; // p(m,b) /= k
g1 /= k; // g(a,m) /= k
```
This has the net effect of downscaling p(a,b), g(a,b), and q(a,b) by that gcd. By the previous two observations, this downscaling carries through cleanly all the way to the final result.

Postscript

I've tried three ways of implementing the factoring in Python.

Tracking the factorisations using immutable lists slows things down horrendously because there's far too much busy-work maintaining the lists.
Using gmpy's gcd does not give a speed-up
Pre-calculating a list of primes up to 6 * N (i.e. which might divide g) and testing each of those primes slows things down by a factor of 2 to 3.

I conclude that getting a speed-up with this approach requires using mutable state to track the factorisations, so it's a big hit for maintainability.

Hi from PPCG! Do you think the code maintains a list of factors and their powers instead of representing the number in full? This is what I got from `fac_set_bp`, `fac_mul` and related functions. Also, gcc can optimize GMP in ways python obviously can't. — qwr, Nov 28 '15 at 20:22
As Baruchel's answer already pointed out, it does both: it keeps a `mpz_t` representation and a `fac_t` representation for `p` and `g`. — Peter Taylor, Nov 28 '15 at 21:29

score 3 · Accepted Answer · answered Nov 15 '15 at 18:26

I didn't look at the full code, but I had a quick glance at it in order to understand better the excerpt you provide in your question.

In order to answer to some points from your question, have a look at this piece of code first:

typedef struct {
  unsigned long max_facs;
  unsigned long num_facs;
  unsigned long *fac;
  unsigned long *pow;
} fac_t[1];

It defines a new type as a structure (if you don't know C at all, let's say it is like a rudimentary Pyhton object embedding variables but no methods). This type allows to handle integer numbers as two integer values and two arrays (say: two lists):

greatest factor
number of factors
list of factors
list of (corresponding) powers for these factors

In the same time the code keeps the same numbers as big integers from the libgmp type (this is what is meant by mpz_t p and mpz_t g in the arguments of the function).

Now, what about the function you are showing. It is called fac_remove_gcd; the initial fac has to do with the name of the previously described type; the two following words are easy to understand: divide two integer numbers of type fac by their gcd.

The C code iterates over the two lists of factors in both lists; it is easy to synchronize both lists since factors are ordered (section of the code around the else if and else statements); whenever two common factors are detected (initial if statement), dividing is a matter of simple substraction: the smallest power is substracted in both lists of powers for this factor (for instance with a=2*5^3*17 and b=3*5^5*19, the value 3 will be substracted in the lists of powers for a and b at the position corresponding to factor 5 leading to a=2*5^0*17 and b=3*5^2*19).

During this operation a number (of the same fac type) is created and called fmul; it is obviously the gcd of both numbers.

After this step the gcd called fmul and being of the fac type is converted to a GMP big integer with the function (also in the code of the program) called bs_mul. This allows to compute the GCD as a big integer in order to synchronize the new values of the divided numbers in both forms: big integers and special fac type. Once the GCD is computed as a big integer, it is easy to divide both initial numbers by the GCD.

Thus the functions acts upon two versions of each initial numbers.

Hope it can help.

This is helpful, but I am more interested in how the numerator and denominator can be factored during binary splitting, given that `q(a, b)` is not calculated just as a product of two numbers. I've edited the code sample given. — qwr, Nov 15 '15 at 19:01
I'm guessing factors are added to the factor list during multiplication, from `fac_set_bp` and `fac_mul_bp`? — qwr, Nov 29 '15 at 03:12

Chudnovsky binary splitting and factoring

2 Answers2

Postscript