## enumerating combinations

APL-related discussions - a stream of APL consciousness.
Not sure where to start a discussion ? Here's the place to be
Forum rules
This forum is for discussing APL-related issues. If you think that the subject is off-topic, then the Chat forum is probably a better place for your thoughts !

### enumerating combinations

A combination of size ⍺ from ⍳⍵ is a sorted vector of length ⍺ with unique items, all of which are in ⍳⍵. The following function from [Hui 1987, §4.1], rewritten in current Dyalog APL, generates all combinations of size ⍺ from ⍳⍵, in sorted order.

`      ⍝comb←{  (⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺  c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺  (c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)}  2 comb 4      3 comb 5        4 comb 60 1           0 1 2           0 1 2 30 2           0 1 3           0 1 2 4 0 3           0 1 4           0 1 2 51 2           0 2 3           0 1 3 41 3           0 2 4           0 1 3 52 3           0 3 4           0 1 4 5              1 2 3           0 2 3 4              1 2 4           0 2 3 5              1 3 4           0 2 4 5              2 3 4           0 3 4 5                              1 2 3 4                              1 2 3 5                              1 2 4 5                              1 3 4 5                              2 3 4 5`

(Aside: 1+(⍺-1)∇(⍵-1) ←→ ⍺∇⍢(-∘1)⍵ where ⍢ is the prospective under AKA dual operator.)

In this note, we consider the problem of enumerating combinations: Given ⍺ and ⍵ and a combination c, determine the index of that combination in ⍺ comb ⍵. For example, for ⍺←4 and ⍵←6, the index of 0 2 3 4 is 6 and that of 1 2 3 5 is 11. The index is of course (⍺ comb ⍵)⍳c, but we want to compute the index without creating ⍺ comb ⍵.

`      ⍝ifc1←{  (m n)←⍺  1≥m:⊃⍵,0  ((m-1)+.!(n-k)+⍳k) + (⍺-1,1+k) ∇ 1↓⍵-1+k←⊃⍵}      4 6 ifc1 0 2 3 46      4 6 ifc1⍤1 ⊢4 comb 60 1 2 3 4 5 6 7 8 9 10 11 12 13 14      (⍳4!6) ≡ 4 6 ifc1⍤1 ⊢4 comb 61`

In ifc1, k is the leading term of the combination in question. The sum (m-1)+.!(n-k)+⍳k accounts for the combinations whose leading terms are less than k. Using the sum directly as in ifc1 is not satisfactory for an enormous combination such as 23 1e4 ifc1 ¯23↑⍳1e4 (with k=9977).

What is this sum? To use a particular example, the terms of the sum are

`      m←3 ⋄ n←8 ⋄ k←4      (m-1)!(n-k)+⍳k6 10 15 21`

and are adjacent entries in Pascal’s triangle, shown in green below: This is the sum analyzed in Pascal’s Ladder, and the sum equals (m!n)-m!n-k (the blue (56) minus the red (4) ), the difference of two binomial coefficients instead of the sum of k binomial coefficients. Whence:

`      ifc← {(m n)←⍺ ⋄ -⌿ (m-⍳m) +.! n-(¯1↓0,1+⍵),⍪⍵}      4 6 ifc 0 2 3 46      4 6 ifc⍤1 ⊢4 comb 60 1 2 3 4 5 6 7 8 9 10 11 12 13 14      (⍳4!6) ≡ 4 6 ifc⍤1 ⊢4 comb 61`

We now develop an enumeration which works for large combinations, by using the extended precision number facility nats, a monadic operator, from the dfns workspace. The results of nats-derived functions are character vectors of decimal digits; the arguments can be character vectors or scalars or ordinary integers. (We leave as an exercise for the reader the enumeration of humongous combinations, combinations where ⌈/c is 2*53 or larger, combinations whose elements require extended precision integers.)

`      )copy dfns nats    ⍝ http://dfns.dyalog.com/n_nats.htm      3 +nats 47      (,'7') ≡ 3 +nats 41      3 ×nats 412      '12' ≡ 3 ×nats 41      3 ×nats '4'12`

It is relatively straightforward to model extended precision versions of functions, by replacing + by +nats, × by ×nats, etc.

`      xbin← {(0≥⍺)∨⍺≥⍵:⍕⍺=⍵ ⋄ (⊃ ×nats⌿ ⍵-d) ÷nats (⊃ ×nats⌿ 1+d←⍳⍺⌊⍵-⍺)}      11 ! 33193536720      11 xbin 33193536720      47 ! 994.47391E28      47 xbin 9944739148260023940935799206928`

Translating ifc into the extended precision domain, we get:

`      xfc← {(m n)←⍺ ⋄ ⊃ -nats⌿ (m-⍳m) (+nats).xbin n-(¯1↓0,1+⍵),⍪⍵}      4 6 ifc ¯4↑⍳614      4 6 xfc ¯4↑⍳614      4!615      23 1e4 xfc ¯23↑⍳1e43771461434429626616429953717057906562371311072521139025732118617119999      23 xbin 1e43771461434429626616429953717057906562371311072521139025732118617120000      23!1e43.77146E69      0 ⍕ 23!1e4 3771461434429628______________________________________________________`

Collected Definitions

`      ⍝comb←{  (⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺  c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺  (c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)}ifc1←{  (m n)←⍺  1≥m:⊃⍵,0  ((m-1)+.!(n-k)+⍳k) + (⍺-1,1+k) ∇ 1↓⍵-1+k←⊃⍵}ifc  ← {(m n)←⍺ ⋄ -⌿ (m-⍳m) +.! n-(¯1↓0,1+⍵),⍪⍵}xbin ← {(0≥⍺)∨⍺≥⍵:⍕⍺=⍵ ⋄ (⊃ ×nats⌿ ⍵-d) ÷nats (⊃ ×nats⌿ 1+d←⍳⍺⌊⍵-⍺)}xfc  ← {(m n)←⍺ ⋄ ⊃ -nats⌿ (m-⍳m) (+nats).xbin n-(¯1↓0,1+⍵),⍪⍵}`

See Enumerating Permutation for the enumeration of permutations. Portions of this text previously appeared in the J Wiki essay Combination Index, 2005-11-11.
Roger|Dyalog

Posts: 207
Joined: Thu Jul 28, 2011 10:53 am

### Re: enumerating combinations

The previous post left some unfinished business, namely, producing the combination from the index. In this post we derive cfi (combination from index) and cfx (combination from extended-precision index), inverses of the ifc and xfc functions. First, cfi.

`      ⍝cfi←{  (m n)←⍺  1≥m:m⍴⍵  s← +⍀ ⊖ (m-1) ! (m-1) ↓ ⍳n  k← 1 ⍳⍨ ⍵<s  k , (1+k) + (⍺-1,1+k)∇(⍵-k⌷0,s)}      m←4 ⋄ n←9      (m,n) cfi 00 1 2 3      (m,n) cfi ¯1+m!n5 6 7 8comb←{  (⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺  c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺  (c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)}      (m comb n) ≡ (m,n) cfi⍤1 0 ⊢⍳m!n1`

The expression ⊖(m-1)!(m-1)↓⍳n in cfi computes the same vector as c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺ in comb, with ⍺=m and ⍵=n. c[k] is the number of combinations whose leading item is k. For combination index ⍵ in cfi, the leading item of the corresponding combination is 1⍳⍨⍵<+⍀c.

`      ⊖(m-1)!(m-1)↓⍳n56 35 20 10 4 1    ⊢ c← (m-1)!(m-1)+⊖k←⍳1+n-m56 35 20 10 4 1      ⊢ c49 ←m comb n0 1 2 30 1 2 40 1 2 50 1 2 60 1 2 7 ...4 5 7 84 6 7 85 6 7 8      {≢⍵}⌸c49[;0]56 35 20 10 4 1`

In cfx, the binomial coefficients in c, (m-1)!(m-1)+⊖k←⍳1+n-m, and the number of them can both be very large. The sum +\c is seen to be equivalent to (m!n)-m!(m-1)+⊖⍳1+n-m using reasoning similar to the Pascal’s Ladder derivation for ifc.

`      +\c56 91 111 121 125 126      (m!n)-m!(m-1)+⊖⍳1+n-m56 91 111 121 125 126`

The latter expression is amenable to binary search. Such search can avoid computing a binomial coefficient until it is needed.

`      ⍝ilead←{  (m n)←⍺ ⋄ a←m!n ⋄ i←⍵  bs←{    ⍵<⍺: (n-1)-⍵    d←a-m!j←⍺+⌊2÷⍨⍵-⍺    i<d: (j+1) ∇ ⍵    i>d: ⍺     ∇ (j-1)    n-j  }  (m-1)bs(n-1)}      (m,n) ilead⍤1 0 ⍳m!n0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...  2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 5      ⍉ {⍺,≢⍵} ⌸(m,n) ilead⍤1 0⍳m!n 0  1  2  3 4 556 35 20 10 4 1`

As an intermediate step towards cfx, we write a version of cfi which uses ilead:

`      ⍝cfi1←{  f←{(q i z)←⍵ ⋄ (q-1+k),(i-(⍺!q)-⍺!q-k),⊂z,k←(⍺,q)ilead i}  (m n)←⍺  (⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,⍵,⊂⍬}      (m comb n) ≡ (m,n) cfi1⍤1 0 ⍳m!n1`

cfx obtains by extending ilead and cfi1 to use extended precision operations:

`      ⍝xlead←{  (m n)←⍺ ⋄ a←m xbin n ⋄ i←⍵  bs←{    ⍵<⍺:(n-1)-⍵    d←a-nats m xbin⊢j←⍺+⌊2÷⍨⍵-⍺    ⍎i<nats d: (j+1) ∇ ⍵    ⍎i>nats d: ⍺     ∇ (j-1)    n-j  }  (m-1)bs(n-1)}cfx←{  f←{    (q i z)←⍵    (q-1+k), (⊂i -nats (⍺ xbin q) -nats ⍺ xbin q-k), ⊂z,k←(⍺,q)xlead i  }  (m n)←⍺  (⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,(⊂⍵),⊂⍬}`

Examples:

`      5 ! 5e62.60416E31      ⊢ x←5 xbin 5e626041614583369791656250001000000      5 5e6 cfx x -nats 14999995 4999996 4999997 4999998 4999999      ¯5↑⍳5e64999995 4999996 4999997 4999998 4999999      99!2991.38608E81      99 xbin 299138608382108618824826112784210880186009348866858121623622101121910158544277466      9540      (1+⍳100) ≡ 100 300 cfx 99 xbin 2991`

The last example illustrates that since the number rows of m comb n with a leading 0 is (m-1)!(n-1), the
(m-1)!(n-1)-th row is necessarily 1+⍳m.

Collected Definitions

`      ⍝)copy dfns natscomb←{  (⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺  c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺  (c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)}ifc1←{  (m n)←⍺  1≥m:⊃⍵,0  ((m-1)+.!(n-k)+⍳k) + (⍺-1,1+k) ∇ 1↓⍵-1+k←⊃⍵}ifc  ← {(m n)←⍺ ⋄ -⌿ (m-⍳m) +.! n-(¯1↓0,1+⍵),⍪⍵}xbin ← {(0≥⍺)∨⍺≥⍵:⍕⍺=⍵ ⋄ (⊃ ×nats⌿ ⍵-d) ÷nats (⊃ ×nats⌿ 1+d←⍳⍺⌊⍵-⍺)}xfc  ← {(m n)←⍺ ⋄ ⊃ -nats⌿ (m-⍳m) (+nats).xbin n-(¯1↓0,1+⍵),⍪⍵}cfi  ← {  (m n)←⍺  1≥m:m⍴⍵  s← +⍀ ⊖ (m-1) ! (m-1) ↓ ⍳n  k← 1 ⍳⍨ ⍵<s  k , (1+k) + (⍺-1,1+k)∇(⍵-k⌷0,s)}ilead←{  (m n)←⍺ ⋄ a←m!n ⋄ i←⍵  bs←{    ⍵<⍺: (n-1)-⍵    d←a-m!j←⍺+⌊2÷⍨⍵-⍺    i<d: (j+1) ∇ ⍵    i>d: ⍺     ∇ (j-1)    n-j  }  (m-1)bs(n-1)}cfi1←{  f←{(q i z)←⍵ ⋄ (q-1+k),(i-(⍺!q)-⍺!q-k),⊂z,k←(⍺,q)ilead i}  (m n)←⍺  (⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,⍵,⊂⍬}xlead←{  (m n)←⍺ ⋄ a←m xbin n ⋄ i←⍵  bs←{    ⍵<⍺:(n-1)-⍵    d←a-nats m xbin⊢j←⍺+⌊2÷⍨⍵-⍺    ⍎i<nats d: (j+1) ∇ ⍵    ⍎i>nats d: ⍺     ∇ (j-1)    n-j  }  (m-1)bs(n-1)}cfx←{  f←{    (q i z)←⍵    (q-1+k), (⊂i -nats (⍺ xbin q) -nats ⍺ xbin q-k), ⊂z,k←(⍺,q)xlead i  }  (m n)←⍺  (⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,(⊂⍵),⊂⍬}`

The computation of cfx is rather labored. One has the feeling that it can be simplified by additional insights into the mathematics.
Roger|Dyalog

Posts: 207
Joined: Thu Jul 28, 2011 10:53 am