enumerating combinations

APL-related discussions - a stream of APL consciousness.
Not sure where to start a discussion ? Here's the place to be
Forum rules
This forum is for discussing APL-related issues. If you think that the subject is off-topic, then the Chat forum is probably a better place for your thoughts !

enumerating combinations

Postby Roger|Dyalog on Wed May 27, 2020 5:17 am

A combination of size ⍺ from ⍳⍵ is a sorted vector of length ⍺ with unique items, all of which are in ⍳⍵. The following function from [Hui 1987, §4.1], rewritten in current Dyalog APL, generates all combinations of size ⍺ from ⍳⍵, in sorted order.

      
comb←{
(⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺
c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺
(c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)
}

2 comb 4 3 comb 5 4 comb 6
0 1 0 1 2 0 1 2 3
0 2 0 1 3 0 1 2 4
0 3 0 1 4 0 1 2 5
1 2 0 2 3 0 1 3 4
1 3 0 2 4 0 1 3 5
2 3 0 3 4 0 1 4 5
1 2 3 0 2 3 4
1 2 4 0 2 3 5
1 3 4 0 2 4 5
2 3 4 0 3 4 5
1 2 3 4
1 2 3 5
1 2 4 5
1 3 4 5
2 3 4 5

(Aside: 1+(⍺-1)∇(⍵-1) ←→ ⍺∇⍢(-∘1)⍵ where ⍢ is the prospective under AKA dual operator.)

In this note, we consider the problem of enumerating combinations: Given ⍺ and ⍵ and a combination c, determine the index of that combination in ⍺ comb ⍵. For example, for ⍺←4 and ⍵←6, the index of 0 2 3 4 is 6 and that of 1 2 3 5 is 11. The index is of course (⍺ comb ⍵)⍳c, but we want to compute the index without creating ⍺ comb ⍵.

      
ifc1←{
(m n)←⍺
1≥m:⊃⍵,0
((m-1)+.!(n-k)+⍳k) + (⍺-1,1+k) ∇ 1↓⍵-1+k←⊃⍵
}

4 6 ifc1 0 2 3 4
6
4 6 ifc1⍤1 ⊢4 comb 6
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
(⍳4!6) ≡ 4 6 ifc1⍤1 ⊢4 comb 6
1

In ifc1, k is the leading term of the combination in question. The sum (m-1)+.!(n-k)+⍳k accounts for the combinations whose leading terms are less than k. Using the sum directly as in ifc1 is not satisfactory for an enormous combination such as 23 1e4 ifc1 ¯23↑⍳1e4 (with k=9977).

What is this sum? To use a particular example, the terms of the sum are

      m←3 ⋄ n←8 ⋄ k←4
(m-1)!(n-k)+⍳k
6 10 15 21

and are adjacent entries in Pascal’s triangle, shown in green below:

Image

This is the sum analyzed in Pascal’s Ladder, and the sum equals (m!n)-m!n-k (the blue (56) minus the red (4) ), the difference of two binomial coefficients instead of the sum of k binomial coefficients. Whence:

      ifc← {(m n)←⍺ ⋄ -⌿ (m-⍳m) +.! n-(¯1↓0,1+⍵),⍪⍵}

4 6 ifc 0 2 3 4
6
4 6 ifc⍤1 ⊢4 comb 6
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
(⍳4!6) ≡ 4 6 ifc⍤1 ⊢4 comb 6
1

We now develop an enumeration which works for large combinations, by using the extended precision number facility nats, a monadic operator, from the dfns workspace. The results of nats-derived functions are character vectors of decimal digits; the arguments can be character vectors or scalars or ordinary integers. (We leave as an exercise for the reader the enumeration of humongous combinations, combinations where ⌈/c is 2*53 or larger, combinations whose elements require extended precision integers.)

      )copy dfns nats    ⍝ http://dfns.dyalog.com/n_nats.htm

3 +nats 4
7
(,'7') ≡ 3 +nats 4
1
3 ×nats 4
12
'12' ≡ 3 ×nats 4
1
3 ×nats '4'
12

It is relatively straightforward to model extended precision versions of functions, by replacing + by +nats, × by ×nats, etc.

      xbin← {(0≥⍺)∨⍺≥⍵:⍕⍺=⍵ ⋄ (⊃ ×nats⌿ ⍵-d) ÷nats (⊃ ×nats⌿ 1+d←⍳⍺⌊⍵-⍺)}

11 ! 33
193536720
11 xbin 33
193536720
47 ! 99
4.47391E28
47 xbin 99
44739148260023940935799206928

Translating ifc into the extended precision domain, we get:

      xfc← {(m n)←⍺ ⋄ ⊃ -nats⌿ (m-⍳m) (+nats).xbin n-(¯1↓0,1+⍵),⍪⍵}

4 6 ifc ¯4↑⍳6
14
4 6 xfc ¯4↑⍳6
14
4!6
15

23 1e4 xfc ¯23↑⍳1e4
3771461434429626616429953717057906562371311072521139025732118617119999
23 xbin 1e4
3771461434429626616429953717057906562371311072521139025732118617120000

23!1e4
3.77146E69
0 ⍕ 23!1e4
3771461434429628______________________________________________________

Collected Definitions

      
comb←{
(⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺
c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺
(c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)
}

ifc1←{
(m n)←⍺
1≥m:⊃⍵,0
((m-1)+.!(n-k)+⍳k) + (⍺-1,1+k) ∇ 1↓⍵-1+k←⊃⍵
}

ifc ← {(m n)←⍺ ⋄ -⌿ (m-⍳m) +.! n-(¯1↓0,1+⍵),⍪⍵}

xbin ← {(0≥⍺)∨⍺≥⍵:⍕⍺=⍵ ⋄ (⊃ ×nats⌿ ⍵-d) ÷nats (⊃ ×nats⌿ 1+d←⍳⍺⌊⍵-⍺)}

xfc ← {(m n)←⍺ ⋄ ⊃ -nats⌿ (m-⍳m) (+nats).xbin n-(¯1↓0,1+⍵),⍪⍵}

See Enumerating Permutation for the enumeration of permutations. Portions of this text previously appeared in the J Wiki essay Combination Index, 2005-11-11.
Roger|Dyalog
 
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: enumerating combinations

Postby Roger|Dyalog on Wed Jun 03, 2020 12:55 am

The previous post left some unfinished business, namely, producing the combination from the index. In this post we derive cfi (combination from index) and cfx (combination from extended-precision index), inverses of the ifc and xfc functions. First, cfi.

      
cfi←{
(m n)←⍺
1≥m:m⍴⍵
s← +⍀ ⊖ (m-1) ! (m-1) ↓ ⍳n
k← 1 ⍳⍨ ⍵<s
k , (1+k) + (⍺-1,1+k)∇(⍵-k⌷0,s)
}

m←4 ⋄ n←9
(m,n) cfi 0
0 1 2 3
(m,n) cfi ¯1+m!n
5 6 7 8

comb←{
(⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺
c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺
(c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)
}

(m comb n) ≡ (m,n) cfi⍤1 0 ⊢⍳m!n
1

The expression ⊖(m-1)!(m-1)↓⍳n in cfi computes the same vector as c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺ in comb, with ⍺=m and ⍵=n. c[k] is the number of combinations whose leading item is k. For combination index ⍵ in cfi, the leading item of the corresponding combination is 1⍳⍨⍵<+⍀c.

      ⊖(m-1)!(m-1)↓⍳n
56 35 20 10 4 1
⊢ c← (m-1)!(m-1)+⊖k←⍳1+n-m
56 35 20 10 4 1

⊢ c49 ←m comb n
0 1 2 3
0 1 2 4
0 1 2 5
0 1 2 6
0 1 2 7
...
4 5 7 8
4 6 7 8
5 6 7 8
{≢⍵}⌸c49[;0]
56 35 20 10 4 1

In cfx, the binomial coefficients in c, (m-1)!(m-1)+⊖k←⍳1+n-m, and the number of them can both be very large. The sum +\c is seen to be equivalent to (m!n)-m!(m-1)+⊖⍳1+n-m using reasoning similar to the Pascal’s Ladder derivation for ifc.

      +\c
56 91 111 121 125 126
(m!n)-m!(m-1)+⊖⍳1+n-m
56 91 111 121 125 126

The latter expression is amenable to binary search. Such search can avoid computing a binomial coefficient until it is needed.

      
ilead←{
(m n)←⍺ ⋄ a←m!n ⋄ i←⍵
bs←{
⍵<⍺: (n-1)-⍵
d←a-m!j←⍺+⌊2÷⍨⍵-⍺
i<d: (j+1) ∇ ⍵
i>d: ⍺ ∇ (j-1)
n-j
}
(m-1)bs(n-1)
}

(m,n) ilead⍤1 0 ⍳m!n
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 5

⍉ {⍺,≢⍵} ⌸(m,n) ilead⍤1 0⍳m!n
0 1 2 3 4 5
56 35 20 10 4 1

As an intermediate step towards cfx, we write a version of cfi which uses ilead:

      
cfi1←{
f←{(q i z)←⍵ ⋄ (q-1+k),(i-(⍺!q)-⍺!q-k),⊂z,k←(⍺,q)ilead i}
(m n)←⍺
(⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,⍵,⊂⍬
}

(m comb n) ≡ (m,n) cfi1⍤1 0 ⍳m!n
1

cfx obtains by extending ilead and cfi1 to use extended precision operations:

      
xlead←{
(m n)←⍺ ⋄ a←m xbin n ⋄ i←⍵
bs←{
⍵<⍺:(n-1)-⍵
d←a-nats m xbin⊢j←⍺+⌊2÷⍨⍵-⍺
⍎i<nats d: (j+1) ∇ ⍵
⍎i>nats d: ⍺ ∇ (j-1)
n-j
}
(m-1)bs(n-1)
}

cfx←{
f←{
(q i z)←⍵
(q-1+k), (⊂i -nats (⍺ xbin q) -nats ⍺ xbin q-k), ⊂z,k←(⍺,q)xlead i
}
(m n)←⍺
(⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,(⊂⍵),⊂⍬
}

Examples:

      5 ! 5e6
2.60416E31
⊢ x←5 xbin 5e6
26041614583369791656250001000000
5 5e6 cfx x -nats 1
4999995 4999996 4999997 4999998 4999999
¯5↑⍳5e6
4999995 4999996 4999997 4999998 4999999

99!299
1.38608E81
99 xbin 299
138608382108618824826112784210880186009348866858121623622101121910158544277466
9540

(1+⍳100) ≡ 100 300 cfx 99 xbin 299
1

The last example illustrates that since the number rows of m comb n with a leading 0 is (m-1)!(n-1), the
(m-1)!(n-1)-th row is necessarily 1+⍳m.

Collected Definitions

      
)copy dfns nats

comb←{
(⍺=0)∨⍺=⍵:(1,⍺)⍴⍳⍺
c←(⍺-1)!(⍺-1)+⊖k←⍳1+⍵-⍺
(c⌿k), ⊃ ⍪⌿ (-c) ↑¨ ⊂1+(⍺-1)∇(⍵-1)
}

ifc1←{
(m n)←⍺
1≥m:⊃⍵,0
((m-1)+.!(n-k)+⍳k) + (⍺-1,1+k) ∇ 1↓⍵-1+k←⊃⍵
}

ifc ← {(m n)←⍺ ⋄ -⌿ (m-⍳m) +.! n-(¯1↓0,1+⍵),⍪⍵}

xbin ← {(0≥⍺)∨⍺≥⍵:⍕⍺=⍵ ⋄ (⊃ ×nats⌿ ⍵-d) ÷nats (⊃ ×nats⌿ 1+d←⍳⍺⌊⍵-⍺)}

xfc ← {(m n)←⍺ ⋄ ⊃ -nats⌿ (m-⍳m) (+nats).xbin n-(¯1↓0,1+⍵),⍪⍵}

cfi ← {
(m n)←⍺
1≥m:m⍴⍵
s← +⍀ ⊖ (m-1) ! (m-1) ↓ ⍳n
k← 1 ⍳⍨ ⍵<s
k , (1+k) + (⍺-1,1+k)∇(⍵-k⌷0,s)
}

ilead←{
(m n)←⍺ ⋄ a←m!n ⋄ i←⍵
bs←{
⍵<⍺: (n-1)-⍵
d←a-m!j←⍺+⌊2÷⍨⍵-⍺
i<d: (j+1) ∇ ⍵
i>d: ⍺ ∇ (j-1)
n-j
}
(m-1)bs(n-1)
}

cfi1←{
f←{(q i z)←⍵ ⋄ (q-1+k),(i-(⍺!q)-⍺!q-k),⊂z,k←(⍺,q)ilead i}
(m n)←⍺
(⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,⍵,⊂⍬
}

xlead←{
(m n)←⍺ ⋄ a←m xbin n ⋄ i←⍵
bs←{
⍵<⍺:(n-1)-⍵
d←a-nats m xbin⊢j←⍺+⌊2÷⍨⍵-⍺
⍎i<nats d: (j+1) ∇ ⍵
⍎i>nats d: ⍺ ∇ (j-1)
n-j
}
(m-1)bs(n-1)
}

cfx←{
f←{
(q i z)←⍵
(q-1+k), (⊂i -nats (⍺ xbin q) -nats ⍺ xbin q-k), ⊂z,k←(⍺,q)xlead i
}
(m n)←⍺
(⍳m) + +⍀ ⊃⌽ ⊃ f⌿ (1+⍳m),⊂n,(⊂⍵),⊂⍬
}

The computation of cfx is rather labored. One has the feeling that it can be simplified by additional insights into the mathematics.
Roger|Dyalog
 
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am


Return to APL Chat

Who is online

Users browsing this forum: No registered users and 1 guest