## * versus ⍟ performance

APL-related discussions - a stream of APL consciousness.
Not sure where to start a discussion ? Here's the place to be
Forum rules
This forum is for discussing APL-related issues. If you think that the subject is off-topic, then the Chat forum is probably a better place for your thoughts !

### * versus ⍟ performance

0. A benchmark was posted in the APL Orchard on 2021-02-27:

`      b←1+?⍨1e5  ⍝ with ⎕io←0; ⎕io delenda est!      cmpx '0.1*b' '5⍟b' '10⍟b' '17⍟b'  0.1*b → 9.0E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕* 5⍟b   → 1.4E¯3 | -84% ⎕⎕⎕⎕⎕⎕* 10⍟b  → 1.2E¯3 | -87% ⎕⎕⎕⎕⎕* 17⍟b  → 1.4E¯3 | -84% ⎕⎕⎕⎕⎕⎕`

The questions are, why is ⍺*⍵ so much slower than ⍺⍟⍵? And shouldn't power be faster than logarithm?

Explaining the relative speeds of * and ⍟ (both monadic and dyadic) is complicated.

1. *⍵ is computed by exp() and ⍟⍵ is computed by log() or log2() or log10(), C library functions all.

2. Mathematically and in the Dyalog implementation, ⍺⍟⍵ ←→ (⍟⍵)÷⍟⍺.

Mathematically, ⍺*⍵ ←→ *⍵×⍟⍺. In the Dyalog implementation, when ⍺≥0, ⍺*⍵ is computed by pow(), a C library function.

3. scalar⍟⍵ invokes different C library functions when scalar is 2 (log2), *1 (log), or 10 (log10). These do not make a big difference in speed.

`      e←*1      cmpx '2⍟b' 'e⍟b' '10⍟b' '5⍟b' '17⍟b'  2⍟b  → 1.70E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕* e⍟b  → 1.09E¯3 | -37% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕          * 10⍟b → 1.18E¯3 | -31% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕        * 5⍟b  → 1.39E¯3 | -19% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      * 17⍟b → 1.38E¯3 | -19% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕`

In fact:

`      cmpx '2⍟b' '(⍟b)÷⍟2'  2⍟b     → 1.70E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  (⍟b)÷⍟2 → 8.94E¯4 | -48% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                    cmpx '10⍟b' '(⍟b)÷⍟10'  10⍟b     → 1.18E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  (⍟b)÷⍟10 → 8.96E¯4 | -25% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕            x←(⍴b)⍴10      cmpx 'x⍟b' '(⍟b)÷⍟x'  x⍟b     → 2.82E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  (⍟b)÷⍟x → 1.56E¯3 | -45% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕`

I suspect the verboser APL expressions are faster because the divides are done using vector instructions, whereas the Dyalog implementation of ⍺⍟⍵ does scalar divides in a C loop, log(⍵[j])/log(⍺[j]) or log(⍵[j])/d, where d←log(scalar).

4. As stated above, ⍺*⍵ when ⍺≥0 is done using the C library function pow(). Again, using a mathematically equivalent expression, *⍵×⍟⍺, is faster, I suspect for the same reason why (⍟⍵)÷⍟⍺ is faster than ⍺⍟⍵--the divides are done using vector instructions.

`      cmpx '0.1*b' '*b×⍟0.1'  0.1*b   → 7.79E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕* *b×⍟0.1 → 4.19E¯3 | -47% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕         reldiff←{⌈/,(|⍺-⍵)÷(|⍺)⌈(|⍵)}   ⍝ maximum relative difference                 (0.1*b) reldiff *b×⍟0.18.72981E¯14`

(The last expression indicates that 0.1*b and *b×⍟0.1 are "equal", despite the asterisk in the first column of the cmpx result.)

In addition, I suspect 0.1*b and *b×⍟0.1 suffer in speed from hitting lots of underflows. (⍺*⍵ for finite arguments should not be 0, other than underflows. Likewise *⍵.)

`      0.1*123450      +/ 0 = 0.1*b99677`

What if there are very few or no underflows?

`      b1←b÷1e3      +/ 0 = 0.1*b10      cmpx '0.1*b1' '*b1×⍟0.1'  0.1*b1   → 5.91E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕* *b1×⍟0.1 → 1.14E¯3 | -81% ⎕⎕⎕⎕⎕⎕                              (0.1*b1) reldiff *b1×⍟0.13.12521E¯14`

5. In a hypothetical situation where I get to write the C library functions, I believe log() can be faster than exp():

You can have a precomputed table so that ⍟⍵ only need to be computed in detail in the range ÷√2 and √2 (about 0.7 and 1.4). For ⍵ in that range, equation 4.1.27 in Abramowitz and Stegun

`      ⍟⍵ ←→ 2 × +/ j ÷⍨ ((⍵-1)÷(⍵+1)) * j←1+2×⍳n`

provides a fast converging series for t←(⍵-1)÷(⍵+1), or 0.171573 or less. The terms of the series are odd powers of t, which means each term is smaller than the previous by a factor of t*2, or 0.0294373, which declines to 0 rapidly. See the J Wiki essay, Extended Precision Functions.

On the other hand, *⍵ would be computed by +/(⍵*j)÷!j←⍳n. You can have a precomputed table here too so that *⍵ only need to use the series for small |⍵ (<1), but you are still operating on both even and odd powers.

6. Let us do one big happy benchmark:

`      x←?1e5⍴0      y←?1e5⍴0      cmpx '⍟x' 'x⍟y' '(⍟y)÷⍟x' '*x' 'x*y' '*y×⍟x'  ⍟x      → 6.43E¯4 |    0% ⎕⎕⎕                          * x⍟y     → 2.81E¯3 | +337% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                * (⍟y)÷⍟x → 1.59E¯3 | +148% ⎕⎕⎕⎕⎕⎕⎕⎕                      * *x      → 1.11E¯3 |  +72% ⎕⎕⎕⎕⎕                        * x*y     → 6.21E¯3 | +865% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕* *y×⍟x   → 1.82E¯3 | +182% ⎕⎕⎕⎕⎕⎕⎕⎕⎕`

If you compare ⍟x to *x (6.43e¯4÷⍨1.11e¯3 or 1.73), and (⍟y)÷⍟x to *y×⍟x (1.59e¯3÷⍨1.82e¯3 or 1.14), the differences aren't so large, certainly not large enough (no greater than a factor of 2) to sweat over.

7. Some morals of this story:

• There are lies, damned lies, and benchmarks.
• Is it really worth it to have extra code for 2⍟⍵ and (*1)⍟⍵ and 10⍟⍵, when the benefits are not that significant?
• Speed improvements made in one area (e.g. divide using vector instructions) create opportunities in other areas.
Finally, I don't recommend that y'all start coding (⍟⍵)÷(⍟⍺) instead of ⍺⍟⍵, or *⍵×⍟⍺ instead of ⍺*⍵. Personally, I expect Dyalog will improve * and ⍟ sometime, although not in the immediate future, and certainly not to the extent of writing replacements for C library functions.
Roger|Dyalog

Posts: 238
Joined: Thu Jul 28, 2011 10:53 am