## Permutations and Anagrams

This is a rather long thread, and what follows are only my messages on the subject. There are others by Roger Hui and Eugene McDonnell that I haven't included here. (Get a copy of the APL Newsreader and comp.lang.apl archive to read the other postings!) The messages include:

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 5 Jun 1995 13:04:11 GMT
Subject: Puzzler: Permutations and Anagrams

Here's the programming puzzle from the 1995 Quarter 2 issue of the Zark APL Tutor Newsletter. Perhaps someone at APL95 would be kind enough to print this out (please untransliterate first!), post it, and encourage attendees to submit solutions.

LIMBERING UP:

Permutations and Anagrams

In the 1992 Quarter 2 issue of the Zark APL Tutor News, a highly efficient (though obscure) function for constructing combinations was presented:

```
     {del} R{<-}M COMBIN N;D;E;F;G;P
[1]    @ Returns a matrix of every possible
[2]    @ combination of M elements from the
[3]    @ vector {iota}N.  That is, returns a
[4]    @ matrix with M!N rows and M columns.
[5]    @
[6]    E{<-}({iota}P{<-}N-R{<-}M-1)-#IO
[7]    D{<-}R+{iota}P
[8]    R{<-}(P,1){rho}D
[9]    P{<-}P{rho}1
[10]  L1:{->}(#IO>1{take}D{<-}D-1){rho}0
[11]   P{<-}+\P
[12]   G{<-}+\{neg}1{drop}0,F{<-}{reverse}P
[13]   E{<-}F/E-G
[14]   R{<-}(F/D),R[E+{iota}{rho}E;]
[15]   E{<-}G
[16]   {->}L1
{del}

3 COMBIN 5
1 2 3
1 2 4
1 2 5
1 3 4
1 3 5
1 4 5
2 3 4
2 3 5
2 4 5
3 4 5
```
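For readers without an APL system handy, the table above can be cross-checked with Python's itertools (this is only a check of the expected output, not Gary's algorithm):

```python
from itertools import combinations

# every 3-element combination of 1..5, in the same row order as 3 COMBIN 5
rows = list(combinations(range(1, 6), 3))
print(len(rows))   # 10 rows (3!5 in APL notation), starting with (1, 2, 3)
```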

Your task is to write a monadic function PERMUTE, whose argument is a positive integer N and whose result is an N-column, (!N)-row matrix, in which each row is a different permutation of {iota}N.

```
      PERMUTE 3
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
```

Try to make the function as efficient as possible. In case you don't find this exercise adequately limbering, your second task is to write a monadic function ANAGRAM, whose argument is a character vector, and whose result is a character matrix in which each row is a different arrangement of the argument's letters. (ANAGRAM calls PERMUTE.)

```
      ANAGRAM 'SMSI'
IMSS
ISMS
ISSM
MISS
MSIS
MSSI
SIMS
SISM
SMIS
SMSI
SSIM
SSMI
```

Mail your solution on paper or APL*PLUS PC or APL*PLUS II disk to:

Zark Incorporated
23 Ketchbrook Lane
Ellington, CT 06029
U.S.A.

[And please post it here in comp.lang.apl, too! -JimW] The notable functions and their authors' names will be printed in the next issue of the Zark APL Tutor News. Entries must be received by July 15, 1995. Good luck and happy limbering.

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 9 Jun 1995 14:33:51 GMT
Subject: Puzzler: Permutations and Anagrams

Well, Gary asked us to try to make the function "as efficient as possible". Assuming this means "as fast as possible", here's my solution:

```
      {del}Z{<-}PERM N
[1]    @Really fast version
[2]    Z{<-}{disclose}PERMS[1;#IO+N]
{del}
```

All possible results are precomputed and stored in a 2-by-X matrix PERMS. The columns of PERMS hold the results for arguments of 0, 1, 2, etc., with the first row holding the 1-origin results and the second row the 0-origin results. This is practical because PERM can be called with only a few different arguments. For arguments of 8, 9, and 10, the results are 1.3, 13, and 145 million bytes in size, one of which is likely to be the practical upper bound for today's computers. Out-of-range arguments will give an INDEX ERROR instead of WS FULL, but you could correct that with some additional checking.
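The look-up scheme is language-independent; here is a rough Python sketch of the same precompute-then-index idea (the names PERMS and perm are mine, and a list of tuples stands in for the APL matrix):

```python
from itertools import permutations

# Precompute the full answer for every argument we expect to see.
PERMS = [sorted(permutations(range(1, n + 1))) for n in range(7)]

def perm(n):
    """Table look-up: no computation at call time."""
    return PERMS[n]   # out-of-range n raises IndexError, not "WS FULL"

print(len(perm(5)))   # 120
```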

Of course you also need a program to generate the PERMS global variable. Here's what seems to be a reasonable solution. #IO is 1.

```
     {del} Z{<-}PERM1 N;A;D;S
[1]    @Returns a (!{omega})-by-{omega} matrix of all permutations of {+
+}{iota}{omega}
[2]    Z{<-}1 1{rho}1            @ start with the order-1 matrix
[3]   L1:S{<-}{rho}Z            @ Loop
[4]    D{<-}S[2]+1              @   index of column we will add
[5]    {->}(D>N)/0              @   quit if we have enough columns
[6]    Z{<-}(S{times}D,1){rho}Z @   make D duplicate copies of Z atop {+
+}each other
[7]    A{<-}S[1]/{iota}D        @   shift each copy by a different amount
[8]    Z{<-}(-A){rotate}D,A{rotate}Z @   insert D in a different column {+
+}for each copy
[9]    {->}L1                   @ Endloop
{del}
```

(Exposure time and f-stop not recorded, and not very relevant because I swiped the algorithm from Knuth's TAOCP Vol. 1.) The basic idea is that, given a single order-3 permutation such as 3 1 2, you can generate order-4 permutations by inserting 4 in all possible locations:

```
                4 3 1 2
3 4 1 2
3 1 4 2
3 1 2 4
```

This is approximately what line [8] is doing. If Z is 4 3{rho}3 1 2 and A is 0 1 2 3, then A{rotate}Z shifts the insertion points to the first column, "D," does the "insertion", and (-A){rotate} restores the original alignment. When you repeat this process for each order-3 permutation, you get all possible order-4 permutations.
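The insertion induction is easy to restate outside APL. A minimal Python sketch of the same idea (my own loop, not a translation of PERM1, and like PERM1 it returns the rows in no particular order):

```python
def perms_by_insertion(n):
    """Order-n permutations built by inserting n into every
    position of every order-(n-1) permutation."""
    rows = [[1]]                        # the single order-1 permutation
    for d in range(2, n + 1):
        rows = [p[:i] + [d] + p[i:]     # insert d at position i
                for p in rows
                for i in range(d)]
    return rows

print(len(perms_by_insertion(4)))   # 24
```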

Using PERM1, you can define PERMS as:

```
      PERMS{<-}0 {neg}1{jot}.+PERM1{each}0,{iota}N
```

With APL*PLUS II/386 v5.2 on my 486/66 and a #WA of about 7.8E6, the statement above runs in 2.11 secs for N=8. Then, PERM takes about 0.0006 secs to return any result.

The original problem didn't say anything about row ordering, and PERM1 doesn't return its result in any particular order. If lexical order is desired, the result can be sorted with Z{<-}Z[{gradeup}Z;].

In writing ANAGRAM, you have to be careful about duplicate letters in the argument--they shouldn't result in duplicate entries in the result. Here's a straightforward version that calls PERM:

```
     {del} Z{<-}ANAGRAM C
[1]    @Returns all anagrams of {omega}
[2]    C{<-},C                           @ arg might be scalar
[3]    Z{<-}C[PERM {rho}C]               @ all arrangements of C
[4]    Z{<-}Z[#AV{gradeup}Z;]          @ alphabetize (dups adjacent)
[5]    {->}((C{iota}C)^.={iota}{rho}C)/0 @ If any duplicate letters,
[6]    Z{<-}(1,1{drop}{or}/Z{/=}{neg}1{rotatebar}Z){slashbar}Z {+
+} @    remove dups from the result
{del}
```

It may be possible to write a more efficient version that deals with duplicate letters directly instead of calling PERM.
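One such direct approach, sketched in Python (a recursive scheme of my own, not a claim about how it should be done in APL): pick each distinct letter for the first position and recur on the remainder, so duplicates are never generated in the first place.

```python
def anagrams(s):
    """All distinct arrangements of s, in alphabetical order,
    without ever generating a duplicate."""
    if len(s) <= 1:
        return [s]
    out = []
    for ch in sorted(set(s)):            # each distinct letter once
        rest = s.replace(ch, '', 1)      # remove one occurrence
        out.extend(ch + t for t in anagrams(rest))
    return out

print(anagrams('SMSI')[:3])   # ['IMSS', 'ISMS', 'ISSM']
```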

Jim

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 12 Jun 1995 17:32:52 GMT
Subject: Re: Puzzler: Permutations and Anagrams

On June 10, Roger Hui wrote:

It may be informative and amusing to compare the time to compute "anagram" in APL and in J, where APL computes permutations by table look-up (as above) and J computes them from scratch.

Actually, using table look-up in ANAGRAM doesn't save as much time as you might think. Alphabetizing the result takes longer than building the permutation matrix from scratch. Using the #MF monitoring facility of APL*PLUS II/386, here are the line-by-line execution times for ANAGRAM (modified to eliminate table look-up):

```
      1 #MF'ANAGRAM2'          @ activate monitoring
1
Z{<-}ANAGRAM2'ANAGRAM'   @ run the function

]LTIMES ANAGRAM2         @ display a report

All  Here  Iter      {del} Z{<-}ANAGRAM2 C
0     0     0  [1]   @Returns all anagrams of {omega}
0     0     1  [2]   C{<-},C
174     0     1  [3]   Z{<-}PERM1 {rho}C       @ build permutation matrix
25    25     1  [4]   Z{<-}C[Z]
336   336     1  [5]   Z{<-}Z[#AV{gradeup}Z;]  @ alphabetize
1     1     1  [6]   {->}((C{iota}C)^.={iota}{rho}C)/0
91    91     1  [7]   Z{<-}(1,1{drop}{or}/Z{/=}{neg}1{rotatebar}Z){slashbar}Z
{del}
627   452     1
```

The first two columns of the report are execution time in milliseconds on my 486/66. The "All" column gives the total time spent on each line, including time spent in subroutines. The "Here" column excludes time spent in subroutines. "Iter" is the number of times each line was executed. The last row is totals.

A much faster way of removing duplicate rows (without alphabetizing) is by using the MATIOTA (aka ROWFIND) fastfn. MATIOTA is similar to dyadic iota for character matrices--it looks up each row of the right argument in the left argument, and it can take the place of {iota} in the classic "nodups" idiom. Here's the execution profile for a version of ANAGRAM that uses this technique:

```
      1 #MF'ANAGRAM3'
1
Y{<-}ANAGRAM3 'ANAGRAM'

]LTIMES ANAGRAM3

All  Here  Iter      {del} Z{<-}ANAGRAM3 C
0     0     0  [1]   @Returns all anagrams of {omega}
0     0     1  [2]   C{<-},C
176     0     1  [3]   Z{<-}PERM1 {rho}C
26    26     1  [4]   Z{<-}C[Z]
58    12     1  [5]   Z{<-}((Z MATIOTA Z)={iota}1{take}{rho}Z){slashbar}Z
{del}
261    38     1
```

MATIOTA is written in assembler and uses hashing to reduce the number of rows that are compared. Although using a non-APL subroutine is not as satisfying as using standard built-in operations, the subroutine is, at least, supplied with the APL system, so ANAGRAM3 qualifies as an out-of-the-box "stock" solution (for APL*PLUS, at least).
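The nodups idiom itself ((M MATIOTA M)={iota}rows keeps each row the first time it occurs) maps directly onto a hash table; a Python sketch of the idiom (not of MATIOTA's assembler internals):

```python
def nodups(rows):
    """Classic nodups: row i survives if its first occurrence is i."""
    first = {}
    for i, r in enumerate(rows):
        first.setdefault(r, i)           # remember first occurrence only
    return [r for i, r in enumerate(rows) if first[r] == i]

print(nodups([(1, 2), (3, 4), (1, 2)]))   # [(1, 2), (3, 4)]
```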

In spite of the faster execution, the APL version is still slower than Roger's version running on J 2.06:

```
                     Argument

System     'ANAGRAM'  'ABCDEFG'    Function
--------    ---------  ---------    --------
+II 5.2      0.636      0.486      ANAGRAM2
+III 1.1      0.450      0.373      ANAGRAM2

+II 5.2      0.253      0.257      ANAGRAM3
+III 1.1      0.236      0.242      ANAGRAM3

J 2.03      0.379      0.379      anagram
J 2.06      0.117      0.121      anagram    (timed by Roger)
```

Times are seconds on a 486/66. (I multiplied Roger's 486/50 times by 50{divide}66 to make them comparable with my times.)

Some observations on the times: APL*PLUS III is significantly faster than APL*PLUS II. The speedups illustrated above for ANAGRAM come mostly from improvements to gradeup. On +II, gradeup's execution time depends on the data in the argument; on III, execution is faster and the time appears to depend only on the size of the argument. Once again, J 2.06 demonstrates that it is much faster than J 2.03.

Jim

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 14 Jun 1995 00:49:41 GMT
Subject: Re: Puzzler: Permutations and Anagrams

For what it's worth, here are two more ways of computing a permutation matrix.

```
      {del}Z{<-}PERM2 N;C;I
[1]    @Returns a (!{omega})-by-{omega} matrix of all permutations of{+
+} {iota}{omega}
[2]    Z{<-}1 0{rho}I{<-}0
[3]   L1:{->}(N<I{<-}I+1)/0         @ Loop for each column
[4]    C{<-}(1{take}{rho}Z)/{iota}I @   new column to put in front
[5]    Z{<-}(({rho}C),I-1){rho}Z    @   stack I copies of Z atop each other
[6]    Z{<-}C,Z+Z{>=}{transpose}({reverse}{rho}Z){rho}C {+
+} @    recode Z and put on the col
[7]    {->}L1                       @ Endloop
{del}
```

This version returns the permutation matrix in lexical order, adapts to the current index origin, returns the correct result for N=0, and runs a bit faster than PERM1.

I derived this algorithm by looking at section 4 of an order-four permutation matrix (the rows that have 4 in the first column):

```
                4 1 2 3
4 1 3 2
4 2 1 3
4 2 3 1
4 3 1 2
4 3 2 1
```

The stuff to the right of the first column is just an order-three permutation matrix, so this section can be built by induction. Section 3 of the order-four matrix is similar, but the 3s in the order-three matrix are replaced by 4s. And in section 2, the 2s are replaced by 3s and the 3s by 4s. So the first row of each section looks like this:

```
                1 2 3 4
2 1 3 4
3 1 2 4
4 1 2 3
```

At first I thought of using indexing to recode P, the order-three permutation matrix (e.g., 2,(1 3 4)[P] generates section 2), but then I realized that the recoding simply increments the elements of P that are {>=} the number in the first column. This can be done by adding a Boolean matrix to P, as in 2,P+P{>=}2 which generates section 2. Line [4] of PERM2 generates the new first column, and on line [6] the reshape-transpose duplicates the first column I times for comparison with Z.
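The same induction reads naturally in Python (a sketch with lists of lists; the comprehension is mine, but the recoding step is exactly the P+P{>=}K trick described above):

```python
def perm2(n):
    """Permutations of 1..n in lexical order: section k prepends k
    and increments the entries of the lower-order rows that are >= k."""
    rows = [[]]                                  # order 0: one empty row
    for i in range(1, n + 1):
        rows = [[k] + [x + (x >= k) for x in p]  # recode, then prepend k
                for k in range(1, i + 1)
                for p in rows]
    return rows

print(perm2(3))   # [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```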

After writing PERM2, I finally managed to figure out what Roger's J program is doing. Here is an APL translation:

```
      {del}Z{<-}PERM3 N;I;T
[1]    @Roger Hui's perm algorithm, in APL
[2]    Z{<-}1 0{rho}I{<-}0
[3]   L1:{->}(N<I{<-}I+1)/0
[4]    T{<-}{gradedown}{each}{enclose}[1]({iota}I){jot}.={iota}I
[5]    Z{<-}{disclose}{commabar}/T NDX{each}{enclose}#IO,1+Z
[6]    {->}L1
{del}
```

NDX is just {alpha}[{omega}], needed for applying each to indexing. The {enclose}[1] on line [4] should be {split} in APL*PLUS Evlevel 1 or Dyalog APL. For I=4, the items of T on line [4] are the rows of the first-row-of-each-section matrix shown above. Indexing each item with #IO,1+Z gives the sections of the next higher-order permutation matrix as a vector of matrices, and {disclose}{commabar}/ stacks the sections atop each other, forming a flat matrix.

Here are some timings:

```
                        Argument
6      7      8
+II 5.2
PERM1          .025   .178   1.59
PERM2          .021   .137   1.36
PERM3          .023   .098   1.37

+III 1.1
PERM1          .021   .134   1.63
PERM2          .019   .132   1.38
PERM3          .022   .093   1.00

perm
J 2.03         .077   .244   1.69
J 2.06         .017   .037    .28   (timed by Roger Hui)
```

Times are seconds on my 486/66. The J 2.06 timings are from Roger's June 10 posting, scaled by 50{divide}66. The performance of J 2.06 is nothing short of astonishing--perm 8 apparently runs fully six times faster on J 2.06 than on 2.03!

Jim

From: Jim Weigang <jimw@MATH.UMASS.EDU>
Newsgroups: comp.lang.apl
Date: 14 Jun 95 17:30:27 GMT
Subject: Re: Puzzler: Permutations and Anagrams

I wrote PERM2 because I thought that if the permutation matrix P was in lexical order and I sorted the letter list C, then the anagram matrix C[P] would be in alphabetical order, and I could just shift and compare to remove duplicates. However, on trying this, I found out I was wrong! C[P] is not in alphabetical order. To see why, consider the letters 'AABC'. Both 1 and 2 in the numeric matrix P become 'A' in C[P]. When sorting P, if one row has 1 in the first column and another has 2, you can stop right there and put the 1-row before the 2-row. But in the character matrix, both 1 and 2 are As, and the 2-row might actually belong before the 1-row, depending on what's in later columns.
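A two-line experiment confirms it (Python; sorted index permutations from itertools stand in for the in-order matrix P):

```python
from itertools import permutations

C = 'AABC'   # sorted letter list with a duplicate
rows = [''.join(C[i] for i in p) for p in sorted(permutations(range(4)))]
print(rows == sorted(rows))   # False: index order is not letter order
```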

Jeff Chilton suggested the following:

'ABCC' indexed by the in-order permutation matrix is not an alphabetical matrix but the rows that are out of order are the ones you want to delete--algorithmically, examine the second through last rows of the anagram matrix and remove each one that is less than or equal to its last non-deleted predecessor.

Here's one way of doing this in APL:

```
     {del}Z{<-}ANAGRAM4 C;B;D;J;K
[1]    @Jeff Chilton's algorithm
[2]    Z{<-}PERM2 {rho}C{<-},C  @ numeric permutation matrix
[3]    C{<-}C[#AV{gradeup}C]    @ alphabetize the letter list
[4]    {->}(({iota}{rho}C)^.=D{<-}C{iota}C)/L1 {+
+} @ If duplicate letters in C,
[5]    J{<-}({rho}C){basevalue}{transpose}D[Z] {+
+} @   recode Z, then pack each row into a number
[6]    K{<-}(B{<-}J={max}\J)/J  @   remove out-of-order entries
[7]    Z{<-}(B\1,1{drop}K{/=}{neg}1{rotate}K){slashbar}Z {+
+} @   remove duplicate entries as well
[8]   L1:Z{<-}C[Z]              @ return letters
{del}
```

Line [5] recodes the full permutation matrix so it has the same duplicates as C, then packs each row into a single number. Duplicate rows will have identical numbers. Lines [6] and [7] do what Jeff suggested. This program isn't quite as fast as the MATIOTA version, but it's a lot faster than the sorting version (and it's pure APL). Incidentally, using PERM2 in the MATIOTA version and sorting the letter list before indexing with the permutation matrix will cause the anagram matrix to be in alphabetical order.
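Jeff's scheme is language-independent. A Python sketch (with string comparison standing in for the packed base-{rho}C numbers, which play exactly the same ordering role):

```python
from itertools import permutations

def anagram4(s):
    """Chilton: walk the in-order permutations of the sorted letters
    and drop every row that is <= its last surviving predecessor."""
    letters = sorted(s)
    out, last = [], ''
    for p in sorted(permutations(range(len(s)))):
        row = ''.join(letters[i] for i in p)
        if row > last:                   # keep strictly increasing rows
            out.append(row)
            last = row
    return out

print(anagram4('AAB'))   # ['AAB', 'ABA', 'BAA']
```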

Jim

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 17 Jun 1995 03:17:52 GMT
Subject: Re: Puzzler: Permutations and Anagrams

This is an analysis and comparison of the J and APL implementations of Roger Hui's p4 perm function (listed in his June 10 comp.lang.apl posting).

The main statement in Roger's function is:

```
        z=. ,/ (0,.1+z) {"2 1 \:"1 =i.j
```

This is executed repeatedly with j=1, 2, 3, etc. First, a description of the steps and some commentary from an APLer who is just learning to creep in the world of J.

i. j - {iota}j. The . and : suffixes are one of the easier aspects of J to get used to. They serve a role similar to the {quad} prefix of APL, giving the J system two entire "namespaces" for built-in system functions. One can only wish the foreign conjunctions had been given names like "timer." instead of expecting users to define their own covers for 6!:2.

1=i.j - No, the = is not dyadic. The 1 is grabbed by the " to its left, leaving the = as a monadic function. This is an example of an aspect of J that is fundamentally more difficult than APL-- figuring out what's an argument to what. Monadic = is "self-classify"; for a vector X, =X is ({unique}X){jot}.=X, and =i.j is a j-by-j identity matrix.

Roger Hui pointed out in a later message that the 1 being grabbed by an operator to its left is a property that J shares with APL2. (I was not aware of this because APL*PLUS has neither user-defined operators nor any operators that take array arguments.) However, I stand by my point that figuring out what's an argument to what is more difficult in J than in APL.

\:"1 - grade down (\:) with rank 1. The rank operator causes the grade argument to be viewed as an array of rank-1 objects. Because the argument is a matrix, it is viewed as a vector of vectors, with each row being an item of the top-level vector. Grade is applied to each row in turn, with the permutation vector being placed in the corresponding row of the matrix result.
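Grade-down of the rows of an identity matrix is what produces the "first row of each section" table used later; a Python sketch (0-origin, with a stable sort key playing the part of the grade's tie rule):

```python
def grade_down(v):
    """APL/J grade-down, 0-origin: indices that sort v into
    descending order, ties broken by ascending index."""
    return sorted(range(len(v)), key=lambda i: (-v[i], i))

# \:"1 =i.4 -- grade down each row of the 4-by-4 identity matrix
eye = [[int(i == j) for j in range(4)] for i in range(4)]
print([grade_down(row) for row in eye])
# [[0, 1, 2, 3], [1, 0, 2, 3], [2, 0, 1, 3], [3, 0, 1, 2]]
```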

Rank is a much richer way of describing repetition than each. In APL2 and its kin, you have one very simple repetition operator (each), and to do anything complicated you have to transform the data structure to specify the axes you want repetition over. For example, to do \:"1 M in APL2, you would use {enclose}[2]M to hide the column axis, apply grade-each, then use {disclose} to restore the column axis. The trouble is, the enclose and disclose actually result in data movement, which is not so good for efficiency, and they require some thought on the part of the reader to figure out what is going on. With rank, the data sits still and the numeric right argument describes the repetition. On the other hand, rank works only if the overall result can fit within the confines of a rectangular multidimensional array.* But, as the success of non-nested APL demonstrates, you can do quite a lot within those limits.

* Roger pointed out that this is not correct. He wrote: "The individual results from the argument cells need not have the same rank or shape. In constructing the overall result from the individual cellular results, a process which I call 'permissive assembly' uses appropriate padding to produce a rectangular overall result."

(0,.1+z) - this would be 0,1+z in APL. APLers should note that, for matrices, A,B in J is similar to A{commabar}B in APL, and that A,.B is like A,B in APL. The result of this expression is a matrix.

{"2 1 - "from", J's indexing function, with the rank operator. For a vector V, I{V is similar to V[I] in APL. In the program, the left and right arguments to this function are both matrices. The 2 specifies the rank of the cells in the left argument--it says to view the argument as an array of rank-2 objects (matrices); in this case, as a scalar of matrices. The 1 says to treat the right argument as an array of vectors; a vector of vectors in this case. Then, the scalar (of matrices) is extended to match the shape of the vector (of vectors) in a manner analogous to 1+2 3 4. So each row of the right argument is indexed by the matrix (0,.1+z). The result is a three-dimensional (flat) array.

In APL2, a vector can be indexed with a matrix using V[M] or ({enclose}M){squad}V. To index a set of vectors, you can use squad-each. If the vectors reside in the rows of a matrix, you must first use {enclose}[2] to hide the column dimension. To specify that the left argument should be applied in toto to each item of the right argument, you have to apply an extra enclose to the left argument. After indexing, disclose must be used to reveal the nested dimension. So the J statement L{"2 1 R for matrices L and R can be done in APL2 using:

```
   {disclose}({enclose}{enclose}L){squad}{each}{enclose}[2]R
```

As mentioned before, with rank the data stands still and the argument describes the repetition. With APL2 the data has to be twisted to get it ready for each, then straightened out afterwards.

,/ - yes, it's catenate-reduction, but quite different from APL. It has the effect of coalescing the first two dimensions of the 3-D matrix. Remember that , stacks matrices vertically in J. In J, / applies to the "items" of the argument, with an "item" being a subarray obtained by selecting one level of the first dimension. For a matrix, each row is an item; for a 3-D array, each plane is an item.

In fairness to APL2, mimicking the p4 J program isn't as difficult as described above because the data doesn't always have to be de-nested after each step--it can be left nested so it is ready for the next step.

Now for the APL version. Why does it run so slowly? An execution profile for PERM 8 (executed 10 times using APL*PLUS II/386 v5.2) is quite revealing:

```
   All    Here  Iter      {del}  Z{<-}PERM3B N;I;T
0       0     0  [1]    @Roger Hui's algorithm, in APL
2       2    10  [2]    Z{<-}1 0{rho}I{<-}0
21      21    90  [3]   L1:{->}(N<I{<-}I+1)/0
72      72    80  [4]    T{<-}{gradedown}{each}({enclose}{iota}I)= {+
+} {each}{iota}I
532     532    80  [5]    Z{<-}{enclose}#IO,1+Z
2,655      59    80  [6]    Z{<-}T NDX{each}Z
10,454  10,454    80  [7]    Z{<-}{commabar}/Z
3       3    80  [8]    Z{<-}{disclose}Z
3       3    80  [9]    {->}L1
{del}
13,747  11,146    10
```

Note that {commabar}/ is responsible for most of the execution time. The slowness is because the reduction is implemented literally, as a series of catenations, and it involves moving a lot of data. On the last iteration (the one that takes nearly all the time), it's catenating together eight 158 Kbyte matrices, one at a time, resulting in a total data movement of about 5.5 Mbytes, even though the result is only 1.3 Mbytes in size. Furthermore, as it builds larger and larger arrays, it chews through memory, forcing more frequent garbage collects.
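The arithmetic behind that 5.5-Mbyte figure is worth making explicit; a Python sketch of the bookkeeping (sizes only; the 158-Kbyte section size is taken from the paragraph above):

```python
def fold_catenate_cost(sizes):
    """Bytes moved by a literal catenate-reduction: every step
    rewrites the entire partial result built so far."""
    moved, acc = 0, sizes[0]
    for s in sizes[1:]:
        acc += s
        moved += acc        # the whole partial result is copied anew
    return moved

blocks = [158 * 1024] * 8                # eight 158-Kbyte sections
print(fold_catenate_cost(blocks))        # 5662720: about 5.5 Mbytes moved
```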

A better way of doing this is by using {disclose} (or {mix} in APL*PLUS Evlevel 1 or Dyalog APL). Disclose has the sense to figure out the result size and copy each block of data just once. After the disclose, reshape must be used to coalesce the first two dimensions. Here's a version that does that:

```
  All   Here  Iter      {del}  Z{<-}PERM3C N;I;T
0      0     0  [1]    @Version that avoids {commabar}/
1      1    10  [2]    Z{<-}1 0{rho}I{<-}0
20     20    90  [3]   L1:{->}(N<I{<-}I+1)/0
83     72    80  [4]    T{<-}{gradedown}{each}({enclose}{iota}I)= {+
+}{each}{iota}I
523    523    80  [5]    Z{<-}{enclose}#IO,1+Z
2,691     60    80  [6]    Z{<-}T NDX{each}Z
1,127  1,127    80  [7]    Z{<-}{disclose}Z
1,131  1,131    80  [8]    Z{<-}(({times}/2{take}{rho}Z),2{drop}{rho}Z){rho}Z
1      1    80  [9]    {->}L1
{del}
5,577  2,935    10
```

In fact, the dimension-coalescing reshape can be postponed until the very end of the program. Z will build up a lot of dimensions as the loop proceeds, but they don't affect any of the operations in the loop.

```
  All   Here  Iter      {del}  Z{<-}PERM3D N;I;T
0      0     0  [1]    @Version with the reshape postponed
2      2    10  [2]    Z{<-}1 0{rho}I{<-}0
19     19    90  [3]   L1:{->}(N<I{<-}I+1)/L2
77     69    80  [4]    T{<-}{gradedown}{each}({enclose}{iota}I)= {+
+}{each}{iota}I
548    548    80  [5]    Z{<-}{enclose}#IO,1+Z
2,662     57    80  [6]    Z{<-}T NDX{each}Z
1,120  1,120    80  [7]    Z{<-}{disclose}Z
3      3    80  [8]    {->}L1
985    985    10  [9]   L2:Z{<-}(({times}/{neg}1{drop}{rho}Z),{neg}1{+
+}{take}{rho}Z){rho}Z
{del}
5,416  2,803    10
```

The savings are minor because the final reshape is the one that takes the most time, and moving the reshape out of the loop just avoids the less time-consuming cases.

(Note that I'm using NDX{each} in this program because APL*PLUS doesn't have {squad}. I don't expect that there's much penalty for calling a user-defined function here because arrays are large.)

By the way, it's possible to do this using only standard APL operations. The index-each technique is replaced with indexing the enlist (or ravel) of T with copies of Z that are increased by 0, I, I{times}2, I{times}3, etc., to select the different "rows" of T. Here's the program and execution profile:

```
  All   Here  Iter      {del}  Z{<-}PERM3E N;I;M;T;V
0      0     0  [1]    @Roger Hui's algorithm, in flat APL
2      2    10  [2]    Z{<-}1 0{rho}I{<-}0
20     20    90  [3]   L1:{->}(N<I{<-}I+1)/L2
12     12    80  [4]    M{<-}({iota}I){jot}.={iota}I
31     31    80  [5]    M{<-}M+(2{times}I-{iota}I){jot}.+I{rho}0
24     17    80  [6]    T{<-}{gradedown},M
31     31    80  [7]    T{<-}T-,(I{times}{neg}1+{iota}I){jot}.+I{rho}0
276    276    80  [8]    Z{<-}0,Z
16     16    80  [9]    V{<-}1+I{times}{neg}1+{iota}I
3,230  3,230    80  [10]   Z{<-}V{jot}.+Z
2,828  2,828    80  [11]   Z{<-}T[Z]
1      1    80  [12]   {->}L1
983    983    10  [13]  L2:Z{<-}(({times}/{neg}1{drop}{rho}Z),{neg}1{+
+}{take}{rho}Z){rho}Z
{del}
7,454  7,447    10
```

This runs slower than the nested version because the addition on line [10] generates a matrix the size of an I-th order permutation matrix, instead of an (I-1)th order matrix as in PERM3D. On the final iteration, this results in line [10] doing 8 times as much work as 1+Z on line [5] of PERM3D, and the times reflect this.

That the indexing on line [11] is slower than the NDX{each} on line [6] of PERM3D is surprising, since both lines are doing the same amount of work. Perhaps the difference is caused by my computer's 256K of RAM cache. In PERM3D, a 158K Z is used repeatedly, while in PERM3E, Z is 1.3M and is used only once.

The flat technique used on lines [4-7] to generate T may run fast, but it sure is obtuse. Remember that these four lines are just \:"1=i.I in J or {gradedown}{each}({enclose}{iota}I)={each}{iota}I in APL2.

PERM3D is by far the fastest of the APL versions. Here are some execution times (seconds on my 486/66):

```
                        Argument
6      7      8

+II 5.2        .019   .067    .53

+III 1.1       .020   .071    .54

J 2.06         .017   .037    .28   (scaled from Roger's timings)
```

So, why is the J version still twice as fast? One possibility is that J2.06 might be able to alter the shape of an array without having to make a copy of the data, so the ,/ (which corresponds to lines [7] and [9] of PERM3D) is done without copying the large array. Another possibility is that J's block-move subroutine exploits the 80386's REP MOVSD instruction (something APL*PLUS does not appear to do) and that J's indexing function has a tighter inner loop than APL*PLUS's.

Jim

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 17 Jun 1995 04:07:01 GMT
Subject: Re: Puzzler: Permutations and Anagrams

Roger Hui wrote on June 16:

The component \:"@=@i. in the enumeration of permutations is a neat little problem per se. [...]

Can you think of other solutions?

For nested-array APL:

```
      {disclose}{gradedown}{each}({enclose}{iota}N)={each}{iota}N
```

For flat APL:

```
      +{backslashbar}({iota}N){commabar}1,(N-1 1){rho}N{take}{neg}1

N{<-}5
(N-1 1){rho}N{take}{neg}1
-1  0  0  0
0 -1  0  0
0  0 -1  0
0  0  0 -1

({iota}N){commabar}1,(N-1 1){rho}N{take}{neg}1
1  2  3  4  5
1 -1  0  0  0
1  0 -1  0  0
1  0  0 -1  0
1  0  0  0 -1

+{backslashbar}({iota}N){commabar}1,(N-1 1){rho}N{take}{neg}1
1 2 3 4 5
2 1 3 4 5
3 1 2 4 5
4 1 2 3 5
5 1 2 3 4

#IO{<-}0
+{backslashbar}({iota}N){commabar}1,(N-1 1){rho}N{take}{neg}1
0 1 2 3 4
1 0 2 3 4
2 0 1 3 4
3 0 1 2 4
4 0 1 2 3
```

Unfortunately, neither of these solutions works for N=0 (at least on APL*PLUS II).

Jim

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 20 Jun 1995 20:55:37 GMT
Subject: Puzzler: Permutations and Anagrams

Ram Biyani (rambi@vnet.ibm.com) sent me the following version of PERM:

```
     {del} Z{<-}PERM3R N;I;S;T
[1]    @Ram Biyani's version
[2]    Z{<-}1 0{rho}I{<-}0
[3]   L1:{->}(N<I{<-}I+1)/L2
[4]    S{<-}I*2
[5]    T{<-}({iota}I),(I-0 1){rho}(S{rho}0,I{rho}1)/S{rho}{iota}I
[6]    Z{<-}T[;#IO,1+Z]
[7]    {->}L1
[8]   L2:Z{<-}(({times}/{neg}1{drop}{rho}Z),{neg}1{take}{rho}Z){rho}Z
{del}
```

His solution to building the recode matrix (lines [4] and [5]) is novel and very nice. Displaying intermediate results with (I,I) in place of S makes the technique apparent:

```
      I{<-}4
+J{<-}(I,I){rho}{iota}4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
+B{<-}(I,I){rho}0,I{rho}1
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
+T{<-}(I-0 1){rho}(,B)/,J
2 3 4
1 3 4
1 2 4
1 2 3
({iota}I),T
1 2 3 4
2 1 3 4
3 1 2 4
4 1 2 3
```

And line [6]! Oh, line [6]! Why didn't I think of it this way? With J{<-}#IO,1+Z, what we want is (T[1;])[J], or T[1;J], and T[2;J] and T[3;J], etc. Once stated this way, it's obvious that T[;J] does it all at once. Beautiful and simple.
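A toy model in Python makes the shape logic visible (values are the 1-origin entries of the I=4 table above; list indices are 0-origin, so the index row is built as 0,1+Z):

```python
T = [[1, 2, 3, 4],            # first row of each section, I = 4
     [2, 1, 3, 4],
     [3, 1, 2, 4],
     [4, 1, 2, 3]]
Z = [2, 0, 1]                 # one order-3 permutation (0-origin for 3 1 2)
J = [0] + [1 + z for z in Z]  # the #IO,1+Z index row: [0, 3, 1, 2]

per_row = [[T[k][j] for j in J] for k in range(4)]   # T[1;J], T[2;J], ...
all_at_once = [[row[j] for j in J] for row in T]     # T[;J]
print(per_row == all_at_once)   # True: one indexing yields all sections
```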

As you might expect, this is the fastest APL solution so far. Here are the execution times on APL*PLUS II and III on my 486/66. The last row gives the estimated run time for Roger's perm 8 on the same computer.

```
                        Argument
6      7      8

+II 5.2        .012   .048    .40

+III 1.1       .012   .048    .40   (no kidding; exactly = +II)

J 2.06         .017   .037    .28   (scaled from Roger's timings)
```

By the way, fully 98% of the execution time is spent on lines [6] and [8], so improvements to the other lines won't result in significant speedups.

Jim

From: jimw@math.umass.edu (Jim Weigang)
Newsgroups: comp.lang.apl
Date: 4 Sep 1995 14:06:20 GMT
Subject: Puzzler: Permutations and Anagrams

The 1995 Quarter 3 issue of the Zark APL Tutor Newsletter is out, with the results of the Permutations and Anagrams programming contest.

Published solutions include two interesting-but-not-practical nonlooping versions from Georges Brigham and J. Philip Benkard (originally printed in the Big APL Newsletter published by NY/SIGAPL; contact jhlaboyd@ix.netcom.com for info), two recursive versions from John Sullivan and Ram Biyani (who contributed solutions to c.l.a), and the "c.l.a group effort" that I sent in. The recursive versions gained speed by having the results for PERM 1, 2, and 3 embedded in the functions, which made Ram Biyani's function the fastest for arguments <6. The group effort was the fastest for arguments >6. Adding precomputation to the group effort yields the following "ultimate" APL version:

```
     {del} Z{<-}PERM N;A;I
[1]    @Returns a (!{omega})-by-{omega} matrix of all permutations of {iota}{+
+}{omega}
[2]    {->}(N{>=}3 2 1)/L3,L2,L1
[3]    Z{<-}1 0{rho}0
[4]    {->}0
[5]   L1:Z{<-}1 1{rho}#IO
[6]    {->}0
[7]   L2:Z{<-}2 2{rho}#IO+0 1 1 0
[8]    {->}0
[9]   L3:Z{<-}6 3{rho}#IO+0 1 2 0 2 1 1 0 2 1 2 0 2 0 1 2 1 0
[10]   I{<-}3
[11]  L4:{->}(N<I{<-}I+1)/L5
[12]   A{<-}+{backslashbar}({iota}I),[#IO]1,(I-1 1){rho}I{take}{neg}1
[13]   Z{<-}A[;#IO,1+Z]
[14]   {->}L4
[15]  L5:Z{<-}((!N),N){rho}Z
{del}
```

Gary printed only the simplest version of ANAGRAM (the version that sorts the matrix if there are any duplicates in the letter list), even though sorting the result takes more time than generating the permutation matrix. Maybe he didn't trust Jeff Chilton's clever and fast version (see my 14 Jun 1995 posting here) or maybe there wasn't enough space in the newsletter.

Jim

From: Jim Weigang <jimw@chilton.com>
Newsgroups: comp.lang.apl
Date: Tue, 5 Dec 1995 04:26:55 GMT
Subject: Puzzler: Permutations and Anagrams

The November 1995 issue of Dr. Dobb's Journal has an article by Mani Iyer entitled "Permutation Generation Using Matrices" (p. 133). Unfortunately, the algorithm described in the article doesn't actually work. I mentioned this to Mani and he sent me a corrected version. The algorithm is not as fast (in APL, at least) as the algorithms described here in June, the result isn't in sorted order, and it doesn't work for N{<=}2. But the technique is intriguing because it builds a set of square matrices and extracts permutations by reading off the rows and columns of the matrices. Half the permutations are found in the rows and half in the columns! An APL implementation is appended below.

The article has a pointer to an interesting paper: "Permutation Generation Methods", by Robert Sedgewick, in Computing Surveys, Vol. 9, No. 2, June 1977. (Incidentally, Roger Hui's indexing algorithm is not among those listed.) The paper describes an algorithm, attributed to D. N. Lehmer in 1906, for generating the i-th lexical permutation of {iota}N (function LEXPERM, below). I had wondered how J's "atomic permutation" function was able to generate the i-th lexical permutation so quickly. It's simple enough to invert the algorithm to produce a function similar to J's "atomic index" function (PERMNDX below).
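Lehmer's scheme and its inverse fit in a few lines of Python (a sketch mirroring LEXPERM1/PERMNDX1 below, with a 0-origin index j as in the examples; the insert-and-recode step is the same one used in PERM2):

```python
def lexperm(j, n):
    """Lehmer [1906]: the j-th (0-origin) lexical permutation of 1..n,
    decoded from the factorial-base digits of j."""
    z = [1]
    for radix in range(2, n + 1):       # base-2 digit first, base-n last
        j, d = divmod(j, radix)
        z = [d + 1] + [x + (x >= d + 1) for x in z]
    return z

def permndx(p):
    """Inverse: the 0-origin lexical index of a permutation of 1..n."""
    p, j = list(p), 0
    for radix in range(len(p), 1, -1):
        h = p.pop(0)
        j = j * radix + (h - 1)         # accumulate factorial-base digits
        p = [x - (x > h) for x in p]    # recode tail down to 1..radix-1
    return j

print(lexperm(1, 7))   # [1, 2, 3, 4, 5, 7, 6] -- 6 and 7 reversed
```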

Jim

Note: These programs assume #IO is 1. For you IBMers, {commabar} is ,[#IO].

```
     {del} Z{<-}PERMDD3 N;I;S
[1]    @Returns a (!{omega})-by-{omega} matrix of all permutations of {+
+}{iota}{omega}
[2]    @ Algorithm by Mani Iyer, a corrected version of the technique
[3]    @   described in the Nov 95 issue of Dr. Dobb's on page 133.
[4]    @
[5]    Z{<-}1 2{rho}{iota}2
[6]    I{<-}2
[7]   L1:{->}(N<I{<-}I+1)/L2
[8]    Z{<-}I{slashbar}Z,I
[9]    Z{<-}((1{take}{rho}Z){rho}-{iota}I){rotate}Z
[10]   {->}L1
[11]  L2:S{<-}((1{take}{rho}Z){divide}N),N,N
[12]   Z{<-}Z{commabar}({rho}Z){rho}1 3 2{transpose}S{rho}Z
[13]   @ Z{<-}Z[{gradeup}Z;] @ sort result if desired
{del}
```

LEXPERM1 and PERMNDX1 are simple versions that process only a single permutation. LEXPERM and PERMNDX are more complicated parallel versions that accept or return an array of permutation indices. Some examples:

```
      1 LEXPERM 7   @ first permutation (6 and 7 are reversed)
1 2 3 4 5 7 6

P{<-}0 1000 5039 LEXPERM 7
P
1 2 3 4 5 6 7       @ 0th permutation (no change)
2 4 3 6 7 1 5       @ 1000th
7 6 5 4 3 2 1       @ 5039th (last one, number (!7)-1)

PERMNDX P
0 1000 5039

{del} Z{<-}J LEXPERM1 N;C;I
[1]    @Returns the {alpha}-th lexical permutation of {iota}{omega}
[2]    @ From an algorithm by D. N. Lehmer [1906]
[3]    Z{<-},1
[4]    C{<-}1+({reverse}1+{iota}N-1){represent}J
[5]    I{<-}N
[6]   L1:{->}(0=I{<-}I-1)/0
[7]    Z{<-}C[I],Z+Z{>=}C[I]
[8]    {->}L1
{del}

{del} Z{<-}PERMNDX1 P;N
[1]    @Returns the lexical index of permutation {omega}
[2]    N{<-}{rho}P{<-},P
[3]    Z{<-}{iota}0
[4]   L1:{->}(1{>=}{rho}P)/L2
[5]    Z{<-}Z,1{take}P
[6]    P{<-}1{drop}P-P>1{take}P
[7]    {->}L1
[8]   L2:Z{<-}({reverse}1+{iota}N-1){basevalue}Z-1
{del}

{del} Z{<-}J LEXPERM N;C;I
[1]    @Returns the {alpha}-th lexical permutations of {iota}{omega}
[2]    @ N must be a scalar, but J may have any shape.  {rho}Z {<-}{->} ({+
+}{rho}J),N
[3]    @ From an algorithm by D. N. Lehmer [1906]
[4]    Z{<-}(1,{times}/{rho}J){rho}1
[5]    C{<-}1+({reverse}1+{iota}N-1){represent},{transpose}J
[6]    I{<-}N
[7]   L1:{->}(0=I{<-}I-1)/L2
[8]    Z{<-}C[I;]{commabar}Z+Z{>=}({rho}Z){rho}C[I;]
[9]    {->}L1
[10]  L2:Z{<-}{transpose}((1{take}{rho}Z),{reverse}{rho}J){rho}Z
{del}

{del} Z{<-}PERMNDX P;S
[1]    @Returns the lexical indices of permutations {omega}
[2]    @ P may be a vector containing a single permutation, or an array
[3]    @ having a permutation in each row.  {rho}Z {<-}{->} {neg}1{drop}{rho}P
[4]    S{<-}{rho}P
[5]    P{<-}{transpose}(({times}/{neg}1{drop}S),{neg}1{take}1,S){rho}P
[6]    Z{<-}(0 1{times}{rho}P){rho}0
[7]   L1:{->}(1{>=}1{take}{rho}P)/L2
[8]    Z{<-}Z{commabar}P[1;]
[9]   P{<-}1 0{drop}P-P>({rho}P){rho}P[1;]
[10]   {->}L1
[11]  L2:Z{<-}({neg}1{drop}S){rho}({reverse}1+{iota}{neg}1+{neg}1{take}1,S){+
+}{basevalue}Z-1
{del}

```