A friend sent me the following puzzle:

From: Jeff CHILTON <jwc@caliban.chilton.com>

To: jimw@math.umass.edu

Date: Mon, 24 Apr 1995 22:54:00 -0400

Subject: Programming exercise...

We have an imposing looking cipher-lock at our office. It has five buttons and is programmed accordingly: The combination is a sequence of sets of buttons. To open the lock you first press all the buttons in the first set, release, then the buttons in second set, and so on. Also, each button may only be used once in the entire combination (so if the first set is all of the buttons then there can't be a next set, they're used up).

Clarification: Not all buttons need be used; a button can be pushed either once or not at all. But at least one button must be pushed to unlock the door.

The Problem: Write a program to generate all possible combinations for the lock. How many are there?

Jeff included a C program and wanted to know how an APL solution stacks up in terms of speed.

Jim

I took a wrong turn solving this puzzle and got lost in the boonies. My approach was to build the result recursively: For each possible set of buttons, recurse on the remaining nonintersecting sets. The problem with this technique is that the result is built up using repeated catenation, a process whose execution time can rise with the square of the problem size. For example, here's a very simple program that builds up a 1081-by-5 matrix by repeated catenation:

{del} CATS;I;Z [1] Z{<-}0 5{reshape}{neg}1 [2] I{<-}0 [3] L1:{->}(1081<I{<-}I+1)/0 [4] Z{<-}Z{commabar}1 2 3 4 5 [5] {->}L1 {del}

On my 66 MHz 486, this program takes 1.3 seconds to run. Why so long?
It's because the program moves 11.7 million bytes of data around, even
though the final result is only 21,620 bytes in size. With each
catenation, the result so far is copied to a new, slightly larger
container along with the new row. On the first iteration, 20 bytes are
copied; on the second, 40 bytes; on the third, 60, then 80, 100, etc.
The total is `{times}/.5 1082 1081 20` bytes.

The recursive program (PBLOCK1, listed below) ran in about 1.47 seconds. (Including formatting the result, which took 0.2 secs.) Jeff said his C program ran in 0.07 seconds, about 20 times faster than the APL version. I tried various approaches to speed up my program, but nothing helped; all the alternatives were slower. One approach was preallocating a result and using indexed assignment to fill it in. This eliminated the N-squared catenation problem, but the savings were more than offset by the overhead of building an index vector and checking bounds just to stuff in a few characters.

When I showed my solutions to Jeff (who's an ex-APLer), he asked why I didn't "whomp up a 5 by 5 by 7776 Boolean matrix, do a few scans, reductions and comparisons, and voila: the selection vector of valid combinations." I thought about this, and it seemed to me that, without knowing ahead of time what the solutions were, there were about 32 possibilities for each of five button sets, for a total of 32*5 or about 33 million possible combinations. Then he explained how to use a base-6 number to represent button sets. (As Roger Hui figured out immediately--impressive.) With that giveaway, the APL solution was straightforward:

A combination for the lock can be represented as a 5-digit base-6 number. The digits correspond to buttons, and the digit values indicate the round in which the button is pushed. Zero means not pushed. For example, 11023 means push buttons 1 and 2 first, button 4 second, and button 5 third (and don't push button 3). All 5-digit base-6 numbers (except 00000, the null combination, which we don't want) can be generated using:

T{<-}(5{reshape}6){represent}{iota}(6*5)-1 @ #IO=1 here

The result is a 5-by-7775 matrix of digits. Each column holds the digits of one number.

Some of the numbers aren't valid combinations. For example, 11333 isn't valid because there's no second set of buttons. The validity check is similar to that for the left argument to dyadic transpose: it must contain all values from 1 up to the largest element. (Except we ignore zeros here.) To identify valid combinations, first create a 7775-by-5 bit matrix indicating which of the values 1-5 occurs in each column of T:

B{<-}{or}{slashbar}T{jot}.={iota}5

Valid numbers are those whose row of B consists of 1s followed by 0s
(e.g., 1 1 1 0 0 is OK, but 1 1 0 1 0 is not). A valid row will equal
its `^\`, so the valid columns of T can be selected with:

T{<-}(^/B=^\B)/T

For formatting, it's convenient to have each set of buttons be represented by a single number. This can be done by forming a mask in which 1s mark buttons being pushed, and treating the mask as a binary number. For example, 1 0 1 1 0 means buttons 1, 3, and 4 are pressed, and it can be coded as 2{basevalue}1 0 1 1 0, which is 22. The combinations in T can be transformed to this coded representation using:

T{<-}2{basevalue}T{jot}.={iota}5

The result is a 1081-by-5 matrix of codes. For example, row 277 holds 22 8 1 0 0, which means:

22 = 1 0 1 1 0 push buttons 1, 3 and 4 8 = 0 1 0 0 0 push button 2 1 = 0 0 0 0 1 push button 5 0 = 0 0 0 0 0 (null set) 0 = 0 0 0 0 0 (null set)

One nice way of displaying this combination is "134 2 5". To convert a code like 22 to the phrase "134", you can index into a character matrix containing the phrases for each of the 32 possible button sets. The formatting process is not complicated; see the listing of PBFORMAT below.

So the answer is...

Z{<-}PBLOCK 5 @ run with five buttons {shape}Z 1081 10 @ 1081 possible combinations @ Some of the combinations: 10 55{reshape}2 1 3{transpose}Z[0 250 500 750 1071{jot}.+{iota}10;],' ' 1 14 235 24 15 3 35 4 1 5 4 2 3 1 2 14 25 24 3 35 4 1 2 5 4 2 3 1 1 2 3 14 25 3 24 3 1 35 4 12 5 4 23 1 2 3 4 14 3 24 3 1 5 35 4 2 5 4 23 1 1 2 3 4 5 14 3 2 24 3 15 35 4 2 1 5 4 3 1 2 3 45 14 3 2 5 24 3 5 4 5 4 3 1 1 2 3 5 14 3 25 24 3 5 1 4 1 5 4 3 1 2 1 2 3 5 4 14 3 5 24 35 4 1 2 5 4 3 12 1 2 34 14 3 5 2 24 35 1 4 1 2 3 5 4 3 2 1 2 34 5 14 35 24 5 4 1 2 3 5 5 4 3 2 1 1{pick}{each}{shape}{each}PBLOCK{each}0,{iota}5 0 1 5 25 149 1081 @ number of combinations for 0-5 buttons

The base-6 solution in PBLOCK runs in 0.72 seconds, twice as fast as the recursive version and about 10 times slower than Jeff's C program (which, when I looked at it, turned out to be recursive). Jeff wrote another C program which used the base-6 algorithm, but it ran 2 times slower than his recursive version. (Incidentally, all output from the C programs was redirected to a file, not assembled in memory or displayed.) The speed comparisons can be summarized as follows:

My best recursive APL program was 20 times slower than Jeff's recursive C version.

The noniterative base-6 APL program was 5 times slower than the corresponding base-6 C program.

By way of APL-J comparison, Roger Hui's "comb" function ran in 0.44 secs on a 486/50; the corresponding APL solution (lines [2-4] of PBLOCK) ran in 0.41 secs on a 486/66 using APL*PLUS II v5.2. Close enough that we'd need to use the same computer to decide which version was faster by a nose.

How long to write the programs? The first recursive version took 1.25 hours to develop, but I didn't realize that not all buttons need be pressed. Allowing short combinations took only a few minutes, and only a few more minutes were required to fix a bug once I realized there were duplicate solutions in the result. A couple of hours were wasted on recursive variations. Once the base-6 technique was explained to me, the nonlooping solution took only a few minutes to put together. The code has been cleaned up a bit for this posting.

What particularly impressed me was the stunning effect the choice of data structure had on solving this problem. The base-6 structure eliminated the need for recursion, reduced the problem space from 33 million to 7775 possibilities, made the problem much easier to solve conceptually, and made the solution "stand still", so the result is arrived at by a series of array transformations instead of being the result of nontrivial recursive logic. When optimization by close examination of the tree bark doesn't get you anywhere, consider backing way up and taking a fresh look at the forest.

Jim

The base-6 nonlooping version:

{del} Z{<-}PBLOCK N;B;T [1] @Returns all possible combinations for an {omega}-button door lock [2] T{<-}(N{reshape}N+1){represent}{iota}{neg}1+(N+1)*N [3] B{<-}{or}{slashbar}T{jot}.={iota}N [4] T{<-}(^/B=^\B)/T [5] T{<-}2{basevalue}T{jot}.={iota}N [6] Z{<-}PBFORMAT T {del} {del} Z{<-}PBFORMAT A;C;M;N;P [1] @Formats the numeric coded PBLOCK matrix for display [2] P{<-}2*N{<-}{neg}1{take}{shape}A [3] M{<-},{transpose}(N{reshape}2){represent}{neg}1+{iota}P {+ +} @ all possible subsets [4] C{<-}N{take}'123456789ABCDEF...' @ button names [5] C{<-}(P,N){reshape}M\M/({shape}M){reshape}C @ button set names [6] C{<-}C,'_' @ _ will become blank in result [7] C[1;]{<-}' ' @ leave first row blank [8] Z{<-}C[1+A;] @ spell the button set names [9] Z{<-}((1{take}{shape}Z),{times}/1{drop}{shape}Z){reshape}Z {+ +} @ merge last two dims to make matrix [10] Z{<-}'/'MTOV Z [11] Z{<-}(Z{/=}' ')/Z @ remove padding blanks [12] Z[(Z='_')/{iota}{shape}Z]{<-}' ' @ _s are the real blanks [13] Z{<-}VTOM Z [14] Z{<-}Z[#AV{gradeup}Z;] @ put in reasonable order {del}

The recusive version:

{del} Z{<-}PBLOCK1 N [1] @Returns all possible combinations for an {omega}-button door lock [2] Z{<-}PBFORMAT PBLOCKR N{reshape}0 {del} {del} Z{<-}PBLOCKR B;C;E;I;M;N;T [1] @Recursive part of PBLOCK1. {omega} is bit mask of buttons already used [2] @ The result is an integer matrix of all possible remaining sets [3] @ [4] N{<-}+/~B @ num buttons left [5] {->}(N>1)/L1 @ If only one, [6] Z{<-}(N,N){reshape}2{basevalue}~B @ return its code [7] {->}0 [8] L1:M{<-}(~B)\{transpose}(N{reshape}2){represent}{iota}{neg}1+2*N {+ +} @ nonempty subsets of remaining buttons [9] C{<-}2{basevalue}{transpose}M @ codes for the subsets [10] Z{<-}1 1{reshape}{neg}1{take}C @ don't recurse on the full set [11] I{<-}0 & E{<-}{neg}1+{shape}C [12] L2:{->}(E<I{<-}I+1)/0 @ Loop for each subset [13] T{<-}PBLOCKR B{or}M[I;] @ possible later sets after using C[I] [14] T{<-}0{commabar}T @ or we could just stop [15] Z{<-}Z OVER C[I],T @ remember the combinations [16] {->}L2 @ Endloop {del}

The programs were timed using assembler-coded fastfn versions of the utility functions OVER, VTOM, and MTOV. APL versions follow for reference:

{del} Z{<-}A OVER B [1] @Forms a matrix with {alpha} over {omega} [2] A{<-}({neg}2{take}1 1,{shape}A){reshape}A [3] B{<-}({neg}2{take}1 1,{shape}B){reshape}B [4] A{<-}(({shape}A){max}0 1{times}{shape}B){take}A [5] B{<-}(({shape}B){max}0 1{times}{shape}A){take}B [6] Z{<-}A,[#IO]B {del} {del} Z{<-}VTOM V;F;I;M;W [1] @Converts delimited vector {omega} to a matrix [2] F{<-}V=1{take}V{<-},V [3] I{<-}(F/{iota}{shape}F),1+{shape}F [4] W{<-}(1{drop}I)-{neg}1{drop}I [5] Z{<-}(,W{jot}.{>=}{iota}M{<-}0{max}{max}/W)\V [6] Z{<-}0 1{drop}(({shape}W),M){reshape}Z {del} {del} Z{<-}D MTOV M [1] @Converts matrix {omega} to an {alpha}-delimited vector [2] Z{<-}(,{reverse}{or}\{reverse}1,M{/=}' ')/,D,M {del}

Jeff's recursive C program:

#include <stdio.h> /* N_BUTTONS must be less than the number of bits in an ``int'' */ #define N_BUTTONS 5 #define MAX_MOVE (1 << N_BUTTONS) main() { reportAll("", 0); } reportAll(prefix, used) char *prefix; unsigned int used; { register int i, j; char display[N_BUTTONS * 2]; unsigned int plength; unsigned int move; strcpy(display, prefix); plength = strlen(display); if (plength) { display[plength++] = '-'; } for (move = 1; move < MAX_MOVE; move++) { if (move & used) { continue; } j = plength; for (i = 0; i < N_BUTTONS; i++) { if (move >> i & 1) { display[j++] = '1' + i; } } display[j] = '\0'; printf("%s\n", display); reportAll(display, used | move); } }

Jeff's base-6 C program:

#include <stdio.h> #define N_BUTTONS 5 #define MAX_LENGTH N_BUTTONS #define RADIX (N_BUTTONS + 1) main() { register int i, j; int nCandidates; int candidate; int nButtons[MAX_LENGTH]; char display[MAX_LENGTH][N_BUTTONS + 1]; int digit; int valid; /* Number of candidate combinations */ nCandidates = 1; for (i = 0; i < N_BUTTONS; i++) { nCandidates *= RADIX; } /* Format and test each candidate combination */ for (candidate = 1; candidate < nCandidates; candidate++) { /* Clear the n of buttons in each mash counters */ for (i = 0; i < MAX_LENGTH; i++) { nButtons[i] = 0; } /* Loop over buttons and distribute them to mashes */ i = candidate; for (digit = 0; digit < N_BUTTONS; digit++) { j = i % RADIX - 1; if (j > -1) { display[j][nButtons[j]] = '1' + digit; nButtons[j]++; } i /= RADIX; } /* Test validity */ j = 0; valid = 1; for (i = MAX_LENGTH - 1; i > -1; i--) { if (nButtons[i] > 0) { j = 1; } else { if (j) { valid = 0; break; } } } /* Print if valid */ if (valid) { for (i = 0; i < MAX_LENGTH; i++) { display[i][nButtons[i]] = '\0'; printf("%s ", display[i]); } printf("\n"); } } }

Here are a few timings of Roger Hui's and my pushbutton lock programs. Roger and I both have copies of APL*PLUS III (although I have version 1.1 and he has 1.2), and it appears that the speed ratio of our 66 and 50 MHz 486s is about 66:50. I've scaled Roger's J 2.06 timing to represent the estimated run time on a 66 MHz 80486/DX2. The following table gives the time required to compute the raw base-6 numeric result (Roger's "comb" function, or lines [2-4] of my PBLOCK function):

APL*PLUS II/DOS v5.2 ______ .41 secs APL*PLUS III/Win v1.1 ______ .36 J/Win v2.06 ________________ .28 J/Win v2.03 ________________ 8.7 J/DOS v7 ___________________ 6.0

So J 2.06 is the winner of the speed competition, and by somewhat more than a nose. But don't expect to see this speed on older versions of J. (J 2.03 is the freeware Windows version available on watserv1.) The folks at ISI have done an impressive job of speeding up the J interpreter recently. Something to keep in mind if you're using older versions of J for serious work.

Jim

Home Page