This problem posed by Ken Iverson on comp.lang.apl resulted in lively discussion that was a pleasant change from a period of squabbling about whether J should be discussed in c.l.a. As with the best problems, the solution lead into fascinating areas seemingly unrelated to the original subject. In this case, topics include factoring and number representations. In addition to the solutions, I posted two FASTFNS (assembler routines for APL*PLUS II and III), one for finding prime numbers and another for factoring numbers.

- Problem description
- My solution
- The prime finder
- Comparison with J's prime finder
- My adaptation of Randy MacDonald's solution
- The factoring routine
- Discussion of J q: speedups and speed of an APL model of q:

One reason for maintaining a group devoted to all dialects of APL is the interest of different solutions to the same problem, as in the solutions to the Ball Clock problem given by Weigang (APL) and Hui (J). I hope that more problems will be submitted with invitations for solutions in any dialect. Here is one:

Although the binomial coefficient 10!21 is an integer, the straightforward calculation of !21 divided by the product of !10 and !11 would only approximate an integer, because !21 is not represented to full precision. In the first implementation of APL, Larry Breed provided precise integers by judicious cancellations.

The problem posed is to write a corresponding precise calculation of the binomial coefficients in some dialect.

For non-negative integers, A!B is (!B){divide}(!A){times}!B-A. With the example 10!21, this yields:

1 2 3 4 5 ... 19 20 21 ---------------------- 1 2 ... 10 1 2 ... 11

The 1..11 in the denominator cancels 1..11 in the numerator, leaving us with 12..21 on top and 1..10 on the bottom. (And we can drop the 1 in the denominator.) Vectors N and D representing the terms in the numerator and denominator can be computed as:

N{<-}(A{max}B-A){drop}{iota}B D{<-}1{drop}{iota}A{min}B-A 12 13 14 15 16 17 18 19 20 21 N ----------------------------- 2 3 4 5 6 7 8 9 10 D

(#IO is assumed to be 1 here.) Although no more terms can be cancelled, a reduction of some terms is possible. For example, the 20 in N and the 10 in D can be reduced to 2 and 1, and likewise for 18:9, 16:8, 14:7, etc., and the 15:3 can be reduced to 5:1, yielding:

2 13 2 3 2 17 2 19 2 21 ----------------------- 2 3 4 1 1 1 1 1 1

After this, more cancellation is possible (and the 1s can be dropped), giving:

13 2 2 17 2 19 2 21 ------------------- 4

Finally, 2:4 can be reduced to 1:2 and then a 2:2 term cancelled.

For automatic processing, one way of doing the reductions is by replacing each term with its prime factors. The problem is then simplified to one of matching up and cancelling identical terms. The ALLFACTORS function (listed below) takes a vector and returns the prime factors for all the numbers in the vector:

+ F{<-}ALLFACTORS N 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 5 5 7 7 13 17 19 + G{<-}ALLFACTORS D 2 2 2 2 2 2 2 2 3 3 3 3 5 5 7

Corresponding items can be cancelled by dualling subtraction with a mapping to frequency space.

+ H{<-}ONUB F,G @ distinct factors 2 3 5 7 13 17 19 + T{<-}+/H{jot}.=F @ frequency of each factor in F 10 5 2 2 1 1 1 @ (how many 2s 3s 5s 7s etc.) + U{<-}+/H{jot}.=G @ same for G 8 4 2 1 0 0 0 + C{<-}T{min}U @ common factors (which will cancel) 8 4 2 1 0 0 0 + X{<-}(T-C)/H @ uncancelled terms in F 2 2 3 7 13 17 19 @ Replicate (/) maps back out of frequency space + Y{<-}(U-C)/H @ uncancelled terms in G @ (none)

The result of A!B is given by ({times}/X){divide}{times}/Y.

After coding this up and experimenting a bit, I noticed that all the denominator terms were cancelled in every case; Y was always empty. Thinking about this, I realized that this must be true in general. A!B returns an integer result for integer arguments, and X and Y are lists of primes with no numbers in common. If Y wasn't empty, the ratio of the times reductions couldn't possibly be an integer, so Y must always be empty. Incorporating this observation in the algorithm yielded the following program:

{del} Z{<-}A BINOM B;D;F;G;H;N;P;T;U [1] @Computes {alpha}!{omega} with maximal cancellation of terms [2] N{<-}(A{max}B-A){drop}{iota}B @ numerator terms [3] D{<-}1{drop}{iota}A{min}B-A @ denominator terms [4] P{<-}PRIMESTO {floor}B*.5 [5] F{<-}P ALLFACTORS N @ all factors of the terms [6] G{<-}P ALLFACTORS D [7] H{<-}ONUB F,G @ distinct factors [8] T{<-}+/H{jot}.=F @ frequency of each factor [9] U{<-}+/H{jot}.=G [10] Z{<-}{times}/(T-U)/H @ cancel denom. terms, multiply the rest {del}

On line [4], the list of distinct primes is precalculated and passed to ALLFACTORS to save a bit of time. The subroutines are listed below. (Yes, I know factorization and nub are primitives in J!)

And then, while browsing in Knuth, I came across an amazing algorithm for factoring !N:

{del} Z{<-}P FACTORNF N;M [1] @Returns the prime factors of !{omega} [2] @ From Knuth, The Art of Comp. Prog., Vol. 1, p. 46 [3] {->}(2=#NC'P')/L1 [4] P{<-}PRIMESTO N @ compute primes if caller didn't provide them [5] L1:M{<-}{times}\{transpose}(({floor}2{log}N),{rho}P){rho}P [6] Z{<-}(+/{floor}N{divide}M)/P {del}

Using this little gem, the binomial program can be written as:

{del} Z{<-}A BINOM2 B;E;F;G;H;T;U;V [1] @Computes {alpha}!{omega} with maximal cancellation of terms [2] P{<-}PRIMESTO B [3] E{<-}P FACTORNF B @ factor !B [4] F{<-}P FACTORNF A [5] G{<-}P FACTORNF B-A [6] H{<-}ONUB E,F,G @ distinct factors [7] T{<-}H FRCOUNT E @ frequency of each factor [8] U{<-}H FRCOUNT F [9] V{<-}H FRCOUNT G [10] Z{<-}{times}/(T-U+V)/H @ cancel denom. terms, multiply the rest {del}

I replaced the frequency-counting idioms (e.g., +/H{jot}.=F) with a more efficient subroutine (FRCOUNT) because the outer product is rather slow when you're working with, say 5000 BINOM2 6000. (Which is around 1.58e1172. You can't do the final {times}/, but you can look at the terms and add up the logarithms.) The nifty FACTORNF algorithm doesn't save as much time as I thought it would because you have to compute more primes to use it. For arguments like 5000 6000, the cost of finding the extra primes just about equals the savings from more efficient factoring.

Nice problem, Ken! I'm curious about what Larry Breed's algorithm was. Anyone care to elaborate?

Jim

Here are the subroutines:

{del} Z{<-}PRIMESTO N;B;P;#IO [1] @Returns primes up to {omega} [2] Z{<-}{iota}#IO{<-}0 [3] N{<-}N+1 [4] B{<-}N{take}1 1 [5] L1:P{<-}B{iota}0 [6] {->}(P=N)/0 [7] Z{<-}Z,P [8] B{<-}B{or}N{rho}P{take}1 [9] {->}L1 {del}

PRIMESTO is cute (it finds primes without using any arithmetic operations, using the Sieve of Eratosthenes), but it's not very fast. A more efficient algorithm would help BINOM2, which needs lots of primes.

{del} Z{<-}P ALLFACTORS A;B;I;J;N [1] @Returns the prime factors of the numbers in {omega} [2] {->}(2=#NC'P')/L1 @ If primes list wasn't provided, [3] P{<-}PRIMESTO {floor}(0{max}{max}/A)*.5 @ generate it [4] L1:Z{<-}0{rho}A{<-},A [5] I{<-}1 & N{<-}{rho}P [6] {->}L3 @ dive on in [7] L2:{->}(N<I{<-}I+1)/L4 @ Loop for each prime [8] B{<-}0=P[I]|A @ indices of As that are divisible by P[I] [9] {->}(~1{epsilon}B)/L2 @ on to next prime if none [10] J{<-}B/{iota}{rho}B @ indices of divisible As [11] Z{<-}Z,({rho}J){rho}P[I] @ add the P[I]s to the result [12] A[J]{<-}A[J]{divide}P[I] @ remove the factors [13] L3:Z{<-}Z,(B{<-}A{epsilon}P)/A @ append As which are prime [14] A{<-}(~B)/A @ and remove them from further consideration [15] N{<-}{neg}1+(P{<=}{floor}(0{max}{max}/A)*.5){iota}0 {+ +} @ index of largest needed prime [16] I{<-}I-1 @ try same prime again [17] {->}L2 @ Endloop [18] L4:Z{<-}Z,A @ leftovers are prime, too [19] Z{<-}Z[{gradeup}Z] @ standardize the order {del} {del} Z{<-}ONUB A [1] @Distinct values of {omega}, in ascending order [2] A{<-},A [3] A{<-}A[{gradeup}A] [4] Z{<-}(1,1{drop}A{/=}{neg}1{rotate}A)/A {del} {del} Z{<-}D FRCOUNT V;B;F;I [1] @Counts frequency of values {alpha} in sorted vector {omega} [2] B{<-}V{/=}{neg}1{rotate}V & B[{iota}{signum}{rho}B]{<-}1 @ first marks [3] I{<-}B/{iota}{rho}B @ indices of firsts [4] F{<-}(1{drop}I,1+{rho}B)-I @ frequency of each kind [5] Z{<-}({rho}D){rho}0 [6] Z[D{iota}V[I]]{<-}F @ put them where they belong {del}

A prime-finder seems like a generally useful tool to have available, so I wrote an assembler version for APL*PLUS II/386 and +III/Windows. The technique is based on Algorithm P in section 1.3.2 of Knuth's "The Art of Computer Programming", vol 1.

{del} Z{<-}PRIMESTO N;T [1] @Returns primes up to {omega} [2] {->}(T{<-}''{rho}(1,N,#STPTR'Z T')#CALL PRIMESTO{delta}OBJ){drop}0 [3] #ERROR(3 5 7 8{iota}T){pick}'LENGTH ERROR' 'RANK ERROR' 'VALUE ERROR'{+ +} 'WS FULL' 'DOMAIN ERROR' {del} {del}. PRIMESTO{delta}OBJ{<-}1345730611 858915563 {neg}1991316349 {neg}19{+ +}93202588 {neg}1992940476 1715086428 35933321 609519974 {neg}957576422 {neg{+ +}}972945595 {neg}972939451 {neg}973076923 1711283781 1711282360 1711293833 {+ +}1712866697 3163591 28861952 15224832 1476395008 {neg}1267171282 74799184 8{+ +}2561229 1378112767 50873799 {neg}2097152000 2114139773 7179574 178278 5947{+ +}2 777519104 1351186560 {neg}855345832 {neg}16454712 1917985876 611093357 7{+ +}5333682 {neg}2141752062 1963001213 142443353 {neg}1957727742 1160460869 13{+ +}2406 {neg}388956160 {neg}1995946749 1166616645 474334752 491111938 5246663{+ +}69 409832705 243814 59472 777519104 {neg}13518720 1481703423 {neg}92608807{+ +}5 1425999083 376590884 841247883 28515819 62066923 95620331 129173739 1459{+ +}49931 {neg}1070399253 1441383306 {neg}1962934271 2105746557 192807734 {neg{+ +}}1962117749 {neg}201586611 {neg}954471515 519 71812864 3 84428743 {neg}956{+ +}301312 472645 1837957120 79193600 15224832 1476395008 {neg}1128759250 1358{+ +}954494 {neg}855345832 {neg}16454712 {neg}1957551020 {neg}1959648148 278011{+ +}57 {neg}1992294027 {neg}768392635 955 {neg}2081163520 74776826 37897603 69{+ +}9 {neg}1949158656 1077953093 953 {neg}2081294592 57999610 {neg}1962753149 {+ +}2106278477 {neg}406761720 612172546 {neg}947712533 51349764 116621771 9898{+ +}55744 478100045 2238 611648256 {neg}1047801293 {neg}92064009 1004565504 {n{+ +}eg}2083029498 {neg}320142138 {neg}1054573269 {neg}2092498193 2130983549 21{+ +}05757455 142541370 981304143 1325498113 33834438 17122758 17253830 {neg}19{+ +}95407991 1837959293 62416384 15224832 1476395008 {neg}189235154 1358954493{+ +} {neg}855345832 {neg}16454712 257041492 {neg}76414 611093503 611683122 {ne{+ +}g}1962115701 {neg}201586611 409832869 309350 59472 777519104 {neg}37898112{+ +} 1481703423 {neg}926088075 1425999083 1821069860 {neg}1070910940 {neg}1018{+ +}248061 960314 105807 @ Adjust the version signature for +II/III: {del}: PRIMESTO{delta}OBJ[#IO]{<-}(1345730611 2000042035)[1+1{epsilon}{+ +}#SYSID #SS'Win']

This runs pretty fast. On my 486/66, it takes 0.0105 secs to find primes up to 5000. (Compared with 2.66 secs for the APL sieve version, making it about 250 times faster.)

The assembler code is able to examine a range of numbers for primes, utilizing the prime list computed in a previous run. Here's a function that uses this feature to find the first N primes:

{del} Z{<-}PRIMESN N;F;P;Q;S;T [1] @Returns the first {omega} primes [2] P{<-}{iota}F{<-}0 [3] Q{<-}N{times}2 @we'll need to check at least this many numbers [4] @ Mem req is 2 copies of N+Q{divide}2 elt int vecs at 4 bytes/elt (8{+ +}{times}N)+4{times}Q [5] Q{<-}Q{min}((#WA-20000)-8{times}N){divide}4 @avoid WS FULL [6] S{<-}8 & {->}(Q<10)/L3 @signal WS FULL if not enough memory [7] @ Find primes in the range [F..F+Q-1]: [8] L1:S{<-}''{rho}(F,(F+Q-1),#STPTR'P T')#CALL PRIMESTO{delta}OBJ [9] {->}(S{/=}0)/L3 @jump if error [10] {->}(N{<=}{rho}P)/L2 @jump if we have enough [11] F{<-}F+Q @go on to next interval [12] {->}L1 [13] L2:Z{<-}N{take}P @toss extras at end [14] {->}0 [15] L3:#ERROR(3 5 7 8{iota}S){pick}'LENGTH ERROR' 'RANK ERROR' 'VALUE {+ +}ERROR' 'WS FULL' 'DOMAIN ERROR' {del}

It took 395 seconds to find the first million primes. (If you call 2
the first prime, the millionth is 15,485,863.) Of course, this is
nothing compared with Roger Hui's feat of finding the first 10 **billion**
(1e10) primes, but that took 20 hours on 60-70 workstations. (See
*Vector* Vol. 10, No. 4, April 1994, pp 110-113.)

Jim

P.S. The APL sieve function doesn't seem to be terribly inefficient. I haven't found a faster technique in APL, including a parallelized version of Knuth's algorithm.

J has a function that finds the n-th prime, p: . And it's quite a fast implementation, too. Setting i=.i.669 and finding primes to 5000 with p:i took just 0.0035 secs (with J 2.06 on Win3.1), which is 3 times faster than my assembler routine. Finding the first million primes was also several times faster, but this task caused almost continuous disk activity, and repeated timings ranged from 92 secs to 160 secs.

I think Roger must have learned something from finding billions and billions of primes that weekend at Morgan Stanley.

Jim

Randy MacDonald wrote:

Jim Weigang gives a solution to this problem, but I found it too programmatic, and not mathematical enough.

Assume a function "PVEC", which takes an integer, and returns its representation as a product of powers of primes, only the exponents would be shown. . . . multiplication becomes addition under PVEC, division becomes subtraction.

Nice solution! The math could be highlighted even more if the prime power vectors weren't of variable length. Although this sacrifices generality, you could have functions PFN (powers from number) and NFP (number from powers) use a global variable PRIMES and return a fixed- length vector of prime exponents. PRIMES would be set up beforehand to contain all the necessary primes. Defining PFNF N as PFN !N would allow you to state the original formula

(!B){divide}(!A){times}!B-Aas:

NFP (PFNF B)-(PFNF A)+PFNF B-A

For example:

PRIMES{<-}PRIMESTO 13 N{<-}115615500 + P{<-}PFN N 2 1 3 2 2 1 @ prime powers representation NFP P 115615500 @ convert back to N PFNF 12 10 5 2 1 1 0 0 0 @ prime powers of !12 NFP PFNF 12 479001600 !12 479001600 10 BINOM6 21 352716 10!21 352716

The new functions are listed below. See my previous messages for any missing subroutines. Enhancing these functions to handle higher rank arguments is left as an exercise for the reader!

Jim

{del} Z{<-}A BINOM6 B;PRIMES [1] @Computes A!B as (!B){divide}(!A){times}!B-A [2] PRIMES{<-}PRIMESTO B [3] Z{<-}NFP (PFNF B)-(PFNF A)+PFNF B-A {del} {del} P{<-}PFNF N;M [1] @Computes prime powers representation of !{omega} [2] M{<-}{times}\{transpose}(({floor}2{log}1{max}N),{rho}PRIMES){rho}PRIMES [3] P{<-}+/{floor}N{divide}M {del}

PFNF here uses Knuth's algorithm for factoring !N. (See the FACTORNF function in my previous message, and note a recent bug fix: 1{max} needs inserted in FACTORNF before taking the log.) However, given a fast factoring routine (like q: in J), I think it would be more efficient to factor each element of {iota}N separately, because this would require primes only up to Sqrt(N) instead of N. (Time for another Fastfn...)

{del} N{<-}NFP P [1] @Converts prime-powers vector {omega} back to a number [2] N{<-}{times}/P/PRIMES {del} {del} P{<-}PFN N [1] @Converts number {omega} to a prime-powers vector [2] P{<-}PRIMES FRCOUNT ALLFACTORS N {del}

Appended below is a fast utility for finding prime factors on APL*PLUS II/386 and +III/Windows. It mimics monadic q: in J. Some examples:

FACTOR 12 2 2 3 FACTOR 1 33 42 100035 0 0 0 0 0 0 0 3 11 0 0 0 0 0 2 3 7 0 0 0 0 3 3 3 3 5 13 19 {times}/1{max}FACTOR 1 33 42 100035 1 33 42 100035 FACTOR 0 DOMAIN ERROR @ the domain is positive integers <2*31

The speed of the factoring code is respectable (for example, on my 486/66 running +II/386 v5.2, factoring 130000+{iota}5000 takes 0.212 secs, compared with 4.12 secs for J2.06), but the PRIMESTO function (another FastFn, included in one of my March 14 postings) can't hold a candle to J's prime finder. For example, FACTOR takes 0.18 secs to factor (2*31)-1 (the largest 32-bit integer, itself a prime), whereas J does it in 0.034 secs. Almost all the time is spent in PRIMESTO, which takes 0.174 secs to find primes up to 46,340. In contrast, finding these primes with p: in J takes 0.0242 secs, seven times faster. I should have used a sieve instead of Algorithm P in PRIMESTO.

Jim{del} Z{<-}P FACTOR A;S;#IO [1] @Returns the prime factors of {omega} [2] @ The optional left arg is a list of primes up to at least {floor}({+ +}{max}/A)*.5 [3] @ The result has shape ({rho}A),n and each row holds a list of prime [4] @ factors. Zero is used as fill to pad out short rows. [5] @ [6] #IO{<-}1 [7] L1:{->}(''{rho}S{<-}(#STPTR'Z A P')#CALL FACTOR{delta}OBJ)/L2 [8] Z{<-}((''{rho}{rho}Z),2{pick}S){take}Z @chop off all-zero columns at {+ +}the end [9] Z{<-}(({rho}A),2{pick}S){rho}Z @give it the right shape [10] {->}0 [11] L2:{->}(~(7=''{rho}S)^0=#NC'P')/L3@If caller didn't provide primes, [12] P{<-}PRIMESTO {floor}(0{max}{max}/,A)*.5@ generate them [13] {->}L1@ try again [14] L3:#ERROR(5 7 8{iota}1{take}S){pick}'RANK ERROR' 'VALUE ERROR' 'WS {+ +}FULL' 'DOMAIN ERROR' {del} {del}. FACTOR{delta}OBJ{<-}1345730611 858915563 {neg}1990005629 171613501{+ +}2 35931273 610044262 1284072986 {neg}326421980 34031046 35603910 37176774 {+ +}411078 1983942 3556806 1620070 4557158 407210342 809863526 1212532582 {neg{+ +}}1201274880 {neg}397410303 0 2021666392 1968722091 {neg}339161852 60955008{+ +}4 409832806 178278 59472 777519104 1351448704 {neg}855345832 {neg}16454712{+ +} 1919296596 612139129 762577436 472138950 112748034 15224832 1476395008 16{+ +}89813038 1358954495 {neg}855345832 {neg}16454712 1936073812 87909896 {neg}{+ +}347376640 611093305 812485962 178278 59472 777519104 {neg}13256576 1481703{+ +}423 {neg}926088075 1425999083 510813732 1243901067 36994432 2105542773 175{+ +}440181 28515819 78844139 95620331 129173739 145949931 {neg}1070399253 {neg{+ +}}202783862 {neg}973078528 {neg}972946363 {neg}972946107 {neg}1962801339 11{+ +}66614597 340117264 32 {neg}697244681 1962998403 138774993 1711304077 13421{+ +}78232 232 {neg}2144446464 {neg}82248 1968722175 {neg}339161852 609550084 {{+ +}neg}1951698330 {neg}1958075284 1300958333 {neg}54512888 1166781427 4828392{+ +}4 {neg}1957804663 2106133629 541428562 50520257 1166615621 944081750 50520{+ +}257 1166621765 {neg}1983892646 1569414725 860416804 {neg}137851950 1641759{+ +}0 104534132 {neg}964486018 1002539780 {neg}411936139 378212587 {neg}947710{+ +}071 81156 125042688 260691435 {neg}1962621053 1383959495 1161545515 {neg}1{+ +}996261794 2097372741 79921998 {neg}1957528183 1564163189 {neg}1961659562 8{+ +}1155 {neg}1945174016 {neg}258 {neg}930360972 1569436139 49004894 {neg}9979{+ +}98549 50018 960317 105017 @ Adjust the version signature for +II/III: {del}: FACTOR{delta}OBJ[#IO]{<-}(1345730611 2000042035)[1+1{epsilon}{+ +}#SYSID #SS'Win']

Below is an APL model of the assembler function. For the 130000+{iota}5000 test case, it runs about 660 times slower than the assembler routine. (And this understates the APL:asm ratio because 35% of the execution time for the FastFn is spent doing the two APL lines [8] and [9].) It's a pity that on PCs we can't compile scalar-iterative programs like this to produce something that runs nearly as fast as the hand-written assembler code. (RBE: Tell me I'm wrong!)

{del} Z{<-}P FACTOR{delta}APL A;C;I;J;K;N;R;S;KMAX [1] @APL model for the FastFn version of FACTOR [2] {->}(0{/=}#NC'P')/L2 [3] P{<-}PRIMESTO {floor}(0{max}{max}/,A)*.5 [4] L2:S{<-}{rho}A [5] A{<-},A [6] Z{<-}(({rho}A),32){rho}0 @no integer has >32 factors [7] KMAX{<-}0 [8] I{<-}K{<-}1 [9] {->}L1 @dive on in [10] [11] L3:C{<-}N @save a reference copy of the original number [12] R{<-}P[J]|N & N{<-}{floor}N{divide}P[J] @remainder and quotient (1 {+ +}machine instruction!) [13] {->}(R=0)/L10 @jump if division was even [14] {->}(N{<=}P[J])/L8 @jump if quotient is small [15] J{<-}J+1 [16] N{<-}C @restore original pre-divide number [17] L6:{->}(J{<=}{rho}P)/L3 @try next prime unless we've run out [18] {->}L8 [19] [20] @ The prime went into the number evenly. Record P[J] as a factor [21] L10:Z[I;K]{<-}P[J] [22] K{<-}K+1 [23] {->}(N=1)/L5 @do next A[I] if we're down to 1 [24] {->}L3 @else, try P[J] again [25] [26] @ The division wasn't even, but the quotient is less than the prime. [27] @ (Or we have run out of primes.) [28] @ The dividend must be prime, and we're done factoring this number. [29] L8:Z[I;K]{<-}C [30] K{<-}K+1 [31] [32] @ Go on to the next A[I] [33] L5:KMAX{<-}KMAX{max}K-1 @remember last column of Z we used [34] K{<-}1 [35] I{<-}I+1 [36] L1:J{<-}1 [37] {->}(I>{rho}A)/L4 [38] N{<-}A[I] @the next number to factor [39] #ERROR(N<1)/'DOMAIN ERROR' [40] {->}(N=1)/L5 @no factors if 1 [41] C{<-}N @put it in the ref copy in case P is empty [42] {->}L6 [43] L4: [44] [45] Z{<-}(({rho}A),KMAX){take}Z @chop off unused rows [46] Z{<-}(S,KMAX){rho}Z @restore original dimensions {del}

Roger Hui wrote:

With a few other improvements to q:, the time for q: v is now down to 0.1152, a factor of 15.1 improvement over J3.01, but still short of the factor of 19.4 achieved by the hand-coded assembler version.

Impressive! The remaining difference from assembler is practically insignificant compared to the improvement so far. In achieving a 15.1x speedup you've gone from 1.743 secs down to 0.115 secs; achieving a 19.4 ratio would involve saving only 0.025 additional secs, a mere 1.5% of the original figure. (As Clark Wiedmann used to remind me, speed ratios can be a deceptive measure of time savings.)

I wrote:

Below is an APL model of the assembler function. For the 130000+{iota}5000 test case, it runs about 660 times slower than the assembler routine.

Roger wrote:

I was curious about the analogous ratio between the primitive q: and a J model of it (i.e. the ratio between hand-coded C and J), ...

For v=.130000+i.5000, qco v takes 10.245 seconds, compared to 0.1152 seconds for q: v; in this case the hand-coded C is faster by a factor of 89.

My FACTOR{delta}APL function was a direct translation of the assembler program to APL using only scalar operations, and it was provided mostly to document the algorithm used by FACTOR. Roger's qco program uses vector operations, speeding it up greatly over pure scalar code, so it's not really comparable to FACTOR{delta}APL. Here's an APL version of qco:

{del}Z{<-}QCO N;P [1] @APL version of Roger Hui's qco function, which models J's q: primitive [2] P{<-}PRIMESTO {floor}(0{max}{max}/,N)*.5 [3] Z{<-}{disclose}({enclose}P)QA{each}N {del} {del} Z{<-}P QA N [1] @Factors a number {omega} given primes list {alpha} [2] Z{<-}P{<-}(0=P|N)/P [3] L1:{->}(0={rho}P)/L2 [4] N{<-}{floor}N{divide}{times}/P [5] Z{<-}Z,P{<-}(0=P|N)/P [6] {->}L1 [7] L2:Z{<-}Z,N~1 [8] Z{<-}Z[{gradeup}Z] {del}

(Note: QCO uses APL2-style or APL*PLUS Evlevel 2 nested array operations. The {disclose} on line [3] is a mix-type disclose, not first.)

For the test case of 130000+{iota}5000, QCO runs in 10.78 secs (486/66, +II/386 v5.2), which I estimate to be about 40 times slower than Roger's latest q: . (But perhaps Roger could time QCO on +III using the same computer as for the J3.02x timings.) It is this factor of 40, not the 660, which should be compared with J's factor of 89.

Jim

Home Page