From: jimw@chilton.com (Jim Weigang)
Newsgroups: comp.lang.apl
Date: Tue, 28 Nov 1995 04:15:23 GMT
Subject: IEEE conversion

Tom Chwastyk wrote on 22 Nov 1995:

Does anyone have a small, fast utility for APL III to convert IEEE floats (32 bit) to IEEE double (64 bit)?

I sent him a #CALL assembler program to do the job, and he replied on 27 Nov 1995:

(4) The slightly more complex "DfromSok" runs about 13.2K conversions/second (13 times slower than Jim's) by using zero setting and subscripting in place of expansion.

I'm really surprised that (4), which explicitly doesn't use the FPU, is within about an order of magnitude of (1) [the assembler code].

And I replied with the following posting to comp.lang.apl:

* * * *

The execution time of an O(N) [i.e., linear] program can be modeled as:

     RunTime = SetupTime + (N {times} PerElementTime)

where N is the number of elements in the argument. You can estimate the parameters by making two timings, one with small N and one with large N. If TS and TL are the execution times and NS and NL are the N values, the parameters can be computed using:

     SetupTime PerElementTime {<-} (TS,TL) {domino} 1,[1.5] NS,NL

But these parameters are usually so small that it's more convenient to discuss their reciprocals, which can be interpreted as calls-per-second (overhead) and elements-per-second (flat-out processing speed).

On my 486/66 using APL*PLUS III v1.2 and Windows 3.1, Tom's DfromSok function takes about 0.0034 secs with N=1 and about 0.65 secs with N=10,000. My F32TO64 function (appended below) takes about 0.0015 secs with N=1 and about 0.043 secs with N=100,000. So the per-second parameters for these two functions are:

                      Calls/Sec        Elts/Sec

         DfromSok        300             15,464

         F32TO64         667          2,409,614
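
For readers without an APL system handy, the two-point fit can be checked in a few lines of Python. This is only a sketch: the function name fit_linear_model and the dictionary keys are made up for illustration, and it solves the two equations directly rather than calling a matrix-divide primitive. It is fed the four timings quoted above.

```python
def fit_linear_model(n_small, t_small, n_large, t_large):
    """Solve t = setup + n * per_elt for two (n, t) measurements.

    Hypothetical helper; equivalent to the 2-by-2 matrix divide in the text.
    """
    per_elt = (t_large - t_small) / (n_large - n_small)
    setup = t_small - n_small * per_elt
    return setup, per_elt


# (N small, time small, N large, time large), as quoted in the posting
timings = {
    "DfromSok": (1, 0.0034, 10_000, 0.65),
    "assembler": (1, 0.0015, 100_000, 0.043),
}

rates = {}
for name, measurement in timings.items():
    setup, per_elt = fit_linear_model(*measurement)
    # Reciprocals: calls per second (overhead) and elements per second
    rates[name] = (1.0 / setup, 1.0 / per_elt)
    print(f"{name:10s} {rates[name][0]:10.0f} {rates[name][1]:14,.0f}")
```

Rounded to whole numbers, the results agree with the table.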

In a particular application, the speed ratio between these two functions may be anything from 2.2 (the calls/sec ratio) to 156 (the elts/sec ratio), depending on the size of the arguments. The speeds Tom reported are consistent with an argument size of about 300 elements.
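
To see how the ratio moves between those two extremes, the model can be evaluated at a few argument sizes. The Python below is a sketch only: run_time, speed_ratio and slow_throughput are names invented for this illustration, and the parameters are simply the rounded values from the table.

```python
def run_time(n, calls_per_sec, elts_per_sec):
    """Modeled run time in seconds: setup time plus n times per-element time."""
    return 1.0 / calls_per_sec + n / elts_per_sec


# (calls/sec, elts/sec) from the table in the posting
SLOW = (300.0, 15_464.0)       # DfromSok
FAST = (667.0, 2_409_614.0)    # the assembler routine


def speed_ratio(n):
    """How many times longer the slow function takes for n elements."""
    return run_time(n, *SLOW) / run_time(n, *FAST)


def slow_throughput(n):
    """Conversions per second achieved by the slow function for n elements."""
    return n / run_time(n, *SLOW)


for n in (1, 10, 100, 300, 1_000, 100_000):
    print(f"N={n:>7}  ratio={speed_ratio(n):6.1f}  "
          f"DfromSok conv/sec={slow_throughput(n):9.0f}")
```

At N=300 the model puts DfromSok near 13.2K conversions/second with a ratio in the low teens, which is why that argument size fits Tom's report.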

Jim

     {del} Z{<-}F32TO64 C;T
[1]    @Converts IEEE 32-bit reals {omega} to 64-bit reals
[2]    @ The argument is a character vector containing the 32-bit reals, in
[3]    @   standard Intel byte-reversed order (i.e., 1{take}C holds the low {+
   +}8 bits
[4]    @   of the first number).  The result is a numeric vector.
[5]    @ This program can be used on either APL*PLUS II/386 or III/Windows
[6]    @
[7]    T{<-}0 858915563 {neg}1992758141 1714562148 35931273 610044262 {neg}{+
   +}957576422 {neg}972945595 {neg}972939451 {neg}973076923 1711283781 {+
   +}1711282360 1711293833 1712866697 3163591 28861952 15224832 1476395008 {+
   +}{neg}1132953554 74799184 82561229 1009014015 841247883 {neg}949558309 {+
   +}{neg}51643 1714847197 2134271361 1717990656 3556807 914217216 {+
   +}912621926 {neg}2090467265 1967076989 409832784 178278 59472 777519104 {+
   +}{neg}9324416 1481703423 {neg}926088075 1425999083 628243492 841247883 {+
   +}1867136 2105544565 292880669 606619019 {neg}351898365 {neg}352210148 {+
   +}{neg}352079086 {neg}351948018 {neg}351816950 {neg}351751418 856470274 {+
   +}{neg}339506496 71681624 88458755 541428481 {neg}1996298047 1166610501 {+
   +}122013192 7179521 243814 59472 777519104 {neg}16664448 1481703423 {neg}{+
   +}926088075 1425999083 {neg}1183695836 841247883 {neg}1960544885 {+
   +}1300958333 {neg}653466872 74878214 2139955165 871686664 {neg}{+
   +}2082284608 12794052 951127 120349
[8]    T[#IO]{<-}(1345730611 2000042035)[1+1{epsilon}#SYSID #SS'Win']
[9]    {->}(T{<-}''{rho}(#STPTR'Z C')#CALL T){drop}0
[10]   #ERROR(3 5 7 8 12{iota}T){pick}'LENGTH ERROR' 'RANK ERROR' 'VALUE {+
   +}ERROR' 'WS FULL' 'MATH PROCESSOR ABSENT' 'DOMAIN ERROR'
[11]   @ Copyright 1995 by Jim Weigang
     {del}

The loop in the assembler code is:

   L1:
    FLD DWORD PTR [ESI]     ; load C[i]
    LEA ESI,[ESI+4]         ; point to next element
    FSTP QWORD PTR [EDI]    ; store into Z[i]
    LEA EDI,[EDI+8]         ; point to next element
    LOOP L1

[I have since discovered that on my 80486, plain old ADD instructions run nearly twice as fast as the LEA instructions used above.]
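
For comparison, here is what the loop accomplishes, written as a Python sketch using the standard struct module. This is not a translation of the #CALL routine: the function name f32_bytes_to_doubles is made up, and the length check is my assumption about reasonable behavior, not a statement of what the assembler code tests for. The "<" format prefix selects the same low-byte-first order the function's comments describe.

```python
import struct


def f32_bytes_to_doubles(data):
    """Widen packed little-endian IEEE 32-bit reals to Python floats (64-bit).

    Hypothetical helper for illustration; `data` is a bytes object holding
    four bytes per number, low-order byte first.
    """
    if len(data) % 4 != 0:
        # Assumed check: a partial trailing element is rejected
        raise ValueError("length must be a multiple of 4 bytes")
    count = len(data) // 4
    # The 'f' format reads a 32-bit real; Python stores the result as a double,
    # the same widening the FLD/FSTP pair performs
    return list(struct.unpack(f"<{count}f", data))


# Values chosen to be exactly representable in 32 bits
packed = struct.pack("<3f", 1.0, -2.5, 0.15625)
print(f32_bytes_to_doubles(packed))
```

Widening from 32 to 64 bits is exact, so no rounding occurs in this direction.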

