APL-ASCII Workspace Transfer

by Jim Weigang


This article was submitted for publication to APL Quote Quad in December 1995 and slightly revised in March 1997. Copyright © 1995 by Jim Weigang. All rights reserved.


The APLASCII transliteration workspace, originally designed to allow APL statements and programs to be included in e-mail messages, has evolved into a full-fledged workspace transfer system. The transfer files produced by this system are human-readable, can be edited using ordinary non-APL text editors, and can be transferred easily using e-mail.


APLASCII and WSIS

My primary goal in designing the APLASCII transliteration scheme [1] was to produce output that could be understood easily by human readers. Compact representation and machine efficiency were deliberately given a lower priority. One reason was that I thought a different representation, designed for efficient machine processing, would be appropriate for workspace exchange. I intended to develop a new Unicode-based version of the Workspace Interchange Standard (WSIS) and port the package to all modern APL systems. In this new WSIS format, the exporting APL's #AV would be described by its Unicode indices, allowing all characters (including non-APL characters such as @ # % and &, lowercase letters, line-drawing and national language characters) to be represented.

However, this plan was motivated by a faulty assumption: I thought WSIS could not represent non-APL characters. This is true for the first version of the standard, WSIS0 [2], but when I found descriptions of the later versions (WSIS1, in the ISO-APL standard [3], and WSIS2, in the draft extended APL standard [4]), I found that these versions actually could represent non-APL characters. My surprise turned to dismay, though, when I read how this was accomplished. The exporting APL's #AV is specified using characters drawn from any registered ISO character set. This is a very flexible scheme, to the point of being too flexible. Each non-APL character of #AV could legitimately be represented using the codes of a different ISO character set. Because the APL character set is described as "number 68", there are possibly 67 other character sets that a full-blown WSIS package must be aware of. Furthermore, new character sets are undoubtedly introduced occasionally, so a WSIS package would never be complete. And I had no success trying to find a listing of any of these character sets at the University of Massachusetts library.

These revelations gave me new insight as to why the WSIS standard isn't more widely implemented and why Unicode was developed, and they dealt a crippling blow to my plans. While I was willing to propose a new format to extend WSIS0 capabilities, I wasn't so enthusiastic about revising something that wasn't exactly broken. Nor was I enthusiastic about implementing a WSIS1 or WSIS2 package. Not knowing what to do, I shelved the project.


APLASCII Workspace Transfer

Meanwhile, other people were finding my APLASCII transliteration programs useful, and they wanted a workspace transfer capability. Bob Bernecky, who was using APLASCII to translate input to his APL compiler, told me that he had looked in vain for a WSTOFILE function that he knew just had to be in the workspace. I mentioned my plans for an enhanced WSIS, and Bob pointed out that one advantage of the APLASCII format is that he can easily edit the transfer files from outside APL. Dick Holt wrote a set of add-on programs that used APLASCII to transfer functions and variables between APL*PLUS /PC and APL.68000 on a Macintosh. These programs served their purpose, but they weren't written to handle nested arrays or arrays containing arbitrary characters, and the transfer file was designed for machine processing, not human viewing or editing.

These users convinced me that APLASCII was going to be used for workspace transfer, one way or another. I decided that the capability should be built into the APLASCII workspace, rather than provided as an add-on, and it should meet the following objectives:

  1. The format should be fairly obvious, even to APLers who haven't seen it before.

  2. The process should be fully reversible, allowing any program or variable to be transferred between compatible APL systems without alteration of its value.

  3. When data is transferred between different APL systems, characters that are available on both systems (e.g., line-drawing characters) should be translated correctly. Other characters should be identified in some way for manual translation by the receiving party.


Format for Variable Listings

The existing transliteration programs already provided a format for transferring functions, essentially listing the function in del editor format. A similar mechanism was needed for variables. The format I adopted was based on expressions for defining variables. For example, here is the transfer form for two variables, NVEC and CMAT:

{del}. NVEC{<-}2 4 6 8

{del}. CMAT {<-} 3 5 {rho} 'ONE  TWO  THREE'

An escape phrase marks the start of a listing in a message, serving a role similar to the {del} in a function definition. The escape was initially chosen as {<-} (suggestive of assignment) but was later changed to del period ({del}.). (More about this below.) The escape phrase is followed by the name of the variable, an assignment arrow, and the value. Spaces may be inserted freely. If the value is a scalar or non-singleton vector, it is written directly, without an explicit shape. Otherwise, the assignment arrow is followed by the shape of the array, a rho symbol ({rho}), and the raveled value. The variable definition is conceptually one line, but it may be wrapped into multiple lines using the same {+ +} convention employed when listing functions. (A line that ends with {+ is continued on the next line, which begins with +}.)
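The rules above can be sketched in a few lines of Python. This is an illustration only, not the code in the APLASCII workspace: the names format_simple, wrap, and unwrap, the restriction to integers and plain characters, and the 40-column width are all assumptions made for the example.

```python
import math
import re

# One token is either a whole {keyword} or a single character, so that
# wrapping never splits a keyword in two.
TOKEN = re.compile(r"\{[^{}]*\}|.", re.S)


def format_number(n):
    """Integers only; a negative sign becomes the {neg} keyword."""
    return "{neg}" + str(-n) if n < 0 else str(n)


def format_simple(name, shape, values):
    """Transfer form of a non-empty simple array.

    shape is a tuple (() for a scalar); values is a str for character
    data or a list of ints, given in ravel order.
    """
    if len(values) == 0 or math.prod(shape) != len(values):
        raise ValueError("need a non-empty value that matches the shape")
    if isinstance(values, str):
        raveled = "'" + values.replace("'", "''") + "'"
    else:
        raveled = " ".join(format_number(n) for n in values)
    # Scalars and non-singleton vectors are written without a shape.
    direct = len(shape) == 0 or (len(shape) == 1 and shape[0] != 1)
    prefix = "" if direct else " ".join(map(str, shape)) + "{rho}"
    return "{del}. " + name + "{<-}" + prefix + raveled


def wrap(line, width=40):
    """Split one logical line into physical lines with {+ ... +}."""
    pieces, current = [], ""
    for tok in TOKEN.findall(line):
        if current and len(current) + len(tok) > width:
            pieces.append(current)
            current = ""
        current += tok
    pieces.append(current)
    last = len(pieces) - 1
    return [("+}" if i > 0 else "") + p + ("{+" if i < last else "")
            for i, p in enumerate(pieces)]


def unwrap(lines):
    """Rejoin physical lines that were split with {+ ... +}."""
    logical = []
    for raw in lines:
        if logical and logical[-1].endswith("{+") and raw.lstrip().startswith("+}"):
            logical[-1] = logical[-1][:-2] + raw.lstrip()[2:]
        else:
            logical.append(raw)
    return logical


print(format_simple("NVEC", (4,), [2, 4, 6, 8]))
for physical in wrap(format_simple("BIG", (30,), list(range(30)))):
    print(physical)
```

The first print reproduces the NVEC listing shown earlier. Because spaces may be inserted freely, the sketch simply omits the optional ones.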

Nested, heterogeneous, or boxed arrays are represented recursively by surrounding the individual items with parentheses. For example:

{del}. NEST {<-} (1 2 3) ((4 5) ('SIX') (7))

This describes a two-item vector whose first item is 1 2 3 and whose second item is a three-item vector containing the arrays 4 5, 'SIX', and 7.

The appropriate format for one-item nested arrays was unclear at first. Initially, they were represented in a manner consistent with other nested arrays: parentheses surrounding a value indicated its nesting, so a nested scalar containing the array 1 2 3 was represented as N1{<-}(1 2 3). With use, however, it soon became apparent that this format was a human engineering blunder. Olivier Lefevre pointed out that, except for the representation of one-item nested arrays, the APLASCII format was a valid APL2 expression for defining the array. But parentheses surrounding a solitary expression do not result in nesting in APL2. Having this one difference between the two formats was confusing and led to some non-obvious APLASCII representations, such as V1{<-}1{rho}(1 2 3) for a one-item nested vector. Upon re-examination, the original reasons for using parentheses did not seem compelling [5], so the format for a one-item nested array was changed to be enclose ({enclose}) followed by the contents of the item, as in:

{del}. N1 {<-} {enclose}1 2 3

A one-element nested vector is represented as V1{<-}1{rho}{enclose}1 2 3.

The raveled value in the transfer form is never an empty array. If the shape describes an empty array, the value is a single item, the prototype for the array. For example:

{del}. EMPTY {<-} 3 0 4 {rho} {enclose} 3{rho}0 0 0

The one exception to this rule is that an empty character vector may be represented as either '' or 0{rho}' '.

APL2 users should note that, although this format is a valid APL2 expression, not all APL2 expressions are valid APLASCII transfer forms. The APLASCII notation requires parentheses around nested items that are simple scalars or character vectors. So, while you can write V{<-}1 '2' 'THREE' in APL2, the APLASCII format requires V{<-}(1) ('2') ('THREE').

Because these transfer forms are extracted using a program, not the execute ({execute}) primitive, they can be used to transfer nested arrays between APL2-style and Sharp APL systems, which have a different philosophy regarding notation and the enclosure of simple scalars. Users should be aware that Sharp APL may export values such as {enclose}3 and {enclose}'A', which represent boxed simple scalars. On APL2, the nesting of these objects will be lost, resulting in the simple scalars 3 and 'A'.

This format is not a particularly compact representation for heterogeneous arrays (simple arrays containing both character and numeric data). For example, the array V{<-}3,'TEST' is represented as:

{del}. V{<-}(3)('T')('E')('S')('T')

Although far from optimal, this is actually more compact than WSIS format, and, except for keyword expansion, it may require less space than the internal representation of the array. For example, the array 3,(999{rho}'A') occupies 10,024 bytes of storage in APL*PLUS II. The WSIS representation is 7,014 bytes in size, and the APLASCII form is 5,348 bytes.
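The recursive layout of nested and heterogeneous vectors can be illustrated with a short Python sketch. It is not the algorithm used by the workspace, and its data model is an assumption chosen for the example: Python ints stand for numeric scalars, a one-character string for a character scalar, longer strings for character vectors, lists for vectors, and the hypothetical Enclosed wrapper for a nested scalar. Empty arrays, non-integers, and arrays of rank two or more are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enclosed:
    """Hypothetical marker for a nested scalar (enclose of item)."""
    item: object


def is_scalar(v):
    return isinstance(v, int) or (isinstance(v, str) and len(v) == 1)


def fmt(value, sep=" "):
    """Right-hand side of a transfer form for scalars and vectors."""
    if isinstance(value, Enclosed):
        return "{enclose}" + fmt(value.item, sep)
    if isinstance(value, int):
        return "{neg}" + str(-value) if value < 0 else str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    if len(value) == 0:
        raise ValueError("empty arrays need the prototype rule, not shown")
    if len(value) == 1:
        # One-item vectors carry an explicit shape; a non-scalar item
        # is written with enclose rather than parentheses.
        only = value[0]
        inner = fmt(only, sep) if is_scalar(only) else "{enclose}" + fmt(only, sep)
        return "1{rho}" + inner
    if all(isinstance(v, int) for v in value):
        return " ".join(fmt(v) for v in value)
    # Nested or heterogeneous: every item gets its own parentheses.
    return sep.join("(" + fmt(v, sep) + ")" for v in value)


print("{del}. NEST {<-} " + fmt([[1, 2, 3], [[4, 5], "SIX", 7]]))
print("{del}. V{<-}" + fmt([3, "T", "E", "S", "T"], sep=""))
print("{del}. N1 {<-} " + fmt(Enclosed([1, 2, 3])))
```

The three prints reproduce the NEST, V, and N1 listings from this section.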


Workspace Transfer

The INSERTFNS function, which inserts function listings into a message, was enhanced to allow variable definitions to be inserted as well. As before, a line in the message that begins with {del} indicates an insertion point. INSERTFNS examines the #NC of the name following the {del} to decide whether to insert a function or variable listing. Thus, INSERTFNS'{del}',#NL 2 3 captures the definition of all functions and variables in the workspace. The DEFINEFNS function, which defines functions from their listings in a message, was extended to recognize and extract variable definitions.

Two functions, DUMPWS and LOADWS, are provided for writing and reading entire workspaces. Essentially, DUMPWS writes the output from INSERTFNS'{del}',#NL 2 3 to a file, but it also includes workspace-related quad variables and omits objects that are part of APLASCII. (And it writes comments giving the name of the workspace and originating APL system.) LOADWS reads a file and passes it to DEFINEFNS. In some versions, these functions work with one object at a time to avoid WS FULL errors.


Representing Arbitrary Characters

Characters that aren't part of the APL or printable ASCII character set are represented by keywords of the form "{U+xxxx}", where xxxx is the hexadecimal Unicode index for the character. (U+xxxx is the standard notation for identifying Unicode characters.) These {U+xxxx} keywords may be used for APL symbols, too, if the English keywords are a problem or if a unique representation for each symbol is desired. (They've been added to the translate tables.)

As a backup measure, the program that builds the translation tables automatically provides keywords for any characters in #AV that don't have keywords specified for them. (In practice, this is done only for unused elements of #AV which contain a fill symbol.) These generated keywords have the form "{avN}", where N is the zero-origin decimal #AV index. (For example, {av31}.) This convention permits arbitrary character values to be exchanged between APL systems having identical #AVs. However, it is not useful for exchanging data between APLs with different #AVs. In general, "binary" character values, in which characters are used to represent integers in the range 0 to 255, are not portable between different APL systems. (For example, it is difficult to see how any transfer program could decide which variables should be translated so as to preserve #AV indices and which should be translated to preserve character appearance.) Such values are best transferred by converting them to numeric indices before transfer and restoring them to character form afterwards.

The newline character, which performs a carriage return and linefeed on output, is handled specially because it is a "logical" character rather than an ASCII character. On different systems it may be represented as carriage return, linefeed, or some other character. In variables, newline is represented by the keyword "{nl}". Internally, the transliteration programs must handle it carefully because newline is used as a line delimiter in the argument to INSERTFNS and DEFINEFNS.
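A minimal Python sketch of the {U+xxxx} and {nl} conventions follows. The function names are invented for the example, and the sketch deliberately ignores several things the real translate tables deal with: English keywords for APL symbols, the {avN} fallback (which needs the exporting system's #AV), and literal brace characters in the data.

```python
import re

UNICODE_KEYWORD = re.compile(r"\{U\+([0-9A-Fa-f]{4})\}")


def encode_chars(text):
    """Replace everything outside printable ASCII with a keyword."""
    out = []
    for ch in text:
        if ch in "{}":
            raise ValueError("literal braces are outside this sketch")
        if ch == "\n":
            out.append("{nl}")          # logical newline
        elif " " <= ch <= "~":
            out.append(ch)              # printable ASCII passes through
        elif ord(ch) <= 0xFFFF:
            out.append("{U+%04X}" % ord(ch))
        else:
            raise ValueError("only 16-bit code points are handled here")
    return "".join(out)


def decode_chars(text):
    """Inverse of encode_chars; any other keyword is left untouched."""
    text = UNICODE_KEYWORD.sub(lambda m: chr(int(m.group(1), 16)), text)
    return text.replace("{nl}", "\n")


sample = "caf\u00e9 \u2500\u2500 end\nnext"
encoded = encode_chars(sample)
print(encoded)
print(decode_chars(encoded) == sample)
```

Since the encoded text contains only printable ASCII, it survives e-mail and ordinary text editors unchanged.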


Expressions

Lee Dickey suggested that the various versions of the APLASCII workspace for different APLs be distributed in the form of a single transliterated master file together with a set of small, system-dependent "instantiator" workspaces containing enough code to read the master file. I told him that I thought this could be done if DEFINEFNS could execute APL statements contained within the file, in order to prepare the workspace for the particular APL system being used. Adding the executable-statement feature to DEFINEFNS was simple. Initially, I used execute ({execute}) as the escape character to mark statements, but I realized this was a poor choice since {execute} can occur at the beginning of a valid expression. So I reconsidered both the {<-} and {execute} escape characters and came up with the following escape phrases:

      {del}    - function  
      {del}.   - variable  
      {del}:   - statement 
      {del}-   - marker

With this change, the only reserved character is del ({del}) as the first non-blank character on a line. Del by itself denotes the beginning or end of a function, del followed by a period indicates a variable definition, and del followed by a colon indicates a statement to be executed. (The period and colon suffixes come from J, which uses this notation to multiply the number of "symbols" available for operation names.) In addition to del, the phrase "[0]" is recognized as an alternative method of marking the beginning of a function definition.

Del followed by a minus sign indicates a "marker", a feature similar to pseudovariables in WSIS. Markers are currently used to indicate the start and end of a workspace and give the name of the originating APL system. A workspace listing is surrounded by the markers:

{del}---- Workspace: ws name

{del}---- End of workspace

The originating APL system is identified by a marker such as:

{del}---- Source APL system: APL*PLUS II/386 with Evlevel=1

Other markers will be added in the future to support enhancements such as component file transfer.

The ability to execute statements means that text files (such as e-mail messages) can be thought of as "scripts" or programs. One difference between APLASCII scripts and scripts in other languages is that, by default, text is interpreted as comments and is ignored. This allows any e-mail message or news posting to be used as an APLASCII script: ordinary text will be ignored and only specially marked blocks will be interpreted as code, data, or executable statements. This is generally more convenient than having to preface all descriptive text with comment markers.
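The way a reader of such a script might pick out the marked blocks can be sketched in Python. This is a simplified illustration, not DEFINEFNS: the names classify and scan are invented, the "[0]" alternative is not handled, and a function listing is assumed to end with {del} on a line of its own, as in the listings in this article.

```python
DEL = "{del}"


def classify(line):
    """Name the kind of block a line starts, or 'text' for commentary."""
    s = line.lstrip()
    if not s.startswith(DEL):
        return "text"
    rest = s[len(DEL):]
    if rest.startswith("."):
        return "variable"
    if rest.startswith(":"):
        return "statement"
    if rest.startswith("-"):
        return "marker"
    return "function"


def scan(message):
    """Return (kind, lines) blocks; ordinary text is skipped."""
    blocks, body = [], None
    for line in message.splitlines():
        s = line.strip()
        if body is not None:                 # inside a function listing
            body.append(line)
            if s == DEL:                     # closing del ends the listing
                blocks.append(("function", body))
                body = None
            continue
        # A line starting with +} continues a line that ended with {+.
        if blocks and blocks[-1][1][-1].rstrip().endswith("{+") and s.startswith("+}"):
            blocks[-1][1].append(line)
            continue
        kind = classify(line)
        if kind == "function":
            body = [line]
        elif kind != "text":
            blocks.append((kind, [line]))
    return blocks


message = "\n".join([
    "Hi all, here is a variable and a function.",
    "{del}. NVEC{<-}2 4 6 8",
    "Some chatter that is ignored.",
    "    {del} Z{<-}DOUBLE X",
    "[1]   Z{<-}X+X",
    "    {del}",
    "{del}: #IO{<-}1",
    "{del}---- End of workspace",
])
for kind, lines in scan(message):
    print(kind, len(lines))
```

Only the four marked blocks are reported; the two lines of ordinary text are treated as comments.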


Efficiency

One reason I planned to write a separate WSIS package was to be able to produce compact files. However, to my surprise, a comparison of actual WSIS and APLASCII files shows that there is not much size difference between the two, either in compressed or uncompressed form.

Two workspaces were saved to file in three formats: WSIS (produced using the SLT workspace supplied with APL*PLUS), APLASCII (using the DUMPWS function), and binary (using the interpreter's )SAVE command). The first workspace, A2A, consisted of 73 functions and variables; the second, A2ASRC, consisted of 195 functions and variables. The following table gives the sizes of the transfer files before and after compression (using pkzip 2.04g):

                                  Workspace

                 Format    |    A2A      A2ASRC
   ------------------------+---------------------
   Uncompressed  WSIS      |  100,468   452,768
                 APLASCII  |  115,819   442,823
                 )SAVE     |  101,694   412,250
                           |
   Compressed    WSIS      |   28,649   113,937
                 APLASCII  |   32,347   130,071
                 )SAVE     |   40,264   159,302

For these workspaces, the APLASCII files are at most about 15% larger than the WSIS files (and for A2ASRC the uncompressed APLASCII file is actually slightly smaller). Although the {keywords} used in the APLASCII form are much larger than the single-character representation used in WSIS, the APLASCII form saves space by representing functions as newline-delimited vectors instead of matrices with numerous trailing blanks. Compression recovers most of the storage occupied by trailing blanks in the WSIS form, but it also greatly compacts the keywords. The result is that both types of files were reduced to between 25% and 30% of their original size by compression.
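The percentages can be checked directly from the table. The short Python calculation below uses only the byte counts given above; the helper names are invented for the example.

```python
# (uncompressed, compressed) sizes in bytes, copied from the table.
SIZES = {
    "A2A": {
        "WSIS": (100468, 28649),
        "APLASCII": (115819, 32347),
    },
    "A2ASRC": {
        "WSIS": (452768, 113937),
        "APLASCII": (442823, 130071),
    },
}


def percent_larger(a, b):
    """How much larger a is than b, in percent (negative if smaller)."""
    return 100.0 * (a - b) / b


def percent_of_original(raw, packed):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * packed / raw


for ws, formats in SIZES.items():
    wsis_raw, wsis_zip = formats["WSIS"]
    apl_raw, apl_zip = formats["APLASCII"]
    print(ws,
          "uncompressed: %+.1f%%" % percent_larger(apl_raw, wsis_raw),
          "compressed: %+.1f%%" % percent_larger(apl_zip, wsis_zip))
    for name, (raw, packed) in formats.items():
        print("  %-8s compresses to %.1f%% of original"
              % (name, percent_of_original(raw, packed)))
```

For A2A the APLASCII file comes out roughly 15.3% larger than the WSIS file uncompressed and 12.9% larger compressed; for A2ASRC it is about 2.2% smaller uncompressed and 14.2% larger compressed.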

In execution speed, however, there is no competition. Keyword translation is much slower than the character translation performed by WSIS: in the measurements below, APLASCII takes roughly 16 to 17 times as long as WSIS to write a file and about 5 times as long to read one. The following table gives the execution time in seconds to write and read transfer files on a 66 MHz 80486 running APL*PLUS II/386 v5.2:

                           Workspace

            Method    |   A2A   A2ASRC
   -------------------+----------------
   Writing  WSIS      |   1.3     3.7
            APLASCII  |  22.0    58.0
            )SAVE     |   0.6     1.2
                      |
   Reading  WSIS      |   3.9     9.0
            APLASCII  |  20.0    41.0
            )LOAD     |   0.2     0.6

If workspace transfer is a relatively uncommon occurrence and performance isn't an issue, the APLASCII execution times shouldn't be a problem. But if data is moved frequently and speedy execution is important, transliteration is clearly not the fastest solution.


Discussion

In the September 1995 issue of APL Quote Quad, James Boyd proposed using MIME (the Multipurpose Internet Mail Extensions) to represent APL symbols on the Internet [6]. While this is certainly possible, there are several problems:


1. There is no universally agreed upon 8-bit character set that contains all APL symbols, including overstrikes. (The ISO APL character set includes only the "strikes", not the overstrikes, and it doesn't include all ASCII symbols.) Many layouts have been proposed; none has seen widespread acceptance for an extended period of time. There is one fundamental problem with an 8-bit APL character set: it's not big enough. Some 8-bit codes cannot be used by certain computer systems (e.g., 160 in Windows and 128-159 in some communications software). Together with ASCII and APL symbols, there are so many national language and line-drawing characters that they cannot all fit in the remaining positions. The result is that some users will not be able to represent certain programs in a "universal" 8-bit character set.

By providing a way of representing Unicode characters (a 16-bit character set), APLASCII allows all characters to be represented. Any program or data can be transferred without loss between compatible APL systems. Characters are mapped to the extent possible when moving between APLs with different #AVs. And this is done using only printable ASCII characters in the transfer files.


2. The technique Boyd proposed, of using the "mpack" program to represent a message containing APL symbols, would result in text that is utterly incomprehensible until it is decoded. For example, a short APL program would look something like this:

ICAgIOwgWgZEIE1BVFJJRlkgVjtJO1c7TTuV
SU8NClsxXSAgIKZGb3JtcyBhIG1hdHJpeCBm
cm9tIOAtZGVsaW1pdGVkIGxpc3Qgb2YgbmFt
ZXMgFw0KWzJdICAglUlPBjENClszXSAgIBoo
MTz7+1oGVikvMA0KWzRdICAg9SgwPZVOQydE
JykvJ0QGJycgJycnDQpbNV0gICBWBihJBn5E
BlbuRCkvVg0KWzZdICAgSQYoSS/9MRkxLEQp
L+L7Vg0KWzddICAgTQYwjY0vVwYoMRlJLDEr
+1YpLUkNCls4XSAgIFoGKCj7VyksTSn7KCxX
+C7y4k0pXFYNCiAgICDsDQoa

A better alternative would be to use MIME's "quoted printable" format. But even in this format, APL symbols would still be incomprehensible:

    =EC Z=06D MATRIFY V;I;W;M;=95IO
[1]   =A6Forms a matrix from =E0-delimited list=
 of names =17
[2]   =95IO=061
[3]   =1A(1<=FB=FBZ=06V)/0
[4]   =F5(0==95NC'D')/'D=06'' '''
[5]   V=06(I=06~D=06V=EED)/V
[6]   I=06(I/=FD1=191,D)/=E2=FBV
[7]   M=060=8D=8D/W=06(1=19I,1+=FBV)-I
[8]   Z=06((=FBW),M)=FB(,W=F8.=F2=E2M)\V
    =EC

New users and people casually visiting APL areas on the Internet wouldn't be able to make sense of APL statements represented in this way. To read the APL, they would first have to find and set up software (written for whatever computer they're using) to translate the message, and find an APL character set that can be used by the software. This "setup" effort is likely to discourage people from joining in the APL discussions.
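The two MIME encodings are easy to reproduce with Python's standard base64 and quopri modules. The byte string below is the start of the function header, with byte values read off the quoted-printable listing above (for example =EC and =06); treating those values as the header's 8-bit representation is an inference from that listing, not something stated elsewhere.

```python
import base64
import quopri

# First 27 bytes of the MATRIFY header in the 8-bit character set
# implied by the quoted-printable listing.
header = b"    \xec Z\x06D MATRIFY V;I;W;M;\x95"

b64 = base64.b64encode(header)
qp = quopri.encodestring(header)

print(b64.decode("ascii"))   # opaque: no APL or ASCII text is visible
print(qp.decode("ascii"))    # ASCII survives, APL symbols become =XX

# Both encodings are lossless once decoded by suitable software.
print(base64.b64decode(b64) == header)
print(quopri.decodestring(qp) == header)
```

The Base64 text matches the first line of the mpack-style listing shown earlier, which illustrates the point: the data is intact, but a human reader gets nothing from it without decoding software and an APL font.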

APLASCII avoids this problem by using a representation of APL characters that is meaningful to humans. It isn't MIME standard, and it's not particularly compact, but you can easily identify APL symbols without having to run the message through a translation program.

    {del} Z{<-}D MATRIFY V;I;W;M;#IO
[1]   @Forms a matrix from {alpha}-delimited list{+
    +} of names {omega}
[2]   #IO{<-}1
[3]   {->}(1<{rho}{rho}Z{<-}V)/0
[4]   {execute}(0=#NC'D')/'D{<-}'' '''
[5]   V{<-}(I{<-}~D{<-}V{epsilon}D)/V
[6]   I{<-}(I/{neg}1{drop}1,D)/{iota}{rho}V
[7]   M{<-}0{max}{max}/W{<-}(1{drop}I,1+{rho}V)-I
[8]   Z{<-}(({rho}W),M){rho}(,W{jot}.{>=}{iota}M)\V
    {del}

APLASCII also makes it easier to compose APL messages: Short expressions can be entered by hand, without having to use a translation program. And it avoids problems when printing messages: From your e-mail manager, newsreader, or Web browser, messages can be printed in ASCII form, making it unnecessary to configure the program to print APL symbols on your particular brand of printer. Listings in symbolic form can be made from within your APL system after reading the message into APL and untransliterating it.

The bottom line is that getting APL characters displayed and printed can still be a substantial nuisance. Getting a whole slew of programs on a wide variety of platforms to display and print APL symbols is a huge chore. APLASCII provides a way of avoiding this headache, allowing users to see things in meaningful plain ASCII outside their APL system, and providing software that makes it easy to restore the APL characters from within an APL environment.
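Restoring symbols from keywords is essentially a table lookup, which can be sketched in Python. The table below holds only keywords that appear in this article, and the Unicode code points paired with them are an assumption made for illustration, not a copy of the workspace's translate tables. The # and @ characters, which stand for quad and the comment symbol in the listing above, are left alone.

```python
import re

# Partial, illustrative table: keyword -> assumed Unicode symbol.
KEYWORDS = {
    "del": "\u2207", "<-": "\u2190", "->": "\u2192",
    "rho": "\u2374", "iota": "\u2373", "epsilon": "\u220a",
    "neg": "\u00af", "drop": "\u2193", "max": "\u2308",
    "jot": "\u2218", ">=": "\u2265", "alpha": "\u237a",
    "omega": "\u2375", "execute": "\u234e", "enclose": "\u2282",
}

KEYWORD = re.compile(r"\{([^{}]+)\}")


def untransliterate(line):
    """Replace known {keywords} in one line; unknown ones are kept."""
    def replace(match):
        return KEYWORDS.get(match.group(1), match.group(0))
    return KEYWORD.sub(replace, line)


print(untransliterate("[3]   {->}(1<{rho}{rho}Z{<-}V)/0"))
print(untransliterate("[6]   I{<-}(I/{neg}1{drop}1,D)/{iota}{rho}V"))
```

Going the other way (symbols to keywords) would use the inverse table, which is why short expressions can also be typed by hand.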


3. With the enhancements described in this paper, APLASCII is not only a transliteration technique, it is also a workspace exchange format. Adopting a standard 8-bit APL character set and MIME format would not by itself allow the exchange of programs and data between APL systems. (And the compromises inherent in any 8-bit character set would make it an unattractive medium for workspace exchange.) Using WSIS as the workspace exchange format, even with a viewable character set, would virtually require the use of mpack encoding. (Because it was designed for machine processing, not human viewing, WSIS format does not include any line breaks.) APLASCII provides an effective alternative that allows APL source code and data to be included intelligibly in e-mail and news postings.


Conclusions

The APLASCII workspace greatly simplifies the process of transferring programs and data between different APL systems. The transfer files, being ordinary ASCII text, can be edited and printed from outside APL and can be directly included in e-mail, news postings, or Web pages. Currently, APLASCII workspaces are available for the following APL systems:

- APL*PLUS /PC
- APL*PLUS II/386
- APL*PLUS III/Windows
- IBM APL2
- Dyalog APL
- APL.68000 for the Macintosh
- Sharp APL and ISI APL

In the spirit of the Internet, this software is available free of charge. It can be used, imbedded in applications, modified, and distributed practically without restriction. APL vendors are being encouraged to include the workspace with their APL systems. Users with Internet access can download the software from:

ftp://archive.uwaterloo.ca/languages/apl/workspaces/aplascii/

The readme file in this directory has installation instructions. The software is also available on the SIGAPL 1995 Software Exchange disks. Contact Dick Holt (dholt@capaccess.org) for more information. For the latest information about the software, see my Web pages at:

http://www.chilton.com/~jimw

Programmers who use other languages are taking advantage of the global communication provided by the Internet to collectively advance the level of their programming resources. I believe APLASCII is a useful first step in allowing APL programmers to do likewise.


References

[1] Weigang, Jim. "APL-ASCII Transliteration." APL Quote Quad, Vol. 25, No. 3, March 1995.

[2] Cartwright, Dana E. "The Workspace Interchange Standard." APL Quote Quad, Vol. 8, No. 2, December 1977. (This describes WSIS0.)

[3] Programming Language APL, ISO 8485, 1989. (WSIS1 is described in Section 17.)

[4] Eklof, Mark D., and McDonnell, Eugene, ed. Programming Language APL, Extended (Committee Draft 1, January 6, 1993). (WSIS2 is described on pages 263-283.)

[5] Weigang, Jim. "APLASCII format of singleton nested arrays." Posted in comp.lang.apl on October 9, 1995.

[6] Boyd, James H. "APL on the Internet." APL Quote Quad, Vol. 26, No. 1, September 1995.


