This article originally appeared in APL Quote Quad, Vol. 25, No. 3, March 1995.
Note: In this Web page, APL symbols are represented by keywords such as {iota} and {rho}. Literal keyword phrases are represented by keywords with doubled braces, as in {{iota}} and {{rho}}. So "{rho}" represents the APL symbol rho, but "{{rho}}" represents the phrase "{rho}". The printed version of the article uses proper APL symbols and keywords without doubled braces.
In November of 1993, I sent an e-mail message to a colleague giving him instructions for updating an APL application I had written. The message told him to execute the statement ]UMAKE M{delta}FILE. He wrote back saying that he tried the statement, but it produced an error message saying that MFILE could not be found. What had happened? The problem is that APL symbols such as {delta} cannot, in general, be included directly in e-mail. Depending on where they occur in #AV, the symbols may be transmitted as control characters and discarded by the communications software. (This is what happened to the delta in my message.) And even if a symbol does get through unscathed, it may not be displayed correctly. A symbol that starts out as #AV[i] on the sender's computer will be displayed as whatever character is found in #AV[i] (or possibly #AV[128|i]) on the recipient's computer, not necessarily the same symbol as on the sender's computer.
I sent the colleague another message telling him that what he really needed to type was ]UMAKE M{{delta}}FILE, where {{delta}} was the APL symbol. I also began to think about how I could modify the ]SEND command that transmits my e-mail so as to avoid similar problems in the future. I realized that a scheme for representing APL symbols in ASCII would have other uses. For example, when an application traps and logs an error, any APL symbols in the error message should be converted to ASCII so users can print the message or read it over the telephone.
My first inclination was to use pound-sign keywords similar to those used in the APL*PLUS /PC system [1]. However, while this produces acceptable results in statements such as "#iota #rho N," there are problems in other cases. For example, is M{delta}FILE represented as M#deltaFILE? If so, how do you determine where the keyword ends? (For example, if {del} is #del, M{del}tada will be translated as M#deltada, but when restored to APL it may become M{delta}da.) And if M{delta}FILE is translated as M#delta FILE, how do you know this is one name and not two? How would the two names M{delta} FILE be represented? These and other difficulties led me to give up on pound-sign keywords and use braces around the keywords as I did, practically without thinking, in my second e-mail message. The braces made it unnecessary to insert or remove spaces; M{delta}FILE obviously should be represented as M{{delta}}FILE. I modified my ]SEND program to translate APL characters to these phrases and went on to other projects.
Shortly after this, I learned how to access USENET newsgroups and began to follow the discussions on comp.lang.apl, the APL newsgroup. Most of the postings were about the language J, and for an obvious reason. Because J uses only ASCII characters, its users have no difficulty including J programs in news postings. APLers were not so lucky. On December 18, 1993, Mike Kent posted a message in which he provided a nested-array program to translate APL symbols to ASCII phrases. Phrases enclosed in braces, at that. On December 24, I posted a message seconding his proposal and including the set of keywords I was using at the time. I mentioned my goals in designing a transliteration scheme:
It should be obvious even to APLers who haven't seen it before. When people begin following comp.lang.apl or get an e-mail message describing some APL operations, they shouldn't have to obtain a workspace of translation software to decode the message. They might be using a version of APL for which the software is not available, or they might not even have access to an APL interpreter.
The process should be completely reversible. Translation to ASCII and then back to APL should not alter the original input, even if that input contains some of the phrases that are used as keywords, and it should not add or remove spaces, which can be significant in character constants.
It should be attractive. Although this goal is rather subjective and fuzzy, it is fairly easy to identify instances of unattractiveness. The scheme should avoid these.
I got no response to the message, probably because of unfortunate timing. News postings are retired after a few days, and people who were on vacation the week after Christmas may have missed the posting. (Caveat poster!)
On February 7, 1994, Michael Friendly posed an APL puzzle, asking how to compute a multivariate design matrix. I wrote a program and posted it in transliterated form. While composing the message, I experimented with alternative keywords and found that one set I had proposed, that of using "symbolic" phrases such as {{O\}} for transpose, produced output that was . . . well, ugly. The blizzard of symbols made it hard to see the braces. Using words instead made it much easier to visually parse the phrases.
Michael thanked me for my solution and asked what transliteration software I had used. I told him about my implementation, which used the APL*PLUS TEXTREPL function (an assembler-coded "fastfn") to do the character replacement. This wasn't helpful to Michael because he uses APL2. Although he could have used Mike Kent's program, other people who use non-nested or Sharp-style APLs would not be able to use it because of the nested array operations. So, on February 16, I wrote a pair of transliteration functions in standard non-nested APL and posted them on comp.lang.apl. The APL2ASCII function replaces symbols with keywords, and ASCII2APL restores the symbols. Unlike my previous posting, this one resulted in numerous replies, both via e-mail and in comp.lang.apl. Most people liked the scheme, and many suggestions were made.
The main elements of the APLASCII transliteration are as follows:
Most APL symbols are replaced with brace-enclosed keywords. For example, {iota} becomes {{iota}}. No spaces are added around the braces, so '{iota}{rho}{enlist}' becomes '{{iota}}{{rho}}{{epsilon}}', which is clearly a three-element vector.
An APL symbol may have any number of keywords associated with it. For example, {rho} may be either {{rho}}, {{shape}}, or {{reshape}}. When translating to ASCII any keyword can be used, but obviously, using {{shape}} for monadic {rho} and {{reshape}} for dyadic {rho} is helpful for the human reader. When translating back to APL, all three keywords are mapped to the {rho} symbol.
A few common APL symbols are replaced with single ASCII characters. For example, quad becomes #, so quad-IO is represented as #IO. The other two single-character replacements are lamp ({@}), which becomes @, and diamond ({&}), which becomes &. If desired, these symbols can instead be represented as {{quad}}, {{lamp}}, and {{diamond}}. (The keyword form is helpful when the symbol occurs outside its normal context, as in X='{{lamp}}'.)
Several ASCII characters are transliterated in order to make the process reversible. A pound sign in the input becomes {#} in the output, @ becomes {@}, and & becomes {&}. Braces in the input become {leftbrace} and {rightbrace} in the output. Thus, the (six-character) phrase "@{rho}" is transliterated as "{@}{leftbrace}rho{rightbrace}".
(Note: The preceding paragraph contains no APL symbols, and it appears here exactly as it appears in the printed paper, not following the usual brace-doubling conventions applied elsewhere in the paper. Sheesh!)
Keywords may be written in upper- or lowercase letters, and underscores or hyphens may be inserted for clarity. For example, {{gradeup}}, {{GradeUp}}, {{grade_up}}, and {{GRADE-UP}} are all equivalent.
A few symbols are commonly represented by symbolic keywords. For example, {<-} and {->} may be represented as {{<-}} and {{->}}, but {{is}}, {{gets}}, and {{assign}} (for {<-}) and {{goto}} and {{branch}} (for {->}) are available as alternatives. The symbols {/=}, {<=}, and {>=} are commonly represented as {{/=}}, {{<=}}, and {{>=}} because the alternative keywords are either very long or confusingly short. For example, although {{ne}} is recognized for {/=}, the phrase "{{ne}}1" might be misinterpreted as {neg}1.
If unrecognized keywords are encountered when translating back to APL, they are displayed in a warning message and are left intact. Manual translation is required in such cases. For example, although the keyword {{zilde}} could be translated to ({iota}0) on systems without the symbol {zilde}, the replacement would not work in cases such as 1 1{rho}'{zilde}'.
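The rules listed above can be sketched in a few lines of Python. This is only an illustration, not the APL2ASCII and ASCII2APL code from the workspace: the Unicode code points chosen for the APL symbols, the small keyword subset, and the names to_ascii, to_apl, and normalize are all assumptions made for the example. The listing uses real Unicode characters and single literal braces, so this page's doubled-brace convention does not apply inside it.

```python
# Sketch of the escaping rules, not the original workspace code.
# Assumptions: Unicode code points for the APL symbols, a tiny keyword
# subset, and the helper names to_ascii / to_apl / normalize.

# keyword -> character; several keywords may share one symbol
KEYWORDS = {
    "iota": "⍳", "rho": "⍴", "shape": "⍴", "reshape": "⍴",
    "delta": "∆", "gradeup": "⍋", "<-": "←", "->": "→",
    "quad": "⎕", "lamp": "⍝", "diamond": "⋄",
    # ASCII characters that are escaped so the round trip is lossless
    "#": "#", "@": "@", "&": "&", "leftbrace": "{", "rightbrace": "}",
}
# keyword emitted for each APL symbol when translating to ASCII
PREFERRED = {"⍳": "iota", "⍴": "rho", "∆": "delta", "⍋": "gradeup",
             "←": "<-", "→": "->"}
# APL symbols written as one bare ASCII character
SINGLE = {"⎕": "#", "⍝": "@", "⋄": "&"}
UNSINGLE = {ascii_ch: apl_ch for apl_ch, ascii_ch in SINGLE.items()}
# ASCII characters that must themselves be wrapped in braces
ESCAPED = {"#": "#", "@": "@", "&": "&", "{": "leftbrace", "}": "rightbrace"}


def to_ascii(text):
    """Replace APL symbols with ASCII; never add or remove spaces."""
    out = []
    for ch in text:
        if ch in SINGLE:
            out.append(SINGLE[ch])
        elif ch in ESCAPED:
            out.append("{" + ESCAPED[ch] + "}")
        elif ch in PREFERRED:
            out.append("{" + PREFERRED[ch] + "}")
        else:
            out.append(ch)
    return "".join(out)


def normalize(keyword):
    """Word keywords ignore case, underscores, and hyphens.
    Symbolic keywords such as <- are left untouched."""
    if keyword[:1].isalpha():
        return keyword.replace("_", "").replace("-", "").lower()
    return keyword


def to_apl(text):
    """Return (translated text, list of unrecognized keywords)."""
    out, unknown = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            end = text.find("}", i)
            if end != -1:
                keyword = text[i + 1:end]
                symbol = KEYWORDS.get(normalize(keyword))
                if symbol is None:
                    unknown.append(keyword)          # report it ...
                    out.append(text[i:end + 1])      # ... and leave it intact
                else:
                    out.append(symbol)
                i = end + 1
                continue
        out.append(UNSINGLE.get(ch, ch))
        i += 1
    return "".join(out), unknown
```

With these definitions, to_ascii("@{rho}") gives "{@}{leftbrace}rho{rightbrace}", and to_apl("{zilde}") returns the phrase unchanged together with the list ["zilde"], which a caller could show as a warning.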
The programs use a character matrix to drive the translation process. This matrix has one or more rows for each keyword. The first column contains the character to be translated, the second column is blank, and remaining columns hold the keyword. Here's a portion of the table:
APL2ASCII looks up APL symbols in the first column and replaces them with the first or second keyword in the list. It uses the first keyword if it thinks the symbol is monadic, or the second keyword (if any) if the symbol is dyadic. A simple algorithm is used to distinguish between monadic and dyadic cases. (It isn't always correct, but the mistakes don't affect the result after translation back to APL.) Using the table shown above, {iota}{rho}{enlist} would be translated to {{iota}}{{shape}}{{enlist}}.
When translating back to APL, the ASCII2APL function looks up keywords in the table and replaces each phrase with the corresponding symbol in the first column. Thus, {{iota}}{{rho}}{{epsilon}}, {{iota}}{{shape}}{{memberof}}, and {{iota}}{{reshape}}{{enlist}} are all translated to {iota}{rho}{enlist}. Rows of the table containing the third or later occurrence of a symbol are used for recognizing alternative keywords generated by other users; they are not used when translating to ASCII.
One consequence of allowing multiple keywords for each symbol is that various name preferences can be accommodated. Some people refer to {basevalue} as "decode," while others prefer the term "base value." The table can include both {{decode}} and {{basevalue}}. A user can identify his preference by moving it above the alternatives in the table; APL2ASCII will then use that keyword when translating to ASCII. Producing a universal translate table does not require agreement on a single set of phrases; it requires only collecting all the alternatives that are in common use. An effort has been made to do just that. If you wish to manually transliterate some APL expressions, you don't need to refer to a table of keywords--you can simply write the symbol or function name between braces. Chances are good that your keyword is already in the list.
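A table-driven translator along these lines might look like the following Python sketch. The rows and their order are hypothetical (chosen only to agree with the examples in the text), the monadic/dyadic test is a guess at what a "simple algorithm" could be, and the function names are invented for the example. Escaping of literal braces is omitted to keep the listing short, and the code uses real Unicode characters and single literal braces.

```python
import re

# Hypothetical translate table: (symbol, keyword) rows. Row order matters:
# the first row for a symbol is the monadic keyword, the second the dyadic
# one, and later rows are only recognized when translating back to APL.
TABLE = [
    ("⍳", "iota"),
    ("⍴", "shape"), ("⍴", "reshape"), ("⍴", "rho"),
    ("∊", "enlist"), ("∊", "memberof"), ("∊", "epsilon"),
]


def keywords_for(symbol):
    """All keywords for a symbol, in table order."""
    return [kw for sym, kw in TABLE if sym == symbol]


def looks_dyadic(text, pos):
    """Assumed heuristic: a function is dyadic when the nearest non-blank
    character to its left could end a left argument."""
    left = text[:pos].rstrip()
    return bool(left) and (left[-1].isalnum() or left[-1] in ")]'")


def symbols_to_keywords(text):
    out = []
    for pos, ch in enumerate(text):
        kws = keywords_for(ch)
        if not kws:
            out.append(ch)
        elif looks_dyadic(text, pos) and len(kws) > 1:
            out.append("{" + kws[1] + "}")
        else:
            out.append("{" + kws[0] + "}")
    return "".join(out)


def keywords_to_symbols(text):
    """Every keyword in the table maps back to its symbol; unknown
    phrases are left intact."""
    lookup = {kw: sym for sym, kw in TABLE}
    return re.sub(r"\{([^{}]*)\}",
                  lambda m: lookup.get(m.group(1), m.group(0)),
                  text)
```

Because translation back to APL accepts every row, a wrong monadic/dyadic guess changes only the keyword a human reads, never the restored symbol. A user who prefers a different keyword would simply move its row above the alternatives in TABLE.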
Many questions were raised about the transliteration proposal on comp.lang.apl. This section summarizes some of the questions and gives my responses.
On 21 Dec 93, Doug White wrote:
> ...what's wrong with using the standard workspace interchange > system?
The Workspace Interchange Standard (WSIS) does not produce human-readable text. Even uppercase letters in a WSIS file may be represented as unprintable characters, and the format used to represent functions (raveled #CR matrices) is designed for machine processing, not viewing. The goals of workspace exchange and representing APL symbols in e-mail are sufficiently different that I think two separate methods are appropriate. Plus, WSIS is somewhat out of date: it provides no way of representing ASCII characters, such as @ # % & " ` or lowercase letters, that do not occur in the standard (ca. 1970) APL character set.
[I have since learned that this limitation applies only to WSIS0, not to later versions of WSIS.]
On 16 Feb 94, Pedro Conte de Barros wrote:
> What is wrong with the APL-ASCII transliteration proposed by > Mitloehner?
The scheme developed by Sam Sirlin [2] and Johann Mitloehner [3] is implemented by the PP workspace, which is available for a variety of APL systems. PP uses ".xx" keywords. Although the initial escape character can be changed, period is most commonly used. Although both short and long keywords are available (for example, {delta} can be either .ld or .delta), the short ones are most commonly used. A keyword phrase begins with period and ends with a space. Thus, M{delta}FILE is represented as "M.ld FILE". My objections to PP are:
The short phrases are cryptic. Who would guess that ".ld" stands for delta? Although the long phrases are not cryptic, the mere existence of a more compact form means that it will be used. Indeed, the short form is used by the PP software by default.
The space inserted at the end of a keyword means that names like #IO and M{delta}FILE are split by a space in the ASCII form: ".bx IO" and "M.ld FILE". I think it is unattractive and confusing to have whitespace introduced into a single token. It may not be obvious to APLers unfamiliar with PP that the space following each keyword should be deleted. This can lead to confusing results in cases such as 2 2{rho}'{drop}{disclose}', which is transliterated as 2 2.ro '.da .lu '.
In order to make the process reversible, periods occurring in the input are translated to ".<space>". Thus, the number 3.14 becomes "3. 14" after transliteration, which is confusing. Parsing PP output visually can be very difficult. For example, the following two transliterations look similar:
2.so . .ti 3. 1
2. so. ti 3. 1
but they represent two very different expressions:
2{jot}.{times}3.1
2.so.ti 3. 1
The APLASCII form of the first expression is "2{{jot}}.{{times}}3.1"; the second is unchanged by transliteration.
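The spacing behavior behind these objections can be reproduced with a toy encoder. The Python sketch below is built only from the rules as described in this article (a keyword is a period, a short name, and a trailing space; a literal period becomes a period plus a space). It is not the PP workspace, and the Unicode code points, the table name PP_SHORT, and the function name pp_encode are assumptions made for the example.

```python
# Toy PP-style encoder based solely on the description in this article.
# The short names are the ones quoted in the text; the Unicode code
# points for the APL symbols are an assumption of this sketch.
PP_SHORT = {
    "∘": "so",   # jot
    "×": "ti",   # times
    "∆": "ld",   # delta
    "⎕": "bx",   # quad
    "⍴": "ro",   # rho
    "↓": "da",   # drop
    "⊃": "lu",   # disclose
}


def pp_encode(text):
    out = []
    for ch in text:
        if ch in PP_SHORT:
            out.append("." + PP_SHORT[ch] + " ")   # keyword ends with a space
        elif ch == ".":
            out.append(". ")                       # literal period is escaped
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, pp_encode("M∆FILE") yields "M.ld FILE", pp_encode("3.14") yields "3. 14", and pp_encode("2∘.×3.1") yields "2.so . .ti 3. 1", the first of the two look-alike strings above.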
On 19 Feb 94, Andrew C. LaRoy wrote:
> IBM mainframes in general don't like curly braces. Why not use > another symbol that IS supported by big blue boxes and isn't > used in APL? A pipe [|] comes to mind.
I think it's easier to parse keywords visually if the delimiters are a pair of symmetric characters. Left and right brace clearly mark the start and end of something, and they are used for this purpose in C, Pascal, and many other languages. If braces are unavailable on some computers, surely some workaround has been devised for the other languages that use them. For example:
On 21 Feb 94, Phil Heink wrote:
> I have been following this discussion on two different IBM > mainframes running VM/CMS and have not noticed any problems > recognizing the curly braces ("{{" and "}}"). Downloads (as > ASCII) preserve them just fine also. Could this problem have > been remedied since so many people began programming in "C"?
On 20 Feb 94, Raul Deluth Miller wrote:
> I need to use a system which, for instance, has two independent > vertical bar symbols. And, I have code that really needs to use > both vertical bar symbols.
On Raul Miller's APL*PLUS /VM system (and some other APLs, including APL*PLUS /PC and Dyalog) you can enter a number of similar-appearing characters two different ways. For example, you can press Shift-M in APL keyboard mode to get | (the APL stile), or you can press Shift-Backslash in ASCII mode to get | (the ASCII stile). Although the symbols may look similar on output, they are different characters. They're located in different places in #AV, and the expression '|'='|' will return 0 if one is ASCII and the other is APL. The ASCII stile displays correctly on non-APL terminals but the APL stile does not, so the difference is important. I call these "doubled characters" because they occur twice in #AV. Some APL systems merge similar-appearing APL and ASCII symbols into a single character. Characters that are doubled on some APL systems include | \ ! ^ ' and ~.
The solution is to transliterate the ASCII stile as {{|}} and represent the APL stile as |. (The APL symbol must be represented as an ASCII stile because the transliterated output must consist only of ASCII characters.) This allows code containing both characters to be moved between compatible APLs without losing the distinction between the two characters. On APL systems that have only one stile, both | and {{|}} are converted to the APL stile when translating back to APL. If code from a one-stile APL is imported into a two-stile APL, the stiles will be mapped to the APL stile, allowing programs that use absolute value and residue to function correctly.
A similar problem is underscored letters. Most APLs have only two alphabetic cases, but some have three: lower, upper, and underscored. How can underscored letters be represented? As {{A_}}, {{B_}}, {{C_}}. This allows triple-case programs to be transferred between compatible APLs. On double-case APLs, {{A_}}{{B_}}{{C_}} is translated to abc (or ABC).
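The treatment of doubled characters can be illustrated for the stile with a short Python sketch. It assumes, purely for the example, that the APL stile is a distinct character (U+2223 stands in for it here) while the ASCII stile is the ordinary vertical bar; the names APL_STILE, ASCII_STILE, encode_stiles, and decode_stiles are hypothetical. The listing uses single literal braces.

```python
import re

APL_STILE = "\u2223"   # stand-in for the APL stile (an assumption)
ASCII_STILE = "|"      # the ordinary ASCII vertical bar


def encode_stiles(text):
    """APL stile -> bare |, ASCII stile -> {|}. Output is pure ASCII
    as far as the stiles are concerned."""
    out = []
    for ch in text:
        if ch == ASCII_STILE:
            out.append("{|}")
        elif ch == APL_STILE:
            out.append("|")
        else:
            out.append(ch)
    return "".join(out)


def decode_stiles(text, two_stiles=True):
    """On a two-stile system {|} restores the ASCII stile; on a
    one-stile system both forms become the APL stile."""
    def restore(match):
        if match.group(0) == "{|}" and two_stiles:
            return ASCII_STILE
        return APL_STILE
    # Try the braced form first so its inner bar is not matched alone.
    return re.sub(r"\{\|\}|\|", restore, text)
```

Decoding with two_stiles=True restores the original mix of characters, while two_stiles=False folds everything to the APL stile, so absolute value and residue still work on the receiving system.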
On 22 Feb 94, Richard Levine wrote:
> ...perhaps you might want to consider adding the ability to put > blanks around the transliterated words, to make the text more > readable.
One problem with automatically putting spaces around keywords is that it's not always clear whether they should be removed when translating back to APL. For example, is ' {{iota}} {{rho}} ' a two, three, four, five, or six-element vector? Is "FOO {{delta}} BAR" one name or three? Even if ASCII2APL knows the rules for space removal, the ambiguity remains for users who don't have the program. Although expressions do look cramped without spaces, they are at least unambiguous and obvious. Plus, users can manually insert spaces as desired in places where they don't matter (e.g., outside quotes).
On 24 Mar 94, Bill Chang wrote:
> By the way, am I the only one who finds APL! easier to read (or > more APLish) than the original [{{keyword}} version]?
APL! ("APL bang") [4] is a transliteration scheme devised by Bill Chang in which the most common APL symbols are mapped to ASCII characters or two-character sequences. Less frequently used symbols are replaced by dyadic function names (e.g., {reverse}A becomes "rotate A"). The most common symbols and phrases are:
!     iota
?     rho
^     take
~     drop
*     times
@     quad
<-    gets
->    goto
~=    uneq
`1    {neg}1
APL! has the advantage of producing output that is relatively compact and free of extraneous markers such as braces. Once you get used to the translation, it's not hard to read. But it has several drawbacks:
It is unclear and confusing to anyone who has not memorized the character mapping. Several ASCII symbols that are used as functions in APL are reassigned new meanings.
The process is not reliably reversible. If the original text contains words such as "rotate," "maximum," or other phrases such as <-, they will be converted to APL symbols on untransliteration.
The APLASCII programs have been in use for about a year now and have received a generally positive reaction from APLers on the Internet. Even people without the software have been observed using {keywords} to represent APL symbols. There are now more APL-related postings than ever on comp.lang.apl. It is hoped that these transliteration programs have paved the way for communication among APLers to roll forward on the information superhighway.
Copies of the APLASCII workspace are available on-line for various APL systems, including APL*PLUS, APL2, Dyalog APL, and APL.68000. These can be obtained via anonymous ftp from watserv1.waterloo.edu, in the directory languages/apl/workspaces/aplascii. See the read.me file in that directory for more information.
It is difficult to judge transliteration methods without actually seeing them in practice. Here are samples of the three most common transliteration methods, APLASCII, PP, and APL!. First, the original APL code:
Here is the APLASCII transliteration (without the brace-doubling employed elsewhere in this paper):
[6] L{<-}(L{iota}':'){drop}L{<-},L @ drop To:
[7] L{<-}LJUST VTOM',',L @ mat with one entry per row
[8] S{<-}{neg}1++/^\L{/=}'(' @ length of address
[9] X{<-}0{max}{max}/S
[10] L{<-}S{rotate}(-({rho}L)+0,X){take}L @ align the (names)
[11] A{<-}((1{take}{rho}L),X){take}L @ address
[12] N{<-}0 1{drop}DLTB(0,X){drop}L @ names)
[13] N{<-},'{alpha}',N
[14] N[(N='_')/{iota}{rho}N]{<-}' ' @ change _ to blank
[15] N{<-}0 {neg}1{drop}RJUST VTOM N @ names
[16] S{<-}+/^\' '{/=}{reverse}N @ length of last word in name
The PP transliteration, generated by running the PP software:
[6] L.is (L.io ':').da L.is ,L .lmp drop To:
[7] L.is LJUST VTOM',',L .lmp mat with one entry per row
[8] S.is .ng 1++/.and .bl L.ne '(' .lmp length of address
[9] X.is 0.ce .ce /S
[10] L.is S.rv (-(.ro L)+0,X).ua L .lmp align the (names)
[11] A.is ((1.ua .ro L),X).ua L .lmp address
[12] N.is 0 1.da DLTB(0,X).da L .lmp names)
[13] N.is ,'.al ',N
[14] N[(N='.us ')/.io .ro N].is ' ' .lmp change .us to blank
[15] N.is 0 .ng 1.da RJUST VTOM N .lmp names
[16] S.is +/.and .bl ' '.ne .rv N .lmp length of last word in name
The APL! transliteration, posted by Bill Chang:
[6] L<-(L!':')^L<-,L o} drop To:
[7] L<-LJUST VTOM',',L o} mat with one entry per row
[8] S<-`1++/and\L~='(' o} length of address
[9] X<-0 max max/S
[10] L<-S rotate(-(?L)+0,X)^L o} align the (names)
[11] A<-((1^?L),X)^L o} address
[12] N<-0 1^DLTB(0,X)^L o} names)
[13] N<-,'{{alpha}}',N
[14] N[(N='_')/!?N]<-' ' o} change _ to blank
[15] N<-0 `1^RJUST VTOM N o} names
[16] S<-+/and\' '~=rotate N o} length of last word in name
(APL! has no transliteration for {alpha}, so Bill left it as {{alpha}}.)
[1] Manugistics, Inc. APL*PLUS /PC Reference Manual, version 11. Pages 2-27 to 2-34. (1993)
[2] Sirlin, S. "Proposed Standard of ASCII Representation of APL, Version 3." Posted most recently to comp.lang.apl on February 21, 1994.
[3] Mitloehner, J. "Porting APL-Programs via ASCII-Transliteration," APL92 Conference Proceedings, APL Quote Quad, vol. 23, no. 1, pp. 148-155. (1992)
[4] Chang, W. "APL! ("APL bang") and examples." Posted most recently to comp.lang.apl on October 4, 1994.