Global Variables

During the summer of 1996 a discussion about the use of global variables appeared in comp.lang.apl. I meant to post the following article but never got around to it.

Raul Miller wrote on July 11, 1996:

Semi-globals are different. Semi-globals are typically used in APL to work around the limitation of passing only two arrays to a function, or to work around the limitation that name spaces aren't first class objects of the language.

In first-generation APLs this was certainly the case, but nested arrays make it very easy and efficient to pass any number of arrays into or out of a function. Also, I don't think there's much practical difference between true globals (localized nowhere) and semi-globals (global to some functions, but localized in some higher-level function). Although a semi-global ceases to be global at some level, all the problems of globals can be encountered with semi-globals as well. In the following discussion, I use the term "global" to mean any object not localized by the function that uses it.

Sometimes the number of arguments to a function is inconveniently large. Making some of the arguments be global variables allows you to focus attention on the more significant arguments, which can be passed as formal (left and right) arguments. For example, APL's equal function has three arguments: the two usual operands, plus the comparison tolerance. Passing the latter via the global #CT allows you to focus attention on the operands in places where = is used. This technique is also handy for arguments that don't usually change with every call. An example is "symbolic constants", such as variables that specify file names and paths.
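As a rough illustration outside APL, the following Python sketch mimics the way = consults #CT. The names comparison_tolerance and tolerant_equal are hypothetical, and the relative-tolerance rule is an assumption made for the example, not a description of how any particular APL system implements =.

```python
# Hypothetical stand-in for #CT: a module-level global read by the comparison.
comparison_tolerance = 1e-13


def tolerant_equal(a, b):
    """Compare a and b, using the global tolerance as an implicit third argument."""
    # Assumed rule: the difference must be small relative to the larger magnitude.
    return abs(a - b) <= comparison_tolerance * max(abs(a), abs(b))


# Call sites mention only the two operands; the tolerance stays out of sight.
nearly_one = tolerant_equal(1.0, 1.0 + 1e-14)
```

The call site reads like an ordinary two-argument comparison, which is the convenience the global buys.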

Globals are also useful for pass-through communication. A function that uses a subroutine shouldn't have to know about every feature of the subroutine. Some features might logically be the responsibility of a higher-level routine, and only the higher-level routine should know about the feature. Global variables are a way of implementing direct long-distance communication between routines that have one or more functions between them on the calling stack. #PP is an example here, with the user typically being the ultimate higher-level routine. Functions that use monadic format (explicitly, or implicitly by displaying numeric values) shouldn't generally have to know about the printing precision; they shouldn't have to limit themselves by specifying a fixed value that can be altered only by changing constants within the program. The user can communicate the desired precision to {format} by setting a global variable.
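The pass-through idea can be sketched in Python as follows. This is only an analogy: print_precision, format_number, and report are hypothetical names, and a Python module global is not dynamically scoped the way an APL semi-global is.

```python
# Hypothetical stand-in for #PP: set at the top level, read at the bottom.
print_precision = 10


def format_number(x):
    """Lowest-level routine: the only place that reads the precision."""
    return f"{x:.{print_precision}g}"


def report(values):
    """Middle layer: neither receives nor passes on anything about precision."""
    return " ".join(format_number(v) for v in values)


# The "user" sets the global; the setting reaches format_number untouched.
print_precision = 4
line = report([1 / 3, 2.5])  # "0.3333 2.5"
```

Note that report needs no change if the lowest-level routine later gains other settings of this kind.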

Sometimes a global variable can be thought of as part of a program. For example, some tasks can be implemented more efficiently by precomputing a table and using the table to avoid needless recomputation each time the program is called. Such globals are not really much different from a global subroutine used by a program. An example of this is the SETS3 function (c.l.a, 16 Apr 96), which used a precomputed matrix of combinations. A somewhat different example is my collection of assembler routines (FastFns), which have the machine code stored in global variables. This was originally done on the APL*PLUS/PC system because it is much faster than imbedding the long numeric vector in the function, but it has other advantages as well: It's easier to update the object code, and on the APL*PLUS II/III systems, the correct signature in the first element can be installed once, when the object is loaded into the workspace, instead of having to be done each time the FastFn is called.
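A minimal Python sketch of the precomputed-table idea follows. It is not the SETS3 function; PAIR_TABLE and pairs_containing are hypothetical names, and the table of 2-element combinations is chosen only to keep the example small.

```python
from itertools import combinations

# Built once when the module is loaded, much like a table saved in the workspace.
# Hypothetical contents: every 2-element combination of the integers 0..4.
PAIR_TABLE = list(combinations(range(5), 2))


def pairs_containing(item):
    """Select rows from the precomputed table instead of regenerating them."""
    return [pair for pair in PAIR_TABLE if item in pair]
```

The table is as much a part of the program as a subroutine would be: the function is incomplete without it, but the cost of building it is paid only once.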

Still another use for globals is for private, static data used by a function. ("Static" meaning the previous value will be needed on the next call to the function.) An example is #RL. It would be a nuisance to have to specify #RL as both an argument and a result to every roll or deal operation. Although it might be nice if such data could be imbedded within a namespace for the module, it's not hard to avoid the problems that a namespace would solve by means of some simple naming conventions. And working in a flat namespace environment is more convenient than having to constantly switch from one namespace to another.

Using globals does require a certain amount of discipline to avoid problems:

Even with these conventions, there are still some disadvantages to using globals. If a function has a global argument that needs to be changed from one call to the next, you can't easily build an expression such as (FOO X)+(FOO Y) that involves two calls to the function. An explicit argument would be more convenient in this case. However, many subroutines are never used in compound expressions, so this is not a real problem for them. Global results also make it difficult to build expressions, but returning multiple explicit results does not make it much easier to construct expressions. If you want to be able to use a function freely in expressions, it should return a single result.
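The difficulty with (FOO X)+(FOO Y) can be seen in this small Python sketch. The function names and the global scale are hypothetical and stand for any function with a global argument that must differ between the two calls.

```python
scale = 1  # hypothetical global argument read by foo


def foo(x):
    """Version with a global argument."""
    return x * scale


def foo_explicit(x, factor):
    """Version with the same argument passed explicitly."""
    return x * factor


# Global version: the expression has to be broken into separate statements.
scale = 2
first = foo(10)
scale = 3
total_global = first + foo(10)

# Explicit version: a single expression suffices.
total_explicit = foo_explicit(10, 2) + foo_explicit(10, 3)
```

Both totals are 50, but only the explicit version can be written as one expression.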

As for compilation, I think declarations should inform the compiler about the type/rank of upward globals and should be used to indicate any downward globals that are referenced or set by subroutines. (It would also be useful to be able to specify exactly which variables are used and set by each subroutine, so the compiled code doesn't have to materialize them all for every call and check them on every return.) I think it's a mistake to allow the semantics of a language to be driven too much by the needs of a compiler (e.g., altering scope rules or banning globals). APL's success stems in part from implementing what makes sense, rather than what's easy to do.

Using globals presents the programmer with some extra challenges in the goal of writing clear programs, but they are an indispensable tool in writing flexible, efficient, and maintainable applications.
