- Problem description
- Bob Smith's quad-divide solution
- Correction
- My practical looping solution
- Comparison with Bill Chelew's looping solution

Last quarter's Zark APL Puzzler was the "execute inverse" function, whose job was to return an expression that, when executed, produced an array identical to the function's argument. The most recent newsletter includes Evan Jennings' solution for APL2 (2{drop}2 #TF'A', which was posted in c.l.a), and two less-atomic solutions by Rex Swain, one for flat arrays and one for nested arrays.

The puzzler in the latest issue is a classic: word wrapping. Probably dozens of you lurking APLers out there have word-wrap routines that you've written. Send them in to Zark! (And please post them here, too. You might also mention to Zark that you heard about the contest on comp.lang.apl.) If your solution is published, you might win yourself a complimentary subscription to the Zark APL Tutor Newsletter.

Back when I was working for STSC, I seem to remember hearing that Bob Smith ("Boolean Bob", of partition functions fame, and the author of the pioneering NARS nested array system and a little thing called 386Max) had a particularly interesting word wrap program. It used quad divide! Hey Bob, are you still out there?

Jim

LIMBERING UP:

Word Wrap

Given a lengthy piece of text, it is sometimes necessary to "wrap" it so it fits within a specified width. The piece of text is broken into subsets, without breaking any words, so that each subset is no longer than the specified width. The following sample has been wrapped within a width of 17:

This is a sample.
The last nonblank
character in each
line fits within
the specified
width.

Your task is to write a dyadic function WRAP that does wrapping. Its right argument is the character vector to be wrapped. Its left argument is the width within which the text is to be wrapped. Its result is a character vector with newline (carriage return) characters inserted in such a way that the text wraps appropriately when displayed. For example:

      17 WRAP 'This is a sample. The ...'
This is a sample.
The last nonblank
(etc.)

Some notes:

The result has the same length as the right argument. The newline character (#TCNL or #TC[2]) REPLACES the first blank after the last word in each line.

If the right argument already contains newline characters, don't ignore them and don't remove them. Each one begins a new line. In a sense, existing newlines are "paragraph" delimiters.

Consider each group of contiguous non-blank characters a word. The aim is to wrap the text without breaking any words. However, if any "word" is longer than the specified width, you should insert one or more newline characters within it to break it into pieces whose length equals the width. If this is the case, the result will naturally be longer than the right argument.

Your primary objective is to write a function that works. You'll receive extra credit (a pat on the back) if the function is efficient, clever, simple, elegant, or can leap tall buildings in a single bound.
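For readers who want to see the rules in action, here is a minimal Python rendering of the basic case (single blanks between words, no embedded newlines, no overlong words; those simplifications are mine, not part of the spec):

```python
def wrap(width, text):
    """Replace selected blanks with newlines so no line exceeds width.

    Simplified sketch of the puzzle's rules: each newline REPLACES the
    blank after the last word that fits, so the result has the same
    length as the argument.  Embedded newlines and overlong words are
    not handled here.
    """
    out = []
    line_len = 0                       # length of current output line
    for word in text.split(' '):
        if line_len == 0:
            out.append(word)
            line_len = len(word)
        elif line_len + 1 + len(word) <= width:
            out.append(' ' + word)     # word still fits on this line
            line_len += 1 + len(word)
        else:
            out.append('\n' + word)    # newline replaces the blank
            line_len = len(word)
    return ''.join(out)
```

Run on the sample text with a width of 17, this reproduces the wrapped display shown above, and the result is exactly as long as the argument.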

Mail your solution on paper or APL*PLUS PC or APL*PLUS II disk to:

Zark Incorporated

Re: Limbering Up Task

23 Ketchbrook Lane

Ellington, CT 06029

U.S.A.

The notable functions and their authors' names will be printed in the next issue of the Zark APL Tutor News. Entries must be received by April 15, 1995. Good luck and happy limbering.

I forwarded a copy of my puzzler posting to Bob Smith, and he replied via e-mail. Here's what he said:

Howdy, Jim! Yes, I'm still kicking, but not in the APL arena anymore.

I did solve the word wrap problem in response to a challenge posed in the Sharp APL newsletter (where the solution was published) way back, sometime in the mid to late '70s.

The solution is non-looping by virtue of using quad-divide to compute the transitive closure of a boolean matrix. The puzzle I solved was simpler than the one you state (no multiple blanks, no embedded newlines, no words longer than the width), but I'm sure it can be modified to suit the form posed. I don't have the code anymore, but here's the idea:

* First of all, transitive closure is a concept from graph theory on directed graphs. A directed graph is a graph where each line has an arrow (direction) associated with it. If point A is connected to point B, there is a line between the two with the arrow pointing towards B. This doesn't mean that B is connected to A, though. If that's the case, there's an additional line from B to A with its arrow pointing towards A.

On a directed graph with N nodes, construct an incidence matrix (which is Boolean and square, N by N) with a 1 in the cell [I;J] iff node I is connected with a single line to node J. We would now like to know the complete connection matrix for the graph. That is, for each node (say I), with which nodes is it connected (perhaps via several lines). This is another square Boolean matrix where there are 1's in row I for each node for which there is a directed path from node I to that node. Consider the original incidence matrix (call it M) to be a connectivity matrix for all length one paths in the directed graph. If you then calculate M {or}.{and} M, you have the connectivity matrix for all length two paths. Length three paths are calculated as

M {or}.{and} M {or}.{and} M

Successive lengthening of this sequence calculates successive length paths. Of course, you don't need to apply this more than N times, but it doesn't hurt as the matrices beyond iteration N are all zero. If you then {or} together all the matrices (starting with the identity matrix as each node is considered to be connected with itself), then M, then M {or}.{and} M, etc., you have the transitive closure of the original incidence matrix.
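In Python terms (an illustrative translation, not Bob's code), the iterative construction he describes, ORing the identity with successive {or}.{and} powers of M, looks like this:

```python
def bool_matmul(a, b):
    """Boolean matrix product: a {or}.{and} b in APL terms."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def transitive_closure(m):
    """OR together I, M, M{or}.{and}M, ... to get the closure."""
    n = len(m)
    # Start with the identity: each node is connected to itself
    closure = [[i == j for j in range(n)] for i in range(n)]
    power = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(n):                 # paths longer than n add nothing
        power = bool_matmul(power, m)  # connectivity for next length
        closure = [[closure[i][j] or power[i][j] for j in range(n)]
                   for i in range(n)]
    return closure
```

For the chain graph 1 -> 2 -> 3 -> 4, the first row of the closure marks every node, since each is reachable from node 1.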

Although this all sounds terribly like looping, it isn't. Consider the polynomial

1 + X + (X*2) + (X*3) + (X*4) + ...

or in a preferable form

1 + X + (X {times} X) + (X {times} X {times} X) + ...

It's easy to see that this is equivalent to

{divide} 1 - X

with suitable restrictions. Now it's time for a great leap (which is justified by some advanced mathematics): in the above two polynomials, replace the X with M, the + with {or}, the {times} with {or}.{and}, the 1 with the identity matrix, and the {divide} with {quad-divide}, and you have a non-looping APL solution to transitive closure! If you have a long paragraph, don't forget to start with a big workspace.
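The leap can be checked numerically. Here is a sketch using a small hand-rolled Gauss-Jordan inverse in place of {quad-divide} (the chain-graph example and all names are mine): for an acyclic incidence matrix M, the nonzero entries of the inverse of I-M are exactly the reflexive transitive closure.

```python
def inverse(a):
    """Gauss-Jordan inverse of a small square matrix (floats)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Incidence matrix of the chain 1 -> 2 -> 3 -> 4
M = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
IM = [[(i == j) - M[i][j] for j in range(4)] for i in range(4)]
# Nonzero entries of inv(I-M) give the closure (diagonal included,
# as each node is considered connected to itself)
C = [[round(x) != 0 for x in row] for row in inverse(IM)]
```

For this chain, inv(I-M) is the all-ones upper-triangular matrix, i.e. I + M + M^2 + M^3, just as the polynomial identity promises.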

* Whatever does this have to do with word wrap, you might ask. Here's how:

Each blank in the paragraph can be considered a potential line break. Assume there are N blanks in the paragraph. For each such blank (say node I in a directed graph), calculate the number of the next blank (say node J) such that if node I is a line break, then node J is where the next line break would be (this is easy to do non-looping). In this manner construct an entire incidence matrix for the paragraph with a 1 in [I;J] iff node J is the proper place to put a line break if node I is also a line break.

Calculate the transitive closure of this incidence matrix and read off the first row. It is a mask on the blanks in the paragraph of the line breaks.
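Putting the pieces together, here is a Python sketch of the scheme Bob describes, for the simple case he mentions (single blanks, no embedded newlines, no overlong words). The function name is mine, and for brevity the closure is computed by plain reachability; his version used {quad-divide} on I-M instead.

```python
def wrap_closure(width, text):
    """Word wrap via the transitive closure of a blank-to-blank graph.

    Illustrative translation of Bob Smith's scheme, simple case only.
    """
    text = ' ' + text + ' '            # blanks at start and end
    blanks = [i for i, c in enumerate(text) if c == ' ']
    n = len(blanks)
    # Incidence matrix: M[i][j] = 1 iff, given a break at blank i,
    # blank j is the last blank that fits within the width
    M = [[False] * n for _ in range(n)]
    for i in range(n - 1):
        fits = [j for j in range(i + 1, n)
                if blanks[j] <= blanks[i] + width + 1]
        if fits:
            M[i][fits[-1]] = True
    # First row of the transitive closure: all blanks reachable from
    # the first blank, i.e. the line-break blanks
    reach = M[0][:]
    frontier = M[0][:]
    for _ in range(n):
        new = [any(frontier[k] and M[k][j] for k in range(n))
               for j in range(n)]
        reach = [a or b for a, b in zip(reach, new)]
        frontier = new
    chars = list(text)
    for j in range(n):
        if reach[j]:
            chars[blanks[j]] = '\n'    # break replaces the blank
    return ''.join(chars)[1:-1]        # drop the added blanks
```

On the sample paragraph with a width of 17, this reproduces the wrapped display from the puzzle statement.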

-----------------------------------------------------

Thanks for giving me the opportunity to explain this. If you decide to pass
this message on, please clean up the APL symbols as I'm not sure I'm using
the correct notation. [*They're all fine. -JimW*]

------------------------------

Bob Smith -- bobs@access.digex.net

What a solution! It ties together text processing, directed graphs, Boolean algebra, and linear algebra. I certainly didn't know that a transitive closure could be computed by taking the inverse of a matrix. This is the kind of stuff that makes me appreciate APL as a tool of thought.

As can be expected with a jewel of an algorithm like this, it doesn't exactly solve the messy, real-world problem. (It won't insert newlines into the middle of long words, and it doesn't recognize existing newlines.) And it was not designed to be efficient. (I suspect the challenge in the Sharp APL newsletter was to write a non-iterative program, at any cost, using the standard APL operations available back in the late 1970s.) But it's a fascinating and educational technique nonetheless.

Here's my stab at creating a working program from Bob's description:

{del} Z{<-}W WRAP V;C;I;J;M;P
[1]  @Wraps text vector {omega} to a width of {alpha} columns
[2]  @ V and Z are character vectors.  Z is similar to V, but with
[3]  @ selected blanks replaced by newline characters.
[4]  @
[5]  V{<-}' ',V                  @ a blank is required to get it started
[6]  J{<-}(V=' ')/{iota}{shape}V @ indices of blanks
[7]  P{<-}(J+W+1){jot}.<J        @ 1s mark blanks past the cutoff
[8]  M{<-}P<1{rotate}P           @ mark last blank that fits on the line
[9]  I{<-}({shape}M){reshape}1,(1{drop}{shape}M){reshape}0  @ identity matrix
[10] C{<-}{domino}I-M            @ compute transitive closure of M
[11] V[C[1;]/J]{<-}#TCNL         @ insert the line breaks
[12] Z{<-}1{drop}V               @ drop the initial blank
{del}

M is the incidence matrix representing the directed graph. It has a row and a column for each blank in V, and a single 1 on each row. If M[i;j] is 1, it means that, starting just past the i-th blank, the j-th blank is the last blank that will fit within the width W+1. (W+1 because the blank can hang just beyond the specified width.) M[1;] links the first blank (the start of the first line) to the j-th blank (the end of the first line). Then, M[j;] links the j-th blank (start of the second line) to the k-th blank (end of the second line), and so forth. Computing C, the transitive closure of M, merges all these i, j, k, ... links into a single linkage with the first blank, and so the first row of C gives the line-delimiting blanks.

(What do the other rows of C represent? They tell where you would insert line breaks if you started from places other than the first blank.)

Thanks for the marvelous example, Bob!

Jim

On 9 Mar 1995 Bernhard Strohmeier wrote:

I am afraid the algorithm of Roger Hui has a little flaw, too. I tried 15 fit t and the last line of the text was too long.

Wouldn't you know, my implementation of Bob Smith's algorithm has the
same flaw. The algorithm requires that there be a blank at the start
*and* end of the text string. Here's a corrected version:

{del} Z{<-}W WRAP V;C;I;J;M;P
[1]  @Wraps text vector {omega} to a width of {alpha} columns
[2]  @ V and Z are character vectors.  Z is similar to V, but with
[3]  @ selected blanks replaced by newline characters.
[4]  @
[5]  V{<-}' ',V,' '              @ algorithm requires blanks at start and end
[6]  J{<-}(V=' ')/{iota}{shape}V @ indices of blanks
[7]  P{<-}(J+W+1){jot}.<J        @ 1s mark blanks past the cutoff
[8]  M{<-}P<1{rotate}P           @ mark last blank that fits on the line
[9]  I{<-}({shape}M){reshape}1,(1{drop}{shape}M){reshape}0  @ identity matrix
[10] C{<-}{domino}I-M            @ compute transitive closure of M
[11] V[C[1;]/J]{<-}#TCNL         @ insert the line breaks
[12] Z{<-}1{drop}{neg}1{drop}V   @ drop the blanks we added
{del}

Thanks to Tom Chwastyk, Bernhard Strohmeier, and Andrew Seary for pointing out the limitations of using {domino}I-M to compute transitive closure: The graph M must have no cycles (i.e., all elements on and below the diagonal must be zero) and the result incorrectly has 1s along the main diagonal. These restrictions do not affect WRAP, as there are never cycles and the line break corresponding to the diagonal element gets dropped off by line [12].
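The no-cycles restriction is easy to see with a tiny example of my own (not from the thread): for a two-node cycle, I-M is singular, so {domino}I-M fails outright rather than producing the closure.

```python
# Incidence matrix of the 2-cycle A -> B -> A
M = [[0, 1],
     [1, 0]]
# Form I-M elementwise
IM = [[1 - M[0][0], 0 - M[0][1]],
      [0 - M[1][0], 1 - M[1][1]]]
# 2x2 determinant: zero means no inverse exists
det = IM[0][0] * IM[1][1] - IM[0][1] * IM[1][0]
```

Here det is 0, so the matrix inverse (and hence this closure technique) is unavailable; WRAP never hits this case because its blank-to-blank graph always points forward.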

Jim

Lest anyone forget, the noniterative domino-based WRAP was, I believe, never intended as a practical, production-grade program. Here's how I would solve this problem in practice. #IO is 1. Standard plain-vanilla APL (except for #TCNL, which is APL*PLUS for newline).

{del}Z{<-}W WRAP2 V;B;C;E;I;J;N;S;T
[1]  @Wraps text vector {omega} to a width of {alpha} columns
[2]  @ Arguments:
[3]  @   V - character vector to be wrapped.  May already contain
[4]  @       newlines; these are left intact.
[5]  @   W - width for longest line (excluding blanks)
[6]  @ Result:
[7]  @   Z - similar to V, but with some blanks replaced by newlines.
[8]  @       If any words are wider than W, newlines will be inserted,
[9]  @       making Z longer than V.
[10] @
[11] T{<-}{divide}W{max}0                   @ width must be positive!
[12] V{<-}' ',V,' '                         @ simplify edge conditions
[13] B{<-}V{epsilon}' ',#TCNL               @ mark blanks and newlines
[14] C{<-}C/{iota}{rho}C{<-}B>1{drop}B,0    @ last of each group of blanks
[15] E{<-}{neg}1+E/{iota}{rho}E{<-}B>{neg}1{drop}0,B  @ nonblank before each group of blanks
[16] N{<-}N/{iota}{rho}N{<-}V=#TCNL         @ indices of newlines
[17] I{<-}{iota}0                           @ where to insert additional newlines
[18] S{<-}1                                 @ points to char before start of current line
[19]
[20] L1: @ Loop for each line
[21] J{<-}{neg}1+(E>S+W){iota}1             @ find word that extends past line end, then back up
[22] {->}(J={rho}E)/L4                      @ quit if at end
[23] T{<-}((S<N)^N{<=}C[J])/N               @ newlines that fall in this line
[24] {->}(0={rho}T)/L2                      @ If any,
[25] S{<-}{neg}1{take}T                     @   use last one as the line base
[26] {->}L1                                 @   and repeat
[27] L2:T{<-}S                              @ old base
[28] S{<-}C[J]                              @ new line base (break point)
[29] {->}(S>T)/L3                           @ If not past old base, word is too long
[30] I{<-}I,S{<-}T+W                        @   remember to insert a newline past S
[31] {->}L1                                 @ Else,
[32] L3:V[S]{<-}#TCNL                       @   replace blank with newline
[33] {->}L1                                 @ Endloop
[34]
[35] L4:{->}(0={rho}I)/L5                   @ If any newlines need inserting (long words),
[36] E{<-}(({rho}V)+{rho}I){rho}1           @   form expand mask
[37] E[I{<-}I+{iota}{rho}I]{<-}0            @   where the newlines go
[38] V{<-}E\V                               @   insert spaces...
[39] V[I]{<-}#TCNL                          @   and put in the newlines
[40] L5:Z{<-}1{drop}{neg}1{drop}V           @ drop the blanks we added
{del}

What an incredibly different kind of program than domino WRAP. Although some edge conditions were thought of as the algorithm was developed, debugging this kind of beats-as-it-cleans-as-it-sweeps iterative algorithm was mostly a matter of trying the program with lots of different arguments. Prove that it's correct? Ha! Proud of it? Not exactly. Frame it and put it on the wall? Never. But use it? You bet. It's reasonably fast. (Using APL*PLUS II/386 v5.2 on my 486/66, the looping version wraps this 16-line, 180-word paragraph in 0.029 secs; domino WRAP takes 30.5 secs to do the same job, about 1000 times slower.) If WRAP2 has any bugs, they won't be hard to fix. And with its guts laid bare, the program is easy to enhance. For example, I might want to add the requirement that when breaking a long "word" the split should occur, if possible, adjacent to a symbol, not within a group of letters. (A useful enhancement when wrapping a program listing.) Line [30] is the place to change--add some code to pick a break point other than T+W.

Jim

I meant to send the following message to comp.lang.apl, but other stuff came up and I never got around to it.

The second quarter issue of Gary Bergquist's Zark APL Tutor News contains solutions to the Word Wrap puzzle posted here on March 2. Published programs include the Bob Smith nonlooping wonder, an enhancement of Bob's program written by Rex Swain, my looping solution, and another looping solution from Bill Chelew. I've written programs to solve this problem several times in the past, but I wrote a version especially for the Zark puzzle, in part to avoid slow execution caused by repeated catenation in the loop. Bill's program uses the straightforward technique that I've used in the past (grab a line, scan backwards for a blank, and catenate the line to the result), and what do you know, his program runs twice as fast as mine (for large arguments)! So much for my attempt at writing blazingly fast code.

I decided to take a closer look at the execution time. APL*PLUS has
a "monitoring facility" (#MF) that allows you to see how much time is
spent on each line of a function. So I timed the functions using
arguments having the same size (if not the same content) as Gary used.
When I did the timings with APL*PLUS II/386, I was surprised to find
that my program was *not* twice as slow as Bill's; it was a little
slower for small arguments and a bit faster for large arguments. I
thought about this and realized that Gary may have done his timings on
the APL*PLUS /PC system, which does not have a Boolean datatype. The
/PC system stores Boolean data as 16-bit integers, and my version of the
program uses more Boolean operations than Bill's. So I timed the
programs on the /PC system and found that, indeed, on that system, my
program ran twice as slowly as Bill's.

As a comparison of the programs, and of the APL*PLUS /PC and /386 systems, here are the execution times (in seconds):

            |    200    2000   20000
------------+------------------------
/386  WRAP2 |  0.011   0.058   1.102
      WRAP3 |  0.010   0.055   1.297
            |
/PC   WRAP2 |  0.018   0.118   3.688
      WRAP3 |  0.014   0.078   1.877

The timings were done on my 66 MHz 486, using APL*PLUS II/386 v5.2 and APL*PLUS /PC v11.0. WRAP2 is my function, as posted here in c.l.a., and WRAP3 is Bill Chelew's function. The arguments were as follows:

      A{<-}'This is a sample. The last nonblank character in each line fits within the specified width. '

  200:  35 WRAP 200{rho}A
 2000:  65 WRAP 2000{rho}A
20000:  65 WRAP 20000{rho}A

(But the right argument was reshaped before doing the timing.)

By the way, the versions written by Bob Smith and Roger Hui are both in the "demonstration program" category, turning in execution times of 246 and 29 seconds, respectively, for the 2000-element case. (Using APL*PLUS II/386 and my new J 2.06 system.) They each failed with WS FULL on the 20,000-element case.

The following output shows the line-by-line execution time of the two programs for a 2000-element argument. The programs were run 10 times each to build up enough resolution. The "/386" and "/PC" columns are execution times in milliseconds (total for the 10 runs). The "Iter" column gives the number of times each line was executed. When monitoring a function, a certain amount of the effort spent collecting times leaks into the totals. I have attempted to correct for this effect by subtracting a factor times the iteration count from the line times so the total time agrees with the figures shown in the tables above (which were measured with monitoring off). Garbage collects occur randomly, causing anomalously high times for some lines. (This may account for line [14] taking significantly longer than line [15] in the /PC version of WRAP2.)

/386   /PC  Iter
                  {del} Z{<-}W WRAP2 V;B;C;E;I;J;N;S;T
   0     1     0  [1]  @Jim Weigang's version
                       [...]
   2     1    10  [11] T{<-}{divide}W{max}0
   6     5    10  [12] V{<-}' ',V,' '
  35   112    10  [13] B{<-}V{epsilon}' ',#TCNL
  70   175    10  [14] C{<-}C/{iota}{rho}C{<-}B>1{drop}B,0
  72   114    10  [15] E{<-}{neg}1+E/{iota}{rho}E{<-}B>{neg}1{drop}0,B
  15    60    10  [16] N{<-}N/{iota}{rho}N{<-}V=#TCNL
   1     1    10  [17] I{<-}{iota}0
   1     0    10  [18] S{<-}1
   0     0     0  [19]
   0    12     0  [20] L1:
 124   326   320  [21] J{<-}{neg}1+(E>S+W){iota}1
  46    53   320  [22] {->}(J={rho}E)/L4
  58   108   310  [23] T{<-}((S<N)^N{<=}C[J])/N
  43    56   310  [24] {->}(0={rho}T)/L2
   0     0     0  [25] S{<-}{neg}1{take}T
   0     0     0  [26] {->}L1
   1    15   310  [27] L2:T{<-}S
  29    35   310  [28] S{<-}C[J]
  40    45   310  [29] {->}(S>T)/L3
   0     0     0  [30] I{<-}I,S{<-}T+W
   0     0     0  [31] {->}L1
  25    38   310  [32] L3:V[S]{<-}#TCNL
   4     8   310  [33] {->}L1
   0     0     0  [34]
   1     2    10  [35] L4:{->}(0={rho}I)/L5
   0     0     0  [36] E{<-}(({rho}V)+{rho}I){rho}1
   0     0     0  [37] E[I{<-}I+{iota}{rho}I]{<-}0
   0     0     0  [38] V{<-}E\V
   0     0     0  [39] V[I]{<-}#TCNL
   6     8    10  [40] L5:Z{<-}1{drop}{neg}1{drop}V
                  {del}
 579  1177    10  Totals

/386   /PC  Iter
                  {del} Z{<-}W WRAP3 V;N;CR;L
   0     0     0  [1]  @Bill Chelew's version of WRAP
   1     1    10  [2]  CR{<-}#TCNL
   1     1    10  [3]  Z{<-}''
  50    64   320  [4]  LOOP:{->}(0=N{<-}{rho}V)/0
  61    89   320  [5]  L{<-}(N{<-}N{min}W+1){take}V
  39    54   320  [6]  {->}(N>W)/L1
   5     3    10  [7]  Z{<-}Z,L
   0     0    10  [8]  {->}0
  71    55   310  [9]  L1:{->}(~CR{epsilon}L)/L2
   0     0     0  [10] Z{<-}Z,(N{<-}L{iota}CR){take}L
   0     0     0  [11] V{<-}N{drop}V
   0     0     0  [12] {->}LOOP
  66    88   310  [13] L2:{->}(' '={neg}1{take}L)/L4
  58    96   300  [14] N{<-}(W+2)-({reverse}L){iota}' '
  36    41   300  [15] {->}(N=0)/L3
  20    36   300  [16] L[N]{<-}CR
  80   106   300  [17] Z{<-}Z,N{take}L
  58   102   300  [18] V{<-}N{drop}V
   0    28   300  [19] {->}LOOP
   0     0     0  [20] L3:Z{<-}Z,(W{take}L),CR
   0     0     0  [21] V{<-}W{drop}V
   0     0     0  [22] {->}LOOP
   2     4    10  [23] L4:{->}({or}/L{/=}' ')/L5
   0     0     0  [24] Z{<-}Z,CR
   0     0     0  [25] {->}L6
   1     2    10  [26] L5:L[W+1]{<-}CR
   2     2    10  [27] Z{<-}Z,L
   4     6    10  [28] V{<-}(W+1){drop}V
   2     3    10  [29] {->}(' '{/=}1{take}V)/LOOP
   0     0     0  [30] L6:N{<-}(V{/=}' '){iota}1
   0     0     0  [31] V{<-}(N-1){drop}V
   0     0     0  [32] {->}LOOP
                  {del}
 557   781    10  Totals

In my version, I managed to avoid operations such as catenation or take/drop that processed the entire text vector on each iteration of the loop. But I didn't eliminate all such operations--for example, on line [21], the entire vector of word-end positions is processed on each iteration. And the comparison and searching operations involved in this statement are apparently rather slow on the /PC system.

The next time I need a paragraph-wrapping subroutine, I think I'll stick with Bill's technique. It's nice to know that you don't always have to be clever to get reasonably fast execution time in APL.
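For readers who don't follow APL, here is a rough Python rendering of that grab-a-line technique (grab width+1 characters, scan backwards for a blank, catenate the line to the result). The simplifications, no embedded newlines and no overlong-word handling, are mine.

```python
def wrap_grab(width, text):
    """Wrap text by repeatedly grabbing one line's worth of characters.

    Illustrative sketch of the grab-a-line technique, simple case only.
    """
    out = []
    while text:
        if len(text) <= width:
            out.append(text)           # remainder fits; we're done
            break
        chunk = text[:width + 1]       # one line plus a hanging blank
        cut = chunk.rfind(' ')         # scan backwards for a blank
        out.append(text[:cut])         # line up to the break
        text = text[cut + 1:]          # the blank is consumed
    return '\n'.join(out)
```

On the sample paragraph with a width of 17, this produces the same wrapped lines as the earlier functions; each pass handles one output line, and the catenations are what WRAP2 went out of its way to avoid.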

Jim
