From: Jim Weigang <jimw@chilton.com>
To: Leonard Howell <leonard.howell@msfc.nasa.gov>
Newsgroups: aus.mathematics,comp.lang.apl,comp.lang.fortran,
comp.programming.contests,microsoft.public.vb.visual_modeler,sci.stat.math
Date: Sun, 24 May 1998 10:10:07 -0400
Subject: Re: Interesting Programming Problem, Need Some Help

Leonard Howell wrote on May 21:

> Here is a interesting problem that perhaps someone has solved before
> Hor has an idea of how to program the solution.  Given a square array
> H(NxN) filled with zeros and ones (in practice, there are many more
> Hzeros than ones), what is the distribution of patterns of size 1, 2,
> H3, ...., where a pattern is defined any string of adjacent 1's,
> Heither horizontally or /and vertically distributed, and the number
> Hof "singlets" is also of interest  To simply the problem, I'm
> Hignoring diagonal connections. For example, in the following array,
> Hthere are:  5 ones, 1 two, 1 three, 2 fours, and 1 six.

This is a variation/subset of the "blob finder" problem.  Once you know
the frequency of different size blobs, it seems likely that a followup
request will be to identify the individual blobs in the matrix.

Here's my algorithm for finding blobs:

* Visit each nonzero cell (top to bottom, left to right)

* If the cell is 1, assign it a new blob index (>= 2)

* Examine the following neighbors of the cell:
. . .
. X n
n n n

* If a neighbor is 1, replace it with X's blob index

* If a neighbor is not 1 or 0, remember that blob n is
equivalent to blob X

* After visiting all the cells, make another pass through the
matrix, recoding the initial blob indices to merge equivalent
blobs and use sequential indices

The blob population counts can be accumulated by bumping a counter
whenever you change a 1 to a blob index.

This algorithm has enough inherent iterativeness that I didn't spend
much time trying to "parallelize" it for fast APL execution.  A
straightforward iterative coding (for the APL+ dialect) is given below.
It would make an interesting test for Bob Bernecky's APEX APL compiler,
or it could be translated to a conventional compiled language.  If it's
really going to be run using an interpreter, further improvements could
be made.  (In particular, the :For K loop could be parallelized easily
if Dyalog APL's P[i]{<-}.+1 feature were available in APL+.)

to vertical and horizontal lines.  Line [16] can be modified if you
want to ignore diagonal neighbors.  If you don't want the blob indices
and can't afford the space taken by the integer version of the matrix,
the code could be modified to work with just two rows of the Boolean
matrix at a time (as Bill Huber suggested).

Here are some examples:

B{<-}1=?10 15{rho}4
B
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 1 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 1 0 0 1 1 0 0 0 0 1 1 1 0 1
0 0 1 0 0 0 1 1 1 1 0 0 0 0 0
0 1 1 0 0 0 1 0 0 0 0 0 0 0 0

DISP BLOBS B
. . . . . . B . . . . A . . .     3  1    <- three singletons
. . . . B B B . . . . . . . .     2  2    <- two doubles
. . . . . . . . . . . . . . .     2  4
C C . . . D D . . . . . E . E     1  5
. . . . . . . . . . . . . E .     1 13    <- one 13-item blob
. . . . . . . . . F . . E . E
. . . . F . . G . F . . . . .
. H . . F F . . . . F F F . I
. . H . . . F F F F . . . . .
. H H . . . F . . . . . . . .

B8
0 0 0 0 0 0 1 0                 Leonard's example
1 0 0 1 1 0 1 1
0 1 1 0 1 1 0 0
1 1 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 1 0 0 0 0 0 1
0 0 1 1 1 0 0 0
1 0 1 0 0 0 0 1

DISP BLOBS1 B8             BLOBS1 ignores diagonal neighbors
. . . . . . A .     5 1
B . . C C . A A     1 2
. D D . C C . .     1 3
D D D . . . E .     2 4
. . D . . . . F     1 6
. G . . . . . F
. . H H H . . .
I . H . . . . J

And here's the code:

Jim

 Leonard wrote back and described what the program would be used for: For our application, the elements of the array can be thought of as pixels of an imaging system looking down on the Earth, and the connected "1"s are flashes caused by cosmic rays interacting with the atmosphere. We will be interested in making inferences regarding the longer tracks and bigger blobs as they are associated with the more energetic nuclei that have greater scientific interest. We will also want to investigate the inherent noise of the detector and the statistics of signal and noise, etc. BTW, to simplify the problem, I had originally stated that one could look at only the adjacent 1's that are beside, above, and below, but as it turns out the diagonal adjacents will need to be included too. [The program] will most likely be an important part of a large Monte Carlo simulation of the performance characteristics of the Orbiting array of Wide-angle Light collectors (OWL): a Pair of Earth Orbiting "Eyes" to Study Air Showers Initiated by 10^20 Electron Volt Quanta. The matrix cells can be thought of as pixels of the detector and the blobs will be flashes caused by incident cosmic rays and photon noise in the atmosphere (e.g., star light reflections). We will probably want to add other features for discriminating the blobs, e.g., look for straight line patterns, at a later date. We hope to first fly a proof of concept instrument on a high altitude balloon with something like a 50 by 50 pixel detector (that should run very fast!), and then later a spaceborne detector on a satellite with a 1000 by 1000 pixel imager. We will likely have a homepage for this project before too long and I'll post things of interest there too. (You can take a look at one of our other project homepages at http://science.msfc.nasa.gov/cosmicray/ica/default.htm). He also reported a couple of bugs that have been fixed in the listing above. In the process of fixing the bugs I discovered something I hadn't known about the APL+ interpreter, as described in the following message: