Nimble Computer Corporation

16231 Meadow Ridge Way

Encino, CA 91436

(818) 986-1436

(818) 986-1360 FAX This work was supported in part by the U.S. Department of Energy Contract No. DE-AC03-88ER80663

Copyright (c) 1989-90 by Nimble Computer Corporation

In this paper we show various techniques for the efficient implementation of the various functions of Common Lisp involving bit-vectors and bit-arrays. Bit-vectors are extremely useful for computing everything from the Sieve of Eratosthenes for finding prime numbers, to the representation of sets and relations, to the implementation of natural language parsers, to the performance of

We show how to organize the wide variety of bit-vector operations in Common Lisp into a few generic types of processing, and show how word-wide parallelism can be utilized for almost all of these operations to achieve speedups over similar character operations of approximately the size of the word. The major exception to our presentation will be Common Lisp's "SEARCH" function for bit-vectors, which can also be speeded up [Baker89], but the subtlety and complexity of its implementation is outside the scope of this paper.

Symbolic processing can be roughly compared with numeric processing, where
there are *dense* matrix techniques and *sparse* matrix techniques.
Dense matrix techniques are best suited for relatively small matrices, where
the underlying physical model is very closely coupled. Sparse matrix
techniques are best suited for larger matrices where the underlying physical
model is very loosely coupled. Dense matrix techniques assume that most matrix
elements are non-zero, and hence that most operations involving those matrix
elements are non-trivial. Sparse matrix techniques try to take advantage of
the fact that a large fraction of the matrix elements are zero, and hence
operations involving these elements are trivial. However, sparse matrices may
still have relatively complex structure, even though a large portion of their
elements are zero, so flexible methods must be used to represent and manipulate
these sparse matrices. Most sparse matrix techniques, therefore, make use of
*pointers* and linked structures which are much more reminiscent of
symbolic than of numeric processing.

In symbolic processing, the underlying relationships are almost always sparse,
hence the predominance of pointers and linked structures. Nevertheless, there
are occasions where the relationships are so unstructured or so dense that
pointer techniques are not optimal for symbolic processing. For example, the
*transitive closure* of a binary relation is often dense, even if the
original relation is not. In cases such as these, bit-vectors and bit-arrays
can often be useful, if their implementation is efficient enough.
([Aït-Kaci89] performs transitive closures of object-oriented hierarchies
using bit-vector techniques.)

Below is a straight-forward implementation in Common Lisp of the "or-and"
multiplication of a bit-matrix by a bit-vector, which is useful in computing
the *image set* resulting from the application of a binary relation to a
*domain set*.

Below is an implementation of Warshall's famous "in-place" algorithm for computing the transitive closure of a bit-matrix in Common Lisp. (Unfortunately, due to bugs in the implementation of displaced bit-arrays, this elegant program will not run on some Common Lisp implementations.)(defun mult-mat-vec (a v) "Post-multiply bit-matrix a by bit-vector v." (declare (type (array bit 2) a) (bit-vector v)) (assert (= (array-dimension a 1) (length v)) (let* ((m (array-dimension a 0)) (result (make-sequence 'bit-vector m :initial-element 0))) (dotimes (i m) (when (intersection-test (nmatrix-row a i) v) (setf (elt result i) 1))) result)) (defun nmatrix-row (a i) "Returns a vector displaced to the i'th row of a matrix a." (declare (type (array * 2) a)) (let ((n (array-dimension a 1))) (make-array n :displaced-to a :displaced-index-offset (* i n)))) (defun intersection-test (v1 v2) "Returns true if bit-vectors intersect one another." (declare (bit-vector v1 v2)) (assert (= (length v1) (length v2)) (some #'logtest v1 v2))

Common Lisp provides a number of representations and operations for bit vectors and arrays. The most obvious is the "BIT-VECTOR" datatype, which is a subtype of "ARRAY" in which the elements are restricted to "BIT"'s, i.e., the integers zero and one, and in which the array rank (number of dimensions) is one. Bit-vectors of this type (which we will call "array" bit-vectors) are also Common Lisp "sequences" because they are one-dimensional arrays, so they inherit all of the operations of the Common Lisp sequence functions. However, there are a number of additional Common Lisp functions which operate on all "bit-arrays"--i.e., arrays of bits. Curiously, even though Common Lisp has an abbreviation for "(ARRAY BIT (*))" -- "BIT-VECTOR" -- and an abbreviation for "(SIMPLE-ARRAY BIT (*))" -- "SIMPLE-BIT-VECTOR" -- it does(defun nwarshall (a) "Implements Warshall's algorithm for the transitive closure." "Transitively closes the square bit matrix a in place." "[Baase, p.223]" (declare (type (array bit 2) a)) (assert (= (array-dimension a 0) (array-dimension a 1))) (let ((n (array-dimension a 0))) (dotimes (k n) (let ((ak (nmatrix-row a k))) (dotimes (i n) (let ((ai (nmatrix-row a i))) (unless (zerop (bit a i k)) (bit-ior ai ak t))))))) a)

Unfortunately for a potential user of bit-vectors in a Common Lisp program,
these different representations and function suites have not been rationalized,
so it may be difficult to decide *a priori* which representation to use in
an application. To make things worse, some Common Lisp vendors provide such
inefficient implementations of some of the bit-vector functions that they are
useless for real programming.

Before delving into the efficient implementation of the various bit-vector
operations, we first give some perspective on the relationships among these
different operations through the following chart, which compares the operations
available for "integer" bit-vectors with those available for "array"
bit-vectors. (See [Eliot89] for a discussion of various representations for
sets in Common Lisp; note particularly the chart in his Figure 3.)

We can see by this chart that there is a great deal of overlap between the types of operations available for the various forms of bit-vectors. However, there are also some obvious holes, such as the lack of a quick intersection test for "array" bit-vectors, the lack of a subset test, and the lack of a reverse operation on integers. (A reverse operation on integers would come in handy for the implementation of FFT's, and would also aid in certain other recursive decomposition tasks.)Operation Integer Version Array Version Sequence Version Copy v unnecessary make-array :initial-contents v copy-seq Make-all-zeros 0 make-array :initial-element 0 make-sequence :initial-element 0 Make-all-ones -1 make-array :initial-element 1 make-sequence :initial-element 1 Test all zeros zerop, not ldb-test N/A not find 1 Test all ones eql -1 N/A not find 0 Size integer-length array-dimension(s) length Access i'th bit logbitp bit, sbit elt Boolean fns boole N/A N/A logand bit-and map #'logand logior bit-ior map #'logior logxor bit-xor map #'logxor logeqv bit-eqv map #'logeqv lognand bit-nand map #'lognand lognor bit-nor map #'lognor logandc1 bit-andc1 map #'logandc1 logandc2 bit-andc2 map #'logandc2 logorc1 bit-orc1 map #'logorc1 logorc2 bit-orc2 map #'logorc2 lognot bit-not map #'lognot Intersect test logtest N/A some #'logtest Subset test N/A N/A every #'<= Select portion ldb,ash make-array :displaced subseq Count bits logcount N/A count Leftmost 1 integer-length N/A position Mask a field mask-field N/A :start, :end Reverse N/A N/A reverse, nreverse Concatenate logior+ash N/A concatenate Compare =, equal, equalp equal, equalp mismatch Search subseq N/A N/A search Block init N/A :initial-element ? fill "BLT" blk trans dpb :initial-contents ? replace, setf-subseq Chart of analogous operations for integers, bit-arrays and bit sequences.

The largest difference, however, between array-type bit vectors and
integer-type bit vectors is in the fact that integer bit-vectors are
*functional*, while array bit-vectors allow for *side-effects*.
Thus, if `n` different pointers exist to an integer bit-vector, and one
applies the "lognot" function to that bit-vector, then this operation only
affects the *result*, and not any of the `n` existing pointers to
the integer. On the other hand, "bit-not" with a second argument of "t" will
flip all of the bits in the bit-vector given as its first argument--not only
for the result bit-vector, but also for every other holder of a pointer to that
argument bit-vector. Therefore, integer bit-vectors are better for functional
programming, while array bit-vectors are better for state-oriented
programming.

(Those in the functional programming camp may argue that the functional integer
bit-vectors are good enough for all purposes. However, I am not aware of any
Lisp compilers which are smart enough to solve the "aggregate update problem"
for integer bit-vectors [Schwartz75][Hudak86], and therefore the manipulation
of integer bit-vectors results in quite a bit more *cons*'ing than might
be used when programming in a more imperative style utilizing array
bit-vectors. Without a generational garbage collector [Moon84], the excellent
efficiency of the integer bit-vector operations is nullified by the large
amount of garbage collection that results from their use.)

Given that we would like a complete complement of bit-vector operations for either style of programming, there are two ways to fill the holes in our chart--add additional operations for both types of bit-vectors and/or add a conversion function which offers the efficient conversion of one type into the other. Since we want to allow for the widest possible flexibility in the use of the Common Lisp language, we suggest doing both--filling the holes in the chart with new functions and providing for the interconvertability of integers and array bit-vectors (see also [Eliot89] for a similar proposal).

(There are often other differences between integer bit-vectors and array
bit-vectors due to the vagaries of an implementation. E.g., because Coral
Common Lisp on the Macintosh allocates temporary space from the run-time stack
for all integer operations, its operations are very fast, but its integers are
also limited to 32K bytes, which is quite large enough for most *integer*
calculations, but not large enough for many multi-dimensional *bit-array*
applications.)

Some languages other than Common Lisp offer bit-vector facilities for
bit-vectors which exceed the word length of the underlying hardware. The
original Pascal implementation offered "sets" up to size 60, thanks to Control
Data. The Ada language [AdaLRM] offers the equivalent of Common Lisp's
*bit-not*, *bit-and*, *bit-ior*, *bit-xor*, *equal*,
*subseq*, *copy*, *concatenate*, *fill*, and *replace*
as builtin capabilities for arbitrary-length bit-vectors. Interestingly
enough, efficient Ada compilers must deal with the full complexity of
*packed* bit-vectors which can occur with any bit-offset to a word
boundary, just as we describe below. Ada also offers one builtin capability
not available in Common Lisp--a lexicographic comparison, although such a
comparison can easily be emulated using *mismatch*. Of course, the APL
language--as the inspiration for many Common Lisp functions--offers many
interesting and useful functions for manipulating bit-vectors, some of which
are missing in Common Lisp--*scan* and *bit-vector-reduce*--which can
often be quite useful.

By efficient, we mean that we would like to take advantage of the SIMD-type "word parallelism" that exists on all modern serial computers. In other words, if we can operate on 16-bit words at a time (called "Whiz-Along-By-Words" in [White86]) instead of single bits, we would like to achieve a 16-fold speedup over the fully serial operation operating on only a single bit at a time. Clearly, the larger the word size that we can utilize, the faster these operations should go.

There are some complications, however. Very few computers offer addressing capabilities down to the individual bit, and demand that words be addressed on "word boundaries", so there are the possibilities that incomplete words need to be processed, and that two bits which are to be combined may reside at different bit locations within a word. Both of these complications will require additional processing time, and add substantial complexity to the task of efficient implementation. Yet the payoff of 16-64 X speedups is far too important to ignore.

Virtually all processing of bit-vectors in Common Lisp is stream-like,
involving one or more input streams of bits, and zero or one output streams of
bits. (One can easily conceive of interesting operations involving multiple
output streams of bits, such as a dual complement/reverse complement operation
which treats the bit-vectors as sets, but none of the standard Common Lisp
bit-vector operations requires more than *one* output bit stream.)
Depending upon the operation, these bit-streams are derived from the bit-vector
by processing from the start of the vector to the end, or from the end of the
vector to the start. The bit-stream may also need to be inverted in the
process of being read or written.

The key to the efficient processing of bit-vectors is in the ability to process more than one bit at a time. Therefore, these bit streams need to be able to read or write more than one bit at a time (White calls these chunks "bigits" [White86]). Ideally, one would like to specify the number of bits to be processed--or "byte size"--perhaps 8 bits at a time for one operation, or 32 bits at a time for another operation. The complications of word boundaries occurs when processing multiple bits--sometimes one must process less than the full word width, and sometimes the word to be processed straddles a word boundary.

(By *word*, we mean the smallest unit which can be *efficiently* read
and written to memory. This does not necessarily mean the smallest addressable
unit, as many modern "CISC" chips can address a 32-bit word occurring on any
8-bit boundary, but the maximum efficiency is reached when 32-bit words are
read and written on 32-bit boundaries. By *byte*, we mean some number of
bits which has been chosen as convenient for a particular type of processing;
our *bytes* do not necessarily have 8 bits.)

Vanilla Common Lisp already has an excellent model for bit streams which could theoretically be used for these purposes--the "binary" file. Binary files in Common Lisp are homogeneous sequences of "bytes", but since the byte size need not remain the same for all uses of the file, the only portable implementation is as a sequence of bits. One could conceive of setting up a binary file associated with each bit-vector which needed to be processed, which would in turn set up a "byte-stream" which could be read or written, queried for end-of-file, etc. Such a stream could easily be set up to read either from the beginning or the end, and to perform on-the-fly complementation.

A true Common Lisp "stream" implementation of the bit-vector operations would
be very easy to write code for, and the code would be quite readable. However,
such an implementation would be even more inefficient than a straight-forward
serial implementation of the same functions. This is because most
implementations of streams in Common Lisp systems implement stream operations
as function calls, and expect to perform quite a bit of processing per byte
read or written, so the relative overhead of the stream operations is not
normally objectionable. However, in the implementation of bit-vector
operations, such overhead would be intolerable, since the cost of a
bit-operation is essentially *free* compared with the cost of reading and
writing the data.

However, rather than "throw out the baby with the bath water", we can keep the
"byte stream" model of processing, but implement it with some extremely
efficient *macros* instead of true Common Lisp streams, and thereby
achieve intellectual economy together with high execution speed.

(A problem occurs in trying to implement these byte stream macros in
*portable* Common Lisp--there is no efficient way to access the
representation of a bit-vector where the actual bits are stored. Coral Common
Lisp offers an escape--some open-coded "sub-primitives" which enable us to
access 8-bit, 16-bit and 32-bit "bytes" at any offset within an
object--irrespective of the type of the object. While using these operations
involves delving unnecessarily deeply into the gory details of the
representation of bit-vectors, Common Lisp provides no other intermediate level
access to the actual bits. The basic nature of these routines is roughly
equivalent to *ldb* for a particular subset of sizes and positions, except
that we operate on bit-vectors instead of integers. Perhaps a future Common
Lisp revision can incorporate such an operation.)

At the same time that we implement a new set of specialized and efficient
byte-stream macros, we can also handle one of the complications of word
addressing at the same time. While it is obvious that the *last* byte of
a sequence may sometimes not be a *full* byte, but only a *partial*
byte, there are times when it is more efficient to also process the
*first* byte as only a partial byte. This is for "synchronization"
reasons, when the processing of the middle (neither first nor last) bytes is
thereby speeded up. Since we expect that bit-vectors will sometimes be quite
long, it is important that the middle bytes be processed in the most efficient
manner. Therefore, in the setting up of such a byte stream, we must separately
specify both the *offset* within a byte of the beginning of the first byte
as well as the *length* of the first byte.

What different capabilities will be required of our "byte-streams"? We will obviously need to read and write them, but less obvious is the need to "update" them in place. Due to the possibility of the "t" argument in the "bit-xxx" bit-array logical operations, it will be necessary to be able to read, then write back into the same location, the result of an operation. We will also have to read backwards (although we will keep the order of the bits within the byte the same as the forwards order), and we may also want to be able to complement a byte after reading or before writing.

Due to the fact that during any builtin Common Lisp bit-vector operation, at
most *one* bit stream will be written (or updated), we can *always*
"synchronize" all of the bit streams being read to the one being written. Once
this synchronization has been performed, we can always *write* words on
word boundaries, and thereby avoid a large amount of complexity and
inefficiency, because writing across word boundaries entails more inefficiency
than reading across word boundaries. Therefore, only the first and last byte
on a write (or update) stream will be less than a full byte, and even that byte
will never cross a word boundary. Thus, for implementing just the standard
Common Lisp bit-vector functions, we need not implement the most general
possible write/update stream.

(Note that even if a computer offered hardware addressing to the single bit, and allowed the hardware reading and writing of words at any bit boundary, it would still be much more efficient for the software to operate on full-word boundaries. This is because while the hardware reduces the software complexity in terms of the number of instructions executed, it is still slower, due to the larger number of cache misses and memory accesses.)

*Synchronization* of different byte streams is the process of choosing a
stream which will be processed (except for its first and last bytes) on a word
boundary, and then computing the appropriate offsets for the other streams so
that the corresponding bits become aligned for processing. With a single
stream, we can always synchronize the stream to itself, while if there are two
streams, we can synchronize one to the other. (As we pointed out above, we
will always synchronize to the write stream because writing unaligned streams
is more expensive than reading unaligned streams.) With the possibility of up
to two streams for reading and one stream for writing required to implement
some Common Lisp bit-vector operations, we have the possibility that all three
streams will have different alignments, so we force both read streams to
synchronize to the write stream. We call the case of similar alignments
"aligned" in our benchmarks, and the case of different alignments "unaligned".

Many implementors who first approach Common Lisp bit-arrays assume that because the definition of the "bit-xxx" functions (e.g., "bit-and", "bit-ior") require the same rank and the same dimensions, that they will always be "synchronized" in the sense that corresponding bits from the different arrays will occur at the same location within a word. However, this is not the case when one or more of the arrays is "displaced", as the "index-offset" of a displaced array can be any non-negative integer, which can therefore position the first element of the array at any bit within a word. In addition, these functions must also be careful not to disturb bits earlier in the word before the first element or later in the last word, because the underlying simple-array may also be accessible to the user, and bits which are not participating in the "bit-xxx" function should not be affected. The "nwarshall" algorithm shown above depends critically upon the correct implementation of "bit-xxx" functions on displaced arrays.

(We note that most modern workstations already have extremely efficient
implementations of the "bit-xxx-t" functions under the guise of the "bitblt"
function used to manipulate the two-dimensional bitmap of the screen. In fact,
Symbolics Common Lisp utilizes these functions to implement the "bit-xxx"
functions, and they are very fast; curiously, Symbolics sequence functions on
bit-vectors are abysmally slow (ca.Rel. 7.0). However, *bitblt* (at least
as originally envisioned by the Xerox Alto) is a *two*-dimensional
operation involving only two arguments, and translating its use to a single
dimension and three arguments for use within Common Lisp can require some care.
Coral Common Lisp, for example, does not handle the case of *un*aligned
arguments correctly. When special purpose *bitblt* hardware is available
for use within normal memory, its could provide the effect of an array
processor for these Common Lisp functions. One must be careful, however, of
potentially long *startup overheads* for these 2-dimensional operations;
Common Lisp bit-vectors which are short also want to be efficiently
manipulated.)

We note that several proposals [Moon] have been made to modify the nature of the standard Common Lisp bit-vector functions. We feel that our techniques will not be affected in any major way by these proposals, so we have retained the original 1984 Common Lisp semantics [Steele84].

We will now go through the Common Lisp bit-vector functions in turn and indicate how they can be efficiently implemented. All times given are for Coral Common Lisp 1.2 running on a Macintosh Plus with a 16MHz Radius 68020 Accelerator card and 4Mbytes of main memory.

Our implementation of *find/position* in Coral Common Lisp utilizes a
byte-size of 16 bits (hence processes the bit-vector 16 bits at a time), and
operates in about .5usec/bit processed.

Our implementation of "find1" in Coral Common Lisp utilizes a byte-size of 16 bits (hence processes the bit-vector 16 bits at a time), and operates in .46usec/bit processed, while "find0" requires .62usec/bit processed.

*Equal/mismatch* requires two read streams, with the second being
synchronized to the first, so that at least one stream is processed on word
boundaries. When a non-equality is found, then *mismatch* calls
"position1" on the logxor of the two words to find the position of mismatch.

Our implementation of *equal/mismatch* requires .82usec/bit for aligned
streams, and 1.41usec/bit for unaligned streams.

Our implementation of *fill* requires .45usec/bit processed.

*Replace* is the closest operation Common Lisp has to a "bitblt" or
"bit-block-transfer" operation. Since Common Lisp requires that replace "work
correctly" even when source and destination overlap, we will require both a
normal and a "from-end" version of *replace*, since when we need to move a
subsequence of a bit-vector *up* within the same vector, and the
destination overlaps the source, we will lose information if we transfer
information from the start rather than from the end.

Whether replace-ing from the start or from the end, we always synchronize the source to the destination.

Both versions of *replace* take the same amount of time in our
implementation: .78usec/bit when source and destination are aligned, and
1.3usec/bit when unaligned.

Bit-not-t takes .89usec/bit, while bit-not-2 takes .99usec/bit aligned and 1.57usec/bit unaligned.

The *bit-xxx-3* functions have two read streams and a write stream, and
come in both from-start and from-end versions. If a source overlaps with the
destination, and both sources are *higher* than the destination, then we
use the from-end version. The worst case occurs when the destination overlaps
both sources, and yet lies in between them. In this case, we can neither
process from the beginning or from the end without getting into trouble.
Therefore, we either allocate a temporary on the stack or on the heap
(depending upon the size), and move the offending source to the temporary
location (using replace) and then utilize the *bit-xxx-3* function on the
now-tractable situation.

(We note that it is theoretically possible to perform bit-xxx-3 in-place without using additional storage even in the worst case of overlap of both sources and the destination when the destination is between the two sources. However, this algorithm requires a group-theoretic approach to performing the updates in certain cycles, in a manner similar to algorithms which have been proposed for transposing non-square matrices in place without additional storage [Brenner]. Such a method is far too complicated (and slow, since it tends to operate on a single element at a time) for use in a real Common Lisp implementation.)

*Bit-and-t* takes 1.04usec/bit for aligned sources and 1.55usec/bit for
unaligned sources. *Bit-and-3* takes 1.1usec/bit for aligned sources and
destination and 2.16usec/bit for all three unaligned. Add an additional
.78usec/bit for the additional copy for the worst case.

Despite their similar names, a different technique is required in order to
perform *reverse* and *nreverse*. We actually denote the functions
"reverse-2" and "reverse-t". *Reverse-2* is called to move and reverse a
bit-vector to a destination known not to overlap with the source. One
"from-end" stream is used to read, while a normal "from-start" stream is used
to write the destination. The read stream is synchronized to the write stream.
A 256-element table is used to reverse the bits within an 8-bit byte, and this
table is used twice to reverse the bits in a 16-bit word.

*Reverse-t* requires a subsidiary function which reverses a
*byte*-vector in place. We first save the non-participating bits from the
first and the last bytes, then reverse the entire byte-vector in place.
However, the bits in this vector may not be properly aligned, so we then call
*replace* to shift the appropriate bits up or down, if required. Finally,
the non-participating bits are restored.

*Reverse-2* takes 1.8usec/bit for word-aligned bit-vectors and 2.4usec/bit
for non-word-aligned bit-vectors. *Reverse-t* takes 1.7usec/bit for
byte-aligned bit-vectors and 3.1usec/bit for non-byte-aligned bit-vectors.

We utilize two read streams similar to *equal* or *mismatch*, with
the second stream synchronized to the first. Whenever the logical and of the
two bytes read is non-zero, then we immediately stop. In the case where the
intersection is likely to occur towards the end of the vectors, there may be a
requirement for a "from-end" version.

Our test-bit-vector requires .95usec/bit for aligned bit-vectors, and 1.41usec/bit for unaligned bit-vectors.

Our implementation of count in Coral Common Lisp requires .467-1.53 usec/bit processed.

For these reasons, we do not perform in-place deletions, and "delete" and "delete-duplicates" are just additional names for "remove" and "remove-duplicates", respectively.

*Nsubstitute* is easier than substitute, as we need only call "fill" on
the affected portion of the bit-vector to make the affected portion either all
0's or all 1's.

*Merge* of two bit-vectors is slightly more efficient, since we need only
find the *first* 1 in each sequence rather than count the 1's before
constructing the answer.

CCL CCL Nimble Nimble Speedup Operation simple dsplcd aligned unaligned Factor (X faster) subseq 102 112 --use replace-- 78-144 copy-seq 86 94.2 --use replace-- 66-121 reverse 3.4 84 1.8 2.4 1.4-47 nreverse 2.3 93 1.7 3.1 .74-55 make-seq :initial 45 -- --use fill-- 100 fill 46.8 56 .45 .45 104-124 replace 95.2 117 .78 1.3 73-150 remove nothing 133 142 .933-1.983 67-152 remove everything 94 106 .467-1.533 61-227 remove-duplicates 635 665 .5-.733 866-1330 substitute nothing 135 142 --use replace-- substitute everything 140 147 --use fill-- find1 90.2 100.3 .46 .46 196-218 find0 90.2 100.3 .62 .62 145-162 position 89.5 98.3 .5 .5 179-197 count 90.8 100.3 .467-1.53 .467-1.53 59-215 mismatch 154 175 .917 1.43 107-191 equal 70.3 93 .85 1.43 49-109 sort appr 94*n*log2n .92*n to 2.0*n 103-infinite intersection test 260 -- .95 1.4 186-274 bit-not-t .94 bad ans..89 .89 1.06 bit-not-2 .94 bad ans..99 1.57 .95 bit-and-t .99 bad ans.1.04 1.55 .95 bit-and-3 1.0 bad ans.1.1 2.16 .91 mult-mat-vec 10^3x10^3 approx. 5 min. 1 second worst case nwarshall 100x100 bad ans. 1.6 seconds worst case Timing Comparison of Coral Common Lisp (CCL 1.2) with Nimble bit-vector operations (All times are in usec/bit processed on 4Mbyte Mac+ w/16MHz 68020 accelerator)

This implementation is not the fastest possible for the Macintosh hardware: perhaps an additional factor of 5-8 could be accomplished through a combination of moving from a byte-size of 16 to a byte-size of 32 and recoding in assembly language. On the new pipelined RISC processors, additional factors can be gained through the proper scheduling of operations within the pipeline and defeating the cache (which is no help when accessing FIFO streams). We estimate that the new generation of 25+ mips serial processors now coming on stream should achieve times which are 100-1000 times as fast as these shown for the Macintosh -- i.e., flirting with 1 GBOP (giga-bit-operations/second). Finally, bit-vector operations on true SIMD architectures such as the "DAP" and the "Connection Machine" can gain addition factors of 100 on bit-vector operation speeds.

Aït-Kaci, Hassan; Boyer, Robert; Lincoln, Patrick, and Nasr, Roger.
"Efficient Implementation of Lattice Operations". *ACM TOPLAS 11*,1 (Jan.
1989),115-146.

Baase, Sara. *Computer Algorithms: Introduction to Design and Analysis*.
Addison-Wesley, Reading, Mass., 1978.

Baker, Henry. ML:HGB;BBOOLE >. MIT AI Lab., 1975.

Baker, Henry. "The Efficient Implementation of Common Lisp's SEARCH Function on Bit-vectors". Internal Memorandum, Nimble Computer Corporation, 1989.

[Baker92]
Baker, Henry. "A Decision Procedure for Common Lisp's SUBTYPEP Predicate".
*J. Lisp and Symbolic Computation 5 *(1992), 157-190.

Brenner, Norman. "Algorithm 467: Matrix Transposition in Place". *CACM
16*, 11 (Nov. 1973),692-694.

Eliot, Christopher R. "Manipulating Sets in Common Lisp". *Lisp Pointers
2*, 3&4 (Jan.-June 1989),5-14.

Hudak, Paul. "A Semantic Model of Reference Counting and its Abstraction".
*Proc. 1986 ACM Conf. on Lisp and Funct. Prog.*, August,351-363.

Moon, David. "Garbage collection in a large Lisp system". *Proc. 1984 ACM
Conf. on Lisp and Funct. Prog.*,235-246.

Moon, David. "BIT-ARRAY-FUNCTIONS", proposal for modifying the Common Lisp standard. June, 1989.

[Pratt73]
Pratt, Vaughan R. "A Linguistics Oriented Programming Language". *Proc.
IJCAI 3*, (Aug. 1973),372-380.

Schwartz, J.T. "Optimization of very high level languages, Part II: Deducing
relationships of inclusion and membership". *Computer Languages 1*,3
(1975),197-218.

[Steele84]
Steele, Jr., Guy L. *Common Lisp: The Language*. Digital Press,
Burlington, Mass., 1984.

White, Jon L. "Reconfigurable, Retargetable Bignums: A Case Study in
Efficient, Portable Lisp System Building". *Proc. 1986 ACM Conf. on Lisp and
Funct. Prog.*, August,174-191.