(818) 986-1436 (818) 986-1360 (FAX) This material is based upon work supported by the National Science Foundation under Grant No. III-9261682.

Copyright (c) 1993 by Nimble Computer Corporation.

The

We have programmed several versions of Quicksort in a linear fragment of Common Lisp, and--contrary to the conclusions of [Wakeling91]--find that Linear Lisp produces a very fast Quicksort routine. In fact, a linear Quicksort routine for lists (which doesn't require garbage collection) is considerably faster than a non-linear Quicksort routine for lists (which does require garbage collection), while the linear Quicksort routine for vectors is only 3.5% slower than the non-linear Quicksort routine. We do not find the linear style to be particularly burdensome for programming Quicksort, again disputing the conclusions of [Wakeling91], especially when arrays are subdivided instead of indexed.

The `identity` function is linear, but `five` must dispose of its
argument before returning the value `5`:

The(defun identity (x) x) (defun five (x) (kill x) 5) ; a true Linear Lisp would simply use "x" instead of "(kill x)".

The `square` function requires *two* occurrences of its argument,
and is therefore non-linear. The second copy may be obtained by using the
`dup` function, which accepts one argument and returns *two*
values--i.e., two copies of its argument. (See Appendix for a definition of
`dup`). The `square` function is as follows:

Conditional expressions--e.g.,(defun square (x) (let* ((x x-prime (dup x))) ; Use Dylan-style syntax for multiple values [Shalit92]. (* x x-prime)))

The boolean expression part of an `if`-expression requires more
sophistication. Strict linearity requires that any name used in the boolean
part of an `if`-expression be counted as an occurrence. However, many
predicates are "shallow", in that they examine only a small (i.e., shallow)
portion of their arguments (e.g., `null`, `zerop`), and therefore
a modified policy is more efficient. We have not yet found the best syntax for
shallow predicates, but provisionally use several new `if`-like
expressions: `if-atom`, `if-null`, `if-zerop`, etc. These
`if`-like expressions require that the boolean part be a simple name,
which does not count towards the "occur-once" linearity condition. This
modified rule allows for a shallow condition to be tested, and then the name
can be reused within the arms of the conditional.[1]

We require a mechanism to *linearly* extract both components of a Lisp
cons cell, since any use of `(car x)` precludes the use of `(cdr
x)`, and vice versa, due to the requirement for a single occurrence of
`x`. We therefore introduce a "destructuring let" operation
`dlet*`, which takes a series of binding pairs and a body, and binds the
names in the binding pairs before executing the body. Each binding pair
consists of a pattern and an expression; the expression is strictly evaluated
to a value, and this value is matched to the pattern, which consists of list
structure with embedded names. The list structure must match to the value, and
the names are then bound to the portions of the list structure as if the
pattern had been *unified* with the value. As expected, linearity demands
that a name appear only once within a particular pattern. Linearity
furthermore requires that each name bound by a `dlet*` binding pair must
occur exactly once--either within an expression in a succeeding binding pair,
or within the body of the `dlet*`.

Using these constructs, we can program the familiar `append` and
factorial (`fact`) functions:

(defun lappend (x y) ; append for linear lists. (if-null x (progn (kill x) y) ; trivial kill. (dlet* (((carx . cdrx) x)) (lcons carx (lappend cdrx y))))) ; lcons will be optimized. (defun fact (n) (if-zerop n (progn (kill n) 1) ; trivial kill. (let* ((n n-prime (dup n))) (* n (fact (1- n-prime))))))

(defun qs (x l) ; sort the list x onto the list l. (if (null x) l (let* ((i (car x)) (restx (cdr x)) (high low (highlow restx i nil nil))) (qs low (cons i (qs high l)))))) (defun highlow (x i h l) ; select the high and low elts of x onto h and l. (if (null x) (values h l) (let* ((firstx (car x)) (restx (cdr x))) (if (< firstx i) (highlow restx i h (cons firstx l)) (highlow restx i (cons firstx h) l)))))

(defun lqs (x l) (if-null x (progn (kill x) l) (dlet* (((i . restx) x)) (let* ((high low i (lhighlow restx i nil nil))) (lqs low (lcons i (lqs high l))))))) (defun lhighlow (x i h l) (if-null x (progn (kill x) (values h l i)) (dlet* (((firstx . restx) x)) (let* ((truth firstx i (l< firstx i))) (if truth (lhighlow restx i h (lcons firstx l)) (lhighlow restx i (lcons firstx h) l)))))

(defmacro lcons (a d c) `(rplacd (rplaca ,c ,a) ,d)) (defun lqs (x l) (if-null x (progn (kill x) l) (dlet* (((i . restx) x)) (let* ((high low i (lhighlow restx i nil nil))) (lqs low (lcons i (lqs high l) x)))))) (defun lhighlow (x i h l) (if-null x (progn (kill x) (values h l i)) (dlet* (((firstx . restx) x)) (let* ((truth firstx i (l< firstx i))) (if truth (lhighlow restx i h (lcons firstx l x)) (lhighlow restx i (lcons firstx h x) l))))))

We tested this version of Quicksort on Coral Common Lisp 1.2 on a Macintosh
Plus with a 16MHz Radius 68020 accelerator card. We sorted lists of random
fixnums of various lengths, and it took approximately 45*n*log2n usecs to sort
a random list of n fixnums, or 12.73 secs. for 20,000 fixnums. This timing was
2.16X faster than the built-in `sort` function, although some of this
difference can be attributed to the fact that the built-in `sort` does
not know until run-time what function to use for comparison. Our linear
Quicksort was also 1.84X faster than the built-in `sort` routine for a
*vector* of 20,000 random fixnums.

We found that the Coral Common Lisp compiler did not do a good job of
optimizing tail recursion. In fact, when we moved the `lhighlow`
function inside the `lqs` function--normally an optimization--this
sorting routine got much slower due to the creation of unnecessary closures.
In order to estimate the speed with properly optimized tail recursion, we
performed this optimization by hand by converting it into a Common Lisp
`prog` form with labels and `go`'s. This version was 1.24X
faster with a timing of about 36*n*log2n usecs. With a generic comparison
predicate, it then took 52.1*n*log2n usecs, or 1.81X faster than the built-in
generic `sort` routine for lists.

We thus have produced an extremely efficient linear Quicksort algorithm. In
fact, we are not aware of any further optimizations possible, outside of better
register optimization, or perhaps more in-lining (loop unrolling). Thus, our
linear Quicksort is not only competitive with the best non-linear algorithm, it
*is* the best *non*-linear algorithm.

(defun vqs (v k m) ; Quicksort vector v from k up to m. (if (>= k m) v (let* ((x (aref v k)) ; Create a hole at k. (i (split1 v k (1- m) x))) ; Do partition. (setf (aref v i) x) ; Put x back into the hole. (vqs v k i) ; Quicksort v from k up to i. (vqs v (1+ i) m)))) ; Quicksort v from i+1 up to m. (defun split1 (v i j x) ; hole is at v(i). (if (= i j) i (let* ((vj (aref v j))) ; Copy elt. to vj. (if (< vj x) (progn (setf (aref v i) vj) (split2 v (1+ i) j x)) (split1 v i (1- j) x))))) (defun split2 (v i j x) ; hole is at v(j). (if (= i j) i (let* ((vi (aref v i))) ; Copy elt. to vi. (if (>= vi x) (progn (setf (aref v j) vi) (split1 v i (1- j) x)) (split2 v (1+ i) j x)))))

*Swapping* instead of *copying* elements from a vector can introduce
an inefficiency into linear Quicksort unless the partitioning loop is tuned to
take advantage of this behavior. In a nonswapping Quicksort, each vector
element is implicitly copied to a register and then compared to the test
element. If the vector element is already in its proper partition, it does not
have to be restored. However, if not, it is stored into the hole created by a
previous movement. In a linear Quicksort, each vector element is swapped into
a register and then compared to the test element. It is therefore better to
have the register hold an element value before the swap, so that the swap can
simultaneously perform both reading and writing. We can "prime the pump" by
having the register initialized with an element that should be stored in the
location of the next swap. Thus, the circuitry transferring data in both
directions is utilized.

The requirement for continually passing around the vector is not onerous. However, the requirement to explicitly copy the index values is pedantic. A type system allowing both linear and non-linear types, with the non-linear types used for things like array indices, would allow for a more pleasant style for this task. Even without such types, however, a compiler should find it easy to optimize trivial duplications.(defun lvqs (v k m) ; Quicksort vector v from k up through m. (let* ((truth k m (l>= k m))) (if truth (progn (kill k) (kill m) v) (let* ((x v k (laref v k 'hole)) ; Get test element. (k1 (1+ k)) (ve v k1 (laref v k1 'hole)) ; Get lower element. (ve x (sort2 ve x)) ; Conditionally exchange ve, x so that ve<=x. (k1 k1-prime (dup k1)) ; Trivial dup. (m m-prime (dup m)) ; Trivial dup. (v i ve x (lsplitlow v k1-prime m-prime ve x)) ; Do partition. (ve v k1 (laref v k1 ve)) ; Put back lower element. (x v i (laref v i x)) ; Put in test element. (k (1- k1)) (x v k (laref v k x)) ; Put back lower element. (i i-prime (dup i))) ; Trivial dup. (lvqs (lvqs v k (1- i-prime)) (1+ i) m))))) (defun lsplitlow (v i j ve x) ; ve<=x, i<=j. (let* ((truth i j (l= i j))) (if truth (progn (kill j) (values v i ve x)) (let* ((i (1+ i)) (ve v i (laref v i ve)) ; Swap lower elt. with unknown elt. (truth ve x (l<= ve x))) (if truth (lsplitlow v i j ve x) (lsplithigh v i j ve x)))))) (defun lsplithigh (v i j ve x) ; ve>x, i<=j. (let* ((truth i j (l= i j))) (if truth (let* ((ve v i (laref v i ve))) ; Swap upper elt. with lower elt. (kill j) (values v (1- i) ve x)) (let* ((ve v j (laref v j ve)) ; Swap upper elt. with unknown elt. (j (1- j)) (truth ve x (l<= ve x))) (if truth (lsplitlow v i j ve x) (lsplithigh i j ve x))))))

As in the list Quicksort case, our vector Quicksort was slow due to poor tail-recursion optimization by the Coral Common Lisp compiler. We performed appropriate tail-recursion optimizations by hand for both the non-linear and linear versions of our vector Quicksort, and got substantial speedups. Since Coral Common Lisp does not handle the returning of multiple values efficiently, we performed additional hand optimizations to understand their value in this context. We found that the efficient handling of multiple values is critical to achieving acceptable performance on our linear benchmark. In particular, we assumed that built-in linear versions of predicates which returned multiple values could be compiled as efficiently as in a normal applicative system where the additional values did not need to be returned. We also assumed that the built-in array reference swapping instructions would be similarly optimized.

The final linear generic[2] swapping Quicksort is only 3.5% slower than the non-linear generic non-swapping Quicksort. We could achieve parity if we could eliminate the redundant index calculations which result from the emulation of swapping on an assignment-oriented architecture. Furthermore, if a hardware swap operation takes the same amount of time as a hardware read operation (we have every reason to believe it can), then a swapping Quicksort can be strictly faster than a non-linear Quicksort.

(defun lvqs (v) ; Quicksort the linear vector v in-place. (if-empty v v ; If v is empty, we are done. (let* ((hole v (first&rest v)) ; Split vector into subvectors of lengths 1 and n-1. (x hole zero (laref hole 0 'empty)) ; Get test element from hole. (x lo hole hi (part1 x 0 'empty '#() hole v '#())) ; Partition vector. (hv hole zero (laref hole zero x)) ; Put test element back into hole. (lo (lvqs lo)) ; Sort low partition. (hi (lvqs hi))) ; Sort high partition. (kill zero) (kill hv) ; Trivial kills. (catenate lo hole hi)))) ; Return reconstructed vector. (defun part1 (x zero hv lo hole1 mid hi) ; Partition mid using x; v = lo:hole1:mid:hi. (if-empty mid (progn (kill zero) (kill hv) (kill mid) (values x lo hole1 hi)) (let* ((mid hole2 (rest&last mid)) ; Split off last element from mid. (y hole2 zero (laref hole2 zero hv)) ; Extract element y from hole2. (truth y x (l< y x))) ; Compare y with test element. (if truth (let* ((hv hole1 zero (laref hole1 zero y))) ; Put y into hole1. (part2 x zero hv (catenate lo hole1) mid hole2 hi)) ; Reuse lo,hole1 storage. (let* ((hv hole2 zero (laref hole2 zero y))) ; Put y into hole2. (part1 x zero hv lo hole1 mid (catenate hole2 hi))))))) ; Reuse hole2,hi storage. (defun part2 (x zero hv lo mid hole2 hi) ; Partition mid using x; v = lo:mid:hole2:hi. (if-empty mid (progn (kill zero) (kill hv) (kill mid) (values x lo hole2 hi)) (let* ((hole1 mid (first&rest mid)) ; Split off first element from mid. (y hole1 zero (laref hole1 zero hv)) ; Extract element y from hole1. (truth y x (l< y x))) ; Compare y with test element. (if truth (let* ((hv hole1 zero (laref hole1 zero y))) ; Put y into hole1. (part2 x zero hv (catenate lo hole1) mid hole2 hi)) ; Reuse lo,hole1 storage. (let* ((hv hole2 zero (laref hole2 zero y))) ; Put y into hole2. (part1 x zero hv lo hole1 mid (catenate hole2 hi))))))) ; Reuse hole2,hi storage.

A vector subdivision Quicksort is ideal for a parallel processing machine--whether using parallel functional units (VLIW) or parallel instruction streams--because the vector is subdivided into exclusively-owned partitions. These subvectors can then be sorted completely independently and in parallel. The linear style, in which all events are ordered solely by dataflow constraints, makes trivial the compiler's job of detecting interference. The linear style recognizes that every access to a register--e.g., a read used to perform a comparison--ties up the register and therefore requires that the register be passed as an argument and returned as a value to guarantee proper ordering. Yet the linearity constraint also allows the factoring of the binding environment so that subexpressions can easily be reordered or executed in parallel. Quicksort may not be the ideal parallel sorting routine, but this linear subdividing Quicksort automatically extracts as much parallelism as possible.

Many may find onerous the linearity requirement that all names be explicitly consumed. However, we have found that this linearity check is an excellent programming aid, because--for example--it makes sure that every parameter of a function is either used or given a reason why not.

Linearity in a language is closely related to the *typestates* [Strom83]
of NIL/Hermes, and to *islands* [Hogg91].

Aho, A.V., *et al*. *The Design and Analysis of Computer
Algorithms*. Addison-Wesley, Reading, MA, 1974.

Baase, S. *Computer Algorithms: Introduction to Design and Analysis*.
Addison-Wesley, Reading, MA, 1978.

[Baker92BD]
Baker, H.G. "The Buried Binding and Dead Binding Problems of Lisp 1.5: Sources
of Incomparability in Garbage Collector Measurements". ACM *Lisp Pointers
V*,2 (Apr.-June 1992), 11-19.

[Baker92LLL]
Baker, H.G. "Lively Linear Lisp -- 'Look Ma, No Garbage!'". ACM *Sigplan
Notices* *27*,8 (Aug. 1992), 89-98.

[Baker92REV]
Baker, H.G. "NREVERSAL of Fortune--The Thermodynamics of Garbage Collection".
*Int'l. W/S on Memory Mgmt*., St Malo, France, Sept. 1992, Springer LNCS
637.

Chirimar, J., *et al*. "Proving Memory Management Invariants for a
Language Based on Linear Logic". *Proc. ACM Conf. Lisp & Funct.
Prog*., San Francisco, CA, June, 1992, also ACM *Lisp Pointers V*,1
(Jan.-Mar. 1992), 139.

Friedman, D.P., and Wise, D.S. "Aspects of applicative programming for
parallel processing". *IEEE Trans. Comput. C-27*,4 (Apr. 1978),
289-296.

Girard, J.-Y. "Linear Logic". *Theoretical Computer Sci. 50*
(1987),1-102.

Harms, D.E., and Weide, B.W. "Copying and Swapping: Influences on the Design
of Reusable Software Components". *IEEE Trans. SW Eng. 17*,5 (May
1991),424-435.

Hesselink, W.H. "Axioms and Models of Linear Logic". *Formal Aspects of
Comput. 2*,2 (Apr-June 1990), 139-166.

Hogg, J. "Islands: Aliasing Protection in Object-Oriented Languages".
*Proc. OOPSLA'91, Sigplan Not. 26*,11 (Nov. 1991), 271-285.

Knuth, D.E. *The Art of Computer Programming, v. 3, Sorting and
Searching*. Addison-Wesley, Reading, MA 1973.

Lafont, Y. "The Linear Abstract Machine". *Theor. Comp. Sci. 59* (1988),
157-180.

Martí-Oliet, N., and Meseguer, J. "From Petri nets to linear logic".
*Math. Struct. in Comp. Sci. 1*,1 (Mar. 1991).

Mendelson, A. "A Single Cached Copy Data Coherence Scheme for Multiprocessor
Systems". *Comput. Arch. News 17*,6 (Dec. 1989), 36-49.

Shalit, A. *Dylan(TM): An object-oriented dynamic language*. Apple
Computer, Camb., MA, 1992.

Strom, R.E. "Mechanisms for Compile-Time Enforcement of Security". *Proc.
10th ACM POPL*, Jan. 1983.

Suzuki, N. "Analysis of Pointer 'Rotation'". *CACM 25,*5 (May
1982)330-335.

Wadler, P. "Is there a use for linear logic?". *Proc. ACM PEPM'91*, New
Haven, June 1991, 255-273.

Wakeling, D., and Runciman, C. "Linearity and Laziness". *Proc. Funct.
Progr. & Computer Arch.*, LNCS 523, Springer-Verlag, Aug. 1991,
215-240.

(defun kill (x) ;;; Return no values. (if-atom x (kill-atom x) (dlet* (((carx . cdrx) x)) (kill carx) (kill cdrx) (values)))) (defun dup (x) ;;; Return 2 values. (if-atom x (dup-atom x) (dlet* (((carx . cdrx) x)) (let* ((carx carx-prime (dup carx)) (cdrx cdrx-prime (dup cdrx))) (values (lcons carx cdrx) (lcons carx-prime cdrx-prime))))))

[1]
Although this rule seems a bit messy, it is equivalent to having the shallow
predicate return *multiple* values: the predicate itself and its
unmodified arguments. This policy is completely consistent with linear
semantics.

[2]The comparison predicate passed to a linear generic
Quicksort returns 3 values: a boolean value and the two unmodified arguments.
Such a predicate allows the linear Quicksort to avoid *all* copying of
array element values.