
Faster Loops

Originally written .
Last updated .
See bug list and revision history below.

Virtually all interesting data processing requires loops. Speed is a feature, and speed is valuable. The speed of loops can sometimes be improved through spatial and temporal data locality. The difference is eminently practical: Adam Drake, for example, achieved a speedup by a factor of 235 by leveraging locality. His simple trick seems to work because his data fits in his laptop's RAM, and servers have a surprising amount of RAM nowadays. Note that this trick seems unlikely to work on unbounded data, because storing data requires physical space. That space, in turn, implies a lower bound on the distances involved and therefore a limit on the achievable latency and locality.

Repeated computations, meaning loops, can be implemented in many different technical ways, for example as recursion, as recursion with trampolines, or as iteration. Recursion saves its state on the call-stack. Trampolines put function objects onto the heap instead of the call-stack. Well-designed iteration only touches the heap and conceptually avoids the additional function objects that trampolines require.
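To make the distinction concrete, here is the same summation written once as plain recursion and once as iteration (a minimal sketch; the function names are made up for illustration):

(defun sum-recursive (list)
  ;; Each pending addition keeps a frame on the call-stack.
  (if (null list)
      0
      (+ (first list) (sum-recursive (rest list)))))

(defun sum-iterative (list)
  ;; Constant stack depth; the running total lives in one local variable.
  (let ((acc 0))
    (dolist (x list acc)
      (incf acc x))))

;; (sum-recursive '(1 2 3)) => 6
;; (sum-iterative '(1 2 3)) => 6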

From this angle, iteration looks preferable. One way to transform recursion into iteration is known as tail-call optimization (TCO) or tail-call elimination. Some programming languages implement TCO; JavaScript, however, de facto has none. And de-facto standards seem to be the most effective and most impactful standards, because they are what is observed in the real world.
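For completeness, a tail-recursive variant looks like this (a sketch; whether the stack frame is actually reused depends on the implementation, since the Common Lisp standard does not mandate TCO either, though SBCL seems to merge tail calls under its default optimization settings):

(defun sum-tail (list &optional (acc 0))
  ;; The recursive call is the last thing the function does,
  ;; so a compiler that eliminates tail calls can reuse the frame.
  (if (null list)
      acc
      (sum-tail (rest list) (+ (first list) acc))))

;; (sum-tail '(1 2 3)) => 6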

Another way of implementing loops iteratively is pipelining. With their caches and prefetchers, modern CPUs seem amenable to this approach.

Pipelining loops takes Adam Drake's trick one step further by folding his chain of command-line tools into a single subroutine or function of one programming language, and thereby into a single operating-system process. In contrast, Unix shell pipelines involve multiple Unix processes and system-call overhead for communicating between them, and that overhead may hamper data locality. The speedup comes from limiting indirection. See also the discussions on token-threading vs. subroutine-threading.
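As a rough sketch of the folding, a shell pipeline like cat access.log | grep ERROR | wc -l can be collapsed into a single function that keeps the whole loop in one process (the file name and the helper name are hypothetical):

(defun count-matching-lines (pathname substring)
  ;; One process, one pass over the data, no pipe traffic between tools.
  (with-open-file (in pathname :direction :input)
    (loop for line = (read-line in nil nil)
          while line
          count (search substring line))))

;; (count-matching-lines #p"access.log" "ERROR")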

The SERIES package by Richard C. Waters (1989a, 1989b) provides a compiler and pipeliner for loops. Think Pipelined Relational Query Language (PRQL), but tested since approximately 1980. The SERIES library went through about ten years of testing before being described in Appendix A (Series) and Appendix B (Generators and Gatherers) of Common Lisp the Language, 2nd edition. Generators have even made it into Python 3.

Example for SERIES

Data pipeline that benefits from pipelining.

bash$ sbcl;

sbcl*

;; Dependencies:
;;  1. SERIES ... already available via quicklisp
;;
(ql:quickload :series)

(series::install :macro t :shadow t)

(use-package :series)

;; No intermediate lists
(collect                               ; pipelined by SERIES
  (map-fn T (lambda (x) (* x 2))
    (map-fn T #'1+
      (scan '(1 2 3)))))

;; => (4 6 8)



;; No intermediate lists
(mapcar (lambda (x) (* (1+ x) 2))      ; pipelined by hand
        '(1 2 3))

;; => (4 6 8)



;; Intermediate lists
(mapcar (lambda (x) (* x 2))
        (mapcar #'1+                   ; intermediate list
                '(1 2 3)))

;; => (4 6 8)

Example for Pipelining


bash$ sbcl;

sbcl*

;; Unpack archive and place in #p"~/quicklisp/local-projects/".
;; Dependencies:
;;  1. SERIES ... already available via quicklisp
;;  2. series-expand ... on this web page
;;
(ql:quickload :series-expand)

(series::install :macro t :shadow t)

(use-package :series-expand)

(series-expand
  '(collect
     (map-fn T (lambda (x) (* x 2))
       (map-fn T #'1+
         (scan '(1 2 3))))))

;; => #<Lisp Code> ; pipelined and available for translation

Accessing the Pipeliner in SERIES

One thing seems to be missing from the Quicklisp package: a function like (cl:macroexpand ...) for series expressions. Here it is:


(defun series-expand (series-expression)
  "Expand SERIES-EXPRESSION into pipelined loop code, analogous to CL:MACROEXPAND."
  (let (series::*renames*          ; fresh, empty bindings for SERIES'
        series::*env*)             ; internal expansion state
    (series::codify                ; 3. generate the loop code
      (series::mergify             ; 2. merge the graph into one pipeline
        (series::graphify          ; 1. turn the expression into a data-flow graph
          series-expression)))))

Remember: in the Quicklisp package SERIES, the optimization hinges on the collectors! If there are no collectors, then (series-expand ...) won't do much of anything. Likewise, when using multiple values, put the (multiple-value-bind ...) on the outside and the (collect ...) on the inside, as in (multiple-value-bind (...) (...) (collect ...)); otherwise there will be problems. And, of course, because SERIES does not tie into the compiler of the Common Lisp implementation and its opaque environments, it is sometimes necessary to use (defmacro ...) instead of (series::defun ...). And don't forget to use once-only and gensyms when programming macros.
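For example, scanning a property list yields two series, and the (multiple-value-bind ...) wraps the collectors rather than the other way around (a sketch, assuming scan-plist behaves as described in the SERIES documentation):

;; multiple-value-bind on the outside, collectors on the inside
(multiple-value-bind (keys vals)
    (scan-plist '(:a 1 :b 2))
  (values (collect keys)    ; => (:A :B)
          (collect vals)))  ; => (1 2)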

Happy Programming.

Shoutout to Ian for being a sounding board on this stuff for over a year.

Bug list

Bug list of series-expand
I.D. | Date | Version reported | Version fixed | Notes
Currently no known bugs.

Revision history

Revision history of series-expand
I.D. | Date | Version | Archive | Length (Human Bytes) | Length (Bytes) | SHA512 | MD5
2 |  | 1.0.1 | series-expand-asdf_v1-0-1.zip | 2.5K | 2587 | 4cbdf63066f1210479c63840ccb447ac86f9f21a04ac79cc4ca9bf4925da2ef7f516192219599302f3f665960b333263287af37472866ed9e714e82bdeaed966 | d7ad5353bbfa90b17d66ae68bd29ca1c
1 |  | 1.0.1 | series-expand-asdf_v1-0-1.tar.gz | 1.1K | 1118 | 1632583b5bf9f4076ff6b505e784a4420a8f321cee6cf98fb11e81d5c1eea7424905bbb932a97d31222b17d3c2fb6a818c7fb0a393a7d7bd92c5e103afe1d02b | 3a211c3da71726d68bd787b31157b8f7