
Faster Loops

Originally written .
Last updated .
See bug list and revision history below.

Virtually all interesting data processing requires loops. Speed is a feature, and speed is valuable. The speed of loops can sometimes be improved through spatial and temporal data locality. The difference is eminently practical: Adam Drake, for example, achieved a speedup by a factor of 235 by leveraging locality. His simple trick seems to work because his data fits in his laptop's RAM, and servers have a surprising amount of RAM nowadays. Note that this trick seems unlikely to work on unbounded data, because storing data requires physical space. That space, in turn, implies a lower bound on the distances involved and therefore a limit on the achievable latency and locality.

Repeated computations, meaning loops, can be implemented in many different technical ways, for example as recursion, as recursion with trampolines, or as iteration. Recursion saves its state on the call-stack. Trampolines put function objects onto the heap instead of the call-stack. Well-designed iteration only touches the heap and conceptually avoids the additional function objects that trampolines require.
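To make the distinction concrete, here is the same summation written once as plain recursion and once as iteration (a minimal sketch; the function names are made up for illustration):

(defun sum-recursive (list)
  ;; Each pending addition keeps a frame on the call-stack.
  (if (null list)
      0
      (+ (first list) (sum-recursive (rest list)))))

(defun sum-iterative (list)
  ;; Constant stack depth; the running total lives in one local variable.
  (let ((acc 0))
    (dolist (x list acc)
      (incf acc x))))

;; (sum-recursive '(1 2 3)) => 6
;; (sum-iterative '(1 2 3)) => 6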

From this angle, iteration looks preferable. One way to transform recursion into iteration is known as tail-call optimization (TCO) or tail-call elimination. Some programming languages implement TCO; JavaScript, however, de facto has none. And de-facto standards seem to be the most effective and most impactful standards, because they are what is observed in the real world.
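For completeness, a tail-recursive variant looks like this (a sketch; whether the stack frame is actually reused depends on the implementation, since the Common Lisp standard does not mandate TCO either, though SBCL seems to merge tail calls under its default optimization settings):

(defun sum-tail (list &optional (acc 0))
  ;; The recursive call is the last thing the function does,
  ;; so a compiler that eliminates tail calls can reuse the frame.
  (if (null list)
      acc
      (sum-tail (rest list) (+ (first list) acc))))

;; (sum-tail '(1 2 3)) => 6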

Another way of implementing loops iteratively is pipelining. With their caches and prefetchers, modern CPUs seem amenable to this approach.

Pipelining loops takes Adam Drake's trick one step further by folding his chain of command-line tools into a single subroutine or function of one programming language, and thereby into a single operating-system process. In contrast, Unix shell pipelines involve multiple Unix processes and system-call overhead for communicating between them, and that overhead may hamper data locality. The speedup comes from limiting indirection. See also the discussions on token-threading vs. subroutine-threading.
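As a rough sketch of the folding, a shell pipeline like cat access.log | grep ERROR | wc -l can be collapsed into a single function that keeps the whole loop in one process (the file name and the helper name are hypothetical):

(defun count-matching-lines (pathname substring)
  ;; One process, one pass over the data, no pipe traffic between tools.
  (with-open-file (in pathname :direction :input)
    (loop for line = (read-line in nil nil)
          while line
          count (search substring line))))

;; (count-matching-lines #p"access.log" "ERROR")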

The SERIES package by Richard C. Waters (1989a, 1989b) provides a compiler and pipeliner for loops. Think Pipelined Relational Query Language (PRQL), but tested since approximately 1980. The SERIES library went through about ten years of testing before being described in Appendix A (Series) and Appendix B (Generators and Gatherers) of Common Lisp the Language, 2nd edition. Generators have even made it into Python 3.

Example for SERIES

Data pipeline that benefits from pipelining.

bash$ sbcl;

sbcl*

;; Dependencies:
;;  1. SERIES ... already available via quicklisp
;;
(ql:quickload :series)

(series::install :macro t :shadow t)

(use-package :series)

;; No intermediate lists
(collect                               ; pipelined by SERIES
  (map-fn T (lambda (x) (* x 2))
    (map-fn T #'1+
      (scan '(1 2 3)))))

;; => (4 6 8)



;; No intermediate lists
(mapcar (lambda (x) (* (1+ x) 2))      ; pipelined by hand
        '(1 2 3))

;; => (4 6 8)



;; Intermediate lists
(mapcar (lambda (x) (* x 2))
        (mapcar #'1+                   ; intermediate list
                '(1 2 3)))

;; => (4 6 8)

Example for Pipelining


bash$ sbcl;

sbcl*

;; Unpack archive and place in #p"~/quicklisp/local-projects/".
;; Dependencies:
;;  1. SERIES ... already available via quicklisp
;;  2. series-expand ... on this web page
;;
(ql:quickload :series-expand)

(series::install :macro t :shadow t)

(use-package :series-expand)

(series-expand
  '(collect
     (map-fn T (lambda (x) (* x 2))
       (map-fn T #'1+
         (scan '(1 2 3))))))

;; => #<Lisp Code> ; pipelined and available for translation

Accessing the Pipeliner in SERIES

One thing seems to be missing from the Quicklisp package: a function like (cl:macroexpand ...) for series expressions. Here it is:


(defun series-expand (series-expression)
  "Expand SERIES-EXPRESSION into pipelined loop code, analogous to CL:MACROEXPAND."
  (let (series::*renames*          ; fresh, empty bindings for SERIES'
        series::*env*)             ; internal expansion state
    (series::codify                ; 3. generate the loop code
      (series::mergify             ; 2. merge the graph into one pipeline
        (series::graphify          ; 1. turn the expression into a data-flow graph
          series-expression)))))

Remember: in the Quicklisp package SERIES, the optimization hinges on the collectors! If there are no collectors, then (series-expand ...) won't do much of anything. Likewise, when using multiple values, put the (multiple-value-bind ...) on the outside and the (collect ...) on the inside, as in (multiple-value-bind (...) (...) (collect ...)); otherwise there will be problems. And, of course, because SERIES does not tie into the compiler of the Common Lisp implementation and its opaque environments, it is sometimes necessary to use (defmacro ...) instead of (series::defun ...). And don't forget to use once-only and gensyms when programming macros.
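For example, scanning a property list yields two series, and the (multiple-value-bind ...) wraps the collectors rather than the other way around (a sketch, assuming scan-plist behaves as described in the SERIES documentation):

;; multiple-value-bind on the outside, collectors on the inside
(multiple-value-bind (keys vals)
    (scan-plist '(:a 1 :b 2))
  (values (collect keys)    ; => (:A :B)
          (collect vals)))  ; => (1 2)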

Happy Programming.

Shoutout to Ian for being a sounding board on this stuff for over a year.

Bug list

Bug list of series-expand
I.D. | Date | Version reported | Version fixed | Notes
Currently no known bugs.

Revision history

Revision history of series-expand
I.D. | Date | Version | Archive | Length (Human Bytes) | Length (Bytes) | SHA512 | MD5
2 |  | 1.0.1 | series-expand-asdf_v1-0-1.zip | 2.5K | 2587 | 4cbdf63066f1210479c63840ccb447ac86f9f21a04ac79cc4ca9bf4925da2ef7f516192219599302f3f665960b333263287af37472866ed9e714e82bdeaed966 | d7ad5353bbfa90b17d66ae68bd29ca1c
1 |  | 1.0.1 | series-expand-asdf_v1-0-1.tar.gz | 1.1K | 1118 | 1632583b5bf9f4076ff6b505e784a4420a8f321cee6cf98fb11e81d5c1eea7424905bbb932a97d31222b17d3c2fb6a818c7fb0a393a7d7bd92c5e103afe1d02b | 3a211c3da71726d68bd787b31157b8f7