S-expression syntax for Python with 1st-class "macros"

Beni Cherniavsky, OSDC Israel 2006

Latest design and code will appear at http://cben-hacks.sf.net/python/py-expr/

A historical bow to Lisp

I want to particularly credit Paul Graham's (unreleased) Lisp dialect called Arc, and his musings on Lisp nature.

Agenda

S-expressions

Lisp expresses all programs as nested lists:

(if (> x 0)
  (print (sqrt x)))

Benefits of S-expressions

The syntax is very regular.

Python syntax has constraints

Python syntax is very good ("readable pseudocode"). But it could it be simpler. Even if not better, simpler is instructive.

Let's design an S-expression syntax without these drawbacks.

Statement/expression duplication

Statement form Expression form
if x < 0:
    abs_x = -x
else:
    abs_x = x
abs_x = (-x if x < 0 else x) #2.5
def square(arg):
    return arg**2
lambda arg: arg**2
for i in range(5):
    l.append(i**2)
[i**2 for i in range(5)]
for i in range(5):
    yield i**2
(i**2 for i in range(5)) #2.4

Expressions everywhere

So let's allow it:

(def abs (x)
  (if (< x 0) (- x)
      x))

Statements as expressions

What if we do want a sequence of statements?

Docstings

In Python, a docstring is the first statement in a block. What about functions without (do)?

P.S. The idea of docstrings being part of function code (and accessing them at run-time) was a Lisp invention ;-).

The value of def

We made (def) accept an expression. Can it be used in an expression?

Decorators

Python has special syntax for decorators:

@DECORATOR
def FNAME(ARG):
    BODY

(equivallent to doing FNAME = DECORATOR(FNAME) after the def).

But that still repeats the name twice.

Decorators: not repeating the name

Imports

Imports should be expressions too, to allow direct use:

((. (import MODULE) FUNCTION))

Complex imports

But what about importing many modules / names?

try..except/finally

try is a complex statement with many parts, all of them optional:

try:
    BODY
*except ?EXC ?,VAR:
    HANDLER
?else:
    ELSE
?finally:
    FINALLY

The order is right. But detecting meaning by position as in (if) won't be good enough.

try..except: the common case

try..except/finally: rare cases

break, etc.

yield

Python's behaviour of changing any function containing yield into a generator is too magic.

Loops and comprehensions

The syntax and imperative behaviour is pretty obvious:

(for VAR ITERABLE BODY ?ELSE_EXPR)
(while COND BODY ?ELSE_EXPR)

The question is, what does it mean as an expression?

(for x xs (for y ys (f x y)))
Open point:

It must be flat, otherwise there is no way to avoid nesting. But then the following should also be flat:

(def for_y (x)
  (for y ys (f x y)))
(for x xs (for_y x))

What does returning really mean? How can it work?

This is strongly tied to how yield works which I mention later as an open point ;-). I won't go into them here but I will figure it out sooner or later.

Writing custom special forms

Now for the second benefit of S-expressions: they are easy to extend and transform.

Lisp Macros

A Lisp macro is a function recieving its arguments as unevalauted S-expressions (nested lists) and returning a new S-expression to execute/compile instead of it. Example (our syntax):

(defmacro nif (expr pos zero neg)
  "Execute one form according to sign of `expr`."
  (do (= g (gensym))
      (, do (, "=" g expr)
            (, "if" (">" g 0) pos
                    ("==" g 0) zero
                    neg))))

The (gensym) invents a new unique variable name to avoid collisions with variables used in the code.

Want to resolve special forms at run time

One complication arising from macros in Lisp is the need to define all macros before the rest of the code can even be compiled.

No reserved words

Not macros

If we execute the special form's implementation at run time, we don't need to generate code that will replace our code. We can directly execute it.

Fexprs

Non-local exits

When the implementation of break is executed, it has to exit the while loop inside which it runs. How?

Now the hard one: yield should return from the gen that defined it, yet remain restartable. How the hell can this be implemented?

Open point:To really work, this needs either continuations or transforming all special forms into generators. Both are untrivial and I'm not decided yet how to do them right. Both complicate the design so I won't even try presenting them here. Watch http://cben-hacks.sf.net/python/py-expr/ for a design that will include them.

Variable Scoping

Many special forms do tricks with the variable environment:

The freedom is almost endless (but endless use would be a nightmare for the programmer).

What is needed and how can it be implemented?

Assignment to the current environment

Forms like = and def modify value in the current environment. This is easy: let the form recieve the environment as the first argument (as if it's a method of the environment):

def set(env, name, expr):
    env[name] = env(expr)
builtin_env['='] = form(set)
# the `form` decorator will mark special forms.

(written in plain Python because you can't write in S-expressions until somebody implements the builtins).

A local environment with some new variables

Remember that while needs to bind break and continue? (continue and the ELSE expression omitted for brevity):

def while_(env, cond, body):
    Break = unique_exc_type('Break')
    def break_(env, arg):
        raise Break(arg())

    env2 = env.let({'break': break_})

    while env(cond):
        try:
            env2(body)
        except Break, e:
            return e.arg
builtin_env['while'] = while_

Creating a new scope

def should create a new local scope for assignments (for simplicity, name is not optional and docstring is not allow, and keyword args are ignored):

def def_(env, name, args, body):
    # Introspection will not work right on this.
    def func(*actual_args):
        # 'def_env' explained by class scope below
        env2 = getattr(env, 'def_env', env).new_scope()
        env2[args] = actual_args
        return env2(body)
    env[name] = func
builtin_env['def'] = def_

Class scope

Class scope in Python is tricky. The body of the class definition is executed in a new scope. The variables defined there become the class's __dict__. During the execution you can refer to other variables defined there (so x = 2; x *= 3 will work). But functions defined inside the class don't see the class scope. (They should access class attributes through self to allow overriding in subclasses.)

Dynamic variable localization

Python resolves lexical variable scopes at compile time. Obviously if environment tricks are done at run time, this is impossible. Can we dynamically recreate Python's semantics? Close enough:

This is equivallent for any legal Python code and correctly throughs exceptions in code like this:

def outer():
    def inner():
        print x
        x = 1
    x = 0
    inner()

Dynamic variable localization: gory details

Environment narrowing inside expressions

Solution: every argument of a normal function is evaluated in its own small environment. Only inside constructs like (do) the environment is shared with followed forms.

Syntax Usability

The bare S-expression syntax shown so far is not so convenient. It might look cool to a Lisper but not to a Pythoneer. Some S-expression patterns are just too annoying.

It's OK to consider shorthands for a few specific things. Lisp has some too.

Indentation and statement blocks

Warning: Holy Wars material ahead ;-)

One of the things Python is proud of is code structure by indentation. So far we used a (do) construct that throws us back to the age of ugly delimiters.

Less parentheses

Attributes and subscripting

a.b[c].d looks like (. (item (. a b) c) d) in S-expressions.

Notably, this reduces confusion with method calls. It's hard to tell at once that (. obj method) does not call the method. obj.method obviously is not a call (except for imperative lines) and (obj.method) is.

Advanced argument lists

So far I completely ignored Python's argument list features: default, keyword, *rest and **dict arguments.

Advanced argument lists (cont.)

Conclusions

Future directions

First, this is not a time-proven, polished design. It has mistakes and can be improved.

Some languages (e.g. Smalltalk, Ruby) get well with code blocks (easy lambda) and no macros. Should compare to this approach.

Lessons for Python syntax