******************************************************** S-expression syntax for Python with 1st-class "macros" ******************************************************** Beni Cherniavsky, OSDC Israel 2006 ********************************** Latest design and code will appear at http://cben-hacks.sf.net/python/py-expr/ .. include:: .. # Actually I use it for Python expressions (when I mean the value rather than the code), but the normal defaul role looks OK (italic). .. default-role:: title-reference A historical bow to Lisp ======================== * The year is 2006 and we talk about "P" languages. * The first Lisp interpreter appeared in 1958! * Pypy implements Python in Python; Perl6 eventually wants to be written in Perl6. * Most great Lisps are written in Lisp. * P languages are increasingly used for low-level system tasks. * Lisp Machines were *all Lisp* in the 1980s. .. class:: small I want to particularly credit Paul Graham's (unreleased) Lisp dialect called Arc, and his musings on Lisp nature. Agenda ====== .. class:: small * S-expression syntax for Python: * Remove duplication and restrictions. * Keep Python's implicit variable scoping. * Stick to Python names. They are better anyway ;-). * Regular syntax allows custom control constructs. * Want lookup in run time => first-class macros. * Quest for *readable* macros. Macros are not. * How variable scoping works. * Syntax usability: * Give me back my indentation! * Attributes and subscripting? * Still not good Python but perhaps a good Lisp? S-expressions ============= Lisp expresses all programs as nested lists:: (if (> x 0) (print (sqrt x))) * That's ``['if', ['>', 'x', 0], ['print', ['sqrt', 'x']]]`` — only cleaner. * Prefix notation: function is the head of the list. * Operators and control structures ("special forms" in Lisp jargon) look just like functions! Benefits of S-expressions ========================= The syntax is *very* regular. * No corner cases => freedom of syntax design. * Anything usedable anywhere => freedom of use. * Builtin constructs look like user functions => the syntax can be fully extensible. Python syntax has constraints ============================= Python syntax is very good ("readable pseudocode"). But it could it be *simpler*. Even if not *better*, simpler is instructive. * It has duplication between statements and expressions. * Partial duplication: some things are simply lacking! * It has restrictions forbidding statements inside expressions. * Can't extend the syntax. Some abstractions remain awkward to use. * Decorators and ``with`` are examples of partial solutions. .. class:: handout Let's design an S-expression syntax without these drawbacks. Statement/expression duplication ================================ .. class:: small +-----------------------+------------------------------------------+ | Statement form | Expression form | +=======================+==========================================+ |:: |``abs_x = (-x if x < 0 else x)`` #2.5 | | | | | if x < 0: | | | abs_x = -x | | | else: | | | abs_x = x | | | | | | | | +-----------------------+------------------------------------------+ |:: |``lambda arg: arg**2`` | | | | | def square(arg): | | | return arg**2 | | +-----------------------+------------------------------------------+ |:: |``[i**2 for i in range(5)]`` | | | | | for i in range(5): | | | l.append(i**2) | | | | | +-----------------------+------------------------------------------+ |:: |``(i**2 for i in range(5))`` #2.4 | | | | | for i in range(5): | | | yield i**2 | | +-----------------------+------------------------------------------+ Expressions everywhere ====================== * ``lambda`` is not just anonymous ``def`` — it directly takes an expression. * That's useful — why should we write the ``return`` when we code in functional style? * Similarly for ``if``, ``try``, etc... So let's allow it:: (def abs (x) (if (< x 0) (- x) x)) .. class:: handout * Note the lack of ``(return)``. .. class:: small Statements as expressions ========================= What if we do want a sequence of statements? * Let ``(do STATEMENT1 STATEMENT2 ...)`` execute them all and return `None`. * This keeps with Python's approach of returning `None` from imperative operations (e.g. `list.sort()`) to reduce confusion. Let's assume it's a good idea. * Imperative code now is sprinkled with ``do``-s:: (def abs (x) (do (if (< x 0) (do (return (- x))) (do (return x))))) * That's ugly compared to the functional equivallent. * So don't write imperative code where you don't have to! * Also, we'll eventually introduce shorthand colon+indentation syntax for the ``do``. So ignore the ugliness for now. .. class:: handout * Below the surface, the ugliness is there in Python today — every block of statements is another nested construct in the parse tree to allow for multiple statements. .. class:: small Docstings ========= In Python, a docstring is the first statement in a block. What about functions without ``(do)``? * Solution: it doesn't belong in the ``(do)`` anyway. Let it be part of the ``(def)``:: (def FNAME (ARGS...) ?DOCSTRING EXPR) * It won't hurt to allow docstings inside a ``(do)`` too. It looks better in imperative code. P.S. The idea of docstrings being part of function code (and accessing them at run-time) was a Lisp invention ;-). .. class:: small The value of ``def`` ==================== We made ``(def)`` accept an expression. Can it be used in an expression? * Python's ``def`` sets a variable to the created function. That's nice and useful. * Naming the function is also useful where ``lambda`` is normally used — to allow recursion and to give it a `__name__`. * So we don't need ``lambda``. ``(def)`` should also return the function:: (sorted lst (def (cmp a b) (- b a))) # reverse order * It sounds a good idea to restrict the visibility of the name to the definition. More on this in `Variable scoping`_ below. .. class:: small Decorators ========== Python has special syntax for decorators:: @DECORATOR def FNAME(ARG): BODY (equivallent to doing ``FNAME = DECORATOR(FNAME)`` after the ``def``). * Special syntax needed because ``def`` does not return a value that can be wrapped. * If ``(def)`` returns the function, we need no special syntax:: (= FNAME (DECORATOR (def FNAME (ARG) BODY))) But that still repeats the name twice. .. class:: small Decorators: not repeating the name ================================== * Allowing anonymous ``(def)`` won't help because we want the function to have a `__name__`. * What if ``(def)`` automagically got the name from the ``=``? :: (= FNAME (DECORATOR (def (ARG) BODY))) * But it'd be annoying to use ``(=)`` in *all* normal function definitions:: (= FNAME (def (ARG) BODY)) So let's change ``(def)`` syntax to make the name optional:: (def ?FNAME (ARGS ...) EXPR) * Same behavior for ``class``. .. class:: small Imports ======= Imports should be expressions too, to allow direct use:: ((. (import MODULE) FUNCTION)) * ``import MODULE as NAME`` is then redudant:: (= NAME (import MODULE)) * ``from MODULE import ATTR as NAME`` is also redudant:: (= NAME (. (import MODULE) ATTR)) .. class:: small Complex imports =============== But what about importing many modules / names? * ``(import MOD1 MOD2 ...)`` handles the common case. Can't return a value but renaming is a rare use anyway. * ``(from MODULE EXPR)`` affects any ``(import)`` calls inside EXPR. This is very flexible:: (from xml (= dom (try (import fancydom) # fictional example ImportError (import minidom)))) .. class:: small ``try..except/finally`` ======================= ``try`` is a complex statement with many parts, all of them optional:: try: BODY *except ?EXC ?,VAR: HANDLER ?else: ELSE ?finally: FINALLY The order is right. But detecting meaning by position as in ``(if)`` won't be good enough. .. class:: small ``try..except``: the common case ================================ * Only 15% of ``try`` statements use ``finally``. Less than 10% use more than one ``except``. About 15% want the var in ``except``. * Common case:: (try BODY EXC1 HANDLER1 EXC2 HANDLER2 ... ?ELSE_EXPR) * Of course, it works with expressions: it returns the result of BODY, or of the HANDLER if an exception was caught. * Very handy for things like:: (try (. obj attr) AttributeError 42) .. class:: small ``try..except/finally``: rare cases =================================== * Let the var always be called ``except`` ;-):: (try (foo) Exception (print except)) * What about catch-all ``except:`` clauses? Special case `object` to mean really anything or something like that... * ``try..finally`` can have a separate form (not called ``finally`` because it reads strangely in that position):: (guard BODY *FINALLY) (easy to allow multiple FINALLY exprs because their value is ignored anyway). .. class:: small ``break``, etc. =============== * ``(return ?EXPR)`` is defined inside functions. * Let ``(ret ?EXPR)`` be defined inside any ``(do ...)``, returning from the inner-most one. * ``(break ?EXPR)`` and ``(continue)`` are defined inside loops. * Note that we allow an argument to ``(break)``. It will be returned from the loop it exits. .. class:: small ``yield`` ========= * ``(yield ?EXPR)`` is defined inside ``(gen)``:: (def fib () (gen (= a 0) # (gen) is like (do) (= b 1) (while True (do (yield b) (= (, a b) (, b (+ a b)))))))) Python's behaviour of changing any function containing ``yield`` into a generator is too magic. * An explicit ``(gen)`` allows writing generators without ``(def)``:: (sum (gen (for i (range 10) (yield i)))) .. class:: small Loops and comprehensions ======================== The syntax and imperative behaviour is pretty obvious:: (for VAR ITERABLE BODY ?ELSE_EXPR) (while COND BODY ?ELSE_EXPR) The question is, what does it mean as an expression? :: (for x xs (for y ys (f x y))) * Is it list comprehension — ``[f(x,y) for x in xs for y in ys]``? * Or generator expression — ``(f(x,y) for x in xs for y in ys)``? * Is it nested — ``[[f(x,y) for y in ys] for x in xs]`` or flat? .. class:: hidden slide-display That's an open point. Depends on implemention of `Non-local exits`_. .. class:: handout :Open point: It must be flat, otherwise there is no way to avoid nesting. But then the following should also be flat:: (def for_y (x) (for y ys (f x y))) (for x xs (for_y x)) What does returning really mean? How can it work? This is strongly tied to how ``yield`` works which I mention later as an open point ;-). I won't go into them here but I will figure it out sooner or later. Writing custom special forms ============================ Now for the second benefit of S-expressions: they are easy to extend and transform. * Lisp macros allow easier and more powerful syntax extension than any other language. * This is not just a buzzword — this made Lisps what they are. It is indispensible for writing programs in Lisp and it hs been indispensible for evolving the language itself (the difference is not always clear). .. class:: small Lisp Macros =========== A Lisp macro is a function recieving its arguments as unevalauted S-expressions (nested lists) and returning a new S-expression to execute/compile instead of it. Example (our syntax):: (defmacro nif (expr pos zero neg) "Execute one form according to sign of `expr`." (do (= g (gensym)) (, do (, "=" g expr) (, "if" (">" g 0) pos ("==" g 0) zero neg)))) The (gensym) invents a new unique variable name to avoid collisions with variables used in the code. .. class:: small Want to resolve special forms at run time ========================================= One complication arising from macros in Lisp is the need to define all macros before the rest of the code can even be compiled. * That'd be bad in Python because we might want to import macros from modules and imports are only resolved at run time. * The whole concept of separate "compile time" and "run time" is inconvenient and confusing. We really don't want it Python. * So, let's look up the macros at run time. Evaluating an S-expression will start with evaluating its head and checking whether it's a special form. * If yes, call it with the arguments as-is. * If it's a simple function, evaluate the other argument and call it with their values. * How are they recognized? Suppose that instead of `f.__call__()`, they have `f.eval()`. Something on these lines. .. class:: small No reserved words ================= * One cool consequence is that we don't need reserved words. ``def``, ``while`` and the rest can just be names in the `__builtin__` module. * Keywords existing in some contexts only (like ``break`` inside loops) are easy to arrange: the loop executes its body in an environment containing that additional word. * If done properly, this should work:: (for x in xs (do (= outer_break break) (for y in ys (if (good x y) (outer_break))))) * The ability to pass special forms around can cause subtle bugs if you pass a special form to a function expecting a normal function. When it calls it, unexpected things will happen... .. class:: small Not macros ========== If we execute the special form's implementation at run time, we don't need to generate code that will replace our code. We can directly execute it. * This is notably simpler than `Lisp Macros`_. For one, we don't need to carefully introduce unique variables into the calling code, we can just use local variables of our special form implementation:: (defform nif (env expr pos zero neg) "Execute one form according to sign of `expr`." (do (= val (env expr)) (if (> val 0) (env pos) (== val 0) (env zero) (env neg)))) Notice how all bogus magic has disappeared! * The `env` argument is needed for environment tricks later. It also evaluates S-expressions in the proper environment. .. class:: small Fexprs ====== * This is not a new idea. It was called "Fexpr"s or "Fsubr"s in early Lisps and was evenatually replaced by macros to improve performance. * I think we can now afford what this luxury of ineffiecient simplicicty that was unacceptable then. Just as people work now with Python/Perl/PHP/Tcl without a native compiler, while several decades earlier most wouldn't touch Lisps that *did* have compilers (state of the art ones, faster than Fortran or C for good code!). * "Fexpr" is a bogus name, so let's just call them special forms. .. class:: small Non-local exits =============== When the implementation of ``break`` is executed, it has to exit the ``while`` loop inside which it runs. How? * A an exception can do it. A unique exception type for every instance, if we want to exit the loop which defined *this* ``break``. * ``try`` will have to hide these exceptions from normal "catch-all" clauses. They should have a special base class. * This covers ``break``, ``continue``, ``return``, and ``ret``. Now the hard one: ``yield`` should return from the ``gen`` that defined it, yet remain restartable. How the hell can this be implemented? :Open point: To really work, this needs either continuations or transforming all special forms into generators. Both are untrivial and I'm not decided yet how to do them right. Both complicate the design so I won't even try presenting them here. Watch http://cben-hacks.sf.net/python/py-expr/ for a design that will include them. Variable Scoping ================ Many special forms do tricks with the variable environment: * ``def`` creates a new scope. * ``class`` creates a temporal scope (invisble to methods). * Generator expressions add a single variable binding. The freedom is almost endless (but endless use would be a nightmare for the programmer). What is needed and how can it be implemented? .. class:: small Assignment to the current environment ===================================== Forms like ``=`` and ``def`` modify value in the current environment. This is easy: let the form recieve the environment as the first argument (as if it's a method of the environment):: def set(env, name, expr): env[name] = env(expr) builtin_env['='] = form(set) # the `form` decorator will mark special forms. (written in plain Python because you can't write in S-expressions until somebody implements the builtins). * Note that `name` is not evaluated. It's S-expression form is inspected directly. * The logic of assignments to tuples/items/attributes has to live somewhere. Let's say it's implemented once in `env.__setitem__`. .. class:: small A local environment with some new variables =========================================== Remember that ``while`` needs to bind ``break`` and ``continue``? (``continue`` and the ELSE expression omitted for brevity):: def while_(env, cond, body): Break = unique_exc_type('Break') def break_(env, arg): raise Break(arg()) env2 = env.let({'break': break_}) while env(cond): try: env2(body) except Break, e: return e.arg builtin_env['while'] = while_ .. class:: handout * ``break``-like things follow a constant pattern. They should and can be abstracted more. * Assigments inside the ``while`` body must still go to the parent environment. `env.let()` should bind given variable but share all the rest. .. class:: small Creating a new scope ==================== ``def`` should create a new local scope for assignments (for simplicity, name is not optional and docstring is not allow, and keyword args are ignored):: def def_(env, name, args, body): # Introspection will not work right on this. def func(*actual_args): # 'def_env' explained by class scope below env2 = getattr(env, 'def_env', env).new_scope() env2[args] = actual_args return env2(body) env[name] = func builtin_env['def'] = def_ .. class:: small Class scope =========== Class scope in Python is tricky. The body of the class definition is executed in a new scope. The variables defined there become the class's `__dict__`. During the execution you can refer to other variables defined there (so ``x = 2; x *= 3`` will work). But *functions* defined inside the class don't see the class scope. (They should access class attributes through `self` to allow overriding in subclasses.) * This means an env needs a separate `def_env` attribute to specify which scope ``def``-s will inherit from:: def class_(env, name, bases, body): env2 = env.new_scope() env2.def_env = env # defs wont see env2 env2(body) env[name] = type(name, bases, env2.dict) .. class:: small Dynamic variable localization ============================= Python resolves lexical variable scopes at compile time. Obviously if environment tricks are done at run time, this is impossible. Can we dynamically recreate Python's semantics? Close enough: * When you assign to a variable in an env, it automatically becomes local to this env. * When you look up a variable not assigned in the current env, it's looked up in parent environements and marked as free in the current env. * When you try to assign to variable marked as free, you get exception. This is equivallent for any legal Python code and correctly throughs exceptions in code like this:: def outer(): def inner(): print x x = 1 x = 0 inner() .. class:: small Dynamic variable localization: gory details =========================================== * The exception is thrown by ``x = 1``, not by ``print x``. * This allows writing functions where scoping depends on the code path:: x = 0 def f(localize): if localize: x = 1 print x else: print x both paths work. It can be argued not a bug (harder to say about feature). * It is simpler to understand and implement than Python's static analysis. .. class:: small Environment narrowing inside expressions ======================================== * A ``for`` loop should put its variable in the surrounding environment for inspection after the loop. But a list/generator comprehension should not. * A ``def`` should put the function in the surrounding environment for calling. But when used as an anonymous function, it should not. Yet it should bind the name for the function to see, in case it's recursive. * An ``import`` used in an expression (``(. (import MODULE) ATTR)``) should not set the variable for global use. Solution: every argument of a normal function is evaluated in its own small environment. Only inside constructs like ``(do)`` the environment is shared with followed forms. Syntax Usability ================ The bare S-expression syntax shown so far is not so convenient. It might look cool to a Lisper but not to a Pythoneer. Some S-expression patterns are just too annoying. It's OK to consider shorthands for a few specific things. Lisp has some too. .. class:: small Indentation and statement blocks ================================ *Warning: Holy Wars material ahead ;-)* .. class:: handout One of the things Python is proud of is code structure by indentation. So far we used a ``(do)`` construct that throws us back to the age of ugly delimiters. * Idea: a colon (``:``) followed by indented lines is syntax sugar for ``(do)``. Instead of:: (def abs (x) (do (if (< x 0) (do (return (- x))) (do (return x))))) you can write:: (def abs (x): (if (< x 0): (return (- x)) : (return x))) * Note how it *does* work inside parens. .. class:: small Less parentheses ================ * Outrageous Idea: inside ``(do)``, values are discarded. So practically always we will be calling functions. * Let each line inside a colon block be a function call, without writing the outer parens. * Let the top-level have line list syntax:: def abs (x): if (< x 0): return (- x) : return x * This looks quite like Python! * Interactive prompt is a bit tricky because you do want to evaluate single variables. * A logical line can be continued by indentation even if not terminated by a colon. * This creates a nice style separation between imperative and functional code. .. class:: small Attributes and subscripting =========================== ``a.b[c].d`` looks like ``(. (item (. a b) c) d)`` in S-expressions. * This looks very bad because the operation and index/attr get farther and farther from each other. * Reversing them: ``(. d (item c (. b a)))`` looks very bogus. * A postfix form could be created: ``(access a (. b) (item c) (. d))`` but it's still questionable. * Infix syntax doesn't blend with S-expressions because it creates structure ambiguities. * But attribute lookup is safe becase the attribute is always a string. * And b[c] indexing is safe because one of the expressions is always delimited by brackets. * So we can get away with allowing ``a.b[c].d``! Notably, this reduces confusion with method calls. It's hard to tell at once that ``(. obj method)`` does not call the method. ``obj.method`` obviously is not a call (except for imperative lines) and ``(obj.method)`` is. .. class:: small Advanced argument lists ======================= So far I completely ignored Python's argument list features: default, keyword, ``*rest`` and ``**dict`` arguments. * Interally, every Python callable recieves a tuple of positional args and dict of keyword args. * One approach: let ``(f arg1 arg2 kw1=val1 kw2=val2)`` stand not just for a list but for something like:: ('f', ['arg1', 'arg2'], ('dict', [[[('const', ['kw1']), 'val1'], [('const', ['kw2']), 'val2']]])) * Tuples do function application and lists return lists of their elements' values. .. class:: small Advanced argument lists (cont.) =============================== * Then ``(f *args **kw)`` directly stands for:: ('f', 'args', 'kw') and ``(f arg1 arg2 *args)`` requires:: ('f', ('+', [['arg1', 'arg2'], args])) Conclusions =========== * Erasing boundaries between different syntax elements can simplify the langauge and make it more flexible. * Pythonic macros must be looked up at runtime. * And they need not be macros. * Between Python syntax and pure S-expressions, there is a whole continium of syntax sugar. We can experiment. .. class:: small Future directions ================= First, this is not a time-proven, polished design. *It has mistakes* and can be improved. * I don't claim it is better than Python for writing code. It was rather intended as food for thought on the price we pay for Python's syntax. * I suspect it might be attractive for Lispers, giving access to Python's vast library without giving up Lisp syntax. * Performance will be horrible, of course. * Part of it is worth the flexibility. Python is already very dynamic; some more can't hurt, can it? * Implementing something like this into PyPy might give cool results. Some languages (e.g. Smalltalk, Ruby) get well with code blocks (easy lambda) and no macros. Should compare to this approach. .. class:: small Lessons for Python syntax ========================= * Custom special forms don't really require S-expressions! * Suppose you take existing Python syntax and evaluate anything starting a statement (no parens, colon at end) as a special form? Function calls are not impacted, making it more robust and effecient. * But what about expression use? Having ``myif COND: BLOCK`` I also want ``EXPR myif COND``. * Internally, it means all special forms are explicitly marked, e.g.:: (form myif COND BLOCK) with proper syntax sugar, this is fine.