Latest design and code will appear at http://cben-hacks.sf.net/python/py-expr/
I particularly want to credit Paul Graham's (unreleased) Lisp dialect called Arc, and his musings on the nature of Lisp.
Lisp expresses all programs as nested lists:
(if (> x 0) (print (sqrt x)))
The syntax is very regular.
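To make "nested lists" concrete, here is a minimal sketch (Python 3) of a reader that turns the text above into nested Python lists. The names `tokenize` and `parse` are illustrative only, and the sketch assumes no string literals or comments in the input.

```python
def tokenize(text):
    """Split S-expression text into parens and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume one expression from the front of the token list."""
    token = tokens.pop(0)
    if token != "(":
        return token              # an atom: kept as a plain string
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)                 # drop the closing ")"
    return items

tree = parse(tokenize("(if (> x 0) (print (sqrt x)))"))
print(tree)
```

Every construct, whether a call or a control structure, comes out as a list whose first element names the operation.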
Python syntax is very good ("readable pseudocode"). But could it be simpler? Even if not better, simpler is instructive.
Let's design an S-expression syntax without these drawbacks.
Statement form | Expression form |
---|---|
if x < 0: abs_x = -x else: abs_x = x | abs_x = (-x if x < 0 else x) #2.5 |
def square(arg): return arg**2 | lambda arg: arg**2 |
for i in range(5): l.append(i**2) | [i**2 for i in range(5)] |
for i in range(5): yield i**2 | (i**2 for i in range(5)) #2.4 |
So let's allow it:
(def abs (x) (if (< x 0) (- x) x))
What if we do want a sequence of statements?
Let (do STATEMENT1 STATEMENT2 ...) execute them all and return None.
Imperative code now is sprinkled with do-s:
(def abs (x) (do (if (< x 0) (do (return (- x))) (do (return x)))))
In Python, a docstring is the first statement in a block. What about functions without (do)?
Solution: it doesn't belong in the (do) anyway. Let it be part of the (def):
(def FNAME (ARGS...) ?DOCSTRING EXPR)
It won't hurt to allow docstrings inside a (do) too. It looks better in imperative code.
P.S. The idea of docstrings being part of function code (and accessing them at run-time) was a Lisp invention ;-).
We made (def) accept an expression. Can it be used in an expression?
Python's def sets a variable to the created function. That's nice and useful.
Naming the function is also useful where lambda is normally used — to allow recursion and to give it a __name__.
So we don't need lambda. (def) should also return the function:
(sorted lst (def cmp (a b) (- b a))) # reverse order
It sounds like a good idea to restrict the visibility of the name to the definition. More on this in Variable scoping below.
Python has special syntax for decorators:
    @DECORATOR
    def FNAME(ARG): BODY
(equivalent to doing FNAME = DECORATOR(FNAME) after the def).
Special syntax is needed because def does not return a value that can be wrapped.
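The equivalence is easy to check in plain Python (3 here). This is only an illustration: `shout`, `greet` and `greet2` are made-up names, not part of the design.

```python
def shout(func):
    """Hypothetical decorator: upper-case the wrapped function's result."""
    def wrapper(*args):
        return func(*args).upper()
    wrapper.__name__ = func.__name__   # keep the original name visible
    return wrapper

@shout
def greet(name):
    return "hello " + name

def greet2(name):
    return "hello " + name
greet2 = shout(greet2)      # the same wrapping, spelled out by hand

print(greet("bob"), greet2("bob"))
```

The second spelling needs a separate statement precisely because `def` is a statement and yields no value to pass along.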
If (def) returns the function, we need no special syntax:
(= FNAME (DECORATOR (def FNAME (ARG) BODY)))
But that still repeats the name twice.
Allowing anonymous (def) won't help because we want the function to have a __name__.
What if (def) automagically got the name from the =?
(= FNAME (DECORATOR (def (ARG) BODY)))
But it'd be annoying to use (=) in all normal function definitions:
(= FNAME (def (ARG) BODY))
So let's change (def) syntax to make the name optional:
(def ?FNAME (ARGS ...) EXPR)
Imports should be expressions too, to allow direct use:
((. (import MODULE) FUNCTION))
import MODULE as NAME is then redundant:
(= NAME (import MODULE))
from MODULE import ATTR as NAME is also redundant:
(= NAME (. (import MODULE) ATTR))
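Python already has a function that behaves roughly like an expression-level import: `importlib.import_module` returns the module object. A small sketch (Python 3) of the three patterns above, with the module choices being arbitrary examples:

```python
import importlib

# ((. (import MODULE) FUNCTION)): use a module without naming it first.
root = importlib.import_module("math").sqrt(16)

# (= NAME (import MODULE)): "import MODULE as NAME" as plain assignment.
js = importlib.import_module("json")

# (= NAME (. (import MODULE) ATTR)): "from MODULE import ATTR as NAME".
join = importlib.import_module("os.path").join

print(root, js.dumps([1, 2]), join("a", "b"))
```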
But what about importing many modules / names?
(import MOD1 MOD2 ...) handles the common case. It can't return a single useful value, but renaming is rare anyway.
(from MODULE EXPR) affects any (import) calls inside EXPR. This is very flexible:
    (from xml
      (= dom (try (import fancydom)        # fictional example
                  ImportError (import minidom))))
try is a complex statement with many parts, all of them optional:
    try: BODY
    *except ?EXC ?,VAR: HANDLER
    ?else: ELSE
    ?finally: FINALLY
The order is right. But detecting meaning by position as in (if) won't be good enough.
Only 15% of try statements use finally. Less than 10% use more than one except. About 15% want the var in except.
Common case:
(try BODY EXC1 HANDLER1 EXC2 HANDLER2 ... ?ELSE_EXPR)
Of course, it works with expressions: it returns the result of BODY, or of the HANDLER if an exception was caught.
Very handy for things like:
(try (. obj attr) AttributeError 42)
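A rough Python 3 approximation of the common-case form, to show the intended value semantics. Python evaluates arguments eagerly, so the sketch assumes BODY and each HANDLER are wrapped in zero-argument callables. `try_expr` and `Obj` are hypothetical names, and the optional ELSE_EXPR is left out.

```python
def try_expr(body, *handlers):
    """Return body(); on a listed exception return that handler's value.

    `handlers` alternates exception types and zero-argument callables,
    mirroring (try BODY EXC1 HANDLER1 EXC2 HANDLER2 ...).
    """
    pairs = list(zip(handlers[0::2], handlers[1::2]))
    try:
        return body()
    except Exception as exc:
        for exc_type, handler in pairs:
            if isinstance(exc, exc_type):
                return handler()
        raise                      # no clause matched: propagate

class Obj:
    pass

value = try_expr(lambda: Obj().attr, AttributeError, lambda: 42)
print(value)
```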
Let the var always be called except ;-):
(try (foo) Exception (print except))
What about catch-all except: clauses? One option is to special-case object to mean "really anything", or something like that...
try..finally can have a separate form (not called finally because it reads strangely in that position):
(guard BODY *FINALLY)
(easy to allow multiple FINALLY exprs because their value is ignored anyway).
(yield ?EXPR) is defined inside (gen):
    (def fib ()
      (gen (= a 0)            # (gen) is like (do)
           (= b 1)
           (while True
             (do (yield b)
                 (= (, a b) (, b (+ a b)))))))
Python's behaviour of changing any function containing yield into a generator is too magic.
An explicit (gen) allows writing generators without (def):
(sum (gen (for i (range 10) (yield i))))
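For comparison, the same two examples in ordinary Python 3, where the presence of `yield` alone is what turns `fib` into a generator. The use of `itertools.islice` to take the first few values is just for the demonstration.

```python
import itertools

def fib():
    """Python's implicit form: `yield` in the body makes this a generator."""
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

first = list(itertools.islice(fib(), 6))

# A generator without a def: the closest Python spelling of
# (sum (gen (for i (range 10) (yield i))))
total = sum(i for i in range(10))

print(first, total)
```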
The syntax and imperative behaviour are pretty obvious:
    (for VAR ITERABLE BODY ?ELSE_EXPR)
    (while COND BODY ?ELSE_EXPR)
The question is, what does it mean as an expression?
(for x xs (for y ys (f x y)))
That's an open point. It depends on the implementation of Non-local exits.
Open point: | It must be flat, otherwise there is no way to avoid nesting. But then the following should also be flat: (def for_y (x) (for y ys (f x y))) (for x xs (for_y x)) What does returning really mean? How can it work? This is strongly tied to how yield works which I mention later as an open point ;-). I won't go into them here but I will figure it out sooner or later. |
---|
Now for the second benefit of S-expressions: they are easy to extend and transform.
A Lisp macro is a function receiving its arguments as unevaluated S-expressions (nested lists) and returning a new S-expression to execute/compile in its place. Example (our syntax):
    (defmacro nif (expr pos zero neg)
      "Execute one form according to sign of `expr`."
      (do (= g (gensym))
          (, "do"
             (, "=" g expr)
             (, "if" (, ">" g 0) pos
                     (, "==" g 0) zero
                     neg))))
The (gensym) invents a new unique variable name to avoid collisions with variables used in the code.
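The same expansion can be sketched in Python 3 with nested lists standing in for S-expressions. This only builds the replacement code and does not run it. The `g__N` naming scheme and the function names are assumptions made for the illustration.

```python
import itertools

_counter = itertools.count()

def gensym():
    """Return a fresh variable name; the 'g__N' pattern is an arbitrary choice."""
    return "g__%d" % next(_counter)

def nif_macro(expr, pos, zero, neg):
    """Expand (nif expr pos zero neg) into plain do / = / if forms."""
    g = gensym()
    return ["do",
            ["=", g, expr],            # evaluate expr exactly once
            ["if", [">", g, 0], pos,
                   ["==", g, 0], zero,
                   neg]]

expansion = nif_macro(["f", "x"], "p", "z", "n")
print(expansion)
```

Without the fresh name, an expansion that used a fixed variable such as `g` could silently clobber a `g` in the caller's code.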
One complication arising from macros in Lisp is the need to define all macros before the rest of the code can even be compiled.
One cool consequence is that we don't need reserved words. def, while and the rest can just be names in the __builtin__ module.
Keywords existing in some contexts only (like break inside loops) are easy to arrange: the loop executes its body in an environment containing that additional word.
If done properly, this should work:
(for x xs (do (= outer_break break) (for y ys (if (good x y) (outer_break)))))
The ability to pass special forms around can cause subtle bugs if you pass a special form to a function expecting a normal function. When it calls it, unexpected things will happen...
If we execute the special form's implementation at run time, we don't need to generate code that will replace our code. We can directly execute it.
This is notably simpler than Lisp Macros. For one, we don't need to carefully introduce unique variables into the calling code, we can just use local variables of our special form implementation:
    (defform nif (env expr pos zero neg)
      "Execute one form according to sign of `expr`."
      (do (= val (env expr))
          (if (> val 0) (env pos)
              (== val 0) (env zero)
              (env neg))))
Notice how all bogus magic has disappeared!
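A minimal runnable sketch of this idea in Python 3, under several assumptions of mine: code is nested lists, bare strings are variable names, the environment is a callable dict, and special forms are marked with an `is_form` attribute. None of these details are fixed by the design.

```python
import operator

class Env(dict):
    """A dict of variables that can also evaluate nested-list code."""
    def __call__(self, expr):
        if isinstance(expr, str):
            return self[expr]                 # variable lookup
        if not isinstance(expr, list):
            return expr                       # literal such as 0 or 42
        head = self(expr[0])
        if getattr(head, "is_form", False):
            return head(self, *expr[1:])      # special form: raw arguments
        return head(*[self(arg) for arg in expr[1:]])

def form(func):
    """Mark func as a special form (it gets env plus unevaluated args)."""
    func.is_form = True
    return func

@form
def nif(env, expr, pos, zero, neg):
    """Execute one form according to the sign of `expr`."""
    val = env(expr)          # an ordinary Python local: no gensym needed
    if val > 0:
        return env(pos)
    if val == 0:
        return env(zero)
    return env(neg)

env = Env({"nif": nif, "-": operator.sub, "x": 3})
result = env(["nif", ["-", "x", 5], 1, 0, -1])
print(result)
```

Only the selected branch is ever passed to `env(...)`, so the other two are never evaluated.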
When the implementation of break is executed, it has to exit the while loop inside which it runs. How?
Now the hard one: yield should return from the gen that defined it, yet remain restartable. How the hell can this be implemented?
Open point: | To really work, this needs either continuations or transforming all special forms into generators. Both are nontrivial and I haven't decided yet how to do them right. Both complicate the design, so I won't even try presenting them here. Watch http://cben-hacks.sf.net/python/py-expr/ for a design that will include them. |
---|
Many special forms do tricks with the variable environment:
The freedom is almost endless (but endless use would be a nightmare for the programmer).
What is needed and how can it be implemented?
Forms like = and def modify values in the current environment. This is easy: let the form receive the environment as the first argument (as if it were a method of the environment):
    def set(env, name, expr):
        env[name] = env(expr)
    builtin_env['='] = form(set)  # the `form` decorator will mark special forms.
(written in plain Python because you can't write in S-expressions until somebody implements the builtins).
Remember that while needs to bind break and continue? (continue and the ELSE expression omitted for brevity):
    def while_(env, cond, body):
        Break = unique_exc_type('Break')
        def break_(env, arg):
            raise Break(arg())
        env2 = env.let({'break': break_})
        while env(cond):
            try:
                env2(body)
            except Break, e:
                return e.arg
    builtin_env['while'] = while_
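Here is one way the pieces might fit together as a self-contained Python 3 sketch. It is not the author's implementation: the `Env` class, the `form` marker, the `if_` helper and the demo are all assumptions. Unlike the snippet above, `break_` evaluates its argument with `env(arg)` and a per-loop exception class stands in for `unique_exc_type`.

```python
import operator

class Env:
    """A chain of variable dicts that can evaluate nested-list code."""
    def __init__(self, variables, parent=None):
        self.variables, self.parent = variables, parent

    def let(self, extra):
        """Return a child environment with additional bindings."""
        return Env(dict(extra), self)

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise NameError(name)

    def __call__(self, expr):
        if isinstance(expr, str):
            return self.lookup(expr)
        if not isinstance(expr, list):
            return expr                        # literal
        head = self(expr[0])
        if getattr(head, "is_form", False):
            return head(self, *expr[1:])       # special form: raw arguments
        return head(*[self(arg) for arg in expr[1:]])

def form(func):
    func.is_form = True
    return func

@form
def if_(env, cond, then, otherwise=None):
    return env(then) if env(cond) else env(otherwise)

@form
def while_(env, cond, body):
    class Break(Exception):        # a fresh type per loop execution
        pass

    @form
    def break_(inner_env, arg=None):
        raise Break(inner_env(arg))

    env2 = env.let({"break": break_})   # only the body sees `break`
    while env(cond):
        try:
            env2(body)
        except Break as exc:
            return exc.args[0]

items = [1, 2, 3, 4]
env = Env({"while": while_, "if": if_, "==": operator.eq,
           "pop": lambda: items.pop(0)})
result = env(["while", True, ["if", ["==", ["pop"], 3], ["break", 99]]])
print(result, items)
```

Because `break` is just a name bound in `env2`, looking it up outside any loop fails like any other undefined variable.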
def should create a new local scope for assignments (for simplicity, the name is not optional, a docstring is not allowed, and keyword args are ignored):
    def def_(env, name, args, body):
        # Introspection will not work right on this.
        def func(*actual_args):
            # 'def_env' explained by class scope below
            env2 = getattr(env, 'def_env', env).new_scope()
            env2[args] = actual_args
            return env2(body)
        env[name] = func
    builtin_env['def'] = def_
Class scope in Python is tricky. The body of the class definition is executed in a new scope. The variables defined there become the class's __dict__. During the execution you can refer to other variables defined there (so x = 2; x *= 3 will work). But functions defined inside the class don't see the class scope. (They should access class attributes through self to allow overriding in subclasses.)
This means an env needs a separate def_env attribute to specify which scope def-s will inherit from:
    def class_(env, name, bases, body):
        env2 = env.new_scope()
        env2.def_env = env  # defs won't see env2
        env2(body)
        env[name] = type(name, bases, env2.dict)
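The class-scope rule that `def_env` is meant to reproduce can be observed directly in Python 3. The class and method names below are made up for the demonstration, which assumes no global `x` exists.

```python
class Counter:
    x = 2
    x *= 3                  # the class body can read and rebind its own names

    def via_self(self):
        return self.x       # attribute lookup finds the class variable

    def via_bare_name(self):
        try:
            return x        # not visible: methods skip the class scope
        except NameError:
            return "NameError"

c = Counter()
print(Counter.x, c.via_self(), c.via_bare_name())
```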
Python resolves lexical variable scopes at compile time. Obviously if environment tricks are done at run time, this is impossible. Can we dynamically recreate Python's semantics? Close enough:
This is equivalent for any legal Python code and correctly throws exceptions in code like this:
    def outer():
        def inner():
            print x
            x = 1
        x = 0
        inner()
The exception is thrown by x = 1, not by print x.
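For contrast, here is how CPython 3 itself treats code of the same shape. Its compile-time analysis makes `x` local to `inner` because of the assignment, so the read fails first. The `try` and the return strings are added only to make the outcome observable.

```python
def outer():
    def inner():
        try:
            print(x)        # x is local to inner because of the assignment below
        except UnboundLocalError:
            return "print x failed"
        x = 1
        return "x = 1 reached"
    x = 0
    return inner()

outcome = outer()
print(outcome)
```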
Resolving scopes at run time also allows writing functions where scoping depends on the code path:
    x = 0
    def f(localize):
        if localize:
            x = 1
            print x
        else:
            print x
Both paths work. Arguably this is not a bug (whether it is a feature is harder to say).
It is simpler to understand and implement than Python's static analysis.
Solution: every argument of a normal function is evaluated in its own small environment. Only inside constructs like (do) is the environment shared with the forms that follow.
The bare S-expression syntax shown so far is not so convenient. It might look cool to a Lisper but not to a Pythoneer. Some S-expression patterns are just too annoying.
It's OK to consider shorthands for a few specific things. Lisp has some too.
Warning: Holy Wars material ahead ;-)
One of the things Python is proud of is code structure by indentation. So far we used a (do) construct that throws us back to the age of ugly delimiters.
Idea: a colon (:) followed by indented lines is syntax sugar for (do). Instead of:
(def abs (x) (do (if (< x 0) (do (return (- x))) (do (return x)))))
you can write:
    (def abs (x):
        (if (< x 0):
            (return (- x))
        :
            (return x)))
Outrageous Idea: inside (do), values are discarded. So practically always we will be calling functions.
Let each line inside a colon block be a function call, without writing the outer parens.
Let the top-level have line list syntax:
    def abs (x):
        if (< x 0):
            return (- x)
        :
            return x
A logical line can be continued by indentation even if not terminated by a colon.
This creates a nice style separation between imperative and functional code.
a.b[c].d looks like (. (item (. a b) c) d) in S-expressions.
Notably, this reduces confusion with method calls. It's hard to tell at once that (. obj method) does not call the method. obj.method obviously is not a call (except for imperative lines) and (obj.method) is.
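The translation from dotted syntax to the (.) and (item) forms is mechanical. As a sketch, Python's own `ast` module can do the parsing (Python 3). The tuple layout and the name `to_sexpr` are my choices, and only names, attributes and simple subscripts are handled.

```python
import ast

def to_sexpr(node):
    """Convert a chain of names, attributes and subscripts to nested tuples."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return (".", to_sexpr(node.value), node.attr)
    if isinstance(node, ast.Subscript):
        index = node.slice
        if type(index).__name__ == "Index":   # Python 3.8 and older wrap the index
            index = index.value
        return ("item", to_sexpr(node.value), to_sexpr(index))
    raise ValueError("unsupported syntax: %r" % node)

tree = to_sexpr(ast.parse("a.b[c].d", mode="eval").body)
print(tree)
```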
So far I completely ignored Python's argument list features: default, keyword, *rest and **dict arguments.
Internally, every Python callable receives a tuple of positional args and a dict of keyword args.
One approach: let (f arg1 arg2 kw1=val1 kw2=val2) stand not just for a list but for something like:
('f', ['arg1', 'arg2'], ('dict', [[[('const', ['kw1']), 'val1'], [('const', ['kw2']), 'val2']]]))
Then (f *args **kw) directly stands for:
('f', 'args', 'kw')
and (f arg1 arg2 *args) requires:
('f', ('+', [['arg1', 'arg2'], args]))
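The underlying calling convention can be checked in plain Python 3. The function `f` below just reports what it receives, and the last line spells out by hand the concatenation that the ('+', ...) form stands for. The sample values are arbitrary.

```python
def f(*args, **kw):
    """Report exactly what the callable receives."""
    return args, kw

plain = f(1, 2, kw1="a", kw2="b")

extra = [3, 4]
spliced = f(1, 2, *extra)
by_hand = f(*((1, 2) + tuple(extra)))   # explicit positional concatenation

print(plain, spliced, by_hand)
```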
First, this is not a time-proven, polished design. It has mistakes and can be improved.
Some languages (e.g. Smalltalk, Ruby) get along well with code blocks (easy lambda) and no macros. This approach should be compared with theirs.
Custom special forms don't really require S-expressions!
Suppose you take existing Python syntax and evaluate anything that starts a statement (no parens, colon at end) as a special form? Function calls are not impacted, making it more robust and efficient.
But what about expression use? Having myif COND: BLOCK I also want EXPR myif COND.
Internally, it means all special forms are explicitly marked, e.g.:
(form myif COND BLOCK)
With proper syntax sugar, this is fine.