PEP: XXX
Title: Refusing to Guess in the String Formatting Operation
Version: $Revision: 32 $
Last-Modified: $Date: 2006-04-12 02:30:20 -0700 (Wed, 12 Apr 2006) $
Author: Beni Cherniavsky <cben at users.sf.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Mar-2003
Python-Version: 2.3
Post-History: 02-Mar-2003

Abstract

The string formatting operation - format % values - has a wart. When the right operand is not a tuple, the operation assumes a singleton tuple around it. It's tempting to use this shorthand since singleton tuples look ugly, but code using it easily breaks (when the single object happens to be a tuple).

This PEP proposes several ways to fix this wart.

The problem was discussed repeatedly but the specific solutions proposed in this PEP need more feedback.

Motivation

In the face of ambiguity, refuse the temptation to guess.

—The Zen of Python, by Tim Peters

The simple use format % (val1, val2, ...) with a tuple is completely robust. However, since frequently there is only one value and singleton tuples are quite awkward to write and read, a shorthand is allowed - when the right object is not a tuple, it's interpreted as if it were wrapped in a singleton tuple.

This shorthand is bad because:

  1. When you pass a single object without the singleton tuple around it, your code will break if the object happens to be a tuple:

    >>> def decorate(obj):
    ...     return '-> %s <-' % obj
    ...
    >>> decorate(1)
    '-> 1 <-'
    >>> decorate((1,))
    '-> 1 <-'           # instead of '-> (1,) <-'
    >>> decorate((1,2))
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 2, in decorate
    TypeError: not all arguments converted during string formatting
    

    Your program can silently emit wrong results or crash. This is the classic example of the impossibility of creating a single robust interface that will accept either an object or a sequence and do the right thing. Any attempt to do so invites bugs. This has bitten many Python newbies and was discussed many times on comp.lang.python.

  2. This temptation will exist forever under the current design. The purpose of this PEP is to provide alternative ways to format a single object that are as convenient as formatting several objects, so that the temptation will disappear.

  3. It is not possible to use sequences other than tuples (e.g. lists) for passing multiple values to the formatting operation, because the sequence is interpreted as a single object. For example, it's useful to be able to substitute a transparent debugging proxy that logs all accesses for an object; if such a proxy for a tuple is used on the right side of a formatting operation, the transparency breaks. More generally, it's unpythonic to discriminate objects by actual type instead of the interface they implement.
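The points above can be checked with a short script. This is only an illustrative sketch: it reuses the decorate name from the example in item 1, rewritten with an explicit singleton tuple, and the other variable names are made up for the demonstration.

```python
def decorate(obj):
    # An explicit singleton tuple removes the guess: obj is always
    # formatted as exactly one value, even when it is itself a tuple.
    return '-> %s <-' % (obj,)

print(decorate(1))       # -> 1 <-
print(decorate((1,)))    # -> (1,) <-
print(decorate((1, 2)))  # -> (1, 2) <-

# A list is not a tuple, so it is always taken as ONE value (item 3).
as_single = '-> %s <-' % [1, 2]
print(as_single)         # -> [1, 2] <-

# Consequently a list cannot supply two values to a two-slot format.
try:
    '%s %s' % [1, 2]
except TypeError as exc:
    list_error = exc
else:
    list_error = None
print(list_error)
```

The robust spelling works, but it is exactly the awkward singleton tuple that tempts people to use the shorthand.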

The formatting operator also has a third mode: when the right argument should be a mapping. This mode is triggered deterministically by the "%(...)..." syntax in the format rather than by checking the type of the right argument, so it doesn't need fixing.
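As a small illustration of that third mode (the variable names here are invented for the example), the same dictionary is treated as a mapping or as a single value depending only on the format string:

```python
values = {'a': 1}

# "%(a)s" in the format selects mapping mode: the right argument
# is indexed by key.
named = '-> %(a)s <-' % values
print(named)   # -> 1 <-

# Without a "%(...)" key, the same dict is just one value to format.
plain = '-> %s <-' % values
print(plain)   # -> {'a': 1} <-
```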

Single-value operator

Specification

Add a new string operator, /, that will always accept a single value:

>>> '-> %s <-' / 1
'-> 1 <-'
>>> '-> %s <-' / (1 ,)
'-> (1,) <-'

To be always available, this would need to be duplicated as both str.__div__ and str.__truediv__.
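The proposed behaviour can be approximated today with a str subclass. The following is only a sketch under that assumption: the class name SlashFormat is hypothetical, and the real proposal would put the operator on str itself.

```python
class SlashFormat(str):
    """Hypothetical str subclass emulating the proposed / operator."""

    def __truediv__(self, value):
        # Always wrap the right operand in a singleton tuple, so it is
        # formatted as exactly one value whatever its type.
        return str.__mod__(self, (value,))

    # Classic (Python 2) division looks up __div__, hence the alias.
    __div__ = __truediv__


template = SlashFormat('-> %s <-')
print(template / 1)      # -> 1 <-
print(template / (1,))   # -> (1,) <-
```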

Rationale

This allows unambiguous expression of formatting with a single value, with minimal hassle when you change the code between one and several values. It's probably the least-intrusive fix for the problem.

The / operator was chosen for this because it is the "closest relative" of %. Mnemonic: / looks like a subset of % but with just one stroke :-).

Downsides: it's arbitrary punctuation; the mnemonic isn't better than Perl's motivations for $] and friends... The div/truediv duplication is a hack that highlights the fact that this has nothing to do with division. Some people might start to wonder what // (floordiv) does with strings... (Should it do mapping formatting? I don't see the need - leave it to %.)

Note: PEP 292 mentions that / was proposed by somebody for string interpolation against a dictionary. This reinforces the concern that it is arbitrary - different people have different ideas for its use. OTOH, this shows that it's the obvious "closest relative" of %...

Deprecation of format % single-value

Once an unambiguous method to format a single value exists, there is little need (except for backward compatibility) for % to accept non-tuple right arguments. Thus it makes sense to deprecate this, eventually removing it completely. This will break a lot of existing code, not all of which is buggy, so it's better to proceed with this slowly.
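One way to picture the first stage of such a deprecation is a str subclass that warns on the shorthand. This is a rough sketch, not a proposed implementation: the class name WarningFormat is hypothetical, and for simplicity any mapping is assumed to be intended for a "%(...)..." format.

```python
import warnings
from collections.abc import Mapping


class WarningFormat(str):
    """Hypothetical str subclass that warns on the single-value shorthand."""

    def __mod__(self, values):
        # Simplifying assumption: tuples and mappings are the intended
        # right arguments; anything else is the deprecated shorthand.
        if not isinstance(values, (tuple, Mapping)):
            warnings.warn(
                "formatting a non-tuple value with % is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
        return str.__mod__(self, values)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    shorthand = WarningFormat('-> %s <-') % 1             # warns
    explicit = WarningFormat('-> %s <-') % (1,)           # silent
    by_name = WarningFormat('-> %(a)s <-') % {'a': 1}     # silent

print(shorthand, explicit, by_name, len(caught))
```

The results are unchanged during the transition; only the shorthand call produces a warning.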

Alphabetic Method

This is not proposed as a sufficient solution but rather as an additional thing that can optionally be implemented. Actually the PEP author tends to think this is not needed but it is listed for completeness - perhaps others will like it more.

Specification

Add a new string method with an alphabetic (non-magic) name that would accept a variable number of arguments (besides self) that are the values for the formatting operation:

>>> '-> %s <-'.fmt(1)
'-> 1 <-'
>>> '-> %s %s <-'.fmt(1,2)
'-> 1 2 <-'
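Since a method cannot be added to str from pure Python, the positional form can be sketched as a free function instead. The function name and signature below are assumptions made for illustration only.

```python
def fmt(format_string, *values):
    # *values is always a tuple, so there is nothing to guess: one
    # argument yields a singleton tuple, several yield a longer one.
    return format_string % values


print(fmt('-> %s <-', 1))        # -> 1 <-
print(fmt('-> %s <-', (1,)))     # -> (1,) <-
print(fmt('-> %s %s <-', 1, 2))  # -> 1 2 <-
```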

When the format selects values by names ("%(...)..."), there are several possible interfaces for passing the mapping (options 2 or 3 are preferred):

  1. Keyword arguments:

    >>> '-> %(a)s <-'.fmt(a=1)
    '-> 1 <-'
    >>> '-> %(!@#$)s <-'.fmt(**{'!@#$': 1})
    '-> 1 <-'
    
  2. As a single mapping argument:

    >>> '-> %(a)s <-'.fmt({'a': 1})
    '-> 1 <-'
    
  3. Support both ways.

    Since this becomes very similar to the behaviour of the dict() constructor (since Python 2.3), perhaps it makes sense to complete the similarity and also support a sequence of items as an argument:

    >>> # The above two ways work; in addition:
    >>> '-> %(a)s <-'.fmt([('a', 1)])
    '-> 1 <-'
    

    All this could be easily implemented by just passing all arguments of the formatting method to dict(). Note however, that when the argument is already a mapping (has a keys attribute), dict() should not be called on it, because that would copy the whole dictionary (and frequently big mappings are used when the format only accesses several keys).
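A minimal sketch of option 3, including the no-copy shortcut just described, might look as follows. The names as_mapping and fmt_named are hypothetical, and a free function stands in for the proposed method.

```python
def as_mapping(*args, **kwargs):
    """Turn dict()-style arguments into a mapping for a named format."""
    if len(args) == 1 and not kwargs and hasattr(args[0], 'keys'):
        # Already a mapping: use it as is rather than copying it.
        return args[0]
    # Otherwise let dict() handle keywords and sequences of items.
    return dict(*args, **kwargs)


def fmt_named(format_string, *args, **kwargs):
    return format_string % as_mapping(*args, **kwargs)


print(fmt_named('-> %(a)s <-', a=1))           # -> 1 <-
print(fmt_named('-> %(a)s <-', {'a': 1}))      # -> 1 <-
print(fmt_named('-> %(a)s <-', [('a', 1)]))    # -> 1 <-
```

Combining a mapping with extra keyword arguments still goes through dict() in this sketch, so that case does copy.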

Suggested names for the new method: fmt, format, sprintf.

Rationale

  • This notation works equally well for one and multiple values thanks to the call syntax of Python (at the price of some verbosity). This benefit will only be realised if either of the following happens:

    • Everybody completely switches to it and the % notation is deprecated - this doesn't look like a good idea.
    • The % notation is fixed by other means so that both notations are applicable in any situation (without dependence on the number of values), subject to the programmer's taste.

    That's why this solution is not sufficient in any case.

  • format.fmt looks more readable than format.__mod__ when you need the bound method.

  • The duplication of operator functionality as an alphabetic method would not be without precedent. Some examples:

    • list.extend vs. list.__iadd__
    • dict.iterkeys vs. dict.__iter__
    • dict.has_key vs. dict.__contains__

    Note that in this case, the signature is not identical.

  • The use of keyword arguments for mapping-based formatting works in simple cases because the limitations on the keys are basically the same - they must be arbitrary strings. However:

    • Unicode formatting operations allow Unicode strings as keys for selecting values, whereas keyword arguments are restricted to plain strings. Does anybody use Unicode strings as formatting keys at all?
    • When you already have a mapping, you need to pass it using the **mapping syntax. This doesn't work for mappings that are not real dicts and it seems to go over all keys validating they are strings, which unnecessarily slows things down.

    Therefore supporting only the keyword arguments interface would be bad.

  • One alternative to an alphabetic method name is __call__:

    >>> '-> %s <-'(1)
    '-> 1 <-'
    >>> '-> %s %s <-'(1, 2)
    '-> 1 2 <-'
    

    This has the same benefits of unambiguity inherited from the call syntax and is very concise. This is also its undoing: it's too cryptic. Using strings in function call position will deeply puzzle anybody unfamiliar with it. The idea was probably discussed and dismissed when % was originally introduced.

    Quoting PEP 234, on rejection of __call__() as alternative name for iterators' .next() method:

    ... there's a danger that every special-purpose object wants to use __call__() for its most common operation, causing more confusion than clarity.
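For readers who want to see what the rejected __call__ spelling would feel like, it can be emulated with a str subclass. This is purely illustrative; the class name CallableFormat is hypothetical.

```python
class CallableFormat(str):
    """Hypothetical str subclass emulating the __call__ alternative."""

    def __call__(self, *values):
        # The call syntax always delivers a tuple of values.
        return str.__mod__(self, values)


print(CallableFormat('-> %s <-')(1))         # -> 1 <-
print(CallableFormat('-> %s %s <-')(1, 2))   # -> 1 2 <-
```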

Backwards Compatibility

The proposed addition(s) present no compatibility problems (unless somebody intentionally checks the absence of such method(s), in which case he doesn't deserve compatibility :-).

The deprecation is estimated to affect a lot of code - there is hardly any Python programmer who has never used the single-value formatting shorthand. It is completely optional and can be done as slowly as needed to make the transition painless.

There are probably some APIs out there with functions that accept a single value or a tuple and pass it to the formatting operation as-is. The deprecation will force them either to break the API in this respect (possibly implementing alternatives similar to this PEP), which is a cleaner idea anyway, or to explicitly implement compatible functionality, e.g.:

if isinstance(values, tuple):
    do_something(format % values)
else:
    do_something(format % (values ,))

When the mapping-based formats should also be supported, some more code is needed...
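One possible shape for that extra code is sketched below. The helper name format_compat is hypothetical, and treating every Mapping instance as the argument for a "%(...)..." format is an assumption of this sketch.

```python
from collections.abc import Mapping


def format_compat(format_string, values):
    """Accept a tuple, a mapping, or a single value, as % does today."""
    if isinstance(values, (tuple, Mapping)):
        # Tuples and mappings are passed through unchanged.
        return format_string % values
    # Anything else is a single value: wrap it explicitly.
    return format_string % (values,)


print(format_compat('-> %s <-', 1))             # -> 1 <-
print(format_compat('-> %s %s <-', (1, 2)))     # -> 1 2 <-
print(format_compat('-> %(a)s <-', {'a': 1}))   # -> 1 <-
```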

References

[1] String formatting operation
    http://www.python.org/doc/current/lib/typesseq-strings.html
