PEP: XXX
Title: Refusing to Guess in the String Formatting Operation
Version: $Revision: $
Last-Modified: $Date: $
Author: Beni Cherniavsky
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Mar-2003
Python-Version: 2.3
Post-History: 02-Mar-2003


Abstract
========

The string formatting operation - ``format % values`` - has a wart.
When the right operand is not a tuple, it is treated as if it were
wrapped in a singleton tuple.  It is tempting to rely on this, since
singleton tuples look ugly, but code that does so breaks easily (when
the single object happens to be a tuple).  This PEP proposes several
ways to fix this wart.

The problem has been discussed repeatedly, but the specific solutions
proposed in this PEP need more feedback.


Motivation
==========

    In the face of ambiguity, refuse the temptation to guess.

    -- The Zen of Python, by Tim Peters

The simple use ``format % (val1, val2, ...)`` with a tuple is
completely robust.  However, since frequently there is only one value
and singleton tuples are quite awkward to write and read, a shorthand
is allowed - when the right object is not a tuple, it is interpreted
as if it were wrapped in a singleton tuple.

This shorthand is bad because:

1. When you pass a single object without a singleton tuple around it,
   your code will break if the object happens to be a tuple:

       >>> def decorate(obj):
       ...     return '-> %s <-' % obj
       ...
       >>> decorate(1)
       '-> 1 <-'
       >>> decorate((1,))
       '-> 1 <-'         # instead of '-> (1,) <-'
       >>> decorate((1,2))
       Traceback (most recent call last):
         File "<stdin>", line 1, in ?
         File "<stdin>", line 2, in decorate
       TypeError: not all arguments converted during string formatting

   Your program can silently emit wrong results or crash.  This is the
   classic example of the impossibility of creating a single robust
   interface that accepts either an object or a sequence and does the
   right thing.  Any attempt to do so invites bugs.

   This has bitten many Python newbies and was discussed many times on
   comp.lang.python.

2. This temptation will exist forever under the current design.  The
   purpose of this PEP is to provide alternative ways to format a
   single object that are as convenient as formatting several objects,
   so that the temptation will disappear.

3. It is not possible to use sequences other than tuples (e.g. lists)
   for passing multiple values to the formatting operation, because
   the sequence is interpreted as a single object.  For example, it is
   useful to be able to substitute a transparent debugging proxy that
   logs all accesses for an object; if such a proxy for a tuple is
   used on the right side of a formatting operation, the transparency
   breaks.

   More generally, it is unpythonic to discriminate objects by actual
   type instead of by the interface they implement.

The formatting operator also has a third mode, in which the right
argument should be a mapping.  This mode is triggered
deterministically by the "%(...)..." syntax in the format rather than
by checking the type of the right argument, so it doesn't need fixing.


Single-value operator
=====================

Specification
-------------

Add a new string operator, ``/``, that will always accept a single
value:

    >>> '-> %s <-' / 1
    '-> 1 <-'
    >>> '-> %s <-' / (1,)
    '-> (1,) <-'

To be always available, this would need to be duplicated as both
``str.__div__`` and ``str.__truediv__``.

Rationale
---------

This allows unambiguous expression of formatting with a single value,
with minimal hassle when you change the code between one and several
values.  It's probably the least-intrusive fix for the problem.

The ``/`` operator was chosen for this because it is the "closest
relative" of ``%``.  Mnemonic: ``/`` looks like a subset of ``%`` but
with just one stroke :-).

Downsides: it's an arbitrary punctuation; the mnemonic isn't better
than perl's motivations for ``$]`` and friends...  The div/truediv
duplication is a hack that highlights the fact that this has nothing
to do with division.
Some people might start to wonder what ``//`` (floordiv) does with
strings...  (Should it do mapping formatting?  I don't see the need -
leave it to ``%``.)

Note: PEP 292 mentions that ``/`` was proposed by somebody for string
interpolation against a dictionary.  This reinforces the concern that
it is arbitrary - different people have different ideas for its use.
OTOH, this shows that it's the obvious "closest relative" of ``%``...


Deprecation of format % single-value
====================================

Once an unambiguous method to format a single value exists, there is
little need (except for backward compatibility) for ``%`` to accept
non-tuple right arguments.  Thus it makes sense to deprecate this,
eventually removing it completely.

This will break a lot of existing code, not all of which is buggy, so
it's better to proceed with this slowly.

* As soon as a better alternative is implemented (2.3?), formatting
  operations with a non-tuple right argument should raise
  ``PendingDeprecationWarning`` (except for the "%(...)..." mode that
  expects mappings, of course).  This is harmless and can be continued
  until everybody is educated and no newly written code uses it.

* At some point (2.4? 2.x? 3.0?), the warning becomes
  ``DeprecationWarning``.  At this point all the late adopters go over
  their code, perhaps fixing some instances of the
  single-value-happens-to-be-tuple bug in the process :-).

* At the next stage, the guessing is completely dropped - you get a
  ``TypeError`` for non-tuples.

* Once this has forced all uses of non-tuple values to disappear, it
  becomes possible to relax the requirement and accept sequences other
  than tuples.  Since formatting accesses the values only
  sequentially, any iterable type can be accepted.  This is a minor
  point in any case.


Alphabetic Method
=================

*This is not proposed as a sufficient solution but rather as an
additional thing that can optionally be implemented.
Actually the PEP author tends to think this is not needed but it is
listed for completeness - perhaps others will like it more.*

Specification
-------------

Add a new string method with an alphabetic (non-magic) name that
accepts a variable number of arguments (besides self), which are the
values for the formatting operation:

    >>> '-> %s <-'.fmt(1)
    '-> 1 <-'
    >>> '-> %s %s <-'.fmt(1,2)
    '-> 1 2 <-'

When the format selects values by names ("%(...)..."), there are
several possible interfaces for passing the mapping (options 2 or 3
are preferred):

1. Keyword arguments:

       >>> '-> %(a)s <-'.fmt(a=1)
       '-> 1 <-'
       >>> '-> %(!@#$)s <-'.fmt(**{'!@#$': 1})
       '-> 1 <-'

2. As a single mapping argument:

       >>> '-> %(a)s <-'.fmt({'a': 1})
       '-> 1 <-'

3. Support both ways.  Since this becomes very similar to the
   behaviour of the ``dict()`` constructor (since python 2.3), perhaps
   it makes sense to complete the similarity and also support a
   sequence of items as an argument:

       >>> # The above two ways work; in addition:
       >>> '-> %(a)s <-'.fmt([('a', 1)])
       '-> 1 <-'

   All this could be easily implemented by just passing all arguments
   of the formatting method to ``dict()``.  Note however, that when
   the argument is already a mapping (has a ``keys`` attribute),
   ``dict()`` should not be called on it, because that would copy the
   whole dictionary (and frequently big mappings are used when the
   format only accesses several keys).

Suggested names for the new method: ``fmt``, ``format``, ``sprintf``.

Rationale
---------

* This notation works equally well for one and multiple values thanks
  to the call syntax of Python (at the price of some verbosity).
  This benefit will only be realised if either of the following
  happens:

  - Everybody completely switches to it and the % notation is
    deprecated - this doesn't look like a good idea.

  - The % notation is fixed by other means so that both notations are
    applicable in any situation (without dependence on the number of
    values), subject to the programmer's taste.

  That's why this solution is not sufficient in any case.

* ``format.fmt`` looks more readable than ``format.__mod__`` when you
  need the bound method.

* The duplication of operator functionality as an alphabetic method
  would not be without precedent.  Some examples:

  - ``list.extend`` vs. ``list.__iadd__``
  - ``dict.iterkeys`` vs. ``dict.__iter__``
  - ``dict.has_key`` vs. ``dict.__contains__``

  Note that in this case, the signature is not identical.

* The use of keyword arguments for mapping-based formatting works in
  simple cases because the limitations on the keys are basically the
  same - they must be strings, but can be arbitrary ones.  However:

  - Unicode formatting operations allow unicode strings as keys for
    selecting values, whereas keyword arguments are restricted to
    plain strings.  Does anybody use unicode strings as formatting
    keys at all?

  - When you already have a mapping, you need to pass it using the
    ``**mapping`` syntax.  This doesn't work for mappings that are not
    real dicts, and it seems to go over all keys validating that they
    are strings, which unnecessarily slows things down.

  Therefore supporting only the keyword arguments interface would be
  bad.

* One alternative to an alphabetic method name is ``__call__``:

      >>> '-> %s <-'(1)
      '-> 1 <-'
      >>> '-> %s %s <-'(1, 2)
      '-> 1 2 <-'

  This has the same benefits of unambiguity inherited from the call
  syntax and is very concise.  This is also its undoing: it's too
  cryptic.  Using strings in function call position will deeply
  puzzle anybody unfamiliar with it.  The idea was probably discussed
  and dismissed when ``%`` was originally introduced.

  Quoting PEP 234, on the rejection of ``__call__()`` as an
  alternative name for iterators' ``.next()`` method:

      ... there's a danger that every special-purpose object wants to
      use __call__() for its most common operation, causing more
      confusion than clarity.
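
To make the Alphabetic Method specification more concrete, here is a
minimal sketch of option 3 written as a free function, since a method
cannot be added to the built-in string type from Python code.  It is
not part of the proposal: the names ``fmt`` (as a function),
``uses_names`` and ``_SPEC_START`` are hypothetical, the detection of
"%(...)..." formats is a simplification, and the code targets a modern
Python 3 interpreter rather than 2.3.

```python
import re

# Matches either a literal "%%" or the start of a named specifier "%(".
# Group 1 is set only in the second case.  (Simplified illustration.)
_SPEC_START = re.compile(r"%(?:%|(\())")


def uses_names(template):
    """Return True if the template selects values by name ("%(...)...")."""
    return any(m.group(1) for m in _SPEC_START.finditer(template))


def fmt(template, /, *args, **kwargs):
    """Hypothetical stand-in for the proposed alphabetic method.

    Positional formats take their values as separate arguments; named
    formats take a mapping, keyword arguments, or a sequence of items.
    The "/" makes ``template`` positional-only (Python 3.8+), so any
    keyword name may be used as a formatting key.
    """
    if uses_names(template):
        if len(args) == 1 and not kwargs and hasattr(args[0], "keys"):
            # Already a mapping: use it directly instead of copying it.
            mapping = args[0]
        else:
            # Same argument forms as dict(): keywords and/or items.
            mapping = dict(*args, **kwargs)
        return template % mapping
    if kwargs:
        raise TypeError("keyword arguments require a %(name)s format")
    # args is always a tuple, so a tuple value is never unpacked by mistake.
    return template % args


print(fmt("-> %s <-", 1))             # -> 1 <-
print(fmt("-> %s <-", (1,)))          # -> (1,) <-
print(fmt("-> %s %s <-", 1, 2))       # -> 1 2 <-
print(fmt("-> %(a)s <-", a=1))        # -> 1 <-
print(fmt("-> %(a)s <-", {"a": 1}))   # -> 1 <-
print(fmt("-> %(a)s <-", [("a", 1)])) # -> 1 <-
```

Because the values always arrive as the ``args`` tuple, a single tuple
value is formatted as one object, which is exactly the case where the
``%`` shorthand guesses wrong.  A real implementation would decide
between the two modes inside the formatting code itself rather than by
scanning the format with a regular expression.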

Backwards Compatibility
=======================

The proposed addition(s) present no compatibility problems (unless
somebody intentionally checks for the absence of such method(s), in
which case they don't deserve compatibility :-).

The deprecation is estimated to affect a lot of code - there is hardly
any Python programmer who has never used the single-value formatting
shorthand.  It is completely optional and can be done as slowly as
needed to make the transition painless.

There are probably some APIs out there with functions that accept
either a single value or a tuple and pass it to the formatting
operation as-is.  The deprecation will force them either to break the
API in this respect (possibly implementing alternatives similar to
those in this PEP), which is the cleaner idea anyway, or to explicitly
implement compatible functionality, e.g.::

    if isinstance(values, tuple):
        do_something(format % values)
    else:
        do_something(format % (values,))

When the mapping-based formats should also be supported, some more
code is needed...


References
==========

1. `String formatting operation`__

__ http://www.python.org/doc/current/lib/typesseq-strings.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: