Idiomatic Python — Intermediate and Advanced Software Carpentry 1.0 documentation (2024)

Extracts from The Zen of Python by Tim Peters:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Readability counts.

(The whole Zen is worth reading...)

The first step in programming is getting stuff to work at all.

The next step in programming is getting stuff to work regularly.

The step after that is reusing code and designing for reuse.

Somewhere in there you will start writing idiomatic Python.

Idiomatic Python is what you write when the only thing you’re struggling with is the right way to solve your problem, and you’re not struggling with the programming language or some weird library error or a nasty data retrieval issue or something else extraneous to your real problem. The idioms you prefer may differ from the idioms I prefer, but with Python there will be a fair amount of overlap, because there is usually at most one obvious way to do every task. (A caveat: “obvious” is unfortunately in the eye of the beholder, to some extent.)

For example, let’s consider the right way to keep track of the item number while iterating over a list. So, given a list z,

>>> z = [ 'a', 'b', 'c', 'd' ]

let’s try printing out each item along with its index.

You could use a while loop:

>>> i = 0
>>> while i < len(z):
...     print i, z[i]
...     i += 1
0 a
1 b
2 c
3 d

or a for loop:

>>> for i in range(0, len(z)):
...     print i, z[i]
0 a
1 b
2 c
3 d

but I think the clearest option is to use enumerate:

>>> for i, item in enumerate(z):
...     print i, item
0 a
1 b
2 c
3 d

Why is this the clearest option? Well, look at the Zen of Python extract above: it’s explicit (we used enumerate); it’s simple; it’s readable; and I would even argue that it’s prettier than the while loop, if not exactly “beautiful”.
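If you are on Python 3, where print is a function, the same idiom carries over almost unchanged. Here is a minimal sketch under that assumption; the start argument and the name labelled are additions for illustration, not part of the example above.

```python
z = ['a', 'b', 'c', 'd']

# enumerate yields (index, item) pairs, counting from 0 by default.
for i, item in enumerate(z):
    print(i, item)

# The optional start argument shifts the numbering, e.g. to count from 1.
labelled = list(enumerate(z, start=1))
print(labelled)
```

The loop body never touches len(z) or z[i], which is a large part of why it reads so cleanly.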

Python provides this kind of simplicity in as many places as possible, too. Consider file handles; did you know that they were iterable?

>>> for line in file('data/listfile.txt'):
...     print line.rstrip()
a
b
c
d

Where Python really shines is that this kind of simple idiom – in this case, iterables – is very very easy not only to use but to construct in your own code. This will make your own code much more reusable, while improving code readability dramatically. And that’s the sort of benefit you will get from writing idiomatic Python.
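The file-iteration idiom works the same way on any file-like object. The sketch below assumes Python 3 (where the file builtin is gone and open is used instead) and uses io.StringIO as a stand-in for data/listfile.txt so that it runs without any file on disk; the sample contents are made up to match the output above.

```python
import io

# io.StringIO stands in for a real file so the sketch is self-contained;
# a handle from open('data/listfile.txt') iterates line by line the same way.
handle = io.StringIO("a\nb\nc\nd\n")

lines = [line.rstrip() for line in handle]
print(lines)
```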

Some basic data types

I’m sure you’re all familiar with tuples, lists, and dictionaries, right? Let’s do a quick tour nonetheless.

‘tuples’ are all over the place. For example, this code for swapping two numbers implicitly uses tuples:

>>> a = 5
>>> b = 6
>>> a, b = b, a
>>> print a == 6, b == 5
True True

That’s about all I have to say about tuples.

I use lists and dictionaries all the time. They’re the two greatest inventions of mankind, at least as far as Python goes. With lists, it’s just easy to keep track of stuff:

>>> x = []
>>> x.append(5)
>>> x.extend([6, 7, 8])
>>> x
[5, 6, 7, 8]
>>> x.reverse()
>>> x
[8, 7, 6, 5]

It’s also easy to sort. Consider this set of data:

>>> y = [ ('IBM', 5), ('Zil', 3), ('DEC', 18) ]

The sort method compares the tuples with cmp, which orders them by the first element of each tuple (later elements only break ties):

>>> y.sort()
>>> y
[('DEC', 18), ('IBM', 5), ('Zil', 3)]

Often it’s handy to sort tuples on a different tuple element, and there are several ways to do that. I prefer to provide my own sort function:

>>> def sort_on_second(a, b):
...     return cmp(a[1], b[1])
>>> y.sort(sort_on_second)
>>> y
[('Zil', 3), ('IBM', 5), ('DEC', 18)]

Note that here I’m using the builtin cmp function (which is what sort uses by default: y.sort() is equivalent to y.sort(cmp)) to do the comparison of the second part of the tuple.

This kind of function is really handy for sorting dictionaries by value, as I’ll show you below.

(For a more in-depth discussion of sorting options, check out the Sorting HowTo.)
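One caveat for readers on Python 3: cmp and the comparison-function argument to sort were removed there, so the usual route is the key argument instead. A minimal sketch of the same sort-on-second-element idea, assuming Python 3; the names by_second and pair are illustrative only.

```python
from operator import itemgetter

y = [('IBM', 5), ('Zil', 3), ('DEC', 18)]

# key= names the value to compare; itemgetter(1) picks the second element.
by_second = sorted(y, key=itemgetter(1))
print(by_second)

# list.sort takes the same keyword and sorts in place; reverse flips the order.
y.sort(key=lambda pair: pair[1], reverse=True)
print(y)
```

A key function is called once per item rather than once per comparison, so it is typically also the cheaper option.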

On to dictionaries!

Your basic dictionary is just a hash table that takes keys and returns values:

>>> d = {}
>>> d['a'] = 5
>>> d['b'] = 4
>>> d['c'] = 18
>>> d
{'a': 5, 'c': 18, 'b': 4}
>>> d['a']
5

You can also initialize a dictionary using the dict type to create a dict object:

>>> e = dict(a=5, b=4, c=18)
>>> e
{'a': 5, 'c': 18, 'b': 4}

Dictionaries have a few really neat features that I use pretty frequently. For example, let’s collect (key, value) pairs where we potentially have multiple values for each key. That is, given a file containing this data,

a 5
b 6
d 7
a 2
c 1

suppose we want to keep all the values? If we just did it the simple way,

>>> d = {}
>>> for line in file('data/keyvalue.txt'):
...     key, value = line.split()
...     d[key] = int(value)

we would lose all but the last value for each key:

>>> d
{'a': 2, 'c': 1, 'b': 6, 'd': 7}

You can collect all the values by using get:

>>> d = {}
>>> for line in file('data/keyvalue.txt'):
...     key, value = line.split()
...     l = d.get(key, [])
...     l.append(int(value))
...     d[key] = l
>>> d
{'a': [5, 2], 'c': [1], 'b': [6], 'd': [7]}

The key point here is that d.get(k, default) is equivalent to d[k] if d[k] already exists; otherwise, it returns default. So, the first time each key is used, l is set to an empty list; the value is appended to this list, and then the list is stored back under that key.
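The standard library also offers collections.defaultdict, which folds the "start with an empty list" step into the dictionary itself. This is a sketch of an alternative, not the approach used above; it assumes Python 3 and inlines the sample records (the name records is hypothetical) rather than reading data/keyvalue.txt.

```python
from collections import defaultdict

# The same five records as the sample file, inlined to stay self-contained.
records = ["a 5", "b 6", "d 7", "a 2", "c 1"]

d = defaultdict(list)            # a missing key starts out as an empty list
for line in records:
    key, value = line.split()
    d[key].append(int(value))

print(dict(d))
```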

(There are tons of little tricks like the ones above, but these are the ones I use the most; see the Python Cookbook for an endless supply!)

Now let’s try combining some of the sorting stuff above with dictionaries. This time, our contrived problem is that we’d like to sort the keys in the dictionary d that we just loaded, but rather than sorting by key we want to sort by the sum of the values for each key.

First, let’s define a sort function:

>>> def sort_by_sum_value(a, b):
...     sum_a = sum(a[1])
...     sum_b = sum(b[1])
...     return cmp(sum_a, sum_b)

Now apply it to the dictionary items:

>>> items = d.items()
>>> items
[('a', [5, 2]), ('c', [1]), ('b', [6]), ('d', [7])]
>>> items.sort(sort_by_sum_value)
>>> items
[('c', [1]), ('b', [6]), ('a', [5, 2]), ('d', [7])]

and voila, you have your list of keys sorted by summed values!
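For comparison, here is how the same sort might look with a key function, which is the only option on Python 3 (assumed here). The dictionary literal repeats the data loaded above, and keys_by_sum is a name introduced just for this sketch.

```python
d = {'a': [5, 2], 'c': [1], 'b': [6], 'd': [7]}

# Sort the (key, values) pairs by the sum of each value list.
items = sorted(d.items(), key=lambda kv: sum(kv[1]))
print(items)

# If only the keys are wanted, pull them out of the sorted pairs.
keys_by_sum = [key for key, values in items]
print(keys_by_sum)
```

Because Python's sort is stable, 'a' and 'd' (both summing to 7) keep their original relative order.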

As I said, there are tons and tons of cute little tricks that you can do with dictionaries. I think they’re incredibly powerful.

List comprehensions

List comprehensions are neat little constructs that will shorten your lines of code considerably. Here’s an example that constructs a list of the squares of the integers from 0 to 4:

>>> z = [ i**2 for i in range(0, 5) ]
>>> z
[0, 1, 4, 9, 16]

You can also add in conditionals, like requiring only even numbers:

>>> z = [ i**2 for i in range(0, 10) if i % 2 == 0 ]
>>> z
[0, 4, 16, 36, 64]

The general form is

[ expression for var in list if conditional ]

so pretty much anything you want can go in expression and conditional.
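To make the general form concrete, the sketch below writes the even-squares comprehension next to the explicit loop it stands for. The name squares_loop is made up for the comparison.

```python
# The comprehension: expression = i**2, conditional = i % 2 == 0.
squares = [i**2 for i in range(10) if i % 2 == 0]

# The equivalent explicit loop.
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i**2)

print(squares == squares_loop)
```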

I find list comprehensions to be very useful for both file parsing and for simple math. Consider a file containing data and comments:

# this is a comment or a header
1
# another comment
2

where you want to read in the numbers only:

>>> data = [ int(x) for x in open('data/commented-data.txt') if x[0] != '#' ]
>>> data
[1, 2]

This is short, simple, and very explicit!
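One thing the one-liner above does not handle is a blank line, since int() raises ValueError on a string that is only whitespace. A hedged variant, assuming Python 3 and using io.StringIO in place of the data file (the blank line in the sample is my addition):

```python
import io

# StringIO stands in for open('data/commented-data.txt'); note the blank line.
source = io.StringIO("# this is a comment or a header\n1\n\n# another comment\n2\n")

# Strip each line first, then skip empty lines and comment lines.
data = [int(s) for s in map(str.strip, source) if s and not s.startswith('#')]
print(data)
```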

For simple math, suppose you need to calculate the average and stddev of some numbers. Just use a list comprehension:

>>> import math
>>> data = [ 1, 2, 3, 4, 5 ]
>>> average = sum(data) / float(len(data))
>>> stddev = sum([ (x - average)**2 for x in data ]) / float(len(data))
>>> stddev = math.sqrt(stddev)
>>> print average, '+/-', stddev
3.0 +/- 1.41421356237
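On Python 3.4 or later (an assumption; the text here predates it), the statistics module provides these calculations directly. The sketch uses pstdev, the population standard deviation, because the hand-written formula above divides by N rather than N - 1.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

average = statistics.mean(data)
# pstdev divides by N, matching the formula written out by hand above.
stddev = statistics.pstdev(data)

print(average, '+/-', stddev)
```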

Oh, and one rule of thumb: if your list comprehension is longer than one line, change it to a for loop; it will be easier to read, and easier to understand.

Building your own types

Most people should be pretty familiar with basic classes.

>>> class A:
...     def __init__(self, item):
...         self.item = item
...     def hello(self):
...         print 'hello,', self.item
>>> x = A('world')
>>> x.hello()
hello, world

There are a bunch of neat things you can do with classes, but one of the neatest is building new types that can be used with standard Python list/dictionary idioms.

For example, let’s consider a basic binning class.

>>> class Binner:
...     def __init__(self, binwidth, binmax):
...         self.binwidth, self.binmax = binwidth, binmax
...         nbins = int(binmax / float(binwidth) + 1)
...         self.bins = [0] * nbins
...
...     def add(self, value):
...         bin = value / self.binwidth
...         self.bins[bin] += 1

This behaves as you’d expect:

>>> binner = Binner(5, 20)
>>> for i in range(0,20):
...     binner.add(i)
>>> binner.bins
[5, 5, 5, 5, 0]

...but wouldn’t it be nice to be able to write this?

for i in range(0, len(binner)):
    print i, binner[i]

or even this?

for i, bin in enumerate(binner):
    print i, bin

This is actually quite easy, if you make the Binner class look like a list by adding two special functions:

>>> class Binner:
...     def __init__(self, binwidth, binmax):
...         self.binwidth, self.binmax = binwidth, binmax
...         nbins = int(binmax / float(binwidth) + 1)
...         self.bins = [0] * nbins
...
...     def add(self, value):
...         bin = value / self.binwidth
...         self.bins[bin] += 1
...
...     def __getitem__(self, index):
...         return self.bins[index]
...
...     def __len__(self):
...         return len(self.bins)
>>> binner = Binner(5, 20)
>>> for i in range(0,20):
...     binner.add(i)

and now we can treat Binner objects as normal lists:

>>> for i in range(0, len(binner)):
...     print i, binner[i]
0 5
1 5
2 5
3 5
4 0
>>> for n in binner:
...     print n
5
5
5
5
0

In the case of len(binner), Python knows to use the special method __len__, and likewise binner[i] just calls __getitem__(i).

The second case involves a bit more implicit magic. Here, Python figures out that Binner can act like a list and simply calls the right functions to retrieve the information.
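The "magic" is the older sequence-iteration fallback: when an object has no __iter__ method but does define __getitem__, a for loop calls __getitem__ with 0, 1, 2, and so on until it raises IndexError. Here is a sketch of the same class under Python 3 (an assumption), where the bin index needs floor division (//) to stay an integer.

```python
class Binner:
    def __init__(self, binwidth, binmax):
        self.binwidth, self.binmax = binwidth, binmax
        nbins = int(binmax / float(binwidth) + 1)
        self.bins = [0] * nbins

    def add(self, value):
        # Floor division keeps the index an int on Python 3.
        self.bins[value // self.binwidth] += 1

    def __getitem__(self, index):
        # The IndexError raised by the underlying list ends a for loop.
        return self.bins[index]

    def __len__(self):
        return len(self.bins)


binner = Binner(5, 20)
for i in range(20):
    binner.add(i)

print(list(binner))
print(list(enumerate(binner)))
```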

Note that making your own read-only dictionaries is pretty simple, too: just provide the __getitem__ function, which is called for non-integer values as well:

>>> class SillyDict:
...     def __getitem__(self, key):
...         print 'key is', key
...         return key
>>> sd = SillyDict()
>>> x = sd['hello, world']
key is hello, world
>>> x
'hello, world'

You can also write your own mutable types, e.g.

>>> class SillyDict:
...     def __setitem__(self, key, value):
...         print 'setting', key, 'to', value
>>> sd = SillyDict()
>>> sd[5] = 'world'
setting 5 to world

but I have found this to be less useful in my own code, where I’m usually writing special objects like the Binner type above: I prefer to specify my own methods for putting information into the object type, because it reminds me that it is not a generic Python list or dictionary. However, the use of __getitem__ (and some of the iterator and generator features I discuss below) can make code much more readable, and so I use them whenever I think the meaning will be unambiguous. For example, with the Binner type, the purpose of __getitem__ and __len__ is not very ambiguous, while the purpose of a __setitem__ function (to support binner[x] = y) would be unclear.

Overall, the creation of your own custom list and dict types is one way to make reusable code that will fit nicely into Python’s natural idioms. In turn, this can make your code look much simpler and feel much cleaner. The risk, of course, is that you will also make your code harder to understand and (if you’re not careful) harder to debug. Mediating between these options is mostly a matter of experience.

Iterators

Iterators are another built-in Python feature; unlike the list and dict types we discussed above, an iterator isn’t really a type, but a protocol. This just means that Python agrees to respect anything that supports a particular set of methods as if it were an iterator. (These protocols appear everywhere in Python; we were taking advantage of the mapping and sequence protocols above, when we defined __getitem__ and __len__, respectively.)

The iterator protocol is a more general version of the sequence protocol; here’s an example:

>>> class SillyIter:
...     i = 0
...     n = 5
...     def __iter__(self):
...         return self
...     def next(self):
...         self.i += 1
...         if self.i > self.n:
...             raise StopIteration
...         return self.i
>>> si = SillyIter()
>>> for i in si:
...     print i
1
2
3
4
5

Here, __iter__ just returns self, an object that has the function next(), which (when called) either returns a value or raises a StopIteration exception.
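In Python 3 the protocol is the same except that the method is spelled __next__ rather than next. A minimal sketch under that assumption; moving i and n into __init__ and giving n a default are my changes, made so that each instance carries its own state.

```python
class SillyIter:
    def __init__(self, n=5):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):              # spelled next() in Python 2
        self.i += 1
        if self.i > self.n:
            raise StopIteration
        return self.i


values = list(SillyIter())
print(values)
```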

We’ve actually already met several iterators in disguise; in particular, enumerate is an iterator. To drive home the point, here’s a simple reimplementation of enumerate:

>>> class my_enumerate:
...     def __init__(self, some_iter):
...         self.some_iter = iter(some_iter)
...         self.count = -1
...
...     def __iter__(self):
...         return self
...
...     def next(self):
...         val = self.some_iter.next()
...         self.count += 1
...         return self.count, val
>>> for n, val in my_enumerate(['a', 'b', 'c']):
...     print n, val
0 a
1 b
2 c

You can also iterate through an iterator the “old-fashioned” way:

>>> some_iter = iter(['a', 'b', 'c'])
>>> while 1:
...     try:
...         print some_iter.next()
...     except StopIteration:
...         break
a
b
c

but that would be silly in most situations! I use this if I just want to get the first value or two from an iterator.
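For the "first value or two" case, the builtin next() function and itertools.islice save writing the try/except by hand. A short sketch, assuming Python 3 (next() as a builtin is also available from Python 2.6); the variable names are illustrative.

```python
from itertools import islice

some_iter = iter(['a', 'b', 'c'])

first = next(some_iter)                  # take one item
next_two = list(islice(some_iter, 2))    # take up to two more
leftover = next(some_iter, None)         # exhausted, so the default comes back
print(first, next_two, leftover)
```

Passing a default as the second argument to next() is what avoids the StopIteration exception once the iterator runs dry.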

With iterators, one thing to watch out for is the return of self from the __iter__ function. You can all too easily write an iterator that isn’t as re-usable as you think it is. For example, suppose you had the following class:

>>> class MyTrickyIter:
...     def __init__(self, thelist):
...         self.thelist = thelist
...         self.index = -1
...
...     def __iter__(self):
...         return self
...
...     def next(self):
...         self.index += 1
...         if self.index < len(self.thelist):
...             return self.thelist[self.index]
...         raise StopIteration

This works just like you’d expect as long as you create a new object each time:

>>> for i in MyTrickyIter(['a', 'b']):
...     for j in MyTrickyIter(['a', 'b']):
...         print i, j
a a
a b
b a
b b

but it will break if you create the object just once:

>>> mi = MyTrickyIter(['a', 'b'])
>>> for i in mi:
...     for j in mi:
...         print i, j
a b

because both loops share the same object, so self.index is advanced by the inner loop as well as the outer one.
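One common way around this is to keep the container and the iterator separate, so that __iter__ hands out a fresh iterator on every call instead of returning self. A sketch of that idea, assuming Python 3; the class name MyReusableIterable is hypothetical.

```python
class MyReusableIterable:
    def __init__(self, thelist):
        self.thelist = thelist

    def __iter__(self):
        # A new, independent iterator each time a loop starts.
        return iter(self.thelist)


mi = MyReusableIterable(['a', 'b'])
pairs = [(i, j) for i in mi for j in mi]
print(pairs)
```

Each for loop now tracks its own position, so nesting two loops over the same object gives all four pairs.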

Generators

Generators are a Python implementation of coroutines. Essentially, they’re functions that let you suspend execution and return a result:

>>> def g():
...     for i in range(0, 5):
...         yield i**2
>>> for i in g():
...     print i
0
1
4
9
16

You could do this with a list just as easily, of course:

>>> def h():
...     return [ x ** 2 for x in range(0, 5) ]
>>> for i in h():
...     print i
0
1
4
9
16

But you can do things with generators that you couldn’t do with finite lists. Consider two full implementations of Eratosthenes’ Sieve for finding prime numbers, below.

First, let’s define some boilerplate code that can be used by either implementation:

>>> def divides(primes, n):
...     for trial in primes:
...         if n % trial == 0: return True
...     return False

Now, let’s write a simple sieve with a generator:

>>> def prime_sieve():
...     p, current = [], 1
...     while 1:
...         current += 1
...         if not divides(p, current): # if any previous primes divide, cancel
...             p.append(current)       # this is prime! save & return
...             yield current

This implementation will find (within the limitations of Python’s math functions) all prime numbers; the programmer has to stop it herself:

>>> for i in prime_sieve():
...     print i
...     if i > 10:
...         break
2
3
5
7
11
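Rather than breaking out of the loop by hand, an infinite generator can also be trimmed with itertools. The sketch below restates the generator for Python 3 (an assumption) so that it runs on its own; the use of any() in divides and the names first_five and below_30 are my additions.

```python
from itertools import islice, takewhile


def divides(primes, n):
    # True if any previously found prime divides n.
    return any(n % trial == 0 for trial in primes)


def prime_sieve():
    p, current = [], 1
    while True:
        current += 1
        if not divides(p, current):
            p.append(current)
            yield current


first_five = list(islice(prime_sieve(), 5))                    # by count
below_30 = list(takewhile(lambda n: n < 30, prime_sieve()))    # by value
print(first_five)
print(below_30)
```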

So, here we’re using a generator to implement the generation of an infinite series with a single function definition. To do the equivalent with an iterator would require a class, so that the object instance can hold the variables:

>>> class iterator_sieve:
...     def __init__(self):
...         self.p, self.current = [], 1
...     def __iter__(self):
...         return self
...     def next(self):
...         while 1:
...             self.current = self.current + 1
...             if not divides(self.p, self.current):
...                 self.p.append(self.current)
...                 return self.current
>>> for i in iterator_sieve():
...     print i
...     if i > 10:
...         break
2
3
5
7
11

It is also much easier to write routines like enumerate as a generator than as an iterator:

>>> def gen_enumerate(some_iter):
...     count = 0
...     for val in some_iter:
...         yield count, val
...         count += 1
>>> for n, val in gen_enumerate(['a', 'b', 'c']):
...     print n, val
0 a
1 b
2 c

Abstruse note: we don’t even have to catch StopIteration here, because the for loop simply ends when some_iter is done!

assert

One of the most underused keywords in Python is assert. Assert is pretty simple: it takes a boolean, and if the boolean evaluates to False, it fails (by raising an AssertionError exception). assert True is a no-op.

>>> assert True
>>> assert False
Traceback (most recent call last):
 ...
AssertionError

You can also put an optional message in:

>>> assert False, "you can't do that here!"
Traceback (most recent call last):
 ...
AssertionError: you can't do that here!

assert is very, very useful for making sure that code is behaving according to your expectations during development. Worried that you’re getting an empty list? assert len(x). Want to make sure that a particular return value is not None? assert retval is not None.
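As a small illustration of that style, here is a hypothetical helper (the function mean and its messages are invented for this sketch) that states its expectations with assert. It assumes Python 3 and that the code is run without optimization.

```python
def mean(values):
    # Development-time sanity checks, not input validation for end users.
    assert len(values), "mean() got an empty list"
    result = sum(values) / float(len(values))
    assert result is not None
    return result


print(mean([1, 2, 3]))

try:
    mean([])
except AssertionError as exc:
    print('caught:', exc)
```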

Also note that ‘assert’ statements are removed from optimized code, so only use them for conditions related to actual development, and make sure that the statement you’re evaluating has no side effects. For example,

>>> a = 1
>>> def check_something():
...     global a
...     a = 5
...     return True
>>> assert check_something()

will behave differently when run under optimization than when run without optimization, because the assert line will be removed completely from optimized code.
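The difference can be observed without restarting the interpreter by compiling the same source at two optimization levels. This is only a sketch, assuming Python 3.2 or later, where the builtin compile() accepts an optimize argument; the helper run and the filename string are hypothetical.

```python
source = """
a = 1
def check_something():
    global a
    a = 5
    return True
assert check_something()
"""


def run(optimize):
    namespace = {}
    # optimize=0 keeps assert statements; optimize=1 strips them, like python -O.
    code = compile(source, "<assert-demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["a"]


print(run(0), run(1))
```

With asserts kept, check_something() runs and a ends up as 5; with asserts stripped, the call never happens and a stays at 1.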

If you need to raise an exception in production code, see below. The quickest and dirtiest way is to just “raise Exception”, but that’s kind of non-specific ;).
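A slightly more specific option is to raise one of the builtin exception types with a message. The sketch below assumes Python 3, and the function set_binwidth is a made-up example rather than anything defined earlier.

```python
def set_binwidth(width):
    # An explicit check like this survives optimization, unlike an assert.
    if width <= 0:
        raise ValueError("binwidth must be positive, got %r" % (width,))
    return width


try:
    set_binwidth(0)
except ValueError as exc:
    message = str(exc)
    print(message)
```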

Conclusions

Use of common Python idioms – both in your Python code and for your new types – leads to short, sweet programs.
