Anti-Patterns in Python

This page is a collection of the most unfortunate but occasionally subtle issues I've seen in code written by new Python developers. It's written to help new developers get past the phase of writing ugly Python, and the simplifications employed (for example, ignoring generators when talking about iteration) reflect its intended audience.

If you have comments or wish to use this work in way other than what the license allows, feel free to get in touch with me by e-mail.

There are always reasons to use some of these anti-patterns, which I've tried to give those where possible, but in general using these anti-patterns makes for less readable, more buggy, and less Pythonic code. If you're looking for broader introductory materials for Python, I highly recommend The Python Tutorial or Dive into Python.

The Use of range

Recite it in your sleep: range is not for simple, obvious iterations over sequences. For those used to numerically defined for loops, range feels like home, but using it for iteration over sequences is bug-prone and less clear than using the standard for construct directly on an iterable.

range is a handy function in Python, but most misuses of it are prone to unfortunate off-by-one bugs. This is commonly caused by forgetting that range is inclusive in its first argument and exclusive in its second, just like substring in Java and many, many, other functions of this type. Those who think too hard about not overrunning the end of their sequence are going to create bugs:

# An incorrect way to iterate over a whole sequence
alist = ['her', 'name', 'is', 'rio']
for i in range(0, len(alist)-1): # Off by one!
    print i, alist[i]
The common excuses for using range() inappropriately are:
  1. Needing the index value in the loop. This isn't a valid excuse. Instead, you should write:
    for (index, value) in enumerate(alist): 
        print index, value
    
  2. Needing to iterate over two loops at once, getting a value at the same index from each. In this case, you want to use zip():
    for (word, number) in zip(words, numbers):
        print word, number
    
  3. Needing to iterate over only part of a sequence. In this case, just iterate over a slice of the sequence and include a comment to make it clear that this was intentional.
    for word in words[1:]: # Exclude the first word
        print word
    
    An exception to this is when you're iterating over a sequence so big that the overhead introduced by slicing the would be very expensive. If your sequence is 10 items, this is unlikely to matter, but if it is 10 million items or this is done in a performance-sensitive inner loop, this is going to be very important.
The best time to use range when iterating over a sequence is when you have a complex loop in which under some conditions the loop may not move forward one item in the list each iteration. Another important use case of range outside of iterating over a sequence is when you genuinely need a list of numbers not to be used for indexing:
# Print foo(x) for 0<=x<5
for x in range(5):
    print foo(x)

Using List Comprehensions Properly

If you have a loop that looks like this, you want to rewrite it as a list comprehension:
# An ugly, slow way to build a list
words = ['her', 'name', 'is', 'rio']
alist = []
for word in words:
    alist.append(foo(word))
Instead, write a list comprehension:
words = ['her', 'name', 'is', 'rio']
alist = [foo(word) for word in words]

Why do this? For one, you avoid any bugs related to correctly initializing alist. Also, the code just looks a lot cleaner and what you're doing is clearer. For those from a functional programming background, map may feel more familiar, but I find it less Pythonic.

Some common excuses for not using a list comprehension:
  1. You need to nest your loop. You can nest entire list comprehensions, or just put multiple loops inside a list comprehension. So, instead of writing:
    words = ['her', 'name', 'is', 'rio']
    letters = []
    for word in words:
        for letter in word:
            letters.append(letter)
        
    Write:
    words = ['her', 'name', 'is', 'rio']
    letters = [letter for word in words
                      for letter in word]
        
    Note that in a list comprehension with multiple loops, the loops have the same order as if you weren't making a list comprehension at all.
  2. You need a condition inside your loop. But you can do this in a list comprehension just as easily:
    words = ['her', 'name', 'is', 'rio', '1', '2', '3']
    alpha_words = [word for word in words if isalpha(word)] 
        

A valid reason for not using a list comprehension is that you can't do exception handling inside one. So if some items in the iteration will cause exceptions to be raised, you will need to either offload the exception handling to a function called by the list comprehension or not use a list comprehension at all.

Performance Pitfalls

Checking for contents in linear time

Syntactically, checking if something is contained in a list or a set/dictionary look alike, but under the hood things are different. If you need to repeatedly check whether something is contained in a data structure, use a set instead of a list. (You can use a dict if you need to associate a value with it and also get constant time mebership tests.)

# Avoid this
lyrics_list = ['her', 'name', 'is', 'rio']
words = make_wordlist() # Pretend this returns many words that we want to test
for word in words:
    if word in lyrics_list: # Linear time
        print word, "is in the lyrics"
# Do this
lyrics_list = ['her', 'name', 'is', 'rio']
lyrics_set = set(lyrics_list) # Linear time set construction
words = make_wordlist() # Pretend this returns many words that we want to test
for word in words:
    if word in lyrics_list: # Constant time
        print word, "is in the lyrics"

Keep in mind that creation of the set will take linear time even though membership testing takes constant time. So if you are checking for membership in a loop, it's almost always worth it to take the time to build a set since you only have to build the set once.


Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.