This page is a collection of the most unfortunate but occasionally subtle issues I've seen in code written by new Python developers. It's written to help new developers get past the phase of writing ugly Python, and the simplifications employed (for example, ignoring generators when talking about iteration) reflect its intended audience.
If you have comments or wish to use this work in way other than what the license allows, feel free to get in touch with me by e-mail.
There are always reasons to use some of these anti-patterns, which I've tried to give those where possible, but in general using these anti-patterns makes for less readable, more buggy, and less Pythonic code. If you're looking for broader introductory materials for Python, I highly recommend The Python Tutorial or Dive into Python.
rangeRecite it in your sleep: range is not for simple, obvious iterations over sequences.
For those used to numerically defined for loops, range feels like home,
but using it for iteration over sequences is bug-prone and less clear than using the standard
for construct directly on an iterable.
range is a handy function in Python, but most misuses of it are prone to unfortunate
off-by-one bugs. This is commonly caused by forgetting that range is inclusive in its
first argument and exclusive in its second, just like
substring in Java and many, many, other functions of
this type. Those who think too hard about not overrunning the end of
their sequence are going to create bugs:
# An incorrect way to iterate over a whole sequence
alist = ['her', 'name', 'is', 'rio']
for i in range(0, len(alist)-1): # Off by one!
print i, alist[i]
The common excuses for using range() inappropriately are:
for (index, value) in enumerate(alist):
print index, value
for (word, number) in zip(words, numbers):
print word, number
for word in words[1:]: # Exclude the first word
print word
An exception to this is when you're iterating over a sequence so big that the overhead introduced
by slicing the would be very expensive. If your sequence is 10 items, this is unlikely to matter,
but if it is 10 million items or this is done in a performance-sensitive inner loop, this is
going to be very important.
range when iterating over a sequence is when
you have a complex loop in which under some conditions the loop may not move
forward one item in the list each iteration.
Another important use case of range outside of iterating over a sequence
is when you genuinely need a list of numbers not to be used for indexing:
# Print foo(x) for 0<=x<5
for x in range(5):
print foo(x)
# An ugly, slow way to build a list
words = ['her', 'name', 'is', 'rio']
alist = []
for word in words:
alist.append(foo(word))
Instead, write a list comprehension:
words = ['her', 'name', 'is', 'rio'] alist = [foo(word) for word in words]
Why do this? For one, you avoid any bugs related to correctly initializing alist. Also, the code just looks a lot cleaner and what you're doing is clearer. For those from a functional programming background, map may feel more familiar, but I find it less Pythonic.
Some common excuses for not using a list comprehension:
words = ['her', 'name', 'is', 'rio']
letters = []
for word in words:
for letter in word:
letters.append(letter)
Write:
words = ['her', 'name', 'is', 'rio']
letters = [letter for word in words
for letter in word]
Note that in a list comprehension with multiple loops, the loops have the same order as
if you weren't making a list comprehension at all.
words = ['her', 'name', 'is', 'rio', '1', '2', '3']
alpha_words = [word for word in words if isalpha(word)]
A valid reason for not using a list comprehension is that you can't do exception handling inside one. So if some items in the iteration will cause exceptions to be raised, you will need to either offload the exception handling to a function called by the list comprehension or not use a list comprehension at all.
Syntactically, checking if something is contained in a list or a set/dictionary look alike, but under the hood things are different. If you need to repeatedly check whether something is contained in a data structure, use a set instead of a list. (You can use a dict if you need to associate a value with it and also get constant time mebership tests.)
# Avoid this
lyrics_list = ['her', 'name', 'is', 'rio']
words = make_wordlist() # Pretend this returns many words that we want to test
for word in words:
if word in lyrics_list: # Linear time
print word, "is in the lyrics"
# Do this
lyrics_list = ['her', 'name', 'is', 'rio']
lyrics_set = set(lyrics_list) # Linear time set construction
words = make_wordlist() # Pretend this returns many words that we want to test
for word in words:
if word in lyrics_list: # Constant time
print word, "is in the lyrics"
Keep in mind that creation of the set will take linear time even though membership testing takes constant time. So if you are checking for membership in a loop, it's almost always worth it to take the time to build a set since you only have to build the set once.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.