This is a story about how a holy house was cut in twain, and about what happens when the idea that beautiful is better than ugly goes too far. Today we are talking about the transition from Python 2 to Python 3 and the war that ensued.

Python History 201

In April of 2006, a decision was made to plan a major release of the Python programming language. At the time, Python 2.5 was the latest stable release available. It was decided that after the next release, 2.6, the language would go through a major transition in which core functionality would change and would not be backwards compatible with prior versions. This new major release would be called Python 3.

The Proposed Python 2.6 and 3.0 Release Schedule

Guido van Rossum, the author and authority of Python, laid out a timeline over which the Python 2 and prior codebase would be maintained. The expectation was that by 2013 the community would have embraced the transition and the prior versions would no longer be supported. As we now know, this timeline would stretch much further, proving just how accurate we programmers are at estimating time.

The changes for Python 3 were so great that the Python Enhancement Proposals, known as PEPs for short, were numbered starting at PEP-3000 to indicate the new version of the language. In the next sections, we will look at the modifications to the language, how they could break code written for Python 2, and how to migrate existing codebases to Python 3.

Changes in Python 3

There should be one-- and preferably only one --obvious way to do it

One of the main scriptures followed by Pythonistas is The Zen of Python. In it, a particular aphorism states that there should be only one obvious way of doing something within the language. As Python grew, it accumulated many packages and syntaxes that did not follow the conventions set by similar functionality elsewhere in the language. Let's take a look at how Python 3 rectified these inconsistencies, at the cost of incompatibility with older syntax.

print is the only application-level functionality that has a statement dedicated to it.

Prior to the change outlined in PEP-3105, print was a language statement. With Python 3, print would now be considered a builtin function. According to the PEP, the print statement was an exception to the rule and Guido regretted this particular construct in the language.

Changing print to a function led to a few differences which broke backwards compatibility.

>>> print('Python', '2')
('Python', '2')

For Python 2, this is equivalent to writing print ('Python', '2'). In other words, the parentheses are not a function call; they build a tuple containing the two strings 'Python' and '2', and the print statement prints that tuple.

>>> print('Python', '3')
Python 3

For Python 3, since print was converted to a function, the call above results in passing two strings as arguments to the print function.

The differences are subtle but this did result in breaking functionality for some programs.
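
As a minimal sketch of what the function form makes possible, the example below uses the sep, end, and file keyword arguments described in PEP-3105. It is written for Python 3, and the buffer name is only illustrative. On Python 2.6 and later, the same call syntax can be enabled with the print_function future statement.

```python
import io

# An in-memory text buffer stands in for a file or stream (illustrative name).
buffer = io.StringIO()

# The positional arguments are joined by `sep` and terminated by `end`.
print('Python', '3', sep='-', end='!\n', file=buffer)

# With no keyword arguments, a single space separates the values.
print('Python', '3', file=buffer)

output = buffer.getvalue()
```

None of these options could be expressed as ordinary arguments while print was a statement, which is part of why the statement was seen as an exception to the rule.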

Dictionary Keys and Values

The interface used for looping over dictionary items was also changed for Python 3. Previously, there were two redundant sets of methods for iterating over a dictionary and its elements:

  • dict.keys() and dict.iterkeys()
  • dict.values() and dict.itervalues()
  • dict.items() and dict.iteritems()

In Python 2, keys, values, and items return a list, while iterkeys, itervalues, and iteritems return an iterator. Otherwise, the two sets of methods serve the same purpose and can be used to get the same data. For this reason, it was decided to remove the iterkeys, itervalues, and iteritems methods and only support keys, values, and items.

In addition, the objects returned from keys, values, and items were changed to lightweight view objects; the keys and items views behave like sets. This allows results to be compared directly and removes the unnecessary copying that Python 2 did internally.

This leads to some differences as can be seen below.

>>> {'python': 2}.items()[0]
('python', 2)

In this Python 2 example, we can see that we can index into the return of the items call due to it being a list type.

>>> {'python': 3}.items()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_items' object does not support indexing

But, in the Python 3 example above, we cannot index into the result due to the change in the data type returned.
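
The following Python 3 sketch illustrates both sides of that trade-off: the set-like behavior of the views, and the explicit list conversion needed when positional access is required. The dictionary names and contents are made up for illustration.

```python
versions = {'python': 3, 'pypy': 3}
legacy = {'python': 2, 'jython': 2}

# Key views behave like sets, so they can be intersected directly.
shared = versions.keys() & legacy.keys()

# Views cannot be indexed; build a list first when position matters.
first_item = list({'python': 3}.items())[0]

# Views are live: they reflect later changes to the dictionary.
keys_view = versions.keys()
versions['cpython'] = 3
has_cpython = 'cpython' in keys_view
```

Wrapping the call in list() is also what the 2to3 tool tends to do when it cannot tell whether the result is only being iterated over.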

PEP-3106 covers these changes in detail.

Reorganization of the Standard Library

The next breaking change in Python 3 was the effort to normalize the modules of the standard library. This was laid out in PEP-3108.

Just like the language itself, Python's standard library (stdlib) has grown over the years to be very rich. But over time some modules have lost their need to be included with Python. There has also been an introduction of a naming convention for modules since Python's inception that not all modules follow.

Python has been around for a long time. Many do not realize that Python is older than many other popular programming languages such as Java, JavaScript, and Ruby. In that time, standards such as naming conventions changed, and the migration to Python 3 was used to normalize some of the naming inconsistencies within the standard library.

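One of the changes that affected many Python programs was the reorganization of the urllib package and its contents. The urllib package holds functionality to build HTTP requests, open URLs, and parse URLs. In Python 3, the old urllib, urllib2, and urlparse modules were combined into submodules such as urllib.request, urllib.parse, and urllib.error.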

Other modules were renamed or regrouped under new package names, including:

  • html
  • http
  • tkinter
  • xmlrpc
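
One common way for a codebase to cope with a renamed module while it still supports both versions is a guarded import. The sketch below assumes only the standard library: urlparse lived in the urlparse module in Python 2 and lives in urllib.parse in Python 3. The URL is just sample input.

```python
try:
    # Python 3 location
    from urllib.parse import urlparse
except ImportError:
    # Python 2 location
    from urlparse import urlparse

parts = urlparse('https://www.python.org/dev/peps/pep-3108/')
host = parts.netloc
path = parts.path
```

The rest of the module can then use urlparse without caring which interpreter it runs on.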

Raising and Catching Exceptions

PEP-3109 and PEP-3110 contain the details for the changes to syntax around raising and catching exceptions in Python 3.

In Python 2, there were several ways to raise exceptions:

raise Exception, 'blah' # Python 2
raise Exception('blah') # Python 2 and 3

This was another case where there was duplicated functionality which accomplished the same result. It was proposed to remove the first syntax and keep the second when raising exceptions in Python 3.

Similarly, catching exceptions had multiple syntaxes which were equivalent.

except Exception, e: # Python 2
except Exception as e: # Python 2 and 3

Again, in Python 3 the first syntax is no longer valid and the latter was kept.
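
Put together, the surviving forms look like the sketch below, which should run unchanged on Python 2.6 and later as well as on Python 3. The parse_version helper is hypothetical and exists only to have something that raises.

```python
def parse_version(text):
    """Hypothetical helper: return the major version as an int."""
    if not text.isdigit():
        # Call syntax for raise: valid in Python 2 and 3.
        raise ValueError('not a version number: %r' % text)
    return int(text)


try:
    parse_version('blah')
except ValueError as e:  # `as` syntax: valid in Python 2.6+ and 3.
    # In Python 3 the name `e` is unbound when the block ends,
    # so anything needed later has to be saved here.
    message = str(e)
```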

Bytes versus Strings

Prior to Python 3, byte and string data types were used interchangeably. This was due to the fact that the default type for string literals in the language was the bytestring type.

'blah' == b'blah' # Only returns True before Python 3

The problem with this behavior is that bytes objects do not carry encoding information. So when converting a sequence of bytes to a Unicode string, the runtime cannot automatically determine the appropriate character set to use. Thus, the language should not allow free conversion between the two types without giving the programmer the option to set the encoding.

In Python 3, it was decided to separate bytes and strings. String literals are now proper Unicode strings, and UTF-8 is the default encoding for source files and for the encode and decode methods. To convert between bytes and strings, you use encode and decode, where the encoding can be specified.
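
A short Python 3 sketch of that round trip is shown below. The sample text and variable names are illustrative; the point is that the codec is always explicit or a known default, never guessed.

```python
text = 'caf\u00e9'  # 'café', written with an escape to keep the source ASCII

# Encoding turns a str into bytes; UTF-8 is the default codec.
utf8_bytes = text.encode()
latin1_bytes = text.encode('latin-1')

# Decoding must use the codec that produced the bytes.
round_trip = utf8_bytes.decode('utf-8')

# Decoding with the wrong codec is an error rather than a silent guess.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# str and bytes no longer compare equal in Python 3.
same = ('blah' == b'blah')
```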

This was a step in the right direction toward proper Unicode support in the language, but it came at the expense of breaking a subset of existing code. The enhancement was proposed as part of PEP-358.

Converting from Python 2 to Python 3

As we have seen, the change set for Python 3 contained a lot of syntactical and functional modifications to the language. Many of these changes could break processes written with only Python 2 syntax and packages in mind. So how does one migrate to Python 3? And how does a maintainer support both versions?

The following general framework can be used to migrate to Python 3.

  • Migrate to the latest Python 2.7 release.
  • Use unit testing with sufficient code coverage to test all points in the code.
  • Enable logging and watch for warnings around deprecated functions.
  • Use the __future__ module in order to support both Python 2 and Python 3.

An important module to consider, both during migration and when supporting both versions, is the __future__ module. As outlined in PEP-236, a future statement lets a programmer opt in to syntax and semantics from a newer version of the core language while running an older one. The benefit of this construct is that code can be updated to the new syntax one file at a time while still supporting both versions of the language.

# From - Quick Start Guide:
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
from builtins import *  # on Python 2, provided by the third-party "future" package

Of course, using future statements will not rewrite your code in the new syntax. If you are looking for a more automated method of converting to Python 3, look into the 2to3 translation script. Python source files are passed in and a series of fixers are applied to them; with the -w flag, the converted source is written back in place.

Another important feature to enable during unit testing is the -3 flag. Running a Python 2 interpreter as python -3 emits deprecation warnings for constructs that are known to be incompatible with Python 3.
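
The -3 flag exists only on Python 2 interpreters. Inside a test suite, the standard warnings module offers a related check: deprecation warnings can be recorded and asserted on. In the sketch below, old_api is a hypothetical function standing in for whatever deprecated call a codebase might contain.

```python
import warnings


def old_api():
    """Hypothetical function standing in for a deprecated call."""
    warnings.warn('old_api() is deprecated', DeprecationWarning, stacklevel=2)
    return 42


# Record every warning instead of letting the default filters hide it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = old_api()

deprecations = [w for w in caught
                if issubclass(w.category, DeprecationWarning)]
```

A test can then fail when deprecations is not empty, which turns a quiet warning into something that is hard to overlook.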

Dropping Support for Python 2

We have seen how the many syntax changes in Python 3 caused compatibility issues with older versions of the language. And we have seen some of the ways in which package maintainers can support both Python 2 and 3. Unfortunately, at the rate at which Python is growing, the Python collective cannot support both versions forever, and at some point Python 2 support must be dropped.

The end of life of Python 2.7 was officially extended until 2020. After that, we will see many popular Python communities dropping support. Many projects are already tracking this timeline and planning the EOL of Python 2.

Improvements in Python 3

We have gone over many of the problems with the migration from Python 2 to 3. But is it worth the effort to do so? Take a look at some of the new functionalities available in Python 3. Some will certainly change your mind.