notes on setting up python projects

Introduction

bdist_wheel
distutils A Python standard library module for packaging and installing modules. Most maintainers now use setuptools instead of using distutils directly. Like setuptools, distutils expects the maintainer to create a setup.py script that calls a function named setup, imported from distutils.
easy_install A legacy command line installer distributed with setuptools. Use pip instead.
egg A legacy package format. Egg files have an .egg suffix and are ZIP archives.
pip A command line installer. pip is included with Python 3.4 and later, though some distributions may remove it. Some advantages of pip over easy_install are that it (1) installs wheel packages, (2) lists all installed packages, and (3) uninstalls packages.
PyPI The Python Package Index. A repository of Python packages.
requirements.txt A file containing a list of Python packages and optionally versions. pip install -r will read such a file and install the packages. pip freeze will generate such a file, which is useful for recording the dependencies of a project.
sdist A source distribution format, created with python setup.py sdist. It is typically a gzipped tarball of Python source code.
setuptools
setup.py
tox
twine A third party Python module and command line tool for uploading packages to PyPI.
venv
virtualenv
wheel

Installing Python

mac

If you use Homebrew, set your Mac up for Python development with these commands:

$ brew install python python3
$ pip install virtualenv tox

Now there will be three Python interpreters:

  • /usr/bin/python
  • /usr/local/bin/python
  • /usr/local/bin/python3

The first two are Python 2.7. One reason for installing another Python 2.7 interpreter is so we have pip. Another is stability when upgrading macOS. The Python that ships with macOS is said to be modified by Apple, so it is recommended to use a "clean" Python 2.7 in projects.
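With several interpreters installed it is easy to lose track of which one a script is running under. A minimal sketch of a check, using only the standard sys module (the function name interpreter_info is made up for this example):

```python
import sys


def interpreter_info():
    """Return the path and major.minor version of the running interpreter."""
    version = "%d.%d" % (sys.version_info[0], sys.version_info[1])
    return sys.executable, version


if __name__ == "__main__":
    path, version = interpreter_info()
    print("%s is Python %s" % (path, version))
```

Running the script with each of the interpreters listed above should show which path and version each one resolves to.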

Homebrew introduces the convention of keeping /usr/local and its subdirectories writable by non-privileged users, so don't run pip with sudo. If you do run sudo pip install, the damage can be fixed by running sudo chown -R $USER in the appropriate directory in /usr/local/lib/pythonX.X.

ubuntu

Install Python 2.7, Python 3.5, virtualenv, and tox:

$ sudo apt install python2.7 python3
$ sudo -H pip install virtualenv tox

Installing Packages

virtualenv

Now use virtualenv to select a Python version and install private pip packages for the project:

$ virtualenv -p python3.5 ve
$ . ve/bin/activate
$ pip install flask

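With Python 3.3 or later, you can use python -m venv instead of virtualenv, but it is not exactly the same: venv has no -p flag and always builds the environment around the interpreter that runs it, so select the version by running, for example, python3.5 -m venv ve.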

To be explicit about the version of Python, the -p/--python flag takes either an interpreter path or an interpreter basename (e.g. python3.5). The latter is more portable. On a Mac with two Python 2.7 interpreters installed, the environment should use the freshly installed interpreter, not the one provided by Apple.

I've encountered a bug where virtualenv cannot create a Python 3.5 environment. I worked around the problem by uninstalling virtualenv with pip and then re-installing it with pip3.

Sourcing the ve/bin/activate shell script modifies the prompt of your shell. This is to remind you which virtualenv you are in. You can prevent it by putting this in your .bashrc:

export VIRTUAL_ENV_DISABLE_PROMPT=1

Here is a make target for creating the virtualenv:

ve:
	virtualenv -p python3.5 ve
	. ./ve/bin/activate && pip install -r requirements.txt

pip

The pip packages for a project are usually listed in a requirements.txt file at the project root. Here is how to create a requirements.txt file, assuming the packages have already been installed manually, and how to install those packages later:

$ pip freeze > requirements.txt
$ pip install -r requirements.txt
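To see roughly what pip freeze collects, the installed distributions can be listed from Python itself. This is only an approximation under stated assumptions: it needs Python 3.8 or later for importlib.metadata, the helper name freeze is made up here, and pip's own output handles cases (editable installs, for example) that this sketch ignores.

```python
from importlib import metadata  # Python 3.8 or later


def freeze():
    """Return 'name==version' lines for distributions visible on sys.path."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta else None
        if name:  # skip distributions with missing or damaged metadata
            lines.add("%s==%s" % (name, dist.version))
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze()))
```

Run inside an activated virtualenv, the output should resemble the name==version lines of a requirements.txt file.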

wheel

Create wheel files for NumPy and SciPy:

$ virtualenv -p python2.7 ve
$ . ve/bin/activate
$ pip install wheel
$ pip wheel numpy scipy

Create wheel files for all requirements.txt packages:

$ pip wheel -r requirements.txt
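The .whl files these commands produce follow the naming scheme from PEP 427: distribution, version, an optional build tag, then Python, ABI, and platform tags, separated by hyphens. A small sketch that splits a file name into those fields; parse_wheel_filename is a hypothetical helper and the file name in the usage line is only an illustration, not output captured from the commands above.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    # Illustrative file name only.
    print(parse_wheel_filename("example_pkg-1.0-py2.py3-none-any.whl"))
```

The tags show which interpreters and platforms a wheel can be installed on; none and any indicate a pure Python wheel.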

TBD: how to use wheel files

tox

Here is an example of using Tox to test a packaged product against multiple versions of Python.

Here is what the project looks like:

$ cat hello.py
#!/usr/bin/env python

print "Hello, World!"

$ cat setup.py
#!/usr/bin/env python

from distutils.core import setup

setup(name='Hello',
      version='1.0',
      description='Hello World script',
      author='John Doe',
      author_email='jdoe@python.net',
      url='https://nowhere.com/hello',
      packages=[],
     )

$ cat tox.ini
[tox]
envlist = py27,py35
[testenv]
commands=./hello.py

Run tox to check against the supported versions:

$ tox
GLOB sdist-make: /Users/clark/Lang/Python/my_project/setup.py
py27 inst-nodeps: /Users/clark/Lang/Python/my_project/.tox/dist/Hello-1.0.zip
py27 installed: You are using pip version 6.1.1, however version 8.1.2 is available.,You should consider upgrading via the 'pip install --upgrade pip' command.,Hello==1.0
py27 runtests: PYTHONHASHSEED='3322588766'
py27 runtests: commands[0] | ./hello.py
Hello, World!
py35 inst-nodeps: /Users/clark/Lang/Python/my_project/.tox/dist/Hello-1.0.zip
py35 installed: You are using pip version 6.1.1, however version 8.1.2 is available.,You should consider upgrading via the 'pip install --upgrade pip' command.,Hello==1.0
py35 runtests: PYTHONHASHSEED='3322588766'
py35 runtests: commands[0] | ./hello.py
  File "/Users/clark/Lang/Python/my_project/hello.py", line 3
    print "Hello, World!"
                        ^
SyntaxError: Missing parentheses in call to 'print'
ERROR: InvocationError: '/Users/clark/Lang/Python/my_project/hello.py'
_________________________________________________________________________________________________ summary _________________________________________________________________________________________________
  py27: commands succeeded
ERROR:   py35: commands failed
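The py35 run fails because hello.py uses the Python 2 print statement. A sketch of a hello.py that should pass under both py27 and py35 follows; the greeting function is added here only to make the script easy to test.

```python
#!/usr/bin/env python
# Sketch of a hello.py that runs under both Python 2.7 and Python 3.
# A single parenthesized argument prints the same way on both; for several
# arguments, add "from __future__ import print_function" at the top.


def greeting():
    return "Hello, World!"


if __name__ == "__main__":
    print(greeting())
```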

Code Checks

pep8

pep8 flags code which does not conform to PEP 8.

$ . ve/bin/activate
$ pip install pep8
$ ./ve/bin/pep8 foo.py

PEP 8 calls for a maximum line length of 79 characters, but projects are allowed to use a different limit:

$ ./ve/bin/pep8 --max-line-length=100 foo.py
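To make the line length rule concrete, here is a toy version of that single check. It is a sketch of the idea only, not how pep8 is implemented, and long_lines is a made-up name; pep8 reports this violation as E501.

```python
def long_lines(source, max_length=79):
    """Return (line_number, length) pairs for lines longer than max_length."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), 1)
        if len(line) > max_length
    ]


if __name__ == "__main__":
    sample = "x = 1\ny = '" + "a" * 80 + "'\n"
    print(long_lines(sample))        # the second line is too long for 79
    print(long_lines(sample, 100))   # nothing is flagged at a limit of 100
```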

Here is a make target for running pep8:

python_src := src

.PHONY: pep8
pep8:
	find $(python_src) -name '*.py' | xargs pep8 --max-line-length=100

pylint

pylint uses static analysis to find errors that would otherwise only show up at run time, such as the use of undefined variables.

$ . ve/bin/activate
$ pip install pylint
$ ./ve/bin/pylint foo.py

Violations of a certain type can be disabled:

$ ./ve/bin/pylint -d missing-docstring,blacklisted-name foo.py

How to specify the configuration file pylint uses:

$ ./ve/bin/pylint --rcfile .pylintrc foo.py

Here is an example .pylintrc file:

[MESSAGES CONTROL]
disable=invalid-name,redefined-outer-name,superfluous-parens,too-many-arguments,too-many-branches,too-many-locals,duplicate-code,too-few-public-methods,too-many-public-methods,no-self-use,too-many-return-statements,too-many-statements,too-many-instance-attributes,too-many-lines

Here is a make target for running pylint. It assumes a .pylintrc file at the project root:

python_src := src

.PHONY: pylint
pylint: | ve
	. ve/bin/activate && find $(python_src) -name '*.py' | xargs pylint --rcfile ./.pylintrc -d missing-docstring

unit tests

$ cat test_foo.py
import unittest

class TestFoo(unittest.TestCase):
    def test_01(self):
        self.assertTrue(True, 'not True!')

if __name__ == '__main__':
    unittest.main()

$ python test_foo.py
$ python test_foo.py TestFoo.test_01
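A slightly fuller sketch of the same pattern, testing a function with assertEqual and assertRaises. The slugify function is a hypothetical stand-in for real project code, and run_tests is a helper added here to show how a suite can be run programmatically.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case and hyphenate a title."""
    return "-".join(title.lower().split())


class TestSlugify(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_empty_string(self):
        self.assertEqual(slugify(""), "")

    def test_non_string_raises(self):
        with self.assertRaises(AttributeError):
            slugify(None)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    unittest.main()
```

As with test_foo.py, a single test can be selected on the command line by giving ClassName.method_name as an argument.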

A make target for running unit tests:

.PHONY: test
test: | ve
	. ./ve/bin/activate && find test/unit -name '*.py' | PYTHONPATH=. xargs -n 1 python

test.%: | ve
	. ./ve/bin/activate && find test/unit/test_$*.py -name '*.py' | PYTHONPATH=. xargs -n 1 python

code coverage

TBD

Libraries

PYTHONPATH

TBD

Logging

TBD

Error Handling

TBD

Unicode

Dealing with Unicode data can be confusing. How Python handles Unicode changed between Python 2 and Python 3.

Python 2

source encoding

By default, Python 2.7 source code is assumed to have US-ASCII encoding. Trying to execute this file results in a SyntaxError:

#!/usr/bin/env python

print('λ')

It is possible to change the source code encoding to UTF-8:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

print('λ')

Even if the source code encoding is UTF-8, most Unicode characters cannot be used outside of string literals. Running this raises a SyntaxError:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

λ = 'lambda'

print(λ)

types

The string types are str, unicode, and bytearray. Here is how to perform a type test:

if type('lorem ipsum') == str:
    print("It's a string.")

if isinstance('lorem ipsum', str):
    print("It's a string.")

str is an immutable array of bytes. The 'lorem ipsum' and "lorem ipsum" literals create str objects. For compatibility with Python 3, the bytes constructor is a synonym for the str constructor, and the b'lorem ipsum' and b"lorem ipsum" literals create str objects.

unicode is an immutable array of Unicode characters. The u'lorem ipsum' and u"lorem ipsum" literals create unicode objects. These literals support \uXXXX and \UXXXXXXXX escape sequences for specifying Unicode code points with hex digits. \OOO and \xXX style escapes are also supported, though the numbers are taken to be Unicode code points, not bytes.

bytearray is a mutable array of bytes. There are no literals. Objects are created in this manner:

data = bytearray('lorem ipsum')

conversions

Use the constructors to convert between mutable and immutable arrays of bytes:

mutable = bytearray('lorem ipsum')

immutable = str(mutable)

Use decode to convert a str object to unicode object:

'lorem ipsum'.decode('utf-8')

# raises UnicodeDecodeError:
'\xff'.decode('utf-8')

Use encode to convert a unicode object to a str object:

u'\u03bb'.encode('utf-8')

# raises UnicodeEncodeError:
u'\u03bb'.encode('ascii')
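The two conversions can be combined into small helpers. The sketch below is written so that it should run unchanged on Python 2.7 and on Python 3.3 or later; utf8_length and safe_decode are names invented for this example.

```python
def utf8_length(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode('utf-8'))


def safe_decode(data, encoding='utf-8'):
    """Decode a byte string, returning None instead of raising on bad input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(utf8_length(u'\u03bb'))             # one code point, two bytes
    print(repr(safe_decode(b'\xce\xbb')))     # valid UTF-8 for U+03BB
    print(repr(safe_decode(b'\xff')))         # not valid UTF-8
```

Catching UnicodeDecodeError at the point where bytes enter the program keeps the rest of the code working only with decoded text.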

file handles

The open builtin function returns a file object. The file object's read method returns a str object and its write method takes one. The file object also has an encoding attribute, which is None.

f = open('/etc/hosts')
s = f.read()

fout = open('/tmp/hosts', 'w')
fout.write(s)

The codecs.open function returns a wrapper around a file object. Its read and write methods return and take str objects if the encoding is None, and unicode objects otherwise. The wrapper object saves the encoding in its encoding attribute.

import codecs

f = codecs.open('/etc/hosts', encoding='utf-8')
s = f.read()

fout = codecs.open('/tmp/hosts', 'w', encoding='utf-8')
fout.write(s)

The codecs.open wrapper object's read method will attempt to decode bytes if the encoding is not None. Thus a UnicodeDecodeError is possible.

The codecs.open wrapper object's write method will also accept a str argument. If an encoding was provided, the str is first implicitly decoded to unicode using the default encoding (normally ASCII), so a UnicodeDecodeError results if it contains non-ASCII bytes.

If we want to read unicode strings from standard input:

f = codecs.open('/dev/stdin', encoding='utf-8')

If we wanted to treat all of the standard file handles as UTF-8 encoded streams, we could do something like the following:

import codecs
import sys

ENCODING = 'utf-8'

sys.stdin = codecs.getreader(ENCODING)(sys.stdin)
sys.stdout = codecs.getwriter(ENCODING)(sys.stdout)
sys.stderr = codecs.getwriter(ENCODING)(sys.stderr)

Python 3

source encoding

The Python 3 interpreter assumes a UTF-8 encoding:

#!/usr/bin/env python3

print('λ')

It is possible to change it. Trying to execute this file results in a syntax error:

#!/usr/bin/env python3
# -*- coding: us-ascii -*-

print('λ')

Python 3 allows Unicode characters in the letter category to be used in identifiers:

#!/usr/bin/env python3

λ = 'lambda'
print(λ)

The unicodedata.category function returns the category of a character. The letter categories include "Letter, Uppercase" (Lu), "Letter, Lowercase" (Ll), and "Letter, Other" (Lo):

$ python3
>>> import unicodedata
>>> unicodedata.category('L')
'Lu'
>>> unicodedata.category('l')
'Ll'
>>> unicodedata.category('Λ')
'Lu'
>>> unicodedata.category('λ')
'Ll'
>>> unicodedata.category('人')
'Lo'
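The category lookup can be paired with str.isidentifier to check whether a name would be accepted as an identifier. A minimal sketch for Python 3; identifier_report is a hypothetical helper, and the escapes stand for the characters named in the comments. Note that characters outside the letter categories, such as digits, are allowed after the first position.

```python
import unicodedata


def identifier_report(name):
    """Return (is_identifier, [category of each character]) for name."""
    return name.isidentifier(), [unicodedata.category(ch) for ch in name]


if __name__ == "__main__":
    for name in ['\u03bb',      # Greek small letter lambda
                 '\u4eba',      # CJK ideograph for "person"
                 '\u03bb1',     # lambda followed by a digit
                 '1\u03bb',     # digit first: not an identifier
                 '\u20ac']:     # euro sign: a symbol, not a letter
        print(ascii(name), identifier_report(name))
```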

types

The string types are str, bytes, and bytearray. Here is how to perform a type test:

if type('lorem ipsum') == str:
    print("It's a string.")

if isinstance('lorem ipsum', str):
    print("It's a string.")

str is an immutable array of Unicode characters. The literals 'lorem ipsum' and "lorem ipsum" create str objects. These literals support \uXXXX and \UXXXXXXXX escape sequences for specifying Unicode code points with hex digits. For compatibility with Python 2, the u'lorem ipsum' and u"lorem ipsum" literals also create str objects.

bytes is an immutable array of bytes. The literals b'lorem ipsum' and b"lorem ipsum" create bytes objects.

bytearray is a mutable array of bytes. There are no literals. Objects are created in this manner:

data = bytearray(b'lorem ipsum')

conversions

Use the constructors to convert between mutable and immutable arrays of bytes:

mutable = bytearray(b'lorem ipsum')

immutable = bytes(mutable)
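A short sketch of why one might convert in each direction: edit the data in a bytearray, then freeze the result as bytes. Both helper names are invented for this example.

```python
def capitalize_first(data):
    """Return a bytes copy of data with its first byte upper-cased."""
    mutable = bytearray(data)          # copy into a mutable buffer
    if mutable:
        mutable[0:1] = mutable[0:1].upper()
    return bytes(mutable)              # freeze the edited buffer


def is_mutable(obj):
    """Report whether obj supports slice assignment."""
    try:
        obj[0:0] = b''                 # a no-op edit on a mutable sequence
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(capitalize_first(b'lorem ipsum'))
    print(is_mutable(bytearray(b'lorem')), is_mutable(b'lorem'))
```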

Use decode to convert a bytes object to str object:

b'lorem ipsum'.decode('utf-8')

# raises UnicodeDecodeError:
b'\xff'.decode('utf-8')

Use encode to convert a str object to a bytes object:

'\u03bb'.encode('utf-8')

# raises UnicodeEncodeError:
'\u03bb'.encode('ascii')
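Both encode and decode take an optional second argument naming an error handler, which avoids the exception at the cost of losing or rewriting data. A sketch using the standard handlers; the wrapper names to_ascii and from_utf8 are made up for the example.

```python
def to_ascii(text, errors='replace'):
    """Encode text as ASCII, applying the named error handler."""
    return text.encode('ascii', errors)


def from_utf8(data, errors='replace'):
    """Decode UTF-8 bytes, applying the named error handler."""
    return data.decode('utf-8', errors)


if __name__ == "__main__":
    for handler in ('replace', 'ignore', 'backslashreplace',
                    'xmlcharrefreplace'):
        print(handler, to_ascii('\u03bb', handler))
    print(ascii(from_utf8(b'\xff')))   # U+FFFD, the replacement character
```

Which handler is appropriate depends on whether the output must round-trip; replace and ignore both discard information.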

file handles

We can open file handles in binary mode and read and write bytes objects:

f = open('/etc/hosts', 'rb')
s = f.read()

fout = open('/tmp/hosts', 'wb')
fout.write(s)

If we open a file handle in text mode, which is the default, we read and write str objects. It is best to specify the encoding explicitly, since otherwise a platform-dependent default is used. When reading, there is a possibility of a UnicodeDecodeError:

f = open('/etc/hosts', encoding='utf-8')
s = f.read()

fout = open('/tmp/hosts', 'w', encoding='utf-8')
fout.write(s)

The argument to write must be a str object if the file handle was opened in text mode, and a bytes object if it was opened in binary mode.
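The relationship between the encoding and the bytes on disk can be checked with a temporary file. A minimal sketch for Python 3, where roundtrip is a hypothetical helper: it writes text with an explicit encoding, then reads the same file back in binary mode ('rb') and again as text.

```python
import os
import tempfile


def roundtrip(text, encoding='utf-8'):
    """Write text to a temporary file; return (raw bytes, decoded str)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w', encoding=encoding) as fout:
            fout.write(text)               # str in, encoded on the way out
        with open(path, 'rb') as f:
            raw = f.read()                 # bytes exactly as stored on disk
        with open(path, encoding=encoding) as f:
            decoded = f.read()             # bytes decoded back to str
    finally:
        os.remove(path)
    return raw, decoded


if __name__ == "__main__":
    raw, decoded = roundtrip('\u03bb')
    print(raw, ascii(decoded))
```

Using with blocks also closes each file handle promptly, which the shorter snippets above leave to the interpreter.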

TBD: reading from stdin, writing to stdout

Command Line Tool

argparse

TBD

supervisord

TBD

Webserver

flask

TBD

gunicorn

TBD

Parallelization

multiprocessing

TBD

celery

TBD