Packaging a python library

Sun 25 May 2014


This is about packaging libraries, not applications.


All the advice here is implemented in a project template (with full support for C extensions): cookiecutter-pylibrary (introduction).

I think packaging best practices should be revisited: there are lots of good tools nowadays that are either unused or underused. It's generally a good thing to re-evaluate best practices all the time.

I assume here that your package is to be tested on multiple Python versions, with different combinations of dependency versions, settings etc.

And a few principles that I like to follow when packaging:

  • If there's a tool that can help with testing use it. Don't waste time building a custom test runner if you can just use py.test or nose. They come with a large ecosystem of plugins that can improve your testing.
  • When possible, prevent issues early. This is mostly a matter of strictness and exhaustive testing. Design things to prevent common mistakes.
  • Collect all the coverage data. Record it. Identify regressions.
  • Test all the possible configurations.

The structure

This is fairly important: everything revolves around it. I prefer this sort of layout:

├─ src
│  └─ packagename
│     ├─
│     └─ ...
├─ tests
│  └─ ...

The src directory is a better approach because:

  • You get import parity. The current directory is implicitly included in sys.path; but not so when installing & importing from site-packages. Users will never have the same current working directory as you do.

    This constraint has beneficial implications in both testing and packaging:

    • You will be forced to test the installed code (e.g.: by installing in a virtualenv). This will ensure that the deployed code works (it's packaged correctly) - otherwise your tests will fail. Early. Before you can publish a broken distribution.
    • You will be forced to install the distribution. If you ever uploaded a distribution on PyPI with missing modules or broken dependencies, it's because you didn't test the installation. Just being able to successfully build the sdist doesn't guarantee it will actually install!
  • It prevents you from readily importing your code in the setup script. This is a bad practice because it will always blow up if importing the main package or module triggers additional imports for dependencies (which may not be available [5]). Best to not make it possible in the first place.

  • Simpler packaging code and manifest. It makes manifests very simple to write (e.g.: you package a Django app that has templates or static files). Also, zero fuss for large libraries that have multiple packages. Clear separation of code being packaged and code doing the packaging.

    Without src, writing a manifest is tricky [6]. If your manifest is broken your tests will fail. It's much easier with a src directory: just add graft src to the manifest.

    Publishing a broken package to PyPI is not fun.

  • Without src you get messy editable installs ("develop" mode or "pip install -e"). Having no separation (no src dir) will force setuptools to put your project's root on sys.path - with all the junk in it (e.g.: test or configuration scripts will unwittingly become importable).

  • There are better tools. You don't need to deal with installing packages just to run the tests anymore. Just use tox - it will install the package for you [2] automatically, zero fuss, zero friction.

  • Less chance for user mistakes - they will happen - assume nothing!

  • Less chance for tools to mix up code with non-code.
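
To make the import parity point concrete, here is a minimal sketch using only the standard library. It builds two throwaway projects (the directory names flat and with_src and both helper functions are made up for illustration) and asks Python's path finder whether packagename can be found when only the project root is searched, which is roughly the situation when the current directory is on sys.path:

```python
import importlib.machinery
import os
import tempfile


def make_project(root, layout):
    """Create an empty `packagename` package, either in `root` or in `root/src`."""
    parent = os.path.join(root, "src") if layout == "src" else root
    package_dir = os.path.join(parent, "packagename")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as fh:
        fh.write("")
    return root


def importable_from(root, name="packagename"):
    """Return True if `name` can be found when only `root` is searched."""
    spec = importlib.machinery.PathFinder.find_spec(name, [root])
    return spec is not None


with tempfile.TemporaryDirectory() as tmp:
    flat_root = make_project(os.path.join(tmp, "flat"), "flat")
    src_root = make_project(os.path.join(tmp, "with_src"), "src")
    flat_importable = importable_from(flat_root)
    src_importable = importable_from(src_root)

print("flat layout importable from project root:", flat_importable)
print("src layout importable from project root:", src_importable)
```

With the flat layout the uninstalled checkout is importable straight from the project root; with src it is not, so the tests can only pass against an installed copy.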

Another way to put it, flat is better than nested [*] - but not for data. A file-system is just data after all - and cohesive, well normalized data structures are desirable.

You'll notice that I don't include the tests in the installed packages. Because:

  • Module discovery tools will trip over your test modules. Strange things usually happen in test modules. The help builtin does module discovery. E.g.:

    >>> help('modules')
    Please wait a moment while I gather a list of all available modules...
    __future__          antigravity         html                select
  • Tests usually require additional dependencies to run, so they aren't useful on their own - you can't run them directly.

  • Tests are concerned with development, not usage.

  • It's extremely unlikely that the user of the library will run the tests instead of the library's developer. E.g.: you don't run the tests for Django while testing your apps - Django is already tested.
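
As a rough illustration of the module discovery point, the sketch below installs a made-up package (demo_pkg, with a tests subpackage inside it) into a temporary directory standing in for site-packages, then walks it with pkgutil.walk_packages from the standard library. The package and file names are hypothetical:

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def write(path, text=""):
    """Create `path` (and any missing parent directories) with `text` in it."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)


with tempfile.TemporaryDirectory() as site_packages:
    package = os.path.join(site_packages, "demo_pkg")
    write(os.path.join(package, "__init__.py"))
    write(os.path.join(package, "core.py"))
    write(os.path.join(package, "tests", "__init__.py"))
    write(os.path.join(package, "tests", "test_core.py"))

    # walk_packages imports each package it finds in order to descend into it,
    # so the directory has to be importable while we walk it.
    sys.path.insert(0, site_packages)
    importlib.invalidate_caches()
    try:
        discovered = sorted(
            info.name for info in pkgutil.walk_packages([site_packages])
        )
    finally:
        # Undo the side effects of the demonstration.
        sys.path.remove(site_packages)
        for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
            del sys.modules[name]

print(discovered)
```

The tests subpackage and its modules show up next to the real code, and the walk had to import demo_pkg.tests to get there. Keeping tests outside the installed package avoids both.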

Alternatives

You could use src-less layouts. A few examples:

Tests in package:

├─ packagename
│  ├─
│  ├─ ...
│  └─ tests
│     └─ ...

Tests outside package:

├─ packagename
│  ├─
│  └─ ...
├─ tests
│  └─ ...

These two layouts became popular because packaging had many problems a few years ago, so it wasn't feasible to install the package just to test it. People still recommend them [4] even though they are based on outdated assumptions.

Most projects use them incorrectly, as all the test runners except Twisted's trial have incorrect defaults for the current working directory - you're going to test the wrong code if you don't test the installed code. trial does the right thing by changing the working directory to something temporary, but most projects don't use trial.
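
If you are stuck with one of these layouts, one possible safeguard is to check where the package under test was actually imported from. This is only a sketch: the helper is_under and all the paths below are hypothetical, and in a real test suite you would pass packagename.__file__ instead of a made-up path:

```python
import os


def is_under(path, directory):
    """Return True if `path` is located inside `directory`."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    return os.path.commonpath([path, directory]) == directory


# Hypothetical flat-layout checkout with the package directly in the project root.
project_root = os.path.join(os.sep, "work", "project")
source_dir = os.path.join(project_root, "packagename")

# Where the module could have been imported from in each situation.
from_checkout = os.path.join(source_dir, "__init__.py")
from_install = os.path.join(
    project_root, ".tox", "py27", "lib", "python2.7",
    "site-packages", "packagename", "__init__.py",
)

testing_checkout = is_under(from_checkout, source_dir)
testing_install = not is_under(from_install, source_dir)

print("imported from checkout:", testing_checkout)
print("imported from installed copy:", testing_install)
```

A test session could fail fast when the imported module lives in the checkout's source directory, since that means the installed code is not what is being tested.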

The setup script

Unfortunately with the current packaging tools, there are many pitfalls. The script should be as simple as possible:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import absolute_import
from __future__ import print_function

import io
import re
from glob import glob
from os.path import basename
from os.path import dirname
from os.path import join
from os.path import splitext

from setuptools import find_packages
from setuptools import setup


def read(*names, **kwargs):
    # Read a file relative to this script, always with an explicit encoding.
    with io.open(
        join(dirname(__file__), *names),
        encoding=kwargs.get('encoding', 'utf8')
    ) as fh:
        return fh.read()


setup(
    # Name assumed from the console_scripts entry below.
    name='nameless',
    description='An example package. Generated with',
    long_description='%s\n%s' % (
        re.compile('^.. start-badges.*^.. end-badges', re.M | re.S).sub('', read('README.rst')),
        re.sub(':[a-z]+:`~?(.*?)`', r'``\1``', read('CHANGELOG.rst'))
    ),
    author='Ionel Cristian Mărieș',
    # Everything under src gets packaged: packages and root-level modules.
    packages=find_packages('src'),
    package_dir={'': 'src'},
    py_modules=[splitext(basename(path))[0] for path in glob('src/*.py')],
    classifiers=[
        # complete classifier list:
        'Development Status :: 5 - Production/Stable',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: Unix',
        'Operating System :: POSIX',
        'Operating System :: Microsoft :: Windows',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: Implementation :: CPython',
        'Programming Language :: Python :: Implementation :: PyPy',
        # uncomment if you test on these interpreters:
        # 'Programming Language :: Python :: Implementation :: IronPython',
        # 'Programming Language :: Python :: Implementation :: Jython',
        # 'Programming Language :: Python :: Implementation :: Stackless',
        'Topic :: Utilities',
    ],
    keywords=[
        # eg: 'keyword1', 'keyword2', 'keyword3',
    ],
    extras_require={
        # eg:
        #   'rst': ['docutils>=0.11'],
        #   ':python_version=="2.6"': ['argparse'],
    },
    entry_points={
        'console_scripts': [
            'nameless = nameless.cli:main',
        ]
    },
)

What's special about this:

  • No exec or import trickery.
  • Includes everything from src: packages or root-level modules.
  • Explicit encodings.
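
The two regular expressions in long_description may look cryptic, so here is a small sketch of what they do. The sample README and changelog strings are invented for the demonstration; the patterns are the ones from the script above:

```python
import re

# Invented sample inputs.
readme = "Overview\n.. start-badges\n|badge|\n.. end-badges\nText\n"
changelog = "Fixed :func:`~nameless.cli.main` and :class:`Foo`."

# Drop everything between the badge markers: badges are noise on PyPI.
stripped = re.compile('^.. start-badges.*^.. end-badges', re.M | re.S).sub('', readme)

# Turn Sphinx roles into plain literals, which render without Sphinx.
plain = re.sub(':[a-z]+:`~?(.*?)`', r'``\1``', changelog)

print(repr(stripped))
print(plain)
```

The first substitution removes the badge block from the README, and the second rewrites Sphinx-only roles into ordinary reStructuredText literals so the long description stays valid outside of Sphinx.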

Running the tests

Again, it seems people fancy the idea of running python test to run the package's tests. I think that's not worth doing - test is a failed experiment to replicate some of CPAN's test system. Python doesn't have a common test result protocol so it serves no purpose to have a common test command [1]. At least not for now - we'd need someone to build specifications and services that make this worthwhile, and champion them. I think it's important in general to recognize failure where there is and go back to the drawing board when that's necessary - there are absolutely no services or tools that use test command in a way that brings added value. Something is definitely wrong here.

I believe it's too late now for PyPI to do anything about it. Travis is already a solid, reliable, extremely flexible and free alternative. It integrates very well with GitHub: builds will be run automatically for each pull request.

To test locally, tox is a very good way to run all the possible testing configurations (each configuration will be a tox environment). I like to organize the tests into a matrix with these additional environments:

  • check - check package metadata (e.g.: if the restructured text in your long description is valid)
  • clean - clean coverage
  • report - make coverage report for all the accumulated data
  • docs - build sphinx docs

I also like to have environments with and without coverage measurement and run them all the time. Race conditions are usually performance sensitive and you're unlikely to catch them if you run everything with coverage measurements.

The test matrix

Depending on dependencies you'll usually end up with a huge number of combinations of python versions, dependency versions and different settings. Generally people just hard-code everything in tox.ini or only in .travis.yml. They end up with incomplete local tests, or test configurations that run serially in Travis. I've tried that, didn't like it. I've tried duplicating the environments in both tox.ini and .travis.yml. Still didn't like it.


This technique is a bit outdated now. It still works fine but for simple matrices you can use a tox generative envlist (it was implemented after I wrote this blog post, unfortunately).


See python-nameless for an example using that.
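
For readers who haven't seen a generative envlist, the idea is that one compact line with brace groups expands into every combination of its factors. The sketch below imitates that expansion in plain Python. It is not tox's implementation, and the factor names (py27, django14 and so on) are only examples:

```python
import itertools
import re


def expand_factors(spec):
    """Expand brace groups like 'py{27,34}-django{14,15}' into all combinations."""
    # re.split with a capturing group keeps the brace contents at odd indices.
    parts = re.split(r"\{([^{}]*)\}", spec)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


environments = expand_factors("py{27,34}-django{14,15}")
print(environments)
```

Two factors with two values each give four environments. That is exactly the kind of list the generator script below writes out explicitly.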

As there were no readily usable alternatives to generate the configuration, I've implemented a generator script that uses templates to generate tox.ini and .travis.yml. This is way better, it's DRY, you can easily skip running tests on specific configurations (e.g.: skip Django 1.4 on Python 3) and there's less work to change things.

The essentials (full code):

setup.cfg

The generator script uses a configuration file (setup.cfg for convenience):

skip = migrations, south_migrations

# This is the configuration for the `./` script.
# It generates `.travis.yml`, `tox.ini` and `appveyor.yml`.
# Syntax: [alias:] value [!variable[glob]] [&variable[glob]]
# alias:
#  - is used to generate the tox environment
#  - it's optional
#  - if not present the alias will be computed from the `value`
# value:
#  - a value of "-" means empty
# !variable[glob]:
#  - exclude the combination of the current `value` with
#    any value matching the `glob` in `variable`
#  - can use as many as you want
# &variable[glob]:
#  - only include the combination of the current `value`
#    when there's a value matching `glob` in `variable`
#  - can use as many as you want

python_versions =

dependencies =
#    1.4: Django==1.4.16 !python_versions[3.*]
#    1.5: Django==1.5.11
#    1.6: Django==1.6.8
#    1.7: Django==1.7.1 !python_versions[2.6]
# Deps commented above are provided as examples. That's what you would use in a Django project.

coverage_flags =
    cover: true
    nocov: false

environment_variables =
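
To give a feel for what such a configuration expands to, here is a simplified sketch of the combination step, including the "!python_versions[glob]" exclusions. It is not the matrix library's code. The Python versions are assumed for the example, and only the exclusion syntax is modelled:

```python
import fnmatch
import itertools

# Assumed Python versions; the real list lives in setup.cfg.
python_versions = ["2.6", "2.7", "3.3"]
# (alias, requirement, globs of Python versions to exclude)
dependencies = [
    ("1.4", "Django==1.4.16", ["3.*"]),
    ("1.5", "Django==1.5.11", []),
    ("1.7", "Django==1.7.1", ["2.6"]),
]
coverage_flags = [("cover", True), ("nocov", False)]


def build_environments(python_versions, dependencies, coverage_flags):
    """Combine all values, skipping combinations excluded by a glob."""
    environments = {}
    combos = itertools.product(python_versions, dependencies, coverage_flags)
    for python, (dep_alias, requirement, excluded), (cov_alias, cover) in combos:
        if any(fnmatch.fnmatch(python, glob) for glob in excluded):
            continue
        alias = "-".join([python, dep_alias, cov_alias])
        environments[alias] = {
            "python": "python" + python,
            "deps": [requirement],
            "cover": cover,
        }
    return environments


environments = build_environments(python_versions, dependencies, coverage_flags)
for alias in sorted(environments):
    print(alias)
```

Of the 18 raw combinations, Django 1.4 on Python 3.3 and Django 1.7 on Python 2.6 are dropped, leaving 14 environments.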

ci/

This is the generator script. You run this whenever you want to regenerate the configuration:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, unicode_literals

import os
import sys
from os.path import abspath
from os.path import dirname
from os.path import exists
from os.path import join

if __name__ == "__main__":
    base_path = dirname(dirname(abspath(__file__)))
    print("Project path: {0}".format(base_path))
    env_path = join(base_path, ".tox", "bootstrap")
    if sys.platform == "win32":
        bin_path = join(env_path, "Scripts")
    else:
        bin_path = join(env_path, "bin")
    if not exists(env_path):
        import subprocess

        print("Making bootstrap env in: {0} ...".format(env_path))
        try:
            subprocess.check_call(["virtualenv", env_path])
        except subprocess.CalledProcessError:
            # Fall back to the virtualenv module of the current interpreter.
            subprocess.check_call([sys.executable, "-m", "virtualenv", env_path])
        print("Installing `jinja2` and `matrix` into bootstrap environment...")
        subprocess.check_call([join(bin_path, "pip"), "install", "jinja2", "matrix"])
    activate = join(bin_path, "")
    # noinspection PyCompatibility
    exec(compile(open(activate, "rb").read(), activate, "exec"), dict(__file__=activate))

    import jinja2

    import matrix

    jinja = jinja2.Environment(
        loader=jinja2.FileSystemLoader(join(base_path, "ci", "templates")),
    )

    # Build the mapping the templates iterate over: alias -> settings.
    tox_environments = {}
    for (alias, conf) in matrix.from_file(join(base_path, "setup.cfg")).items():
        python = conf["python_versions"]
        deps = conf["dependencies"]
        tox_environments[alias] = {
            "python": "python" + python if "py" not in python else python,
            "deps": deps.split(),
        }
        if "coverage_flags" in conf:
            cover = {"false": False, "true": True}[conf["coverage_flags"].lower()]
            tox_environments[alias].update(cover=cover)
        if "environment_variables" in conf:
            env_vars = conf["environment_variables"]
            tox_environments[alias].update(env_vars=env_vars.split())

    # Render every template into the project root.
    for name in os.listdir(join("ci", "templates")):
        with open(join(base_path, name), "w") as fh:
            fh.write(jinja.get_template(name).render(tox_environments=tox_environments))
        print("Wrote {}".format(name))

ci/templates/.travis.yml

This has some goodies in it: the very useful trick.

It basically just runs tox.

language: python
sudo: false
cache: pip
env:
  global:
    - LD_PRELOAD=/lib/x86_64-linux-gnu/
  matrix:
    - TOXENV=check
    - TOXENV=docs
matrix:
  include:
{%- for env, config in tox_environments|dictsort %}{{ '' }}
    - python: '{{ '{0[0]}-5.4'.format(env.split('-')) if env.startswith('pypy') else env.split('-')[0] }}'
      env:
        - TOXENV={{ env }}{% if config.cover %},report,coveralls,codecov{% endif -%}
{% endfor %}

before_install:
  - python --version
  - uname -a
  - lsb_release -a
install:
  - pip install tox
  - virtualenv --version
  - easy_install --version
  - pip --version
  - tox --version
script:
  - tox -v
after_failure:
  - more .tox/log/* | cat
  - more .tox/*/log/* | cat
notifications:
  email:
    on_success: never
    on_failure: always

ci/templates/tox.ini

[tox]
envlist =
{% for env in tox_environments|sort %}
    {{ env }},
{% endfor %}

[testenv]
basepython =
    {docs,spell}: python2.7
    {bootstrap,clean,check,report,extension-coveralls,coveralls,codecov}: python3
setenv =
passenv =
deps =
commands =
    {posargs:py.test -vv --ignore=src}

[testenv:spell]
setenv =
commands =
    sphinx-build -b spelling docs dist/docs
skip_install = true
usedevelop = false
deps =

[testenv:docs]
deps =
commands =
    sphinx-build {posargs:-E} -b html docs dist/docs
    sphinx-build -b linkcheck docs dist/docs

[testenv:bootstrap]
deps =
skip_install = true
usedevelop = false
commands =
    python ci/
passenv =

[testenv:check]
deps =
skip_install = true
usedevelop = false
commands =
    python check --strict --metadata --restructuredtext
    check-manifest {toxinidir}
    flake8 src tests
    isort --verbose --check-only --diff --recursive src tests

[testenv:coveralls]
deps =
skip_install = true
usedevelop = false
commands =
    coveralls []

[testenv:codecov]
deps =
skip_install = true
usedevelop = false
commands =
    coverage xml --ignore-errors
    codecov []

[testenv:report]
deps = coverage
skip_install = true
usedevelop = false
commands =
    coverage combine --append
    coverage report
    coverage html

[testenv:clean]
commands = coverage erase
skip_install = true
usedevelop = false
deps = coverage

{% for env, config in tox_environments|dictsort %}
[testenv:{{ env }}]
basepython = {env:TOXPYTHON:{{ config.python }}}
{% if config.cover or config.env_vars %}
setenv =
{% endif %}
{% for var in config.env_vars %}
    {{ var }}
{% endfor %}
{% if config.cover %}
usedevelop = true
commands =
    {posargs:py.test --cov --cov-report=term-missing -vv}
{% endif %}
{% if config.cover or config.deps %}
deps =
{% endif %}
{% if config.cover %}
{% endif %}
{% for dep in config.deps %}
    {{ dep }}
{% endfor %}

{% endfor %}

ci/templates/appveyor.ini

For Windows-friendly projects:

version: '{branch}-{build}'
build: off
cache:
  - '%LOCALAPPDATA%\pip\Cache'
environment:
  global:
    WITH_COMPILER: 'cmd /E:ON /V:ON /C .\ci\appveyor-with-compiler.cmd'
  matrix:
    - TOXENV: check
      PYTHON_HOME: C:\Python27
      PYTHON_VERSION: '2.7'
      PYTHON_ARCH: '32'
{% for env, config in tox_environments|dictsort %}{{ '' }}{% if config.python.startswith('python') %}
    - TOXENV: '{{ env }}{% if config.cover %},codecov{% endif %}'
      TOXPYTHON: C:\{{ config.python.replace('.', '').capitalize() }}\python.exe
      PYTHON_HOME: C:\{{ config.python.replace('.', '').capitalize() }}
      PYTHON_VERSION: '{{ config.python[-3:] }}'
      PYTHON_ARCH: '32'
    - TOXENV: '{{ env }}{% if config.cover %},codecov{% endif %}'
      TOXPYTHON: C:\{{ config.python.replace('.', '').capitalize() }}-x64\python.exe
      {%- if config.python != 'python3.5' %}

      WINDOWS_SDK_VERSION: v7.{{ '1' if config.python[-3] == '3' else '0' }}
      {%- endif %}

      PYTHON_HOME: C:\{{ config.python.replace('.', '').capitalize() }}-x64
      PYTHON_VERSION: '{{ config.python[-3:] }}'
      PYTHON_ARCH: '64'

{% endif %}{% endfor %}
init:
  - ps: echo $env:TOXENV
  - ps: ls C:\Python*
install:
  - python -u ci\
  - '%PYTHON_HOME%\Scripts\virtualenv --version'
  - '%PYTHON_HOME%\Scripts\easy_install --version'
  - '%PYTHON_HOME%\Scripts\pip --version'
  - '%PYTHON_HOME%\Scripts\tox --version'
test_script:
  - '%WITH_COMPILER% %PYTHON_HOME%\Scripts\tox'

on_failure:
  - ps: dir "env:"
  - ps: get-content .tox\*\log\*
artifacts:
  - path: dist\*

### To enable remote debugging uncomment this (also, see:
# on_finish:
#   - ps: $blockRdp = $true; iex ((new-object net.webclient).DownloadString(''))

If you've been patient enough to read through that you'll notice:

  • The Travis configuration uses tox for each item in the matrix. This makes testing in Travis consistent with testing locally.
  • The environment order for tox is clean, check, 2.6-1.3, 2.6-1.4, ..., report.
  • The environments with coverage measurement run the code without installing (usedevelop = true) so that coverage can combine all the measurements at the end.
  • The environments without coverage will sdist and install into virtualenv (tox's default behavior [2]) so that packaging issues are caught early.
  • The report environment combines all the runs at the end into a single report.

Having the complete list of environments in tox.ini is a huge advantage:

  • You can run everything in parallel locally (if your tests don't need strict isolation) with detox. And you can still run everything in parallel if you want to use something else instead of Travis.
  • You can measure cumulative coverage for everything (merge the coverage measurements for all the environments into a single one) locally.

Test coverage

There's Coveralls - a nice way to track coverage over time and over multiple builds. It will automatically add comments on GitHub pull requests about changes in coverage.


  • Put code in src.
  • Use tox and detox.
  • Test both with coverage measurements and without.
  • Use a generator script for tox.ini and .travis.yml.
  • Run the tests in Travis with tox to keep things consistent with local testing.

Too complicated? Just use a python package template.

Not convincing enough? Read Hynek's post about the src layout.

Also worth checking out this short list of packaging pitfalls.

[1] There's subunit and probably others, but they are not widely used.
[2] See example.
[3] There is a feature specification/proposal in tox for multi-dimensional configuration, but it still doesn't solve the problem of generating the .travis.yml file. There's also tox-matrix, but it's not flexible enough.
[4] cookiecutter-pypackage is acceptable at the surface level (tests outside, correct MANIFEST) but still has the core problem (lack of src separation) and gives the wrong idea to glancing users.
[5] It's a chicken-and-egg problem: how can pip know what dependencies to install if running the setup script requires unknowable dependencies?

    There are so many weird corners you can get into by having the power to run arbitrary code in the setup script. This is why people tried to change to pure metadata.

[6] Did you know the order of the rules in the manifest matters?
[*] PEP-20's 5th aphorism: Flat is better than nested.
