The problem with packaging in Python

24 February 2015 (updated 04 March 2015)

Packaging is currently too hard in Python, and while there's effort to improve it, it's still largely focused on the problem of installing. The current approach is to just throw docs and specs at the building part: [2]

Drown it in docs.

Let's make docs! Must be poorly documented if no one understands it.

Why do we need a damn mountain of docs? Because when building a distribution the user experience is like this:

A screenshot of Microsoft Word at its worst

Thanks for asking, Mr. Clippy, I'd like to package code without going mad.

There are so many things going on in setup.py:

  • Do you use py_modules or packages?
  • Do you hardcode the lists for py_modules or packages? Do you use setuptools.find_packages? What are the right arguments?
  • What about package_dir?
  • Do you want to distribute files that aren't code? Tough luck: more buttons!
    • Do you use a MANIFEST?
    • Or maybe a MANIFEST.in is better? What the hell do I put in there? There's include, recursive-include, global-include, graft. Where do I need exclude, recursive-exclude, global-exclude or prune?
    • How about data_files or package_data?
    • What about include_package_data?
No one is going to read the list above, let alone understand what everything means!

We don't need a goddamn mountain of docs, we need something that's so simple even a monkey could publish a decent distribution on PyPI. But that means cutting down features ...

The perspective problem

There are lots of improvements made in PEP-376, PEP-345, PEP-425, PEP-427 and PEP-426, but they are all improvements that allow tools like pip to work better. They still don't make my life easier, as a packager - the user of setuptools or distutils.

Don't get me wrong, it's good that we got those, but I think there should be some focus on making a simpler packaging tool: an alternative to setuptools/distutils that has fewer features and more constraints but is way easier to use. Sure, anyone can try to make something like that, but if it's not officially sanctioned it's going to have very limited success.

It has been tried before

There have been attempts to replace the unholy duo [1] we have now but alas, the focus was wrong. There have been two directions of improvement:

  • Internals: better architecture to make the build tools more maintainable/extensible/whatever. Distutils2 was the champion of this approach.
  • Metadata as configuration: the "avoid code" mantra. Move the metadata into a configuration file, and avoid the crazy problems that usually happen when you let users put code in setup.py. Distutils2 championed this idea and it lives on today through d2to1.

However, the way code and data files are collected didn't change. As a packager, you still have to deal with the dozen confusing buttons. [3]

d2to1 is not better in this regard. In fact, it's worse because you have to hardcode metadata and there's no automatic discovery for whatever you're trying to package. [4]

The current course

PEP-426 will open up the possibility of custom build systems, something other than setuptools, that could hypothetically solve all sorts of niche problems like C extensions with unusual dependencies.

What I dream of

What if there were a build system just for pure-Python distributions (and maybe some C extensions with no dependencies)? Something that has some strong conventions: code in this place, docs in that place - no exceptions. Something like cargo has. Maybe with a nice project scaffold generator.

Of course, anyone can say: PEP-426 lets you build whatever you want, just do it! However, to make something really simple to use some conventions need to be broken, and if you want to convert your project some effort would be needed. You see, if it's not officially sanctioned it's not going to pick up. Death by lack of interest.

And if it doesn't pick up, then the vast majority of packagers are going to stick with the complicated setup.py we have now.

In a way, packaging in Python is a victim of bad habits - complex conventions, and feature bloat. It's hard to simplify things, because of all the historical baggage people want to carry around. But if there's some official sanctioning then it's easier to accept the hard changes.

Concretely what I want is along these lines:

  • Get rid of py_modules, packages and package_dir. Just discover automatically whatever you have in a src dir.
  • Get rid of MANIFEST, MANIFEST.in and the baffling trio of package_data, data_files and include_package_data. Just take all the files that are inside packages. Use .gitignore to exclude files.
  • Have a single way to store and retrieve metadata like the version in your code. Not a handful of ways.
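
To give a rough idea of how little is needed once the conventions are fixed, here is a minimal sketch using only the standard library. It is not an existing tool: the function names (discover, package_files, read_version), the src default and the __version__ convention are all assumptions made up for illustration, and the .gitignore filtering is left out entirely.

```python
import ast
from pathlib import Path


def discover(src="src"):
    """Find top-level modules and packages under src, by convention only."""
    root = Path(src)
    # src/foo.py is a top-level module named "foo".
    modules = sorted(p.stem for p in root.glob("*.py"))
    packages = []
    for init in root.rglob("__init__.py"):
        parts = init.parent.relative_to(root).parts
        # A directory counts as a package only if every level above it
        # (below src) also has an __init__.py.
        if parts and all(
            (root.joinpath(*parts[: i + 1]) / "__init__.py").is_file()
            for i in range(len(parts))
        ):
            packages.append(".".join(parts))
    return modules, sorted(packages)


def package_files(src="src"):
    """List every file inside the top-level packages (no .gitignore filtering)."""
    root = Path(src)
    _, packages = discover(root)
    files = []
    for name in packages:
        if "." in name:
            continue  # subpackages are covered by walking their top-level parent
        for path in (root / name).rglob("*"):
            if path.is_file():
                files.append(path.relative_to(root).as_posix())
    return sorted(files)


def read_version(init_file):
    """Read a literal __version__ assignment without importing the package."""
    tree = ast.parse(Path(init_file).read_text())
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    return None  # hypothetical convention: no literal __version__ found
```

With a layout like src/pkg/__init__.py plus src/pkg/data/thing.txt, discover() reports pkg and package_files() picks up the text file too, with nothing to hardcode. Parsing the version with ast instead of importing the package is one possible choice; it avoids running package code at build time.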

In other words, one way to do it. Not one clear way, cause we document the hell out of it, but one, and only one, way to do it. What do you think, could it work? Would it improve anything?

[1]Distutils and setuptools: the confusing system everyone loves to hate.
[2]There are a ton of places where you can find information about packaging, of various quality and freshness. At least now there's a sanctioned place to go to: https://packaging.python.org/en/latest/distributing.html

Still, there's so much to read. What if you didn't need to know so much to package stuff?

[3]Does this look familiar? It has mostly the same options as distutils's setup. Too many options. Still lots of trial and error to make a distribution.
[4]Hardcoding information that you already have in the filesystem is a sure way to make mistakes. More about this: Python packaging pitfalls.
