27 November 2017

Python packaging primer

There is a great initiative at the place where I currently work called Python Bunch. Every couple of weeks someone gives a talk related to Python. I have decided to give one on Python packaging, the most challenging thing in Python for me. Possibly more complex than metaprogramming and monads ;) This post is preparation for that talk, as I'm going to write down everything I would like to talk about. I'll start with the basics: what a package is, the basic elements of package configuration, and how to install dependencies. Next comes how to create a proper setup.py and how to define dependencies.

Foundation

The basic question is what a Python package really is. In short, it's a bunch of files that get installed when you run pip install, but for pip to make sense of this bunch there must be a proper structure. A Python package is in fact an archive containing required files, like setup.py, and optional ones, like setup.cfg and MANIFEST. The directory structure depends on the format used; eggs are different from wheels (these are package formats). The two formats lay out their directories differently, but the starting point for both is the same: the files I have just mentioned.
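
To make the "package is an archive" point concrete, here is a minimal sketch that lists the files inside one. It relies on two facts: wheels are zip archives and source distributions are usually gzipped tarballs. The helper names and the demo-0.1 layout are made up for illustration; the demo archive is built in memory so the sketch runs without downloading anything.

```python
import io
import tarfile
import zipfile


def list_package_files(archive_bytes):
    """Return the sorted file names stored in a package archive.

    Wheels are zip archives and source distributions are usually
    gzipped tarballs, so both container types are tried.
    """
    buffer = io.BytesIO(archive_bytes)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            return sorted(archive.namelist())
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:*") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def build_demo_sdist():
    """Build a tiny in-memory tarball that mimics a source distribution.

    The 'demo-0.1' project and its files are hypothetical placeholders.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in ("demo-0.1/setup.py", "demo-0.1/setup.cfg",
                     "demo-0.1/MANIFEST.in", "demo-0.1/demo/__init__.py"):
            payload = b"# placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_package_files(build_demo_sdist()):
        print(name)
```

Pointing list_package_files at the bytes of a real .whl or .tar.gz downloaded from PyPI should show the same kind of listing, with setup.py sitting next to the package directory in the source distribution.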

setup.py describes the package. To my knowledge it is the one file that is required to build a package. Details like name, version, repository URL, description, dependencies, category, etc. are defined in this file. PyPI uses this information to create a descriptive page listing, for example, the supported Python versions.

setup.cfg holds default options for setup.py commands like bdist or sdist. Other tools, such as bamp or pytest, may also use it to keep their configuration.
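
setup.cfg is a plain INI file, so the standard library can read it. The sketch below assumes a made-up file with two sections: [bdist_wheel], which the bdist_wheel command reads, and [tool:pytest], which pytest reads. The option values and the read_tool_options helper are illustrative only.

```python
import configparser

# Hypothetical setup.cfg content. The section names follow real
# conventions, the values are invented for this example.
SETUP_CFG = """
[bdist_wheel]
universal = 1

[tool:pytest]
addopts = --verbose
"""


def read_tool_options(text, section):
    """Return the options of one setup.cfg section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))


if __name__ == "__main__":
    print(read_tool_options(SETUP_CFG, "bdist_wheel"))
    print(read_tool_options(SETUP_CFG, "tool:pytest"))
```

Each tool only looks at its own section, which is why several of them can share one file without stepping on each other.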

MANIFEST is another list. If your package needs any non-code files, maybe documentation or .csv files with data, they should be added to this file. In practice you write it as a MANIFEST.in template.

Writing setup.py

In the old days writing a setup.py was a rather daunting task for me. It was either because I was pretty fresh to Python or because there was no tooling. The former rather than the latter ;) Today I would use the cookiecutter template for a Python package. It may be a bit overwhelming, but it's a good place to start if you are already familiar with cookiecutter. That, however, is for people starting from scratch. If you already have a package and would like to upgrade your setup, use the pyroma tool to rate your packaging skills. pyroma will also tell you what is missing from your package definition, so it helps to iron out the kinks. Another approach is to look at a couple of widely used, popular packages like flask, requests or django and base your file on theirs. Be aware that they most probably include things your package does not need, but it's worth looking at the arguments passed into setup() anyway.

As an example of pyroma usage, here is the grade of one of the packages I have written.

majki@enchilada ~/projects/priv
% pyroma bamp
------------------------------
Checking bamp
Found bamp
------------------------------
Your package does not have keywords data.
------------------------------
Final rating: 9/10
Cottage Cheese
------------------------------

 
It's not a bad rating, and here is the villain: the setup.py that earned it. It is also a good starting point, despite a bit of bloat you most probably won't need.

# -*- coding: utf-8 -*-
from setuptools import setup, find_packages

setup(
    name='bamp',
    version='0.2.2',
    install_requires=['Click', 'dulwich', 'six'],
    entry_points='''
      [console_scripts]
      bamp=bamp.main:bamp
      ''',
    packages=find_packages(),
    long_description='Bamp version of your packages according to semantic versioning. Automagically create commits and tags.',
    include_package_data=True,
    zip_safe=True,
    description='Bamp version according to semantic versioning',
    author='Michał Klich',
    author_email='michal@michalklich.com',
    url='https://github.com/inirudebwoy/bamp',
    setup_requires=['pytest-runner'],
    tests_require=['pytest'],
    license='MIT',
    classifiers=['Development Status :: 4 - Beta', 'Environment :: Console',
                 'License :: OSI Approved :: MIT License',
                 'Operating System :: OS Independent',
                 'Programming Language :: Python :: 2',
                 'Programming Language :: Python :: 2.7',
                 'Programming Language :: Python :: 3',
                 'Programming Language :: Python :: 3.4',
                 'Programming Language :: Python :: 3.5',
                 'Topic :: Software Development :: Build Tools'])
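
The entry_points argument is the least obvious part of that file, so here is a rough sketch of what a console_scripts line such as bamp=bamp.main:bamp means: a command name, a module to import, and a callable to run. This is not how setuptools implements it internally; the two helper functions are hypothetical and only mimic the idea. The demo loads json:dumps from the standard library so that it runs without bamp installed.

```python
import importlib


def parse_console_script(line):
    """Split 'name=package.module:function' into its three parts."""
    name, _, target = line.partition("=")
    module_path, _, attribute = target.partition(":")
    return name.strip(), module_path.strip(), attribute.strip()


def load_entry_point(line):
    """Import the module and return the callable the script would run."""
    _, module_path, attribute = parse_console_script(line)
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


if __name__ == "__main__":
    print(parse_console_script("bamp=bamp.main:bamp"))
    # A standard-library target stands in for bamp.main:bamp here.
    dumps = load_entry_point("to-json=json:dumps")
    print(dumps({"ok": True}))
```

After installation, typing bamp in a shell ends up calling the bamp function from the bamp.main module, which is all the entry point declares.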


Defining dependencies

There is a good chance that you need to define a couple of dependencies when building a package. You may do it in a requirements.txt file, but if you have had a glimpse at online resources you may be convinced there is another place for dependencies. I'm referring to the install_requires argument passed into setup(), which accepts a list of strings defining package names and versions. I gotta say that treating the two as interchangeable is incorrect; the purposes of those two lists are quite different.

requirements.txt

In order to explain the difference I'll start by clarifying what requirements.txt is used for. Simply put, it is a line-by-line list of packages. Below is a plain example of a really short file.

click==6.7
six==1.10.0
dulwich==0.16.3

So it is a list. A list of names and versions, and it may look similar to something you have seen already. You are right, the list is in the format of pip freeze output. It is an exact reflection of your current environment, and there is a good reason for that. Some call these concrete dependencies, as they won't change and, when installed, give you the same setup each time. The key word a couple of sentences back is environment, and to me this file is very similar to a deployment definition. If you have written or even seen any deployment configuration, be it Docker or Ansible, you know Python packages are installed there by running pip install -r requirements.txt. Since each run of a deployment script should give you exactly the same result, these dependencies are called 'concrete'. To illustrate this further, requirements.txt can also specify the source from which packages are pulled, be it the public PyPI, a private on-premises version of PyPI, or a local directory. The three short files below pin the same packages but pull them from different sources.

# public PyPI
--index-url https://pypi.python.org/simple/

click==6.7
six==1.10.0
dulwich==0.16.3

# private PyPI
--index-url https://gizmo.biz/pypi/

click==6.7
six==1.10.0
dulwich==0.16.3

# local directory
--find-links=/local/dir/

click==6.7
six==1.10.0
dulwich==0.16.3
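
To show how little there is to such a file, here is a minimal sketch that splits it into pip options and pinned packages. It is a simplification I made up for this post, not pip's real parser: it only understands full-line comments, option lines starting with a dash, and name==version pins, and it rejects anything else.

```python
def parse_requirements(text):
    """Split requirements.txt content into pip options and pinned packages.

    Simplified on purpose: handles full-line comments, option lines
    (starting with '-') and 'name==version' pins only.
    """
    options, pins = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            options.append(line)
            continue
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError("not pinned with '==': " + line)
        pins[name.strip().lower()] = version.strip()
    return options, pins


EXAMPLE = """\
# public PyPI
--index-url https://pypi.python.org/simple/

click==6.7
six==1.10.0
dulwich==0.16.3
"""

if __name__ == "__main__":
    options, pins = parse_requirements(EXAMPLE)
    print(options)
    print(pins)
```

The ValueError for unpinned lines is the point of the exercise: a concrete list has an exact version for every entry.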

install_requires

install_requires holds the other list, the list of abstract dependencies. Items on this list should define the set of dependencies more loosely. What I mean by loosely is no exact versions, and definitely no links to package sources (private PyPI, a directory, etc.). In its simplest form it's just a bunch of names.

install_requires=['Click', 'dulwich', 'six']

You may include a minimal working version if you know that your library will not work with anything below it. If a dependency follows SemVer, you may also define an upper bound for the version if you know of any incompatible changes.

install_requires=['Click', 
                  'dulwich>=0.16', 
                  'six>=1,<2']
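
To see what a range like six>=1,<2 accepts, here is a deliberately simplified sketch. It only understands purely numeric versions and the basic comparison operators; the real rules live in PEP 440 and are considerably richer, so treat the satisfies helper as a hypothetical illustration rather than what pip does.

```python
import operator
import re

_OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def _as_tuple(version, width=4):
    """Turn '1.10.0' into (1, 10, 0, 0) so that '2' compares equal to '2.0.0'."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, requirement):
    """Check a purely numeric version against e.g. 'six>=1,<2'."""
    match = re.match(r"^\s*[A-Za-z0-9_.-]+\s*(.*)$", requirement)
    if match is None:
        raise ValueError("cannot parse requirement: " + requirement)
    clauses = [c.strip() for c in match.group(1).split(",") if c.strip()]
    for clause in clauses:
        parsed = re.match(r"^(>=|<=|==|!=|>|<)\s*([0-9]+(?:\.[0-9]+)*)$", clause)
        if parsed is None:
            raise ValueError("unsupported specifier: " + clause)
        compare = _OPERATORS[parsed.group(1)]
        if not compare(_as_tuple(version), _as_tuple(parsed.group(2))):
            return False
    return True


if __name__ == "__main__":
    for candidate in ("1.10.0", "2.0.0"):
        print(candidate, satisfies(candidate, "six>=1,<2"))
```

A bare name such as 'Click' has no clauses at all, so every version satisfies it, which is exactly what makes it the loosest form of a dependency.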

It is a flexible system, yet it still lets you specify a set of rules to follow when installing a package. Pinning versions or using dependency_links is not advised. Such dependencies and package-source requirements may be impossible to fulfill for a person fetching the code. It is frustrating to work offline with a cache of PyPI and have to modify the setup() call in order to even start development.
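
If you want to catch such entries before they frustrate somebody, a tiny check is enough. The sketch below is a hypothetical helper of mine, not part of setuptools or pyroma; it simply flags install_requires entries that pin an exact version with == or contain a URL.

```python
def find_concrete_entries(install_requires):
    """Return entries that pin an exact version or point at a URL."""
    flagged = []
    for requirement in install_requires:
        if "==" in requirement or "://" in requirement:
            flagged.append(requirement)
    return flagged


if __name__ == "__main__":
    abstract = ["Click", "dulwich>=0.16", "six>=1,<2"]
    concrete = ["click==6.7", "six==1.10.0"]
    print(find_concrete_entries(abstract))
    print(find_concrete_entries(concrete))
```

An empty result means the list is abstract in the sense used in this post; anything flagged probably belongs in requirements.txt instead.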

Hopefully after reading this you have realized, as I did at some point, that packaging is not that difficult. To be fair, it could be better, and it is getting better. Now you have the foundation to make a package that pyroma grades 10/10.

What cheese is your package?

Tags: setup.py packaging python