Environments
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How do you install and manage packages?
Objectives
Learn about virtual environments
One of Python’s biggest strengths is its ecosystem of packages which build upon and extend the capabilities offered by the standard library.
We’ll look at how:
- You should use virtual environments to manage the dependencies of different projects,
- You can install packages you want to build on using `pip`, and
- You can install Python applications you just want to use using `pipx`.
Creating and Activating/Deactivating a Virtual Environment
Whenever we’re developing or working in Python, the best practice is to use a “virtual environment” which isolates the packages used for a particular project.
Python 3 comes with the `venv` module built-in, which supports making virtual environments.
To make one, you call the module with:
python3 -m venv .venv
This creates a new directory `.venv` containing:
- `.venv/bin`, with:
  - a link to the Python version,
  - a link to the Python package installer `pip`, and
  - the activation scripts.
- `.venv/lib/site-packages`, where packages will be installed.
To activate the environment, source the activation script:
. .venv/bin/activate
Now `.venv/bin` has been added to your PATH, and usually your shell’s prompt will be modified to indicate you are “in” a virtual environment:
% . .venv/bin/activate
(.venv) %
Upgrade `pip`
Check the version of pip installed! If it’s old, you might want to run
pip install --upgrade pip
or, for Python 3.9 or later, you can add `--upgrade-deps` to the venv creation line.
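For example, on Python 3.9 or later the flag can be passed directly when creating the environment (a minimal sketch):

```bash
# create the venv and upgrade pip/setuptools inside it in one step
python3 -m venv .venv --upgrade-deps
```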
To “leave” the virtual environment, you undo those changes by running the deactivate function the activation added to your shell:
deactivate
The prompt will revert:
(.venv) % deactivate
%
Alternatives
There are several alternatives that provide the same experience, but offer some speed or update benefits. The installable `virtualenv` package provides the same interface as the built-in `venv`, but is faster, has more options, and has an up-to-date embedded pip.

Another alternative is `uv`, which is written in Rust, and `uv venv` is actually faster than Python can even start up, though it doesn’t install pip by default (since you are supposed to use `uv pip`).

Use the tool you prefer; the resulting venv works identically.
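As a minimal sketch of the `uv` workflow (`numpy` here is just an illustrative package):

```bash
uv venv .venv          # create the environment (no pip installed inside it by default)
. .venv/bin/activate
uv pip install numpy   # install packages through uv's pip-compatible interface
```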
What about conda?
The same concerns apply to Conda. You should always make separate environments and use those. Quick tips:
conda config --set auto_activate_base false  # turn off the default environment
conda env create -n some_name                # or use paths with `-p`
conda activate some_name
conda deactivate
Alternative implementations of `conda` are available and may be faster, like:
- Micromamba, a single binary, also written in C++,
- Pixi, written in Rust.

Mamba, written in C++, became very popular as its package resolver was much faster than the default in `conda`. In 2023, `conda` incorporated the `libmamba` package resolver as its default, largely eliminating the speed difference between `conda` and `mamba`.
Installing Packages
To install a package, first make sure the environment is activated using `. .venv/bin/activate`, then call:
pip install <package>
Install `numpy`
- Install the numpy package into your virtual environment.
- Test it by opening a Python session and running `import numpy as np` then `np.arange(15).reshape(3, 5)`.

Solution
% . .venv/bin/activate
(.venv) % pip install numpy
(.venv) % python
Python 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.arange(15).reshape(3, 5)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> exit()
(.venv) %
Installing Packages in the “base” and “user” environment
Be careful installing packages without an activated virtual environment.
You will see two very common recommendations when installing a package, neither of which you should use unless you know what you’re doing:
pip install <package> # Use only in virtual environment!
Unless you’ve activated a virtual environment, this will try to install globally, and if you don’t have permission, will install to your user site packages. In global site packages, you can get conflicting versions of libraries, you can’t tell what you’ve installed for what, packages can update and break your system; it’s a mess. This is the “update problem”.
pip install --user <package> # Almost never use
This will install in your user directory. This is even worse, because all installs of Python on your computer share it, so you might override and break things you didn’t intend to. And with pip’s new smart solver, updating packages inside a global environment can take many minutes and produce unexpected solves that are technically “correct” but don’t work because it backsolved conflicts to versions from before issues were discovered.
There is a solution: virtual environments (libraries) or pipx (applications).
There are likely a few libraries (ideally just `pipx`) that you just have to install globally. Go ahead, but be careful, and always use your system package manager instead if you can, like `brew` on macOS, `winget` or `scoop` on Windows, or `apt-get` on Ubuntu.
Installing Applications
There are many Python packages that provide a command line interface and are not really intended to be imported (`pip`, for example, should not be imported). It is really inconvenient to have to set up venvs for every command line tool you want to install, however. `pipx`, from the makers of `pip`, solves this problem for you.
If you `pipx install` a package, it will be created inside a new virtual environment, and just the executable scripts will be exposed in your regular shell.
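For example (a minimal sketch, assuming pipx is already installed; `cowsay` is just an illustrative package that exposes a command of the same name):

```bash
pipx install cowsay     # builds an isolated venv and exposes the `cowsay` command
cowsay -t "moo"
```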
Pipx also has a `pipx run <package>` command, which will download a package and run a script of the same name, and will cache the temporary environment for a week. This means you have all of PyPI at your fingertips in one line on any computer that has pipx installed!
Install `pipx` and `cowsay`
- Install `pipx` by following the installation instructions.
- Test it by running `pipx run cowsay -t "venvs are the foo\!"`

Solution
Ubuntu Linux:
sudo apt update
sudo apt install pipx
pipx ensurepath
macOS:
brew install pipx
pipx ensurepath
Then:
% pipx run cowsay -t "venvs are the foo\!"
  __________________
| venvs are the foo! |
  ==================
                   \
                    \
                      ^__^
                      (oo)\_______
                      (__)\       )\/\
                          ||----w |
                          ||     ||
Key Points
Virtual environments isolate software
Virtual environments solve the update problem
Code to Package
Overview
Teaching: 20 min
Exercises: 5 min
Questions
How do we take code and turn that into a package?
What are the minimum elements required for a Python package?
How do you set up tests?
Objectives
Create and install a Python package
Create and run a test
Much research software is initially developed by hacking away in an interactive setting, such as in a Jupyter Notebook or a Python shell. However, at some point when you have a more-complicated workflow that you want to repeat, and/or make available to others, it makes sense to package your functions into modules and ultimately a software package that can be installed. This lesson will walk you through that process.
Check Setup
- Ensure you’re in an empty git repository (see Setup for details).
- Ensure you’ve created and activated your virtual environment (see Environment for details):
. .venv/bin/activate
Consider the `rescale()` function written as an exercise in the Software Carpentry Programming with Python lesson.
Install NumPy:
pip install numpy
Then, in a Python shell or Jupyter Notebook, declare the function:
import numpy as np
def rescale(input_array):
"""Rescales an array from 0 to 1.
Takes an array as input, and returns a corresponding array scaled so that 0
corresponds to the minimum and 1 to the maximum value of the input array.
"""
low = np.min(input_array)
high = np.max(input_array)
output_array = (input_array - low) / (high - low)
return output_array
and call the function with:
rescale(np.linspace(0, 100, 5))
which provides the output:
array([ 0. , 0.25, 0.5 , 0.75, 1. ])
Create a minimal package
Let’s create a Python package that contains this function.
Create the necessary directory structure for your package. This includes:
- a `src` (“source”) directory, which will contain another directory called `example_package_YOUR_USERNAME_HERE` for the source files of your package itself,
- a `tests` directory, which will hold tests for your package and its modules/functions,
- a `docs` directory, which will hold the files necessary for documenting your software package.
$ mkdir -p src/example_package_YOUR_USERNAME_HERE tests docs
(The `-p` flag tells `mkdir` to create the `src` parent directory for `example_package_YOUR_USERNAME_HERE`.)
Package naming
The PEP8 style guide recommends short, all-lowercase package names. The use of underscores is also discouraged.
It’s a good idea to keep package names short so that they are easier to remember and type. We are straying from this convention in this tutorial to prevent naming conflicts.
Directory Structure
Putting the package directory and source code inside the `src` directory is not actually required.
- If you put the `<package_name>` directory at the same level as `tests` and `docs`, then you could actually import or call the package directory from that location.
- However, this can cause several issues, such as running tests with the local version instead of the installed version.
- In addition, the `src/` package structure matches that of compiled languages, and lets your package easily contain non-Python compiled code, if necessary.
Inside `src/example_package_YOUR_USERNAME_HERE`, create the files `__init__.py` and `rescale.py`:

- `__init__.py` is required to import this directory as a package, and should remain empty (for now):
$ touch src/example_package_YOUR_USERNAME_HERE/__init__.py
- `rescale.py` is the module inside this package that will contain the `rescale()` function:
$ touch src/example_package_YOUR_USERNAME_HERE/rescale.py

Copy the `rescale()` function into the `rescale.py` file. (Don’t forget the NumPy import!)
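After copying, `rescale.py` should contain the same function we declared interactively above, plus the import it needs:

```python
# src/example_package_YOUR_USERNAME_HERE/rescale.py
import numpy as np


def rescale(input_array):
    """Rescales an array from 0 to 1.

    Takes an array as input, and returns a corresponding array scaled so that 0
    corresponds to the minimum and 1 to the maximum value of the input array.
    """
    low = np.min(input_array)
    high = np.max(input_array)
    output_array = (input_array - low) / (high - low)
    return output_array
```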
The last element your package needs is a `pyproject.toml` file. Create this with
$ touch pyproject.toml
and then provide the minimally required metadata, which includes:
- information about the package itself (`name`, `version` and `dependencies`), and
- the build system (hatchling):
# contents of pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "example-package-YOUR-USERNAME-HERE"
version = "0.1.0"
dependencies = [
"numpy"
]
The package name given here, “example-package-YOUR-USERNAME-HERE”, corresponds to the directory `src/example_package_YOUR_USERNAME_HERE` that contains the code.
We’ve chosen 0.1.0 as the starting version for this package; you’ll see more in a later episode about versioning,
and how to specify this without manually writing it here.
Build Backend
The build backend is a program which will convert the source files into a package ready for distribution. It determines how your project will specify its configuration, metadata and files. You should choose one that suits your preferences.
We selected a specific backend, hatchling. It has a good default configuration, is fast, has some powerful features, and supports a growing ecosystem of plugins. There are other backends too, including ones for compiled projects.
Minimal Working Package
The only elements of your package strictly required to install and import it are the `pyproject.toml`, `__init__.py`, and `rescale.py` files.
At this point, your package’s file structure should look like this:
.
├── docs/
├── pyproject.toml
├── src/
│   └── example_package_YOUR_USERNAME_HERE/
│       ├── __init__.py
│       └── rescale.py
├── tests/
└── .venv/
Installing and using your package
Now that your package has the necessary elements, you can install it into your virtual environment (which should already be active). From the top level of your project’s directory, enter:
$ pip install --editable .
The `--editable` flag tells `pip` to install in editable mode, meaning that you can continue developing your package on your computer as you test it. Note, you will often see the short option `-e`.
Then, in a Python shell or Jupyter Notebook, import your package and call the (single) function:
import numpy as np
from example_package_YOUR_USERNAME_HERE.rescale import rescale
rescale(np.linspace(0, 100, 5))
array([0. , 0.25, 0.5 , 0.75, 1. ])
This matches the output we expected based on our interactive testing above! 😅
Your first test
Now that we have installed our package and we have manually tested that it works, let’s set up this situation as a test that can be automatically run using `pytest`.
In the `tests` directory, create the `test_rescale.py` file:
touch tests/test_rescale.py
In this file, we need to import the package, and check that a call to the `rescale` function with our known input returns the expected output:
# contents of tests/test_rescale.py
import numpy as np
from example_package_YOUR_USERNAME_HERE.rescale import rescale
def test_rescale():
np.testing.assert_allclose(
rescale(np.linspace(0, 100, 5)),
np.array([0.0, 0.25, 0.5, 0.75, 1.0]),
)
To use pytest to run this (and other tests), we need to install `pytest`. Here we add it to the `pyproject.toml` file, by adding a block below the dependencies with an “extra” called `test`:
...
dependencies = [
"numpy"
]
[project.optional-dependencies]
test = ["pytest"]
[build-system]
...
You install the project with the “extra” using:
pip install --editable ".[test]"
You can run the tests using `pytest`:
(.venv) % pytest
======================== test session starts ========================
platform darwin -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0
rootdir: /Users/john/Developer/packaging-example
configfile: pyproject.toml
collected 1 item
tests/test_rescale.py . [100%]
========================= 1 passed in 0.14s =========================
This tells us that the output of the test function matches the expected result, and therefore the test passes! 🎉
Commit and Push
Don’t forget to commit all your work using `git`.
(In the rest of the lesson, we won’t remind you to do this,
but you should still make small commits regularly.)
Commit the relevant files, first the code:
git add src/example_package_YOUR_USERNAME_HERE/{__init__,rescale}.py
git add tests/test_rescale.py
git commit -m "feat: add basic rescaling function
… then the metadata:
git add pyproject.toml
git commit -m "build: add minimal pyproject.toml"
… then push those to the `origin` remote repository.
git push origin main
Always `git add` individual files until you’ve set up your `.gitignore`

When working with git, it’s best to
- stage individual files using `git add FILENAME ...`,
- check what you’re about to commit using `git status`, before you
- commit with `git commit -m "COMMIT MESSAGE"`.

You can also use a graphical tool which makes it easy to see at a glance what it is you’re committing.

Adding a `.gitignore` file (which we’ll cover later) will help avoid inadvertently committing files like the virtual environment directory, and is a prerequisite for using `git commit -a`, which commits everything.
Conventional commits
In this example, we use conventional commit messages which look like `<type>: <description>`.

Each commit should do one and only one “thing” to the code, for instance:
- add a new feature (type: `feat`), or
- fix a bug (type: `fix`), or
- rename a function (type: `refactor`), or
- add documentation (type: `docs`), or
- change something affecting the build system (type: `build`).

By doing only one thing per commit, it’s easier to:
- write the commit message,
- understand the history by looking at the `git log`, and
- revert individual changes you later decide could be done in a better way.
Check your package
Check that you can install your package and that it works as expected.
If everything works, you should be able to install your package (in a new virtual environment):
python3 -m venv .venv2
. .venv2/bin/activate
python3 -m pip install git+https://github.com/<your github username>/example-package-YOUR-USERNAME-HERE
Open a python console and call the rescale function with some data.
Switch back to the original virtual environment before going onto the next lesson:
. .venv/bin/activate
You now have a package that:
- is installed in editable mode in an isolated environment,
- can be interacted with in tests and in an interactive console, and
- has a passing test.
Next, we’ll look at other files that should be included with your package.
Key Points
Put your code and tests in a standard package structure
Use a `pyproject.toml` file to describe a Python package
Other files that belong with your package
Overview
Teaching: 15 min
Exercises: 0 min
Questions
What other files are important parts of your software package?
Objectives
Create a README for a software package
Add a software LICENSE to a software package
Create a CHANGELOG for a package
We now have an installed, working Python package that provides some functionality. Are we ready to push the code to GitHub (or your preferred code hosting service) for others to use and contribute to? 🛑 Not quite—we need to add a few more files at minimum to describe our package, and to actually make it open-source software.
Aside from the name of the package and docstring included with the (single) function, we haven’t yet provided any description or other information about the package for anybody that comes across it.
We also haven’t specified the terms and conditions under which the software may be downloaded, used, and/or modified. This means that if we posted it online right now, due to copyright laws (in the United States, at least) nobody else would actually be able to use or modify the code, since we haven’t given explicit permission to do so.
Lastly, as you continue working on your package, you will likely fix bugs and modify/add/remove functionality. Although these changes will technically be present in your Git logs—because you are committing regularly and writing descriptive commit messages, right? 😉—you should also maintain a file that describes these changes in a human-readable way.
Creating a README
A README is a plaintext file that sits at the top level of your package (next to the `src`, `tests`, and `docs` directories and the `pyproject.toml` file) and provides general information about your software.
Modern READMEs are typically written in Markdown, or occasionally reStructuredText (ReST), due to the additional formatting options that services like GitHub nicely render.
A README is a form of software documentation, and should contain at minimum:
- the name of your software package
- a brief description of what your software does or provides
- installation instructions
- a brief usage example
- the type of software license (with more information in a separate `LICENSE` file, described next)
In addition, a README may also contain:
- badges near the top that quickly show key information, such as the latest version and whether the tests are currently passing
- information about how people can contribute to your package
- a code of conduct for people interacting around your project (in GitHub Issues or Pull Requests, for example)
- contact information for authors and/or maintainers
Create a README using
$ touch README.md
and then add these elements:
# Example Package YOUR USERNAME HERE
`example-package-YOUR-USERNAME-HERE` is a simple Python library that contains a single function for rescaling arrays.
## Installation
Download the source code and use the package manager [pip](https://pip.pypa.io/en/stable/) to install `package`:
```bash
pip install .
```
## Usage
```python
import numpy as np
from example_package_YOUR_USERNAME_HERE.rescale import rescale
# rescales over 0 to 1
rescale(np.linspace(0, 100, 5))
```
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
## License
TBD
You can see more guidance on creating READMEs at https://www.makeareadme.com.
Keep your READMEs relatively brief
You should try to keep your README files relatively brief, rather than including very detailed documentation in this file. This should only be a high-level introduction, with detailed theory, examples, and other information reserved for a true documentation website.
Choosing a software license
Now, your package includes a README file, which tells someone who finds the source code a bit about how to use your project and also contribute to it. However, you still need one more element before uploading it to GitHub or another code-hosting service: a software license that explicitly gives specific permissions to users and contributors. Simply making your project available publicly is not the same as making it an open-source software project.
By default, when you make a creative work such as software (but also including writing and images), your work is under exclusive copyright. Others cannot use, copy, share/distribute, or modify your work without your permission. This is often a good thing, because it means you can put your work out into the world, and copyright protects you as the creator and owner of the work. Open Source Guides has more about the legal side of open source software.
However, if you have created research software and plan to share it openly, you want others to use your software, and possibly contribute to it. (Who doesn’t love having other people fix the bugs in their code?)
A software license provides the explicit permissions for others to use, modify, or share your code, and lays out the specific rules for any restrictions about how they can do those things. To pick a license, use resources like Choose a License or Civic Commons “Choosing a License” based on how you want others to interact with your software. You can also see the full list of open-source licenses approved by the Open Source Initiative, which maintains the Open Source Definition.
For a new project, you essentially have one major choice to make:
- Do you want to allow others to use your software in almost any way they want, or
- Do you want to require others to share any uses of your project in an open way?
These two categories are “permissive” and “copyleft” licenses. Common permissive licenses include the MIT License and BSD 3-Clause License. The GNU General Public License v3.0 (or GNU GPLv3) License is a common copyleft license.
Most research software uses permissive licenses like the:
- BSD 3-Clause License which includes a specific clause preventing the names of creators/contributors from being used to endorse or promote derivatives, without permission,
- MIT License, or the
- Apache License 2.0.
In addition, when working on a project with others or as part of a larger effort, you should check if your collaborators have already determined an appropriate license; for example, on work funded by a grant, a particular license may be mandated by the proposal/agreement.
Create a `LICENSE` file using
$ touch LICENSE
and copy the exact text of the license you chose, modifying only the year and names.
For instance, for the BSD 3-Clause License:
BSD 3-Clause License
Copyright (c) [year], [fullname]
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
That’s it!
Do not write your own license!
You should never try to write your own software license, or modify the text of an existing license.
Although we are not lawyers, the licenses approved and maintained by the Open Source Initiative have gone through a rigorous review process, including legal review, to ensure that they are both consistent with the Open Source Definition and also are legally valid.
Adding a license badge to README
Badges are a fun and informative way to quickly show information about your software package in the README. Shields.io is a resource for generating badge images in SVG format, which can easily be added to the top of your README as links pointing to more information.
Using Shields.io (or an example found elsewhere), generate the Markdown syntax for adding a badge describing the BSD 3-Clause License we chose for this example package.
Solution
The Markdown syntax for adding a badge describing the BSD 3-Clause License is:
[![License](https://img.shields.io/badge/license-BSD-green.svg)](https://opensource.org/licenses/BSD-3-Clause)
Keeping a CHANGELOG
Over time, our package will likely evolve, whether through bug fixes, improvements, or feature changes. For example, the `rescale` function in our package does not have a way of properly treating cases where the max and min of the array are the same (i.e., when the array holds the same number repeated). For example:
import numpy as np
from example_package_YOUR_USERNAME_HERE.rescale import rescale
a = 2 * np.ones(5)
rescale(a)
gives
rescale.py:11: RuntimeWarning: invalid value encountered in divide
output_array = (input_array - low) / (high - low)
array([nan, nan, nan, nan, nan])
This is probably not the desired output; instead, let’s say we want to rescale all the values in this array to 1. We can modify the function to properly handle this situation:
def rescale(input_array):
"""Rescales an array from 0 to 1.
Takes an array as input, and returns a corresponding array scaled so that 0
corresponds to the minimum and 1 to the maximum value of the input array.
"""
low = np.min(input_array)
high = np.max(input_array)
if np.allclose(low, high):
output_array = input_array / low
else:
output_array = (input_array - low) / (high - low)
return output_array
Now, when we call rescale (no need to reinstall or upgrade the package, since we previously installed using editable mode):
import numpy as np
from example_package_YOUR_USERNAME_HERE.rescale import rescale
a = 2 * np.ones(5)
rescale(a)
we get the desired behavior:
array([1., 1., 1., 1., 1.])
Great! Let’s commit that change using Git with a descriptive message, and perhaps update the version to 0.1.1 to indicate the package has changed (more to come on that in a later episode on versioning).
That may be enough for us to record the change, but how will a user of your package know that the functionality has changed? It’s not exactly easy to hunt through Git logs and try to find which commit message(s) align with the changes since the last version.
Instead, we can keep a changelog in a `CHANGELOG.md` file, also at the top level of your package’s directory. In this Markdown-formatted file, you should record major changes to the package made since the last released version. Then, when you decide to release a new version, you add a new section to the file above this list of changes.
Changes should be grouped together based on the type; suggestions for these come from the Keep a Changelog project by Olivier Lacan:
- `Added` for new features,
- `Changed` for changes in existing functionality,
- `Deprecated` for soon-to-be removed features,
- `Removed` for now-removed features,
- `Fixed` for any bug fixes, and
- `Security` in case of vulnerabilities.
For example, our initial release was version 0.1.0, and we have now changed the functionality. Our CHANGELOG should look something like:
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Unreleased
### Changed
- rescale function now scales constant arrays to 1
## [0.1.0] - 2022-08-09
### Added
- Created rescale() function and released example_package_YOUR_USERNAME_HERE
If at this point you want to increment the version to 0.1.1 to indicate this small fix to the behavior, you would add a new section for this version:
## Unreleased
## [0.1.1] - 2022-08-10
### Changed
- rescale function now scales constant arrays to 1
## [0.1.0] - 2022-08-09
### Added
- Created rescale() function and released example_package_YOUR_USERNAME_HERE
Note that the version numbers are shown as links in these examples, although the links are not included in the file snippets. You should add definitions of these links at the bottom of the file, using (for example) GitHub’s ability to compare between tagged versions:
[0.1.1]: https://github.com/<username>/example-package-YOUR-USERNAME-HERE/compare/v0.1.0...v0.1.1
Other locations
Once we have things inside the `docs/` folder, `docs/changelog.md` is also a good place, and keeps your outer directory a bit cleaner. But we’ll put it here for now so that we don’t interfere with that lesson.
Automating Changelog Management
There are several tools which are intended to help manage a changelog. Broadly speaking, they use specifically formatted commit messages to generate the changelog. They usually impose a specific way of working, require consistency and discipline when writing the messages, and often need manual tweaking of the changelog after generation.
commitizen, “conventional commits” and SemVer
`commitizen` is a release management tool which helps developers to write conventional commits and can generate a grouped and sorted changelog (`commitizen changelog`). Generating a new release, tagging it with an updated version number, and generating a changelog can be automated using GitHub Actions.

This requires:
- squashing each change into a single commit,
- consistency and discipline when writing commit messages.
GitHub
GitHub can “generate release notes” with each release of code, which is a list of the titles of the pull requests included in the release. People can view the release notes on GitHub.
These release notes:
- only appear on GitHub,
- aren’t automatically grouped and sorted,
- only include changes which were made through a pull request (using its title).
Additional files for Git
At this point, your package has most of the supplemental files that it needs to be shared with the world. However, there are some additional files you can add to help with your Git workflow.
.gitignore
After adding and committing the files above, you might have noticed that `git status` points out a few files/directories that you do not want it to track:
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
__pycache__/
src/example_package_YOUR_USERNAME_HERE/__pycache__/
tests/__pycache__/
nothing added to commit but untracked files present (use "git add" to track)
Fortunately, you can instruct Git to ignore these files, and others that you will never want to track, using a `.gitignore` file, which goes at the main directory level of your package. This file tells Git to ignore either specific files or directories, or those that match a certain pattern via the wildcard character (e.g., `*.so`). The Git reference manual has very detailed documentation of possible `.gitignore` file syntax, but for convenience GitHub maintains a collection of `.gitignore` files for various languages and tools.
For your project, you should copy or download the Python-specific `.gitignore` file into a local `.gitignore`:
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
You can see that a few patterns are commented out, and can be uncommented if they apply to your project and/or workflow. You can also clean up sections of the file that do not apply to your situation, but there’s no real need to do so since you likely won’t look at this file again.
Once you have added this to the top level of your project (alongside the `.git` directory), and told Git to track it (`git add .gitignore`, `git commit -m 'adds gitignore file'`), Git will automatically begin following your rules:
$ git status
On branch main
nothing to commit, working tree clean
Summary
At this point, if you have added all of these files, your package’s file structure should look something like this:
.
├── .git/
├── .venv/
├── .gitignore
├── docs/
├── pyproject.toml
├── src/
│   └── example_package_YOUR_USERNAME_HERE/
│       ├── __init__.py
│       └── rescale.py
└── tests/
    └── test_rescale.py
The `.git` and `.venv` directories would have been automatically generated by Git and Virtualenv respectively. You may also see additional directories like `__pycache__` and `.pytest_cache`.
Key Points
Packages should include a README, LICENSE, and CHANGELOG.
Choose an existing software license
You can also include a `.gitignore` file to avoid committing non-source files.
Metadata
Overview
Teaching: 10 min
Exercises: 5 min
Questions
What metadata is important to add to your package?
How do I add common functionality, like executable scripts?
Objectives
Learn about the project table.
In a previous lesson, we left the metadata in our `pyproject.toml` quite minimal, just:
- a name and
- a version.
There are quite a few other fields that can really help your package on PyPI, however. We’ll look at them, split into categories:
- Informational: author, description, URL, etc.
- Functional: requirements, tool configurations etc.
There’s also a special `dynamic` field that lets you list values that are going to come from some other source.
Informational metadata
Name
Required. `.`, `-`, and `_` are all equivalent characters, and may be normalized to `_`. Case is unimportant. This is the only field that must exist statically in this table.
name = "some_project"
Version
Required. Many backends provide ways to read this from a file or from a version control system, so in those cases you would add `"version"` to the `dynamic` field and leave it off here.
version = "1.2.3"
version = "0.2.1b1"
Description
A string with a short description of your project.
description = "This is a very short summary of a very cool project."
Readme
The name of the readme. Most of the time this is `README.md` or `README.rst`, though there is a more complex mechanism if a user really desires to embed the readme into your `pyproject.toml` file (you don’t).
readme = "README.md"
readme = "README.rst"
Authors and maintainers
This is a list of authors (or maintainers) as (usually inline) tables. A TOML table is very much like a Python dict.
authors = [
{name="Me Myself", email="email@mail.com"},
{name="You Yourself", email="email2@mail.com"},
]
maintainers = [
{name="It Itself", email="email3@mail.com"},
]
Note that TOML supports two ways to write tables and two ways to write arrays, so you might see this in a different form, but it should be recognizable.
Keywords
A list of keywords for the project. This is mostly used to improve searchability.
keywords = ["example", "tutorial"]
URLs
A set of links to help users find various things for your code; some common ones are `Homepage`, `Source Code`, `Documentation`, `Bug Tracker`, `Changelog`, `Discussions`, and `Chat`. It’s a free-form name, though many common names get recognized and have nice icons on PyPI.
# Inline form
urls."Source Code" = "https://github.com/<your github username>/example-package-YOUR-USERNAME-HERE"
# Sectional form
[project.urls]
"Source Code" = "https://github.com/<your github username>/example-package-YOUR-USERNAME-HERE"
Classifiers
This is a collection of “classifiers”. You select the classifiers that match your project from https://pypi.org/classifiers/. Usually, this includes a “Development Status” to tell users how stable you think your project is, and a few things like “Intended Audience” and “Topic” to help with search engines. There are some important ones, though: the “License ::” classifiers are used to indicate your license. You can also give an idea of supported Python versions, Python implementations, and operating systems. If you have statically typed Python code, you can tell users about that, too.
[project]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: BSD License",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Information Analysis",
"Topic :: Scientific/Engineering :: Mathematics",
"Topic :: Scientific/Engineering :: Physics",
"Typing :: Typed",
"Private :: Do Not Upload",
]
Prevent Inadvertent Publishing
By adding the “Private :: Do Not Upload” classifier here, we ensure that the package will be rejected when we try to upload it to PyPI. If you want to upload to PyPI, you will need to remove that classifier.
License
There are three ways to include your license:
- The preferred way to include a standard license is to include a classifier starting with “License ::”:
  [project]
  classifiers = [
      "License :: OSI Approved :: BSD License",
  ]
- The other way to include a standard license is to put its name in the `license` field:
  [project]
  license = {text = "MIT License"}
- You may also put the license in a file named `LICENSE` or `LICENSE.txt` and link it in the `license` field:
  [project]
  license = {file = "LICENSE"}
  If you do this, after the `build` step, verify the contents of your SDist and Wheel(s) manually to make sure the license file is included, because some build backends may not support including the license using this field:
  tar -tvf dist/example_package_YOUR_USERNAME_HERE-0.1.1.tar.gz
  unzip -l dist/example_package_YOUR_USERNAME_HERE-0.1.1-py2.py3-none-any.whl
Functional metadata
The remaining fields actually change the usage of the package.
Requires-Python
This is an important and sometimes misunderstood field. It looks like this:
requires-python = ">=3.8"
Pip will check if the Python version of the environment where the package is being installed passes this expression. If it doesn’t, pip will start checking older versions of the package until it finds one that passes. This is how `pip install numpy` still works on Python 3.7, even though NumPy has already dropped support for it.
You need to make sure you always have this line and that it stays accurate, since you can’t edit metadata after releasing - you can only yank or delete release(s) and try again.
Upper caps
Upper caps (like `">=3.8,<4"` or `"~=3.8"`) are generally discouraged in the Python ecosystem, but they are broken (even more than usual) when used with the `requires-python` field. This field was added to help users drop old Python versions, and the idea that it would be used to restrict newer versions was not considered. The above field is not the right one to set an upper cap! Never upper cap this field; instead, use classifiers to tell users what versions of Python your code was tested with.
Dependencies
Your package likely will need other packages to run. You can add dependencies on other packages like this:
[project]
...
dependencies = [
"numpy",
]
Sometimes you have dependencies that are only needed some of the time. These can be specified as optional dependencies. Unlike normal dependencies, these are specified in a table, with the key being the option you pass to pip to install it. For example:
[project.optional-dependencies]
test = ["pytest>=6"]
check = ["flake8"]
plot = ["matplotlib"]
Now you can run:
pip install --editable '.[test,check]'
or, once it’s published,
pip install 'example-package-YOUR-USERNAME-HERE[test,check]'
and pip will install both the required and optional dependencies `pytest` and `flake8`, but not `matplotlib`.
Setting minimum, maximum and specific versions of dependencies
Whether you set versions on dependencies depends on what sort of package you are working on:
- Library: something that can be imported. Support the widest range you are able to test. If you have very few users, requiring recent versions of dependencies is probably fine. If you expect a large number of users, you should support (and test!) a few past releases of dependencies.
- App (CLI/GUI/TUI): Something that is installable but not importable. Wide range preferred, but not as important (assuming users use pipx and don’t add it to “dev” environments or things like that).
- Application (deployment): Something like a website or an analysis. Not installable. Versions should be fully locked.
You can set ranges on your dependencies by specifying the ranges in the `pyproject.toml` file:
dependencies = [
"tqdm", # no specified range
"numpy>=1.18", # lower cap
"matplotlib<4.0", # upper cap
"pandas>1.4.2,<=3.0", # lower and upper caps
"seaborn==0.13.2", # specific version
]
If you have a range of versions supported, you should ideally run your tests at least once with the minimum versions of your dependencies. You can do this using:
- a `constraints.txt` file, which specifies pins on the minimum versions of all your dependencies, and then use `pip install --constraint constraints.txt --editable .` before running your tests, or
- a tool like `uv`, which supports installing with the lowest possible dependencies like this: `uv pip install --resolution=lowest .` before running your tests.
Adding an “upper cap” like `"numpy>=1.18,<2.0"` is only recommended if you are fairly sure the next version will break your usage of the library. For more information, see this article on bound version constraints.
There are several ways to lock your dependencies completely:
- specify the exact versions in the `pyproject.toml` file, or
- allow ranges in the `pyproject.toml` file and
  - specify the versions for your development environment manually using `pip-tools`, or
  - use a locking package manager like PDM or Poetry. (Just add the lockfiles generated by the tools to your repository, and they should be used automatically. There is usually an update command that will update the local environment and the lock file.)
Pinning Dependencies with `pip-tools`
Set up a `requirements.in` file with your unpinned dependencies. For example:
# requirements.in
packaging
Now, run pip-compile from pip-tools on your requirements.in to make a requirements.txt:
pipx run --spec pip-tools pip-compile requirements.in --generate-hashes
This will produce a `requirements.txt` with fully locked dependencies, including hashes. You can always regenerate it when you want updates.
project.dependencies vs. build-system.requires
What is the difference between `project.dependencies` and `build-system.requires`?

Answer

`build-system.requires` describes what your project needs to “build”, that is, produce an SDist or wheel. Installing a built wheel will not install anything from `build-system.requires`; in fact, the `pyproject.toml` is not even present in the wheel! `project.dependencies`, on the other hand, is added to the wheel metadata, and pip will install anything in that field if not already present when installing your wheel.
Entry Points
A Python package can have entry points. There are three kinds: command-line entry points (`scripts`), graphical entry points (`gui-scripts`), and general entry points (`entry-points`). As an example, let’s say you have a `main()` function inside `__main__.py` that you want to run to create a command `project-cli`. You’d write:
[project.scripts]
project-cli = "project.__main__:main"
The command line name is the table key, and the form of the entry point is `package.module:function`. Now, when you install your package, you’ll be able to type `project-cli` on the command line and it will run your Python function.
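For illustration, a minimal sketch of the `__main__.py` such an entry point could point at (the `project` package and `main()` name simply match the hypothetical example above):

```python
# src/project/__main__.py
def main() -> None:
    """Entry point for the `project-cli` command."""
    print("Hello from project-cli!")


if __name__ == "__main__":
    # also lets you run `python -m project`
    main()
```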
Dynamic
Fields can be specified dynamically by your build backend. You specify fields to populate dynamically using the `dynamic` field.

For example, if you want `hatchling` to read `__version__` from `src/example_package_YOUR_USERNAME_HERE/__init__.py`:
[project]
name = "example-package-YOUR-USERNAME-HERE"
dynamic = ["version"]
[tool.hatch]
version.path = "src/example_package_YOUR_USERNAME_HERE/__init__.py"
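For this to work, the `__init__.py` must define a version attribute that hatchling can find, something like:

```python
# src/example_package_YOUR_USERNAME_HERE/__init__.py
__version__ = "0.1.0"
```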
All together
Now let’s take our previous example and expand it with more information. Here’s an example:
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "example-package-YOUR-USERNAME-HERE"
version = "0.1.1"
dependencies = [
"numpy"
]
authors = [
{ name="Example Author", email="author@example.com" },
]
description = "A small example package"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Private :: Do Not Upload",
]
[project.optional-dependencies]
test = ["pytest"]
[project.urls]
"Homepage" = "https://<your github username>.github.io/example-package-YOUR-USERNAME-HERE"
"Source Code" = "https://github.com/<your github username>/example-package-YOUR-USERNAME-HERE"
Add metadata and check it.
Take your existing package and add more metadata to it. Install it, then use `pip show -v <package>` to see the metadata. You can also look inside the wheel or SDist to see the metadata.

Solution
pip install -e .
pip show -v example-package-YOUR-USERNAME-HERE
Key Points
Add informational metadata to tell people about your package.
Add functional metadata to tell people how to install and use your package.
Versioning
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How do you choose your versions?
How do you set version limits on dependencies?
How do you set a version?
Objectives
Know the different versioning schemes and drawbacks
Know how to set a version in a package
Versioning is a surprisingly deep topic with some strong opinions floating around. We will look at a couple of popular versioning schemes. We will also discuss how you should use the versions your dependencies provide.
Then we will cover some best practices in setting your own version, along with a few tips for a good changelog. You’ll learn the simplest way, along with some advanced techniques to single-source your version, including using your VCS’s tags as versions. This makes mistakes during releases harder and allows every single commit to come with a unique version.
Versioning schemes
There are three commonly used versioning schemes that you will usually select from. Most packages follow one of these with some variations. A few packages have custom schemes (LaTeX’s pi based scheme comes to mind!), but this covers almost all software.
SemVer: Semantic Versioning
SemVer lays out a set of rules, summarized here, based on a version of the form
<major>.<minor>.<patch>
. These are:
- If your API changes in a breaking way, the
major
number must be incremented. - If you add API but don’t break existing API, you can increment the
minor
version. - If you only fix bugs, you can increment the
patch
version.
And obviously set the smaller version values to zero when you increment a larger one.
This seems simple, but what is a breaking change? The unintuitive answer is that it depends on how many users you have. Every change could be a breaking change to someone if you have enough users (at least in a language like Python where everything can be accessed if you try hard enough). The closer you try to follow “true” SemVer, the more major releases you will have, until you’ve lost the usefulness of SemVer and every release is basically a major release. An example of a package that has a lot of users and tries to follow SemVer pretty closely is Setuptools, which is on version 68 as of the time of writing. And minor/patch releases still break some users.
Avoiding breakage
You can’t be sure that a minor or patch release will not break you. No library can follow SemVer closely enough to be “true” SemVer unless they only release major versions. “They didn’t follow SemVer closely enough” is not a good argument for any package causing breakage.
A more realistic form of SemVer, and a better way to think about it, is as an abbreviated changelog and author intent. In this form:
- Bumping a
patch
version means there’s nothing to see, just fixing things, you are probably fine. - Bumping a
minor
version means that there might be interesting things to see, but nothing you have to see. - Bumping a
major
version means you really should look, changes might even be needed for users.
If a release breaks your code, you are more likely to be able to get a followup patch release fixing your use case if it’s a patch or minor version. If you view SemVer this way, a lot less grief will occur from broken expectations. Also, changelogs are really important, so having a way to notify people when there’s things to look at is really useful.
A final variation that is very common in the Python community is a deprecation cycle addition. This says that a feature that is deprecated in a minor version can be removed in a minor version after a certain time or number of minor versions. Python itself uses this (due partially to an extreme aversion to ever releasing another major version), along with NumPy and others. Python’s deprecation cycle is two releases (3.12 deprecations can be removed in 3.14), and NumPy’s is three.
Getting deprecation warnings
In Python, the deprecation warnings are `PendingDeprecationWarning`, `DeprecationWarning`, and `FutureWarning`, in that order. The first two are often not shown by default, so always run your test suite with warnings as errors enabled, so that you are able to catch them. And if it’s your library, doing at least one final release with `FutureWarning` is a good idea, as it’s not hidden by default.
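One common way to do this with pytest is to turn warnings into errors in your configuration (a minimal sketch):

```toml
# pyproject.toml — fail the test suite on any warning, so deprecations can't slip by
[tool.pytest.ini_options]
filterwarnings = ["error"]
```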
Some suggestions for how you depend on packages will be given later, but let’s look at the other two common mechanisms first.
Python only
This discussion of SemVer is centered on Python and the Python community. Other languages have different communities and different tools. For example, JavaScript’s NodeJS supports each dependency getting its own copy of its dependencies. This makes the ecosystem’s expectations around pinning totally different.
ZeroVer
Sometimes packages start with a 0
; if they do that, they are using a modified
version of SemVer, sometimes called ZeroVer. In this, the first non-zero digit
is a major version, and the second non-zero value is somewhere between a minor
and patch. There are several reasons for this version scheme; sometimes it is to
indicate the package is still under heavy development. Though, in practice,
selecting a point to release “1.0” can be quite difficult; a user’s expectation
(this is stable now) and the developer’s expectation (some huge new feature added / rewrite, etc.) are completely at odds. This has led some very popular projects to still be using ZeroVer.
CalVer: Calendar based versioning
Another scheme growing in popularity in the Python community is the CalVer
scheme. This sets the version number based on release date. There are several
variations; some projects literally place the date (two or four digit year
followed by month then day), and some blend a little bit of SemVer in by making
the second or third digit SemVer-like. An example of a popular (non-Python)
project using CalVer is Ubuntu - the version (like 22.04) is the release year
and month. Pip uses a blended CalVer; Pip releases three major versions per year, like `23.1.0`; the final release of the year is `23.3.patch`. Though these are technically major versions, they tend to try to do larger changes between years.
CalVer does two things well; it communicates how old a release is without having
to check release notes, and it allows deprecation cycles (remember we said those
were important in Python!) to be expressed in terms of time - if you want a one
year deprecation cycle, you know exactly what release the item will be
removed/changed in. It also helps clearly communicate the problem with SemVer -
you don’t know when something will make a breaking release for you, so don’t
predict / depend on something breaking. You might be tempted to set `package<2`, but you should never be tempted to set `package<24` - it’s not SemVer! (See the section on locking!)
Due to this, several core Python packaging projects (like pip, packaging, and
virtualenv) use CalVer. CalVer is especially useful for something that talks to
external services (like pip). Another popular library using it is `attrs`.
Setting a version on your package
When you start a project, select one of the major versioning schemes.
Two locations
The “standard” way to set a version is by setting it manually in your `pyproject.toml`:
[project]
version = "1.2.3"
And placing it in your source code accessible as `__version__` on your package. You don’t have to do that, since a user can use `importlib.metadata.version("example-package-YOUR-USERNAME-HERE")` to get the version, but this is a very common practice and is useful if there is an issue and you need a hard copy of the version in the source for an improperly packaged file.
If you use this method, you should probably set up some automatic way to bump the version in both places, or some sort of check/test that verifies they are in sync.
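A minimal sketch of the “two locations” approach (the version string is just the example value from above and must be kept in sync with `pyproject.toml`):

```python
# src/example_package_YOUR_USERNAME_HERE/__init__.py
__version__ = "1.2.3"  # must match [project] version in pyproject.toml

# A user can also read the installed version without the attribute:
# >>> import importlib.metadata
# >>> importlib.metadata.version("example-package-YOUR-USERNAME-HERE")
# '1.2.3'
```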
Single location (hatchling)
Most build backends provide methods to get the version from your source code. In hatchling, it looks like this:
[project]
dynamic = ["version"]
[tool.hatch]
version.path = "src/example_package_YOUR_USERNAME_HERE/__init__.py"
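With this configuration, hatchling looks for a __version__ assignment in the file you point it at, so the package itself is the single place the number lives; a minimal sketch of that file:
# src/example_package_YOUR_USERNAME_HERE/__init__.py
"""Example package used in this lesson."""

__version__ = "0.1.0"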
VCS versioning (hatchling)
Technically, there’s another source for the version: git tags. A tag is a marker for a particular commit in a git history.
You can create a git tag on your last commit using git tag <version>
:
git tag v0.1.0
Git Tags and Branches
A tag is like a branch, but it doesn't move when you make a new commit.
You can check out a particular tagged version using git checkout:
git checkout v0.2.0
but you will need to check out a branch before you commit again, e.g.
git checkout main
Some backends provide ways to use these as the single version source. This also means every commit gets a unique version, since “commits past tag” is usually added to the version number.
In hatchling, it looks like this:
[build-system]
requires = ["hatchling", "hatch-vcs"]
[project]
dynamic = ["version"]
[tool.hatch]
version.source = "vcs"
build.hooks.vcs.version-file = "src/example_package_YOUR_USERNAME_HERE/_version.py"
When you run pip install --editable .
, or build your package for distribution,
a new file src/example_package_YOUR_USERNAME_HERE/_version.py
will be created.
You can use the __version__
from that file in your __init__.py
file like this:
# src/example_package_YOUR_USERNAME_HERE/__init__.py
from ._version import __version__
… allowing users to call
import example_package_YOUR_USERNAME_HERE
example_package_YOUR_USERNAME_HERE.__version__  # '0.1.0'
Ensure the _version.py
file is not stored in the repository by adding it to the .gitignore
file:
# Ignore dynamic version file
src/example_package_YOUR_USERNAME_HERE/_version.py
Version Number in git archive
If you also want git tarballs to contain the version, add the following to your .gitattributes
file:
.git_archival.txt export-subst
And the following .git_archival.txt
file:
node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
Now git archive
’s output (like the tarballs GitHub provides) will also include
the version information (for recent versions of git)!
git archive --output=./package_archive.tar --format=tar HEAD
mkdir extract && tar -xvf package_archive.tar -C extract
cat extract/.git_archival.txt
Add a versioning system
Add one of the two single-version systems listed above to your package.
Key Points
Packages should have a version attribute
Semantic versioning is an abbreviated changelog, not the solution to all problems
You can use packaging tools so that the version number needs updating in one (and only one) place
Publishing package and citation
Overview
Teaching: 10 min
Exercises: 5 minQuestions
How do I publish a package?
How do I make my work citable?
Objectives
Learn about publishing a package on PyPI
Learn about making work citable
If you want other people to be able to access your package, you need to publish it.
In this episode we’ll:
- investigate the formats you can use to share your package: source distribution and wheel,
- publish those on the Python Package Index (PyPI),
- add automation to your project to make publishing easier,
- share the package on Zenodo to get a Digital Object Identifier so that your work is more easily citable.
Formats
One “format” is available as soon as you have pushed your work to a platform like GitHub:
- repository:
- Contains all of your repository, including tests, metadata, and its history,
- Requires your build backend (hatchling, in this case) to build.
- The source is distributed in the form of a repository accessible to the user.
- Installation:
pip install git+https://git.example.com/MyProject
where git+ tells pip how to interpret the repository. - You can also specify particular revisions, like:
pip install git+https://git.example.com/MyProject.git@v1.0
pip install git+https://git.example.com/MyProject.git@main
pip install git+https://git.example.com/MyProject.git@da39a3ee5e6b4b0d3255bfef95601890afd80709
pip install git+https://git.example.com/MyProject.git@refs/pull/123/head
If you want to share your work with a wider audience, there are two major formats used to publish python packages:
- source distribution (
sdist
for short):- Contains most of your repository, including tests, and
- Requires your build backend (hatchling, in this case) to build.
- wheel:
- The wheel is "built": the contents are ready to unpack into standard locations (usually site-packages), and it does not contain configuration files like pyproject.toml.
- Usually you do not include things like tests in the wheel.
- Wheels also can contain binaries for packages with compiled portions.
Building SDists and wheels
You can build an SDist and a wheel (from that SDist) with pipx
and the build
package:
pipx run build
The module is named build, so python -m build
is how you’d run it from
a task runner like nox or hatch.
The executable is actually named pyproject-build
, since installing a build
executable would likely conflict with other things on your system.
This produces the wheel and sdist in ./dist
.
You can validate the files generated using
pipx run twine check dist/*
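If you use a task runner (covered later in this lesson), you can wrap the build in a session so contributors don't need the build tool installed globally. A minimal nox sketch (the session name dist is just an example, not something your project necessarily has):
# noxfile.py
import nox


@nox.session()
def dist(session: nox.Session) -> None:
    """Build an SDist and wheel into ./dist."""
    session.install("build")
    session.run("python", "-m", "build")
Run it with nox -s dist.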
Conda
Building for conda is quite different. If you just have a pure Python package, you should just use pip to install in conda environments until you have a conda package that depends on your package and wants to add it into its requirements.
If you do need to build a conda package, you’ll need to either propose a new recipe to conda-forge, or set up the build infrastructure yourself and publish to an anaconda.org channel.
Manual publishing
Do you need to publish to PyPI?
Not every package needs to go on PyPI. You can pip install directly from git, or from a URL to a package hosted somewhere else, or you can set up your own wheelhouse and point pip at that. Also an “application” like a website or other code you deploy probably does not need to be on PyPI.
You can publish files manually with twine
:
pipx run twine upload -r testpypi dist/*
The -r testpypi
tells twine to upload to TestPyPI instead of the real PyPI -
remove this if you are not in a tutorial.
To run this locally, you'll also need to set up an API token to upload the package with. Create a token at https://test.pypi.org/manage/account/.
However, the best way to publish is from CI. This has several benefits: you are always in a clean checkout, so you won’t accidentally include added or changed files, you have a simpler deployment procedure, and you have more control over who can publish in GitHub.
Building in GitHub Actions
GitHub Actions can be used for any sort of automation task, not just building tests. You can use it to make your releases too! Combined with the version control feature from the previous lesson, making a new release can be a simple procedure.
Let’s first set up a job that builds the file in a new workflow:
# .github/workflows/publish.yml
on:
workflow_dispatch:
release:
types:
- published
This has two triggers. The first, workflow_dispatch
, allows you to manually trigger the
workflow from the GitHub web UI for testing. The second will trigger whenever you make a
GitHub Release, which will be covered below. You might want to add builds for your main branch,
as well. We will make sure uploads to PyPI only happen on releases later.
Now, we need to set up the builder job:
jobs:
dist:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Build SDist & wheel
run: pipx run build
- uses: actions/upload-artifact@v4
with:
path: dist/*
We’ve seen the setup before. We are calling the job dist
, using an Ubuntu
runner, and checking out the code, including the git history so the version can
be computed with fetch-depth: 0
(which can be removed if you are not using git
versioning).
Test and upload action
There’s a great action for building and inspecting a pure Python package:
- uses: hynek/build-and-inspect-python-package@v2
This action builds, runs various checkers, then uploads the package to an artifact named Packages. If you use this, you'll need to download the artifact from name: Packages.
The next step builds the wheel and SDist. Pipx is a supported package manager on all GitHub Actions runners.
The final step uploads an Actions “artifact”. This allows you to download the
produced files from the GitHub Actions UI, and these files are also available to
other jobs. The default name is artifact
, which is as good as any other name
for the moment.
We could have combined the build and publish jobs if we really wanted to, but they are cleaner when separate, so we have a publish job as well.
publish:
needs: [dist]
runs-on: ubuntu-latest
if: github.event_name == 'release' && github.event.action == 'published'
steps:
- uses: actions/download-artifact@v4
with:
name: artifact
path: dist
- uses: pypa/gh-action-pypi-publish@release/v1
with:
repository-url: https://test.pypi.org/legacy/
This job requires that the previous job completes successfully with needs:
. It
has an if:
block as well that ensures that it only runs when you publish. Note
that Actions usually requires ${{ ... }}
to evaluate code, like
github.event_name
, but blocks that are always evaluated, like if:, don't require manually wrapping in this syntax.
Then we download the artifact. You need to tell it the name:
to download
(otherwise it will download all artifacts into named folders). We used the
default artifact
so that’s needed here. We want to unpack it into ./dist
, so
we set the path:
to that.
Finally, we use the PyPA’s publish action. You will need to go to PyPI and tell
it where you are publishing from so that the publish can happen via PyPI’s
trusted publishers. We are using Test PyPI for this exercise - remove the
with:
block to publish to real PyPI.
Making a release
A release on GitHub corresponds to two things: a git tag, and a GitHub Release. If you create the release first, a lightweight tag will be generated for you. If you tag manually, remember to create the GitHub release too, so users can see the most recent release in the UI and will be notified if they are watching your releases.
Click Releases -> Draft a new release. Type in or select a tag; the recommended
format is v1.2.3
; that is, a “v” followed by a version number. Give it a
title, like “Version 1.2.3”; keep this short so that it will be readable on the
web UI. Finally, fill in the description (there’s an autogenerate button that
might be helpful).
When you release, this will trigger the GitHub Action workflow we developed and upload your package to TestPyPI!
Digital Object Identifier (DOI)
You can add a repository to https://zenodo.org to get a DOI once you publish. Follow the instructions in the GitHub Documentation.
To test the functionality, you can use the Zenodo Sandbox.
The CITATION.cff file
From https://citation-file-format.github.io/:
CITATION.cff
files are plain text files with human- and machine-readable citation information for software (and datasets). Code developers can include them in their repositories to let others know how to correctly cite their software.
This file format is becoming a de-facto standard, and is supported by GitHub, Zenodo and Zotero.
The CITATION.cff file looks like this:
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Druskat
given-names: Stephan
orcid: https://orcid.org/1234-5678-9101-1121
title: "My Research Software"
version: 2.0.4
identifiers:
- type: doi
value: 10.5281/zenodo.1234
date-released: 2021-08-11
You can validate your file by running:
pipx run cffconvert --validate
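If you also want a quick check from Python (for example, as part of your test suite), a minimal sketch, assuming PyYAML is installed, could look like the following; cffconvert remains the authoritative validator:
# tests/test_citation.py
from pathlib import Path

import yaml


def test_citation_file_has_basic_fields():
    # CITATION.cff is YAML; check a few fields the format requires.
    data = yaml.safe_load(Path("CITATION.cff").read_text())
    for field in ("cff-version", "message", "authors", "title"):
        assert field in data, f"CITATION.cff is missing '{field}'"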
Key Points
CI can publish Python packages
Tagging and GitHub Releases are used to publish versions
Zenodo and CITATION.cff are useful for citations
Documentation Overview
Overview
Teaching: 0 min
Exercises: 0 minQuestions
How do I document my project?
Objectives
Learn how to set up documentation
Documentation used to require learning reStructuredText (sometimes referred to as ReST / RST), but today we have great choices for documentation in markdown, the same format used by GitHub, Wikipedia, and others. You should select one of the two major documentation toolchains, Sphinx or MkDocs.
The following episodes cover each of those.
Key Points
Sphinx or MkDocs are both good for documentation
Documentation with Sphinx
Overview
Teaching: 10 min
Exercises: 10 minQuestions
How do I document my project?
Objectives
Learn how to set up documentation
In this lesson, we’ll outline creating a documentation webpage using Sphinx.
You will:
- set up a basic documentation directory,
- create a configuration including the modern MyST plugin to get markdown support,
- start a preview server,
- create a table of contents,
- add a page,
- include part of your README.md,
- choose a theme.
Configuration
We’ll start with the built-in template. Start by creating a docs/
directory
within your project (i.e. next to src/
).
Why not Sphinx-Quickstart?
You could use sphinx-quickstart to set up a basic template if you’d like.
pipx run --spec sphinx sphinx-quickstart --no-makefile --no-batchfile --ext-autodoc --ext-intersphinx --extensions myst_parser --suffix .md docs
But this will put Restructured Text into the
index.md
file, and doesn’t really generate that much for you. You can instead add thedocs/conf.py
file yourself, which is what we’ll do here.
Your first file is a configuration file, docs/conf.py
:
# docs/conf.py
project = "example"
extensions = ["myst_parser"]
source_suffix = [".rst", ".md"]
Index
And add (a correct) docs/index.md
yourself:
# package
```{toctree}
:maxdepth: 2
:hidden:
```
## Indices and tables
* {ref}`genindex`
* {ref}`modindex`
* {ref}`search`
As you add new pages, you will list them in the toctree above.
Dependencies
Add the docs dependencies to pyproject.toml
:
[project.optional-dependencies]
docs = [
"myst_parser >=0.13",
"sphinx >=4.0",
]
Preview Server
You can install these dependencies using pip install --editable ".[docs]".
To run the Sphinx preview server, you can install sphinx-autobuild
, then run:
sphinx-autobuild --open-browser -b html "./docs" "_build/html"
This will rebuild if you change files, as well.
Nox session
You can set up a task runner like nox to run this for you:
@nox.session(reuse_venv=True)
def docs(session: nox.Session) -> None:
"""
Build the docs. Use "--non-interactive" to avoid serving.
"""
serve = session.interactive
extra_installs = ["sphinx-autobuild"] if serve else []
session.install("-e.[docs]", *extra_installs)
session.chdir("docs")
shared_args = (
"-n", # nitpicky mode
"-T", # full tracebacks
"-b=html",
".",
f"_build/html",
*session.posargs,
)
if serve:
session.run("sphinx-autobuild", "--open-browser", *shared_args)
else:
session.run("sphinx-build", "--keep-going", *shared_args)
And you now have working docs that you can generate and view cross platform with nox -s docs
!
Read the Docs
If you want to use https://readthedocs.org to build your docs, you’ll also want the following .readthedocs.yml
:
version: 2
build:
os: "ubuntu-22.04"
tools:
python: "3.11"
sphinx:
configuration: docs/conf.py
python:
install:
- method: pip
path: .
extra_requirements:
- docs
Adding a page
Try adding a page. Remember to update your
index.md
table of contents.
Readme in docs
If you want to include your readme in your docs, you can add something like this:
```{include} ../README.md
:start-after: <!-- SPHINX-START -->
```
And you use
<!-- SPHINX-START -->
to mark where you want the docs part of your README.md to start (generally after the title and badges).
# Example Package YOUR USERNAME HERE
<!-- SPHINX-START -->
`example-package-YOUR-USERNAME-HERE` is a simple Python library that contains a single function for rescaling arrays.
Selecting a nicer theme
A really nice theme, used by PyPA projects like pip and pipx, is
furo
. To use it, add this line to yourconf.py
:html_theme = "furo"
And add
"furo"
to yourdocs
extra in yourpyproject.toml
.
Further reading
To see a more complete example, read Scientific-Python’s docs guide.
Key Points
Sphinx is great for documentation
Documentation with MkDocs Material
Overview
Teaching: 10 min
Exercises: 10 minQuestions
How do I document my project?
Objectives
Learn how to set up documentation using Material for MkDocs
In this lesson, we’ll outline creating a documentation webpage using the MkDocs framework with the Material theme.
You will:
- add the necessary dependencies to your
pyproject.toml
file, - set up a basic documentation directory,
- create a configuration which comes with a table of contents by default,
- start a preview server,
- add a page which includes a code listing for your
rescale
function.
Dependencies
Start by installing the mkdocs-material package into your virtual environment.
Add this to pyproject.toml
:
[project.optional-dependencies]
...
docs = [
"mkdocs-material"
]
… then reinstall using pip install --editable ".[docs]"
.
doc or docs
The Python packaging convention for the name of this extra is doc, whereas docs is ~3x more popular.
Template
Create an empty site using:
mkdocs new .
This will create files and directories as follows:
.
├─ docs/
│ └─ index.md
└─ mkdocs.yml
- The
index.md
file will contain the content on the front page of your documentation website. - The
mkdocs.yml
file is the configuration file for the documentation.
Configuration
In the mkdocs.yml
file, set the site name and add some additional lines to enable the theme:
site_name: Example Package YOUR USERNAME HERE
site_url: https://<your github username>.github.io/example-package-YOUR-USERNAME-HERE
theme:
name: material
site_url is important
It is important to set the site_url because it's assumed to be set by a number of plugins.
It's set here to a GitHub Pages address - you can set it to
https://<your github username>.github.io/example-package-YOUR-USERNAME-HERE
or any other domain where you want to publish.
Preview
You can preview the site as you change it by running the “preview server”:
mkdocs serve
Update index.md
And add (a correct) docs/index.md
yourself:
<!--- docs/index.md -->
# Example Package YOUR USERNAME HERE
`example-package-YOUR-USERNAME-HERE` is a simple Python library that contains a single function for rescaling arrays.
## Installation
You can install the package by calling:
```bash
pip install git+https://github.com/<your github username>/example-package-YOUR-USERNAME-HERE
```
## Usage
```python
import numpy as np
from example_package_YOUR_USERNAME_HERE.rescale import rescale
# rescales over 0 to 1
rescale(np.linspace(0, 100, 5))
```
README.md
vsindex.md
Often, similar information will be contained in the repository README and the index page of the documentation – installation instructions, basic usage, licensing etc., and so it’s common to want to include (parts of) the README in the index page.
Sphinx has built-in tools to allow you to include parts of another markdown file directly, but MkDocs doesn’t.
We’d recommend writing the
index.md
andREADME.md
files separately, so that you can vary the information and instructions you present for the particular audience.For instance, someone viewing the repository can be expected to know where to download the source code from, whereas someone viewing the documentation website might not.
Add Code Reference
We’ll add a new page to the documentation with the docstrings from the package.
- Add the mkdocstrings plugin to mkdocs.yml:
  # mkdocs.yml
  plugins:
    - mkdocstrings
- Include the mkdocstrings[python] package in the docs dependencies of the pyproject.toml:
  # pyproject.toml
  [project.optional-dependencies]
  docs = [
      "mkdocs-material",
      "mkdocstrings[python]",
  ]
- Add a new page docs/ref.md with the content:
  # Code Reference
  ::: example_package_YOUR_USERNAME_HERE.rescale
- Stop the preview server using Ctrl-C
- Reinstall using
pip install --editable ".[docs]"
since we added a new dependency - Reload the documentation preview using
mkdocs serve
MkDocs automatically adds the additional page to your documentation.
Publish to GitHub Pages
To publish the documentation to GitHub pages, run:
mkdocs gh-deploy
The documentation will be made available at the URL
https://<your github username>.github.io/example-package-YOUR-USERNAME-HERE
Once this is deployed, you can add an additional URL to the pyproject.toml
file,
which will be included in the package metadata and linked to on PyPI.
# pyproject.toml
[project.urls]
Homepage = "https://<your github username>.github.io/example-package-YOUR-USERNAME-HERE"
Read The Docs
If you want to use https://readthedocs.org to build your docs, you’ll need to add a .readthedocs.yml
file. Find details at https://docs.readthedocs.io/en/stable/config-file.
Challenges
Adding a page
Try adding another page.
Further reading
- https://diataxis.fr/ outlines an excellent framework for planning documentation.
- https://squidfunk.github.io/mkdocs-material/ outlines all the options and many of the plugins available with Material for MkDocs, including syntax highlighting, and making comprehensive code listings for projects with many files.
Key Points
MkDocs is great for documentation
Checks and tests
Overview
Teaching: 10 min
Exercises: 10 minQuestions
How do you ensure your code will work well?
Objectives
Learn about the basics of setting up tests.
Learn about the basics of setting up static checks
In this episode we’ll give an introduction to setting up your project for
- tests with
pytest
, and - static checks (also known as linters, formatters, and type checkers).
Testing
The most popular testing framework is pytest, so we will focus on that. Python has a built-in framework too, but it’s really intended for Python’s own tests, and adding a dependency for testing only is fine even for the most strict no-dependencies allowed packages, since users don’t need tests. Tests should be as easy to write as possible, and pytest handles that beautifully.
Test directory
There are several options for the test directory. The recommendation is /tests
(with an s
), at the root of your repository. Combined with /src/<package>
layout, you will have the best experience avoiding weird edge cases with package
importing.
User runnable tests
Tests should be distributed with your SDist, but not your wheel. Sometimes, you might want some simple tests a user can run in order to verify that their system works. Adding a
/src/<package>/tests
module using Python’sunittest
that does some very quick checks to validate the package works is fine (though it should not be your entire test suite!).
Pytest configuration
A recommended pytest configuration in your pyproject.toml
is:
[tool.pytest.ini_options]
minversion = "7.0"
addopts = ["-ra", "--showlocals", "--strict-markers", "--strict-config"]
xfail_strict = true
filterwarnings = ["error"]
log_cli_level = "info"
testpaths = [
"tests",
]
- The minversion will print a nicer error if your pytest is too old.
- The addopts setting will add whatever you put there to the command line when you run:
  - -ra will print a summary "r"eport of "a"ll results, which gives you a quick way to review which tests failed or were skipped, and why.
  - --showlocals will print local variables in tracebacks.
  - --strict-markers will make sure you don't try to use an unregistered marker.
  - And --strict-config will error if you make a mistake in your config.
- xfail_strict will change the default for xfail to fail the tests if it doesn't fail - you can still override locally in a specific xfail for a flaky failure (see the example below).
- filterwarnings will cause all warnings to be errors (you can add allowed warnings here too, see below).
- log_cli_level will report INFO and above log messages on a failure.
- Finally, testpaths will limit pytest to just looking in the folders given - useful if it tries to pick up things that are not tests from other directories.
See the docs for more options.
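Here is a minimal sketch of what xfail_strict buys you (the buggy helper below is hypothetical): the test is expected to fail today, and with strict xfail pytest will report a failure if the bug is ever fixed without the marker being removed.
# tests/test_parse.py
import pytest


def fragile_parse(text: str) -> int:
    # Hypothetical stand-in for code with a known, tracked bug.
    return int(text)


@pytest.mark.xfail(reason="known bug: decimal strings are not handled yet")
def test_parse_decimal_string():
    assert fragile_parse("42.0") == 42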
pytest also checks the current and parent directories for a conftest.py
file.
If it finds them, they will get run outer-most to inner-most. These files let
you add fixtures and other pytest configurations (like hooks for test discovery,
etc) for each directory. For example, you could have a “mock” folder, and in
that folder, you could have a conftest.py
that has a mock fixture with
autouse=True
, then every test in that folder will get this mock applied.
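For example, a conftest.py along these lines (a minimal sketch; pointing HOME at a temporary directory is just an illustration) applies the fixture to every test in that folder without the tests requesting it:
# tests/mock/conftest.py
import pytest


@pytest.fixture(autouse=True)
def fake_home(tmp_path, monkeypatch):
    """Every test in this folder sees a temporary HOME directory."""
    monkeypatch.setenv("HOME", str(tmp_path))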
In general, do not place a __init__.py
file in your tests; there’s not often a
reason to make the test directory importable, and it can confuse package
discovery algorithms. You can use pythonpath=["tests/utils"]
to allow you to
import things inside a tests/utils
folder - though many things can be added to
conftest.py
as fixtures.
Python hides important warnings by default, mostly because it’s trying to be
nice to users. If you are a developer, you don’t want it to be “nice”. You want
to find and fix warnings before they cause user errors! Locally, you should run
with -Wd
, or set export PYTHONWARNINGS=d
in your environment. The pytest
warning filter “error” will ensure that pytest
will fail if it finds any
warnings. You can list warnings that should be hidden or just shown without
becoming errors using the syntax
"<action>:Regex for warning message:Warning:package"
, where <action> tends to be default (show the first time) or ignore (never show). The regex
matches at the beginning of the error unless you prefix it with .*
.
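With filterwarnings = ["error"] in place, a warning raised during a test fails that test; if the warning is the expected behaviour, assert it explicitly with pytest.warns. A minimal sketch (the deprecated helper is hypothetical):
# tests/test_warnings.py
import warnings

import pytest


def deprecated_helper() -> int:
    # Hypothetical function that still works but warns its callers.
    warnings.warn("use new_helper() instead", DeprecationWarning, stacklevel=2)
    return 1


def test_deprecated_helper_still_works():
    with pytest.warns(DeprecationWarning):
        assert deprecated_helper() == 1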
Static checks
In addition to tests, which run your code, there are also static checkers that look for problems or format your code without running it. While tests only check the parts of the code you write tests for, and only the things you specifically think to check, static checkers can verify your entire codebase is free of certain classes of bugs. Unlike a compiled language like C, C++, or Rust, Python has no required "compile" step, so think of static checking as filling that role: an optional step you can add that finds code that doesn't make sense, invalid syntax, and so on.
Ruff
Ruff is a Python linter (a tool used to flag programming errors, bugs, stylistic errors and suspicious constructs) and code formatter.
Ruff has recently exploded as the most popular linting tool for Python, and it’s
easy to see why. It’s tens to hundreds of times faster than similar tools like
flake8, and has dozens of popular flake8 plugins and other tools (like isort and
pyupgrade) all well maintained and shipped in a single Rust binary. It is highly
configurable in a modern configuration format (in pyproject.toml
!). And it
supports auto-fixes, something common outside of Python, but rare in the Python
space before now.
You’ll want a bit of configuration in your pyproject.toml
:
[tool.ruff]
src = ["src"]
lint.extend-select = [
"B", # flake8-bugbear
"I", # isort
"PGH", # pygrep-hooks
"RUF", # Ruff-specific
"UP", # pyupgrade
]
To use Ruff to check your code for style problems, run:
pipx run ruff check
To use Ruff to format your code, run:
pipx run ruff format
For examples of Ruff’s formatting, see its documentation.
You can find a more complete suggested config at the Scientific-Python Development Guide.
MyPy
The biggest advancement since the development of Python 3 has been the addition of optional static typing. Static checks in Python have a huge disadvantage vs. a more “production” focused language like C++: you can’t tell what types things are most of the time! For example, is this function well defined?
def bit_count(x):
return x.bit_count()
A static checker can’t tell you, since it depends on how it is called. bit_count("hello")
is an error, but you
won’t know that until it runs, hopefully in a test somewhere. However, now contrast that with this version:
def bit_count(x: int) -> int:
return x.bit_count()
Now this is well defined; a type checker will tell you that this function is
valid (and it will even be able to tell you it is invalid if you target any
Python before 3.10, regardless of the version you are using to run the check!),
and it will tell you if you try to call it with anything that’s not an int
,
anywhere - regardless if the function is part of a test or not!
You do have to add static types to function signatures and a few variable definitions (usually variables can be inferred automatically), but the payoff is well worth it - a static type checker can catch many things, and doesn’t require writing tests!
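As a quick illustration (a sketch; the exact message depends on your mypy version), the annotated function below lets mypy flag the bad call without running anything:
# example_typed.py
def bit_count(x: int) -> int:
    return x.bit_count()


ok = bit_count(5)          # fine: 5 is an int
bad = bit_count("hello")   # mypy reports an incompatible argument type here
                           # (running this line would also fail at runtime)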
To run mypy, you can call:
pipx run mypy --python-executable .venv/bin/python .
- mypy needs the argument
--python-executable .venv/bin/python
to access the version of the Python interpreter you are using for your project. It uses this to get access to type information for imported packages (like numpy). - You also need to give it a path to a directory containing files to check, in this case
.
.
You can learn about configuring mypy in the Scientific-Python Development Guide.
The pre-commit framework
There’s a tool called pre-commit that is used to run static checks. (Technically it can run just about anything, but it’s designed around speed and works best with checks that take under a couple of seconds - perfect for static checks.)
You can install pre-commit with pipx
, pip
, your favorite package manager, or
even run it inside nox
.
You run pre-commit like this:
pre-commit run --all-files
This runs pre-commit on all files; the default is to just check staged changes
for speed. As you might have guessed from the name, you can also make pre-commit
run as a git pre-commit hook, with pre-commit install
. You can also keep your
pre-commit config up to date with pre-commit autoupdate
.
You can add pre-commit checks inside a .pre-commit-config.yaml
file. There are some
“standard” checks most projects include:
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: "v4.6.0"
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: check-yaml
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
- id: requirements-txt-fixer
- id: trailing-whitespace
There are a few things to dissect here. First, we have a repos
table. This
holds a list of git repositories pre-commit will use. They each have repo
(pointing at the repo URL), a rev
, which holds a non-moving tag (pre-commit
caches environments based on this tag), and a hooks
table that holds the hooks
you want to use from that repo.
You can look at the docs (or the pre-commit-hooks.yaml
file in the repo you
are using!) to see what id
’s you can select. There are more options as well -
in fact, every pre-defined field can be overridden by providing the field when
you use the hook.
The checks above, from the first-party pre-commit/pre-commit-hooks
repo, are
especially useful in the “installed” mode (where only staged changes are
checked).
To configure Ruff within .pre-commit-config.yaml
, add the following configuration:
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.5.2"
hooks:
- id: ruff
args: ["--fix", "--show-fixes"]
- id: ruff-format
To configure mypy within .pre-commit-config.yaml
, add the following configuration:
- repo: https://github.com/pre-commit/mirrors-mypy
rev: "v1.10.0"
hooks:
- id: mypy
files: src
args: []
You will need to add additional_dependencies: [numpy]
as the pre-commit mypy
runs in a separate virtual environment which doesn’t have numpy
installed.
hooks:
- id: mypy
files: src
args: []
additional_dependencies: [numpy]
You need to add any other packages that have static types to additional_dependencies: [...]
.
Going further
See the Style guide at Scientific-Python Development Guide for a lot more suggestions on static checking.
Key Points
Run tests and static checks on your codebase.
Task runners
Overview
Teaching: 5 min
Exercises: 5 minQuestions
How can you ensure others run the same code you do?
Objectives
Use a task runner to manage environments and run code
A task runner is a tool that lets you specify a set of tasks via a common interface.
Use of a task runner is optional, but can be helpful to:
- make it easy and simple for new contributors to run things,
- make specialized developer tasks easy,
- avoid making single-use virtual environments for docs and other rarely run tasks.
Task runner preferences are subjective and diverse. Different people prefer different task runners because they are more flexible, simpler to understand, specialized to one language, general to many languages, and so on.
Examples we'll cover include:
- Nox
- Hatch
There are many other task runners, for Python and for other languages, as well.
Task Runner as Crutch
Task runners can be a crutch, allowing poor packaging practices to be employed behind a custom script, and they can hide what is actually happening.
Further reading
See the Scientific Python Development Guide page on task runners for more information.
Nox
Nox has two strong points that help with this concern. First, it is very explicit, and even prints what it is doing as it operates. Unlike the older tox, it does not have any implicit assumptions built-in. Second, it has very elegant built-in support for both virtual and Conda environments. This can greatly reduce new contributor friction with your codebase.
A daily developer is not expected to use nox for simple tasks, like running tests or linting. You should not rely on nox to make a task that should be made simple and standard (like building a package) complicated. You are not expected to use nox for linting on CI, or sometimes even for testing on CI, even if those tasks are provided for users. Nox is a few seconds slower than running directly in a custom environment - but for new users and rarely run tasks, it is much faster than explaining how to get setup or manually messing with virtual environments. It is also highly reproducible, creating and destroying the temporary environment each time by default.
Since nox is an application, you should install it with pipx
. If you use
Homebrew, you can install nox
with that (Homebrew isolates Python apps it
distributes too, just like pipx).
Running nox
If you see a noxfile.py
in a repository, that means nox is supported. You can start
by checking to see what the different tasks (called sessions
in nox) are provided
by the noxfile author. For example, if we do this on packaging.python.org
’s repository:
nox -l # or --list-sessions
Sessions defined in /github/pypa/packaging.python.org/noxfile.py:
- translation -> Build the gettext .pot files.
- build -> Make the website.
- preview -> Make and preview the website.
- linkcheck -> Check for broken links.
sessions marked with * are selected, sessions marked with - are skipped.
You can see that there are several different sessions. You can run them with -s
:
nox -s preview
Will build and start up a preview of the site.
If you need to pass options to the session, you can separate the nox options from the session options with --, e.g. nox -s preview -- --quiet to pass the --quiet flag to the session named preview.
Writing a Noxfile
For this example, we’ll need a minimal test file for pytest to run. Let’s make this file in a local directory:
# test_nox.py
def test_runs():
assert True
Let’s write our own noxfile. If you are familiar with pytest, this should look familiar as well; it’s intentionally rather close to pytest. We’ll make a minimal session that runs pytest:
# noxfile.py
import nox
@nox.session()
def tests(session):
session.install("pytest")
session.run("pytest")
A noxfile is valid Python, so we import nox. The session decorator tells nox that this function is going to be a session. By default, the name will be the function name, the description will be the function docstring, it will run on the current version of Python (the one nox is using), and it will make a virtual environment each time the session runs, though all of this is changeable via keyword arguments to session.
The session function will be given a nox.Session
object that has various
useful methods. .install
will install things with pip, and .run
will run a
command in a session. The .run
command will print a warning if you use an
executable outside the virtual environment unless external=True
is passed.
Errors will exit the session.
Let’s expand this a little:
# noxfile.py
import nox
@nox.session()
def tests(session: nox.Session) -> None:
"""
Run our tests.
"""
session.install("pytest")
session.run("pytest", *session.posargs)
This adds a type annotation to the session object, so that IDE’s and type
checkers can help you write the code in the function. There’s a docstring,
which will print out nice help text when a user lists the sessions. And we pass
through to pytest anything the user passes in via session.posargs
.
Let’s try running it:
nox -s tests
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/tests
nox > python -m pip install pytest
nox > pytest
==================================== test session starts ====================================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/henryschreiner/git/teaching/packaging
collected 1 item
test_nox.py . [100%]
===================================== 1 passed in 0.05s =====================================
nox > Session tests was successful.
You can pass arguments through to the session.run
command
by prefixing them with --
on the command line.
For instance, to pass --verbose
to pytest:
nox -s tests -- --verbose
If you have specified your test dependencies using the test extra, you can install all those dependencies more simply:
@nox.session
def tests(session):
session.install(".[test]") # this installs all of the dependencies
session.run("pytest")
Virtual environments
Nox is really just doing the same thing we would do manually (and printing all the steps), except for the exact details of creating the virtual environment. You can see the virtual environment in .nox/tests! How would you activate this environment?
Solution
. .nox/tests/bin/activate
Add documentation generation to your task runner (MkDocs)
Add the commands to preview and build your MkDocs documentation using nox.
Solution
Add a session to your noxfile.py to generate docs:
# noxfile.py
import nox


@nox.session()
def preview_docs(session: nox.Session):
    """Show the documentation preview."""
    session.install(".[docs]")
    session.run("mkdocs", "serve")


@nox.session()
def build_docs(session: nox.Session):
    """Build the documentation."""
    session.install(".[docs]")
    session.run("mkdocs", "build")
You now have working docs that you can generate and view cross platform with
nox -s preview_docs
!
Backends
It’s possible to use different backends than venv
and pip
when running nox
.
uv
is a fast package installer and resolver, written in Rust and designed to be a
replacement for pip
. Using it can lead to enormous performance gains,
which can be useful when you create and destroy virtual environments with nox
many times per day.
- Update the top of your noxfile.py:
  # noxfile.py
  import nox

  nox.needs_version = ">=2024.3.2"
  nox.options.default_venv_backend = "uv"
- Install
uv
using your package manager (installation instructions).
You can also specify nox.options.default_venv_backend = "uv|virtualenv", which will fall back to virtualenv if uv is not installed.
Alternative backends
Try running your tests with the default
virtualenv
and theuv|virtualenv
backend.How does the execution time change?
Hatch
Hatch is a Python “Project Manager” which can:
- build packages,
- manage virtual environments,
- manage multiple python versions for a project,
- run tests using
pytest
, - run static analysis on code using
ruff
, - execute scripts with specific dependencies and python versions (this is the “task runner” part)
- publish packages to PyPI,
- help bump version numbers of packages,
- use templates to create new python projects
To initialize an existing project for hatch
,
enter the directory containing the project and run the following:
hatch new --init
This will interactively guide you through the setup process.
To run tests using hatch
, run the following:
hatch test
This will:
- create a python environment in which your tests will run, then
- run the tests.
Alongside built-in commands like test
, hatch
allows adding custom scripts.
For instance, to add an environment and scripts
for viewing and publishing the Material for MkDocs documentation,
you can add the following lines to the pyproject.toml
file:
[tool.hatch.envs.doc]
dependencies = [
"mkdocs-material",
"mkdocstrings[python]"
]
[tool.hatch.envs.doc.scripts]
serve = "mkdocs serve --dev-addr localhost:8000"
build = "mkdocs build --clean --strict --verbose"
deploy = "mkdocs gh-deploy"
This specifies a new environment doc
with the mkdocs-material
and mkdocstrings
dependencies needed, and
scripts serve
, build
and deploy
defined within that environment.
Then to view the documentation locally, run hatch run <ENV>:<SCRIPT>
, e.g.:
hatch run doc:serve
to run the preview server, or
hatch run doc:build
to build the documentation, ready for deployment.
The key benefits here are that:
- these scripts run within an isolated environment,
- the simple commands like
hatch run doc:serve
allow the developer to use arguments like--dev-addr localhost:8000
without needing to remember or think about them.
The developer must decide whether these benefits outweigh the added complexity of an additional layer of abstraction, which will hinder debugging if something goes wrong.
Key Points
A task runner makes it easier to contribute to software
Continuous Integration
Overview
Teaching: 10 min
Exercises: 10 minQuestions
How do you ensure code keeps passing?
Objectives
Use a CI service to run your tests
Developers often need to run some tasks every time they update code. This might include running tests, or checking that the formatting conforms to a style guide.
Continuous Integration (CI) allows the developer to automate running these kinds of tasks each time various “trigger” events occur on your repository. For example, you can use CI to run a test suite on every pull request.
In this episode we will set up CI using GitHub Actions:
- test the code on every pull request or merge to main,
- run those tests under multiple versions of python, on Linux, Windows and macOS.
GitHub Actions workflows
directory
GitHub Actions is made up of workflows which consist of actions.
Workflows are files in the .github/workflows
folder ending in .yml
.
Triggers
Workflows start with triggers, which define when things run. Here are three triggers:
on:
pull_request:
push:
branches:
- main
This will run on all pull requests and pushes to main. You can also specify specific branches for pull requests instead of running on all PRs (will run on PRs targeting those branches only).
Running unit tests
Let’s set up a basic test. We will define a jobs dict, with a single job named
“tests”. For all jobs, you need to select an image to run on - there are images
for Linux, macOS, and Windows. We’ll use ubuntu-latest
.
on:
pull_request:
push:
branches:
- main
jobs:
tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Install package
run: python -m pip install -e .[test]
- name: Test package
run: python -m pytest
This has four steps:
- Checkout the source (your repo).
- Prepare Python 3.10 (will use a preinstalled version if possible, otherwise will download a binary).
- Install your package with testing extras - this is just an image that will be removed at the end of the run, so “global” installs are fine. We also provide a nice name for the step.
- Run your package’s tests.
By default, if any step fails, the run immediately quits and fails.
Running in a matrix
You can parametrize values, such as Python version or operating system. To do this, make a strategy: matrix: dict. Every key in that dict (except include: and exclude:) should be set with a list, and a job will be generated with every possible combination of values. You can access these values via the matrix variable; they do not "automatically" change anything.
For example:
example:
strategy:
matrix:
onetwothree: [1, 2, 3]
name: Job ${{ matrix.onetwothree }}
would produce three jobs, with names Job 1
, Job 2
, and Job 3
. Elsewhere,
if you refer to the example
job, it will implicitly refer to all three.
This is commonly used to set Python and operating system versions:
on:
pull_request:
push:
branches:
- main
jobs:
tests:
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.11"]
runs-on: [ubuntu-latest, windows-latest, macos-latest]
name: Check Python ${{ matrix.python-version }} on ${{ matrix.runs-on }}
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install package
run: python -m pip install -e .[test]
- name: Test package
run: python -m pytest
There are two special keys: include:
will take a list of jobs to include one
at a time. For example, you could add Python 3.9 on Linux (but not the others):
include:
- python-version: 3.9
runs-on: ubuntu-latest
include
can also list more keys than were present in the original
parametrization; this will add a key to an existing job.
The exclude:
key does the opposite, and lets you remove jobs from the matrix.
Other actions
GitHub Actions has the concept of actions, which are just GitHub repositories of the form org/name@tag
, and there are lots of useful actions to choose from (and you can write your own by composing other actions, or you can also create them with JavaScript or Dockerfiles). Here are a few:
There are some GitHub supplied ones:
- actions/checkout: Almost always the first action. v2+ does not keep Git history unless
with: fetch-depth: 0
is included (important for SCM versioning). v1 works on very old docker images. - actions/setup-python: Do not use v1; v2+ can setup any Python, including uninstalled ones and pre-releases. v4+ requires a Python version to be selected.
- actions/cache: Can store files and restore them on future runs, with a settable key.
- actions/upload-artifact: Upload a file to be accessed from the UI or from a later job.
- actions/download-artifact: Download a file that was previously uploaded, often for releasing. Match upload-artifact version.
And many other useful ones:
- ilammy/msvc-dev-cmd: Setup MSVC compilers.
- jwlawson/actions-setup-cmake: Setup any version of CMake on almost any image.
- wntrblm/nox: Setup all versions of Python and provide nox.
- pypa/gh-action-pypi-publish: Publish Python packages to PyPI.
- pre-commit/action: Run pre-commit with built-in caching.
- conda-incubator/setup-miniconda: Setup conda or mamba on GitHub Actions.
- ruby/setup-ruby: Setup Ruby if you need it for something.
Exercise
Add a CI file for your package.
Key Points
Set up GitHub Actions on your project
Run your tests on multiple platforms and with multiple Python versions