- Mon 03 July 2023
- Development
- #tools, #python, #containers
Update
I have adapted this entry into a workshop for Som Energia, the cooperative I work with in Girona. You can find it here.
Sharpen your axe
If I had to chop down a tree in eight hours, I would spend six hours sharpening my axe.
This saying is usually attributed to Abraham Lincoln, who seemed to be very good at chopping down trees. The idea is that taking care of your tools will make your craft easier. This is true for any craft, and especially so for software development.
I originally wrote this as a short tutorial for a job I had. The idea was to introduce Poetry and dependency management to the people I worked with, and to move away from the requirements.txt file. These are the tools I use nowadays to manage my Python projects, and they have been working great for me. I will use Docker to show you how it all fits together, so you don't need to install anything.
Pin those dependencies
Pinned dependencies are important, as they help you maintain and control the functionality of your project. Pinning dependencies also ensures projects get outdated (i.e. they “rot”) in a manageable and understandable way.
There are some levels to how reproducible your project is:
- Mention somewhere evident what version of Python your project is supposed to work with. A badge would suffice; see for example static badges, like this one.
- Provide at least a requirements.txt with pinned dependencies (a sketch follows right after this list).
- Provide a setup.py file declaring dependencies.
- In the best case, provide a pyproject.toml file.
- Use a dependency manager such as poetry, pipenv, hatch or conda.
- Bundle everything in a Docker image.
- Use multi-stage Docker images to build your project.
- Use a CI/CD tool to automate the process of building and deploying your project.
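For the second level, a pinned requirements.txt simply lists exact versions, typically generated with pip freeze. A minimal sketch (the package names and versions below are only illustrative):
$ python -m pip freeze > requirements.txt
$ cat requirements.txt
numpy==1.24.3
pandas==2.0.1
python-dateutil==2.8.2
pytz==2023.3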
Get control of your python environments
The xkcd comic about Python environments is old by now. The gist is that isolated environments are important and you should use them. Even better if you know what Python version you are using at all times and where it is installed. whereis python is your friend.
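For example, a quick sanity check might look like this (paths and versions are illustrative and will differ on your system):
$ whereis python3
python3: /usr/bin/python3 /usr/bin/python3.10 /usr/lib/python3.10
$ which python3
/usr/bin/python3
$ python3 --version
Python 3.10.12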
Same as with dependencies, there are some levels to how isolated your Python environments are:
- First level: python -m venv .venv at your project level.
- Second level: have pipx manage your system-level utilities, i.e. normally CLI tools like cowsay, poetry or black.
- Third level: let poetry manage your virtual environments.
- Fourth level: let pyenv manage your Python versions and your environments, and let poetry just do dependency management.
- Fifth level: some Python version manager that does not use shims and does not suck, but I don't know of any as of today.
- Sixth level: use Docker to build an isolated environment.
But first, what is a dependency?
- A project may depend on another library: consider for example the pandas library. You normally install this with pip install pandas. That means that your project now has a dependency on pandas.
- pandas is a very large project and it has its own dependencies too. Depending on the complexity of the project, these may be hard to understand at first. For pandas, they are well listed as Anaconda dependencies (you can also inspect them locally, as shown right after this list).
- Each library at some point grows past the point where it needs to start tracking such growth through some kind of versioning system. This is so that users (e.g. other developers like you) can write code that depends on specific expected behaviours.
- Sometimes a change of version won't mean anything to your own code. Other times, however, it may introduce a change of the expected behaviour when running the same thing. This is loosely referred to as a breaking change. Consider the example of distutils telling its users how to migrate to new versions so their code won't break.
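As a quick way to see the first two points in practice, you can ask pip for a library's version and its declared dependencies (the output is an example and depends on the pandas release you have installed):
$ pip show pandas | grep -E "Version|Requires"
Version: 2.0.3
Requires: numpy, python-dateutil, pytz, tzdata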
What is a virtual environment?
Make sure to read https://realpython.com/python-virtual-environments-a-primer/ for more info. It's very complete and I won't try to improve on it.
Why do we need a virtual environment?
- Virtual environments are there to keep you sane.
- A virtual environment is a directory (a.k.a. folder) holding a copy of Python, isolated from other copies that you may have installed on your system.
- Oftentimes such an environment is installed in a folder, generally called .venv.
- Any libraries installed with ./.venv/bin/python -m pip are linked to that copy, i.e. they will be located under ./.venv/lib/pythonX.Y/site-packages/.
- Imagine the following weird example: you own three Python projects, each one with its own dependencies.
- Library A has dependency X with version 1.0.0. Running something like python -m X.foo will yield the integer 42.
- Library B has dependency Y with version 2.0.0.
- Library C has dependency X with version 3.0.0. Running something like python -m X.foo will yield None.
- You now have a dependency conflict: A and C have the same dependency, but with different versions.
- Library A does not support version 3.0.0 of dependency X, and vice versa.
- You can't go installing and uninstalling versions every time you want to use A or C: you expect a 42 in the first case and a None in the second.
- Solution: install two instances of Python at different locations so they can coexist, because their dependencies are now isolated. We have a separate copy of Python installed per project (a concrete version of this is sketched right after this list).
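X, A, B and C above are hypothetical, but you can reproduce the same situation with any real package. A minimal sketch using two virtual environments and two real versions of requests:
$ mkdir -p ~/project-a ~/project-c
$ python -m venv ~/project-a/.venv
$ python -m venv ~/project-c/.venv
$ ~/project-a/.venv/bin/pip install requests==2.25.1
$ ~/project-c/.venv/bin/pip install requests==2.31.0
$ ~/project-a/.venv/bin/python -c "import requests; print(requests.__version__)"
2.25.1
$ ~/project-c/.venv/bin/python -c "import requests; print(requests.__version__)"
2.31.0
Each project keeps its own copy of the dependency, and neither install clobbers the other.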
The easy win: using venv
venv is part of the standard library. That means that, without installing any other dependency, python -m venv .venv will create a copy of python within a folder called .venv. This folder will contain something like this:
.venv/
├── bin
├── include
├── lib
└── lib64 -> lib
And you normally "activate" it by sourcing the script at .venv/bin/activate.
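Put together, a typical session might look like this (the paths and the requests install are just examples):
$ cd ~/myproject
$ python -m venv .venv
$ source .venv/bin/activate
(.venv) $ python -m pip install requests
(.venv) $ which python
/home/you/myproject/.venv/bin/python
(.venv) $ deactivate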
The "a bit more involved" win using pyenv
+ pipx
+ poetry
In short, it means:
- Let pyenv manage versions of Python. Follow their instructions to install it.
- Create a virtual environment with pyenv virtualenv 3.9.10 <environment-name>.
- Activate it with pyenv activate <environment-name>.
- Let pyenv automatically activate an environment after entering a folder with pyenv local <environment-name>.
- If you are running Windows, you can use pyenv-win.
- Let pipx manage your system-level utilities, e.g. poetry; install it system-wide with pipx install poetry.
- Let poetry manage dependencies only. Poetry creates virtual environments by default, and we want to deactivate that (you can verify the result right after this list):
poetry config virtualenvs.create false
poetry config virtualenvs.in-project false
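To confirm that the configuration took effect, you can list poetry's settings (output trimmed to the relevant lines):
$ poetry config --list | grep virtualenvs
virtualenvs.create = false
virtualenvs.in-project = false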
A demonstration using Docker
Spoiler: you will need docker. You can get free containers using play-with-docker.
Start by launching a new container named my-python-environment with the latest version of Ubuntu and run a bash shell. The --rm flag will remove the container after you exit it, so it won't clutter your system.
docker run -it --name my-python-environment --rm ubuntu:latest bash
You are now starting from a clean slate. Let's install some dependencies.
apt update && apt install curl git vim build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev -y
curl https://pyenv.run | bash
You will now have pyenv
installed. Let's configure it. As per the instructions, we need to add the following to our ~/.bashrc
file:
vim ~/.bashrc
Go to the end of the file and add the following:
# Load pyenv automatically by appending
# the following to
# ~/.bash_profile if it exists, otherwise ~/.profile (for login shells)
# and ~/.bashrc (for interactive shells):
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
Save and quit. Now we need to refresh our shell to load the new configuration:
exec "$SHELL"
You can now use pyenv
to install a new version of python. Let's install version 3.10.9
:
pyenv install 3.10.9
Let's make it available globally:
pyenv global 3.10.9
We can also install different versions of python and switch between them:
pyenv install 3.8.12
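You can list the installed versions and check which one is active. For instance (the exact output depends on what you have installed):
$ pyenv versions
  system
* 3.10.9 (set by /root/.pyenv/version)
  3.8.12
$ pyenv local 3.8.12    # use 3.8.12 in the current directory only
$ python --version
Python 3.8.12
$ pyenv local --unset   # back to the global 3.10.9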
We now have Python 3.10.9 installed globally. Let's install pipx with it:
python3 -m pip install --upgrade pip
python3 -m pip install --user pipx
python3 -m pipx ensurepath
Again, we need to refresh our shell to load the new configurations:
exec "$SHELL"
Let's install poetry
with pipx
:
$ pipx install poetry==1.4.2 --force
And confirm it's installed with:
$ poetry --version
Poetry (version 1.4.2)
$ whereis poetry
poetry: /root/.local/bin/poetry
We want to configure poetry to not create virtual environments by default:
# configure poetry
poetry config virtualenvs.create false
poetry config virtualenvs.in-project false
We can now create a new project named mypackage:
# create project
mkdir ~/mypackage
cd ~/mypackage
We will now create a virtual environment for this project only:
pyenv virtualenv 3.10.9 myenv
pyenv activate myenv
pyenv rehash # refresh pyenv shims
Now use the poetry dialog to create a new project:
poetry init
The dialog will ask you a few questions; you can skip them by pressing enter. It will then create a pyproject.toml file. You can now manage dependencies with poetry:
poetry add pandas
and remove dependencies
poetry remove pandas
Every time you add or remove packages with poetry, two things happen:
- the pyproject.toml file is updated: a line is added or deleted under the dependencies section (see the example right after this list)
- a poetry.lock file is created or updated, containing the exact versions of the dependencies you are using and their relations
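For instance, after poetry add pandas you might see something like this appear in pyproject.toml (the exact version constraint depends on the latest release at the time):
$ grep -A 2 "tool.poetry.dependencies" pyproject.toml
[tool.poetry.dependencies]
python = "^3.10"
pandas = "^2.0.0"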
Let's install cowsay
with poetry
:
poetry add cowsay
poetry add --group dev black
pyenv rehash # to update shims
cowsay hola # a cow should greet you, in Spanish
  ____
| hola |
  ====
      \
       \
         ^__^
         (oo)\_______
         (__)\       )\/\
             ||----w |
             ||     ||
Writing your project as a CLI with poetry
cd ~/mypackage
mkdir mypackage
touch mypackage/__init__.py
touch mypackage/cli.py
Open mypackage/cli.py
and add the following:
import cowsay

def cli():
    return cowsay.cow('Hello World')

if __name__ == "__main__":
    cli()
Confirm that your program works by running it with python mypackage/cli.py
:
$ python mypackage/cli.py
  ___________
| Hello World |
  ===========
            \
             \
               ^__^
               (oo)\_______
               (__)\       )\/\
                   ||----w |
                   ||     ||
We can update our pyproject.toml
file to include a CLI entrypoint:
[tool.poetry]
name = "mypackage"
version = "0.1.0"
description = "some description"
authors = ["Your Name <youremail@yopmail.com>"]
readme = "README.md"
packages = [{include = "mypackage"}] # new
[tool.poetry.dependencies]
python = "^3.10"
pandas = "^2.0.0"
cowsay = "^5.0"
[tool.poetry.scripts]
mypackage-cli = "mypackage.cli:cli" # new
[tool.poetry.group.dev.dependencies]
black = "^23.3.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
And now we can install our package with poetry
and run our little CLI by calling mypackage-cli
:
poetry install # install package along with its CLI
pyenv rehash # refresh python shims
mypackage-cli # launch cli
Building and publishing your project
Make sure you have a valid account at pypi. You can create one here.
Run poetry build (at the level where a valid pyproject.toml is present). This builds both a source distribution and a wheel under dist/.
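The output should look roughly like this (names and versions follow your pyproject.toml):
$ poetry build
Building mypackage (0.1.0)
  - Building sdist
  - Built mypackage-0.1.0.tar.gz
  - Building wheel
  - Built mypackage-0.1.0-py3-none-any.whl
$ ls dist/
mypackage-0.1.0-py3-none-any.whl  mypackage-0.1.0.tar.gz
From this point on, you have two options: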
Publish to pypi
Run poetry publish --username <username> --password <password>. PyPI is the default repository, so no --repository flag is needed.
Publish to a private repository
If you wish to publish to a private repository, say, pypi.myprivaterepo.org:
- Add the repository to the config in poetry, with poetry config repositories.myprivaterepo https://pypi.myprivaterepo.org/
- Be careful not to add the /simple bit if your private repo is using pypiserver, see https://github.com/pypiserver/pypiserver/issues/329#issuecomment-688883871
- Now your private PyPI repository is aliased as myprivaterepo
Now you can run poetry publish --username <username> --password <password> --repository myprivaterepo. See https://python-poetry.org/docs/libraries#publishing-to-pypi for more documentation.
Creating your own private repository using docker compose
Launch your own pypi repository with https://github.com/pypiserver/pypiserver and https://github.com/pypiserver/pypiserver/blob/master/docker-compose.yml
Adding dependencies from private repositories to pyproject.toml
You may also want to add dependencies from private repositories. These repos normally need credentials to access them; the source goes into your pyproject.toml file, while the credentials go into poetry's configuration. Make sure to follow the instructions from your private repository. Generally, the process is as follows:
- Add a source to the pyproject.toml file:
poetry source add myprivaterepo https://pypi.myprivaterepo.org
- This should modify your pyproject.toml file and will add something like this:
[[tool.poetry.source]]
name = "myprivaterepo"
url = "https://pypi.myprivaterepo.org/"
default = false
secondary = false
- Add credentials for that repository:
poetry config http-basic.myprivaterepo <user> <password>
- Add dependencies using the --source argument (the resulting entry is shown below):
poetry add --source myprivaterepo my-private-package=0.2.1
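After that last command, the dependency should show up in pyproject.toml pinned to your private source, along the lines of (my-private-package being the hypothetical package used above):
$ grep my-private-package pyproject.toml
my-private-package = {version = "0.2.1", source = "myprivaterepo"}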
Managing virtual environments inside docker
Last but not least, people tend to argue over this scenario. Is isolation within isolation necessary? Is a virtual environment needed inside a Docker container? See
- https://github.com/python-poetry/poetry/discussions/1879#discussioncomment-346113
- https://github.com/python-poetry/poetry/pull/3209#issuecomment-710678083
The answer is “it depends”, but a virtual environment gives you more control over dependencies and their state. I prefer handling it with Docker multi-stage builds, which I will hopefully cover in a future post soon.
Until next time!