Background
When it comes to the scientific Python stack (numpy and friends) I often need to have a certain package installed and/or use the latest version of a package. My solution has been to use Arch Linux, which always uses the latest versions of all the Python packages. However, there are two reasons why I've found this to be not completely sufficient for my needs:
- While my code often runs on Arch machines I control, it also often runs on RHEL machines outside my control.
- For reproducible results, it may be nice to be able to run my code with a specific (older) version of numpy, Python, matplotlib, etc.
I haven't had much cause for the second reason yet, but I do see its merit. As for the first reason, the RHEL machines I mention come with an ancient version of Python (2.3.4), lack some of the scientific Python packages I need, or both.
Conda
My previous solution was to use EPD, the Enthought Python Distribution. I installed it into my home directory and thus had reasonably up-to-date (albeit not always the very latest) versions of everything I needed. However, Enthought recently changed the way they do things and, long story short, I no longer care to use EPD.
An even better Python distribution is Anaconda. One point confused me until I read several posts on their mailing list: conda is the package manager, while anaconda is a conda meta-package that pulls in tons of scientific Python packages. I won't go into the technical merits of conda, but I've used it and like it.
binstar
While anaconda has a nice list of packages in its repos, the list is not exhaustive. For example, as of today there is no h5py for Python 3, although there is one for Python 2 (this should be fixed soon). Enter binstar, a place to upload binary packages for use with conda. Coming from an Arch Linux background, this reminds me of the Arch User Repository, a place where anyone (not just the Arch developers or trusted users) can upload a package. Package recipes follow the BSD ports tradition and closely resemble an Arch PKGBUILD, although the metadata is written in YAML instead of bash.
pyfftw
A useful package I use for my scientific Python work is pyFFTW, a nice Pythonic wrapper to the fftw library. Since this is not part of the Anaconda repos, this is where binstar steps in. I've created binstar packages for both fftw and pyfftw. Since the documentation is a little sparse, I'll detail below how I did it.
Creating the package
I used the Python 3 version of miniconda. That seemed like a good idea, but it turned out to make things slightly harder than the Python 2 version would have (see below). I chose miniconda over anaconda because it's a smaller download and I don't need all of the packages bundled with anaconda. I installed it to my home directory. I didn't actually add ~/miniconda3/bin to my path, but for compactness the code examples below assume it is there (e.g. I actually ran ~/miniconda3/bin/conda instead of conda).
First, the conda-build package is installed in order to build packages from conda recipes:
$ conda install conda-build
The binstar package is also needed for easy uploading to binstar. However, that package is available only for Python 2, not 3, so I created a new conda environment called "binstar" that uses Python 2.7 and installed binstar into it:
$ conda create -n binstar python=2.7 binstar
Then I switch to that environment and authenticate to binstar:
$ source activate binstar
(binstar) $ binstar login
I kept a terminal open in the "binstar" environment for uploading my packages later. To leave that environment, use:
$ source deactivate
Now I'm ready to build. First I created the conda build recipe for fftw. The recipe consists of two files, meta.yaml and build.sh, in a folder I named fftw. Here they are, with most comments and unnecessary details removed (they are also in the tarball hosted at binstar in the info/recipe folder):
meta.yaml:
package:
  name: fftw
  version: "3.3.3"
source:
  fn: fftw-3.3.3.tar.gz
  url: http://www.fftw.org/fftw-3.3.3.tar.gz
  md5: 0a05ca9c7b3bfddc8278e7c40791a1c2
requirements:
  build:
    - system
    - chrpath
about:
  home: http://fftw.org
  license: GNU General Public License (GPL)
  summary: 'The fastest Fourier transform in the west.'
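The md5 field in the recipe lets conda build check the downloaded source tarball. When writing a recipe it can be handy to check a tarball by hand first. The following is only a minimal sketch: the function name verify_md5 is hypothetical, and it assumes GNU coreutils' md5sum is available, as it is on most Linux systems.

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED
# Succeeds only if the MD5 digest of FILE equals EXPECTED.
# (verify_md5 is an illustrative helper, not part of conda or binstar.)
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" 2>/dev/null | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Example, assuming the tarball was downloaded to the current directory:
# verify_md5 fftw-3.3.3.tar.gz 0a05ca9c7b3bfddc8278e7c40791a1c2 && echo "checksum OK"
```

A missing file yields an empty digest, so the comparison fails and the function returns a non-zero status.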
build.sh:
#!/usr/bin/env bash
CONFIGURE="./configure --prefix=$PREFIX --enable-shared --enable-threads --disable-fortran"
# Single precision (fftw libraries have "f" suffix)
$CONFIGURE --enable-float --enable-sse
make
make install
# Long double precision (fftw libraries have "l" suffix)
$CONFIGURE --enable-long-double
make
make install
# Double precision (fftw libraries have no precision suffix)
$CONFIGURE --enable-sse2
make
make install
# Test suite
cd tests && make check-local
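Since build.sh configures and installs FFTW three times, one quick sanity check is to confirm that all three precision variants ended up in the install prefix. The sketch below assumes the shared libraries are installed as lib/libfftw3f.so, lib/libfftw3l.so and lib/libfftw3.so under the prefix (the usual names on Linux); the function check_fftw_libs is a hypothetical helper, not part of the recipe.

```shell
#!/usr/bin/env bash
# check_fftw_libs PREFIX
# Reports which FFTW precision variants exist under PREFIX/lib and
# returns non-zero if any of them is missing.
# (Illustrative helper; assumes Linux-style ".so" library names.)
check_fftw_libs() {
    local prefix="$1" lib missing=0
    # f = single precision, l = long double, no suffix = double
    for lib in libfftw3f libfftw3l libfftw3; do
        if [ -e "$prefix/lib/$lib.so" ]; then
            echo "found $lib.so"
        else
            echo "missing $lib.so"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_fftw_libs "$PREFIX"
```

It could be called at the end of build.sh with $PREFIX, or pointed at a conda environment directory after installing the package.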
Then it's simply a matter of running conda build fftw to build the package. For me, the result was stored at $HOME/miniconda3/conda-bld/linux-64/fftw-3.3.3-0.tar.bz2, so I passed that filename to binstar in order to upload it.
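The location of the built package follows a predictable pattern, so the upload step can be sketched as below. This is an assumption-laden illustration: built_pkg_path is a hypothetical helper, the defaults (linux-64 and ~/miniconda3) simply mirror my setup, and the upload command is only printed rather than executed.

```shell
#!/usr/bin/env bash
# built_pkg_path NAME VERSION BUILD [PLATFORM] [ROOT]
# Prints where conda build placed the package tarball, following the
# pattern ROOT/conda-bld/PLATFORM/NAME-VERSION-BUILD.tar.bz2.
# (Illustrative helper; the defaults are assumptions matching this post.)
built_pkg_path() {
    local name="$1" version="$2" build="$3"
    local platform="${4:-linux-64}"
    local root="${5:-$HOME/miniconda3}"
    echo "$root/conda-bld/$platform/$name-$version-$build.tar.bz2"
}

pkg=$(built_pkg_path fftw 3.3.3 0)
# Printed only; run the command by hand from the "binstar" environment.
echo "binstar upload $pkg"
```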
Then I created a recipe for pyfftw:
meta.yaml:
package:
  name: pyfftw
  version: "0.9.2"
source:
  fn: pyFFTW-0.9.2.tar.gz
  url: https://pypi.python.org/packages/source/p/pyFFTW/pyFFTW-0.9.2.tar.gz
  md5: 34fcbc68afb8ebe5f040a02a8d20d565
requirements:
  build:
    - python
    - numpy
  run:
    - python
    - numpy
    - fftw
test:
  # Python imports
  imports:
    - pyfftw
    - pyfftw.builders
    - pyfftw.interfaces
about:
  home: http://hgomersall.github.com/pyFFTW/
  license: GNU General Public License (GPL)
  summary: 'A pythonic wrapper around FFTW, the FFT library, presenting a unified interface for all the supported transforms.'
build.sh:
#!/usr/bin/env bash
$PYTHON setup.py build
$PYTHON setup.py install --optimize=1
As before, I called conda build pyfftw to create the package. However, it's built in an environment using my default Python/numpy combination (3.3 and 1.8 in my case). To build it for other configurations, the conda build docs mention setting some environment variables. Here I build it for some relevant (not all) combinations:
CONDA_PY=27 CONDA_NPY=17 conda build pyfftw
CONDA_PY=27 CONDA_NPY=18 conda build pyfftw
CONDA_PY=33 CONDA_NPY=17 conda build pyfftw
CONDA_PY=33 CONDA_NPY=18 conda build pyfftw
The packages are created with the Python/numpy combination in the filename and other metadata within the tarball. I've uploaded them all to my binstar account.
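The four build commands above can also be generated with a small loop. This is only a sketch: build_matrix and the DRY_RUN switch are hypothetical names introduced here, and by default the commands are printed rather than run so the output can be inspected first.

```shell
#!/usr/bin/env bash
# build_matrix RECIPE
# Runs (or, with DRY_RUN=1, just prints) conda build for each
# Python/numpy combination listed below.
# (Illustrative helper; DRY_RUN defaults to 1, i.e. print only.)
build_matrix() {
    local recipe="$1" py npy
    for py in 27 33; do
        for npy in 17 18; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "CONDA_PY=$py CONDA_NPY=$npy conda build $recipe"
            else
                CONDA_PY="$py" CONDA_NPY="$npy" conda build "$recipe"
            fi
        done
    done
}

build_matrix pyfftw
```

Setting DRY_RUN=0 before calling build_matrix performs the actual builds, which requires conda-build to be installed.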
To install the packages, it's as simple as
conda install -c https://conda.binstar.org/richli pyfftw
Sometimes the dependency checking will do funny things, like downgrade numpy to 1.7 even though a pyfftw version for numpy 1.8 exists. In that case, I used
conda install -c https://conda.binstar.org/richli pyfftw numpy=1.8
to make it do the right thing.
Conclusion
I really like the Arch package manager, pacman, and the AUR. Conda and binstar are similar in some ways to pacman and the AUR, respectively, although conda's use of multiple isolated environments to have arbitrary versions of packages is definitely outside the scope of pacman. Conda is not as mature as pacman and has some serious deficiencies when it comes to documentation. But it has great potential and is already an excellent way for me to use scientific Python packages outside of Arch.