SPQR for Python


For my work, I was working on a research question, and for part of it, I wanted to see if I could solve the matrix equation $$Ax = b.$$ Easy stuff, right? Here's the thing, my matrix $A$ is both very large (887217 by 39744) and very sparse (0.02% filled). It's overdetermined, so I want to find the least squares solution for $x$ given my large sparse $A$ and a column vector $b$.

I'm doing my prototyping work in Python, so I used the LSMR solver in SciPy. This is a nice iterative solver using a modern algorithm (it improves upon the LSQR algorithm). However, I wanted to try an alternative solver, preferably a direct (not iterative) solution.

After doing some research, I settled upon SuiteSparseQR. It uses the QR decomposition to solve my matrix equation. It's even used in Matlab, so it must be good. It has nice C and C++ interfaces.


The problem is, my current code is in Python, not C. Due to licensing problems, SPQR cannot be included in SciPy, and I didn't find any resources online that were Python bindings to SPQR (I did find one for another SuiteSparse package, but it looks pretty out-of-date). So...time to do it myself!

There is an Arch package for the whole SuiteSparse collection, so I installed that. Time to use ctypes, right? Not so fast, SuiteSparse only includes static libraries, not shared libraries.


So, in case anyone wants to do something similar, here's how I approached it. I took the example SPQR code and used it to write a function in C that wraps the SPQR solver. This is compiled to a shared library. Then a thin Python interface is used so I can call my C wrapper function from Python.

Here's the C wrapper, spqr_wrapper.c:

// C wrapper to SparseSuiteQR library et al. for Python

// We pass in the sparse matrix data in a COO sparse matrix format. Cholmod
// refers to this as a "cholmod_triplet" format. This is then converted to its
// "cholmod_sparse" format, which is a CSC matrix.

#include <stdio.h>
#include <stdlib.h>
#include "SuiteSparseQR_C.h"

void qr_solve(double const *A_data, long const *A_row, long const *A_col, size_t A_nnz, size_t A_m, size_t A_n, double const *b_data, double *x_data) {
    // Solves the matrix equation Ax=b where A is a sparse matrix and x and b
    // are dense column vectors. A and b are inputs, x is solved for in the
    // least squares sense using a rank-revealing QR factorization.
    // Inputs
    // A_data, A_row, A_col: the COO data
    // A_nnz: number of non-zero entries, ie the length of the arrays A_data, etc
    // A_m: number of rows in A
    // A_n: number of cols in A
    // b_data: the data in b. It is A_m entries long.
    // Outputs
    // x_data: the data in x. It is A_n entries long
    // MAKE SURE x_data is allocated to the right size before calling this function
    cholmod_common Common, *cc;
    cholmod_sparse *A_csc;
    cholmod_triplet *A_coo;
    cholmod_dense *b, *x;
    size_t k;
    // Helper pointers
    long *Ai, *Aj;
    double *Ax, *bx, *xx;

    /* start CHOLMOD */
    cc = &Common ;
    cholmod_l_start (cc) ;

    // Create A, first as a COO matrix, then convert to CSC
    A_coo = cholmod_l_allocate_triplet(A_m, A_n, A_nnz, 0, CHOLMOD_REAL, cc);
    if (A_coo == NULL) {
        fprintf(stderr, "ERROR: cannot allocate triplet");
    // Copy in data
    Ai = A_coo->i;
    Aj = A_coo->j;
    Ax = A_coo->x;
    for (k=0; k<A_nnz; k++) {
        Ai[k] = A_row[k];
        Aj[k] = A_col[k];
        Ax[k] = A_data[k];
    A_coo->nnz = A_nnz;
    // Make sure the matrix is valid
    if (cholmod_l_check_triplet(A_coo, cc) != 1) {
        fprintf(stderr, "ERROR: triplet matrix is not valid");
    // Convert to CSC
    A_csc = cholmod_l_triplet_to_sparse(A_coo, A_nnz, cc);

    // Create b as a dense matrix
    b = cholmod_l_allocate_dense(A_m, 1, A_m, CHOLMOD_REAL, cc);
    bx = b->x;
    for (k=0; k<A_m; k++) {
        bx[k] = b_data[k];
    // Make sure the matrix is valid
    if (cholmod_l_check_dense(b, cc) != 1) {
        fprintf(stderr, "ERROR: b vector is not valid");

    // Solve for x
    x = SuiteSparseQR_C_backslash_default(A_csc, b, cc);

    // Return values of x
    xx = x->x;
    for (k=0; k<A_n; k++) {
        x_data[k] = xx[k];

    /* free everything and finish CHOLMOD */
    cholmod_l_free_triplet(&A_coo, cc);
    cholmod_l_free_sparse(&A_csc, cc);
    cholmod_l_free_dense(&x, cc);
    cholmod_l_free_dense(&b, cc);

The SPQR solver, SuiteSparseQR_C_backslash_default() uses data structures to represent the matrices from the CHOLMOD component of SuiteSparse. My function qr_solve() has a more C-like interface where it takes in arrays of doubles and so forth (this is a COO sparse matrix format, but is converted in the function to the CSC format SPQR expects). I admit it may not be the most memory efficient since it creates new sparse matrices and copies over the data, but it gets the job done and isn't very time-intensive compared to the actual SPQR solver.

Here's my Makefile to compile it to a shared library:

CFLAGS = -m64 -Wall -Wextra -Wformat=2 -std=gnu90 -O2 -march=native
SPQRLIBS = -lspqr -lsuitesparseconfig -lcholmod -lamd -lcolamd -lm -llapack -lblas
spqr_wrapper.so: spqr_wrapper.c
    gcc $^ -o $@ $(CFLAGS) -fPIC -shared $(SPQRLIBS)
    rm -f spqr_wrapper.so

Note that SPQR takes quite a few libraries (in this order) in order to link right. But at least we now have a shared library that can work with Python.

The Python wrapper is fairly thin since it's just playing around with ctypes. I use a NumPy/ctypes function to help ensure the data from Python is compatible with what my C function expects. The Python function call also does some quick checking to make sure the inputs are the right length and so on. Here's the Python wrapper, spqr.py (note that it's for Python 3, not 2):

#!/usr/bin/env python
__author__ = 'Rich Li'
""" SuiteSparseQR Python wrapper """

import os.path
import ctypes
from ctypes import c_double, c_size_t, byref, pointer, POINTER
import numpy as np
from numpy.ctypeslib import ndpointer

# Assume spqr_wrapper.so (or a link to it) is in the same directory as this file
spqrlib = ctypes.cdll.LoadLibrary(os.path.dirname(__file__) + os.path.sep + "spqr_wrapper.so")

# Function prototypes for spqr_wrapper.so
# void qr_solve(double const *A_data, double const *A_row, double const *A_col, size_t A_nnz, size_t A_m, size_t A_n, double const *b_data, double *x_data)
spqrlib.qr_solve.argtypes = [
        ndpointer(dtype=np.float64, ndim=1, flags='C_CONTIGUOUS'), # A_data
        ndpointer(dtype=np.int, ndim=1, flags='C_CONTIGUOUS'), # A_row
        ndpointer(dtype=np.int, ndim=1, flags='C_CONTIGUOUS'), # A_col
        c_size_t,  # A_nnz
        c_size_t,  # A_m
        c_size_t,  # A_n
        ndpointer(dtype=np.float64, ndim=1, flags='C_CONTIGUOUS'), # b_data
        ndpointer(dtype=np.float64, ndim=1, flags=('C_CONTIGUOUS', 'WRITEABLE')), # x_data
spqrlib.qr_solve.restype = None

def qr_solve(A_data, A_row, A_col, A_nnz, A_m, A_n, b_data):
    """ Python wrapper to qr_solve """
    if len(A_data) != len(A_row) != len(A_col) != A_nnz:
        raise TypeError("A_data, A_row, A_col, A_nnz must agree")
    if len(b_data) != A_m:
        raise TypeError("b_data must be A_m long")

    x_data = np.empty(A_n, dtype=np.float64)
            np.require(A_data, np.float64, 'C'),
            np.require(A_row, np.int64, 'C'),
            np.require(A_col, np.int64, 'C'),
            A_nnz, A_m, A_n,
            np.require(b_data, np.float64, 'C'),
            np.require(x_data, np.float64, 'C')
    return x_data

def main():
    print("Testing qr_solve")
    A_data = np.array([2, 9, 25], dtype=np.float64)
    A_row = np.array([0, 1, 2])
    A_col = np.array([0, 1, 2])
    b_data = np.array([1, 1, 1], dtype=np.float64)

    x_data = qr_solve(A_data, A_row, A_col, len(A_data), 3, 3, b_data)

if __name__ == "__main__":

The test (by calling python spqr.py) creates a 3x3 diagonal matrix and computes the inverse of the diagonal. Works for me! To use SPQR in my Python code, it's simply

import spqr
# A is a scipy.sparse COO matrix that I fill in earlier
x = spqr.qr_solve(A.data, A.row, A.col, A.nnz, A.shape[0], A.shape[1], b)

For my problem, it turned out that it takes a lot longer to run than the iterative LSMR solver, but the results are very similar. This makes me a little more confident in my results--I can obtain them irrespective of which algorithm I use to solve $Ax=b$.

Comments !