## Motivation

For my work, I was working on a research question, and for part of it, I wanted to see if I could solve the matrix equation $$Ax = b.$$ Easy stuff, right? Here's the thing, my matrix $A$ is both very large (887217 by 39744) and very sparse (0.02% filled). It's overdetermined, so I want to find the least squares solution for $x$ given my large sparse $A$ and a column vector $b$.

I'm doing my prototyping work in Python, so I used the LSMR solver in SciPy. This is a nice iterative solver using a modern algorithm (it improves upon the LSQR algorithm). However, I wanted to try an alternative solver, preferably a direct (not iterative) solution.

After doing some research, I settled upon SuiteSparseQR. It uses the QR decomposition to solve my matrix equation. It's even used in Matlab, so it must be good. It has nice C and C++ interfaces.

## Problem

The problem is, my current code is in Python, not C. Due to licensing problems, SPQR cannot be included in SciPy, and I didn't find any resources online that were Python bindings to SPQR (I did find one for another SuiteSparse package, but it looks pretty out-of-date). So...time to do it myself!

There is an Arch package for the whole SuiteSparse collection, so I installed that. Time to use ctypes, right? Not so fast, SuiteSparse only includes static libraries, not shared libraries.

## Solution

So, in case anyone wants to do something similar, here's how I approached it. I took the example SPQR code and used it to write a function in C that wraps the SPQR solver. This is compiled to a shared library. Then a thin Python interface is used so I can call my C wrapper function from Python.

Here's the C wrapper, `spqr_wrapper.c`:

```
// C wrapper to SparseSuiteQR library et al. for Python
// We pass in the sparse matrix data in a COO sparse matrix format. Cholmod
// refers to this as a "cholmod_triplet" format. This is then converted to its
// "cholmod_sparse" format, which is a CSC matrix.
#include <stdio.h>
#include <stdlib.h>
#include "SuiteSparseQR_C.h"
void qr_solve(double const *A_data, long const *A_row, long const *A_col, size_t A_nnz, size_t A_m, size_t A_n, double const *b_data, double *x_data) {
// Solves the matrix equation Ax=b where A is a sparse matrix and x and b
// are dense column vectors. A and b are inputs, x is solved for in the
// least squares sense using a rank-revealing QR factorization.
//
// Inputs
//
// A_data, A_row, A_col: the COO data
// A_nnz: number of non-zero entries, ie the length of the arrays A_data, etc
// A_m: number of rows in A
// A_n: number of cols in A
// b_data: the data in b. It is A_m entries long.
//
// Outputs
//
// x_data: the data in x. It is A_n entries long
//
// MAKE SURE x_data is allocated to the right size before calling this function
//
cholmod_common Common, *cc;
cholmod_sparse *A_csc;
cholmod_triplet *A_coo;
cholmod_dense *b, *x;
size_t k;
// Helper pointers
long *Ai, *Aj;
double *Ax, *bx, *xx;
/* start CHOLMOD */
cc = &Common ;
cholmod_l_start (cc) ;
// Create A, first as a COO matrix, then convert to CSC
A_coo = cholmod_l_allocate_triplet(A_m, A_n, A_nnz, 0, CHOLMOD_REAL, cc);
if (A_coo == NULL) {
fprintf(stderr, "ERROR: cannot allocate triplet");
return;
}
// Copy in data
Ai = A_coo->i;
Aj = A_coo->j;
Ax = A_coo->x;
for (k=0; k<A_nnz; k++) {
Ai[k] = A_row[k];
Aj[k] = A_col[k];
Ax[k] = A_data[k];
}
A_coo->nnz = A_nnz;
// Make sure the matrix is valid
if (cholmod_l_check_triplet(A_coo, cc) != 1) {
fprintf(stderr, "ERROR: triplet matrix is not valid");
return;
}
// Convert to CSC
A_csc = cholmod_l_triplet_to_sparse(A_coo, A_nnz, cc);
// Create b as a dense matrix
b = cholmod_l_allocate_dense(A_m, 1, A_m, CHOLMOD_REAL, cc);
bx = b->x;
for (k=0; k<A_m; k++) {
bx[k] = b_data[k];
}
// Make sure the matrix is valid
if (cholmod_l_check_dense(b, cc) != 1) {
fprintf(stderr, "ERROR: b vector is not valid");
return;
}
// Solve for x
x = SuiteSparseQR_C_backslash_default(A_csc, b, cc);
// Return values of x
xx = x->x;
for (k=0; k<A_n; k++) {
x_data[k] = xx[k];
}
/* free everything and finish CHOLMOD */
cholmod_l_free_triplet(&A_coo, cc);
cholmod_l_free_sparse(&A_csc, cc);
cholmod_l_free_dense(&x, cc);
cholmod_l_free_dense(&b, cc);
cholmod_l_finish(cc);
return;
}
```

The SPQR solver, `SuiteSparseQR_C_backslash_default()` uses data structures to represent the matrices from the CHOLMOD component of SuiteSparse. My function `qr_solve()` has a more C-like interface where it takes in arrays of doubles and so forth (this is a COO sparse matrix format, but is converted in the function to the CSC format SPQR expects). I admit it may not be the most memory efficient since it creates new sparse matrices and copies over the data, but it gets the job done and isn't very time-intensive compared to the actual SPQR solver.

Here's my `Makefile` to compile it to a shared library:

```
CFLAGS = -m64 -Wall -Wextra -Wformat=2 -std=gnu90 -O2 -march=native
SPQRLIBS = -lspqr -lsuitesparseconfig -lcholmod -lamd -lcolamd -lm -llapack -lblas
spqr_wrapper.so: spqr_wrapper.c
gcc $^ -o $@ $(CFLAGS) -fPIC -shared $(SPQRLIBS)
clean:
rm -f spqr_wrapper.so
```

Note that SPQR takes quite a few libraries (in this order) in order to link right. But at least we now have a shared library that can work with Python.

The Python wrapper is fairly thin since it's just playing around with ctypes. I use a NumPy/ctypes function to help ensure the data from Python is compatible with what my C function expects. The Python function call also does some quick checking to make sure the inputs are the right length and so on. Here's the Python wrapper, `spqr.py` (note that it's for Python 3, not 2):

```
#!/usr/bin/env python
__author__ = 'Rich Li'
""" SuiteSparseQR Python wrapper """
import os.path
import ctypes
from ctypes import c_double, c_size_t, byref, pointer, POINTER
import numpy as np
from numpy.ctypeslib import ndpointer
# Assume spqr_wrapper.so (or a link to it) is in the same directory as this file
spqrlib = ctypes.cdll.LoadLibrary(os.path.dirname(__file__) + os.path.sep + "spqr_wrapper.so")
# Function prototypes for spqr_wrapper.so
# void qr_solve(double const *A_data, double const *A_row, double const *A_col, size_t A_nnz, size_t A_m, size_t A_n, double const *b_data, double *x_data)
spqrlib.qr_solve.argtypes = [
ndpointer(dtype=np.float64, ndim=1, flags='C_CONTIGUOUS'), # A_data
ndpointer(dtype=np.int, ndim=1, flags='C_CONTIGUOUS'), # A_row
ndpointer(dtype=np.int, ndim=1, flags='C_CONTIGUOUS'), # A_col
c_size_t, # A_nnz
c_size_t, # A_m
c_size_t, # A_n
ndpointer(dtype=np.float64, ndim=1, flags='C_CONTIGUOUS'), # b_data
ndpointer(dtype=np.float64, ndim=1, flags=('C_CONTIGUOUS', 'WRITEABLE')), # x_data
]
spqrlib.qr_solve.restype = None
def qr_solve(A_data, A_row, A_col, A_nnz, A_m, A_n, b_data):
""" Python wrapper to qr_solve """
if len(A_data) != len(A_row) != len(A_col) != A_nnz:
raise TypeError("A_data, A_row, A_col, A_nnz must agree")
if len(b_data) != A_m:
raise TypeError("b_data must be A_m long")
x_data = np.empty(A_n, dtype=np.float64)
spqrlib.qr_solve(
np.require(A_data, np.float64, 'C'),
np.require(A_row, np.int64, 'C'),
np.require(A_col, np.int64, 'C'),
A_nnz, A_m, A_n,
np.require(b_data, np.float64, 'C'),
np.require(x_data, np.float64, 'C')
)
return x_data
def main():
print("Testing qr_solve")
A_data = np.array([2, 9, 25], dtype=np.float64)
A_row = np.array([0, 1, 2])
A_col = np.array([0, 1, 2])
b_data = np.array([1, 1, 1], dtype=np.float64)
x_data = qr_solve(A_data, A_row, A_col, len(A_data), 3, 3, b_data)
print(x_data)
if __name__ == "__main__":
main()
```

The test (by calling `python spqr.py`) creates a 3x3 diagonal matrix and computes the inverse of the diagonal. Works for me! To use SPQR in my Python code, it's simply

```
import spqr
...
# A is a scipy.sparse COO matrix that I fill in earlier
x = spqr.qr_solve(A.data, A.row, A.col, A.nnz, A.shape[0], A.shape[1], b)
```

For my problem, it turned out that it takes a lot longer to run than the iterative LSMR solver, but the results are very similar. This makes me a little more confident in my results--I can obtain them irrespective of which algorithm I use to solve $Ax=b$.

## Comments !