Sparse Matrix Data Structures
Introduction
A sparse matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) is a matrix in which the majority of elements are zero. The sparsity is defined as:
where \(\text{nnz}\) denotes the number of non-zero entries. For matrices arising in scientific computing applications (finite element methods, power grids, graph analytics), sparsity values of \(s > 0.99\) are typical, making dense storage prohibitively expensive.
Memory Comparison
For a dense matrix:
For a CSR matrix (see below):
The compression ratio is:
For a typical sparse matrix from the SuiteSparse collection with \(m = n = 10^6\) and \(\text{nnz} = 10^7\):
meaning dense storage would require approximately 47× more memory.
CSR (Compressed Sparse Row) Format
The Compressed Sparse Row (CSR) format, also known as the Harwell-Boeing format, stores a sparse matrix using three arrays:
values \(\in \mathbb{R}^{\text{nnz}}\) — the non-zero values in row-major order
col_index \(\in \{0, 1, \ldots, n-1\}^{\text{nnz}}\) — column indices
row_ptr \(\in \{0, 1, \ldots, \text{nnz}\}^{m+1}\) — row start offsets
The row \(i\) occupies the index range \([\text{row\_ptr}[i], \text{row\_ptr}[i+1])\). This enables \(O(1)\) access to any row.
CSR Array Layout
For a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(\text{nnz}\) non-zeros:
The three arrays are:
where \(\text{nnz}_i\) is the number of non-zeros in row \(i\).
Example: 3×4 Matrix
Consider:
CSR representation:
values = [a, b, c, d, e]
col_index = [0, 3, 1, 0, 2] (0-based columns)
row_ptr = [0, 2, 3, 5] (row 0 has indices [0,2), row 1 → [2,3), row 2 → [3,5))
Formal Invariants
The CSR format maintains the following invariants:
COO (Coordinate) Format
The Coordinate (COO) format stores each non-zero as an explicit \((row, col, value)\) triple:
values \(\in \mathbb{R}^{\text{nnz}}\) — the non-zero values
row \(\in \{0, \ldots, m-1\}^{\text{nnz}}\) — row indices
col \(\in \{0, \ldots, n-1\}^{\text{nnz}}\) — column indices
All three arrays are parallel — entry at index \(k\) represents \((\text{row}[k], \text{col}[k], \text{values}[k])\).
COO is the natural output of the Matrix Market parser and is easier to construct incrementally, but does not support efficient row access (\(O(\text{nnz})\) scan required).
Formal Invariants
COO does not require sorted entries.
COO to CSR Conversion
The conversion from COO to CSR uses a counting sort algorithm with \(O(\text{nnz})\) time complexity and \(O(m)\) auxiliary space.
Algorithm Steps
Step 1: Count non-zeros per row
Step 2: Prefix sum to compute row_ptr
Equivalently:
Step 3: Write entries using write counters
write_offset[i] = row_ptr[i] (copy)
for k = 0 .. nnz-1:
i = row[k]
j = col[k]
v = values[k]
values_csr[write_offset[i]] = v
col_index[write_offset[i]] = j
write_offset[i]++
The write counters ensure each row’s entries are contiguous in the output arrays.
Complexity Analysis
Phase |
Time |
Extra Space |
|---|---|---|
Count |
\(O(nnz)\) |
\(O(m)\) |
Prefix |
\(O(m)\) |
— |
Write |
\(O(nnz)\) |
\(O(m)\) |
Total |
\(O(nnz + m)\) |
\(O(m)\) |
The algorithm is cache-friendly because: - Counting scans COO arrays once with sequential access - Prefix sum is cache-backed for typical \(m \ll \text{nnz}\) - Write phase has strided (but predictable) access patterns
Dense Vector
A DenseVector \(\mathbf{x} \in \mathbb{R}^n\) stores all \(n\) elements explicitly:
Memory footprint:
API Reference
- struct SparseMatrix
CSR-format sparse matrix.
-
int64_t rows
Number of matrix rows (\(m\)).
-
int64_t cols
Number of matrix columns (\(n\)).
-
int64_t nnz
Number of non-zero entries.
-
std::vector<double> values
Non-zero values, row-major order, size \(\text{nnz}\).
-
std::vector<int64_t> col_index
Column index for each value, size \(\text{nnz}\), 0-based.
-
std::vector<int64_t> row_ptr
Row start offsets, size \(m + 1\).
-
int64_t rows
References
Saad, Y. (2003). Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM.
Davis, T. A. (2006). Direct Methods for Sparse Linear Systems. SIAM.
Boisvert, R., Pozo, R., Remming, K. & Suzuki, J. (1997). The Matrix Market Exchange Formats. NIST Report NISTIR-6025.