2019-08-22

A recent project of mine has been to formalise and reformalise some basic linear algebra in Agda. This was motivated directly by our recent result that many syntactic properties of usage-tracking type systems benefit from being stated in terms of vectors and matrices. This form of usage tracking appears to be exactly powerful enough to capture intuitionistic linear *logic*, so involvement of linear *algebra* is in some sense unsurprising. The similarities go further than the names.

The aim for this post is to describe and justify the method I have settled on. The challenge is to take an arbitrary semiring (rig) `R`

of coefficients and define vectors, matrices, matrix addition, and matrix multiplication, and then prove all of the expected properties of these operations. We will not consider performance, mainly because I neither know nor care at the moment.

For the impatient, my recommendations are the following four points.

- Define vectors and matrices as functions from positions to coefficients.
- Positions are (single/pairs of)
`Fin`

s, with definitional equality as good as equivalence. - Define the identity matrix via an equality test on the indices of each position, rather than trying an inductive construction.
- Do not define this equality test via the decidability of equality of
`Fin n`

.

Obligatory literate blog post notes: this post comes from a literate Agda file, of which many parts have been surpressed. Defined names are clickable. The full source code is available at VecMat.lagda.md, and depends just on the Standard Library. I have tried to make the Agda code readable at a high level, particularly by using equational reasoning syntax.

If you just want to read the right way of doing things, skip to Using what we learnt.

Everyone’s favourite dependent type family is the *vectors*, defined as follows.

infixr 5 _∷_ data Vec (A : Set a) : (n : ℕ) → Set a where [] : Vec A zero _∷_ : A → Vec A n → Vec A (suc n)

These are lists with a specified length. At a technical level, `Vec`

is an inductive type family being constrained by the inductive structure of a natural number `n`

. This means that any function we write out of a `Vec`

will be by induction on the vector’s structure.

We can define matrices as vectors of row vectors.

Mat : (A : Set a) (m n : ℕ) → Set a Mat A m n = Vec (Vec A n) m

We will also need to lift relations pointwise to vectors, mainly to talk about equivalence. For the sake of code reuse, we will take a heterogeneous definition.

data Pw (R : REL A B ℓ) : REL (Vec A n) (Vec B n) ℓ where [] : Pw R [] [] _∷_ : ∀ {x y xs ys} → R x y → Pw R {n = n} xs ys → Pw R (x ∷ xs) (y ∷ ys)

Addition and the zero matrix will follow from the applicative structure of `λ A → Vec A n`

(that is, `zip`

and `replicate`

, respectively). With a few more standard list functions, we will be able to define the identity matrix and matrix multiplication.

outer : (A → B → C) → (Vec A m → Vec B n → Mat C m n) outer f [] ys = [] outer f (x ∷ xs) ys = map (f x) ys ∷ outer f xs ys

We will multiply matrices by taking the outer product of each column/row pair, giving a collection of matrices, and then summing these matrices together. This is why we define the function `outer`

.

Accessing rows of matrices is easy, because we can view the first row by pattern matching. However, columns are buried, and to access them we need a function defined by induction on the rows.

left/rest : Mat A m (suc n) → Vec A m × Mat A m n left/rest [] = [] , [] left/rest ((x ∷ v) ∷ M) = Σ.map (x ∷_) (v ∷_) (left/rest M)

`lest/rest-map-∷`

is a lemma we will need later. It says that adding a constant column and then viewing a column gives back that constant column and the original matrix.

left/rest-map-∷ : (x : A) (M : Mat A m n) → left/rest (map (x ∷_) M) ≡ (replicate x , M) left/rest-map-∷ x [] = ≡.refl left/rest-map-∷ x (u ∷ M) rewrite left/rest-map-∷ x M = ≡.refl

With these done, we can focus on vector/matrix operations. Vectors have a zero, addition, and scaling by a coefficient.

module WithSemiring (R : Semiring c ℓ) where open Semiring R infixr 7 _*ᵥ_ _*ₘ_ _⊗ₒ_ infixr 6 _+ᵥ_ _+ₘ_ 0ᵥ : Vec Carrier m 0ᵥ = replicate 0# _+ᵥ_ : (u v : Vec Carrier m) → Vec Carrier m _+ᵥ_ = zip _+_ _*ᵥ_ : Carrier → Vec Carrier m → Vec Carrier m x *ᵥ u = map (x *_) u

These definitions are convenient for use in matrix operations.

0ₘ : Mat Carrier m n 0ₘ = replicate 0ᵥ _+ₘ_ : (M N : Mat Carrier m n) → Mat Carrier m n _+ₘ_ = zip _+ᵥ_ 1ₘ : Mat Carrier m m 1ₘ {zero} = [] 1ₘ {suc m} = (1# ∷ 0ᵥ) ∷ map (0# ∷_) (1ₘ {m}) _⊗ₒ_ : Vec Carrier m → Vec Carrier n → Mat Carrier m n _⊗ₒ_ = outer _*_ _*ₘ_ : Mat Carrier m n → Mat Carrier n o → Mat Carrier m o _ *ₘ [] = 0ₘ uM *ₘ (v ∷ N) = let u , M = left/rest uM in u ⊗ₒ v +ₘ M *ₘ N

This implementation of `_*ₘ_`

is by no means canonical. However, I argue that it fits the spirit of inductive vectors well. In particular, this definition is primarily by induction on `N`

, one of the inputs. I have tried several other implementations in the past, but none were particularly nice.

In the Agda standard library, algebraic structures each come with an equivalence relation `_≈_`

, which is weaker than propositional equality. Algebraic laws only hold up to `_≈_`

, so we will need to have similar equivalence relations for vectors and matrices.

infix 4 _≈ᵥ_ _≈ₘ_ _≈ᵥ_ : (u v : Vec Carrier n) → Set ℓ _≈ᵥ_ = Pw _≈_ _≈ₘ_ : (M N : Mat Carrier m n) → Set ℓ _≈ₘ_ = Pw _≈ᵥ_

We want to prove that `1ₘ`

is the unit of `_*ₘ_`

. Our approach is to follow the definition of `_*ₘ_`

and do induction on `M`

. This conveniently tells us something about the dimension `m`

, allowing `1ₘ`

to compute.

*ₘ-identityˡ : (M : Mat Carrier m n) → 1ₘ *ₘ M ≈ₘ M *ₘ-identityˡ [] = [] *ₘ-identityˡ (u ∷ M) = begin 1ₘ *ₘ (u ∷ M) ≡⟨⟩ (1# *ᵥ u ∷ proj₁ (left/rest (map (0# ∷_) 1ₘ)) ⊗ₒ u) +ₘ (0ᵥ ∷ proj₂ (left/rest (map (0# ∷_) 1ₘ))) *ₘ M ≈⟨ (let q = left/rest-map-∷ 0# 1ₘ in +ₘ-cong (*ᵥ-identityˡ u ∷ ⊗ₒ-cong (pw reflexive (≡⇒Pw-≡ (≡.cong proj₁ q))) reflᵥ) (pw (pw reflexive) (pw ≡⇒Pw-≡ (≡⇒Pw-≡ (≡.cong (λ z → (0ᵥ ∷ z) *ₘ M) (≡.cong proj₂ q)))))) ⟩ (u ∷ 0ᵥ ⊗ₒ u) +ₘ (0ᵥ ∷ 1ₘ) *ₘ M ≈⟨ +ₘ-cong (reflᵥ ∷ ⊗ₒ-zeroˡ u) (transₘ (lemma0 0ᵥ 1ₘ M) (lemma1 M Pw-++ reflₘ)) ⟩ (u ∷ 0ₘ) +ₘ (0ᵥ ∷ 1ₘ *ₘ M) ≡⟨⟩ u +ᵥ 0ᵥ ∷ 0ₘ +ₘ 1ₘ *ₘ M ≈⟨ +ᵥ-identityʳ u ∷ transₘ (+ₘ-identityˡ (1ₘ *ₘ M)) (*ₘ-identityˡ M) ⟩ u ∷ M ∎ where open module R {m n} = SetoidReasoning (Pw-setoid (Pw-setoid setoid n) m) lemma0 : (u : Vec _ n) (M : Mat _ m n) (N : Mat _ n o) → (u ∷ M) *ₘ N ≈ₘ (u ∷ []) *ₘ N ++ M *ₘ N lemma0 u M [] = reflₘ lemma0 (x ∷ u) vM (w ∷ N) with left/rest vM ... | v , M = begin (x *ᵥ w ∷ v ⊗ₒ w) +ₘ (u ∷ M) *ₘ N ≈⟨ +ₘ-cong reflₘ (lemma0 u M N) ⟩ (x *ᵥ w ∷ v ⊗ₒ w) +ₘ ((u ∷ []) *ₘ N ++ M *ₘ N) ≈⟨ pseudowith ((u ∷ []) *ₘ N) ⟩ (x *ᵥ w ∷ []) +ₘ (u ∷ []) *ₘ N ++ v ⊗ₒ w +ₘ M *ₘ N ∎ where pseudowith : ∀ u*N → (x *ᵥ w ∷ v ⊗ₒ w) +ₘ (u*N ++ M *ₘ N) ≈ₘ (x *ᵥ w ∷ []) +ₘ u*N ++ v ⊗ₒ w +ₘ M *ₘ N pseudowith (u*N ∷ []) = reflₘ lemma1 : (N : Mat _ m n) → (0ᵥ ∷ []) *ₘ N ≈ₘ 0ᵥ ∷ [] lemma1 [] = reflₘ lemma1 (v ∷ N) = begin (0# *ᵥ v ∷ []) +ₘ (0ᵥ ∷ []) *ₘ N ≈⟨ +ₘ-cong ((*ᵥ-zeroˡ v) ∷ []) (lemma1 N) ⟩ (0ᵥ ∷ []) +ₘ (0ᵥ ∷ []) ≈⟨ +ₘ-identityˡ 0ₘ ⟩ 0ᵥ ∷ [] ∎

Okay, we did it, but it’s a bit of a mess. If you look at the proof, you notice that a lot of it is about getting round the fact that, for an `m × n`

matrix, we only have indirect access to the `n`

dimension, the columns. The `left/rest`

function is the worst offender for this. It says we want access to a column of a matrix, but the inductive structure only gives us access to the rows. So we have to pick off the first coefficient in each row, and call that a column.

Ultimately, this means that we should be caring more about the inductive structure of the *dimensions*, rather than any inductive structure on matrices themselves. Dimensions always appear alone, so doing induction over them is always easy, whereas matrices have two dimensions. So, to stop doing induction on matrices, let’s make them not inductive data structures.

How can we avoid prioritising one dimension of a matrix over the other? The only way of doing this I can think of is by defining vectors and matrices as *functions from positions to values*. Vectors and matrices are now no longer inductive structures, but rather functions, and our primary way of defining and using them is via *index notation*. Let’s do it.

Vec : (A : Set a) (n : ℕ) → Set a Vec A n = Fin n → A Mat : (A : Set a) (m n : ℕ) → Set a Mat A m n = Vec (Vec A n) m -- equivalently, Fin m → Fin n → A

This time, defining our own equivalence is even more critical. Even if the coefficients are up to propositional equality, standard Agda functions don’t support extensionality for propositional equality, so we’d have trouble proving anything.

Pw : REL A B ℓ → REL (Vec A n) (Vec B n) ℓ Pw R u v = ∀ i → R (u i) (v i)

Next, we can think about matrix operations.

module WithSemiring (R : Semiring c ℓ) where open Semiring R hiding (zero)

infix 4 _≈ᵥ_ _≈ₘ_ _≈ᵥ_ : Rel (Vec Carrier n) ℓ _≈ᵥ_ = Pw _≈_ _≈ₘ_ : Rel (Mat Carrier m n) ℓ _≈ₘ_ = Pw _≈ᵥ_

The additive structure is particularly easy. Everything is pointwise in a way that even Agda can see, so the lemmas are easy too.

0ₘ : Mat Carrier m n 0ₘ _ _ = 0# _+ₘ_ : (M N : Mat Carrier m n) → Mat Carrier m n (M +ₘ N) i j = M i j + N i j -- An example property: +ₘ-identityˡ : (M : Mat Carrier m n) → 0ₘ +ₘ M ≈ₘ M +ₘ-identityˡ M i j = +-identityˡ (M i j)

The multiplicative structure still requires some thought. In particular, there are some careful choices to be made about the identity matrix, which I will explain when we prove things about it.

_==_ : (i j : Fin n) → Bool zero == zero = true zero == suc j = false suc i == zero = false suc i == suc j = i == j 1ₘ : Mat Carrier m m 1ₘ i j = if i == j then 1# else 0# sum : Vec Carrier n → Carrier sum {zero} v = 0# sum {suc n} v = v zero + sum {n} (v ∘ suc) _*ₘ_ : Mat Carrier m n → Mat Carrier n o → Mat Carrier m o (M *ₘ N) i k = sum \ j → M i j * N j k

These definitions should better resemble standard mathematical practice. Rather than our input-driven, parallel definition of `_*ₘ_`

from before, we have an output-driven definition in terms of lots of little `sum`

s (mathematical Σ). `1ₘ`

has gone from the rather alien functional programming construction to the Kronecker δ.

Now, how about our test case of proving properties about multiplication?

*ₘ-identityˡ : (M : Mat Carrier m n) → 1ₘ *ₘ M ≈ₘ M *ₘ-identityˡ {suc m} {n} M zero k = begin (1ₘ *ₘ M) zero k ≡⟨⟩ 1ₘ {suc m} zero zero * M zero k + (sum \ j → 1ₘ zero (suc j) * M (suc j) k) ≡⟨⟩ 1# * M zero k + (sum \ j → 0# * M (suc j) k) ≈⟨ +-cong (*-identityˡ _) (*-sum 0# \ j → M (suc j) k) ⟩ M zero k + 0# * (sum \ j → M (suc j) k) ≈⟨ +-cong refl (zeroˡ _) ⟩ M zero k + 0# ≈⟨ +-identityʳ _ ⟩ M zero k ∎ where open SetoidReasoning setoid *ₘ-identityˡ {suc m} {n} M (suc i) k = begin (1ₘ *ₘ M) (suc i) k ≡⟨⟩ 1ₘ (suc i) zero * M zero k + (sum \ j → 1ₘ (suc i) (suc j) * M (suc j) k) ≡⟨⟩ 0# * M zero k + (sum \ j → 1ₘ i j * M (suc j) k) ≡⟨⟩ 0# * M zero k + (1ₘ *ₘ (λ j k → M (suc j) k)) i k ≈⟨ +-cong (zeroˡ _) (*ₘ-identityˡ {m} {n} (λ j k → M (suc j) k) i k) ⟩ 0# + M (suc i) k ≈⟨ +-identityˡ _ ⟩ M (suc i) k ∎ where open SetoidReasoning setoid

The structure of this proof is very different from what it was last time, reflecting the differences in our definitions. In line with things being output-driven, the proof is by induction on the index `i`

– a row number in the result. It can also be seen as being by induction on the dimension `m`

, and then by cases on the index `i`

. The steps of reasoning should look familiar to a mathematician. This is a proof by index notation where equations are all between coefficients rather than matrices, and we use both simple algebraic properties and properties of summation to simplify expressions. I invite the reader to actually read the reasoning steps, and to this end I have included some steps which are just computation and tried to move the proof terms (the ones inside `≈⟨ _ ⟩`

) out of the way.

There is still some sleight of hand at play in these proofs. The least sleight is the behaviour of `1ₘ`

. For slick proofs, we require all four cases (`1ₘ zero zero`

, `1ₘ zero (suc j)`

, `1ₘ (suc i) zero`

, and `1ₘ (suc i) (suc j)`

) to compute in a convenient way. The first three should compute to their values, and the final one should compute to `1ₘ i j`

so that we’re still in the realm of our inductive hypothesis. Without paying attention to this, it is quite easy to make an awkward definition of `1ₘ`

that will make you miserable.

Perhaps the most obvious thing for an Agda programmer to do would be to think of what the identity matrix looks like, and implement it in such a way that this intuition is visible in the code. The thing we all remember about the identity matrix is that, for the position `(i, j)`

, the coefficient is 1 when `i`

= `j`

, and 0 otherwise. So, let’s use the fact that equality is decidable on `Fin n`

, and just write this down.

1ₘd : Mat Carrier m m 1ₘd i j with i ≟ j ... | yes _ = 1# ... | no _ = 0#

This satisfies three of the computation rules, but normalising `1ₘd (suc i) (suc j)`

gives us `1ₘd (suc i) (suc j) | (suc i ≟ suc j | i ≟ j)`

. In other words, computation depends on the result of `suc i ≟ suc j`

, which is in turn not going to compute until we know more about `i ≟ j`

. This is a real pain. When we used the boolean-valued equality test `_==_`

earlier, we could say that `suc i == suc j`

was *equal* to `i == j`

. But with intrinsic decidable equality, `suc i ≟ suc j`

and `i ≟ j`

have different types, so we have to deconstruct and reconstruct our `Dec`

values at the recursive call, fixing up the proofs depending on which side we’re on. One can go to the effort of proving that `1ₘd (suc i) (suc j) ≡ 1ₘd i j`

, but I’m sure one would rather not. It would be an afterthought.

The sneakiest trick I pulled (though admittedly the one with least impact) is in the index `zero`

. Notice that, in the case `*ₘ-identityˡ M zero k`

, the right-hand side of our equation is `M zero k`

. The `zero`

here comes about because we’re in the case where `i = zero`

. However, working from the left of the equation, we see that we get a new `zero`

from `sum`

. Specifically, `sum {suc n} v = v zero + sum {n} (v ∘ suc)`

, and the `zero`

in this expression is something we made up from nothing. Thus, we benefited from the fact that all `zero`

es are judgementally equal.

It may seem obvious that there is only one `zero`

, but when reïmplementing `Fin`

as thinnings from 1, this is no longer easy. To recap, here is a definition of thinnings.

data Th : (m n : ℕ) → Set where stop : Th zero zero keep : Th m n → Th (suc m) (suc n) drop : Th m n → Th m (suc n)

The intended semantics is that a value of type `Th m n`

is the instructions for how to take a vector of length `n`

and produce a vector of smaller length `m`

by only keeping some of the original elements. Thus, a value of `Th 1 n`

describes how to pick 1 element out of a vector of length `n`

.

`Th 1 n`

is equivalent to `Fin n`

. Moreover, the induction on them is very similar, as shown by this conversion function.

Th-to-Fin : Th 1 n → Fin n Th-to-Fin (keep θ) = zero Th-to-Fin (drop θ) = suc (Th-to-Fin θ)

There is no case for `stop`

, as `1 ≠ 0`

. `keep`

means that we’re keeping the first element, or in other words choosing the first element, so this corresponds to the index `zero`

. `drop`

means that we’re not keeping the first element, so the index must be somewhere later.

In the `keep`

case, `θ`

has type `Th 0 n`

. Referring back to the semantics, this means that we’re not going to pick any more elements, so it must be that `θ`

is `n`

-many `drop`

s followed by a `stop`

. However, we cannot see this judgementally, so whenever we come across a case `keep θ`

, we must reason it equal to any other representation of `zero`

. In our proof `*ₘ-identityˡ`

, this means adding a fixup line at the end of the `i = zero`

case that rewrites by a propositional equality proof.

I like this case study because several of the recommendations I make go against one piece of common wisdom or another. One really has to think critically about requirements rather than just doing what has worked before (or, in my case, try out every possible approach and see which one works best).

It tells us something about the Standard Library, or more generally about the requirements of any standard library for Agda. In particular, we can’t replace `Fin`

by `Th 1`

and expect everything to keep working, and even when we have `Dec`

proofs, sometimes a `Bool`

version of the same thing is necessary. Perhaps we should have a type `EDec`

for extrinsic decidability, complementing intrinsic decidability from `Dec`

.

EDec : Set a → Set a EDec A = ∃ \ (b : Bool) → if b then A else ¬ A _≟ᴱ_ : (i j : Fin n) → EDec (i ≡ j) zero ≟ᴱ zero = true , ≡.refl zero ≟ᴱ suc j = false , λ () suc i ≟ᴱ zero = false , λ () suc i ≟ᴱ suc j = Σ.map id go (i ≟ᴱ j) where go : ∀ {b} → if b then i ≡ j else i ≢ j → if b then suc i ≡ suc j else suc i ≢ suc j go {false} i≢j si≡sj = i≢j (suc-injective si≡sj) go {true} i≡j = ≡.cong suc i≡j 1ₘe : Mat Carrier m m 1ₘe i j = if proj₁ (i ≟ᴱ j) then 1# else 0#

We can take this all the way back to the fact that Σ, 2, and large eliminations are enough to implement binary sums. It’s interesting to note, though, the differences at the level of judgemental equality. Indeed, `proj₁ (suc i ≟ᴱ suc j)`

computes to `proj₁ (i ≟ᴱ j)`

, as we wanted.

The second definition of `Vec`

is similar to stdlib’s `Data.Table`

. I import it here just so you can click and find the definition.

open import Data.Table using (Table)

The only difference between my `Vec`

and `Table`

is that `Table`

is a single-field record rather than just a definition. The technique of creating a single-field record is often a useful one. For example, if we had dependent tables, where elements at different indices could have different types, the following definition would be best.

record DTable n (A : Fin n → Set a) : Set a where constructor mk field get : (i : Fin n) → A i open DTable public

When we have a term `t : DTable n A`

, unification can easily recover both `n`

and `A`

because they are there in the type. Instead of `DTable n A`

computing to `(i : Fin n) → A i`

, the record declaration intuitively blocks the computation until we explicitly use the `get`

accessor. Unification has trouble with inferring `A`

from `(i : Fin n) → A i`

because it is a higher-order term. However, in our specific case for `Vec`

(as in stdlib’s `Table`

), `A`

is not a higher-order term, and there are no problems extracting `n`

and `A`

from `Fin n → A`

. Thus stdlib’s definition of `Table`

is overkill.

Questions and comments about all of this post are welcome.