The Structure Theorem over PID’s

In this post I’ll describe the structure theorem over PID’s which generalizes the following results:

  • Finite-dimensional vector spaces over {k} are all of the form {k^{\oplus n}},
  • The classification theorem for finitely generated abelian groups,
  • The Frobenius normal form of a matrix,
  • The Jordan decomposition of a matrix.

1. Some ring theory prerequisites

Prototypical example for this section: {R = \mathbb Z}.

Before I can state the main theorem, I need to define a few terms for UFD’s, which behave much like {\mathbb Z}: Our intuition from the case {R = \mathbb Z} basically carries over verbatim. We don’t even need to deal with prime ideals and can factor elements instead.

Definition 1

If {R} is a UFD, then {p \in R} is a prime element if {(p)} is a prime ideal and {p \neq 0}. For UFD’s this is equivalent to the following property: {p} is neither zero nor a unit, and if {p = xy} then either {x} or {y} is a unit.

So for example in {\mathbb Z} the set of prime elements is {\{\pm2, \pm3, \pm5, \dots\}}. Now, since {R} is a UFD, every nonzero element {r} factors into a product of prime elements

\displaystyle  r = u p_1^{e_1} p_2^{e_2} \dots p_m^{e_m}

where {u} is a unit and the {p_i} are distinct prime elements; this factorization is unique up to units and the order of the factors.

Definition 2

We say {r} divides {s} if {s = r'r} for some {r' \in R}. This is written {r \mid s}.

Example 3 (Divisibility in {\mathbb Z})

The number {0} is divisible by every element of {\mathbb Z}, since {0 = r \cdot 0}. All other divisibility relations are as expected.

Ques 4

Show that {r \mid s} if and only if the exponent of each prime in {r} is less than or equal to the corresponding exponent in {s}.
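For {R = \mathbb Z} this criterion is easy to verify computationally. Below is a quick sketch in plain Python (trial-division factoring; the helper names are my own) comparing the exponent condition against ordinary divisibility:

```python
from collections import Counter

def factorize(n):
    """Prime factorization of a positive integer, as a Counter {p: e}."""
    assert n > 0
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def divides_by_exponents(r, s):
    """r | s iff every prime exponent in r is <= the matching one in s."""
    fr, fs = factorize(r), factorize(s)
    return all(e <= fs[p] for p, e in fr.items())

# Agrees with direct divisibility for small positive integers:
for r in range(1, 50):
    for s in range(1, 50):
        assert divides_by_exponents(r, s) == (s % r == 0)
```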

Now, the case of interest is the even stronger case when {R} is a PID:

Proposition 5 (PID’s are Noetherian UFD’s)

If {R} is a PID, then it is Noetherian and also a UFD.

Proof: The fact that {R} is Noetherian follows since every ideal is generated by a single element, hence certainly finitely generated. For {R} to be a UFD we essentially repeat the proof for {\mathbb Z}, using the fact that {(a,b)} is principal in order to extract {\gcd(a,b)}. \Box

In this case, we have a Chinese remainder theorem for elements.

Theorem 6 (Chinese remainder theorem for rings)

Let {m} and {n} be relatively prime elements, meaning {(m) + (n) = (1)}. Then

\displaystyle  R / (mn) \cong R/m \times R/n.

Proof: This is the same as the proof of the usual Chinese remainder theorem. First, since {(m,n)=(1)} we have {am+bn=1} for some {a} and {b}. Then we have a map

\displaystyle  R/m \times R/n \rightarrow R/(mn) \quad\text{by}\quad (r,s) \mapsto r \cdot bn + s \cdot am.

One can check that this map is well-defined and an isomorphism of rings. (Diligent readers invited to do so.) \Box
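For {R = \mathbb Z} the map in the proof can be checked by direct computation. A small sketch in Python (the function names are mine, not standard):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def crt_inverse(m, n):
    """For coprime m, n, build the map R/m x R/n -> R/(mn) from the
    proof: (r, s) |-> r*b*n + s*a*m, where a*m + b*n = 1."""
    g, a, b = extended_gcd(m, n)
    assert g == 1, "m and n must be coprime"
    def phi(r, s):
        return (r * b * n + s * a * m) % (m * n)
    return phi

phi = crt_inverse(4, 9)
t = phi(3, 5)                    # the element of Z/36 that is 3 mod 4, 5 mod 9
assert t % 4 == 3 and t % 9 == 5
```

Indeed {b n \equiv 1 \pmod m} and {a m \equiv 1 \pmod n}, which is exactly why the map is inverse to reduction mod {m} and mod {n}.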

Finally, we need to introduce the concept of a Noetherian {R}-module.

Definition 7

An {R}-module {M} is Noetherian if it satisfies one of the two equivalent conditions:

  • Its submodules obey the ascending chain condition: there is no infinite ascending chain of submodules {M_1 \subsetneq M_2 \subsetneq \dots}.
  • All submodules of {M} (including {M} itself) are finitely generated.

This generalizes the notion of a Noetherian ring: a Noetherian ring {R} is one for which {R} is Noetherian as an {R}-module.

Ques 8

Check these two conditions are equivalent. (Copy the proof for rings.)

2. The structure theorem

Our structure theorem takes two forms:

Theorem 9 (Structure theorem, invariant form)

Let {R} be a PID and let {M} be any finitely generated {R}-module. Then

\displaystyle  M \cong \bigoplus_{i=1}^m R/s_i

for some {s_i} satisfying {s_1 \mid s_2 \mid \dots \mid s_m}. (Here we allow {s_i = 0}, in which case {R/s_i = R}; since every element divides {0}, such terms come last in the chain.)

Corollary 10 (Structure theorem, primary form)

Let {R} be a PID and let {M} be any finitely generated {R}-module. Then

\displaystyle  M \cong R^{\oplus r} \oplus R/(q_1) \oplus R/(q_2) \oplus \dots \oplus R/(q_m)

where {q_i = p_i^{e_i}} for some prime element {p_i} and integer {e_i \ge 1}.

Proof: Factor each {s_i} into prime factors (since {R} is a UFD), then use the Chinese remainder theorem. \Box
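Concretely, for {R = \mathbb Z} the passage from invariant form to primary form is just integer factorization. A small sketch (helper names are hypothetical):

```python
def prime_power_factors(n):
    """Factor n > 1 into prime powers p^e, by trial division."""
    out, d = [], 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            out.append(d ** e)
        d += 1
    if n > 1:
        out.append(n)
    return out

def primary_form(invariant_factors):
    """Split each s_i via CRT:  Z/(s_i) ~ product of Z/(p^e)."""
    return sorted(q for s in invariant_factors for q in prime_power_factors(s))

# Z/2 (+) Z/12  ~  Z/2 (+) Z/4 (+) Z/3:
assert primary_form([2, 12]) == [2, 3, 4]
```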

Remark 11

In both theorems the decomposition is unique up to permutations of the summands; good to know, but I won’t prove this.

3. Reduction to maps of free {R}-modules

The proof of the structure theorem proceeds in two main steps. First, we reduce the problem to a linear algebra problem involving free {R}-modules {R^{\oplus d}}. Once that’s done, we just have to play with matrices; this is done in the next section.

Suppose {M} is finitely generated by {d} elements. Then there is a surjective map of {R}-modules

\displaystyle  R^{\oplus d} \twoheadrightarrow M

which sends the basis elements of {R^{\oplus d}} to the generators of {M}. Let {K} denote the kernel.

We claim that {K} is finitely generated as well. To this end, we prove:

Lemma 12 (Direct sum of Noetherian modules is Noetherian)

Let {M} and {N} be two Noetherian {R}-modules. Then the direct sum {M \oplus N} is also a Noetherian {R}-module.

Proof: It suffices to show that any submodule {L \subseteq M \oplus N} is finitely generated. It’s unfortunately not true that {L = P \oplus Q} for submodules {P \subseteq M} and {Q \subseteq N} (take {M = N = \mathbb Z} and {L = \{(n,n) \mid n \in \mathbb Z\}}), so we will have to be more careful.

Consider the submodules

\displaystyle  \begin{aligned} A &= \left\{ x \in M \mid (x,0) \in L \right\} \subseteq M \\ B &= \left\{ y \in N \mid \exists x \in M : (x,y) \in L \right\} \subseteq N. \end{aligned}

(Note the asymmetry for {A} and {B}: the proof doesn’t work otherwise.) Then {A} is finitely generated by {a_1}, \dots, {a_k}, and {B} is finitely generated by {b_1}, \dots, {b_\ell}. Let {x_i = (a_i, 0)} and let {y_i = (\ast, b_i)} be elements of {L} (where the {\ast}‘s are arbitrary things we don’t care about). Then {x_i} and {y_i} together generate {L}: given {(x,y) \in L}, write {y} as a combination of the {b_i} and subtract off the corresponding combination of the {y_i}; what remains is of the form {(x', 0) \in L}, so {x' \in A}. \Box

Ques 13

Deduce that for {R} a PID, {R^{\oplus d}} is Noetherian.

Hence {K \subseteq R^{\oplus d}} is finitely generated as claimed. So we can find another surjective map {R^{\oplus f} \twoheadrightarrow K}. Consequently, we have a composition

\displaystyle  R^{\oplus f} \twoheadrightarrow K \hookrightarrow R^{\oplus d} \twoheadrightarrow M.

Observe that {M} is the cokernel of the composition {T : R^{\oplus f} \rightarrow R^{\oplus d}}, i.e. we have that

\displaystyle  M \cong R^{\oplus d} / \text{im }(T).

So it suffices to understand the map {T} well.

4. Smith normal form

The idea is now that we have reduced our problem to studying linear maps {T : R^{\oplus m} \rightarrow R^{\oplus n}}, which can be thought of as a generic matrix

\displaystyle  T = \begin{pmatrix} a_{11} & \dots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nm} \end{pmatrix}

for the standard basis {e_1}, \dots, {e_m} of {R^{\oplus m}} and {f_1}, \dots, {f_n} of {R^{\oplus n}}.

Of course, as you might expect, it ought to be possible to change the given bases so that {T} has a nicer matrix form. We already saw this with Jordan form, where we had a map {T : V \rightarrow V} and changed the basis so that the matrix of {T} was “almost diagonal”. This time we have two bases we can change independently, so we would hope to get a diagonal matrix, or even better.

Before proceeding let’s think about how we might edit the matrix: what operations are permitted? Here are some examples:

  • Swapping rows and columns, which just corresponds to re-ordering the basis.
  • Adding a multiple of a column to another column. For example, if we add {3} times the first column to the second column, this is equivalent to replacing the basis

    \displaystyle  (e_1, e_2, e_3, \dots, e_m) \mapsto (e_1, e_2+3e_1, e_3, \dots, e_m).

  • Adding a multiple of a row to another row. One can see that adding {3} times the first row to the second row is equivalent to replacing the basis

    \displaystyle  (f_1, f_2, f_3, \dots, f_n) \mapsto (f_1-3f_2, f_2, f_3, \dots, f_n).

More generally, if {A} is an invertible {n \times n} matrix, we can replace {T} with {AT}. This corresponds to replacing

\displaystyle  (f_1, \dots, f_n) \mapsto (A(f_1), \dots, A(f_n))

(the “invertible” condition just guarantees the latter is a basis). Of course similarly we can replace {T} with {TB} where {B} is an invertible {m \times m} matrix; this corresponds to

\displaystyle  (e_1, \dots, e_m) \mapsto (B^{-1}(e_1), \dots, B^{-1}(e_m)).
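To make the correspondence concrete, here is a quick numerical check over {\mathbb Z} that these row and column operations really are left/right multiplication by invertible elementary matrices. (Plain Python; the tiny `matmul` helper is my own.)

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

T = [[18, 38, 48],
     [14, 30, 38]]

# Column operation "add 3 times column 1 to column 2"
# = right-multiplication by an elementary matrix B:
B = [[1, 3, 0],
     [0, 1, 0],
     [0, 0, 1]]
TB = matmul(T, B)
assert TB == [[18, 38 + 3*18, 48], [14, 30 + 3*14, 38]]

# Row operation "add 3 times row 1 to row 2"
# = left-multiplication by an elementary matrix A:
A = [[1, 0],
     [3, 1]]
AT = matmul(A, T)
assert AT == [[18, 38, 48], [14 + 3*18, 30 + 3*38, 38 + 3*48]]
```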

Armed with this knowledge, we can now approach the following result.

Theorem 14 (Smith normal form)

Let {R} be a PID. Let {M = R^{\oplus m}} and {N = R^{\oplus n}} be free {R}-modules and let {T : M \rightarrow N} be a linear map. Set {k = \min(m,n)}.

Then we can select a pair of new bases for {M} and {N} such that {T} has only diagonal entries {s_1}, {s_2}, \dots, {s_k} (some of which may be zero) and {s_1 \mid s_2 \mid \dots \mid s_k}.

So if {m > n}, the matrix should take the form

\displaystyle  \begin{pmatrix} s_1 & 0 & 0 & 0 & \dots & 0 \\ 0 & s_2 & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \dots & \vdots \\ 0 & 0 & 0 & s_n & \dots & 0 \end{pmatrix}.

and similarly when {m \le n}.

Ques 15

Show that Smith normal form implies the structure theorem.

Remark 16

Note that this is not a generalization of Jordan form.

  • In Jordan form we consider maps {T : V \rightarrow V}; note that the source and target space are the same, and we are considering one basis for the space {V}.
  • In Smith form the maps {T : M \rightarrow N} are between different modules, and we pick two sets of bases (one for {M} and one for {N}).

Example 17 (Example of Smith normal form)

To give a flavor of the idea of the proof, let’s work through a concrete example with the following matrix with entries from {\mathbb Z}:

\displaystyle  \begin{pmatrix} 18 & 38 & 48 \\ 14 & 30 & 38 \end{pmatrix}.

The GCD of all the entries is {2}, and so motivated by this, we perform the Euclidean algorithm on the left column: subtract the second row from the first row, then three times the (new) first row from the second:

\displaystyle  \begin{pmatrix} 18 & 38 & 48 \\ 14 & 30 & 38 \end{pmatrix} \mapsto \begin{pmatrix} 4 & 8 & 10 \\ 14 & 30 & 38 \end{pmatrix} \mapsto \begin{pmatrix} 4 & 8 & 10 \\ 2 & 6 & 8 \end{pmatrix}.

Now that the GCD {2} is present as an entry, we move it to the upper-left by switching the two rows, and then kill off all the entries in the same row/column; since {2} was the GCD all along, we isolate {2} completely:

\displaystyle  \begin{pmatrix} 4 & 8 & 10 \\ 2 & 6 & 8 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 6 & 8 \\ 4 & 8 & 10 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 6 & 8 \\ 0 & -4 & -6 \\ \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & -6 \end{pmatrix}.

This reduces the problem to a {1 \times 2} matrix. So we just apply the Euclidean algorithm again there: subtract the third column from the second, then add three times the second column to the third:

\displaystyle  \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & -6 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & -6 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}.
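This reduction is entirely mechanical, so it is easy to script for {R = \mathbb Z}. Below is a sketch in Python: it performs only legal row/column operations (and normalizes signs, which is multiplication by the unit {-1}), but it does not track the two change-of-basis matrices. The naming and structure are my own, not from any library.

```python
def smith_normal_form(A):
    """Smith normal form of an integer matrix, following the proof:
    move a gcd to the top-left via the Euclidean algorithm, clear its
    row and column, then recurse on the lower-right block."""
    A = [row[:] for row in A]
    n, m = len(A), len(A[0])
    for t in range(min(n, m)):
        while True:
            nonzero = [(abs(A[i][j]), i, j)
                       for i in range(t, n) for j in range(t, m) if A[i][j]]
            if not nonzero:
                return A                       # remaining block is zero
            _, pi, pj = min(nonzero)           # smallest entry -> pivot
            A[t], A[pi] = A[pi], A[t]
            for row in A:
                row[t], row[pj] = row[pj], row[t]
            p = A[t][t]
            for i in range(t + 1, n):          # clear column t (row ops)
                q = A[i][t] // p
                A[i] = [x - q * y for x, y in zip(A[i], A[t])]
            for j in range(t + 1, m):          # clear row t (column ops)
                q = A[t][j] // p
                for row in A:
                    row[j] -= q * row[t]
            rest = [A[i][j] for i in range(t, n) for j in range(t, m)
                    if (i, j) != (t, t)]
            if all(x % p == 0 for x in rest):
                break                          # pivot divides all the rest
            for i in range(t + 1, n):          # pull in a non-divisible
                if any(A[i][j] % p for j in range(t, m)):   # entry and retry
                    A[t] = [x + y for x, y in zip(A[t], A[i])]
                    break
        if A[t][t] < 0:                        # scale by the unit -1
            A[t] = [-x for x in A[t]]
    return A

# The worked example above:
assert smith_normal_form([[18, 38, 48], [14, 30, 38]]) == \
    [[2, 0, 0], [0, 2, 0]]
```

Over a general PID, the integer Euclidean steps would be replaced by Bézout-style {2 \times 2} moves as in the proof that follows.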

Now all we have to do is generalize this proof to work with any PID. It’s intuitively clear how to do this: the PID condition more or less lets you perform a Euclidean algorithm.

Proof: Begin with a generic matrix

\displaystyle  T = \begin{pmatrix} a_{11} & \dots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nm} \end{pmatrix}

We want to show, by a series of operations (gradually changing the given bases), that we can rearrange the matrix into Smith normal form.

Define {\gcd(x,y)} to be any generator of the principal ideal {(x,y)}.

Claim 18 (“Euclidean algorithm”)

If {a} and {b} are entries in the same row or column, we can change bases to replace {a} with {\gcd(a,b)} and {b} with something else.

Proof: We do just the case of columns. By hypothesis, {\gcd(a,b) = xa+yb} for some {x,y \in R}. Dividing through by {d = \gcd(a,b)} gives {x \cdot \frac ad + y \cdot \frac bd = 1}, so setting {u = \frac ad} and {v = \frac bd} we have {xu + yv = 1}. Then

\displaystyle  \begin{pmatrix} x & y \\ -v & u \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \gcd(a,b) \\ \text{something} \end{pmatrix}

and the first matrix is invertible (check this!), as desired. \Box
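Here is a sanity check of this {2 \times 2} move over {R = \mathbb Z}, using the extended Euclidean algorithm. With the particular choice {u = a/d}, {v = b/d} from the proof, the “something” even works out to be {0}:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def gcd_move(a, b):
    """The invertible 2x2 matrix of the claim, sending (a, b)^T to
    (gcd(a, b), something)^T.  With u = a/g and v = b/g we get
    x*u + y*v = (x*a + y*b)/g = 1, so the determinant is 1."""
    g, x, y = extended_gcd(a, b)
    u, v = a // g, b // g
    return [[x, y], [-v, u]]

a, b = 14, 18
(x, y), (nv, u) = M = gcd_move(a, b)
assert x * a + y * b == 2        # top entry: gcd(14, 18) = 2
assert nv * a + u * b == 0       # with this choice, "something" is 0
assert x * u - y * nv == 1       # det = 1, so M is invertible over Z
```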

Let {s_1} be the GCD of all the entries {a_{ij}}. By repeatedly applying this claim, we can cause {s_1} to appear in the upper left hand corner. (Two entries sharing neither a row nor a column can be combined in two applications, through an entry in a common row.) Then, we use it to kill off all the entries in the first row and the first column, thus arriving at a matrix

\displaystyle  \begin{pmatrix} s_1 & 0 & 0 & \dots & 0 \\ 0 & a_{22}' & a_{23}' & \dots & a_{2m}' \\ 0 & a_{32}' & a_{33}' & \dots & a_{3m}' \\ \vdots&\vdots&\vdots&\ddots&\vdots \\ 0 & a_{n2}' & a_{n3}' & \dots & a_{nm}' \\ \end{pmatrix}.

Now we repeat the same procedure with the lower-right {(n-1) \times (m-1)} submatrix, and so on. Since {s_1} was the GCD of all the entries, and our operations only replace entries with {R}-linear combinations of entries, {s_1} divides every entry of the submatrix; hence the resulting diagonal entries satisfy {s_1 \mid s_2 \mid \dots}. This gives the Smith normal form. \Box

With the Smith normal form, we have in the original situation that

\displaystyle  M \cong R^{\oplus d} / \text{im } T

and applying the theorem to {T} completes the proof of the structure theorem.

5. Applications

Now, we can apply our structure theorem! I’ll just sketch proofs of these and let the reader fill in details.

Corollary 19 (Finite-dimensional vector spaces are all isomorphic)

Let {V} be a vector space over a field {k} with a finite spanning set of vectors. Then {V \cong k^{\oplus n}} for some {n}.

Proof: In the structure theorem, each {s_i} is either zero or a unit, since every nonzero element of {k} is a unit; hence {k / (s_i) \in \{k, 0\}}. \Box

Corollary 20 (Frobenius normal form)

Let {T : V \rightarrow V} where {V} is a finite-dimensional vector space over an arbitrary field {k} (not necessarily algebraically closed). Then one can write {T} as a block-diagonal matrix whose blocks are all of the form

\displaystyle  \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & \ast \\ 1 & 0 & 0 & \dots & 0 & \ast \\ 0 & 1 & 0 & \dots & 0 & \ast \\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots \\ 0 & 0 & 0 & \dots & 1 & \ast \\ \end{pmatrix}.

Proof: View {V} as a {k[x]}-module with action {x \cdot v = T(v)}. By the structure theorem, {V \cong \bigoplus_i k[x] / (s_i)} for some polynomials {s_i}, where {s_1 \mid s_2 \mid s_3 \mid \dots}. On each summand, multiplication by {x} in the basis {1, x, \dots, x^{\deg s_i - 1}} is given by exactly a block of the form described (the companion matrix of {s_i}). \Box
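The blocks appearing here are companion matrices, and building one is mechanical. A short sketch in Python, where I adopt the (assumed) convention that coefficients are listed constant term first:

```python
def companion(coeffs):
    """Companion matrix of the monic polynomial
    x^n + c[n-1] x^{n-1} + ... + c[1] x + c[0],
    in the form shown above: 1's on the subdiagonal and the
    negated coefficients down the last column."""
    n = len(coeffs)
    M = [[0] * n for _ in range(n)]
    for i in range(1, n):
        M[i][i - 1] = 1
    for i in range(n):
        M[i][n - 1] = -coeffs[i]
    return M

# x^2 - 3x + 2 = (x - 1)(x - 2), so eigenvalues should be 1 and 2:
C = companion([2, -3])
assert C == [[0, -2], [1, 3]]
assert C[0][0] + C[1][1] == 1 + 2                    # trace
assert C[0][0] * C[1][1] - C[0][1] * C[1][0] == 1 * 2  # determinant
```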

Corollary 21 (Jordan normal form)

Let {T : V \rightarrow V} where {V} is a finite-dimensional vector space over an algebraically closed field {k}. Then {T} can be written in Jordan form.

Proof: We now use the structure theorem in its primary form. Since {k} is algebraically closed, each prime {p_i \in k[x]} is linear, so every summand looks like {k[x] / (x-a)^m} for some {m}; multiplication by {x} on such a summand is a single Jordan block with eigenvalue {a}. \Box

This is a draft of Chapter 15 of the Napkin.

Rant: Matrices Are Not Arrays of Numbers

The following is an excerpt from a current work of mine. I thought I’d share it here, as some people have told me they enjoyed it.

As I’ll stress repeatedly, a matrix represents a linear map between two vector spaces. Writing it in the form of an {m \times n} matrix is merely a very convenient way to see the map concretely. But it obfuscates the fact that this map is, well, a map, not an array of numbers.

If you took high school precalculus, you saw everything done in terms of matrices. To any typical high school student, a matrix is an array of numbers. No one is sure what exactly these numbers represent, but they’re told how to magically multiply these arrays to get more arrays. They’re told that the matrix

\displaystyle \left( \begin{array}{cccc} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \\ \end{array} \right)

is an “identity matrix”, because when you multiply by another matrix it doesn’t change. Then they’re told that the determinant is some magical combination of these numbers formed by this weird multiplication rule. No one knows what this determinant does, other than the fact that {\det(AB) = \det A \det B}, and something about areas and row operations and Cramer’s rule.

Then you go into linear algebra in college, and you do more magic with these arrays of numbers. You’re told that two matrices {T_1} and {T_2} are similar if

\displaystyle T_2 = ST_1S^{-1}

for some invertible matrix {S}. You’re told that the trace of a matrix {\text{Tr } T} is the sum of the diagonal entries. Somehow this doesn’t change if you look at a similar matrix, but you’re not sure why. Then you define the characteristic polynomial as

\displaystyle p_T = \det (XI - T).

Somehow this also doesn’t change if you take a similar matrix, but now you really don’t know why. And then you have the Cayley-Hamilton Theorem in all its black magic: {p_T(T)} is the zero map. Out of curiosity you Google the proof, and you find some ad-hoc procedure which still leaves you with no idea why it’s true.

This is terrible. Who gives a — about {T_2 = ST_1S^{-1}}? Only if you know that the matrices are linear maps does this make sense: {T_2} is just {T_1} rewritten with a different choice of basis.

In my eyes, this mess is evil. Linear algebra is the study of linear maps, but it is taught as the study of arrays of numbers, and no one knows what these numbers mean. And for a good reason: the numbers are meaningless. They are a highly convenient way of encoding the matrix, but they are not the main objects of study, any more than the dates of events are the main objects of study in history.

When I took Math 55a as a freshman at Harvard, I got the exact opposite treatment: we did all of linear algebra without writing down a single matrix. During all this time I was quite confused. What’s wrong with a basis? I didn’t appreciate until later that this approach was the morally correct way to treat the subject: it made it clear what was happening.

Throughout this project, I’ve tried to strike a balance between these two approaches, using matrices to illustrate the maps and to simplify proofs, but writing theorems and definitions in their morally correct form. I hope that this has both the advantage of giving the “right” definitions while being concrete enough to be digested. But I would just like to say for the record that, if I had to pick between the high school approach and the 55a approach, I would pick 55a in a heartbeat.