The Structure Theorem over PID’s

In this post I’ll describe the structure theorem over PID’s which generalizes the following results:

  • Finite dimensional vector fields over {k} are all of the form {k^{\oplus n}},
  • The classification theorem for finitely generated abelian groups,
  • The Frobenius normal form of a matrix,
  • The Jordan decomposition of a matrix.

1. Some ring theory prerequisites

Prototypical example for this section: {R = \mathbb Z}.

Before I can state the main theorem, I need to define a few terms for UFD’s, which behave much like {\mathbb Z}: Our intuition from the case {R = \mathbb Z} basically carries over verbatim. We don’t even need to deal with prime ideals and can factor elements instead.

Definition 1

If {R} is a UFD, then {p \in R} is a prime element if {(p)} is a prime ideal and {p \neq 0}. For UFD’s this is equivalent to the following property: if {p = xy} then either {x} or {y} is a unit.

So for example in {\mathbb Z} the set of prime elements is {\{\pm2, \pm3, \pm5, \dots\}}. Now, since {R} is a UFD, every element {r} factors into a product of prime elements

\displaystyle  r = u p_1^{e_1} p_2^{e_2} \dots p_m^{e_m}

Definition 2

We say {r} divides {s} if {s = r'r} for some {r' \in R}. This is written {r \mid s}.

Example 3 (Divisibility in {\mathbb Z})

The number {0} is divisible by every element of {\mathbb Z}. All other divisibility as expected.

Ques 4

Show that {r \mid s} if and only if the exponent of each prime in {r} is less than or equal to the corresponding exponent in {s}.

Now, the case of interest is the even stronger case when {R} is a PID:

Proposition 5 (PID’s are Noetherian UFD’s)

If {R} is a PID, then it is Noetherian and also a UFD.

Proof: The fact that {R} is Noetherian is obvious. For {R} to be a UFD we essentially repeat the proof for {\mathbb Z}, using the fact that {(a,b)} is principal in order to extract {\gcd(a,b)}. \Box

In this case, we have a Chinese remainder theorem for elements.

Theorem 6 (Chinese remainder theorem for rings)

Let {m} and {n} be relatively prime elements, meaning {(m) + (n) = (1)}. Then

\displaystyle  R / (mn) \cong R/m \times R/n.

Proof: This is the same as the proof of the usual Chinese remainder theorem. First, since {(m,n)=(1)} we have {am+bn=1} for some {a} and {b}. Then we have a map

\displaystyle  R/m \times R/n \rightarrow R/(mn) \quad\text{by}\quad (r,s) \mapsto r \cdot bn + s \cdot am.

One can check that this map is well-defined and an isomorphism of rings. (Diligent readers invited to do so.) \Box

Finally, we need to introduce the concept of a Noetherian {R}-module.

Definition 7

An {R}-module {M} is Noetherian if it satisfies one of the two equivalent conditions:

  • Its submodules obey the ascending chain condition: there is no infinite sequence of modules {M_1 \subsetneq M_2 \subsetneq \dots}.
  • All submodules of {M} (including {M} itself) are finitely generated.

This generalizes the notion of a Noetherian ring: a Noetherian ring {R} is one for which {R} is Noetherian as an {R}-module.

Ques 8

Check these two conditions are equivalent. (Copy the proof for rings.)

2. The structure theorem

Our structure theorem takes two forms:

Theorem 9 (Structure theorem, invariant form)

Let {R} be a PID and let {M} be any finitely generated {R}-module. Then

\displaystyle  M \cong \bigoplus_{i=1}^m R/s_i

for some {s_i} satisfying {s_1 \mid s_2 \mid \dots \mid s_m}.

Corollary 10 (Structure theorem, primary form)

Let {R} be a PID and let {M} be any finitely generated {R}-module. Then

\displaystyle  M \cong R^{\oplus r} \oplus R/(q_1) \oplus R/(q_2) \oplus \dots \oplus R/(q_m)

where {q_i = p_i^{e_i}} for some prime element {p_i} and integer {e_i \ge 1}.

Proof: Factor each {s_i} into prime factors (since {R} is a UFD), then use the Chinese remainder theorem. \Box

Remark 11

In both theorems the decomposition is unique up to permutations of the summands; good to know, but I won’t prove this.

3. Reduction to maps of free {R}-modules

The proof of the structure theorem proceeds in two main steps. First, we reduce the problem to a linear algebra problem involving free {R}-modules {R^{\oplus d}}. Once that’s done, we just have to play with matrices; this is done in the next section.

Suppose {M} is finitely generated by {d} elements. Then there is a surjective map of {R}-modules

\displaystyle  R^{\oplus d} \twoheadrightarrow M

whose image on the basis of {R^{\oplus d}} are the generators of {M}. Let {K} denote the kernel.

We claim that {K} is finitely generated as well. To this end we prove that

Lemma 12 (Direct sum of Noetherian modules is Noetherian)

Let {M} and {N} be two Noetherian {R}-modules. Then the direct sum {M \oplus N} is also a Noetherian {R}-module.

Proof: It suffices to show that if {L \subseteq M \oplus N}, then {L} is finitely generated. It’s unfortunately not true that {L = P \oplus Q} (take {M = N = \mathbb Z} {L = \{(n,n) \mid n \in \mathbb Z\}}) so we will have to be more careful.

Consider the submodules

\displaystyle  \begin{aligned} A &= \left\{ x \in M \mid (x,0) \in L \right\} \subseteq M \\ B &= \left\{ y \in N \mid \exists x \in M : (x,y) \in L \right\} \subseteq N. \end{aligned}

(Note the asymmetry for {A} and {B}: the proof doesn’t work otherwise.) Then {A} is finitely generated by {a_1}, \dots, {a_k}, and {B} is finitely generated by {b_1}, \dots, {b_\ell}. Let {x_i = (a_i, 0)} and let {y_i = (\ast, b_i)} be elements of {L} (where the {\ast}‘s are arbitrary things we don’t care about). Then {x_i} and {y_i} together generate {L}. \Box

Ques 13

Deduce that for {R} a PID, {R^{\oplus d}} is Noetherian.

Hence {K \subseteq R^{\oplus d}} is finitely generated as claimed. So we can find another surjective map {R^{\oplus f} \twoheadrightarrow K}. Consequently, we have a composition

\displaystyle  R^{\oplus f} \twoheadrightarrow K \hookrightarrow R^{\oplus d} \twoheadrightarrow M.

Observe that {M} is the cokernel of the composition {T : R^{\oplus f} \rightarrow R^{\oplus d}}, i.e. we have that

\displaystyle  M \cong R^{\oplus d} / \text{im }(T).

So it suffices to understand the map {T} well.

4. Smith normal form

The idea is now that we have reduced our problem to studying linear maps {T : R^{\oplus m} \rightarrow R^{\oplus n}}, which can be thought of as a generic matrix

\displaystyle  T = \begin{pmatrix} a_{11} & \dots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nm} \end{pmatrix}

for the standard basis {e_1}, \dots, {e_m} of {R^{\oplus m}} and {f_1}, \dots, {f_n} of {N}.

Of course, as you might expect it ought to be possible to change the given basis of {T} such that {T} has a nicer matrix form. We already saw this in Jordan form, where we had a map {T : V \rightarrow V} and changed the basis so that {T} was “almost diagonal”. This time, we have two sets of bases we can change, so we would hope to get a diagonal basis, or even better.

Before proceeding let’s think about how we might edit the matrix: what operations are permitted? Here are some examples:

  • Swapping rows and columns, which just corresponds to re-ordering the basis.
  • Adding a multiple of a column to another column. For example, if we add {3} times the first column to the second column, this is equivalent to replacing the basis

    \displaystyle  (e_1, e_2, e_3, \dots, e_m) \mapsto (e_1, e_2+3e_1, e_3, \dots, e_m).

  • Adding a multiple of a row to another row. One can see that adding {3} times the first row to the second row is equivalent to replacing the basis

    \displaystyle  (f_1, f_2, f_3, \dots, f_n) \mapsto (f_1-3f_2, f_2, f_3, \dots, f_n).

More generally, If {A} is an invertible {n \times n} matrix we can replace {T} with {AT}. This corresponds to replacing

\displaystyle  (f_1, \dots, f_n) \mapsto (A(f_1), \dots, A(f_n))

(the “invertible” condition just guarantees the latter is a basis). Of course similarly we can replace {X} with {XB} where {B} is an invertible {m \times m} matrix; this corresponds to

\displaystyle  (e_1, \dots, e_m) \mapsto (B^{-1}(e_1), \dots, B^{-1}(e_m))

Armed with this knowledge, we can now approach the following result.

Theorem 14 (Smith normal form)

Let {R} be a PID. Let {M = R^{\oplus m}} and {N = R^{\oplus n}} be free {R}-modules and let {T : M \rightarrow N} be a linear map. Set {k = \min(m,n)}.

Then we can select a pair of new bases for {M} and {N} such that {T} has only diagonal entries {s_1}, {s_2}, \dots, {s_k} and {s_1 \mid s_2 \mid \dots \mid s_k}.

So if {m > n}, the matrix should take the form

\displaystyle  \begin{pmatrix} s_1 & 0 & 0 & 0 & \dots & 0 \\ 0 & s_2 & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \dots & \vdots \\ 0 & 0 & 0 & s_n & \dots & 0 \end{pmatrix}.

and similarly when {m \le n}.

Ques 15

Show that Smith normal form implies the structure theorem.

Remark 16

Note that this is not a generalization of Jordan form.

  • In Jordan form we consider maps {T : V \rightarrow V}; note that the source and target space are the same, and we are considering one basis for the space {V}.
  • In Smith form the maps {T : M \rightarrow N} are between different modules, and we pick two sets of bases (one for {M} and one for {N}).

Example 17 (Example of Smith normal form)

To give a flavor of the idea of the proof, let’s work through a concrete example with the following matrix with entries from {\mathbb Z}:

\displaystyle  \begin{pmatrix} 18 & 38 & 48 \\ 14 & 30 & 38 \end{pmatrix}.

The GCD of all the entries is {2}, and so motivated by this, we perform the Euclidean algorithm on the left column: subtract the second row from the first row, then three times the first row from the second:

\displaystyle  \begin{pmatrix} 18 & 38 & 48 \\ 14 & 30 & 38 \end{pmatrix} \mapsto \begin{pmatrix} 4 & 8 & 10 \\ 14 & 30 & 38 \end{pmatrix} \mapsto \begin{pmatrix} 4 & 8 & 10 \\ 2 & 6 & 2 \end{pmatrix}.

Now that the GCD of {2} is present, we move it to the upper-left by switching the two rows, and then kill off all the entries in the same row/column; since {2} was the GCD all along, we isolate {2} completely:

\displaystyle  \begin{pmatrix} 4 & 8 & 10 \\ 2 & 6 & 2 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 6 & 2 \\ 4 & 8 & 10 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 6 & 2 \\ 0 & -4 & 6 \\ \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & 6 \end{pmatrix}.

This reduces the problem to a {1 \times 2} matrix. So we just apply the Euclidean algorithm again there:

\displaystyle  \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & 6 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & 2 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}.

Now all we have to do is generalize this proof to work with any PID. It’s intuitively clear how to do this: the PID condition more or less lets you perform a Euclidean algorithm.

Proof: Begin with a generic matrix

\displaystyle  T = \begin{pmatrix} a_{11} & \dots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nm} \end{pmatrix}

We want to show, by a series of operations (gradually changing the given basis) that we can rearrange the matrix into Smith normal form.

Define {\gcd(x,y)} to be any generator of the principal ideal {(x,y)}.

Claim 18 (“Euclidean algorithm”)

If {a} and {b} are entries in the same row or column, we can change bases to replace {a} with {\gcd(a,b)} and {b} with something else.

Proof: We do just the case of columns. By hypothesis, {\gcd(a,b) = xa+yb} for some {x,y \in R}. We must have {(x,y) = (1)} now (we’re in a UFD). So there are {u} and {v} such that {xu + yv = 1}. Then

\displaystyle  \begin{pmatrix} x & y \\ -v & u \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \gcd(a,b) \\ \text{something} \end{pmatrix}

and the first matrix is invertible (check this!), as desired. \Box
Let {s_1 = (a_{ij})_{i,j}} be the GCD of all entries. Now by repeatedly applying this algorithm, we can cause {s} to appear in the upper left hand corner. Then, we use it to kill off all the entries in the first row and the first column, thus arriving at a matrix

\displaystyle  \begin{pmatrix} s_1 & 0 & 0 & \dots & 0 \\ 0 & a_{22}' & a_{23}' & \dots & a_{2n}' \\ 0 & a_{32}' & a_{33}' & \dots & a_{3n}' \\ \vdots&\vdots&\vdots&\ddots&\vdots \\ 0 & a_{m2}' & a_{m3}' & \dots & a_{mn}' \\ \end{pmatrix}.

Now we repeat the same procedure with this lower-right {(m-1) \times (n-1)} matrix, and so on. This gives the Smith normal form. \Box

With the Smith normal form, we have in the original situation that

\displaystyle  M \cong R^{\oplus d} / \text{im } T

and applying the theorem to {T} completes the proof of the structure theorem.

5. Applications

Now, we can apply our structure theorem! I’ll just sketch proofs of these and let the reader fill in details.

Corollary 19 (Finite-dimensional vector spaces are all isomorphic)

A vector space {V} over a field {k} has a finite spanning set of vectors. Then for some {n}, {V \cong k^{\oplus n}}.

Proof: In the structure theorem, {k / (s_i) \in \{0,k\}}. \Box

Corollary 20 (Frobenius normal form)

Let {T : V \rightarrow V} where {V} is a finite-dimensional vector space over an arbitrary field {k} (not necessarily algebraically closed). Then one can write {T} as a block-diagonal matrix whose blocks are all of the form

\displaystyle  \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & \ast \\ 1 & 0 & 0 & \dots & 0 & \ast \\ 0 & 1 & 0 & \dots & 0 & \ast \\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots \\ 0 & 0 & 0 & \dots & 1 & \ast \\ \end{pmatrix}.

Proof: View {V} as a {k[x]}-module with action {x \cdot v = T(v)}. By theorem {V \cong \bigoplus_i k[x] / (s_i)} for some polynomials {s_i}, where {s_1 \mid s_2 \mid s_3 \mid \dots}. Write each block in the form described. \Box

Corollary 21 (Jordan normal form)

Let {T : V \rightarrow V} where {V} is a finite-dimensional vector space over an arbitrary field {k} which is algebraically closed. Prove that {T} can be written in Jordan form.

Proof: We now use the structure theorem in its primary form. Since {k[x]} is algebraically closed each {p_i} is a linear factor, so every summand looks like {k[x] / (x-a)^m} for some {m}. \Box

This is a draft of Chapter 15 of the Napkin.

Models of ZFC

Model theory is really meta, so you will have to pay attention here.

Roughly, a “model of {\mathsf{ZFC}}” is a set with a binary relation that satisfies the {\mathsf{ZFC}} axioms, just as a group is a set with a binary operation that satisfies the group axioms. Unfortunately, unlike with groups, it is very hard for me to give interesting examples of models, for the simple reason that we are literally trying to model the entire universe.

1. Models

Prototypical example for this section: {(\omega, \in)} obeys {\mathrm{PowerSet}}, {V_\kappa} is a model for {\kappa} inaccessible (later).

Definition 1 A model {\mathscr M} consists of a set {M} and a binary relation {E \subseteq M \times M}. (The {E} relation is the “{\in}” for the model.)

Remark 2 I’m only considering set-sized models where {M} is a set. Experts may be aware that I can actually play with {M} being a class, but that would require too much care for now.

If you have a model, you can ask certain things about it. For example, you can ask “does it satisfy {\mathrm{EmptySet}}?”. Let me give you an example of what I mean, and then make it rigorous.

Example 3 (A Stupid Model) Let’s take {\mathscr M = (M,E) = \left( \omega, \in \right)}. This is not a very good model of {\mathsf{ZFC}}, but let’s see if we can make sense of some of the first few axioms.

  1. {\mathscr M} satisfies {\mathrm{Extensionality}}, which is the sentence

    \displaystyle \forall x \forall y \forall a : \left( a \in x \iff a \in y \right) \implies x = y.

    This just follows from the fact that {E} is actually {\in}.

     

  2. {\mathscr M} satisfies {\mathrm{EmptySet}}, which is the sentence

    \displaystyle \exists a : \forall x \; \neg (x \in a).

    Namely, take {a = \varnothing \in \omega}.

  3. {\mathscr M} does not satisfy {\mathrm{Pairing}}, since {\{1,3\}} is not in {\omega}, even though {1, 3 \in \omega}
  4. Miraculously, {\mathscr M} satisfies {\mathrm{Union}}, since for any {n \in \omega}, {\cup n} is {n-1} (unless {n=0}). The Union axiom statements that

    \displaystyle \forall a \exists z \quad \forall x \; (x \in z) \iff (\exists y : x \in y \in z).

    An important thing to notice is that the “{\forall a}” ranges only over the sets in the model of the universe, {\mathscr M}.

Example 4 (Important: This Stupid Model Satisfies {\mathrm{PowerSet}}) Most incredibly of all: {\mathscr M = (\omega, \in)} satisfies {\mathrm{PowerSet}}. This is a really important example. You might think this is ridiculous. Look at {2 = \{0,1\}}. The power set of this is {\{0, 1, 2, \{1\}\}} which is not in the model, right?

Well, let’s look more closely at {\mathrm{PowerSet}}. It states that:

\displaystyle \forall x \exists a \forall y (y \in a \iff y \subseteq x).

What happens if we set {x = 2 = \{0,1\}}? Well, actually, we claim that {a = 3 = \{0,1,2\}} works. The key point is “for all {y}” — this only ranges over the objects in {\mathscr M}. In {\mathscr M}, the only subsets of {2} are {0 = \varnothing}, {1 = \{0\}} and {2 = \{0,1\}}. The “set” {\{1\}} in the “real world” (in {V}) is not a set in the model {\mathscr M}.

In particular, you might say that in this strange new world, we have {2^n = n+1}, since {n = \{0,1,\dots,n-1\}} really does have only {n+1} subsets.

Example 5 (Sentences with Parameters) The sentences we ask of our model are allowed to have “parameters” as well. For example, if {\mathscr M = (\omega, \in)} as before then {\mathscr M} satisfies the sentence

\displaystyle \forall x \in 3 (x \in 5).

2. Sentences and Satisfaction

With this intuitive notion, we can define what it means for a model to satisfy a sentence.

Definition 6 Note that any sentence {\phi} can be written in one of the following five forms:

  • {x \in y}
  • {x = y}
  • {\neg \psi} (“not {\psi}”) for some shorter sentence {\psi}
  • {\psi_1 \lor \psi_2} (“`{\psi_1} or {\psi_2}”) for some shorter sentences {\psi_1}, {\psi_1}
  • {\exists x \psi} (“exists {x}”) for some shorter sentence {\psi}.

Ques 7 What happened to {\land} (and) and {\forall} (for all)? (Hint: use {\neg}.)

Often (almost always, actually) we will proceed by so-called “induction on formula complexity”, meaning that we define or prove something by induction using this. Note that we require all formulas to be finite.

Now suppose we have a sentence {\phi}, like {a = b} or {\exists a \forall x \neg (x \in a)}, plus a model {\mathscr M = (M,E)}. We want to ask whether {\mathscr M} satisfies {\phi}.

To give meaning to this, we have to designate certain variables as parameters. For example, if I asked you “Does {a=b}?” the first question you would ask is what {a} and {b} are. So {a}, {b} would be parameters: I have to give them values for this sentence to make sense.

On the other hand, if I asked you “Does {\exists a \forall x \neg (x \in a)}?” then you would just say “yes”. In this case, {x} and {a} are not parameters. In general, parameters are those variables whose meaning is not given by some {\forall} or {\exists}.

In what follows, we will let {\phi(x_1, \dots, x_n)} denote a formula {\phi}, whose parameters are {x_1}, \dots, {x_n}. Note that possibly {n=0}, for example all {\mathsf{ZFC}} axioms have no parameters.

Ques 8 Try to guess the definition of satisfaction before reading it below. (It’s not very hard to guess!)

Definition 9 Let {\mathscr M=(M,E)} be a model. Let {\phi(x_1, \dots, x_n)} be a sentence, and let {b_1, \dots, b_n \in M}. We will define a relation

\displaystyle \mathscr M \vDash \phi[b_1, \dots, b_n]

and say {\mathscr M} satisfies the sentence {\phi} with parameters {b_1, \dots, b_n}.

The relationship is defined by induction on formula complexity as follows:

  • If {\phi} is “{x_1=x_2}” then {\mathscr M \vDash \phi[b_1, b_2] \iff b_1 = b_2}.
  • If {\phi} is “{x_1\in x_2}” then {\mathscr M \vDash \phi[b_1, b_2] \iff b_1 \; E \; b_2}.
    (This is what we mean by “{E} interprets {\in}”.)
  • If {\phi} is “{\neg \psi}” then {\mathscr M \vDash \phi[b_1, \dots, b_n] \iff \mathscr M \not\vDash \phi[b_1, \dots, b_n]}.
  • If {\phi} is “{\psi_1 \lor \psi_2}” then {\mathscr M \vDash \phi[b_1, \dots, b_n]} means {\mathscr M \vDash \psi_i[b_1, \dots, b_n]} for some {i=1,2}.
  • Most important case: suppose {\phi} is {\exists x \psi(x,x_1, \dots, x_n)}. Then {\mathscr M \vDash \phi[b_1, \dots, b_n]} if and only if

    \displaystyle \exists b \in M \text{ such that } \mathscr M \vDash \psi[b, b_1, \dots, b_n].

    Note that {\psi} has one extra parameter.

Notice where the information of the model actually gets used. We only ever use {E} in interpreting {x_1 \in x_2}; unsurprising. But we only ever use the set {M} when we are running over {\exists} (and hence {\forall}). That’s well-worth keeping in mind: The behavior of a model essentially comes from {\exists} and {\forall}, which search through the entire model {M}.

And finally,

Definition 10 A model of {\mathsf{ZFC}} is a model {\mathscr M = (M,E)} satisfying all {\mathsf{ZFC}} axioms.

We are especially interested in models of the form {(M, \in)}, where {M} is a transitive set. (We want our universe to be transitive, otherwise we would have elements of sets which are not themselves in the universe, which is very strange.) Such a model is called a transitive model. If {M} is a transitive set, the model {(M, \in)} will be abbreviated to just {M}.

Definition 11 An inner model of {\mathsf{ZFC}} is a transitive model satisfying {\mathsf{ZFC}}.

3. The Levy Hierarchy

Prototypical example for this section: {\mathtt{isSubset}(x,y)} is absolute. The axiom {\mathrm{EmptySet}} is {\Sigma_1}, {\mathtt{isPowerSetOf}(X,x)} is {\Pi_1}.

A key point to remember is that the behavior of a model is largely determined by {\exists} and {\forall}. It turns out we can say even more than this.

Consider a formula such as

\displaystyle \mathtt{isEmpty}(x) : \neg \exists a (a \in x)

which checks whether a given set {x} has a nonempty element. Technically, this has an “{\exists}” in it. But somehow this {\exists} does not really search over the entire model, because it is bounded to search in {x}. That is, we might informally rewrite this as

\displaystyle \neg (\exists x \in a)

which doesn’t fit into the strict form, but points out that we are only looking over {a \in x}. We call such a quantifier a bounded quantifier.

We like sentences with bounded quantifiers because they designate properties which are absolute over transitive models. It doesn’t matter how strange your surrounding model {M} is. As long as {M} is transitive,

\displaystyle M \vDash \mathtt{isEmpty}(\varnothing)

will always hold. Similarly, the sentence

\displaystyle \mathtt{isSubset}(x,y) : x \subseteq y \text { i.e. } \forall a \in x (a \in y).

Sentences with this property are called {\Sigma_0} or {\Pi_0}.

The situation is different with a sentence like

\displaystyle \mathtt{isPowerSetOf}(y,x) : \forall z \left( z \subseteq x \iff z \in y \right)

which in English means “{y} is the power set of {x}”, or just {y = \mathcal P(x)}. The {\forall z} is not bounded here. This weirdness is what allows things like

\displaystyle \omega \vDash \text{``} \{0,1,2\} \text{ is the power set of }\{0,1\}\text{''}

and hence

\displaystyle \omega \vDash \mathrm{PowerSet}

which was our stupid example earlier. The sentence {\mathtt{isPowerSetOf}} consists of an unbounded {\forall} followed by an absolute sentence, so we say it is {\Pi_1}.

More generally, the Levy hierarchy keeps track of how bounded our quantifiers are. Specifically,

  • Formulas which have only bounded quantifiers are {\Delta_0 = \Sigma_0 = \Pi_0}.
  • Formulas of the form {\exists x_1 \dots \exists x_k \psi} where {\psi} is {\Pi_n} are consider {\Sigma_{n+1}}.
  • Formulas of the form {\forall x_1 \dots \forall x_k \psi} where {\psi} is {\Sigma_n} are consider {\Pi_{n+1}}.

(A formula which is both {\Sigma_n} and {\Pi_n} is called {\Delta_n}, but we won’t use this except for {n=0}.)

Example 12 (Examples of {\Delta_0} Sentences) {\empty}

  1. The sentences {\mathtt{isEmpty}(x)}, {x \subseteq y}, as discussed above.
  2. The formula “{x} is transitive” can be expanded as a {\Delta_0} sentence.
  3. The formula “{x} is an ordinal” can be expanded as a {\Delta_0} sentence.

Exercise 13 Write out the expansions for “{x} is transitive” and “{x} is ordinal” in a {\Delta_0} form.

Example 14 (More Complex Formulas) {\empty}

  1. The axiom {\mathrm{EmptySet}} is {\Sigma_1}; it is {\exists a (\mathtt{isEmpty}(a))}, and {\mathtt{isEmpty}(a)} is {\Delta_0}.
  2. The formula “{y = \mathcal P(x)}” is {\Pi_1}, as discussed above.
  3. The formula “{x} is countable” is {\Sigma_1}. One way to phrase it is “{\exists f} an injective map {x \hookrightarrow \omega}”, which necessarily has an unbounded “{\exists f}”.
  4. The axiom {\mathrm{PowerSet}} is {\Pi_3}:

    \displaystyle \forall y \exists P \forall x (x\subseteq y \iff x \in P).

4. Substructures, and Tarski-Vaught

Let {\mathscr M_1 = (M_1, E_1)} and {\mathscr M_2 = (M_2, E_2)} be models.

Definition 15 We say that {\mathscr M_1 \subseteq \mathscr M_2} if {M_1 \subseteq M_2} and {E_1} agrees with {E_2}; we say {\mathscr M_1} is a substructure of {\mathscr M_2}.

That’s boring. The good part is:

Definition 16 We say {\mathscr M_1 \prec \mathscr M_2}, or {\mathscr M_1} is an elementary substructure of {\mathscr M_2}, if for every sentence {\phi(x_1, \dots, x_n)} and parameters {b_1, \dots, b_n \in M_1}, we have

\displaystyle \mathscr M_1 \vDash \phi[b_1, \dots, b_n] \iff \mathscr M_2 \vDash \phi[b_1, \dots, b_n].

In other words, {\mathscr M_1} and {\mathscr M_2} agree on every sentence possible. Note that the {b_i} have to come from {M_1}; if the {b_i} came from {\mathscr M_2} then asking something of {\mathscr M_1} wouldn’t make sense.

Let’s ask now: how would {\mathscr M_1 \prec \mathscr M_2} fail to be true? If we look at the possibly sentences, none of the atomic formulas, nor the “{\land}” and “{\neg}”, are going to cause issues.

The intuition you should be getting by now is that things go wrong once we hit {\forall} and {\exists}. They won’t go wrong for bounded quantifiers. But unbounded quantifiers search the entire model, and that’s where things go wrong.

To give a “concrete example”: imagine {\mathscr M_1} is MIT, and {\mathscr M_2} is the state of Massachusetts. If {\mathscr M_1} thinks there exist hackers at MIT, certainly there exist hackers in Massachusetts. Where things go wrong is something like:

\displaystyle \mathscr M_2 \vDash \text{``} \exists x : x \text{ is a course numbered }> 50\text{''}.

This is true for {\mathscr M_2} because we can take the witness {x = \text{Math 55}}, say. But it’s false for {\mathscr M_1}, because at MIT all courses are numbered {18.701} or something similar. The issue is that the witness for statements in {\mathscr M_2} do not necessarily propagate up down to witnesses for {\mathscr M_1}, even though they do from {\mathscr M_1} to {\mathscr M_2}.

The Tarski-Vaught test says this is the only impediment: if every witness in {\mathscr M_2} can be replaced by one in {\mathscr M_1} then {\mathscr M_1 \prec \mathscr M_2}.

Lemma 17 (Tarski-Vaught) Let {\mathscr M_1 \subseteq \mathscr M_2}. Then {\mathscr M_1 \prec \mathscr M_2} if and only if for every sentence {\phi(x, x_1, \dots, x_n)} and parameters {b_1, \dots, b_n \in M_1}: if there is a witness {\tilde b \in M_2} to {\mathscr M_2 \vDash \phi(\tilde b, b_1 \dots, b_n)} then there is a witness {b \in M_1} to {\mathscr M_1 \vDash \phi(b, b_1, \dots, b_n)}.

Proof: Easy after the above discussion. To formalize it, use induction on formula complexity. \Box

5. Obtaining the Axioms of {\mathsf{ZFC}}

Extending the above ideas, one can obtain without much difficulty the following. The idea is that almost all the {\mathsf{ZFC}} axioms are just {\Sigma_1} claims about certain desired sets, and so verifying an axiom reduces to checking some appropriate “closure” condition: that the witness to the axiom is actually in the model.

For example, the {\mathrm{EmptySet}} axiom is “{\exists a (\mathtt{isEmpty}(a))}”, and so we’re happy as long as {\varnothing \in M}, which is of course true for any nonempty transitive set {M}.

Lemma 18 (Transitive Sets Inheriting {\mathsf{ZFC}}) Let {M} be a nonempty transitive set. Then

  1. {M} satisfies {\mathrm{Extensionality}}, {\mathrm{Foundation}}, {\mathrm{EmptySet}}.
  2. {M \vDash \mathrm{Pairing}} if {x,y \in M \implies \{x,y\} \in M}.
  3. {M \vDash \mathrm{Union}} if {x \in M \implies \cup x \in M}.
  4. {M \vDash \mathrm{PowerSet}} if {x \in M \implies \mathcal P(x) \cap M \in M}.
  5. {M \vDash \mathrm{Replacement}} if for every {x \in M} and every function {F : x \rightarrow M} which is {M}-definable with parameters, we have {F``x \in M} as well.
  6. {M \vDash \mathrm{Infinity}} as long as {\omega \in M}.

Here, a set {X \subseteq M} is {M}-definable with parameters if it can be realized as

\displaystyle X = \left\{ x \in M \mid \phi[x, b_1, \dots, b_n] \right\}

for some (fixed) choice of parameters {b_1,\dots,b_n \in M}. We allow {n=0}, in which case we say {X} is {M}-definable without parameters. Note that {X} need not itself be in {M}! As a trivial example, {X = M} is {M}-definable without parameters (just take {\phi[x]} to always be true), and certainly we do not have {X \in M}.

Exercise 19 Verify (i)-(iv) above.

Remark 20 Converses to the statements of Lemma 18 are true for all claims other than (vii).

6. Mostowski Collapse

Up until now I have been only talking about transitive models, because they were easier to think about. Here’s a second, better reason we might only care about transitive models.

Lemma 21 (Mostowski Collapse) Let {\mathscr X = (X,E)} be a model such that {\mathscr X \vDash \mathrm{Extensionality} + \mathrm{Foundation}}. Then there exists an isomorphism {\pi : \mathscr X \rightarrow M} for a transitive model {M = (M,\in)}.

This is also called the transitive collapse. In fact, both {\pi} and {M} are unique.

Proof: The idea behind the proof is very simple. Since {E} is well-founded and extensional, we can look at the {E}-minimal element {x_\varnothing} of {X} with respect to {E}. Clearly, we want to send that to {0 = \varnothing}.

Then we take the next-smallest set under {E}, and send it to {1 = \{\varnothing\}}. We “keep doing this”; it’s not hard to see this does exactly what we want.

To formalize, define {\pi} by transfinite recursion:

\displaystyle \pi(x) \overset{\mathrm{def}}{=} \left\{ \pi(y) \mid y \; E \; x \right\}.

This {\pi}, by construction, does the trick. \Box

The picture of this is quite “collapsing” the elements of {M} down to the bottom of {V}, hence the name.

7. Adding an Inaccessible, Skolem Hulls, and Going Insane

Prototypical example for this section: {V_\kappa}

At this point you might be asking, well, where’s my model of {\mathsf{ZFC}}?

I unfortunately have to admit now: {\mathsf{ZFC}} can never prove that there is a model of {\mathsf{ZFC}} (unless {\mathsf{ZFC}} is inconsistent, but that would be even worse). This is a result called Gödel’s Incompleteness Theorem.

Nonetheless, with some very modest assumptions added, we can actually show that a model does exist: for example, assuming that there exists a strongly inaccessible cardinal {\kappa} would do the trick, it turns out {V_\kappa} will be such a model. Intuitively you can see why: {\kappa} is so big that any set of rank lower than it can’t escape it even if we take their power sets, or any other method that {\mathsf{ZFC}} lets us do.

More pessimistically, this shows that it’s impossible to prove in {\mathsf{ZFC}} that such a {\kappa} exists. Nonetheless, we now proceed under {\mathsf{ZFC}^+} for convenience, which adds the existence of such a {\kappa} as a final axiom. So we now have a model {V_\kappa} to play with. Joy!

Great. Now we do something really crazy.

Theorem 22 (Countable Transitive Model) Assume {\mathsf{ZFC}^+}. Then there exists a transitive model {M} of {\mathsf{ZFC}} such that {M} is a countable set.

Proof: Fasten your seat belts.

Start with the set {X_0 = \varnothing}. Then for every integer {n}, we do the following to get {X_{n+1}}.

  • Start with {X_{n+1}} containing very element of {X_n}.
  • Consider a formula {\phi(x, x_1, \dots, x_n)} and {b_1, \dots, b_n} in {X_n}. Suppose that {M} thinks there is an {b \in M} for which

    \displaystyle M \vDash \phi[b, b_1, \dots, b_n].

    We then add in the element {b} to {X_{n+1}}.

  • We do this for every possible formula in the language of set theory. We also have to put in every possible set of parameters from the previous set {X_n}.

At every step {X_n} is countable. Reason: there are countably many possible finite sets of parameters in {X_n}, and countably many possible formulas, so in total we only ever add in countably many things at each step. This exhibits an infinite nested sequence of countable sets

\displaystyle X_0 \subseteq X_1 \subseteq X_2 \subseteq \dots

None of these is a substructure of {M}, because each {X_n} by relies on witnesses in {X_{n+1}}. So we instead take the union:

\displaystyle X = \bigcup_n X_n.

This satisfies the Tarski-Vaught test, and is countable.

There is one minor caveat: {X} might not be transitive. We don’t care, because we just take its Mostowski collapse. \Box

Please take a moment to admire how insane this is. It hinges irrevocably on the fact that there are countably many sentences we can write down.

Remark 23 This proof relies heavily on the Axiom of Choice when we add in the element {b} to {X_{n+1}}. Without Choice, there is no way of making these decisions all at once.

Usually, the right way to formalize the Axiom of Choice usage is, for every formula {\phi(x, x_1, \dots, x_n)}, to pre-commit (at the very beginning) to a function {f_\phi(x_1, \dots, x_n)}, such that given any {b_1, \dots, b_n} {f_\phi(b_1, \dots, b_n)} will spit out the suitable value of {b} (if one exists). Personally, I think this is hiding the spirit of the proof, but it does make it clear how exactly Choice is being used.

These {f_\phi}‘s have a name: Skolem functions.

The trick we used in the proof works in more general settings:

Theorem 24 (Downward Löwenheim-Skolem Theorem) Let {\mathscr M = (M,E)} be a model, and {A \subseteq M}. Then there exists a set {B} (called the Skolem hull of {A}) with {A \subseteq B \subseteq M}, such that {(B,E) \prec \mathscr M}, and

\displaystyle \left\lvert B \right\rvert < \max \left\{ \omega, \left\lvert A \right\rvert \right\}.

In our case, what we did was simply take {A} to be the empty set.

Ques 25 Prove this. (Exactly the same proof as before.)

8. FAQ’s on Countable Models

The most common one is “how is this possible?”, with runner-up “what just happened”.

Let me do my best to answer the first question. It seems like there are two things running up against each other:

  1. {M} is a transitive model of {\mathsf{ZFC}}, but its universe is uncountable.
  2. {\mathsf{ZFC}} tells us there are uncountable sets!

(This has confused so many people it has a name, Skolem’s paradox.)

The reason this works I actually pointed out earlier: countability is not absolute, it is a {\Sigma_1} notion.

Recall that a set {x} is countable if there exists an injective map {x \hookrightarrow \omega}. The first statement just says that in the universe {V}, there is a injective map {F: M \hookrightarrow \omega}. In particular, for any {x \in M} (hence {x \subseteq M}, since {M} is transitive), {x} is countable in {V}. This is the content of the first statement.

But for {M} to be a model of {\mathsf{ZFC}}, {M} only has to think statements in {\mathsf{ZFC}} are true. More to the point, the fact that {\mathsf{ZFC}} tells us there are uncountable sets means

\displaystyle M \vDash \exists x \text{ uncountable}.

In other words,

\displaystyle M \vDash \exists x \forall f \text{ If } f : x \rightarrow \omega \text{ then } f \text{ isn't injective}.

The key point is the {\forall f} searches only functions in our tiny model {M}. It is true that in the “real world” {V}, there are injective functions {f : x \rightarrow \omega}. But {M} has no idea they exist! It is a brain in a vat: {M} is oblivious to any information outside it.

So in fact, every ordinal which appears in {M} is countable in the real world. It is just not countable in {M}. Since {M \vDash \mathsf{ZFC}}, {M} is going to think there is some smallest uncountable cardinal, say {\aleph_1^M}. It will be the smallest (infinite) ordinal in {M} with the property that there is no bijection in the model {M} between {\aleph_1^M} and {\omega}. However, we necessarily know that such a bijection is going to exist in the real world {V}.

Put another way, cardinalities in {M} can look vastly different from those in the real world, because cardinality is measured by bijections, which I guess is inevitable, but leads to chaos.

9. Picturing Inner Models

Here is a picture of a countable transitive model {M}.

countable-transitive-model

Note that {M} and {V} must agree on finite sets, since every finite set has a formula that can express it. However, past {V_\omega} the model and the true universe start to diverge.

The entire model {M} is countable, so it only occupies a small portion of the universe, below the first uncountable cardinal {\aleph_1^V} (where the superscript means “of the true universe {V}”). The ordinals in {M} are precisely the ordinals of {V} which happen to live inside the model, because the sentence “{\alpha} is an ordinal” is absolute. On the other hand, {M} has only a portion of these ordinals, since it is only a lowly set, and a countable set at that. To denote the ordinals of {M}, we write {\mathrm{On}^M}, where the superscript means “the ordinals as computed in {M}”. Similarly, {\mathrm{On}^V} will now denote the “set of true ordinals”.

Nonetheless, the model {M} has its own version of the first uncountable cardinal {\aleph_1^M}. In the true universe, {\aleph_1^M} is countable (below {\aleph_1^V}), but the necessary bijection witnessing this might not be inside {M}. That’s why {M} can think {\aleph_1^M} is uncountable, even if it is a countable cardinal in the original universe.

So our model {M} is a brain in a vat. It happens to believe all the axioms of {\mathsf{ZFC}}, and so every statement that is true in {M} could conceivably be true in {V} as well. But {M} can’t see the universe around it; it has no idea that what it believes is the uncountable {\aleph_1^M} is really just an ordinary countable cardinal.

10. Exercises

Problem 1 Show that for any transitive model {M}, the set of ordinals in {M} is itself some ordinal.

Problem 2 Assume {\mathscr M_1 \subseteq \mathscr M_2}. Show that

  1. If {\phi} is {\Delta_0}, then {\mathscr M_1 \vDash \phi[b_1, \dots, b_n] \iff \mathscr M_2 \vDash \phi[b_1, \dots, b_n]}.
  2. If {\phi} is {\Sigma_1}, then {\mathscr M_1 \vDash \phi[b_1, \dots, b_n] \implies \mathscr M_2 \vDash \phi[b_1, \dots, b_n]}.
  3. If {\phi} is {\Pi_1}, then {\mathscr M_2 \vDash \phi[b_1, \dots, b_n] \implies \mathscr M_1 \vDash \phi[b_1, \dots, b_n]}.

Problem 3 (Reflection) Let {\kappa} be an inaccessible cardinal such that {|V_\alpha| < \kappa} for all {\alpha < \kappa}. Prove that for any {\delta < \kappa} there exists {\delta < \alpha < \kappa} such that {V_\alpha \prec V_\kappa}; in other words, the set of {\alpha} such that {V_\alpha \prec V_\kappa} is unbounded in {\kappa}. This means that properties of {V_\kappa} reflect down to properties of {V_\alpha}.

Problem 4 (Inaccessible Cardinal Produce Models) Let {\kappa} be an inaccessible cardinal. Prove that {V_\kappa} is a model of {\mathsf{ZFC}}.

Cardinals

(Standard post on cardinals, as a prerequisite for forthcoming theory model post.)

An ordinal measures a total ordering. However, it does not do a fantastic job at measuring size. For example, there is a bijection between the elements of {\omega} and {\omega+1}:

\displaystyle \begin{array}{rccccccc} \omega+1 = & \{ & \omega & 0 & 1 & 2 & \dots & \} \\ \omega = & \{ & 0 & 1 & 2 & 3 & \dots & \}. \end{array}

In fact, as you likely already know, there is even a bijection between {\omega} and {\omega^2}:

\displaystyle \begin{array}{l|cccccc} + & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & 0 & 1 & 3 & 6 & 10 & \dots \\ \omega & 2 & 4 & 7 & 11 & \dots & \\ \omega \cdot 2 & 5 & 8 & 12 & \dots & & \\ \omega \cdot 3 & 9 & 13 & \dots & & & \\ \omega \cdot 4 & 14 & \dots & & & & \end{array}

So ordinals do not do a good job of keeping track of size. For this, we turn to the notion of a cardinal number.

1. Equinumerous Sets and Cardinals

Definition 1 Two sets {A} and {B} are equinumerous, written {A \approx B}, if there is a bijection between them.

Definition 2 A cardinal is an ordinal {\kappa} such that for no {\alpha < \kappa} do we have {\alpha \approx \kappa}.

Example 3 (Examples of Cardinals) Every finite number is a cardinal. Moreover, {\omega} is a cardinal. However, {\omega+1}, {\omega^2}, {\omega^{2015}} are not, because they are countable.

Example 4 ({\omega^\omega} is Countable) Even {\omega^\omega} is not a cardinal, since it is a countable union

\displaystyle \omega^\omega = \bigcup_n \omega^n

and each {\omega^n} is countable.

Ques 5 Why must an infinite cardinal be a limit ordinal?

Remark 6 There is something fishy about the definition of a cardinal: it relies on an external function {f}. That is, to verify {\kappa} is a cardinal I can’t just look at {\kappa} itself; I need to examine the entire universe {V} to make sure there does not exist a bijection {f : \kappa \rightarrow \alpha} for {\alpha < \kappa}. For now this is no issue, but later in model theory this will lead to some highly counterintuitive behavior.

2. Cardinalities

Now that we have defined a cardinal, we can discuss the size of a set by linking it to a cardinal.

Definition 7 The cardinality of a set {X} is the least ordinal {\kappa} such that {X \approx \kappa}. We denote it by {\left\lvert X \right\rvert}.

Ques 8 Why must {\left\lvert X \right\rvert} be a cardinal?

Remark 9 One needs the Well-Ordering Theorem (equivalently, Choice) in order to establish that such an ordinal {\kappa} actually exists.

Since cardinals are ordinals, it makes sense to ask whether {\kappa_1 \le \kappa_2}, and so on. Our usual intuition works well here.

Proposition 10 (Restatement of Cardinality Properties) Let {X} and {Y} be sets.

  1. {X \approx Y} if and only {\left\lvert X \right\rvert = \left\lvert Y \right\rvert}, if and only if there is a bijection between {X} and {Y}.
  2. {\left\lvert X \right\rvert \le \left\lvert Y \right\rvert} if and only if there is an injective map {X \hookrightarrow Y}.

Ques 11 Prove this.

3. Aleph Numbers

Prototypical example for this section: {\aleph_0} is {\omega}, and {\aleph_1} is the first uncountable

First, let us check that cardinals can get arbitrarily large:

Proposition 12 We have {\left\lvert X \right\rvert < \left\lvert \mathcal P(X) \right\rvert} for every set {X}.

Proof: There is an injective map {X \hookrightarrow \mathcal P(X)} but there is no injective map {\mathcal P(X) \hookrightarrow X} by Cantor’s diagonal argument. \Box

Thus we can define:

Definition 13 For a cardinal {\kappa}, we define {\kappa^+} to be the least cardinal above {\kappa}, called the successor cardinal.

This {\kappa^+} exists and has {\kappa^+ \le \left\lvert \mathcal P(\kappa) \right\rvert}.

Next, we claim that:

Exercise 14 Show that if {A} is a set of cardinals, then {\cup A} is a cardinal.

Thus by transfinite induction we obtain that:

Definition 15 For any {\alpha \in \mathrm{On}}, we define the aleph numbers as

\displaystyle \begin{aligned} \aleph_0 &= \omega \\ \aleph_{\alpha+1} &= \left( \aleph_\alpha \right)^+ \\ \aleph_{\lambda} &= \bigcup_{\alpha < \lambda} \aleph_\alpha. \end{aligned}

Thus we have the following sequence of cardinals:

\displaystyle 0 < 1 < 2 < \dots < \aleph_0 < \aleph_1 < \dots < \aleph_\omega < \aleph_{\omega+1} < \dots

By definition, {\aleph_0} is the cardinality of the natural numbers, {\aleph_1} is the first uncountable ordinal, \dots.

We claim the aleph numbers constitute all the cardinals:

Lemma 16 (Aleph Numbers Constitute All Infinite Cardinals) If {\kappa} is a cardinal then either {\kappa} is finite (i.e. {\kappa \in \omega}) or {\kappa = \aleph_\alpha} for some {\alpha \in \mathrm{On}}.

Proof: Assume {\kappa} is infinite, and take {\alpha} minimal with {\aleph_\alpha \ge \kappa}. Suppose for contradiction that we have {\aleph_\alpha > \kappa}. We may assume {\alpha > 0}, since the case {\alpha = 0} is trivial.

If {\alpha = \overline{\alpha} + 1} is a successor, then

\displaystyle \aleph_{\overline{\alpha}} < \kappa < \aleph_{\alpha} = (\aleph_{\overline{\alpha}})^+

which contradicts the fact the definition of the successor cardinal. If {\alpha = \lambda} is a limit ordinal, then {\aleph_\lambda} is the supremum {\bigcup_{\gamma < \lambda} \aleph_\gamma}. So there must be some {\gamma < \lambda} has {\aleph_\gamma > \kappa}, which contradicts the minimality of {\alpha}. \Box

Definition 17 An infinite cardinal which is not a successor cardinal is called a limit cardinal. It is exactly those cardinals of the form {\aleph_\lambda}, for {\lambda} a limit ordinal, plus {\aleph_0}.

4. Cardinal Arithmetic

Prototypical example for this section: {\aleph_0 \cdot \aleph_0 = \aleph_0 + \aleph_0 = \aleph_0}

Recall the way we set up ordinal arithmetic. Note that in particular, {\omega + \omega > \omega} and {\omega^2 > \omega}. Since cardinals count size, this property is undesirable, and we want to have

\displaystyle \begin{aligned} \aleph_0 + \aleph_0 &= \aleph_0 \\ \aleph_0 \cdot \aleph_0 &= \aleph_0 \end{aligned}

because {\omega + \omega} and {\omega \cdot \omega} are countable. In the case of cardinals, we simply “ignore order”.

The definition of cardinal arithmetic is as expected:

Definition 18 (Cardinal Arithmetic) Given cardinals {\kappa} and {\mu}, define

\displaystyle \kappa + \mu \overset{\mathrm{def}}{=} \left\lvert \left( \left\{ 0 \right\} \times \kappa \right) \cup \left( \left\{ 1 \right\} \times \mu \right) \right\rvert

and

\displaystyle k \cdot \mu \overset{\mathrm{def}}{=} \left\lvert \mu \times \kappa \right\rvert .

Ques 19 Check this agrees with what you learned in pre-school for finite cardinals.

This is a slight abuse of notation since we are using the same symbols as for ordinal arithmetic, even though the results are different ({\omega \cdot \omega = \omega^2} but {\aleph_0 \cdot \aleph_0 = \aleph_0}). In general, I’ll make it abundantly clear whether I am talking about cardinal arithmetic or ordinal arithmetic. To help combat this confusion, we use separate symbols for ordinals and cardinals. Specifically, {\omega} will always refer to {\{0,1,\dots\}} viewed as an ordinal; {\aleph_0} will always refer to the same set viewed as a cardinal. More generally,

Definition 20 Let {\omega_\alpha = \aleph_\alpha} viewed as an ordinal.

However, as we’ve seen already we have that {\aleph_0 \cdot \aleph_0 = \aleph_0}. In fact, this holds even more generally:

Theorem 21 (Infinite Cardinals Squared) Let {\kappa} be an infinite cardinal. Then {\kappa \cdot \kappa = \kappa}.

Proof: Obviously {\kappa \cdot \kappa \ge \kappa}, so we want to show {\kappa \cdot \kappa \le \kappa}.

The idea is to repeat the same proof that we had for {\aleph_0 \cdot \aleph_0 = \aleph_0}, so we re-iterate it here. We took the “square” of elements of {\aleph_0}, and then re-ordered it according to the diagonal:

\displaystyle \begin{array}{l|cccccc} & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & 0 & 1 & 3 & 6 & 10 & \dots \\ 1 & 2 & 4 & 7 & 11 & \dots & \\ 2 & 5 & 8 & 12 & \dots & & \\ 3 & 9 & 13 & \dots & & & \\ 4 & 14 & \dots & & & & \end{array}

Let’s copy this idea for a general {\kappa}.

We proceed by transfinite induction on {\kappa}. The base case is {\kappa = \aleph_0}, done above. For the inductive step, first we put the “diagonal” ordering {<_{\text{diag}}} on {\kappa \times \kappa} as follows: for {(\alpha_1, \beta_1)} and {(\alpha_1, \beta_2)} in {\kappa \times \kappa} we declare {(\alpha_1, \beta_1) <_{\text{diag}} (\alpha_2, \beta_2)} if

  • {\max \left\{ \alpha_1, \beta_1 \right\} < \max \left\{ \alpha_2, \beta_2 \right\}} (they are on different diagonals), or
  • {\max \left\{ \alpha_1, \beta_1 \right\} = \max \left\{ \alpha_2, \beta_2 \right\}} and {\alpha_1 < \alpha_2} (same diagonal).

Then {<_{\text{diag}}} is a well-ordering of {\kappa \times \kappa}, so we know it is in order-preserving bijection with some ordinal {\gamma}. Our goal is to show that {\left\lvert \gamma \right\rvert \le \kappa}. To do so, it suffices to prove that for any {\overline{\gamma} \in \gamma}, we have {\left\lvert \overline{\gamma} \right\rvert < \kappa}.

Suppose {\overline{\gamma}} corresponds to the point {(\alpha, \beta) \in \kappa} under this bijection. If {\alpha} and {\beta} are both finite, then certainly {\overline{\gamma}} is finite too. Otherwise, let {\overline{\kappa} = \max \{\alpha, \beta\} < \kappa}; then the number of points below {\overline{\gamma}} is at most

\displaystyle \left\lvert \alpha \right\rvert \cdot \left\lvert \beta \right\rvert \le \overline{\kappa} \cdot \overline{\kappa} = \overline{\kappa}

by the inductive hypothesis. So {\left\lvert \overline{\gamma} \right\rvert \le \overline{\kappa} < \kappa} as desired. \Box

From this it follows that cardinal addition and multiplication is really boring:

Theorem 22 (Infinite Cardinal Arithmetic is Trivial) Given cardinals {\kappa} and {\mu}, one of which is infinite, we have

\displaystyle \kappa \cdot \mu = \kappa + \mu = \max\left( \kappa, \mu \right).

Proof: The point is that both of these are less than the square of the maximum. Writing out the details:

\displaystyle \begin{aligned} \max \left( \kappa, \mu \right) &\le \kappa + \mu \\ &\le \kappa \cdot \mu \\ &\le \max \left( \kappa, \mu \right) \cdot \max \left( \kappa, \mu \right) \\ &= \max\left( \kappa, \mu \right). \end{aligned}

\Box

5. Cardinal Exponentiation

Prototypical example for this section: {2^\kappa = \left\lvert \mathcal P(\kappa) \right\rvert}.

Definition 23 Suppose {\kappa} and {\lambda} are cardinals. Then

\displaystyle \kappa^\lambda \overset{\mathrm{def}}{=} \left\lvert \mathscr F(\lambda, \kappa) \right\rvert.

Here {\mathscr F(A,B)} is the set of functions from {A} to {B}.

As before, we are using the same notation for both cardinal and ordinal arithmetic. Sorry!

In particular, {2^\kappa = \left\lvert \mathcal P(\kappa) \right\rvert > \kappa}, and so from now on we can use the notation {2^\kappa} freely. (Note that this is totally different from ordinal arithmetic; there we had {2^\omega = \bigcup_{n\in\omega} 2^n = \omega}. In cardinal arithmetic {2^{\aleph_0} > \aleph_0}.)

I have unfortunately not told you what {2^{\aleph_0}} equals. A natural conjecture is that {2^{\aleph_0} = \aleph_1}; this is called the Continuum Hypothesis. It turns out to that this is undecidable — it is not possible to prove or disprove this from the {\mathsf{ZFC}} axioms.

6. Cofinality

Prototypical example for this section: {\aleph_0}, {\aleph_1}, \dots are all regular, but {\aleph_\omega} has cofinality {\omega}.

Definition 24 Let {\lambda} be a limit ordinal, and {\alpha} another ordinal. A map {f : \alpha \rightarrow \lambda} of ordinals is called cofinal if for every {\overline{\lambda} < \lambda}, there is some {\overline{\alpha} \in \alpha} such that {f(\overline{\alpha}) \ge \overline{\lambda}}. In other words, the map reaches arbitrarily high into {\lambda}.

Example 25 (Example of a Cofinal Map) {\empty}

  1. The map {\omega \rightarrow \omega^\omega} by {n \mapsto \omega^n} is cofinal.
  2. For any ordinal {\alpha}, the identity map {\alpha \rightarrow \alpha} is cofinal.

Definition 26 Let {\lambda} be a limit ordinal. The cofinality of {\lambda}, denoted {\text{cof }(\lambda)}, is the smallest ordinal {\alpha} such that there is a cofinal map {\alpha \rightarrow \lambda}.

Ques 27 Why must {\alpha} be an infinite cardinal?

Usually, we are interested in taking the cofinality of a cardinal {\kappa}.

Pictorially, you can imagine standing at the bottom of the universe and looking up the chain of ordinals to {\kappa}. You have a machine gun and are firing bullets upwards, and you want to get arbitrarily high but less than {\kappa}. The cofinality is then the number of bullets you need to do this.

We now observe that “most” of the time, the cofinality of a cardinal is itself. Such a cardinal is called regular.

Example 28 ({\aleph_0} is Regular) {\text{cof }(\aleph_0) = \aleph_0}, because no finite subset of {\omega} can reach arbitrarily high.

Example 29 ({\aleph_1} is Regular) {\text{cof }(\aleph_1) = \aleph_1}. Indeed, assume for contradiction that some countable set of ordinals {A = \{ \alpha_0, \alpha_1, \dots \} \subseteq \omega_1} reaches arbitrarily high inside {\omega_1}. Then {\Lambda = \cup A} is a countable ordinal, because it is a countable union of countable ordinals. In other words {\Lambda \in \omega_1}. But {\Lambda} is an upper bound for {A}, contradiction.

On the other hand, there are cardinals which are not regular; since these are the “rare” cases we call them singular.

Example 30 ({\aleph_\omega} is Not Regular) Notice that {\aleph_0 < \aleph_1 < \aleph_2 < \dots} reaches arbitrarily high in {\aleph_\omega}, despite only having {\aleph_0} terms. It follows that {\text{cof }(\aleph_\omega) = \aleph_0}.

We now confirm a suspicion you may have:

Theorem 31 (Successor Cardinals Are Regular) If {\kappa = \overline{\kappa}^+} is a successor cardinal, then it is regular.

Proof: We copy the proof that {\aleph_1} was regular.

Assume for contradiction that for some {\mu \le \overline{\kappa}}, there are {\mu} sets reaching arbitrarily high in {\kappa} as a cardinal. Observe that each of these sets must have cardinality at most {\overline{\kappa}}. We take the union of all {\mu} sets, which gives an ordinal {\Lambda} serving as an upper bound.

The number of elements in the union is at most

\displaystyle \#\text{sets} \cdot \#\text{elms} \le \mu \cdot \overline{\kappa} = \overline{\kappa}

and hence {\left\lvert \Lambda \right\rvert \le \overline{\kappa} < \kappa}. \Box

7. Inaccessible Cardinals

So, what about limit cardinals? It seems to be that most of them are singular: if {\aleph_\lambda \ne \aleph_0} is a limit ordinal, then the sequence {\{\aleph_\alpha\}_{\alpha \in \lambda}} (of length {\lambda}) is certainly cofinal.

Example 32 (Beth Fixed Point) Consider the monstrous cardinal

\displaystyle \kappa = \aleph_{\aleph_{\aleph_{\ddots}}}.

This might look frighteningly huge, as {\kappa = \aleph_\kappa}, but its cofinality is {\omega} as it is the limit of the sequence

\displaystyle \aleph_0, \aleph_{\aleph_0}, \aleph_{\aleph_{\aleph_0}}, \dots

More generally, one can in fact prove that

\displaystyle \text{cof }(\aleph_\lambda) = \text{cof }(\lambda).

But it is actually conceivable that {\lambda} is so large that {\left\lvert \lambda \right\rvert = \left\lvert \aleph_\lambda \right\rvert}.

A regular limit cardinal other than {\aleph_0} has a special name: it is weakly inaccessible. Such cardinals are so large that it is impossible to prove or disprove their existence in {\mathsf{ZFC}}. It is the first of many so-called “large cardinals”.

An infinite cardinal {\kappa} is a strong limit cardinal if

\displaystyle \forall \overline{\kappa} < \kappa \quad 2^{\overline{\kappa}} < \kappa

for any cardinal {\overline{\kappa}}. For example, {\aleph_0} is a strong limit cardinal.

Ques 33 Why must strong limit cardinals actually be limit cardinals? (This is offensively easy.)

A regular strong limit cardinal other than {\aleph_0} is called strongly inaccessible.

8. Exercises

Problem 1 Compute {\left\lvert V_\omega \right\rvert}.

Problem 2 Prove that for any limit ordinal {\alpha}, {\text{cof }(\alpha)} is a regular cardinal.

Sproblem 3 (Strongly Inaccessible Cardinals) Show that for any strongly inaccessible {\kappa}, we have {\left\lvert V_\kappa \right\rvert = \kappa}.

Problem 4 (Konig’s Theorem) Show that

\displaystyle \kappa^{\text{cof }(\kappa)} > \kappa

for every infinite cardinal {\kappa}.

(This post is a draft of a chapter from my Napkin project.)