Set Theory, Part 1: An Intro to ZFC

Back in high school, I sometimes wondered what all the big deal about ZFC and the Axiom of Choice was, but I never really understood what I read in the corresponding Wikipedia page. In this post, I’ll try to explain what axiomatic set theory is trying to do in a way accessible to those with just a high school background.

1. Motivation

What we’re going to try to lay out something like a “machine code” for math: a way of making math completely rigorous, to the point where it can be verified by a machine. This would make sure that our foundation on which we do our high-level theorem proving is sound. As we’ll see in just a moment, this is actually a lot harder to do than it sounds — there are some traps if we try to play too loosely with our definitions.

First of all, since this is a set theory post, the first thing we need to do is define exactly what a set is. Well, we know what a set is, right? It’s just a collection of objects, and two sets are the same if they have the same objects. For example, we have our empty set ${\varnothing}$ that contains no objects. We can have a set that ${\{1, 2, 3\}}$ , or maybe the set of natural numbers ${\mathbb N = \{0, 1, 2, \dots \}}$ . (For the purposes of set theory, ${0}$ is usually considered a natural number.) Sets can even contain other sets, like ${\left\{ \mathbb Z, \mathbb Q, \mathbb N \right\}}$ . Fine and dandy, right?

The trouble is that this definition actually isn’t good enough, and here’s why. If we just say “a set is any collection of objects”, then we can consider a really big set ${S}$ , the set of all sets. So far no problem, right? ${S}$ has the oddity that it has to contain itself ${S \in S}$ , but oh well, no big deal. In fact, for this section, let’s even give a name to these sets — we’ll call a set extraordinary if it happens to contain itself, and ordinary otherwise. (Note that this isn’t actually a math term, it’s just for my convenience.)

Now here comes the problem, called Russell’s Paradox (link here). If I can look at the set of all sets ${S}$ , I can also probably look at the set of all ordinary sets, which I’ll call ${X}$ . Here’s the question: is ${X}$ ordinary?

Well, if ${X}$ is ordinary, then ${X \in X}$ . Hence ${X}$ is extraordinary by definition, which is impossible.
Now suppose ${X}$ is extraordinary. Then ${X}$ is not ordinary, so that implies ${X \notin X}$ . But that contradicts the assumption that ${X}$ is extraordinary!

So that’s all a contradiction! Just by considering the set of all ordinary sets, we’ve run into a trap.

Now if you’re not a set theorist, you could probably just brush this off, saying “oh well, I guess you can’t look at certain sets”. But if you’re a set theorist, this worries you, because you realize it means that you can’t just define a set as “a collection of objects”, because then everything would blow up. Something more is necessary.

2. The Language of Set Theory

We need a way to refer to sets other than “collection of objects”. So here’s what we’re going to do. We’ll start by defining a formal language of set theory, a way of writing logical statements. First of all we can throw in our usual logical operators, like ` ${\forall}$ ‘, for all; ${\exists}$ , exists; ${=}$ , equals; and ${X \implies Y}$ , our “if ${X}$ then ${Y}$ ”. Let’s also add in ${\land}$ , which means “and”, and ${\lor}$ , which means “or”, as well as ${\neg}$ , which means “not”. Now we can write down logical statements.

Since we’re doing set theory, there’s one more operator we’ll add in: the inclusion ${\in}$ . And that’s all we’re going to use (for now).

So how do we express something like “the set ${\{1, 2\}}$ ”? The trick is that we’re not going to actually “construct” any sets, but rather refer to them indirectly, like so:

$\displaystyle \exists S : x \in S \iff \left( (x=1) \lor (x=2) \right).$

“There exists an ${S}$ such that ${x}$ is in ${S}$ if and only if either ${1}$ is in ${S}$ or ${2}$ is in ${S}$ ”. We don’t have to refer to sets as objects in and of themselves anymore — we now have a way to “create” our sets, by writing formulas for exactly what they contain. This is something a machine can parse.

Well, what are going to do with things like ${1}$ and ${2}$ , which are not sets? Here’s the answer: we’re going to make everything into a set. Natural numbers will be sets. Ordered pairs will be sets. Functions will be sets. In later posts, I’ll tell you exactly how we manage to do something like encode ${1}$ as a set. For now, all you need to know is that that sets don’t just hold objects; they hold other sets.

So now it makes sense to talk about whether something is a set or not: ${\exists x}$ means “ ${x}$ is a set”, while ${\nexists x}$ means “ ${x}$ is not a set”. In other words, we’ve rephrased the problem of deciding whether something is a set to whether it exists, which makes it easier to deal with in our formal language. That means that our axiom system had better find some way to let us show a lot of things exist, without letting us prove the following formula:

$\displaystyle \exists X : x \in X \iff x \notin x.$

For if we prove this formula, then we have our “bad” set from Bertrand’s paradox that caused us to go down the rabbit hole in the first place.

3. The Axioms of ZF

What the axioms of ZF is do is tell a computer exactly what it’s allowed to do. Since the axioms want to be perfectly crystal clear, they’re all written in formal symbols. Humans don’t actually think in these formal symbols, unless they’re set theorists and have also gone insane. But the point of these symbols is that there is absolutely no confusion what is true and isn’t true.

It’s worth noting that there are several versions of ZF, but they’re all equivalent to one another. Also, we’ll start representing sets with lowercase letters below.

First, we agreed before that two sets are the same if they share the same elements. We’d like to encode that in an axiom: ${x=y}$ if and only if for every ${a}$ , ${a \in x \iff a \in y}$ . (And now you’re starting to see how convenient it is that everything’s a set!)

This leads us to the first axiom of ZF, called Extensionality.

Axiom 1 (Extensionality) ${\forall x \forall y \forall a : \left( a \in x \iff a \in y \right) \implies x = y}$ .

This is machine code for “if ${x}$ and ${y}$ have the property that ${a \in x \iff a \in y}$ , then ${x = y}$ ”. Now our computer can recognize that two sets are equal if they share exactly the same elements. Great.

Unfortunately, our computer still doesn’t know that there even exist any sets. We have to tell it that. So what’s the first thing we tell it? There exists the empty set.

Axiom 2 (Empty Set Exists) ${\exists a : \forall x \; \neg (x \in a)}$ .

Now you notice that I have ${\neg (x \in a)}$ rather than the ${x \notin a}$ that we’re all used to. That’s going to get tiring very soon, so we’ll make our first “shortcut”: whenever we write ${x \notin a}$ , we really mean ${\neg (x \in a)}$ . This makes it easier for us humans to read the formula, but we’re happy because in principle, we could just expand this shortcut and give the computer something to read.

Now our computer knows the empty set is a set. But does our computer know that there’s only one? In fact, yes! If we have two sets ${a}$ and ${b}$ , and both of them have no elements, then our computer can prove ${a=b}$ by using Extensionality now. So in fact, there’s only one empty set. Now we’ll take another shortcut and give this set a name, “ ${\varnothing}$ ”.

The next three axioms provide us with some various ways of building more sets.

Axiom 3 (Pairing) Given two elements ${x}$ and ${y}$ , there exists a set ${a}$ containing only those two elements. In machine code,

$\displaystyle \forall x \forall y \exists a \quad \forall z, \; z \in a \iff \left( (z=x) \lor (z=y) \right).$

Note that ${x}$ and ${y}$ do not have to be different! What that means is that our machine can now build a set by taking ${x = y = \varnothing}$ . Pairing gives a set ${a}$ such that ${z \in a}$ if and only if ${z = \varnothing}$ . In our human mind, what we’re thinking is “we’ve built the set ${a = \{\varnothing\}}$ !” No need for the computer to know that, though.

We can the apply Pairing again to build some more sets, like ${\{\{\varnothing\}\}}$ , or ${\{\varnothing, \{\varnothing\}\}}$ . Can you see how? And now if I use pairing on those two sets, I get ${\{ \{\varnothing, \{\varnothing\}\}, \{\{\varnothing\}\}\}}$ . Yipee!

At this point I’m going to cheat and use ${0,1,2,\dots}$ for some examples as natural “objects”, since all those braces are getting hard to read. All you need to know at this point is that these guys are secretly sets too, and you’ll have to wait until next time for me to tell you what their elements are.

So Pairing tells you that if I have my hands on ${1}$ , and I have my hands on ${2}$ , then I can get my hands on the set ${\{1,2\}}$ . Cool. Now suppose I told you: ${a = \{1,2\}}$ and ${b = \{3,4\}}$ . Or rather, the computer told you

$\displaystyle \exists a : x \in a \iff \left( (x=1) \lor (x=2) \right)$

and

$\displaystyle \exists b : x \in b \iff \left( (x=3) \lor (x=4) \right).$

Now you want to construct the set ${\{1,2,3,4\}}$ . You might try pairing, but that just gives you the set ${\{ \{1,2\}, \{3,4\} \}}$ . It turns out we’re actually super clumsy: our axioms so far don’t even let us get our hands on ${1}$ and ${2}$ , even though we have a set that contains them!

Maybe this is actually not that surprising. If I write down the formula

$\displaystyle \exists a : x \in a \iff \left( (x=1) \lor (x=2) \lor (x=\text{unicorn}) \right)$

but I know there are no unicorns, then I shouldn’t expect to actually be able to prove that ${\text{unicorn}}$ exists. However, I’d really like to ignore that and get the set ${\{1,2,3,4\}}$ . The axiom that lets us do that is called Union.

Axiom 4 (Union) Given a set ${a}$ , we can create ${\cup a}$ , the union of the elements of ${a}$ . For example, if ${a = \{ \{1,2\}, \{3,4\} \}}$ , then ${z = \{1,2,3,4\}}$ is a set. Formally,

$\displaystyle \forall a \exists z \quad \forall x \; (x \in z) \iff (\exists y : x \in y \in a).$

By repeatedly using Pairing and Union, we can take any finite collection of elements and gather them into one set.

Finally, we’ll also give our computer a way to build a power set. Recall that the power set of a set ${S}$ , which I write ${\mathcal P(S)}$ , consists of all its subsets. For example, in human world, we had ${\mathcal P(\{1,2\}) = \{\varnothing, \{1\}, \{2\}, \{1,2\}\}}$ .

Axiom 5 (Power Set) We can construct ${\mathcal P(x)}$ . Formally,

$\displaystyle \forall x \exists a \forall y (y \in a \iff y \subseteq x)$

where ${y \subseteq x}$ is short for ${\forall z (z \in y \implies z \in x)}$ .

4. Closing

That lays down the first five axioms of ZF.

Though I don’t have the space to show you how to encode the integers here, I’ll give you an idea for some ways we can hack together “encodings” of things as et. For example, what if we want an ordered pair ${(x,y)}$ ? I claim we can encode this as

$\displaystyle (x,y) \overset{\text{def}}{=} \left\{ \{x\}, \{x,y\} \right\}.$

Indeed, you can check that ${(x_1, y_1) = (x_2, y_2)}$ if and only if ${x_1 = x_2}$ and ${y_1 = y_2}$ , which is what we wanted. You can see that this is indeed a set by some applications of Pairing.

Extending this, if we have a relation ${\sim}$ , we can encode it as

$\displaystyle {\sim} \overset{\text{def}}{=} \left\{ (x,y) \mid x \sim y \right\}.$

A function ${f}$ is a special type of relation such that if ${(x,y_1) \in f}$ and ${(x,y_2) \in f}$ , then ${y_1 = y_2}$ . In other words, we represent functions by their “graphs” of ordered pairs ${(x, f(x))}$ .

In the next post, I’ll explain some of the more involved axioms, and I’ll also finally tell you what the elements of ${1}$ and ${2}$ are. Specifically, I’ll be constructing the ordinal numbers.

Thanks to Peter Koellner, Harvard, for his class Math 145a, which taught me this material. My notes for this course can be downloaded from my math website.