On choosing exercises

Finally, if you attempt to read this without working through a significant number of exercises (see §0.0.1), I will come to your house and pummel you with [Gr-EGA] until you beg for mercy. It is important to not just have a vague sense of what is true, but to be able to actually get your hands dirty. As Mark Kisin has said, “You can wave your hands all you want, but it still won’t make you fly.”

— Ravi Vakil, The Rising Sea: Foundations of Algebraic Geometry

When people learn new areas in higher math, they are usually required to do some exercises. I think no one really disputes this: you have to actually do math to make any progress.

However, from the teacher’s side, I want to make the case that there is some art to picking exercises, too. In the process of writing my Napkin as well as taking way too many math classes I began to see some patterns in which exercises or problems I tended to add to the Napkin, or which exercises I found helpful when learning myself. So, I want to explicitly record some of these thoughts here.

1. How not to do it

So in my usual cynicism I’ll start by saying what I think people typically do, and why I don’t think it works well. As far as I can tell, the criteria used in most classes are:

  1. The student is reasonably able to (at least in theory) eventually solve it.
  2. A student with a solid understanding of the material should be able to do it.
  3. (Optional) The result itself is worth knowing.

All of these criteria are good. My problem is that I don’t think they are sufficient.

To explain why, let me give a concrete example of something that is definitely assigned in many measure theory classes.

Okay example (completion of a measure space). Let {(X, \mathcal A, \mu)} be a measure space. Let {\overline{\mathcal A}} denote all subsets of {X} which are the union of a set in {\mathcal A} and a null set. Show that {\overline{\mathcal A}} is a sigma-algebra and that there is a unique extension of the measure {\mu} to it.

I can see why it’s tempting to give this as an exercise. It is a very fundamental result that the student should know. The proof is not too difficult, and the student will understand it better if they do it themselves than if they passively read it. And, if a student really understands measures well, they should find the exercise quite straightforward. For this reason I think this is an okay choice.

But I think we can do better.

In many classes I’ve taken, nearly all the exercises looked like this one. I think when you do this, there are a couple blind spots that sometimes get missed:

  • There’s a difference between “things you should be able to do after learning Z well” and “things you should be able to do when first learning Z”. I would argue that the above example is in the former category, but not the latter one: if a student is learning about measures for the first time, my first priority would be to make sure they get a good conceptual understanding first, and in particular can understand why the statement should be true. Then we can worry about actually proving it.
  • Assigning an exercise which checks if you understand X is not the same as actually teaching it. Okay exercises can verify if you understand something; great exercises will actively help you understand it.

2. An example that I found enlightening

In contrast, this year I was given an exercise which I thought was so instructive that I’ll post it here. It comes from algebraic geometry.

Exercise: The punctured gyrotop is the open subset {U} of {X = \mathrm{Spec} \mathbb C[x,y,z] / (xy, z)} obtained by deleting the origin {(x,y,z)} from {X}. Compute {\mathcal O_X(U)}.

It was after I did this exercise that I finally felt like I understood why distinguished open sets are so important when defining an affine scheme. For that matter, it finally clicked why sheaves on a base are worth caring about.

I had read lots and lots of words and pushed symbols around all day. I had even proved, on paper already, that {\mathcal O(U \sqcup V) = \mathcal O(U) \times \mathcal O(V)}. But I never really felt it. This exercise changed that for me, because suddenly I had an example in front of me that I could actually see.

3. Some suggested additional criteria

So here are a few suggested guidelines which I think can help pick exercises like that one.

A. They should be as concrete as possible.

This is me yelling at people to use more examples, once again. But I think having students work through examples as an exercise is just as important as (if not more important than) reading them aloud in lecture.

One other benefit of using concrete examples is that you can avoid the risk of students solving the exercise by “symbol pushing”. I think many of us know the feeling of solving some textbook exercise by just unwinding a definition and doing a manipulation, or black-boxing some theorem and blindly applying it. In this way one ends up with correct but unenlightening proofs. The issue is that nothing written down resonates with System 1, and so the result doesn’t get internalized.

When you give a concrete exercise with a specific group/scheme/whatever, there is much less chance of something like that happening. You almost have to see the example in order to work with it. I really think internalizing theorems and definitions is better done in this concrete way, rather than the more abstract or general manipulations.

B. They should be enjoyable.

Math majors are humans too. If a whole page of exercises looks boring, students are less likely to do them.

This is one place where I think people could really learn from the math contest community. When designing exams like IMO or USAMO, people fight over which problems they think are the prettiest. The nicest and most instructive exam problems are passed down from generation to generation like prized heirlooms. (Conveniently, the problems are even named, e.g. “IMO 2008/3”, which I privately think helps a ton; it gives the problems a name and face. The most enthusiastic students will often be able to recall where a good problem was from if shown the statement again.) Imagine if the average textbook exercises had even a tenth of that enthusiasm put into crafting them.

Incidentally, I think being concrete helps a lot with this. Part of the reason I enjoyed the punctured gyrotop so much was that I could immediately draw a picture of it, and I had a sense that I should be able to compute the answer, even though I wasn’t experienced enough yet to see what it was. So it was as if the exercise was leading me on the whole way.

For an example of how not to do it, here’s what I think my geometry book would look like if done wrong.

C. They should not be too tricky.

People are always dumber than you think when they first learn a subject; things which should be obvious often are not. So difficulty should be used in moderation: if you assign a hard exercise, you should assume by default the student will not solve it, so there better be some reason you’re adding some extra frustration.

I should at this point also mention some advice most people won’t be able to take (because it is so time-consuming): I think it’s valuable to write full solutions for students, especially on difficult problems. When someone is learning something for the first time, that is the most important time for the students to be able to read the full details of solutions, precisely because they are not yet able to do it themselves.

In math contests, the ideal feedback cycle is something like: a student works on a problem P, makes some progress (possibly solving it), then they look at the solution and see what they were missing or where they could have cleaned up their solution or what they could have done differently, et cetera. This lets them update their intuition or toolkit before going on. If you cut out this last step by not providing solutions, you lose the only real chance you had to give feedback to the student.

4. Memorability

I have, on more occasions than I’m willing to admit, run into the following situation. I solve some exercise in a textbook. Sometime later, I am reading about some other result, and I need some intermediate result, which looks like it could be true but I don’t know how to prove it immediately. So I look it up, and then find out it was the exercise I did (and then have to re-do the exercise again because I didn’t write up the solution).

I think you can argue that if you don’t even recognize the statement later, you didn’t learn anything from it. So I think the following is a good summarizing test: how likely is the student to actually remember it later?

Circular optimization

This post will mostly be focused on construction-type problems in which you’re asked to construct something satisfying property {P}.

Minor spoilers for USAMO 2011/4, IMO 2014/5.

1. What is a leap of faith?

Usually, a good thing to do whenever you can is to make “safe moves” which are implied by the property {P}. Here’s a simple example.

Example 1 (USAMO 2011)

Find an integer {n} such that the remainder when {2^n} is divided by {n} is odd.

It is easy to see, for example, that {n} itself must be odd for this to be true, and so we can make our life easier without incurring any worries by restricting our search to odd {n}. You might therefore call this an “optimization”: a kind of move that makes the problem easier, essentially for free.
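To make the safe move concrete, here is a minimal Python sketch. The function names and the search bound of 200 are my own illustrative choices, not part of the problem. It first checks that no even {n} in the range works, and then searches only over odd {n}.

```python
def remainder_is_odd(n):
    """Return True if the remainder of 2^n upon division by n is odd."""
    return pow(2, n, n) % 2 == 1


def search_odd_candidates(limit):
    """Search only odd n <= limit, which the safe move says is all we need."""
    return [n for n in range(1, limit + 1, 2) if remainder_is_odd(n)]


# Sanity check of the safe move: no even n works in this range.
assert not any(remainder_is_odd(n) for n in range(2, 201, 2))

candidates = search_odd_candidates(200)
print(candidates[:5])
```

Restricting to odd {n} halves the search and, as the assertion confirms for this range, throws away nothing.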

But often such “safe moves” are not enough to solve the problem, and you have to eventually make “leap-of-faith moves”. For example, maybe in the above problem, we might try to focus our attention on numbers {n = pq} for primes {p} and {q}. This does make our life easier, because we’ve zoomed in on a special type of {n} which is easy to compute. But it runs the risk that maybe there is no such example of {n}, or that the smallest one is difficult to find.

2. Circular reasoning can sometimes save the day

However, a strange type of circular reasoning can sometimes happen, in which a move that would otherwise be a leap-of-faith is actually known to be safe because you also know that the problem statement you are trying to prove is true. I can hardly do better than to give the most famous example:

Example 2 (IMO 2014)

For every positive integer {n}, the Bank of Cape Town issues coins of denomination {\frac 1n}. Given a finite collection of such coins (of not necessarily different denominations) with total value at most {99 + \frac12}, prove that it is possible to split this collection into {100} or fewer groups, such that each group has total value at most {1}.

Let’s say in this problem we find ourselves holding two coins of weight {1/6}. Perhaps we wish to put these coins in the same group, so that we have one less decision to make. However, this could rightly be viewed as a “leap-of-faith”, because there’s no logical reason why the task must remain possible after making this first move.

Except there is a non-logical reason: this is the same as trading the two coins of weight {1/6} for a single coin of weight {1/3}. Why is the task still possible? Because the problem says so: the very problem we are trying to solve includes this case, too. If the problem is going to be true, then it had better be true after we make this trade.

Thus by a perverse circular reasoning we can rest assured that our leap-of-faith here will not come back to bite us. (And in fact, this optimization is a major step of the solution.)
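As a rough illustration of the trade, here is a small Python sketch that repeatedly swaps two coins of value {\frac{1}{2k}} for one coin of value {\frac 1k}. The function name `merge_even_pairs` and the sample collection are hypothetical, and this only handles the pairing step described above, not the rest of the solution. Coins are represented by their denominators.

```python
from collections import Counter
from fractions import Fraction


def merge_even_pairs(denominators):
    """Repeatedly trade two coins 1/(2k) for one coin 1/k until no pair is left."""
    counts = Counter(denominators)
    changed = True
    while changed:
        changed = False
        # sorted() copies the keys, so it is safe to add new keys inside the loop.
        for d in sorted(counts, reverse=True):
            if d % 2 == 0 and counts[d] >= 2:
                pairs = counts[d] // 2
                counts[d] -= 2 * pairs
                counts[d // 2] += pairs
                changed = True
    return sorted(d for d, c in counts.items() for _ in range(c))


def total_value(denominators):
    """Exact total value of a collection of coins 1/d."""
    return sum(Fraction(1, d) for d in denominators)


coins = [6, 6, 2, 3]
merged = merge_even_pairs(coins)
print(merged, total_value(coins) == total_value(merged))
```

The total value never changes, so the merged collection is still an instance of the same problem, just with fewer coins to place.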

3. More examples of circular optimization

Here’s some more examples of problems you can try that I think have a similar idea.

Problem 1

Prove that in any connected graph {G} on {2004} vertices one can delete some edges to obtain a graph (also with {2004} vertices) whose degrees are all odd.

Problem 2 (USA TST 2017)

In a sports league, each team uses a set of at most {t} signature colors. A set {S} of teams is color-identifiable if one can assign each team in {S} one of their signature colors, such that no team in {S} is assigned any signature color of a different team in {S}. For all positive integers {n} and {t}, determine the maximum integer {g(n,t)} such that: In any sports league with exactly {n} distinct colors present over all teams, one can always find a color-identifiable set of size at least {g(n,t)}.

Feel free to post more examples in the comments.

IMO 2019 Aftermath

Here is my commentary for the 2019 International Math Olympiad, consisting of pictures and some political statements about the problem.

Summary

This year’s USA delegation consisted of leader Po-Shen Loh and deputy leader Yang Liu. The USA scored 227 points, …

Hard and soft techniques

In yet another contest-based post, I want to distinguish between two types of thinking: things that could help you solve a problem, and things that could help you understand the problem better. Then I’ll talk a little about how you can use the latter. (I’ve talked about this in my own classes for a while by now, but only recently realized I’ve never gotten the whole thing in writing. So here goes.)

1. More silly terminology

As usual, to make these things easier to talk about, I’m going to introduce some words to describe these two. Taking a page from martial arts, I’m going to run with hard and soft techniques.

A hard technique is something you try in the hopes it will prove something — ideally, solve the problem, but at least give you some intermediate lemma. Perhaps a better definition is “things that will end up in the actual proof”. Examples include:

  • Angle chasing in geometry, or proving quadrilaterals are cyclic.
  • Throwing complex numbers at a geometry problem.
  • Plugging in some values into a functional equation (which gives more equations to work with).
  • Taking a given Diophantine equation modulo {p} to get some information, or taking {p}-adic valuations.
  • Trying to perform an induction, for example by deleting an element.
  • Trying to write down an inequality that when summed cyclically gives the desired conclusion.
  • Reducing the problem to one or more equivalent claims.

and so on. I’m sure you can come up with more examples.

In contrast, a soft technique is something you might try to help you understand the problem better — even if it might not prove anything. Perhaps a better definition is “things not written up”. Examples include:

  • Examining particular small cases of the problem.
  • Looking at the equality cases of a min/max problem.
  • Considering variants of the problem (for example, adding or deleting conditions).
  • Coming up with lots of concrete examples and playing with them.
  • Trying to come up with a counterexample to the problem’s assertion and seeing what the obstructions are.
  • Drawing pictures, even on non-geometry problems (see JMO2 and JMO5 in my 2019 notes for example).
  • Deciding whether or not a geometry problem is “purely projective”.
  • Counting the algebraic degrees of freedom in a geometry problem.
  • Checking all the linear/polynomial solutions to a functional equation, in order to get a guess what the answer might be.
  • Blindly trying to guess solutions to an algebraic equation.
  • Making up an artificial unnatural function in a functional equation, and then trying to see why it doesn’t work (or occasionally being surprised that it does work).
  • Thinking about why a certain hard technique you tried failed, or even better convincing yourself it cannot work (for example, this Diophantine equation has a solution modulo every prime, so stop trying to one-shot by mods).
  • Giving a heuristic argument that some claim should be true or false (“probably {2^n \bmod n} is odd infinitely often”), or even easy/hard to prove.

and so on. There is some grey area between these two, and some of the examples above might be argued to be in the other category (especially in context of specific problems), but hopefully this gives you a sense of what I’m talking about.

If you look at things I wrote back when I was in high school, you’ll see this referred to as “attacking” and “scouting” instead. This is too silly for me now even by my standards, but back then it was because I played a lot of StarCraft: Brood War (I’ve since switched to StarCraft II). The analogy there is pretty self-explanatory: knowing what your opponent is doing is important because your army composition and gameplay decisions should change in reaction to more information.

2. Using soft techniques: an example

Now after all that blabber, here’s the action item for you all: you should try soft techniques when stuck.

When you first start doing a problem, you will often have some good ideas for what to try. (For example: a wild geometry appeared, let’s scout for cyclic quadrilaterals.) Sometimes if you are lucky enough (especially if the problem is easier) this will be enough to topple the problem, and you can move on. But more often what happens is that eventually you run out of steam, and the problem is still standing. When that happens, my advice is to try doing some soft techniques if you haven’t already done so.

Here’s an example that I like to give.

Example 1 (USA TST 2009)

Find all real numbers {x}, {y}, {z} which satisfy

\displaystyle  \begin{aligned} x^3 &= 3x - 12y + 50,\\ y^3 &= 12y + 3z - 2,\\ z^3 &= 27z + 27x. \end{aligned}

A common first thing that people will try to do is add the first two equations, since that will cause the {12y} terms to cancel. This gives a factor of {x+y} on the left and an {x+z} on the right, so then maybe you try to substitute that into the {27(x+z)} in the last equation, so you get {z^3 = 9(x^3+y^3-48)}, cool, there are no more linear terms. Then. . .

Usually this doesn’t end well. You add this and subtract that and in the end all you see is equation after equation, and after a while you realize you’re not getting anywhere.

So we’re stuck now. What to do? I’ll now bring in two of the soft techniques I mentioned earlier:

  1. Let’s imagine the problem had {\mathbb R} replaced with {\mathbb C}. In this new problem, you can imagine solving for {y} in terms of {x} using the first equation, then {z} in terms of {y}, and then finally putting everything into the last equation to find a degree {27} polynomial in {x}. I say “imagine” because wow would that be ugly.

    But here’s the kicker: it’s a polynomial. It should have exactly {27} complex roots, with multiplicity. That’s a lot. Really?

    So here’s a hint you might take: there’s a good reason this is over {\mathbb R} but not {\mathbb C}. Often these kinds of things end up being because there’s an inequality going on somewhere, so there will only be a few real solutions even though there might be tons of complex ones.

  2. Okay, but there’s an even more blatant thing we don’t know yet: what is the answer, anyways?

    This is more than a little bit embarrassing. We’re half an hour into the problem and thoroughly stuck, and we don’t even have a single {(x,y,z)} that works? Maybe it’d be a good idea to fix that, like, right now. In the simplest way possible: guess and check.

    It’s much easier than it sounds, since if you pick a value of {z}, say, then you get {x} from the third equation, {y} from the first, then check whether it fits the second. If we restrict our search to integer values of {z}, then there aren’t so many that are reasonable.

I won’t spoil what the answer {(x,y,z)} is, other than saying there is an integer triple and it’s not hard to find it as I described. Once you have these two meta-considerations, you suddenly have a much better foothold, and it’s not too hard to solve the problem from here (for a USA TST problem anyways).
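If you would rather let a computer do the guessing, the search described above can be sketched in a few lines of Python. The function name and the range of {z} values are my own arbitrary choices, and exact rational arithmetic is used so that no floating-point error sneaks in. Run it yourself if you want to see the answer.

```python
from fractions import Fraction


def candidate_triples(z_values):
    """For each trial z, solve for x and y, then test the remaining equation."""
    found = []
    for z in z_values:
        z = Fraction(z)
        x = (z**3 - 27 * z) / 27        # third equation: z^3 = 27z + 27x
        y = (3 * x + 50 - x**3) / 12    # first equation: x^3 = 3x - 12y + 50
        if y**3 == 12 * y + 3 * z - 2:  # second equation: y^3 = 12y + 3z - 2
            found.append((x, y, z))
    return found


solutions = candidate_triples(range(-20, 21))
print(solutions)
```

Note that this only tests integer values of {z}; it is a way to find a foothold, not a proof that nothing else works.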

I pick this example because it really illustrates how hopeless repeatedly using hard techniques can be if you miss the right foothold (and also because in this problem it’s unusually tempting to just think that more manipulation is enough). It’s not impossible to solve the problem without first realizing what the answer is, but it is certainly way more difficult.

3. Improving at soft techniques

What this also means is that, in the aftermath of a problem (when you’ve solved/given up on a problem and are reading and reflecting on the solution), you should also add soft techniques into the list of possible answers to “how might I have thought of that?”. An example of this is at the end of my earlier post On Reading Solutions, in which I describe how you can come up with solutions to two Putnam problems by thinking carefully about what should be the equality case.

Doing this is harder than it sounds, because the soft techniques are the ones that by definition won’t appear in most written solutions, and many people don’t explicitly even recognize them. But soft techniques are the things that tell you which hard techniques to use, which is why they’re so valuable to learn well.

In writing this post, I’m hoping to make the math contest world more aware that these sorts of non-formalizable ideas are things that can (and should) be acknowledged and discussed, the same way that the hard techniques are. In particular, just as there are a plethora of handouts on every hard technique in the olympiad literature, it should also be possible to design handouts aimed at practicing one or more particular soft techniques.

At MOP every year, I’m starting to see more and more classes to this effect (alongside the usual mix of classes called “inversion” or “graph theory” or “induction” or whatnot). I would love to see more! End speech.

A few shockingly linear graphs

There’s a recent working paper by economists Ruchir Agarwal and Patrick Gaule which I think would be of much interest to this readership: a systematic study of IMO performance versus success as a mathematician later on.

Here is a link to the working paper.

Despite the click-baity title and dreamy introduction about the Millennium Prizes, the rest of the paper is fascinating, and the figures section is a gold mine. Here are two that stood out to me:

There’s also one really nice idea they had, which was to investigate the effect of getting one point less than a gold medal, versus getting exactly a gold medal. This is a pretty clever way to account for the effect of the prestige of the IMO, since “IMO gold” sounds so much better on a CV than “IMO silver” even though in any given year they may not differ so much. To my surprise, the authors found that “being awarded a better medal appears to have no additional impact on becoming a professional mathematician or future knowledge production”. I included the relevant graph below here.

The data used in the paper spans from IMO 1981 to IMO 2000. This is before the rise of Art of Problem Solving and the Internet (and the IMO was smaller back then, anyways), so I imagine these graphs might look different if we did them in 2040 using IMO 2000 – IMO 2020 data, although I’m not even sure whether I expect the effects to be larger or smaller.

(As usual: I do not mean to suggest that non-IMO participants cannot do well in math later. This is so that I do not get flooded with angry messages like last time.)

A trailer for p-adic analysis, second half: Mahler coefficients

In the previous post we defined {p}-adic numbers. This post will state (mostly without proof) some more surprising results about continuous functions {f \colon \mathbb Z_p \rightarrow \mathbb Q_p}. Then we give the famous proof of the Skolem-Mahler-Lech theorem using {p}-adic analysis.

1. Digression on {\mathbb C_p}

Before I go on, I want to mention that {\mathbb Q_p} is not algebraically closed. So, we can take its algebraic closure {\overline{\mathbb Q_p}}, but this field is now no longer complete (in the topological sense). However, we can then take the completion of this space to obtain {\mathbb C_p}. In general, the completion of an algebraically closed field is again algebraically closed, and so there is a larger space {\mathbb C_p} which is algebraically closed and complete. This space is called the {p}-adic complex numbers.

We won’t need {\mathbb C_p} at all in what follows, so you can forget everything you just read.

2. Mahler coefficients: a description of continuous functions on {\mathbb Z_p}

One of the big surprises of {p}-adic analysis is that we can concretely describe all continuous functions {\mathbb Z_p \rightarrow \mathbb Q_p}. They are given by a basis of functions

\displaystyle  \binom xn \overset{\mathrm{def}}{=} \frac{x(x-1) \dots (x-(n-1))}{n!}

in the following way.

Theorem 1 (Mahler; see Schikhof Theorem 51.1 and Exercise 51.B)

Let {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} be continuous, and define

\displaystyle  a_n = \sum_{k=0}^n \binom nk (-1)^{n-k} f(k).  \ \ \ \ \ (1)

Then {\lim_n a_n = 0} and

\displaystyle  f(x) = \sum_{n \ge 0} a_n \binom xn.

Conversely, if {a_n} is any sequence converging to zero, then {f(x) = \sum_{n \ge 0} a_n \binom xn} defines a continuous function satisfying (1).

The {a_n} are called the Mahler coefficients of {f}.
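For a concrete feel, here is a short Python sketch that computes the first few Mahler coefficients {a_n = \sum_{k=0}^n \binom nk (-1)^{n-k} f(k)} from the values of {f} on nonnegative integers, and then rebuilds those values from the coefficients. The function names and the test function {f(x) = x^2} are my own choices for illustration.

```python
from math import comb


def mahler_coefficients(f, count):
    """Return a_0, ..., a_{count-1}, where a_n = sum_k C(n,k) (-1)^(n-k) f(k)."""
    return [
        sum(comb(n, k) * (-1) ** (n - k) * f(k) for k in range(n + 1))
        for n in range(count)
    ]


def evaluate_mahler_series(coefficients, x):
    """Evaluate sum_n a_n C(x, n) at a nonnegative integer x (a finite sum)."""
    return sum(a * comb(x, n) for n, a in enumerate(coefficients))


def square(x):
    return x * x


coefficients = mahler_coefficients(square, 6)
print(coefficients)
print([evaluate_mahler_series(coefficients, x) for x in range(6)])
```

For a polynomial the coefficients are eventually zero, so the series is a finite sum; the content of Mahler’s theorem is that for a general continuous {f} the coefficients merely tend to zero {p}-adically.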

Exercise 2

Last post we proved that if {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} is continuous and {f(n) = (-1)^n} for every {n \in \mathbb Z_{\ge 0}} then {p = 2}. Re-prove this using Mahler’s theorem, and this time show conversely that a unique such {f} exists when {p=2}.

You’ll note that these are the same finite differences that one uses on polynomials in high school math contests, which is why they are also called “Mahler differences”.

\displaystyle  \begin{aligned} a_0 &= f(0) \\ a_1 &= f(1) - f(0) \\ a_2 &= f(2) - 2f(1) + f(0) \\ a_3 &= f(3) - 3f(2) + 3f(1) - f(0). \end{aligned}

Thus one can think of {a_n \rightarrow 0} as saying that the values of {f(0)}, {f(1)}, \dots behave like a polynomial modulo {p^e} for every {e \ge 0}. Amusingly, this fact was used on a USA TST in 2011:

Exercise 3 (USA TST 2011/3)

Let {p} be a prime. We say that a sequence of integers {\{z_n\}_{n=0}^\infty} is a {p}-pod if for each {e \geq 0}, there is an {N \geq 0} such that whenever {m \geq N}, {p^e} divides the sum

\displaystyle  \sum_{k=0}^m (-1)^k \binom mk z_k.

Prove that if both sequences {\{x_n\}_{n=0}^\infty} and {\{y_n\}_{n=0}^\infty} are {p}-pods, then the sequence {\{x_n y_n\}_{n=0}^\infty} is a {p}-pod.

3. Analytic functions

We say that a function {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} is analytic if it has a power series expansion

\displaystyle  \sum_{n \ge 0} c_n x^n \quad c_n \in \mathbb Q_p \qquad\text{ converging for } x \in \mathbb Z_p.

As before there is a characterization in terms of the Mahler coefficients:

Theorem 4 (Schikhof Theorem 54.4)

The function {f(x) = \sum_{n \ge 0} a_n \binom xn} is analytic if and only if

\displaystyle  \lim_{n \rightarrow \infty} \frac{a_n}{n!} = 0.

Just as holomorphic functions have finitely many zeros, we have the following result on analytic functions on {\mathbb Z_p}.

Theorem 5 (Strassmann’s theorem)

Let {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} be analytic and not identically zero. Then {f} has finitely many zeros.

4. Skolem-Mahler-Lech

We close off with an application of the analyticity results above.

Theorem 6 (Skolem-Mahler-Lech)

Let {(x_i)_{i \ge 0}} be an integral linear recurrence. Then the zero set of {x_i} is eventually periodic.

Proof: According to the theory of linear recurrences, there exists a matrix {A} such that we can write {x_i} as a dot product

\displaystyle  x_i = \left< A^i u, v \right>.

Let {p} be an odd prime not dividing {\det A}. Let {T} be a positive integer such that {A^T \equiv \mathbf{1} \pmod p}.

Fix any {0 \le r < T}. We will prove that either all the terms

\displaystyle  f(n) = x_{nT+r} \qquad n = 0, 1, \dots

are zero, or at most finitely many of them are. This will conclude the proof.

Let {A^T = \mathbf{1} + pB} for some integer matrix {B}. We have

\displaystyle  \begin{aligned} f(n) &= \left< A^{nT+r} u, v \right> = \left< (\mathbf1 + pB)^n A^r u, v \right> \\ &= \sum_{k \ge 0} \binom nk \cdot p^k \left< B^k A^r u, v \right> \\ &= \sum_{k \ge 0} a_k \binom nk \qquad \text{ where } a_k = p^k \left< B^k A^r u, v \right> \in p^k \mathbb Z. \end{aligned}

Thus we have written {f} in Mahler form. Initially, we define {f \colon \mathbb Z_{\ge 0} \rightarrow \mathbb Z}, but by Mahler’s theorem (since {\lim_n a_n = 0}) it follows that {f} extends to a function {f \colon \mathbb Z_p \rightarrow \mathbb Q_p}. Also, we can check that {\lim_n \frac{a_n}{n!} = 0} hence {f} is even analytic.
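Here is one way to carry out that last check, sketched under the extra assumption that {p \ge 3}. By Legendre’s formula, {\nu_p(n!) = \sum_{i \ge 1} \left\lfloor n/p^i \right\rfloor \le \frac{n}{p-1}}, and the {n}-th Mahler coefficient is divisible by {p^n}, so

\displaystyle  \nu_p\left( \frac{a_n}{n!} \right) \ge n - \frac{n}{p-1} = \frac{(p-2)n}{p-1} \rightarrow \infty \qquad \text{ as } n \rightarrow \infty.

For {p = 2} the right-hand side is zero, so this particular estimate gives nothing, which is why the sketch assumes an odd prime.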

Thus by Strassmann’s theorem, {f} is either identically zero, or else it has finitely many zeros, as desired. \Box
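To see the key identity of the proof in action, here is a Python sketch which checks {x_{nT+r} = \sum_{k} \binom nk p^k \left< B^k A^r u, v \right>} numerically. Everything specific here is my own illustrative choice rather than part of the proof: the recurrence is the Fibonacci sequence, with {A}, {u}, {v} picked so that {x_i} is the {i}-th Fibonacci number, and I take {p = 3} and {T = 8}.

```python
from math import comb


def mat_mul(X, Y):
    """Product of two 2x2 integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(X, e):
    """X raised to the nonnegative integer power e (naive repeated product)."""
    result = [[1, 0], [0, 1]]
    for _ in range(e):
        result = mat_mul(result, X)
    return result


def pairing(M, u, v):
    """The dot product <M u, v>."""
    Mu = [M[0][0] * u[0] + M[0][1] * u[1], M[1][0] * u[0] + M[1][1] * u[1]]
    return Mu[0] * v[0] + Mu[1] * v[1]


# Illustrative choices: Fibonacci numbers, p = 3, T = 8.
A = [[1, 1], [1, 0]]
u, v = (1, 0), (0, 1)
p, T = 3, 8

A_T = mat_pow(A, T)
# B is defined by A^T = 1 + pB; the division is exact because A^T = 1 (mod p).
B = [[(A_T[i][j] - (1 if i == j else 0)) // p for j in range(2)]
     for i in range(2)]


def x(i):
    """The i-th term of the recurrence, x_i = <A^i u, v>."""
    return pairing(mat_pow(A, i), u, v)


def mahler_coefficient(k, r):
    """The coefficient p^k <B^k A^r u, v> from the proof."""
    return p ** k * pairing(mat_mul(mat_pow(B, k), mat_pow(A, r)), u, v)


def f_from_mahler(n, r):
    """Evaluate the Mahler form at a nonnegative integer n (a finite sum)."""
    return sum(mahler_coefficient(k, r) * comb(n, k) for k in range(n + 1))


print(all(f_from_mahler(n, r) == x(n * T + r)
          for r in range(T) for n in range(6)))
```

For integer {n} the sum is finite because {\binom nk = 0} once {k > n}; the {p}-adic machinery is what lets the same formula make sense for every {n \in \mathbb Z_p}.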

A trailer for p-adic analysis, first half: USA TST 2002

I think this post is more than two years late in coming, but anyhow…

This post introduces the {p}-adic integers {\mathbb Z_p}, and the {p}-adic numbers {\mathbb Q_p}. The one-sentence description is that these are “integers/rationals carrying full mod {p^e} information” (and only that information).

The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.

In this whole post, {p} is always a prime. Much of this is based off of Chapter 3A from Straight from the Book.

1. Motivation

Before really telling you what {\mathbb Z_p} and {\mathbb Q_p} are, let me tell you what you might expect them to do.

In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:

  • Taking modulo a prime {p} or prime power {p^e}, and
  • Looking at the exponent {\nu_p}.

Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take an equation modulo {p} to gain something else to work with. Unfortunately, taking modulo {p} loses some information: the reduction {\mathbb Z \twoheadrightarrow \mathbb Z/p} is far from injective.

If we want finer control, we could consider instead taking modulo {p^2}, rather than taking modulo {p}. This can also give some new information (cubes modulo {9}, anyone?), but it has the disadvantage that {\mathbb Z/p^2} isn’t a field, so we lose a lot of the nice algebraic properties that we got if we take modulo {p}.

One of the goals of {p}-adic numbers is that we can get around these two issues I described. The {p}-adic numbers we introduce are going to have the following properties:

  1. You can “take modulo {p^e} for all {e} at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with {p}-adic numbers, we won’t have to make that choice. An equation of {p}-adic numbers carries enough information to take modulo {p^e}.
  2. The numbers {\mathbb Q_p} form a field, the nicest possible algebraic structure: {1/p} makes sense. Contrast this with {\mathbb Z/p^2}, which is not even an integral domain.
  3. It doesn’t lose as much information as taking modulo {p} does: rather than the surjective {\mathbb Z \twoheadrightarrow \mathbb Z/p} we have an injective map {\mathbb Z \hookrightarrow \mathbb Z_p}.
  4. Despite this, you “ignore” some “irrelevant” data. Just like taking modulo {p}, you want to zoom-in on a particular type of algebraic information, and this means necessarily losing sight of other things. (To draw an analogy: the equation { a^2 + b^2 + c^2 + d^2 = -1} has no integer solutions, because, well, squares are nonnegative. But you will find that this equation has solutions modulo any prime {p}, because once you take modulo {p} you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in {p}-adics: the above equation has a solution in {\mathbb Z_p} for every prime {p}.)
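The analogy in the last item is easy to test by brute force. Here is a minimal Python sketch which looks for a solution of {a^2 + b^2 + c^2 + d^2 \equiv -1 \pmod p} for a handful of small primes; the function name and the list of primes are arbitrary choices of mine.

```python
from itertools import product


def solve_mod_p(p):
    """Return some (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = -1 (mod p), or None."""
    for a, b, c, d in product(range(p), repeat=4):
        if (a * a + b * b + c * c + d * d + 1) % p == 0:
            return (a, b, c, d)
    return None


for p in [2, 3, 5, 7, 11, 13]:
    print(p, solve_mod_p(p))
```

Of course, checking finitely many primes proves nothing in general; it is only meant to make the loss of “nonnegativity” tangible.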

So, you can think of {p}-adic numbers as the right tool to use if you only really care about modulo {p^e} information, but normal {\mathbb Z/p^e} isn’t quite powerful enough.

To be more concrete, I’ll give a poster example now:

Example 1 (USA TST 2002/2)

For a prime {p > 5}, show the value of

\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}

does not depend on {x}.

Here is a problem where we clearly only care about {p^e}-type information. Yet it’s a nontrivial challenge to do the necessary manipulations mod {p^3} (try it!). The basic issue is that there is no good way to deal with the denominators modulo {p^3} (in part because {\mathbb Z/p^3} is not even an integral domain).

However, with {p}-adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using the identity

\displaystyle \left( 1 + \frac{px}{k} \right)^{-2} = \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^n.

Such an identity makes no sense over {\mathbb Q} or {\mathbb R} for convergence reasons, but it will work fine over {\mathbb Q_p}, which is all we need.

2. Algebraic perspective

We now construct {\mathbb Z_p} and {\mathbb Q_p}. I promised earlier that a {p}-adic integer will let you look at “all residues modulo {p^e}” at once. This definition will formalize this.

2.1. Definition of {\mathbb Z_p}

Definition 2 (Introducing {\mathbb Z_p})

A {p}-adic integer is a sequence

\displaystyle x = (x_1 \bmod p, \; x_2 \bmod{p^2}, \; x_3 \bmod{p^3}, \; \dots)

of residues {x_e} modulo {p^e} for each integer {e \ge 1}, satisfying the compatibility relations {x_i \equiv x_j \pmod{p^i}} for {i < j}.

The set {\mathbb Z_p} of {p}-adic integers forms a ring under component-wise addition and multiplication.

Example 3 (Some {3}-adic integers)

Let {p=3}. Every usual integer {n} generates a (compatible) sequence of residues modulo {p^e} for each {e}, so we can view each ordinary integer as a {p}-adic one:

\displaystyle 50 = \left( 2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 50 \bmod{81}, \; 50 \bmod{243}, \; \dots \right).

On the other hand, there are sequences of residues which do not correspond to any usual integer despite satisfying compatibility relations, such as

\displaystyle \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right)

which can be thought of as {x = 1 + p + p^2 + \dots}.

In this way we get an injective map

\displaystyle \mathbb Z \hookrightarrow \mathbb Z_p \qquad n \mapsto \left( n \bmod p, n \bmod{p^2}, n \bmod{p^3}, \dots \right)

which is not surjective. So there are more {p}-adic integers than usual integers.
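To make the definition concrete, here is a minimal Python sketch that lists the first few components of the {p}-adic integer attached to an ordinary integer and checks the compatibility relations on a finite prefix. The helper names `residues` and `is_compatible` are my own for illustration, and of course a program can only ever look at finitely many components.

```python
def residues(n, p, depth):
    """First `depth` components (n mod p, n mod p^2, ...) of the image of n in Z_p."""
    return [n % p**e for e in range(1, depth + 1)]


def is_compatible(seq, p):
    """Check x_i == x_j (mod p^i) for all i < j, where seq[0] plays the role of x_1."""
    return all(
        (seq[j] - seq[i]) % p ** (i + 1) == 0
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
    )


fifty = residues(50, 3, 5)
print(fifty)  # [2, 5, 23, 50, 50]

# The sequence from the example that is not the image of any ordinary integer.
ones = [1, 4, 13, 40]
print(is_compatible(ones, 3))  # True
```

A sequence such as `[1, 5, 13]` fails the check for {p=3}, since {5 \not\equiv 1 \pmod 3}.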

(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as

\displaystyle \mathbb Z_p \overset{\mathrm{def}}{=} \varprojlim \mathbb Z/p^e \mathbb Z

where the inverse limit is taken across {e \ge 1}.)

Exercise 4

Check that {\mathbb Z_p} is an integral domain.

2.2. Base {p} expansion

Here is another way to think about {p}-adic integers using “base {p}”. As in the example earlier, every usual integer can be written in base {p}, for example

\displaystyle 50 = \overline{1212}_3 = 2 \cdot 3^0 + 1 \cdot 3^1 + 2 \cdot 3^2 + 1 \cdot 3^3.

More generally, given any {x = (x_1, \dots) \in \mathbb Z_p}, we can write down a “base {p}” expansion in the sense that there are exactly {p} choices of {x_k} given {x_{k-1}}. Continuing the example earlier, we would write

\displaystyle \begin{aligned} \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3 \end{aligned}

and in general we can write

\displaystyle x = \sum_{k \ge 0} a_k p^k = \overline{\dots a_2 a_1 a_0}_p

where {a_k \in \{0, \dots, p-1\}}, such that the equation holds modulo {p^e} for each {e}. Note the expansion is infinite to the left, which is different from what you’re used to.

(Amusingly, negative integers also have infinite base {p} expansions: {-4 = \overline{\dots222212}_3}, corresponding to {(2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 77 \bmod{81} \dots)}.)
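If you want to experiment with these expansions, the following Python sketch peels off base {p} digits one at a time, least significant first. The function name `base_p_digits` is hypothetical, and it only handles ordinary integers (positive or negative), not general elements of {\mathbb Z_p}.

```python
def base_p_digits(n, p, count):
    """Return the digits a_0, a_1, ..., a_{count-1} of the integer n viewed in Z_p."""
    digits = []
    for _ in range(count):
        digit = n % p          # Python's % lands in {0, ..., p-1} even when n < 0
        digits.append(digit)
        n = (n - digit) // p   # exact division, then continue with the next digit
    return digits


print(base_p_digits(50, 3, 6))  # [2, 1, 2, 1, 0, 0]
print(base_p_digits(-4, 3, 6))  # [2, 1, 2, 2, 2, 2]
```

Reading each list from right to left recovers {\overline{1212}_3} and {\overline{\dots222212}_3} respectively.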

Thus you may often hear the advertisement that a {p}-adic integer is a “possibly infinite base {p} expansion”. This is correct, but later on we’ll be thinking of {\mathbb Z_p} in a more and more “analytic” way, and so I prefer to think of this as a “Taylor series with base {p}”. Indeed, much of your intuition from generating functions {K[[X]]} (where {K} is a field) will carry over to {\mathbb Z_p}.

2.3. Constructing {\mathbb Q_p}

Here is one way in which your intuition from generating functions carries over:

Proposition 5 (Non-multiples of {p} are all invertible)

The number {x \in \mathbb Z_p} is invertible if and only if {x_1 \ne 0}. In symbols,

\displaystyle x \in \mathbb Z_p^\times \iff x \not\equiv 0 \pmod p.

Contrast this with the corresponding statement for {K[ [ X ] ]}: a generating function {F \in K[ [ X ] ]} is invertible iff {F(0) \neq 0}.

Proof: If {x \equiv 0 \pmod p} then {x_1 = 0}, so {x} is clearly not invertible. Otherwise, {x_e \not\equiv 0 \pmod p} for all {e}, so we can take an inverse {y_e} modulo {p^e}, with {x_e y_e \equiv 1 \pmod{p^e}}. As the {y_e} are themselves compatible, the element {(y_1, y_2, \dots)} is an inverse. \Box

Example 6 (We have {-\frac{1}{2} = \overline{\dots1111}_3 \in \mathbb Z_3})

We claim the earlier example is actually

\displaystyle \begin{aligned} -\frac{1}{2} = \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3. \end{aligned}

Indeed, multiplying it by {-2} gives

\displaystyle \left( -2 \bmod 3, \; -8 \bmod 9, \; -26 \bmod{27}, \; -80 \bmod{81}, \; \dots \right) = 1.

(Compare this with the “geometric series” {1 + 3 + 3^2 + \dots = \frac{1}{1-3}}. We’ll actually be able to formalize this later, but not yet.)
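The proof of the proposition is effectively an algorithm: invert modulo {p^e} for each {e}. Here is a rough Python sketch of that idea for an ordinary integer {x}. The name `inverse_residues` is made up for this post, and the three-argument `pow` with exponent `-1` assumes Python 3.8 or later.

```python
def inverse_residues(x, p, depth):
    """Components y_e with x * y_e == 1 (mod p^e), for an integer x not divisible by p."""
    if x % p == 0:
        raise ValueError("x is divisible by p, so it is not a unit in Z_p")
    result = []
    for e in range(1, depth + 1):
        modulus = p**e
        # Reduce x first so the base is a nonnegative residue, then invert it.
        result.append(pow(x % modulus, -1, modulus))
    return result


print(inverse_residues(-2, 3, 4))  # [1, 4, 13, 40]
```

So the components of the inverse of {-2} in {\mathbb Z_3} are exactly the residues {1, 4, 13, 40, \dots} from the example.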

Remark 7 ({\frac{1}{2}} is an integer for {p > 2})

The earlier proposition implies that {\frac{1}{2} \in \mathbb Z_3} (among other things); your intuition about what is an “integer” is different here! In olympiad terms, we already knew {\frac{1}{2} \pmod 3} made sense, which is why calling {\frac{1}{2}} an “integer” in the {3}-adics is correct, even though it doesn’t correspond to any element of {\mathbb Z}.

Fun (but trickier) exercise: rational numbers correspond exactly to eventually periodic base {p} expansions.

With this observation, here is now the definition of {\mathbb Q_p}.

Definition 8 (Introducing {\mathbb Q_p})

Since {\mathbb Z_p} is an integral domain, we let {\mathbb Q_p} denote its field of fractions. These are the {p}-adic numbers.

Continuing our generating functions analogy:

\displaystyle \mathbb Z_p \text{ is to } \mathbb Q_p \quad\text{as}\quad K[[X]] \text{ is to } K((X)).

This means {\mathbb Q_p} is “Laurent series with base {p}”, and in particular according to the earlier proposition we deduce:

Proposition 9 ({\mathbb Q_p} looks like formal Laurent series)

Every nonzero element of {\mathbb Q_p} is uniquely of the form

\displaystyle p^k u \qquad \text{ where } k \in \mathbb Z, \; u \in \mathbb Z_p^\times.

Thus, continuing our base {p} analogy, elements of {\mathbb Q_p} are in bijection with “Laurent series”

\displaystyle \sum_{k \ge -n} a_k p^k = \overline{\dots a_2 a_1 a_0 . a_{-1} a_{-2} \dots a_{-n}}_p

for {a_k \in \left\{ 0, \dots, p-1 \right\}}. So the base {p} representations of elements of {\mathbb Q_p} can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).

(Fair warning: the field {\mathbb Q_p} has characteristic zero, not {p}.)

Remark 10 (Warning on fraction field)

This result implies that you shouldn’t think about elements of {\mathbb Q_p} as {x/y} (for {x,y \in \mathbb Z_p}) in practice, even though this is the official definition (and what you’d expect from the name {\mathbb Q_p}). The only denominators you need are powers of {p}.

To keep pushing the formal Laurent series analogy, {K((X))} is usually not thought of as a quotient of generating functions but rather as “formal series with some negative exponents”. You should apply the same intuition to {\mathbb Q_p}.

(At this point I want to make a remark about the fact {1/p \in \mathbb Q_p}, connecting it to the wish-list of properties I had before. In elementary number theory you can take equations modulo {p}, but if you do the quantity {n/p \bmod{p}} doesn’t make sense unless you know {n \bmod{p^2}}. You can’t fix this by just taking modulo {p^2} since then you need {n \bmod{p^3}} to get {n/p \bmod{p^2}}, ad infinitum. You can work around issues like this, but the nice feature of {\mathbb Z_p} and {\mathbb Q_p} is that you have modulo {p^e} information for “all {e} at once”: the information of {x \in \mathbb Q_p} packages all the modulo {p^e} information simultaneously. So you can divide by {p} with no repercussions.)

3. Analytic perspective

3.1. Definition

Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number line, but for {p}-adic purposes we only care about modulo {p^e} information. So, we’ll instead think of two elements of {\mathbb Z_p} or {\mathbb Q_p} as “close” if they differ by a multiple of {p^e} for large {e}.

For this we’ll borrow the familiar {\nu_p} from elementary number theory.

Definition 11 ({p}-adic valuation and absolute value)

We define the {p}-adic valuation {\nu_p : \mathbb Q_p^\times \rightarrow \mathbb Z} in the following two equivalent ways:

  • For {x = (x_1, x_2, \dots) \in \mathbb Z_p} we let {\nu_p(x)} be the largest {e} such that {x_e \equiv 0 \pmod{p^e}} (or {e=0} if {x \in \mathbb Z_p^\times}). Then extend to all of {\mathbb Q_p^\times} by {\nu_p(xy) = \nu_p(x) + \nu_p(y)}.
  • Each {x \in \mathbb Q_p^\times} can be written uniquely as {p^k u} for {u \in \mathbb Z_p^\times}, {k \in \mathbb Z}. We let {\nu_p(x) = k}.

By convention we set {\nu_p(0) = +\infty}. Finally, define the {p}-adic absolute value {\left\lvert \bullet \right\rvert_p} by

\displaystyle \left\lvert x \right\rvert_p = p^{-\nu_p(x)}.

In particular {\left\lvert 0 \right\rvert_p = 0}.

This fulfills the promise that {x} and {y} are close if they look the same modulo {p^e} for large {e}; in that case {\nu_p(x-y)} is large and accordingly {\left\lvert x-y \right\rvert_p} is small.
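For playing around, here is a small Python sketch of {\nu_p} and {\left\lvert \bullet \right\rvert_p}. It only handles rational numbers (via `fractions.Fraction`), since that is all a finite program can store exactly; the function names `nu_p` and `abs_p` are my own choices and not from any library.

```python
from fractions import Fraction


def nu_p(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("nu_p(0) is +infinity by convention")
    num, den, k = x.numerator, x.denominator, 0
    while num % p == 0:      # each factor of p upstairs adds 1
        num //= p
        k += 1
    while den % p == 0:      # each factor of p downstairs subtracts 1
        den //= p
        k -= 1
    return k


def abs_p(x, p):
    """p-adic absolute value |x|_p = p^(-nu_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-nu_p(x, p))


print(nu_p(54, 3))                      # 3, since 54 = 2 * 3^3
print(abs_p(Fraction(5, 9), 3))         # 9
print(abs_p(40 - Fraction(-1, 2), 3))   # 1/81
```

The last line says that {40} and {-\frac12} are {3}-adically close: they agree modulo {3^4}.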

3.2. Ultrametric space

In this way, {\mathbb Q_p} and {\mathbb Z_p} become metric spaces with metric given by {\left\lvert x-y \right\rvert_p}.

Exercise 12

Suppose {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} is continuous and {f(n) = (-1)^n} for every {n \in \mathbb Z_{\ge 0}}. Prove that {p = 2}.

In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from {\mathbb R}.

Proposition 13 ({\left\lvert \bullet \right\rvert_p} is an ultrametric)

For any {x,y \in \mathbb Z_p}, we have the strong triangle inequality

\displaystyle \left\lvert x+y \right\rvert_p \le \max \left\{ \left\lvert x \right\rvert_p, \left\lvert y \right\rvert_p \right\}.

Equality holds if (but not only if) {\left\lvert x \right\rvert_p \neq \left\lvert y \right\rvert_p}.
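As a sanity check (not a proof), the following Python sketch brute-forces the strong triangle inequality over a small range of ordinary integers with {p=3}. The helper `abs_p` is a hypothetical name, restricted here to integers to keep things short.

```python
from fractions import Fraction
from itertools import product


def abs_p(n, p):
    """p-adic absolute value of an ordinary integer n, with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return Fraction(1, p**k)


p = 3
for x, y in product(range(-30, 31), repeat=2):
    bound = max(abs_p(x, p), abs_p(y, p))
    assert abs_p(x + y, p) <= bound          # strong triangle inequality
    if abs_p(x, p) != abs_p(y, p):
        assert abs_p(x + y, p) == bound      # equality when the two sizes differ

# When the sizes agree, equality may hold or fail:
print(abs_p(1 + 1, p), abs_p(1 + 2, p))  # 1 1/3
```

The final line illustrates the “but not only if” clause: {\left\lvert 1 \right\rvert_3 = \left\lvert 1 \right\rvert_3 = \left\lvert 1+1 \right\rvert_3}, whereas {\left\lvert 1+2 \right\rvert_3 < \left\lvert 1 \right\rvert_3 = \left\lvert 2 \right\rvert_3}.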

However, {\mathbb Q_p} is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in {\mathbb R} or {\mathbb C}: basically, any notion such as “continuous function”, “convergent series”, et cetera has a {p}-adic analog. In particular, we can define what it means for an infinite sum to converge:

Definition 14 (Convergence notions)

Here are some examples of {p}-adic analogs of “real-world” notions.

  • A sequence {s_1}, {s_2}, \dots converges to a limit {L} if {\lim_{n \rightarrow \infty} \left\lvert s_n - L \right\rvert_p = 0}.
  • The infinite series {\sum_k x_k} converges if the sequence of partial sums {s_1 = x_1}, {s_2 = x_1 + x_2}, \dots, converges to some limit.
  • \dots et cetera \dots

With this definition in place, the “base {p}” discussion we had earlier is now true in the analytic sense: if {x = \overline{\dots a_2 a_1 a_0}_p \in \mathbb Z_p} then

\displaystyle \sum_{k=0}^\infty a_k p^k \quad\text{converges to } x.

Indeed, the difference between {x} and the {n}th partial sum is divisible by {p^n}, hence the partial sums approach {x} as {n \rightarrow \infty}.

While the definitions are all the same, some of the resulting properties change. For example, in {\mathbb Q_p} convergence of series is simpler:

Proposition 15 ({|x_k|_p \rightarrow 0} iff convergence of series)

A series {\sum_{k=1}^\infty x_k} in {\mathbb Q_p} converges to some limit if and only if {\lim_{k \rightarrow \infty} |x_k|_p = 0}.

Contrast this with {\sum \frac1n = \infty} in {\mathbb R}. You can think of this as a consequence of the strong triangle inequality.

Proof: The “only if” direction is the same as in the real case, so we prove the “if” direction. By multiplying by a large enough power of {p}, we may assume {x_k \in \mathbb Z_p}. (This isn’t actually necessary, but makes the notation nicer.)

Observe that the partial sums {\sum_{k=1}^N x_k} must eventually stabilize modulo {p}, since for large enough {n} we have {\left\lvert x_n \right\rvert_p < 1 \iff \nu_p(x_n) \ge 1}. So let {a_1} be the eventual residue modulo {p} of {\sum_{k=1}^N x_k} for large {N}. In the same way let {a_2} be the eventual residue modulo {p^2}, and so on. Then one can check we approach the limit {a = (a_1, a_2, \dots)}. \Box

Here are a couple of exercises to get you used to thinking of {\mathbb Z_p} and {\mathbb Q_p} as metric spaces.

Exercise 16 ({\mathbb Z_p} is compact)

Show that {\mathbb Q_p} is not compact, but {\mathbb Z_p} is. (For the latter, I recommend using sequential compactness.)

Exercise 17 (Totally disconnected)

Show that both {\mathbb Z_p} and {\mathbb Q_p} are totally disconnected: there are no connected sets other than the empty set and singleton sets.

3.3. More fun with geometric series

While we’re at it, let’s finally state the {p}-adic analog of the geometric series formula.

Proposition 18 (Geometric series)

Let {x \in \mathbb Z_p} with {\left\lvert x \right\rvert_p < 1}. Then

\displaystyle \frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots.

Proof: Note that the partial sums satisfy {1 + x + x^2 + \dots + x^{n-1} = \frac{1-x^n}{1-x}}, and {x^n \rightarrow 0} as {n \rightarrow \infty} since {\left\lvert x \right\rvert_p < 1}. \Box

So, {1 + 3 + 3^2 + \dots = -\frac{1}{2}} is really a correct convergence in {\mathbb Z_3}. And so on.
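You can watch this convergence happen numerically. The sketch below (Python, with a hypothetical helper `nu_p` that only handles nonzero rationals) tracks how divisible by {3} the gap between each partial sum and {-\frac12} is.

```python
from fractions import Fraction


def nu_p(x, p):
    """p-adic valuation of a nonzero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("nu_p(0) is +infinity by convention")
    num, den, k = x.numerator, x.denominator, 0
    while num % p == 0:
        num, k = num // p, k + 1
    while den % p == 0:
        den, k = den // p, k - 1
    return k


p, target = 3, Fraction(-1, 2)
partial = 0
for n in range(1, 6):
    partial += p ** (n - 1)   # partial = 1 + 3 + ... + 3^(n-1)
    print(n, partial, nu_p(partial - target, p))
# 1 1 1
# 2 4 2
# 3 13 3
# 4 40 4
# 5 121 5
```

The third column grows without bound, which is exactly the statement that {\left\lvert s_n - (-\tfrac12) \right\rvert_3 = 3^{-n} \rightarrow 0}.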

If you buy the analogy that {\mathbb Z_p} is generating functions with base {p}, then all the olympiad generating functions you might be used to have {p}-adic analogs. For example, you can prove more generally that:

Theorem 19 (Generalized binomial theorem)

If {x \in \mathbb Z_p} and {\left\lvert x \right\rvert_p < 1}, then for any {r \in \mathbb Q} we have the series convergence

\displaystyle \sum_{n \ge 0} \binom rn x^n = (1+x)^r.

(I haven’t defined {(1+x)^r}, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is the same, except for the extra subscript {p} on the absolute value. I won’t elaborate too much on this now, since {p}-adic exponentiation will be described in much more detail in the next post.
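Here is a quick numerical check of the case {r = -2}, which is the one used for the TST problem. It is only a sketch under my own assumptions: {x} is an ordinary integer divisible by {p}, everything is compared modulo {p^e}, and the helper names are invented for this post. It uses the closed form {\binom{-2}{n} = (-1)^n (n+1)} and the three-argument `pow` with exponent `-1` (Python 3.8 or later).

```python
def binom_minus_two(n):
    """The generalized binomial coefficient binom(-2, n) = (-1)^n * (n + 1)."""
    return (-1) ** n * (n + 1)


def truncated_series(x, terms):
    """Sum of binom(-2, n) * x^n for 0 <= n < terms."""
    return sum(binom_minus_two(n) * x**n for n in range(terms))


p, e = 7, 6
modulus = p**e
x = 3 * p                                   # any integer x with |x|_p < 1
exact = pow((1 + x) ** 2, -1, modulus)      # (1 + x)^(-2) modulo p^e
approx = truncated_series(x, e) % modulus   # terms with n >= e vanish modulo p^e
print(exact == approx)  # True
```

Only the first {e} terms matter modulo {p^e} because {p^n \mid x^n}, which is the computational shadow of the series converging {p}-adically.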

3.4. Completeness

Note that the definition of {\left\lvert \bullet \right\rvert_p} could have been given for {\mathbb Q} as well; we didn’t need {\mathbb Q_p} to introduce it (after all, we have {\nu_p} in olympiads already). The big important theorem I must state now is:

Theorem 20 ({\mathbb Q_p} is complete)

The space {\mathbb Q_p} is the completion of {\mathbb Q} with respect to {\left\lvert \bullet \right\rvert_p}.

This is the definition of {\mathbb Q_p} you’ll see more frequently; one then defines {\mathbb Z_p} in terms of {\mathbb Q_p} (rather than vice-versa) according to

\displaystyle \mathbb Z_p = \left\{ x \in \mathbb Q_p : \left\lvert x \right\rvert_p \le 1 \right\}.

(Remark for experts: {\mathbb Q_p} is a field with {\nu_p} a non-Archimedean valuation; then {\mathbb Z_p} is its valuation ring.)

Let me justify why this definition is philosophically nice.

Suppose you are a numerical analyst and you want to estimate the value of the sum

\displaystyle S = \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{10000^2}

to within {0.001}. The sum {S} consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it really helps if you know about the real numbers: because then you can construct the infinite series {\sum_{n \ge 1} n^{-2} = \frac16 \pi^2}, and deduce that {S \approx \frac{\pi^2}{6}}, up to some small error term from the terms past {\frac{1}{10001^2}}, which can be bounded.

Of course, in order to have access to enough theory to prove that {\sum_{n \ge 1} n^{-2} = \pi^2/6}, you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space {\mathbb Q}, where e.g. the sequence {1}, {1.4}, {1.41}, {1.414}, \dots is considered “not convergent” because {\sqrt2 \notin \mathbb Q}. Instead, all analysis is done in the completion of {\mathbb Q}, namely {\mathbb R}.

Now suppose you are an olympiad contestant and want to estimate the sum

\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2}

to within mod {p^3} (i.e. to within {p^{-3}} in {\left\lvert \bullet \right\rvert_p}). Even though {f_p(x)} is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod {p^3}). But the space {\mathbb Q} is not complete with respect to {\left\lvert \bullet \right\rvert_p} either, and thus it makes sense to work in the completion of {\mathbb Q} with respect to {\left\lvert \bullet \right\rvert_p}. This is exactly {\mathbb Q_p}.

4. Solving USA TST 2002/2

Let’s finally solve Example 1, which asks us to show that the following value does not depend on {x}:

\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}.

Armed with the generalized binomial theorem, this becomes straightforward.

\displaystyle \begin{aligned} f_p(x) &= \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} = \sum_{k=1}^{p-1} \frac{1}{k^2} \left( 1 + \frac{px}{k} \right)^{-2} \\ &= \sum_{k=1}^{p-1} \frac{1}{k^2} \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^{n} \\ &= \sum_{n \ge 0} \binom{-2}{n} \sum_{k=1}^{p-1} \frac{1}{k^2} \left( \frac{x}{k} \right)^{n} p^n \\ &\equiv \sum_{k=1}^{p-1} \frac{1}{k^2} - 2x \left( \sum_{k=1}^{p-1} \frac{1}{k^3} \right) p + 3x^2 \left( \sum_{k=1}^{p-1} \frac{1}{k^4} \right) p^2 \pmod{p^3}. \end{aligned}

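Using the elementary facts that {p^2 \mid \sum_k k^{-3}} and {p \mid \sum_k k^{-4}} (both valid for primes {p > 5}), the last two terms vanish modulo {p^3}. What remains is {\sum_{k=1}^{p-1} \frac{1}{k^2}}, which does not involve {x}, and this solves the problem.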
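If you would like some empirical reassurance, the claim is easy to test by brute force. The Python sketch below computes {f_p(x) \bmod p^3} directly with modular inverses; the function name `f_mod_p_cubed` is hypothetical, the range of {x} tested is arbitrary, and the three-argument `pow` with exponent `-1` needs Python 3.8 or later.

```python
def f_mod_p_cubed(p, x):
    """Compute sum_{k=1}^{p-1} 1 / (p*x + k)^2 modulo p^3."""
    m = p**3
    # Each p*x + k with 1 <= k <= p-1 is coprime to p, so the inverse exists.
    return sum(pow((p * x + k) ** 2, -1, m) for k in range(1, p)) % m


for p in (7, 11, 13):
    values = {f_mod_p_cubed(p, x) for x in range(20)}
    print(p, len(values) == 1)  # True for each of these primes

# For p = 5 the value does depend on x, so some hypothesis on p is needed.
print(f_mod_p_cubed(5, 0), f_mod_p_cubed(5, 1))  # 70 45
```

The failure at {p = 5} is consistent with the derivation: there {\sum_k k^{-4} \equiv 4 \pmod 5} by Fermat’s little theorem, so the {p^2} term does not vanish.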

 

New oly handout: Constructing Diagrams

I’ve added a new Euclidean geometry handout, Constructing Diagrams, to my webpage.

Some of the stuff covered in this handout:

  • Advice for constructing the triangle centers (hint: circumcenter goes first)
  • An example of how to rearrange the conditions of a problem and draw a diagram out-of-order
  • Some mechanical suggestions such as dealing with phantom points
  • Some examples of computer-generated figures

Enjoy.