# A few shockingly linear graphs

There’s a recent working paper by economists Ruchir Agarwal and Patrick Gaule which I think would be of much interest to this readership: a systematic study of IMO performance versus success as a mathematician later on.

Despite the click-baity title and dreamy introduction about the Millennium Prizes, the rest of the paper is fascinating, and the figures section is a gold mine. Here are two that stood out to me:

There’s also one really nice idea they had, which was to investigate the effect of getting one point less than a gold medal, versus getting exactly a gold medal. This is a pretty clever way to account for the effect of the prestige of the IMO, since “IMO gold” sounds so much better on a CV than “IMO silver” even though in any given year they may not differ so much. To my surprise, the authors found that “being awarded a better medal appears to have no additional impact on becoming a professional mathematician or future knowledge production”. I’ve included the relevant graph below.

The data used in the paper spans from IMO 1981 to IMO 2000. This is before the rise of Art of Problem Solving and the Internet (and the IMO was smaller back then, anyways), so I imagine these graphs might look different if we did them in 2040 using IMO 2000 – IMO 2020 data, although I’m not even sure whether I expect the effects to be larger or smaller.

(As usual: I do not mean to suggest that non-IMO participants cannot do well in math later. This is so that I do not get flooded with angry messages like last time.)

# A trailer for p-adic analysis, second half: Mahler coefficients

In the previous post we defined ${p}$-adic numbers. This post will state (mostly without proof) some more surprising results about continuous functions ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$. Then we give the famous proof of the Skolem-Mahler-Lech theorem using ${p}$-adic analysis.

## 1. Digression on ${\mathbb C_p}$

Before I go on, I want to mention that ${\mathbb Q_p}$ is not algebraically closed. So, we can take its algebraic closure ${\overline{\mathbb Q_p}}$, but this field is no longer complete (in the topological sense). However, we can then take the completion of this space to obtain ${\mathbb C_p}$. In general, the completion of an algebraically closed field is again algebraically closed, so ${\mathbb C_p}$ is a larger space which is both algebraically closed and complete. This space is called the ${p}$-adic complex numbers.

We won’t need ${\mathbb C_p}$ at all in what follows, so you can forget everything you just read.

## 2. Mahler coefficients: a description of continuous functions on ${\mathbb Z_p}$

One of the big surprises of ${p}$-adic analysis is that we can concretely describe all continuous functions ${\mathbb Z_p \rightarrow \mathbb Q_p}$. They are given by a basis of functions

$\displaystyle \binom xn \overset{\mathrm{def}}{=} \frac{x(x-1) \dots (x-(n-1))}{n!}$

in the following way.

Theorem 1 (Mahler; see Schikhof Theorem 51.1 and Exercise 51.B)

Let ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ be continuous, and define

$\displaystyle a_n = \sum_{k=0}^n \binom nk (-1)^{n-k} f(k). \ \ \ \ \ (1)$

Then ${\lim_n a_n = 0}$ and

$\displaystyle f(x) = \sum_{n \ge 0} a_n \binom xn.$

Conversely, if ${a_n}$ is any sequence converging to zero, then ${f(x) = \sum_{n \ge 0} a_n \binom xn}$ defines a continuous function satisfying (1).

The ${a_i}$ are called the Mahler coefficients of ${f}$.
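To get a feel for the theorem, here is a minimal Python sketch that computes the first few Mahler coefficients as finite differences and checks the expansion at nonnegative integers. The helper names `mahler_coefficients`, `mahler_eval` and `nu` are made up for this illustration, and the sample functions ${n^3}$ and ${7^n}$ (with ${p = 3}$) are just convenient choices.

```python
from math import comb

def mahler_coefficients(f, count):
    """Return a_0, ..., a_{count-1}, where a_n = sum_k (-1)^(n-k) C(n,k) f(k)."""
    return [sum((-1) ** (n - k) * comb(n, k) * f(k) for k in range(n + 1))
            for n in range(count)]

def mahler_eval(coeffs, x):
    """Evaluate sum_n a_n C(x, n) at an integer 0 <= x < len(coeffs)."""
    return sum(a * comb(x, n) for n, a in enumerate(coeffs))

def nu(p, m):
    """p-adic valuation of a nonzero integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

# A polynomial has only finitely many nonzero Mahler coefficients.
cube = mahler_coefficients(lambda n: n ** 3, 8)

# For f(n) = 7^n the coefficients are 6^n, which tend to 0 in the 3-adic metric.
powers = mahler_coefficients(lambda n: 7 ** n, 8)
valuations = [nu(3, a) for a in powers[1:]]
```

In the second example the valuations grow linearly, which is the kind of behavior the condition ${\lim_n a_n = 0}$ is asking for (of course a finite computation only suggests this, it does not prove it).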

Exercise 2

Last post we proved that if ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ is continuous and ${f(n) = (-1)^n}$ for every ${n \in \mathbb Z_{\ge 0}}$ then ${p = 2}$. Re-prove this using Mahler’s theorem, and this time show conversely that a unique such ${f}$ exists when ${p=2}$.

You’ll note that these are the same finite differences that one uses on polynomials in high school math contests, which is why they are also called “Mahler differences”.

\displaystyle \begin{aligned} a_0 &= f(0) \\ a_1 &= f(1) - f(0) \\ a_2 &= f(2) - 2f(1) + f(0) \\ a_3 &= f(3) - 3f(2) + 3f(1) - f(0). \end{aligned}

Thus one can think of ${a_n \rightarrow 0}$ as saying that the values of ${f(0)}$, ${f(1)}$, ${\dots}$ behave like a polynomial modulo ${p^e}$ for every ${e \ge 0}$. Amusingly, this fact was used on a USA TST in 2011:

Exercise 3 (USA TST 2011/3)

Let ${p}$ be a prime. We say that a sequence of integers ${\{z_n\}_{n=0}^\infty}$ is a ${p}$-pod if for each ${e \geq 0}$, there is an ${N \geq 0}$ such that whenever ${m \geq N}$, ${p^e}$ divides the sum

$\displaystyle \sum_{k=0}^m (-1)^k \binom mk z_k.$

Prove that if both sequences ${\{x_n\}_{n=0}^\infty}$ and ${\{y_n\}_{n=0}^\infty}$ are ${p}$-pods, then the sequence ${\{x_n y_n\}_{n=0}^\infty}$ is a ${p}$-pod.

## 3. Analytic functions

We say that a function ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ is analytic if it has a power series expansion

$\displaystyle \sum_{n \ge 0} c_n x^n \quad c_n \in \mathbb Q_p \qquad\text{ converging for } x \in \mathbb Z_p.$

As before there is a characterization in terms of the Mahler coefficients:

Theorem 4 (Schikhof Theorem 54.4)

The function ${f(x) = \sum_{n \ge 0} a_n \binom xn}$ is analytic if and only if

$\displaystyle \lim_{n \rightarrow \infty} \frac{a_n}{n!} = 0.$

Just as a nonzero holomorphic function has only finitely many zeros in a compact set, we have the following result on analytic functions on ${\mathbb Z_p}$.

Theorem 5 (Strassmann’s theorem)

Let ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ be analytic and not identically zero. Then ${f}$ has finitely many zeros.

## 4. Skolem-Mahler-Lech

We close off with an application of the analyticity results above.

Theorem 6 (Skolem-Mahler-Lech)

Let ${(x_i)_{i \ge 0}}$ be an integral linear recurrence. Then the zero set of ${x_i}$ is eventually periodic.

Proof: According to the theory of linear recurrences, there exists a matrix ${A}$ such that we can write ${x_i}$ as a dot product

$\displaystyle x_i = \left< A^i u, v \right>.$

Let ${p > 2}$ be a prime not dividing ${\det A}$. Let ${T}$ be a positive integer such that ${A^T \equiv \mathbf{1} \pmod p}$ (one exists because ${A}$ is invertible modulo ${p}$).

Fix any ${0 \le r < T}$. We will prove that either all the terms

$\displaystyle f(n) = x_{nT+r} \qquad n = 0, 1, \dots$

are zero, or at most finitely many of them are. This will conclude the proof.

Let ${A^T = \mathbf{1} + pB}$ for some integer matrix ${B}$. We have

\displaystyle \begin{aligned} f(n) &= \left< A^{nT+r} u, v \right> = \left< (\mathbf1 + pB)^n A^r u, v \right> \\ &= \sum_{k \ge 0} \binom nk \cdot p^k \left< B^k A^r u, v \right> \\ &= \sum_{k \ge 0} a_k \binom nk \qquad \text{ where } a_k = p^k \left< B^k A^r u, v \right> \in p^k \mathbb Z. \end{aligned}

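Thus we have written ${f}$ in Mahler form. Initially, we define ${f \colon \mathbb Z_{\ge 0} \rightarrow \mathbb Z}$, but by Mahler’s theorem (since ${\lim_n a_n = 0}$) it follows that ${f}$ extends to a continuous function ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$. Also, we can check that ${\lim_n \frac{a_n}{n!} = 0}$, hence ${f}$ is even analytic: indeed ${\nu_p(a_n) \ge n}$ while ${\nu_p(n!) \le \frac{n-1}{p-1}}$, so ${\nu_p(a_n / n!) \rightarrow \infty}$ provided ${p > 2}$ (which we may assume, as only finitely many primes divide ${\det A}$).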

Thus by Strassmann’s theorem, ${f}$ is either identically zero, or else it has finitely many zeros, as desired. $\Box$
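To make the bookkeeping in the proof concrete, here is a small Python sketch that runs it on the Fibonacci numbers. Everything specific is an assumption chosen for illustration: the matrix ${A}$, the prime ${p = 3}$, the vectors ${u = (1,0)}$ and ${v = (0,1)}$, and the helper names `matmul`, `matpow`, `x`, `a`. It only checks the algebraic identity ${x_{nT+r} = \sum_k a_k \binom nk}$ for small ${n}$; it says nothing about the analytic part.

```python
from math import comb

def matmul(X, Y):
    """Product of two 2x2 integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(X, e):
    """X**e for a 2x2 integer matrix and e >= 0 (naive repeated multiplication)."""
    R = [[1, 0], [0, 1]]
    for _ in range(e):
        R = matmul(R, X)
    return R

IDENTITY = [[1, 0], [0, 1]]
A = [[1, 1], [1, 0]]   # A^i = [[F_{i+1}, F_i], [F_i, F_{i-1}]], det A = -1
p = 3                  # an odd prime not dividing det A

# Smallest T >= 1 with A^T congruent to the identity modulo p.
T = 1
while [[entry % p for entry in row] for row in matpow(A, T)] != IDENTITY:
    T += 1

# Write A^T = 1 + pB with B an integer matrix.
AT = matpow(A, T)
B = [[(AT[i][j] - IDENTITY[i][j]) // p for j in range(2)] for i in range(2)]

def x(i):
    """x_i = <A^i u, v> with u = (1, 0), v = (0, 1), i.e. the Fibonacci number F_i."""
    return matpow(A, i)[1][0]

def a(k, r):
    """Mahler coefficient a_k = p^k <B^k A^r u, v> for the residue class r mod T."""
    return p ** k * matmul(matpow(B, k), matpow(A, r))[1][0]

# Check x_{nT+r} = sum_k a_k C(n, k) for every residue r and a few values of n.
ok = all(x(n * T + r) == sum(comb(n, k) * a(k, r) for k in range(n + 1))
         for r in range(T) for n in range(6))
```

Here the period turns out to be ${T = 8}$, and each of the eight subsequences ${n \mapsto F_{8n+r}}$ gets its own list of Mahler coefficients divisible by growing powers of ${3}$.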

# A trailer for p-adic analysis, first half: USA TST 2002

I think this post is more than two years late in coming, but anyhow…

This post introduces the ${p}$-adic integers ${\mathbb Z_p}$, and the ${p}$-adic numbers ${\mathbb Q_p}$. The one-sentence description is that these are “integers/rationals carrying full mod ${p^e}$ information” (and only that information).

The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.

In this whole post, ${p}$ is always a prime. Much of this is based off of Chapter 3A from Straight from the Book.

## 1. Motivation

Before really telling you what ${\mathbb Z_p}$ and ${\mathbb Q_p}$ are, let me tell you what you might expect them to do.

In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:

• Taking modulo a prime ${p}$ or prime power ${p^e}$, and
• Looking at the exponent ${\nu_p}$.

Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take an equation modulo ${p}$ to gain something else to work with. Unfortunately, taking modulo ${p}$ loses some information: the reduction ${\mathbb Z \twoheadrightarrow \mathbb Z/p}$ is far from injective.

If we want finer control, we could consider instead taking modulo ${p^2}$, rather than taking modulo ${p}$. This can also give some new information (cubes modulo ${9}$, anyone?), but it has the disadvantage that ${\mathbb Z/p^2}$ isn’t a field, so we lose a lot of the nice algebraic properties that we got if we take modulo ${p}$.

One of the goals of ${p}$-adic numbers is to get around these two issues. The ${p}$-adic numbers we introduce are going to have the following properties:

1. You can “take modulo ${p^e}$ for all ${e}$ at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with ${p}$-adic numbers, we won’t have to make that choice. An equation of ${p}$-adic numbers carries enough information to take modulo ${p^e}$.
2. The numbers ${\mathbb Q_p}$ form a field, the nicest possible algebraic structure: ${1/p}$ makes sense. Contrast this with ${\mathbb Z/p^2}$, which is not even an integral domain.
3. It doesn’t lose as much information as taking modulo ${p}$ does: rather than the surjective ${\mathbb Z \twoheadrightarrow \mathbb Z/p}$ we have an injective map ${\mathbb Z \hookrightarrow \mathbb Z_p}$.
4. Despite this, you “ignore” some “irrelevant” data. Just like taking modulo ${p}$, you want to zoom-in on a particular type of algebraic information, and this means necessarily losing sight of other things. (To draw an analogy: the equation ${ a^2 + b^2 + c^2 + d^2 = -1}$ has no integer solutions, because, well, squares are nonnegative. But you will find that this equation has solutions modulo any prime ${p}$, because once you take modulo ${p}$ you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in ${p}$-adics: the above equation has a solution in ${\mathbb Z_p}$ for every prime ${p}$.)

So, you can think of ${p}$-adic numbers as the right tool to use if you only really care about modulo ${p^e}$ information, but normal ${\mathbb Z/p^e}$ isn’t quite powerful enough.

To be more concrete, I’ll give a poster example now:

Example 1 (USA TST 2002/2)

For a prime ${p > 5}$, show the value of

$\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}$

does not depend on ${x}$.

Here is a problem where we clearly only care about ${p^e}$-type information. Yet it’s a nontrivial challenge to do the necessary manipulations mod ${p^3}$ (try it!). The basic issue is that there is no good way to deal with the denominators modulo ${p^3}$ (in part because ${\mathbb Z/p^3}$ is not even an integral domain).

However, with ${p}$-adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using the identity

$\displaystyle \left( 1 + \frac{px}{k} \right)^{-2} = \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^n.$

Such an identity makes no sense over ${\mathbb Q}$ or ${\mathbb R}$ for convergence reasons, but it will work fine over ${\mathbb Q_p}$, which is all we need.

## 2. Algebraic perspective

We now construct ${\mathbb Z_p}$ and ${\mathbb Q_p}$. I promised earlier that a ${p}$-adic integer will let you look at “all residues modulo ${p^e}$” at once. This definition will formalize this.

### 2.1. Definition of ${\mathbb Z_p}$

Definition 2 (Introducing ${\mathbb Z_p}$)

A ${p}$-adic integer is a sequence

$\displaystyle x = (x_1 \bmod p, \; x_2 \bmod{p^2}, \; x_3 \bmod{p^3}, \; \dots)$

of residues ${x_e}$ modulo ${p^e}$ for each integer ${e \ge 1}$, satisfying the compatibility relations ${x_i \equiv x_j \pmod{p^i}}$ for ${i < j}$.

The set ${\mathbb Z_p}$ of ${p}$-adic integers forms a ring under component-wise addition and multiplication.

Example 3 (Some ${3}$-adic integers)

Let ${p=3}$. Every usual integer ${n}$ generates a (compatible) sequence of residues modulo ${p^e}$ for each ${e}$, so we can view each ordinary integer as a ${p}$-adic one:

$\displaystyle 50 = \left( 2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 50 \bmod{81}, \; 50 \bmod{243}, \; \dots \right).$

On the other hand, there are sequences of residues which do not correspond to any usual integer despite satisfying compatibility relations, such as

$\displaystyle \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right)$

which can be thought of as ${x = 1 + p + p^2 + \dots}$.

In this way we get an injective map

$\displaystyle \mathbb Z \hookrightarrow \mathbb Z_p \qquad n \mapsto \left( n \bmod p, n \bmod{p^2}, n \bmod{p^3}, \dots \right)$

which is not surjective. So there are more ${p}$-adic integers than usual integers.

(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as

$\displaystyle \mathbb Z_p \overset{\mathrm{def}}{=} \varprojlim \mathbb Z/p^e \mathbb Z$

where the inverse limit is taken across ${e \ge 1}$.)

Exercise 4

Check that ${\mathbb Z_p}$ is an integral domain.

### 2.2. Base ${p}$ expansion

Here is another way to think about ${p}$-adic integers using “base ${p}$”. As in the example earlier, every usual integer can be written in base ${p}$, for example

$\displaystyle 50 = \overline{1212}_3 = 2 \cdot 3^0 + 1 \cdot 3^1 + 2 \cdot 3^2 + 1 \cdot 3^3.$

More generally, given any ${x = (x_1, \dots) \in \mathbb Z_p}$, we can write down a “base ${p}$” expansion in the sense that there are exactly ${p}$ choices of ${x_k}$ given ${x_{k-1}}$. Continuing the example earlier, we would write

\displaystyle \begin{aligned} \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3 \end{aligned}

and in general we can write

$\displaystyle x = \sum_{k \ge 0} a_k p^k = \overline{\dots a_2 a_1 a_0}_p$

where ${a_k \in \{0, \dots, p-1\}}$, such that the equation holds modulo ${p^e}$ for each ${e}$. Note the expansion is infinite to the left, which is different from what you’re used to.

(Amusingly, negative integers also have infinite base ${p}$ expansions: ${-4 = \overline{\dots222212}_3}$, corresponding to ${(2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 77 \bmod{81} \dots)}$.)
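If you want to experiment, the two descriptions of an ordinary integer (compatible residues versus base ${p}$ digits) are easy to compute. This is a small Python sketch with hypothetical helper names `residues` and `digits`; it only handles ordinary integers, and it leans on the fact that Python's `%` and `//` round toward negative infinity, so negative inputs work without special cases.

```python
def residues(n, p, count):
    """The first `count` entries (n mod p, n mod p^2, ...) of the integer n in Z_p."""
    return [n % p ** e for e in range(1, count + 1)]

def digits(n, p, count):
    """The first `count` base-p digits a_0, a_1, ... of the integer n in Z_p."""
    out = []
    for _ in range(count):
        out.append(n % p)   # lowest remaining digit, always in 0..p-1
        n //= p             # floor division, so negative n never reaches 0
    return out

fifty = residues(50, 3, 5), digits(50, 3, 6)
minus_four = residues(-4, 3, 4), digits(-4, 3, 6)
```

The digit lists come out lowest digit first, so they should be read right-to-left when compared with the overlined notation: for ${-4}$ the digits ${2, 1, 2, 2, 2, 2}$ match ${\overline{\dots222212}_3}$.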

Thus you may often hear the advertisement that a ${p}$-adic integer is a “possibly infinite base ${p}$ expansion”. This is correct, but later on we’ll be thinking of ${\mathbb Z_p}$ in a more and more “analytic” way, and so I prefer to think of this as a “Taylor series with base ${p}$”. Indeed, much of your intuition from generating functions ${K[[X]]}$ (where ${K}$ is a field) will carry over to ${\mathbb Z_p}$.

### 2.3. Constructing ${\mathbb Q_p}$

Here is one way in which your intuition from generating functions carries over:

Proposition 5 (Non-multiples of ${p}$ are all invertible)

The number ${x \in \mathbb Z_p}$ is invertible if and only if ${x_1 \ne 0}$. In symbols,

$\displaystyle x \in \mathbb Z_p^\times \iff x \not\equiv 0 \pmod p.$

Contrast this with the corresponding statement for ${K[ [ X ] ]}$: a generating function ${F \in K[ [ X ] ]}$ is invertible iff ${F(0) \neq 0}$.

Proof: If ${x \equiv 0 \pmod p}$ then ${x_1 = 0}$, so clearly not invertible. Otherwise, ${x_e \not\equiv 0 \pmod p}$ for all ${e}$, so we can take an inverse ${y_e}$ modulo ${p^e}$, with ${x_e y_e \equiv 1 \pmod{p^e}}$. As the ${y_e}$ are themselves compatible, the element ${(y_1, y_2, \dots)}$ is an inverse. $\Box$

Example 6 (We have ${-\frac{1}{2} = \overline{\dots1111}_3 \in \mathbb Z_3}$)

We claim the earlier example is actually

\displaystyle \begin{aligned} -\frac{1}{2} = \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3. \end{aligned}

Indeed, multiplying it by ${-2}$ gives

$\displaystyle \left( -2 \bmod 3, \; -8 \bmod 9, \; -26 \bmod{27}, \; -80 \bmod{81}, \; \dots \right) = 1.$

(Compare this with the “geometric series” ${1 + 3 + 3^2 + \dots = \frac{1}{1-3}}$. We’ll actually be able to formalize this later, but not yet.)
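The proof of the proposition is effectively an algorithm: invert modulo ${p^e}$ for each ${e}$ separately. Here is a short Python sketch of that for ${-\frac12 \in \mathbb Z_3}$, using the built-in three-argument `pow` with exponent `-1` (available since Python 3.8) for modular inverses. The helper name `inverse_residues` is hypothetical.

```python
def inverse_residues(x, p, count):
    """Residues mod p, p^2, ..., p^count of 1/x, for an integer x not divisible by p."""
    return [pow(x, -1, p ** e) for e in range(1, count + 1)]

p = 3

# Residues of -1/2: negate the inverse of 2 modulo each power of 3.
minus_half = [(-y) % p ** e
              for e, y in enumerate(inverse_residues(2, p, 5), start=1)]

# Truncations 1, 1 + 3, 1 + 3 + 9, ... of the "geometric series".
geometric = [sum(p ** k for k in range(e)) for e in range(1, 6)]

# Each residue should reduce to the previous one (the compatibility relations).
compatible = all(minus_half[e] % p ** (e + 1) == minus_half[e - 1] % p ** e
                 or minus_half[e] % p ** e == minus_half[e - 1]
                 for e in range(1, 5))
```

The two lists agree entry by entry, which is the finite shadow of the identity ${-\frac12 = 1 + 3 + 3^2 + \dots}$.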

Remark 7 (${\frac{1}{2}}$ is an integer for ${p > 2}$)

The earlier proposition implies that ${\frac{1}{2} \in \mathbb Z_3}$ (among other things); your intuition about what is an “integer” is different here! In olympiad terms, we already knew ${\frac{1}{2} \pmod 3}$ made sense, which is why calling ${\frac{1}{2}}$ an “integer” in the ${3}$-adics is correct, even though it doesn’t correspond to any element of ${\mathbb Z}$.

Fun (but trickier) exercise: rational numbers correspond exactly to eventually periodic base ${p}$ expansions.

With this observation, here is now the definition of ${\mathbb Q_p}$.

Definition 8 (Introducing ${\mathbb Q_p}$)

Since ${\mathbb Z_p}$ is an integral domain, we let ${\mathbb Q_p}$ denote its field of fractions. These are the ${p}$-adic numbers.

Continuing our generating functions analogy:

$\displaystyle \mathbb Z_p \text{ is to } \mathbb Q_p \quad\text{as}\quad K[[X]] \text{ is to } K((X)).$

This means ${\mathbb Q_p}$ is “Laurent series with base ${p}$”, and in particular according to the earlier proposition we deduce:

Proposition 9 (${\mathbb Q_p}$ looks like formal Laurent series)

Every nonzero element of ${\mathbb Q_p}$ is uniquely of the form

$\displaystyle p^k u \qquad \text{ where } k \in \mathbb Z, \; u \in \mathbb Z_p^\times.$

Thus, continuing our base ${p}$ analogy, elements of ${\mathbb Q_p}$ are in bijection with “Laurent series”

$\displaystyle \sum_{k \ge -n} a_k p^k = \overline{\dots a_2 a_1 a_0 . a_{-1} a_{-2} \dots a_{-n}}_p$

for ${a_k \in \left\{ 0, \dots, p-1 \right\}}$. So the base ${p}$ representations of elements of ${\mathbb Q_p}$ can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).

(Fair warning: the field ${\mathbb Q_p}$ has characteristic zero, not ${p}$.)

Remark 10 (Warning on fraction field)

This result implies that you shouldn’t think about elements of ${\mathbb Q_p}$ as ${x/y}$ (for ${x,y \in \mathbb Z_p}$) in practice, even though this is the official definition (and what you’d expect from the name ${\mathbb Q_p}$). The only denominators you need are powers of ${p}$.

To keep pushing the formal Laurent series analogy, ${K((X))}$ is usually not thought of as quotient of generating functions but rather as “formal series with some negative exponents”. You should apply the same intuition on ${\mathbb Q_p}$.

(At this point I want to make a remark about the fact ${1/p \in \mathbb Q_p}$, connecting it to the wish-list of properties I had before. In elementary number theory you can take equations modulo ${p}$, but if you do the quantity ${n/p \bmod{p}}$ doesn’t make sense unless you know ${n \bmod{p^2}}$. You can’t fix this by just taking modulo ${p^2}$ since then you need ${n \bmod{p^3}}$ to get ${n/p \bmod{p^2}}$, ad infinitum. You can work around issues like this, but the nice feature of ${\mathbb Z_p}$ and ${\mathbb Q_p}$ is that you have modulo ${p^e}$ information for “all ${e}$ at once”: the information of ${x \in \mathbb Q_p}$ packages all the modulo ${p^e}$ information simultaneously. So you can divide by ${p}$ with no repercussions.)

## 3. Analytic perspective

### 3.1. Definition

Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number line, but for ${p}$-adic purposes we only care about modulo ${p^e}$ information. So, we’ll instead think of two elements of ${\mathbb Z_p}$ or ${\mathbb Q_p}$ as “close” if they differ by a multiple of ${p^e}$ for large ${e}$.

For this we’ll borrow the familiar ${\nu_p}$ from elementary number theory.

Definition 11 (${p}$-adic valuation and absolute value)

We define the ${p}$-adic valuation ${\nu_p : \mathbb Q_p^\times \rightarrow \mathbb Z}$ in the following two equivalent ways:

• For ${x = (x_1, x_2, \dots) \in \mathbb Z_p}$ we let ${\nu_p(x)}$ be the largest ${e}$ such that ${x_e \equiv 0 \pmod{p^e}}$ (or ${e=0}$ if ${x \in \mathbb Z_p^\times}$). Then extend to all of ${\mathbb Q_p^\times}$ by ${\nu_p(xy) = \nu_p(x) + \nu_p(y)}$.
• Each ${x \in \mathbb Q_p^\times}$ can be written uniquely as ${p^k u}$ for ${u \in \mathbb Z_p^\times}$, ${k \in \mathbb Z}$. We let ${\nu_p(x) = k}$.

By convention we set ${\nu_p(0) = +\infty}$. Finally, define the ${p}$-adic absolute value ${\left\lvert \bullet \right\rvert_p}$ by

$\displaystyle \left\lvert x \right\rvert_p = p^{-\nu_p(x)}.$

In particular ${\left\lvert 0 \right\rvert_p = 0}$.

This fulfills the promise that ${x}$ and ${y}$ are close if they look the same modulo ${p^e}$ for large ${e}$; in that case ${\nu_p(x-y)}$ is large and accordingly ${\left\lvert x-y \right\rvert_p}$ is small.

### 3.2. Ultrametric space

In this way, ${\mathbb Q_p}$ and ${\mathbb Z_p}$ become metric spaces with metric given by ${\left\lvert x-y \right\rvert_p}$.

Exercise 12

Suppose ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ is continuous and ${f(n) = (-1)^n}$ for every ${n \in \mathbb Z_{\ge 0}}$. Prove that ${p = 2}$.

In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from ${\mathbb R}$.

Proposition 13 (${\left\lvert \bullet \right\rvert_p}$ is an ultrametric)

For any ${x,y \in \mathbb Z_p}$, we have the strong triangle inequality

$\displaystyle \left\lvert x+y \right\rvert_p \le \max \left\{ \left\lvert x \right\rvert_p, \left\lvert y \right\rvert_p \right\}.$

Equality holds if (but not only if) ${\left\lvert x \right\rvert_p \neq \left\lvert y \right\rvert_p}$.
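Since ${\nu_p}$ and ${\left\lvert \bullet \right\rvert_p}$ already make sense on ordinary rational numbers, one can play with them directly. The following Python sketch (helper names `nu` and `abs_p` are hypothetical, and it works only with rationals, not general elements of ${\mathbb Q_p}$) computes both and spot-checks the strong triangle inequality on a small grid of fractions.

```python
from fractions import Fraction

def nu(p, x):
    """p-adic valuation of a rational number x (infinity for x = 0)."""
    x = Fraction(x)
    if x == 0:
        return float("inf")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def abs_p(p, x):
    """p-adic absolute value p^(-nu_p(x)), returned as an exact Fraction."""
    v = nu(p, x)
    return Fraction(0) if v == float("inf") else Fraction(p) ** (-v)

samples = [Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3, 9)]

# |x + y| <= max(|x|, |y|) for all sample pairs.
strong_triangle = all(abs_p(3, x + y) <= max(abs_p(3, x), abs_p(3, y))
                      for x in samples for y in samples)

# Equality whenever the two absolute values differ.
equality_case = all(abs_p(3, x + y) == max(abs_p(3, x), abs_p(3, y))
                    for x in samples for y in samples
                    if abs_p(3, x) != abs_p(3, y))
```

For instance `abs_p(3, 18)` is ${\frac19}$ while `abs_p(3, Fraction(5, 9))` is ${9}$: being divisible by a high power of ${3}$ makes a number small.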

However, ${\mathbb Q_p}$ is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in ${\mathbb R}$ or ${\mathbb C}$: basically, any notion such as “continuous function”, “convergent series”, et cetera has a ${p}$-adic analog. In particular, we can define what it means for an infinite sum to converge:

Definition 14 (Convergence notions)

Here are some examples of ${p}$-adic analogs of “real-world” notions.

• A sequence ${s_1}$, ${s_2}$, ${\dots}$ converges to a limit ${L}$ if ${\lim_{n \rightarrow \infty} \left\lvert s_n - L \right\rvert_p = 0}$.
• The infinite series ${\sum_k x_k}$ converges if the sequence of partial sums ${s_1 = x_1}$, ${s_2 = x_1 + x_2}$, ${\dots}$ converges to some limit.
• ${\dots}$ et cetera ${\dots}$

With this definition in place, the “base ${p}$” discussion we had earlier is now true in the analytic sense: if ${x = \overline{\dots a_2 a_1 a_0}_p \in \mathbb Z_p}$ then

$\displaystyle \sum_{k=0}^\infty a_k p^k \quad\text{converges to } x.$

Indeed, the difference between ${x}$ and the ${n}$th partial sum ${\sum_{k=0}^{n-1} a_k p^k}$ is divisible by ${p^n}$, hence the partial sums approach ${x}$ as ${n \rightarrow \infty}$.

While the definitions are all the same, some of the properties change. For example, in ${\mathbb Q_p}$ convergence of series is simpler:

Proposition 15 (${|x_k|_p \rightarrow 0}$ iff convergence of series)

A series ${\sum_{k=1}^\infty x_k}$ in ${\mathbb Q_p}$ converges to some limit if and only if ${\lim_{k \rightarrow \infty} |x_k|_p = 0}$.

Contrast this with ${\sum \frac1n = \infty}$ in ${\mathbb R}$. You can think of this as a consequence of the strong triangle inequality.

Proof: The “only if” direction is as in the real case, so we prove the “if” direction. By multiplying by a large enough power of ${p}$, we may assume ${x_k \in \mathbb Z_p}$. (This isn’t actually necessary, but makes the notation nicer.)

Observe that the partial sums ${\sum_{k=1}^N x_k \pmod p}$ must eventually stabilize, since for large enough ${n}$ we have ${\left\lvert x_n \right\rvert_p < 1 \iff \nu_p(x_n) \ge 1}$. So let ${a_1}$ be the eventual residue modulo ${p}$ of ${\sum_{k=1}^N x_k}$ for large ${N}$. In the same way let ${a_2}$ be the eventual residue modulo ${p^2}$, and so on. Then one can check we approach the limit ${a = (a_1, a_2, \dots)}$. $\Box$
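The stabilization in the proof is easy to watch numerically. As an illustration of my own choosing (not from the discussion above), take ${x_k = k!}$, whose ${p}$-adic absolute values tend to ${0}$ for every prime ${p}$; the Python sketch below reduces the partial sums of ${\sum_k k!}$ modulo ${3^4}$. The function name `partial_sums_mod` is hypothetical.

```python
from math import factorial

def partial_sums_mod(p, e, count):
    """Partial sums s_N = 1! + 2! + ... + N! reduced mod p^e, for N = 1..count."""
    total, out = 0, []
    for k in range(1, count + 1):
        total += factorial(k)
        out.append(total % p ** e)
    return out

sums = partial_sums_mod(3, 4, 15)

# From k = 9 onward, k! is divisible by 3^4, so s_8, s_9, ... agree mod 81.
tail = set(sums[7:])
```

The single value left in `tail` is the residue ${a_4}$ from the proof; repeating this for every exponent ${e}$ assembles the limit in ${\mathbb Z_3}$.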

Here’s a couple exercises to get you used to thinking of ${\mathbb Z_p}$ and ${\mathbb Q_p}$ as metric spaces.

Exercise 16 (${\mathbb Z_p}$ is compact)

Show that ${\mathbb Q_p}$ is not compact, but ${\mathbb Z_p}$ is. (For the latter, I recommend using sequential compactness.)

Exercise 17 (Totally disconnected)

Show that both ${\mathbb Z_p}$ and ${\mathbb Q_p}$ are totally disconnected: there are no connected sets other than the empty set and singleton sets.

### 3.3. More fun with geometric series

While we’re at it, let’s finally state the ${p}$-adic analog of the geometric series formula.

Proposition 18 (Geometric series)

Let ${x \in \mathbb Z_p}$ with ${\left\lvert x \right\rvert_p < 1}$. Then

$\displaystyle \frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots.$

Proof: Note that the partial sums satisfy ${1 + x + x^2 + \dots + x^n = \frac{1-x^{n+1}}{1-x}}$, and ${x^{n+1} \rightarrow 0}$ as ${n \rightarrow \infty}$ since ${\left\lvert x \right\rvert_p < 1}$. $\Box$

So, ${1 + 3 + 3^2 + \dots = -\frac{1}{2}}$ is really a correct convergence in ${\mathbb Z_3}$. And so on.
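One can also measure how fast the partial sums converge. In the sketch below (Python, with hypothetical helper names `nu` and `geometric_errors`, and restricted to rational ${x}$ so that everything is exact), the ${p}$-adic valuation of the error after the ${x^n}$ term should be ${(n+1)\nu_p(x)}$, since the error is ${-x^{n+1}/(1-x)}$ and ${1-x}$ is a unit.

```python
from fractions import Fraction

def nu(p, x):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def geometric_errors(p, x, count):
    """Valuations of (1 + x + ... + x^n) - 1/(1 - x) for n = 0..count-1."""
    x = Fraction(x)
    limit = 1 / (1 - x)
    partial, errors = Fraction(0), []
    for n in range(count):
        partial += x ** n
        errors.append(nu(p, partial - limit))
    return errors

errors_3 = geometric_errors(3, 3, 6)    # 1 + 3 + 9 + ... versus -1/2 in Z_3
errors_5 = geometric_errors(5, 10, 6)   # 1 + 10 + 100 + ... versus -1/9 in Z_5
errors_2 = geometric_errors(2, 4, 6)    # 1 + 4 + 16 + ... versus -1/3 in Z_2
```

So each extra term of ${1 + 3 + 3^2 + \dots}$ buys exactly one more correct base ${3}$ digit of ${-\frac12}$.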

If you buy the analogy that ${\mathbb Z_p}$ is generating functions with base ${p}$, then all the olympiad generating functions you might be used to have ${p}$-adic analogs. For example, you can prove more generally that:

Theorem 19 (Generalized binomial theorem)

If ${x \in \mathbb Z_p}$ and ${\left\lvert x \right\rvert_p < 1}$, then for any ${r \in \mathbb Q}$ we have the series convergence

$\displaystyle \sum_{n \ge 0} \binom rn x^n = (1+x)^r.$

(I haven’t defined ${(1+x)^r}$, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is the same except for the extra subscript of ${p}$. I won’t elaborate too much on this now, since ${p}$-adic exponentiation will be described in much more detail in the next post.

### 3.4. Completeness

Note that the definition of ${\left\lvert \bullet \right\rvert_p}$ could have been given for ${\mathbb Q}$ as well; we didn’t need ${\mathbb Q_p}$ to introduce it (after all, we have ${\nu_p}$ in olympiads already). The big important theorem I must state now is:

Theorem 20 (${\mathbb Q_p}$ is complete)

The space ${\mathbb Q_p}$ is the completion of ${\mathbb Q}$ with respect to ${\left\lvert \bullet \right\rvert_p}$.

This is the definition of ${\mathbb Q_p}$ you’ll see more frequently; one then defines ${\mathbb Z_p}$ in terms of ${\mathbb Q_p}$ (rather than vice-versa) according to

$\displaystyle \mathbb Z_p = \left\{ x \in \mathbb Q_p : \left\lvert x \right\rvert_p \le 1 \right\}.$

(Remark for experts: ${\mathbb Q_p}$ is a field with ${\nu_p}$ a non-Archimedean valuation; then ${\mathbb Z_p}$ is its valuation ring.)

Let me justify why this definition is philosophically nice.

Suppose you are a numerical analyst and you want to estimate the value of the sum

$\displaystyle S = \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{10000^2}$

to within ${0.001}$. The sum ${S}$ consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it really helps if you know about the real numbers: because then you can construct the infinite series ${\sum_{n \ge 1} n^{-2} = \frac16 \pi^2}$, and deduce that ${S \approx \frac{\pi^2}{6}}$, up to some small error term from the terms past ${\frac{1}{10001^2}}$, which can be bounded.

Of course, in order to have access to enough theory to prove that ${S \approx \pi^2/6}$, you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space ${\mathbb Q}$, where e.g. the sequence ${1}$, ${1.4}$, ${1.41}$, ${1.414}$, ${\dots}$ is considered “not convergent” because ${\sqrt2 \notin \mathbb Q}$. Instead, all analysis is done in the completion of ${\mathbb Q}$, namely ${\mathbb R}$.

Now suppose you are an olympiad contestant and want to estimate the sum

$\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2}$

to within mod ${p^3}$ (i.e. to within ${p^{-3}}$ in ${\left\lvert \bullet \right\rvert_p}$). Even though ${f_p(x)}$ is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod ${p^3}$). But the space ${\mathbb Q}$ is not complete with respect to ${\left\lvert \bullet \right\rvert_p}$ either, and thus it makes sense to work in the completion of ${\mathbb Q}$ with respect to ${\left\lvert \bullet \right\rvert_p}$. This is exactly ${\mathbb Q_p}$.

## 4. Solving USA TST 2002/2

Let’s finally solve Example 1, which asks to compute

$\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}.$

Armed with the generalized binomial theorem, this becomes straightforward.

\displaystyle \begin{aligned} f_p(x) &= \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} = \sum_{k=1}^{p-1} \frac{1}{k^2} \left( 1 + \frac{px}{k} \right)^{-2} \\ &= \sum_{k=1}^{p-1} \frac{1}{k^2} \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^{n} \\ &= \sum_{n \ge 0} \binom{-2}{n} \sum_{k=1}^{p-1} \frac{1}{k^2} \left( \frac{x}{k} \right)^{n} p^n \\ &\equiv \sum_{k=1}^{p-1} \frac{1}{k^2} - 2x \left( \sum_{k=1}^{p-1} \frac{1}{k^3} \right) p + 3x^2 \left( \sum_{k=1}^{p-1} \frac{1}{k^4} \right) p^2 \pmod{p^3}. \end{aligned}

Using the elementary facts that ${p^2 \mid \sum_k k^{-3}}$ and ${p \mid \sum_k k^{-4}}$ (valid for ${p > 5}$), this solves the problem.
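The conclusion is cheap to test by brute force, because every denominator ${px+k}$ is invertible modulo ${p^3}$. Here is a Python sketch (the name `f_mod` is hypothetical, and it relies on `pow` accepting a negative exponent together with a modulus, which needs Python 3.8 or later). It counts how many distinct values ${f_p(x) \bmod{p^3}}$ takes over a few integers ${x}$.

```python
def f_mod(p, x):
    """f_p(x) = sum_{k=1}^{p-1} (px + k)^(-2), computed modulo p^3."""
    m = p ** 3
    return sum(pow(p * x + k, -2, m) for k in range(1, p)) % m

# Number of distinct residues of f_p(x) mod p^3 as x ranges over 0..5.
distinct = {p: len({f_mod(p, x) for x in range(6)}) for p in (5, 7, 11, 13)}
```

For ${p = 7, 11, 13}$ there is a single value, as the solution predicts. For ${p = 5}$ the value does depend on ${x}$, because ${5 \nmid \sum_k k^{-4}}$ there, so the two divisibility facts really do need the prime to be larger than ${5}$.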

# New oly handout: Constructing Diagrams

I’ve added a new Euclidean geometry handout, Constructing Diagrams, to my webpage.

Some of the stuff covered in this handout:

• Advice for constructing the triangle centers (hint: circumcenter goes first)
• An example of how to rearrange the conditions of a problem and draw a diagram out-of-order
• Some mechanical suggestions such as dealing with phantom points
• Some examples of computer-generated figures

Enjoy.

# Revisiting Arc Midpoints in Complex Numbers

## 1. Synopsis

One of the major headaches of using complex numbers in olympiad geometry problems is dealing with square roots. In particular, it is nontrivial to express the incenter of a triangle inscribed in the unit circle in terms of its vertices.

The following lemma is the standard way to set up the arc midpoints of a triangle. It appears for example as part (a) of Lemma 6.23.

Theorem 1 (Arc midpoint setup for a triangle)

Let ${ABC}$ be a triangle with circumcircle ${\Gamma}$ and let ${M_A}$, ${M_B}$, ${M_C}$ denote the arc midpoints of ${\widehat{BC}}$ opposite ${A}$, ${\widehat{CA}}$ opposite ${B}$, ${\widehat{AB}}$ opposite ${C}$.

Suppose we view ${\Gamma}$ as the unit circle in the complex plane. Then there exist complex numbers ${x}$, ${y}$, ${z}$ such that ${A = x^2}$, ${B = y^2}$, ${C = z^2}$, and

$\displaystyle M_A = -yz, \quad M_B = -zx, \quad M_C = -xy.$

Theorem 1 is often used in combination with the following lemma, which lets one assign the incenter the coordinate ${-(xy+yz+zx)}$ in the above notation.

Lemma 2 (The incenter is the orthocenter of opposite arc midpoints)

Let ${ABC}$ be a triangle with circumcircle ${\Gamma}$ and let ${M_A}$, ${M_B}$, ${M_C}$ denote the arc midpoints of ${\widehat{BC}}$ opposite ${A}$, ${\widehat{CA}}$ opposite ${B}$, ${\widehat{AB}}$ opposite ${C}$. Then the incenter of ${\triangle ABC}$ coincides with the orthocenter of ${\triangle M_A M_B M_C}$.
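Both statements can be sanity-checked numerically before proving anything. The Python sketch below is one way to do it, under assumptions I am making for the illustration: the vertices are given by angles ${0 \le \alpha < \beta < \gamma < 2\pi}$, the arc midpoints are read off directly from those angles, the signs are fixed by choosing any square root ${x}$ of ${A}$ and then forcing ${-xy = M_C}$ and ${-xz = M_B}$, and the incenter is computed independently from the side lengths. The function name `check_arc_midpoints` is hypothetical.

```python
import cmath
import math

def check_arc_midpoints(alpha, beta, gamma):
    """Angles 0 <= alpha < beta < gamma < 2*pi of A, B, C on the unit circle.

    Returns (error in M_A = -yz, error in incenter = -(xy + yz + zx)).
    """
    A, B, C = (cmath.exp(1j * t) for t in (alpha, beta, gamma))

    # Arc midpoints read off from the angles, independently of the theorem.
    M_C = cmath.exp(1j * (alpha + beta) / 2)               # arc AB avoiding C
    M_A = cmath.exp(1j * (beta + gamma) / 2)               # arc BC avoiding A
    M_B = cmath.exp(1j * ((gamma + alpha) / 2 + math.pi))  # arc CA avoiding B

    x = cmath.sqrt(A)   # either square root of A would do
    y = -M_C / x        # the square root of B with -xy = M_C
    z = -M_B / x        # the square root of C with -xz = M_B
    assert abs(y * y - B) < 1e-9 and abs(z * z - C) < 1e-9

    # Incenter from side lengths: I = (aA + bB + cC) / (a + b + c).
    a, b, c = abs(B - C), abs(C - A), abs(A - B)
    incenter = (a * A + b * B + c * C) / (a + b + c)

    return abs(M_A + y * z), abs(incenter + (x * y + y * z + z * x))

errors = check_arc_midpoints(0.3, 2.0, 4.5)
```

Both errors come out at the level of floating-point noise for the triangles I tried, which is reassuring but of course no substitute for a proof.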

Unfortunately, the proof of Theorem 1 in my textbook is wrong, and I cannot find a proof online (though I hear that Lemmas in Olympiad Geometry has a proof). So in this post I will give a correct proof of Theorem 1, which will hopefully also explain the mysterious introduction of the minus signs in the theorem statement. In addition I will give a version of the theorem valid for quadrilaterals.

## 2. A Word of Warning

I should at once warn the reader that Theorem 1 is an existence result, and thus must be applied carefully.

To see why this matters, consider the following problem, which appeared as problem 1 of the 2016 JMO.

Example 3 (JMO 2016, by Zuming Feng)

The isosceles triangle ${\triangle ABC}$, with ${AB=AC}$, is inscribed in the circle ${\omega}$. Let ${P}$ be a variable point on the arc ${BC}$ that does not contain ${A}$, and let ${I_B}$ and ${I_C}$ denote the incenters of triangles ${\triangle ABP}$ and ${\triangle ACP}$, respectively. Prove that as ${P}$ varies, the circumcircle of triangle ${\triangle PI_{B}I_{C}}$ passes through a fixed point.

By experimenting with the diagram, it is not hard to guess that the correct fixed point is the midpoint of arc ${\widehat{BC}}$, as seen in the figure below. One might be tempted to write ${A = x^2}$, ${B = y^2}$, ${C = z^2}$, ${P = t^2}$ and assert the two incenters are ${-(xy+yt+xt)}$ and ${-(xz+zt+xt)}$, and that the fixed point is ${-yz}$.

This is a mistake! If one applies Theorem 1 twice, then the choices of “square roots” of the common vertices ${A}$ and ${P}$ may not be compatible. In fact, they cannot be compatible, because the arc midpoint of ${\widehat{AP}}$ opposite ${B}$ is different from the arc midpoint of ${\widehat{AP}}$ opposite ${C}$.

In fact, I claim this is not a minor issue that one can work around. This is because the claim that the circumcircle of ${\triangle P I_B I_C}$ passes through the midpoint of arc ${\widehat{BC}}$ is false if ${P}$ lies on the arc on the same side as ${A}$! In that case it actually passes through ${A}$ instead. Thus the truth of the problem really depends on the fact that the quadrilateral ${ABPC}$ is convex, and any attempt with complex numbers must take this into account to have a chance of working.

## 3. Proof of the theorem for triangles

Fix ${ABC}$ now, so we require ${A = x^2}$, ${B = y^2}$, ${C = z^2}$. There are ${2^3 = 8}$ choices of square roots ${x}$, ${y}$, ${z}$ we can take (differing by a sign); we wish to show one of them works.

We pick an arbitrary choice for ${x}$ first. Then, of the two choices of ${y}$, we pick the one such that ${-xy = M_C}$. Similarly, for the two choices of ${z}$, we pick the one such that ${-xz = M_B}$. Our goal is to show that under these conditions, we have ${M_A = -yz}$ again.

The main trick is to now consider the midpoint of arc ${\widehat{BAC}}$, which we denote by ${L}$. It is easy to see that:

Lemma 4 (The isosceles trapezoid trick)

We have ${\overline{AL} \parallel \overline{M_B M_C}}$ (both are perpendicular to the ${\angle A}$ bisector). Thus ${A L M_B M_C}$ is an isosceles trapezoid, and so ${ A \cdot L = M_B \cdot M_C }$.

Thus, we have

$\displaystyle L = \frac{M_B M_C}{A} = \frac{(-xz)(-xy)}{x^2} = +yz.$

Thus

$\displaystyle M_A = -L = -yz$

as desired.
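The sign-chasing above is easy to sanity-check numerically. Here is a minimal Python sketch, not a proof: the helper names `arc_midpoint`, `theorem1_roots` and `incenter` are made up for this illustration, and the triangle is an arbitrary choice. It picks ${x}$ arbitrarily, forces ${y}$ and ${z}$ as in the proof, and then checks that ${M_A = -yz}$ comes out for free, and that ${-(xy+yz+zx)}$ matches the incenter computed from side lengths.

```python
import cmath

def arc_midpoint(P, Q, avoid):
    """Midpoint of the arc PQ of the unit circle that does NOT contain `avoid`."""
    m = cmath.sqrt(P * Q)  # m and -m are the two arc midpoints for chord PQ

    def side(Z):
        # The sign says on which side of line PQ the point Z lies.
        return ((Q - P).conjugate() * (Z - P)).imag

    return m if side(m) * side(avoid) < 0 else -m

def theorem1_roots(A, B, C):
    """Choose x, y, z as in the proof: x is arbitrary, then y and z are forced."""
    M_A = arc_midpoint(B, C, A)
    M_B = arc_midpoint(C, A, B)
    M_C = arc_midpoint(A, B, C)
    x = cmath.sqrt(A)
    y = cmath.sqrt(B)
    if abs(-x * y - M_C) > 1e-9:  # wrong square root of B, flip it
        y = -y
    z = cmath.sqrt(C)
    if abs(-x * z - M_B) > 1e-9:  # wrong square root of C, flip it
        z = -z
    return x, y, z, (M_A, M_B, M_C)

def incenter(A, B, C):
    """Incenter as the side-length-weighted average of the vertices."""
    a, b, c = abs(B - C), abs(C - A), abs(A - B)
    return (a * A + b * B + c * C) / (a + b + c)

A, B, C = (cmath.exp(1j * t) for t in (0.3, 2.0, 4.5))
x, y, z, (M_A, M_B, M_C) = theorem1_roots(A, B, C)
print(abs(-y * z - M_A))                               # should be about 0
print(abs(-(x * y + y * z + z * x) - incenter(A, B, C)))  # should be about 0
```

Note that the code never chooses the sign of ${M_A}$: only ${M_B}$ and ${M_C}$ are used to pin down ${y}$ and ${z}$, which is exactly the point of the argument.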

From this we can see why the minus signs are necessary.

Exercise 5

Show that Theorem 1 becomes false if we try to use ${+yz}$, ${+zx}$, ${+xy}$ instead of ${-yz}$, ${-zx}$, ${-xy}$.

## 4. A version for quadrilaterals

We now return to the setting of a convex quadrilateral ${ABPC}$ that we encountered in Example 3. Suppose we preserve the variables ${x}$, ${y}$, ${z}$ that we were given from Theorem 1, but now add a fourth complex number ${t}$ with ${P = t^2}$. How are the new arc midpoints determined? The following theorem answers this question.

Theorem 6 (${xytz}$ setup)

Let ${ABPC}$ be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers ${x}$, ${y}$, ${z}$, ${t}$ such that ${A = x^2}$, ${B = y^2}$, ${C = z^2}$, ${P = t^2}$ and:

• The opposite arc midpoints ${M_A}$, ${M_B}$, ${M_C}$ of triangle ${ABC}$ are given by ${-yz}$, ${-zx}$, ${-xy}$, as before.
• The midpoint of arc ${\widehat{BP}}$ not including ${A}$ or ${C}$ is given by ${+yt}$.
• The midpoint of arc ${\widehat{CP}}$ not including ${A}$ or ${B}$ is given by ${-zt}$.
• The midpoint of arc ${\widehat{ABP}}$ is ${-xt}$ and the midpoint of arc ${\widehat{ACP}}$ is ${+xt}$.

This setup is summarized in the following figure.

Note that unlike Theorem 1, the four arcs cut out by the sides of ${ABPC}$ do not all have the same sign (I chose ${\widehat{BP}}$ to have coordinates ${+yt}$). This asymmetry is inevitable (see if you can understand why from the proof below).

Proof: We select ${x}$, ${y}$, ${z}$ with Theorem 1. Now, pick a choice of ${t}$ such that ${+yt}$ is the arc midpoint of ${\widehat{BP}}$ not containing ${A}$ and ${C}$. Then the arc midpoint of ${\widehat{CP}}$ not containing ${A}$ or ${B}$ is given by

$\displaystyle \frac{z^2}{-yz} \cdot (+yt) = -zt.$

On the other hand, the calculation of ${-xt}$ for the midpoint of ${\widehat{ABP}}$ follows by applying Lemma 4 again, this time to triangle ${ABP}$: the midpoint equals ${\frac{(-xy)(+yt)}{y^2} = -xt}$. The midpoint of ${\widehat{ACP}}$ is computed similarly. $\Box$

In other problems, the four vertices of the quadrilateral may play more symmetric roles and in that case it may be desirable to pick a setup in which the four vertices are labeled ${ABCD}$ in order. By relabeling the letters in Theorem 6 one can prove the following alternate formulation.

Corollary 7

Let ${ABCD}$ be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers ${a}$, ${b}$, ${c}$, ${d}$ such that ${A = a^2}$, ${B = b^2}$, ${C = c^2}$, ${D = d^2}$ and:

• The midpoints of ${\widehat{AB}}$, ${\widehat{BC}}$, ${\widehat{CD}}$, ${\widehat{DA}}$ cut out by the sides of ${ABCD}$ are ${-ab}$, ${-bc}$, ${-cd}$, ${+da}$.
• The midpoints of ${\widehat{ABC}}$ and ${\widehat{BCD}}$ are ${+ac}$ and ${+bd}$.
• The midpoints of ${\widehat{CDA}}$ and ${\widehat{DAB}}$ are ${-ac}$ and ${-bd}$.

To test the newfound theorem, here is a cute easy application.

Example 8 (Japanese theorem for cyclic quadrilaterals)

In a cyclic quadrilateral ${ABCD}$, the incenters of ${\triangle ABC}$, ${\triangle BCD}$, ${\triangle CDA}$, ${\triangle DAB}$ are the vertices of a rectangle.
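Before attempting this by hand, one can check the setup numerically. The Python sketch below is only a sanity check, not a solution. It assumes the vertices sit at angles ${0 \le \alpha < \beta < \gamma < \delta < 2\pi}$ and uses the sign choice ${a = e^{i\alpha/2}}$, ${b = -e^{i\beta/2}}$, ${c = e^{i\gamma/2}}$, ${d = -e^{i\delta/2}}$, which is one choice I worked out to be compatible with Corollary 7 (treat it as an assumption of the sketch). Each incenter is then the sum of the three opposite arc midpoints, by Lemma 2. The function names are hypothetical.

```python
import cmath

def incenter(P, Q, R):
    """Incenter as the side-length-weighted average of the vertices."""
    p, q, r = abs(Q - R), abs(R - P), abs(P - Q)
    return (p * P + q * Q + r * R) / (p + q + r)

def japanese_rectangle(alpha, beta, gamma, delta):
    """Vertices A, B, C, D at angles alpha < beta < gamma < delta in [0, 2*pi)."""
    # One sign choice compatible with Corollary 7 (an assumption of this sketch).
    a = cmath.exp(1j * alpha / 2)
    b = -cmath.exp(1j * beta / 2)
    c = cmath.exp(1j * gamma / 2)
    d = -cmath.exp(1j * delta / 2)
    A, B, C, D = a * a, b * b, c * c, d * d
    # Incenter = sum of the three opposite arc midpoints (Lemma 2).
    via_corollary = [
        -a * b - b * c - a * c,  # triangle ABC
        -b * c - c * d - b * d,  # triangle BCD
        -c * d + d * a + a * c,  # triangle CDA
        d * a - a * b + b * d,   # triangle DAB
    ]
    via_sides = [incenter(A, B, C), incenter(B, C, D),
                 incenter(C, D, A), incenter(D, A, B)]
    return via_corollary, via_sides

def is_rectangle(I):
    """Parallelogram (diagonals bisect each other) with equal diagonals."""
    parallelogram = abs((I[0] + I[2]) - (I[1] + I[3])) < 1e-9
    equal_diagonals = abs(abs(I[0] - I[2]) - abs(I[1] - I[3])) < 1e-9
    return parallelogram and equal_diagonals

corollary, sides = japanese_rectangle(0.2, 1.5, 3.1, 5.0)
print(max(abs(u - v) for u, v in zip(corollary, sides)))  # should be about 0
print(is_rectangle(sides))
```

The parallelogram half is visible directly from the formulas: both diagonal sums equal ${-ab-bc-cd+da}$.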

# An Apology for HMMT 2016

Median Putnam contestants, willing to devote one of the last Saturdays before final exams to a math test, are likely to receive an advanced degree in the sciences. It is counterproductive on many levels to leave them feeling like total idiots.

— Bruce Reznick, “Some Thoughts on Writing for the Putnam”

Last February I made a big public apology for having caused one of the biggest scoring errors in HMMT history, causing a lot of changes to the list of top individual students. Pleasantly, I got some nice emails from coaches who reminded me that most students and teams do not place highly in the tournament, and at the end of the day the most important thing is that the contestants enjoyed the tournament.

So now I decided I have to apologize for 2016, too.

The story this time is that I inadvertently sent over 100 students home having solved two or fewer problems total, out of 30 individual problems. That year, I was the problem czar for HMMT February 2016, and like many HMMT problem czars before me, had vastly underestimated the difficulty of my own problems.

I think stories like this are a lot worse than people realize; contests are supposed to be a learning experience for the students, and if a teenager shows up to Massachusetts and spends an entire Saturday feeling hopeless for the entire contest, then the flight back to California is going to feel very long. Now imagine having 100 students go through this every single February.

So today I’d like to say a bit about things I’ve picked up since then that have helped me avoid making similar mistakes. I actually think people generally realize that HMMT is too hard, but are wrong about how this should be fixed. In particular, I think the common approach (and the one I took) of “make problem 1 so easy that almost nobody gets a zero” is wrong, and I’ll explain here what I think should be done instead.

## 1. Gettable, not gimme

I think just “easy” is the wrong way to think about the beginning problems. At ARML, the problem authors use a finer distinction which I really like:

• A problem is gettable if nearly every contestant feels like they could have gotten the problem on a good day. (In particular, problems that require knowledge that not all contestants have are not gettable, even if they are easy with it.)
• A problem is a gimme if nearly every contestant actually solves the problem on the contest.

The consensus is always that the early problems should be gettable but not gimme’s. You could start every contest by asking the contestant to compute the expected value of 7, but the contestants are going to notice, and it isn’t going to help anyone.

(I guess I should make the point that in order for a problem to be a “gimme”, it would have to be so easy to be almost insulting, because high accuracy on a given problem is really only possible if the level of the problem is significantly below the level of the student. So a gimme would have to be a problem that is way easier than the level of the weakest contestant — you can see why these would be bad.)

In contrast, with a gettable problem, even though some of the contestants will miss it, they’ll often miss it for a reason like 2+3=6. This is a bit unfortunate, but it is still a lot better if the contestant goes home thinking “I made a small arithmetic error, so I have to be more careful” than “there’s no way I could have gotten this, it was hopeless”.

But that brings me to the next point:

## 2. At the IMO 33% of the problems are gettable

At the IMO, there are two easy problems (one each day), but there are only six problems. So a full one-third of the problems are gettable: we hope that most students attending the IMO can solve either IMO1 or IMO4, even though many will not solve both.

If you are writing HMMT or some similar contest, I think this means you should think about the opening in terms of the fraction 1/3, rather than problem 1. For example, at HMMT, I think the czars should strive instead to make the first three or four out of ten problems on each individual test gettable: they should be problems every contestant could solve, even though some of them will still miss it anyways. Under the pressure of contest, students are going to make all sorts of mistakes, and so it’s important that there are multiple gettable problems. This way, every student has two or three or four real chances to solve a problem: they’ll still miss a few, but at least they feel like they could do something.

(Every year at HMMT, when we look back at the tests in hindsight, the first reflex many czars have is to look at how many people got 0’s on each test, and hope that it’s not too many. The fact that this figure is even worth looking at is in my opinion a sign that we are doing things wrong: is 1/10 any better than 0/10, if the kid solved question 1 quickly and then spent the rest of the hour staring at the other nine?)

## 3. Watch the clock

The other thing I want to say is to spend some time thinking about the entire test as a whole, rather than about each problem individually.

To drive the point home: I’m willing to bet that an HMMT individual test with 4 easy, 6 medium, and 0 hard problems could actually work, even at the top end of the scores. Each medium problem in isolation won’t distinguish the strongest students. But put six of them all together, and you get two effects:

• Students will make mistakes on some of the problems, and by the central limit theorem you’ll get a curve anyways.
• Time pressure becomes significantly more important, and the strongest students will come out ahead by simply being faster.

Of course, I’ll never be able to persuade the problem czars (myself included) to not include at least one or two of those super-nice hard problems. But the point is that they’re not actually needed in situations like HMMT, when there are so many problems that it’s hard to not get a curve of scores.

One suggestion many people won’t take: if you really want to include some difficult problems that will take a while, decrease the length of the test. If you had 3 easy, 3 medium, and 1 hard problem, I bet that could work too. One hour is really not very much time.

Actually, this has been experimentally verified. On my HMMT 2016 Geometry test, nobody solved any of problems 8-10, so the test was essentially seven problems long. The gradient of scores at the top and center still ended up being okay. The only issue was that a third of the students solved zero problems, because the easy problems were either error-prone, or else were hit-or-miss (either solved quickly or not at all). Thus that’s another thing to watch out for.

# A story of block-ascending permutations

I recently had a combinatorics paper appear in the EJC. In this post I want to brag a bit by telling the “story” of this paper: what motivated it, how I found the conjecture that I originally did, and the process that eventually led me to the proof, and so on.

This work was part of the Duluth REU 2017, and I thank Joe Gallian for suggesting the problem.

## 1. Background

Let me begin by formulating the problem as it was given to me. First, here is the definition and notation for a “block-ascending” permutation.

Definition 1

For nonnegative integers ${a_1}$, …, ${a_n}$ an ${(a_1, \dots, a_n)}$-ascending permutation is a permutation on ${\{1, 2, \dots, a_1 + \dots + a_n\}}$ whose descent set is contained in ${\{a_1, a_1+a_2, \dots, a_1+\dots+a_{n-1}\}}$. In other words the permutation ascends in blocks of length ${a_1}$, ${a_2}$, …, ${a_n}$, and thus has the form

$\displaystyle \pi = \pi_{11} \dots \pi_{1a_1} | \pi_{21} \dots \pi_{2a_2} | \dots | \pi_{n1} \dots \pi_{na_n}$

for which ${\pi_{i1} < \pi_{i2} < \dots < \pi_{ia_i}}$ for all ${i}$.

It turns out that block-ascending permutations which also avoid an increasing subsequence of certain length have nice enumerative properties. To this end, we define the following notation.

Definition 2

Let ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ denote the set of ${(a_1, \dots, a_n)}$-ascending permutations which avoid the pattern ${12 \dots (k+2)}$.

(The reason for using ${k+2}$ will be explained later.) In particular, ${\mathcal L_{k+2}(a_1 ,\dots, a_n) = \varnothing}$ if ${\max \{a_1, \dots, a_n\} \ge k+2}$.

Example 3

Here is a picture of a permutation in ${\mathcal L_7(3,2,4)}$ (but not in ${\mathcal L_6(3,2,4)}$, since one can see an increasing length ${6}$ subsequence shaded). We would denote it ${134|69|2578}$.
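To make the two definitions concrete, here is a minimal Python sketch of a membership test. The names `lis_length` and `in_L` are made up for this illustration, and the parser assumes every entry is a single digit, so it only handles permutations of length at most 9.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[j] = smallest last entry of an increasing subsequence of length j + 1
    for v in seq:
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

def in_L(word, pattern_len):
    """Is a word like "134|69|2578" block-ascending and free of the pattern 12...pattern_len?"""
    blocks = [[int(ch) for ch in block] for block in word.split("|")]
    flat = [v for block in blocks for v in block]
    is_permutation = sorted(flat) == list(range(1, len(flat) + 1))
    ascends = all(b[i] < b[i + 1] for b in blocks for i in range(len(b) - 1))
    return is_permutation and ascends and lis_length(flat) < pattern_len

print(in_L("134|69|2578", 7), in_L("134|69|2578", 6))
```

Here `pattern_len` plays the role of ${k+2}$: avoiding ${12 \dots (k+2)}$ is the same as having no increasing subsequence of length ${k+2}$.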

Now on to the results. A 2011 paper by Joel Brewster Lewis (JBL) proved (among other things) the following result:

Theorem 4 (Lewis 2011)

The sets ${\mathcal L_{k+2}(k,k,\dots,k)}$ and ${\mathcal L_{k+2}(k+1,k+1,\dots,k+1)}$ are in bijection with Young tableaux of shape ${\left< (k+1)^n \right>}$.

Remark 5

When ${k=1}$, this implies ${\mathcal L_3(1,1,\dots,1)}$, which is the set of ${123}$-avoiding permutations of length ${n}$, is counted by the Catalan numbers; so is ${\mathcal L_3(2,\dots,2)}$, which is the set of ${123}$-avoiding zig-zag permutations.

Just before the Duluth REU in 2017, Mei and Wang proved that in fact, in Lewis’ result one may freely mix ${k}$ and ${k+1}$‘s. To simplify notation,

Definition 6

Let ${I \subseteq \left\{ 1,\dots,n \right\}}$. Then ${\mathcal L(n,k,I)}$ denotes ${\mathcal L_{k+2}(a_1,\dots,a_n)}$ where

$\displaystyle a_i = \begin{cases} k+1 & i \in I \\ k & i \notin I. \end{cases}$

Theorem 7 (Mei, Wang 2017)

The ${2^n}$ sets ${\mathcal L(n,k,I)}$ are also in bijection with Young tableaux of shape ${\left< (k+1)^n \right>}$.

The proof uses the RSK correspondence, but the authors posed at the end of the paper the following open problem:

Problem

Find a direct bijection between the ${2^n}$ sets ${\mathcal L(n,k,I)}$ above, not involving the RSK correspondence.

This was the first problem that I was asked to work on. (I remember I received the problem on Sunday morning; this actually matters a bit for the narrative later.)

At this point I should pause to mention that this ${\mathcal L_{k+2}(\dots)}$ notation is my own invention, and did not exist when I originally started working on the problem. Indeed, all the results are restricted to the case where ${a_i \in \{k,k+1\}}$ for each ${i}$, and so it was unnecessary to think about other possibilities for ${a_i}$: Mei and Wang’s paper uses the notation ${\mathcal L(n,k,I)}$. So while I’ll continue to use the ${\mathcal L_{k+2}(\dots)}$ notation in the blog post for readability, it will make some of the steps more obvious than they actually were.

## 2. Setting out

Mei and Wang’s paper originally suggested that rather than finding a bijection ${\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,J)}$ for any ${I}$ and ${J}$, it would suffice to biject

$\displaystyle \mathcal L(n,k,I) \rightarrow \mathcal L(n,k,\varnothing)$

and then compose two such bijections. I didn’t see why this should be much easier, but it didn’t seem to hurt either.

As an example, they show how to do this bijection with ${I = \{1\}}$ and ${I = \{n\}}$. Indeed, suppose ${I = \{1\}}$. Then ${\pi_{11} < \pi_{12} < \dots < \pi_{1(k+1)}}$ is an increasing sequence of length ${k+1}$ right at the start of ${\pi}$. So ${\pi_{1(k+1)}}$ had better be the largest element in the permutation: otherwise later in ${\pi}$ the biggest element would complete an ascending subsequence of length ${k+2}$ already! So removing ${\pi_{1(k+1)}}$ gives a bijection ${\mathcal L(n,k,\{1\}) \rightarrow \mathcal L(n,k,\varnothing)}$.

But if you look carefully, this proof does essentially nothing with the later blocks. The exact same proof gives:

Proposition 8

Suppose ${1 \notin I}$. Then there is a bijection

$\displaystyle \mathcal L(n,k,I \cup \{1\}) \rightarrow \mathcal L(n,k,I)$

by deleting the ${(k+1)}$st element of the permutation (which must be the largest one).
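For the skeptical, this proposition is small enough to brute-force. Below is a rough Python sketch for the arbitrarily chosen case ${n=3}$, ${k=2}$, ${I=\{2\}}$; all function names are hypothetical. It enumerates both sets, deletes the ${(k+1)}$st entry from every permutation on the larger side, and compares the result with the smaller side.

```python
from bisect import bisect_left
from itertools import combinations

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for v in seq:
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

def block_ascending(sizes):
    """Yield every permutation (as a tuple) that ascends in blocks of the given sizes."""
    def rec(remaining, idx):
        if idx == len(sizes):
            yield ()
            return
        for block in combinations(remaining, sizes[idx]):  # blocks come out sorted
            rest = [v for v in remaining if v not in block]
            for tail in rec(rest, idx + 1):
                yield block + tail
    yield from rec(list(range(1, sum(sizes) + 1)), 0)

def L(n, k, I):
    """The set L(n, k, I): block i has length k + 1 if i is in I, else k."""
    sizes = tuple(k + 1 if i in I else k for i in range(1, n + 1))
    return {p for p in block_ascending(sizes) if lis_length(p) < k + 2}

n, k, I = 3, 2, {2}
big = L(n, k, I | {1})
small = L(n, k, I)
image = {p[:k] + p[k + 1:] for p in big}  # delete the (k+1)st entry
print(len(big), len(small), image == small)
```

Since the deleted entry is always the maximum, the remaining entries already form a permutation of ${\{1, \dots, N-1\}}$ and no relabeling is needed.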

Once I found this proposition I rejected the initial suggestion of specializing ${\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,\varnothing)}$. The “easy case” I had found told me that I could take a set ${I}$ and delete the single element ${1}$ from it. So empirically, my intuition from this toy example told me that it would be easier to find bijections ${\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,I')}$ where ${I'}$ and ${I}$ were only “a little different”, and hope that the resulting bijection only changed things a little bit (in the same way that in the toy example, all the bijection did was delete one element). So I shifted to trying to find small changes of this form.

## 3. The fork in the road

### 3.1. Wishful thinking

I had a lucky break of wishful thinking here. In the notation ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ with ${a_i \in \{k,k+1\}}$, I had found that one could replace ${a_1}$ with either ${k}$ or ${k+1}$ freely. (But this proof relied heavily on the block really being on the far left.) So what other changes might I be able to make?

There were two immediate possibilities that came to my mind.

• Deletion: We already showed ${a_1}$ could be changed from ${k+1}$ to ${k}$. If we can do a similar deletion with ${a_i}$ for any ${i}$, not just ${i=1}$, then we would be done.
• Swapping: If we can show that two adjacent ${a_i}$‘s could be swapped, that would be sufficient as well. (It’s also possible to swap non-adjacent ${a_i}$‘s, but that would cause more disruption for no extra benefit.)

Now, I had two paths that both seemed plausible to chase after. How was I supposed to know which one to pick? (Of course, it’s possible neither works, but you have to start somewhere.)

Well, maybe the correct thing to do would have been to just try both. But it was Sunday afternoon by the time I got to this point. Granted, it was summer already, but I knew that come Monday I would have doctor appointments and other trivial errands to distract me, so I decided I should pick one of them and throw the rest of the day into it. But that meant I had to pick one.

(I confess that I actually already had a prior guess: the deletion approach seemed less likely to work than the swapping approach. In the deletion approach, if ${i}$ is somewhere in the middle of the permutation, it seemed like deleting an element could cause a lot of disruption. But the swapping approach preserved the total number of elements involved, and so seemed more likely that I could preserve structure. But really I was just grasping at straws.)

### 3.2. Enter C++

Yeah, I cheated. Sorry.

Those of you that know anything about my style of math know that I am an algebraist by nature — sort of. It’s more accurate to say that I depend on having concrete examples to function. True, I can’t do complexity theory for my life, but I also haven’t been able to get the hang of algebraic geometry, despite having tried to learn it three or four times by now. But enumerative combinatorics? OH LOOK EXAMPLES.

Here’s the plan: let ${k=3}$. Then using a C++ computer program:

• Enumerate all the permutations in ${S = \mathcal L_{k+2}(3,4,3,4)}$.
• Enumerate all the permutations in ${A = \mathcal L_{k+2}(3,3,3,4)}$.
• Enumerate all the permutations in ${B = \mathcal L_{k+2}(3,3,4,4)}$.

If the deletion approach is right, then I would hope ${S}$ and ${A}$ look pretty similar. On the flip side, if the swapping approach is right, then ${S}$ and ${B}$ should look close to each other instead.

It’s moments like this where my style of math really shines. I don’t have to make decisions like the above off gut-feeling: do the “data science” instead.

### 3.3. A twist of fate

Except this isn’t actually what I did, since there was one problem. Computing the longest increasing subsequence of a length ${N}$ permutation takes ${O(N \log N)}$ time, and there are ${N!}$ or so permutations. But when ${N = 3+4+3+4=14}$, we have ${N! \cdot N \log N \approx 3 \cdot 10^{12}}$, which is a pretty big number. Unfortunately, my computer is not really that fast, and I didn’t really have the patience to implement the “correct” algorithms to bring the runtime down.
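The back-of-the-envelope estimate is quick to reproduce. The Python sketch below assumes a natural logarithm (which is what makes the figure come out near ${3 \cdot 10^{12}}$) and, as a side remark of my own, also counts how many permutations are block-ascending in the first place, which is just a multinomial coefficient. The function names are made up for this illustration.

```python
import math

def brute_force_cost(N):
    """Rough operation count: N! permutations, each with an N log N subsequence check."""
    return math.factorial(N) * N * math.log(N)

def block_ascending_count(sizes):
    """Multinomial coefficient: number of permutations ascending in the given blocks."""
    count = math.factorial(sum(sizes))
    for a in sizes:
        count //= math.factorial(a)
    return count

print(f"{brute_force_cost(14):.2e}")
print(block_ascending_count((3, 4, 3, 4)))
```

The second number is far smaller than ${14!}$, so generating the blocks directly rather than filtering all ${N!}$ permutations is one plausible reading of what the “correct” algorithm would involve; that reading is a guess on my part.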

The solution? Use ${N = 1+4+3+2 = 10}$ instead.

In a deep irony that I didn’t realize at the time, it was this moment when I introduced the ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ notation, and for the first time allowed the ${a_i}$ to not be in ${\{k,k+1\}}$. My reasoning was that since I was only doing this for heuristic reasons, I could instead work with ${S = \mathcal L_{k+2}(1,4,3,2)}$ and probably not change much about the structure of the problem, while replacing ${N}$ with ${1 + 4 + 3 + 2 = 10}$, which would run over ${1000}$ times faster. This was okay since all I wanted to do was see how much changing the “middle” would disrupt the structure.

And so the new plan was:

• Enumerate all the permutations in ${S = \mathcal L_{k+2}(1,4,3,2)}$.
• Enumerate all the permutations in ${A = \mathcal L_{k+2}(1,3,3,2)}$.
• Enumerate all the permutations in ${B = \mathcal L_{k+2}(1,3,4,2)}$.

I admit I never actually ran the enumeration with ${A}$, because the route with ${S}$ and ${B}$ turned out to be even more promising than I expected. When I compared the empirical data for the sets ${S}$ and ${B}$, I found that the number of permutations with any particular triple ${(\pi_1, \pi_9, \pi_{10})}$ were equal. In other words, the outer blocks were preserved: the bijection

$\displaystyle \mathcal L_{k+2}(1,4,3,2) \rightarrow \mathcal L_{k+2}(1,3,4,2)$

does not tamper with the outside blocks of length ${1}$ and ${2}$.
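The comparison of ${S}$ and ${B}$ is small enough to redo in a few lines. The original experiment was in C++; what follows is a Python sketch of the same idea under my own naming (`block_ascending`, `outer_counts` are hypothetical), with ${k=3}$ so that the forbidden pattern has length ${5}$. It tallies how many permutations in each set have a given triple ${(\pi_1, \pi_9, \pi_{10})}$ and compares the two tallies.

```python
from bisect import bisect_left
from collections import Counter
from itertools import combinations

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for v in seq:
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

def block_ascending(sizes):
    """Yield every permutation (as a tuple) that ascends in blocks of the given sizes."""
    def rec(remaining, idx):
        if idx == len(sizes):
            yield ()
            return
        for block in combinations(remaining, sizes[idx]):
            rest = [v for v in remaining if v not in block]
            for tail in rec(rest, idx + 1):
                yield block + tail
    yield from rec(list(range(1, sum(sizes) + 1)), 0)

def outer_counts(sizes, pattern_len):
    """Tally pattern-avoiding permutations by the entries in positions 1, N-1, N."""
    tally = Counter()
    for p in block_ascending(sizes):
        if lis_length(p) < pattern_len:
            tally[(p[0], p[-2], p[-1])] += 1
    return tally

S = outer_counts((1, 4, 3, 2), pattern_len=5)
B = outer_counts((1, 3, 4, 2), pattern_len=5)
print(sum(S.values()), sum(B.values()), S == B)
```

Equal tallies are of course only evidence for a bijection that fixes the outer blocks, not a construction of one.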

This meant I was ready to make the following conjecture. Suppose ${a_i = k}$, ${a_{i+1} = k+1}$. There is a bijection

$\displaystyle \mathcal L_{k+2}(a_1, \dots, a_i, a_{i+1}, \dots, a_n) \rightarrow \mathcal L_{k+2}(a_1, \dots, a_{i+1}, a_{i}, \dots, a_n)$

which only involves rearranging the elements of the ${i}$th and ${(i+1)}$st blocks.

## 4. Rooting out the bijection

At this point I was in a quite good position. I had pinned down the problem to finding a particular bijection that I was confident had to exist, since it was showing up in the empirical data.

Let’s call this mythical bijection ${\mathbf W}$. How could I figure out what it was?

### 4.1. Hunch: ${\mathbf W}$ preserves order-isomorphism

Let me quickly introduce a definition.

Definition 9

We say two words ${a_1 \dots a_m}$ and ${b_1 \dots b_m}$ are order-isomorphic if ${a_i < a_j}$ holds if and only if ${b_i < b_j}$. Then order-isomorphism gives equivalence classes, and there is a canonical representative where the letters are ${\{1,2,\dots,m\}}$; this is called a reduced word.

Example 10

The words ${13957}$, ${12846}$ and ${12534}$ are order-isomorphic; the last is reduced.
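Reducing a word is a one-liner in practice. Here is a small Python sketch; the name `reduce_word` is my own, it assumes distinct single-digit letters, and it passes any `|` separators through untouched.

```python
def reduce_word(word):
    """Replace the letters of a word by their ranks, keeping any '|' separators."""
    letters = sorted(ch for ch in word if ch != "|")
    rank = {ch: str(i + 1) for i, ch in enumerate(letters)}
    return "".join(rank.get(ch, ch) for ch in word)

print(reduce_word("13957"), reduce_word("12846"), reduce_word("12534"))
```

Two words are order-isomorphic exactly when their reductions agree, which makes the reduced word a convenient dictionary key.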

Now I guessed one more property of ${\mathbf W}$: this ${\mathbf W}$ should preserve order-isomorphism.

What do I mean by this? Suppose in one context ${135 | 79}$ changed to ${35 | 179}$; then we would expect that in another situation we should have ${124 | 68}$ changing to ${24 | 168}$. Indeed, we expect ${\mathbf W}$ (empirically) to not touch surrounding outside blocks, and so it would be very strange if ${\mathbf W}$ behaved differently due to far-away numbers it wasn’t even touching.

So actually I’ll just write

$\displaystyle \mathbf W(123|45) = 23|145$

for this example, reducing the words in question.

### 4.2. Keep cheating

With this hunch it’s possible to cheat with C++ again. Here’s how.

Let’s for concreteness suppose ${k=2}$ and the particular sets

$\displaystyle \mathcal L_{k+2}(1,3,2,1) \rightarrow \mathcal L_{k+2}(1,2,3,1).$

Well, it turns out if you look at the data:

• The only element of ${\mathcal L_{k+2}(1,3,2,1)}$ which starts with ${2}$ and ends with ${5}$ is ${2|147|36|5}$.
• The only element of ${\mathcal L_{k+2}(1,2,3,1)}$ which starts with ${2}$ and ends with ${5}$ is ${2|47|136|5}$.

So that means that ${147 | 36}$ is changed to ${47 | 136}$. Thus the empirical data shows that

$\displaystyle \mathbf W(135|24) = 35|124.$

In general, it might not be that clear cut. For example, if we look at the permutations starting with ${2}$ and ending with ${4}$, there is more than one.

• ${2 | 1 5 7 | 3 6 | 4}$ and ${2 | 1 6 7 | 3 5 | 4}$ are both in ${\mathcal L_{k+2}(1,3,2,1)}$.
• ${2 | 5 7 | 1 3 6 | 4}$ and ${2 | 6 7 | 1 3 5 | 4}$ are both in ${\mathcal L_{k+2}(1,2,3,1)}$.

Thus

$\displaystyle \mathbf W( \{135|24, 145|23\} ) = \{35|124, 45|123\}$

but we can’t tell which one goes to which (although you might be able to guess).

Fortunately, there is lots of data. This example narrowed ${135|24}$ down to two values, but if you look at other places you might have different data on ${135|24}$. Since we think ${\mathbf W}$ is behaving the same “globally”, we can piece together different pieces of data to get narrower sets. Even better, ${\mathbf W}$ is a bijection, so once we match either of ${135|24}$ or ${145|23}$, we’ve matched the other.

You know what this sounds like? Perfect matchings.

So here’s the experimental procedure.

• Enumerate all permutations in ${\mathcal L_{k+2}(2,3,4,2)}$ and ${\mathcal L_{k+2}(2,4,3,2)}$.
• Take each possible tuple ${(\pi_1, \pi_2, \pi_{10}, \pi_{11})}$, and look at the permutations that start and end with those particular four elements. Record the reductions of ${\pi_3\pi_4\pi_5|\pi_6\pi_7\pi_8\pi_9}$ and ${\pi_3\pi_4\pi_5\pi_6|\pi_7\pi_8\pi_9}$ for all these permutations. We call these input words and output words, respectively. Each output word is a “candidate” of ${\mathbf W}$ for an input word.
• For each input word ${a_1a_2a_3|b_1b_2b_3b_4}$ that appeared, take the intersection of the sets of candidate output words over all tuples in which it appeared. This gives a bipartite graph ${G}$, with input words being matched to their candidates.
• Find perfect matchings of the graph.

And with any luck that would tell us what ${\mathbf W}$ is.
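The procedure above can be sketched in Python roughly as follows. This is my own reconstruction rather than the original C++ program: every function name is hypothetical, ${k=3}$ is assumed (so the forbidden pattern has length ${5}$), and the matching step is a plain backtracking search that only makes sense because the graph is small.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import combinations

def lis_length(seq):
    tails = []
    for v in seq:
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

def block_ascending(sizes):
    def rec(remaining, idx):
        if idx == len(sizes):
            yield ()
            return
        for block in combinations(remaining, sizes[idx]):
            rest = [v for v in remaining if v not in block]
            for tail in rec(rest, idx + 1):
                yield block + tail
    yield from rec(list(range(1, sum(sizes) + 1)), 0)

def reduced(values, cut):
    """Reduce the middle entries to ranks and mark the block boundary with '|'."""
    rank = {v: i + 1 for i, v in enumerate(sorted(values))}
    digits = "".join(str(rank[v]) for v in values)
    return digits[:cut] + "|" + digits[cut:]

def middle_words(sizes, pattern_len):
    """Map each outer tuple to the set of reduced middle words seen with it."""
    lo, hi = sizes[0], sum(sizes) - sizes[-1]
    seen = defaultdict(set)
    for p in block_ascending(sizes):
        if lis_length(p) < pattern_len:
            seen[p[:lo] + p[hi:]].add(reduced(p[lo:hi], sizes[1]))
    return seen

def candidate_graph(src_sizes, dst_sizes, pattern_len):
    """For each input word, intersect the output-word sets over all outer tuples."""
    src = middle_words(src_sizes, pattern_len)
    dst = middle_words(dst_sizes, pattern_len)
    graph = {}
    for outer, words in src.items():
        outputs = dst.get(outer, set())
        for w in words:
            graph[w] = (graph[w] & outputs) if w in graph else set(outputs)
    return graph

def perfect_matchings(graph):
    """Yield perfect matchings by backtracking, most constrained inputs first."""
    inputs = sorted(graph, key=lambda w: len(graph[w]))
    def rec(i, used):
        if i == len(inputs):
            yield {}
            return
        for out in sorted(graph[inputs[i]] - used):
            for rest in rec(i + 1, used | {out}):
                yield {inputs[i]: out, **rest}
    yield from rec(0, frozenset())

graph = candidate_graph((2, 3, 4, 2), (2, 4, 3, 2), pattern_len=5)
first = next(perfect_matchings(graph), None)
for word in sorted(graph):
    print(word, "candidates:", sorted(graph[word]))
```

Exhausting the `perfect_matchings` generator instead of stopping at the first result is how one would check whether the matching is unique.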

### 4.3. Results

Luckily, the bipartite graph is quite sparse, and there was only one perfect matching.

246|1357 => 2467|135
247|1356 => 2457|136
256|1347 => 2567|134
257|1346 => 2357|146
267|1345 => 2367|145
346|1257 => 3467|125
347|1256 => 3457|126
356|1247 => 3567|124
357|1246 => 1357|246
367|1245 => 1367|245
456|1237 => 4567|123
457|1236 => 1457|236
467|1235 => 1467|235
567|1234 => 1567|234


If you look at the data, well, there are some clear patterns. Exactly one number is “moving” over from the right half, each time. Also, if ${7}$ is on the right half, then it always moves over.

Anyways, if you stare at this for an hour, you can actually figure out the exact rule:

Claim 11

Given an input ${a_1a_2a_3|b_1b_2b_3b_4}$, move ${b_{i+1}}$ if ${i}$ is the largest index for which ${a_i < b_{i+1}}$, or ${b_1 = 1}$ if no such index exists.
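The rule is short enough to state as code, which also makes it easy to compare against the table above. Here is a Python sketch; the name `apply_w` is made up, words are assumed to use single-digit letters, and the right block is assumed to be exactly one longer than the left.

```python
def apply_w(word):
    """Apply the rule of Claim 11 to a word such as "257|1346"."""
    left, right = word.split("|")
    a = [int(ch) for ch in left]
    b = [int(ch) for ch in right]
    moved = b[0]  # default: move b_1 if no index qualifies
    for i in range(len(a), 0, -1):  # i is 1-indexed, scanned from largest to smallest
        if a[i - 1] < b[i]:         # compares a_i with b_{i+1}
            moved = b[i]
            break
    new_left = sorted(a + [moved])
    new_right = [v for v in b if v != moved]
    return "".join(map(str, new_left)) + "|" + "".join(map(str, new_right))

TABLE = [
    ("246|1357", "2467|135"), ("247|1356", "2457|136"), ("256|1347", "2567|134"),
    ("257|1346", "2357|146"), ("267|1345", "2367|145"), ("346|1257", "3467|125"),
    ("347|1256", "3457|126"), ("356|1247", "3567|124"), ("357|1246", "1357|246"),
    ("367|1245", "1367|245"), ("456|1237", "4567|123"), ("457|1236", "1457|236"),
    ("467|1235", "1467|235"), ("567|1234", "1567|234"),
]
mismatches = [(w, out) for w, out in TABLE if apply_w(w) != out]
print(mismatches)
```

An empty list of mismatches means the rule reproduces every row of the empirical data.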

And indeed, once I have this bijection, it takes maybe only another hour of thinking to verify that this bijection works as advertised, thus solving the original problem.

Rather than writing up what I had found, I celebrated that Sunday evening by playing Wesnoth for 2.5 hours.

## 5. Generalization

### 5.1. Surprise

On Monday morning I was mindlessly feeding inputs to the program I had worked on earlier and finally noticed that in fact ${\mathcal L_6(1,3,5,2)}$ and ${\mathcal L_6(1,5,3,2)}$ also had the same cardinality. Huh.

It seemed too good to be true, but I played around some more, and sure enough, the cardinality ${\#\mathcal L_{k+2}(a_1, \dots, a_n)}$ seemed not to depend on the order of the ${a_i}$‘s. And so at last I stumbled upon the final form of the conjecture, realizing that all along the assumption ${a_i \in \{k,k+1\}}$ that I had been working with was a red herring, and that the bijection was really true in much vaster generality. There is a bijection

$\displaystyle \mathcal L_{k+2}(a_1, \dots, a_i, a_{i+1}, \dots, a_n) \rightarrow \mathcal L_{k+2}(a_1, \dots, a_{i+1}, a_{i}, \dots, a_n)$

which only involves rearranging the elements of the ${i}$th and ${(i+1)}$st blocks.

It also meant I had more work to do, and so I was now glad that I hadn’t written up my work from yesterday night.

### 5.2. More data science

I re-ran the experiment I had done before, now with ${\mathcal L_7(2,3,5,2) \rightarrow \mathcal L_7(2,5,3,2)}$. (This was interesting, because the ${8}$ elements in question could now have either longest increasing subsequence of length ${5}$, or instead of length ${6}$.)

The data I obtained was:

246|13578 => 24678|135
247|13568 => 24578|136
248|13567 => 24568|137
256|13478 => 25678|134
257|13468 => 23578|146
258|13467 => 23568|147
267|13458 => 23678|145
268|13457 => 23468|157
278|13456 => 23478|156
346|12578 => 34678|125
347|12568 => 34578|126
348|12567 => 34568|127
356|12478 => 35678|124
357|12468 => 13578|246
358|12467 => 13568|247
367|12458 => 13678|245
368|12457 => 13468|257
378|12456 => 13478|256
456|12378 => 45678|123
457|12368 => 14578|236
458|12367 => 14568|237
467|12358 => 14678|235
468|12357 => 12468|357
478|12356 => 12478|356
567|12348 => 15678|234
568|12347 => 12568|347
578|12346 => 12578|346
678|12345 => 12678|345


Okay, so it looks like:

• exactly two numbers are moving each time, and
• the length of the longest increasing subsequence is preserved.

Eventually, I was able to work out the details, but they’re more involved than I want to reproduce here. But the idea is that you can move elements “one at a time”: something like

$\displaystyle \mathcal L_{k+2}(7,4) \rightarrow \mathcal L_{k+2}(6,5) \rightarrow \mathcal L_{k+2}(5,6) \rightarrow \mathcal L_{k+2}(4,7)$

while preserving the length of increasing subsequences at each step.

So, together with the easy observation from the beginning, this not only resolves the original problem, but also gives an elegant generalization. I had now proved:

Theorem 12

For any ${a_1}$, …, ${a_n}$, the cardinality

$\displaystyle \# \mathcal L_{k+2}(a_1, \dots, a_n)$

does not depend on the order of the ${a_i}$‘s.
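A statement like this is pleasant to spot-check by brute force. The Python sketch below counts ${\# \mathcal L_5}$ for every ordering of the block lengths ${(1,2,3,2)}$; the lengths and the pattern length are arbitrary small choices of mine, and the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import combinations, permutations

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for v in seq:
        j = bisect_left(tails, v)
        if j == len(tails):
            tails.append(v)
        else:
            tails[j] = v
    return len(tails)

def block_ascending(sizes):
    """Yield every permutation (as a tuple) that ascends in blocks of the given sizes."""
    def rec(remaining, idx):
        if idx == len(sizes):
            yield ()
            return
        for block in combinations(remaining, sizes[idx]):
            rest = [v for v in remaining if v not in block]
            for tail in rec(rest, idx + 1):
                yield block + tail
    yield from rec(list(range(1, sum(sizes) + 1)), 0)

def count_L(sizes, pattern_len):
    """Number of block-ascending permutations with no increasing run of length pattern_len."""
    return sum(1 for p in block_ascending(sizes) if lis_length(p) < pattern_len)

counts = {order: count_L(order, 5) for order in set(permutations((1, 2, 3, 2)))}
for order, value in sorted(counts.items()):
    print(order, value)
```

If the theorem holds, every line printed shows the same count.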

## 6. Discovered vs invented

Whenever I look back on this, I can’t help thinking just how incredibly lucky I got on this project.

There’s this perpetual debate about whether mathematics is discovered or invented. I think it’s results like this which make the case for “discovered”. I did not really construct the bijection ${\mathbf W}$ myself: it was “already there” and I found it by examining the data. In another world where ${\mathbf W}$ did not exist, all the creativity in the world wouldn’t have changed anything.

So anyways, that’s the behind-the-scenes tour of my favorite combinatorics paper.

# Joyal’s Proof of Cayley’s Tree Formula

I wanted to quickly write this proof up, complete with pictures, so that I won’t forget it again. In this post I’ll give a combinatorial proof (due to Joyal) of the following:

Theorem 1 (Cayley’s Formula)

The number of trees on ${n}$ labelled vertices is ${n^{n-2}}$.

Proof: We are going to construct a bijection between

• Functions ${\{1, 2, \dots, n\} \rightarrow \{1, 2, \dots, n\}}$ (of which there are ${n^n}$) and
• Trees on ${\{1, 2, \dots, n\}}$ with two distinguished nodes ${A}$ and ${B}$ (possibly ${A=B}$).

Let’s look at the first piece of data. We can visualize it as ${n}$ points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a directed pseudoforest. Here is an example when ${n = 9}$.

You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.

Thus a directed pseudoforest can also be specified by

• a choice of some vertices to be in cycles (blue vertices),
• a permutation on the blue vertices (magenta arrows), and
• attachments of trees to the blue vertices (grey vertices and arrows).

Now suppose we take the same information, but replace the permutation on the blue vertices with a total ordering instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case ${1 < 9 < 2 < 4 < 8 < 5}$):

This is exactly the data of a tree on the ${n}$ vertices with two distinguished vertices, the first and last in the chain of green (which could possibly coincide). $\Box$
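To make the correspondence concrete, here is a small Python sketch of one standard way to realize it. The names `joyal` and `count_images` are mine and purely illustrative. I assume the blue vertices are listed in increasing order and the permutation is read off as a word, which then becomes the chain from ${A}$ to ${B}$.

```python
from itertools import product

def joyal(f):
    """Map a function f (dict v -> f(v) on {1..n}) to (tree edges, A, B)."""
    n = len(f)
    # Cyclic ("blue") vertices are exactly the image of the n-fold iterate of f.
    cyc = set(f)
    for _ in range(n):
        cyc = {f[v] for v in cyc}
    blue = sorted(cyc)
    # Read the permutation on the blue vertices as a word; this is the chain.
    spine = [f[b] for b in blue]
    edges = {frozenset(e) for e in zip(spine, spine[1:])}
    # Every non-cyclic vertex v keeps its arrow v -> f(v) as a tree edge.
    edges |= {frozenset((v, f[v])) for v in f if v not in cyc}
    return frozenset(edges), spine[0], spine[-1]

def count_images(n):
    """Return (#distinct triples, #distinct trees) over all n^n functions."""
    verts = range(1, n + 1)
    triples = {joyal(dict(zip(verts, vals)))
               for vals in product(verts, repeat=n)}
    return len(triples), len({tree for tree, _, _ in triples})
```

For small ${n}$ this can be used as a sanity check: the ${n^n}$ functions should give ${n^n}$ distinct triples but only ${n^{n-2}}$ distinct trees.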

I’m reading through Primes of the Form ${x^2+ny^2}$, by David Cox (link; it’s good!). Here are the high-level notes I took on the first chapter, which is about the theory of quadratic forms.

(Meta point re blog: I’m probably going to start posting more and more of these more high-level notes/sketches on this blog on topics that I’ve been just learning. Up til now I’ve been mostly only posting things that I understand well and for which I have a very polished exposition. But the perfect is the enemy of the good here; given that I’m taking these notes for my own sake, I may as well share them to help others.)

## 1. Overview

Definition 1

For us a quadratic form is a polynomial ${Q = Q(x,y) = ax^2 + bxy + cy^2}$, where ${a}$, ${b}$, ${c}$ are some integers. We say that it is primitive if ${\gcd(a,b,c) = 1}$.

For example, we have the famous quadratic form

$\displaystyle Q_{\text{Fermat}}(x,y) = x^2+y^2.$

As readers are probably aware, we can say a lot about exactly which integers can be represented by ${Q_{\text{Fermat}}}$: by Fermat’s Christmas theorem, the primes ${p \equiv 1 \pmod 4}$ (and ${p=2}$) can all be written as the sum of two squares, while the primes ${p \equiv 3 \pmod 4}$ cannot. For convenience, let us say that:

Definition 2

Let ${Q}$ be a quadratic form. We say it represents the integer ${m}$ if there exist ${x,y \in \mathbb Z}$ with ${m = Q(x,y)}$. Moreover, ${Q}$ properly represents ${m}$ if one can find such ${x}$ and ${y}$ which are also relatively prime.

The basic question is: what can we say about which primes/integers are properly represented by a quadratic form? In fact, we will later restrict our attention to “positive definite” forms (described later).

For example, Fermat’s Christmas theorem now rewrites as:

Theorem 3 (Fermat’s Christmas theorem for primes)

An odd prime ${p}$ is (properly) represented by ${Q_{\text{Fermat}}}$ if and only if ${p \equiv 1 \pmod 4}$.

The proof of this is classical, see for example my olympiad handout. We also have the formulation for odd integers:

Theorem 4 (Fermat’s Christmas theorem for odd integers)

An odd integer ${m}$ is properly represented by ${Q_{\text{Fermat}}}$ if and only if all prime factors of ${m}$ are ${1 \pmod 4}$.

Proof: For the “if” direction, we use the fact that ${Q_{\text{Fermat}}}$ is multiplicative in the sense that

$\displaystyle (x^2+y^2)(u^2+v^2) = (xu \pm yv)^2 + (xv \mp yu)^2.$

For the “only if” part we use the fact that if a multiple of a prime ${p}$ is properly represented by ${Q_{\text{Fermat}}}$, then so is ${p}$. This follows by noticing that if ${x^2+y^2 \equiv 0 \pmod p}$ (and ${xy \not\equiv 0 \pmod p}$) then ${(x/y)^2 \equiv -1 \pmod p}$. $\Box$

Tangential remark: the two ideas in the proof will grow up in the following way.

• The fact that ${Q_{\text{Fermat}}}$ “multiplies nicely” will grow up to become the so-called composition of quadratic forms.
• The second fact will not generalize for an arbitrary form ${Q}$. Instead, we will see that if a multiple of ${p}$ is represented by a form ${Q}$ then some form of the same “discriminant” will represent the prime ${p}$, but this form need not be the same as ${Q}$ itself.

## 2. Equivalence of forms, and the discriminant

The first thing we should do is figure out when two forms are essentially the same: for example, ${x^2+5y^2}$ and ${5x^2+y^2}$ should clearly be considered the same. More generally, if we think of ${Q}$ as acting on ${\mathbb Z^{\oplus 2}}$ and ${T}$ is any automorphism of ${\mathbb Z^{\oplus 2}}$, then ${Q \circ T}$ should be considered the same as ${Q}$. Specifically,

Definition 5

Two forms ${Q_1}$ and ${Q_2}$ are said to be equivalent if there exists

$\displaystyle T = \begin{pmatrix} p & q \\ r & s \end{pmatrix} \in \text{GL }(2,\mathbb Z)$

such that ${Q_2(x,y) = Q_1(px+ry, qx+sy)}$. We have ${\det T = ps-qr = \pm 1}$ and so we say the equivalence is

• a proper equivalence if ${\det T = +1}$, and
• an improper equivalence if ${\det T = -1}$.

So we generally will only care about forms up to proper equivalence. (It will be useful to distinguish between proper/improper equivalence later.)

Naturally we seek some invariants under this operation. By far the most important is:

Definition 6

The discriminant of a quadratic form ${Q = ax^2 + bxy + cy^2}$ is defined as

$\displaystyle D = b^2-4ac.$

The discriminant is invariant under equivalence (check this). Note also that ${D \equiv 0 , 1 \pmod 4}$.

Observe that we have

$\displaystyle 4a \cdot (ax^2+bxy+cy^2) = (2ax + by)^2 - Dy^2.$

So if ${D < 0}$ and ${a > 0}$ (thus ${c > 0}$ too) then ${ax^2+bxy+cy^2 > 0}$ for all ${(x,y) \neq (0,0)}$. Such quadratic forms are called positive definite, and we will restrict our attention to these forms.

Now that we have this invariant, we may as well classify equivalence classes of quadratic forms for a fixed discriminant. It turns out this can be done explicitly.

Definition 7

A quadratic form ${Q = ax^2 + bxy + cy^2}$ is reduced if

• it is primitive and positive definite,
• ${|b| \le a \le c}$, and
• ${b \ge 0}$ if either ${|b| = a}$ or ${a = c}$.

Exercise 8

Check that there are only finitely many reduced forms of a fixed discriminant.

Then the big huge theorem is:

Theorem 9 (Reduced forms give a set of representatives)

Every primitive positive definite form ${Q}$ of discriminant ${D}$ is properly equivalent to a unique reduced form. We call this the reduction of ${Q}$.

Proof: Omitted due to length, but completely elementary. It is a reduction argument with some number of cases. $\Box$
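For the curious, here is a minimal Python sketch of one common way to carry out such a reduction. The function name `reduce_form` is my own, and this is the usual textbook-style procedure rather than necessarily the exact case analysis of the omitted proof. It only ever applies the substitutions ${x \mapsto x + ky}$ and ${(x,y) \mapsto (-y, x)}$, both of which have determinant ${+1}$.

```python
def reduce_form(a, b, c):
    """Reduce a positive definite form ax^2 + bxy + cy^2 by proper equivalences."""
    D = b * b - 4 * a * c
    assert D < 0 and a > 0, "expects a positive definite form"
    while True:
        if not (-a < b <= a):
            # x -> x + k*y changes b by a multiple of 2a and leaves a fixed;
            # pick the representative of b in the interval (-a, a].
            b = (b + a - 1) % (2 * a) - (a - 1)
            c = (b * b - D) // (4 * a)
        if a > c:
            # (x, y) -> (-y, x) swaps a and c and negates b.
            a, b, c = c, -b, a
        elif a == c and b < 0:
            # Same swap once more, to satisfy the tie-breaking rule b >= 0.
            return (a, -b, c)
        else:
            return (a, b, c)
```

For instance, `reduce_form(5, 0, 1)` returns `(1, 0, 5)`, matching the earlier remark that ${5x^2+y^2}$ and ${x^2+5y^2}$ should be considered the same. The loop terminates because ${a}$ strictly decreases at every swap.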

Thus, for any discriminant ${D}$ we can consider the set

$\displaystyle \text{Cl}(D) = \left\{ \text{reduced forms of discriminant } D \right\}$

which gives a set of representatives for the proper equivalence classes of primitive positive definite forms of discriminant ${D}$. By abuse of notation we will also consider it as the set of these equivalence classes themselves.

We also define ${h(D) = \left\lvert \text{Cl}(D) \right\rvert}$; by the exercise, ${h(D) < \infty}$. This is called the class number.

Moreover, we have ${h(D) \ge 1}$, because we can take ${x^2 - \frac{D}{4} y^2}$ for ${D \equiv 0 \pmod 4}$ and ${x^2 + xy + \frac{1-D}{4} y^2}$ for ${D \equiv 1 \pmod 4}$. We call this form the principal form.
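Since reduced forms satisfy ${|b| \le a \le c}$, we get ${-D = 4ac - b^2 \ge 3a^2}$, so there are only finitely many candidates to try. The following Python sketch (the name `reduced_forms` is hypothetical, chosen for illustration) simply enumerates them, which gives a brute-force way to compute ${h(D)}$ for small discriminants.

```python
from math import gcd, isqrt

def reduced_forms(D):
    """List all reduced forms (a, b, c) of discriminant D < 0."""
    assert D < 0 and D % 4 in (0, 1)
    forms = []
    # |b| <= a <= c forces -D = 4ac - b^2 >= 3a^2, which bounds a.
    for a in range(1, isqrt(-D // 3) + 1):
        for b in range(-a, a + 1):
            if (b * b - D) % (4 * a) != 0:
                continue
            c = (b * b - D) // (4 * a)
            if c < a or gcd(gcd(a, b), c) != 1:
                continue  # need a <= c and a primitive form
            if b < 0 and (-b == a or a == c):
                continue  # tie-breaking rule: b >= 0 when |b| = a or a = c
            forms.append((a, b, c))
    return forms
```

For example, `reduced_forms(-20)` gives `[(1, 0, 5), (2, 2, 3)]`, so ${h(-20) = 2}$.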

## 3. Tables of quadratic forms

Example 10 (Examples of quadratic forms with ${h(D) = 1}$, ${D \equiv 0 \pmod 4}$)

The following discriminants have class number ${h(D) = 1}$, hence having only the principal form:

• ${D = -4}$, with form ${x^2 + y^2}$.
• ${D = -8}$, with form ${x^2 + 2y^2}$.
• ${D = -12}$, with form ${x^2+3y^2}$.
• ${D = -16}$, with form ${x^2 + 4y^2}$.
• ${D = -28}$, with form ${x^2 + 7y^2}$.

This is in fact the complete list when ${D \equiv 0 \pmod 4}$.

Example 11 (Examples of quadratic forms with ${h(D) = 1}$, ${D \equiv 1 \pmod 4}$)

The following discriminants have class number ${h(D) = 1}$, hence having only the principal form:

• ${D = -3}$, with form ${x^2 + xy + y^2}$.
• ${D = -7}$, with form ${x^2 + xy + 2y^2}$.
• ${D = -11}$, with form ${x^2 + xy + 3y^2}$.
• ${D = -19}$, with form ${x^2 + xy + 5y^2}$.
• ${D = -27}$, with form ${x^2 + xy + 7y^2}$.
• ${D = -43}$, with form ${x^2 + xy + 11y^2}$.
• ${D = -67}$, with form ${x^2 + xy + 17y^2}$.
• ${D = -163}$, with form ${x^2 + xy + 41y^2}$.

This is in fact the complete list when ${D \equiv 1 \pmod 4}$.

Example 12 (More examples of quadratic forms)

Here are tables for small discriminants with ${h(D) > 1}$. When ${D \equiv 0 \pmod 4}$ we have

• ${D = -20}$, with ${h(D) = 2}$ forms ${2x^2 + 2xy + 3y^2}$ and ${x^2 + 5y^2}$.
• ${D = -24}$, with ${h(D) = 2}$ forms ${2x^2 + 3y^2}$ and ${x^2 + 6y^2}$.
• ${D = -32}$, with ${h(D) = 2}$ forms ${3x^2 + 2xy + 3y^2}$ and ${x^2 + 8y^2}$.
• ${D = -36}$, with ${h(D) = 2}$ forms ${2x^2 + 2xy + 5y^2}$ and ${x^2 + 9y^2}$.
• ${D = -40}$, with ${h(D) = 2}$ forms ${2x^2 + 5y^2}$ and ${x^2 + 10y^2}$.
• ${D = -44}$, with ${h(D) = 3}$ forms ${3x^2 \pm 2xy + 4y^2}$ and ${x^2 + 11y^2}$.

As for ${D \equiv 1 \pmod 4}$ we have

• ${D = -15}$, with ${h(D) = 2}$ forms ${2x^2 + xy + 2y^2}$ and ${x^2 + xy + 4y^2}$.
• ${D = -23}$, with ${h(D) = 3}$ forms ${2x^2 \pm xy + 3y^2}$ and ${x^2+ xy + 6y^2}$.
• ${D = -31}$, with ${h(D) = 3}$ forms ${2x^2 \pm xy + 4y^2}$ and ${x^2 + xy + 8y^2}$.
• ${D = -39}$, with ${h(D) = 4}$ forms ${3x^2 + 3xy + 4y^2}$, ${2x^2 \pm xy + 5y^2}$ and ${x^2 + xy + 10y^2}$.

Example 13 (Even More Examples of quadratic forms)

Here are some more selected examples:

• ${D = -56}$ has ${h(D) = 4}$ forms ${x^2+14y^2}$, ${2x^2+7y^2}$ and ${3x^2 \pm 2xy + 5y^2}$.
• ${D = -108}$ has ${h(D) = 3}$ forms ${x^2+27y^2}$ and ${4x^2 \pm 2xy + 7y^2}$.
• ${D = -256}$ has ${h(D) = 4}$ forms ${x^2+64y^2}$, ${4x^2+4xy+17y^2}$ and ${5x^2\pm2xy+13y^2}$.

## 4. The Character ${\chi_D}$

We can now connect this to primes ${p}$ as follows. Earlier we played with ${Q_{\text{Fermat}} = x^2+y^2}$, and observed that for odd primes ${p}$, ${p \equiv 1 \pmod 4}$ if and only if some multiple of ${p}$ is properly represented by ${Q_{\text{Fermat}}}$.

Our generalization is as follows:

Theorem 14 (Primes represented by some quadratic form)

Let ${D < 0}$ be a discriminant, and let ${p \nmid D}$ be an odd prime. Then the following are equivalent:

• ${\left( \frac Dp \right) = 1}$, i.e. ${D}$ is a quadratic residue modulo ${p}$.
• The prime ${p}$ is (properly) represented by some reduced quadratic form in ${\text{Cl}(D)}$.

This generalizes our result for ${Q_{\text{Fermat}}}$, but note that it uses ${h(-4) = 1}$ in an essential way! That is: if ${(-1/p) = 1}$, we know ${p}$ is represented by some quadratic form of discriminant ${D = -4}$… but only since ${h(-4) = 1}$ do we know that this form reduces to ${Q_{\text{Fermat}} = x^2+y^2}$.

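Proof: First suppose some form ${Q = ax^2+bxy+cy^2}$ in ${\text{Cl}(D)}$ properly represents ${p}$, so that ${Q(x,y) \equiv 0 \pmod p}$ with ${\gcd(x,y) = 1}$; assume WLOG that ${p \nmid 4a}$. Thus ${p \nmid y}$, since otherwise this would imply ${x \equiv y \equiv 0 \pmod p}$. Then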

$\displaystyle 0 \equiv 4a \cdot Q(x,y) \equiv (2ax + by)^2 - Dy^2 \pmod p$

hence ${D \equiv \left( 2axy^{-1} + b \right)^2 \pmod p}$.

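The converse direction is amusing: let ${m^2 = D + 4pk}$ for integers ${m}$, ${k}$ (this is possible since ${p}$ is odd, so we may pick ${m}$ with ${m^2 \equiv D \pmod p}$ and ${m \equiv D \pmod 2}$). Consider the quadratic form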

$\displaystyle Q(x,y) = px^2 + mxy + ky^2.$

It is primitive of discriminant ${D}$ and ${Q(1,0) = p}$. Now ${Q}$ may not be reduced, but that’s fine: just take the reduction of ${Q}$, which must also properly represent ${p}$. $\Box$
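Here is a short Python sketch of the construction in the converse direction. The function name `form_representing` is mine, and the search for ${m}$ is a naive brute force assumed only for illustration. It looks for ${m}$ with ${m^2 \equiv D \pmod{4p}}$, so that ${px^2 + mxy + ky^2}$ has discriminant ${m^2 - 4pk = D}$.

```python
def form_representing(p, D):
    """Return (p, m, k) with m^2 - 4pk = D, or None if no such m exists."""
    # Solutions m of m^2 = D (mod 4p) are periodic mod 2p, so this range suffices.
    for m in range(2 * p):
        if (m * m - D) % (4 * p) == 0:
            return (p, m, (m * m - D) // (4 * p))
    return None
```

For example, with ${D = -20}$ and ${p = 29}$ this returns `(29, 26, 6)`, i.e. the form ${29x^2 + 26xy + 6y^2}$, which takes the value ${29}$ at ${(1,0)}$. For ${p = 11}$ it returns `None`, consistent with ${-20}$ not being a quadratic residue modulo ${11}$.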

Thus to every discriminant ${D < 0}$ we can attach the Legendre character (is that the name?), which is a homomorphism

$\displaystyle \chi_D = \left( \tfrac{D}{\bullet} \right) : \left( \mathbb Z / D\mathbb Z \right)^\times \rightarrow \{ \pm 1 \}$

with the property that if ${p}$ is a rational prime not dividing ${D}$, then ${\chi_D(p) = \left( \frac{D}{p} \right)}$. This is abuse of notation since I should technically write ${\chi_D(p \pmod D)}$, but there is no harm done: one can check by quadratic reciprocity that if ${p \equiv q \pmod D}$ then ${\chi_D(p) = \chi_D(q)}$. Thus our previous result becomes:

Theorem 15 (${\ker(\chi_D)}$ consists of representable primes)

Let ${p \nmid D}$ be prime. Then ${p \in \ker(\chi_D)}$ if and only if some quadratic form in ${\text{Cl}(D)}$ represents ${p}$.

As a corollary of this, using the fact that ${h(-8) = h(-12) = h(-28) = 1}$ one can prove that

Corollary 16 (Fermat-type results for ${h(-4n) = 1}$)

Let ${p > 7}$ be a prime. Then ${p}$ is

• of the form ${x^2 + 2y^2}$ if and only if ${p \equiv 1, 3 \pmod 8}$.
• of the form ${x^2 + 3y^2}$ if and only if ${p \equiv 1 \pmod 3}$.
• of the form ${x^2 + 7y^2}$ if and only if ${p \equiv 1, 2, 4 \pmod 7}$.

Proof: The congruence conditions are equivalent to ${(-4n/p) = 1}$, and as before the only point is that the only reduced quadratic form for these ${D = -4n}$ is the principal one. $\Box$
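As a quick numerical sanity check (not a proof), the three statements can be tested by brute force for small primes. The sketch below is illustrative only; the helper names `is_prime`, `represented`, `RULES` and `check` are my own.

```python
from math import isqrt

def is_prime(p):
    """Trial division; fine for the small range used here."""
    return p >= 2 and all(p % q for q in range(2, isqrt(p) + 1))

def represented(p, n):
    """True if p = x^2 + n*y^2 for some integers x, y."""
    for y in range(isqrt(p // n) + 1):
        r = p - n * y * y
        if isqrt(r) ** 2 == r:
            return True
    return False

# n -> (modulus, allowed residues), copied from the corollary.
RULES = {2: (8, {1, 3}), 3: (3, {1}), 7: (7, {1, 2, 4})}

def check(bound=2000):
    """Compare the brute-force answer with the congruence condition."""
    for p in filter(is_prime, range(11, bound)):
        for n, (mod, residues) in RULES.items():
            assert represented(p, n) == (p % mod in residues), (p, n)
    return True
```

Calling `check()` compares both sides for every prime ${7 < p < 2000}$.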

## 5. Genus theory

What if ${h(D) > 1}$? Sometimes, we can still figure out which primes go where just by taking mods.

Let ${Q \in \text{Cl}(D)}$. Then it represents some residue classes of ${(\mathbb Z/D\mathbb Z)^\times}$. In that case we call the set of residue classes represented the genus of the quadratic form ${Q}$.

Example 17 (Genus theory of ${D = -20}$)

Consider ${D = -20}$, with

$\displaystyle \ker(\chi_D) = \left\{ 1, 3, 7, 9 \right\} \subseteq (\mathbb Z/D\mathbb Z)^\times.$

We consider the two elements of ${\text{Cl}(D)}$:

• ${x^2 + 5y^2}$ represents ${1, 9 \in (\mathbb Z/20\mathbb Z)^\times}$.
• ${2x^2+2xy+3y^2}$ represents ${3, 7 \in (\mathbb Z/20\mathbb Z)^\times}$.

Now suppose for example that ${p \equiv 9 \pmod{20}}$. It must be represented by one of these two quadratic forms, but the latter form is never ${9 \pmod{20}}$ and so it must be the first one. Thus we conclude that

• ${p = x^2+5y^2}$ if and only if ${p \equiv 1, 9 \pmod{20}}$.
• ${p = 2x^2 + 2xy + 3y^2}$ if and only if ${p \equiv 3, 7 \pmod{20}}$.
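The residue classes above can be found by brute force, since the value of a form modulo ${|D|}$ only depends on ${x}$ and ${y}$ modulo ${|D|}$. A minimal Python sketch (the function name `genus` is just an illustrative choice):

```python
from math import gcd

def genus(a, b, c):
    """Residues in (Z/DZ)^x taken by ax^2 + bxy + cy^2, where D = b^2 - 4ac."""
    N = abs(b * b - 4 * a * c)
    values = {(a * x * x + b * x * y + c * y * y) % N
              for x in range(N) for y in range(N)}
    # Keep only the values that are units modulo |D|.
    return sorted(v for v in values if gcd(v, N) == 1)
```

Here `genus(1, 0, 5)` gives `[1, 9]` and `genus(2, 2, 3)` gives `[3, 7]`, matching the two bullets of the example.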

The thing that makes this work is that each genus appears exactly once. We are not always so lucky: for example when ${D = -108}$ we have that

Example 18 (Genus theory of ${D = -108}$)

The three elements of ${\text{Cl}(-108)}$ are:

• ${x^2+27y^2}$, which represents exactly the ${1 \pmod 3}$ elements of ${(\mathbb Z/D\mathbb Z)^\times}$.
• ${4x^2 \pm 2xy + 7y^2}$ (two forms), which also represent exactly the ${1 \pmod 3}$ elements of ${(\mathbb Z/D\mathbb Z)^\times}$.

So the best we can conclude is that ${p = x^2+27y^2}$ OR ${p = 4x^2\pm2xy+7y^2}$ if and only if ${p \equiv 1 \pmod 3}$. This is because all three reduced forms of discriminant ${-108}$ happen to have the same genus.

We now prove that:

Theorem 19 (Genera are cosets in ${\ker(\chi_D)}$)

Let ${D}$ be a discriminant and consider the Legendre character ${\chi_D}$.

• The genus of the principal form of discriminant ${D}$ constitutes a subgroup ${H}$ of ${\ker(\chi_D)}$, which we call the principal genus.
• Any genus of a quadratic form in ${\text{Cl}(D)}$ is a coset of the principal genus ${H}$ in ${\ker(\chi_D)}$.

Proof: For the first part, we aim to show ${H}$ is multiplicatively closed. For ${D \equiv 0 \pmod 4}$, ${D = -4n}$ we use the fact that

$\displaystyle (x^2+ny^2)(u^2+nv^2) = (xu \pm nyv)^2 + n(xv \mp yu)^2.$

For ${D \equiv 1 \pmod 4}$, we instead appeal to another “magic” identity

$\displaystyle 4\left( x^2+xy+\frac{1-D}{4}y^2 \right) \equiv (2x+y)^2 \pmod D$

and it follows from here that ${H}$ is actually the set of squares in ${(\mathbb Z/D\mathbb Z)^\times}$, which is obviously a subgroup.

Now we show that other quadratic forms have genus equal to a coset of the principal genus. For ${D \equiv 0 \pmod 4}$, with ${D = -4n}$ we can write

$\displaystyle a(ax^2+bxy+cy^2) = \left( ax+\tfrac{b}{2} y \right)^2 + ny^2$

and thus the desired coset is shown to be ${a^{-1} H}$. As for ${D \equiv 1 \pmod 4}$, we have

$\displaystyle 4a \cdot (ax^2+bxy+cy^2) = (2ax + by)^2 - Dy^2 \equiv (2ax+by)^2 \pmod D$

so the desired coset is also ${a^{-1} H}$, since ${H}$ was the set of squares. $\Box$

Thus every genus is a coset of ${H}$ in ${\ker(\chi_D)}$. Thus:

Definition 20

We define the quotient group

$\displaystyle \text{Gen}(D) = \ker(\chi_D) / H$

which is the set of all genera in discriminant ${D}$. One can view this as an abelian group by coset multiplication.

Thus there is a natural map

$\displaystyle \Phi_D : \text{Cl}(D) \twoheadrightarrow \text{Gen}(D).$

(The map is surjective by Theorem 14.) We also remark that ${\text{Gen}(D)}$ is quite well-behaved:

Proposition 21 (Structure of ${\text{Gen}(D)}$)

The group ${\text{Gen}(D)}$ is isomorphic to ${(\mathbb Z/2\mathbb Z)^{\oplus m}}$ for some integer ${m}$.

Proof: Observe that ${H}$ contains all the squares of ${\ker(\chi_D)}$: if ${f}$ is the principal form then ${f(t,0) = t^2}$. Thus each element of ${\text{Gen}(D)}$ has order at most ${2}$, which implies the result since ${\text{Gen}(D)}$ is a finite abelian group. $\Box$

In fact, one can compute the order of ${\text{Gen}(D)}$ exactly, but for this post I will just state the result.

Theorem 22 (Order of ${\text{Gen}(D)}$)

Let ${D < 0}$ be a discriminant, and let ${r}$ be the number of distinct odd primes which divide ${D}$. Define ${\mu}$ by:

• ${\mu = r}$ if ${D \equiv 1 \pmod 4}$.
• ${\mu = r}$ if ${D = -4n}$ and ${n \equiv 3 \pmod 4}$.
• ${\mu = r+1}$ if ${D = -4n}$ and ${n \equiv 1,2 \pmod 4}$.
• ${\mu = r+1}$ if ${D = -4n}$ and ${n \equiv 4 \pmod 8}$.
• ${\mu = r+2}$ if ${D = -4n}$ and ${n \equiv 0 \pmod 8}$.

Then ${\left\lvert \text{Gen}(D) \right\rvert = 2^{\mu-1}}$.
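The case split is easy to get wrong by hand, so here is a direct Python transcription of the statement. The names `odd_prime_divisors`, `mu` and `number_of_genera` are hypothetical helpers introduced only for this sketch.

```python
def odd_prime_divisors(n):
    """Distinct odd primes dividing the nonzero integer n."""
    n = abs(n)
    while n % 2 == 0:
        n //= 2
    primes, q = [], 3
    while q * q <= n:
        if n % q == 0:
            primes.append(q)
            while n % q == 0:
                n //= q
        q += 2
    if n > 1:
        primes.append(n)
    return primes

def mu(D):
    """The quantity mu from the theorem, for a discriminant D < 0."""
    assert D < 0 and D % 4 in (0, 1)
    r = len(odd_prime_divisors(D))
    if D % 4 == 1:
        return r
    n = -D // 4
    if n % 4 == 3:
        return r
    if n % 8 == 0:
        return r + 2
    return r + 1  # n = 1, 2 (mod 4) or n = 4 (mod 8)

def number_of_genera(D):
    return 2 ** (mu(D) - 1)
```

For example, `number_of_genera(-20)` is `2` (the two genera of Example 17) while `number_of_genera(-108)` is `1`, consistent with every form of discriminant ${-108}$ having the same genus.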

## 6. Composition

We have already used once the nice identity

$\displaystyle (x^2+ny^2)(u^2+nv^2) = (xu \pm nyv)^2 + n(xv \mp yu)^2.$

We are going to try and generalize this for any two quadratic forms in ${\text{Cl}(D)}$. Specifically,

Proposition 23 (Composition defines a group operation)

Let ${f,g \in \text{Cl}(D)}$. Then there is a unique ${h \in \text{Cl}(D)}$ and bilinear forms ${B_i(x,y,z,w) = a_ixz + b_ixw + c_iyz + d_iyw}$ for ${i=1,2}$ such that

• ${f(x,y) g(z,w) = h(B_1(x,y,z,w), B_2(x,y,z,w))}$.
• ${a_1b_2 - a_2b_1 = +f(1,0)}$.
• ${a_1c_2 - a_2c_1 = +g(1,0)}$.

In fact, without the latter two constraints we would instead have ${a_1b_2 - a_2b_1 = \pm f(1,0)}$ and ${a_1c_2 - a_2c_1 = \pm g(1,0)}$, and each choice of signs would yield one of four (possibly different) forms. So requiring both signs to be positive makes this operation well-defined. (This is why we like proper equivalence; it gives us a well-defined group structure, whereas with improper equivalence it would be impossible to put a group structure on the forms above.)

Taking this for granted, we then have that

Theorem 24 (Form class group)

Let ${D \equiv 0, 1 \pmod 4}$, ${D < 0}$ be a discriminant. Then ${\text{Cl}(D)}$ becomes an abelian group under composition, where

• The identity of ${\text{Cl}(D)}$ is the principal form, and
• The inverse of the form ${ax^2+bxy+cy^2}$ is ${ax^2-bxy+cy^2}$.

This group is called the form class group.

We then have a group homomorphism

$\displaystyle \Phi_D : \text{Cl}(D) \twoheadrightarrow \text{Gen}(D).$

Observe that ${ax^2 + bxy + cy^2}$ and ${ax^2 - bxy + cy^2}$ are inverses and that their ${\Phi_D}$ images coincide (being improperly equivalent); this is expressed in the fact that every element of ${\text{Gen}(D)}$ has order ${\le 2}$. As another corollary, every genus contains the same number of elements of ${\text{Cl}(D)}$, namely ${\left\lvert \ker \Phi_D \right\rvert}$, and the number of genera is a power of two.

We now define:

Definition 25

An integer ${n \ge 1}$ is convenient if the following equivalent conditions hold for ${D = -4n}$:

• The principal form ${x^2+ny^2}$ is the only reduced form with the principal genus.
• ${\Phi_D}$ is injective (hence an isomorphism).
• ${h(D) = 2^{\mu-1}}$.

Thus we arrive at the following corollary:

Corollary 26 (Convenient numbers have nice representations)

Let ${n \ge 1}$ be convenient. Then a prime ${p \nmid 4n}$ is of the form ${x^2+ny^2}$ if and only if ${p}$ lies in the principal genus.

Hence the representability depends only on ${p \pmod{4n}}$.

OEIS A000926 lists 65 convenient numbers. This sequence is known to be complete except for at most one more number; moreover the list is complete assuming the Grand Riemann Hypothesis.

## 7. Cubic and quartic reciprocity

To treat the cases where ${n}$ is not convenient, the correct thing to do is develop class field theory. However, we can still make a little bit more progress if we bring higher reciprocity theorems to bear: we’ll handle the cases ${n=27}$ and ${n=64}$, two examples of numbers which are not convenient.

### 7.1. Cubic reciprocity

First, we prove that

Theorem 27 (On ${p = x^2+27y^2}$)

A prime ${p > 3}$ is of the form ${x^2+27y^2}$ if and only if ${p \equiv 1 \pmod 3}$ and ${2}$ is a cubic residue modulo ${p}$.

To do this we use cubic reciprocity, which requires working in the Eisenstein integers ${\mathbb Z[\omega]}$ where ${\omega}$ is a primitive cube root of unity. There are six units in ${\mathbb Z[\omega]}$ (the sixth roots of unity), hence each nonzero number has six associates (differing by a unit), and the ring is in fact a PID.

Now if we let ${\pi}$ be a prime not dividing ${3}$, and ${\alpha}$ is coprime to ${\pi}$, then we can define the cubic Legendre symbol by setting

$\displaystyle \left( \frac{\alpha}{\pi} \right)_3 \equiv \alpha^{\frac13(N\pi-1)} \pmod \pi \in \left\{ 1, \omega, \omega^2 \right\}.$

Moreover, we can define a primary prime ${\pi \nmid 3}$ to be one such that ${\pi \equiv -1 \pmod 3}$; given any prime exactly one of the six associates is primary. We then have the following reciprocity theorem:

Theorem 28 (Cubic reciprocity)

If ${\pi}$ and ${\theta}$ are distinct primary primes in ${\mathbb Z[\omega]}$ then

$\displaystyle \left( \frac{\pi}{\theta} \right)_3 = \left( \frac{\theta}{\pi} \right)_3.$

We also have the following supplementary laws: if ${\pi = (3m-1) + 3n\omega}$, then

$\displaystyle \left( \frac{\omega}{\pi} \right)_3 = \omega^{m+n} \qquad\text{and}\qquad \left( \frac{1-\omega}{\pi} \right)_3 = \omega^{2m}.$

The first supplementary law is for the unit (analogous to ${(-1/p)}$) while the second supplementary law handles the prime divisors of ${3 = -\omega^2(1-\omega)^2}$ (analogous to ${(2/p)}$).

We can tie this back into ${\mathbb Z}$ as follows. If ${p \equiv 1 \pmod 3}$ is a rational prime then it is represented by ${x^2+xy+y^2}$, and thus we can put ${p = \pi \overline{\pi}}$ for some prime ${\pi}$, ${N(\pi) = p}$. Consequently, we have a natural isomorphism

$\displaystyle \mathbb Z[\omega] / \pi \mathbb Z[\omega] \cong \mathbb Z / p \mathbb Z.$

Therefore, we see that a given ${a \in (\mathbb Z/p\mathbb Z)^\times}$ is a cubic residue if and only if ${(a/\pi)_3 = 1}$.

In particular, we have the following corollary, which is all we will need:

Corollary 29 (When ${2}$ is a cubic residue)

Let ${p \equiv 1 \pmod 3}$ be a rational prime, ${p > 3}$. Write ${p = \pi \overline{\pi}}$ with ${\pi}$ primary. Then ${2}$ is a cubic residue modulo ${p}$ if and only if ${\pi \equiv 1 \pmod 2}$.

Proof: By cubic reciprocity:

$\displaystyle \left( \frac{2}{\pi} \right)_3 = \left( \frac{\pi}{2} \right)_3 \equiv \pi^{\frac13(N2-1)} \equiv \pi \pmod 2.$

$\Box$

Now we give the proof of Theorem 27. Proof: First assume

$\displaystyle p = x^2+27y^2 = \left( x+3\sqrt{-3} y \right)\left( x-3\sqrt{-3} y \right).$

Let ${\pi = x + 3 \sqrt{-3} y = (x+3y) + 6y\omega}$, replacing ${x}$ by ${-x}$ if necessary so that ${\pi}$ is primary, and note that ${\pi \equiv 1 \pmod 2}$ since ${x+3y}$ is odd. Now clearly ${p \equiv 1 \pmod 3}$, so we are done by Corollary 29.

For the converse, assume ${p \equiv 1 \pmod 3}$, ${p = \pi \overline{\pi}}$ with ${\pi}$ primary and ${\pi \equiv 1 \pmod 2}$. If we set ${\pi = a + b\omega}$ for integers ${a}$ and ${b}$, then the fact that ${\pi \equiv 1 \pmod 2}$ and ${\pi \equiv -1 \pmod 3}$ is enough to imply that ${6 \mid b}$ (check it!). Moreover,

$\displaystyle p = a^2-ab+b^2 = \left( a - \frac{1}{2} b \right)^2 + 27 \left( \frac16b \right)^2$

as desired. $\Box$
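The statement is easy to test numerically. For a prime ${p \equiv 1 \pmod 3}$, the number ${2}$ is a cubic residue exactly when ${2^{(p-1)/3} \equiv 1 \pmod p}$, so a brute-force comparison takes only a few lines of Python. This is a sanity check under my own helper names (`is_prime`, `is_x2_plus_27y2`, `check`), not part of the proof.

```python
from math import isqrt

def is_prime(p):
    """Trial division; fine for the small range used here."""
    return p >= 2 and all(p % q for q in range(2, isqrt(p) + 1))

def is_x2_plus_27y2(p):
    """True if p = x^2 + 27*y^2 for some integers x, y."""
    for y in range(isqrt(p // 27) + 1):
        r = p - 27 * y * y
        if isqrt(r) ** 2 == r:
            return True
    return False

def check(bound=3000):
    """Return the primes 3 < p < bound of the form x^2 + 27y^2,
    asserting along the way that they are exactly the predicted ones."""
    hits = []
    for p in filter(is_prime, range(5, bound)):
        # Criterion: p = 1 (mod 3) and 2^((p-1)/3) = 1 (mod p).
        predicted = p % 3 == 1 and pow(2, (p - 1) // 3, p) == 1
        assert is_x2_plus_27y2(p) == predicted, p
        if predicted:
            hits.append(p)
    return hits
```

For example, `check(120)` returns `[31, 43, 109]`, corresponding to ${31 = 2^2 + 27}$, ${43 = 4^2 + 27}$ and ${109 = 1^2 + 27 \cdot 2^2}$.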

### 7.2. Quartic reciprocity

This time we work in ${\mathbb Z[i]}$, for which there are four units ${\pm 1}$, ${\pm i}$. A prime is primary if ${\pi \equiv 1 \pmod{2+2i}}$; every prime not dividing ${2 = -i(1+i)^2}$ has a unique associate which is primary. Then we can as before define

$\displaystyle \alpha^{\frac14(N\pi-1)} \equiv \left( \frac{\alpha}{\pi} \right)_4 \pmod{\pi} \in \left\{ \pm 1, \pm i \right\}$

where ${\pi}$ is primary, and ${\alpha}$ is nonzero mod ${\pi}$. As before, if ${p \equiv 1 \pmod 4}$ and ${p = \pi\overline{\pi}}$, then ${a}$ is a quartic residue modulo ${p}$ if and only if ${\left( a/\pi \right)_4 = 1}$, thanks to the isomorphism

$\displaystyle \mathbb Z[i] / \pi \mathbb Z[i] \cong \mathbb Z / p \mathbb Z.$

Now we have

Theorem 30 (Quartic reciprocity)

If ${\pi}$ and ${\theta}$ are distinct primary primes in ${\mathbb Z[i]}$ then

$\displaystyle \left( \frac{\theta}{\pi} \right)_4 = \left( \frac{\pi}{\theta} \right)_4 (-1)^{\frac{1}{16}(N\theta-1)(N\pi-1)}.$

We also have supplementary laws that state that if ${\pi = a+bi}$ is primary, then

$\displaystyle \left( \frac{i}{\pi} \right)_4 = i^{-\frac{1}{2}(a-1)} \qquad\text{and}\qquad \left( \frac{1+i}{\pi} \right)_4 = i^{\frac14(a-b-b^2-1)}.$

Again, the first law handles units, and the second law handles the prime divisors of ${2}$. The corollary we care about this time in fact uses only the supplemental laws:

Corollary 31 (When ${2}$ is a quartic residue)

Let ${p \equiv 1 \pmod 4}$ be a prime, and put ${p = \pi\overline{\pi}}$ with ${\pi = a+bi}$ primary. Then

$\displaystyle \left( \frac{2}{\pi} \right)_4 = i^{-b/2}$

and in particular ${2}$ is a quartic residue modulo ${p}$ if and only if ${b \equiv 0 \pmod 8}$.

Proof: Note that ${2 = i^3(1+i)^2}$ and apply the supplementary laws above. This gives

$\displaystyle \left( \frac{2}{\pi} \right)_4 = \left( \frac{i}{\pi} \right)_4^3 \left( \frac{1+i}{\pi} \right)_4^2 = i^{-\frac32(a-1)} \cdot i^{\frac12(a-b-b^2-1)} = i^{-(a-1) - \frac{1}{2} b(b+1)}.$

Now we assumed ${a+bi}$ is primary. We claim that

$\displaystyle a - 1 + \frac{1}{2} b^2 \equiv 0 \pmod 4.$

Note that ${(a+bi)-1}$ is divisible by ${2+2i}$, hence ${N(2+2i)=8}$ divides ${(a-1)^2+b^2}$. Thus

$\displaystyle 2(a-1) + b^2 \equiv 2(a-1) - (a-1)^2 = -(a-1)(a-3) \equiv 0 \pmod 8$

since ${a}$ is odd and ${b}$ is even. Finally,

$\displaystyle \left( \frac{2}{\pi} \right)_4 = i^{-(a-1) - \frac{1}{2} b(b+1)} = i^{-\frac{1}{2} b - (a-1+\frac{1}{2} b^2)} = i^{-\frac{1}{2} b}.$

$\Box$

From here we quickly deduce

Theorem 32 (On ${p = x^2+64y^2}$)

If ${p > 2}$ is prime, then ${p = x^2+64y^2}$ if and only if ${p \equiv 1 \pmod 4}$ and ${2}$ is a quartic residue modulo ${p}$.
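The deduction is short: writing ${p = a^2 + b^2}$ with ${b}$ even, Corollary 31 says ${2}$ is a quartic residue exactly when ${8 \mid b}$, in which case ${p = a^2 + 64(b/8)^2}$. The following Python sketch checks this numerically for small primes. The helper names `is_prime`, `even_part` and `check` are my own, and the test ${2^{(p-1)/4} \equiv 1 \pmod p}$ is the usual criterion for quartic residues modulo a prime ${p \equiv 1 \pmod 4}$.

```python
from math import isqrt

def is_prime(p):
    """Trial division; fine for the small range used here."""
    return p >= 2 and all(p % q for q in range(2, isqrt(p) + 1))

def even_part(p):
    """For a prime p = 1 (mod 4), return |b| where p = a^2 + b^2 with b even."""
    for b in range(0, isqrt(p) + 1, 2):
        r = p - b * b
        if isqrt(r) ** 2 == r:
            return b
    raise ValueError("p is not a sum of two squares")

def check(bound=3000):
    """Return the primes p < bound, p = 1 (mod 4), with 2 a quartic residue."""
    hits = []
    for p in filter(is_prime, range(5, bound, 4)):
        b = even_part(p)
        quartic = pow(2, (p - 1) // 4, p) == 1
        assert quartic == (b % 8 == 0), p
        if quartic:
            a = isqrt(p - b * b)
            # Rewrite p = a^2 + b^2 as x^2 + 64y^2 with y = b/8.
            assert p == a * a + 64 * (b // 8) ** 2
            hits.append(p)
    return hits
```

For example, `check(200)` returns `[73, 89, 113]`, i.e. ${73 = 3^2 + 64}$, ${89 = 5^2 + 64}$ and ${113 = 7^2 + 64}$.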

# Some Thoughts on Olympiad Material Design

(This is a bit of a follow-up to the solution reading post last month. Spoiler warnings: USAMO 2014/6, USAMO 2012/2, TSTST 2016/4, and hints for ELMO 2013/1, IMO 2016/2.)

I want to say a little about the process which I use to design my olympiad handouts and classes these days (and thus by extension the way I personally think about problems). The short summary is that my teaching style is centered around showing connections and recurring themes between problems.

Now let me explain this in more detail.

## 1. Main ideas

Solutions to olympiad problems can look quite different from one another at a surface level, but typically they center around one or two main ideas, as I describe in my post on reading solutions. Because details are easy to work out once you have the main idea, as far as learning is concerned you can more or less throw away the details and pay most of your attention to main ideas.

Thus whenever I solve an olympiad problem, I make a deliberate effort to summarize the solution in a few sentences, such that I basically know how to do it from there. I also make a deliberate effort, whenever I write up a solution in my notes, to structure it so that my future self can see all the key ideas at a glance and thus be able to understand the general path of the solution immediately.

The example I’ve previously mentioned is USAMO 2014/6.

Example 1 (USAMO 2014, Gabriel Dospinescu)

Prove that there is a constant ${c>0}$ with the following property: If ${a, b, n}$ are positive integers such that ${\gcd(a+i, b+j)>1}$ for all ${i, j \in \{0, 1, \dots, n\}}$, then

$\displaystyle \min\{a, b\}> (cn)^n.$

If you look at any complete solution to the problem, you will see a lot of technical estimates involving ${\zeta(2)}$ and the like. But the main idea is very simple: “consider an ${N \times N}$ table of primes and note the small primes cannot adequately cover the board, since ${\sum p^{-2} < \frac{1}{2}}$”. Once you have this main idea the technical estimates are just the grunt work that you force yourself to do if you’re a contestant (and don’t do if you’re retired like me).

Thus the study of olympiad problems is reduced to the study of main ideas behind these problems.

## 2. Taxonomy

So how do we come up with the main ideas? Of course I won’t be able to answer this question completely, because therein lies most of the difficulty of olympiads.

But I do have some progress in this way. It comes down to seeing how main ideas are similar to each other. I spend a lot of time trying to classify the main ideas into categories or themes, based on how similar they feel to one another. If I see one theme pop up over and over, then I can make it into a class.

I think olympiad taxonomy is severely underrated, and generally not done correctly. The status quo is that people do bucket sorts based on the particular technical details which are present in the problem. This is correlated with the main ideas, but the two do not always coincide.

An example where technical sort works okay is Euclidean geometry. Here is a simple example: harmonic bundles in projective geometry. As I explain in my book, there are a few “basic” configurations involved:

• Midpoints and parallel lines
• The Ceva / Menelaus configuration
• Harmonic quadrilateral / symmedian configuration
• Apollonian circle (right angle and bisectors)

(For a reference, see Lemmas 2, 4, 5 and Exercise 0 here.) Thus from experience, any time I see one of these pictures inside the current diagram, I think to myself that “this problem feels projective”; and if there is a way to do so I try to use harmonic bundles on it.

An example where technical sort fails is the “pigeonhole principle”. A typical problem in such a class looks something like USAMO 2012/2.

Example 2 (USAMO 2012, Gregory Galperin)

A circle is divided into congruent arcs by ${432}$ points. The points are colored in four colors such that some ${108}$ points are colored Red, some ${108}$ points are colored Green, some ${108}$ points are colored Blue, and the remaining ${108}$ points are colored Yellow. Prove that one can choose three points of each color in such a way that the four triangles formed by the chosen points of the same color are congruent.

It’s true that the official solution uses the words “pigeonhole principle” but that is not really the heart of the matter; the key idea is that you consider all possible rotations and count the number of incidences. (In any case, such calculations are better done using expected value anyways.)

Now why is taxonomy a good thing for learning and teaching? The reason is that building connections and seeing similarities is most easily done by simultaneously presenting several related problems. I’ve actually mentioned this already in a different blog post, but let me give the demonstration again.

Suppose I wrote down the following:

$\displaystyle \begin{array}{lll} A1 & B11 & C8 \\ A9 & B44 & C27 \\ A49 & B33 & C343 \\ A16 & B99 & C1 \\ A25 & B22 & C125 \end{array}$

You can tell what each of the ${A}$’s, ${B}$’s, ${C}$’s have in common by looking for a few moments. But what happens if I intertwine them?

$\displaystyle \begin{array}{lllll} B11 & C27 & C343 & A1 & A9 \\ C125 & B33 & A49 & B44 & A25 \\ A16 & B99 & B22 & C8 & C1 \end{array}$

This is the same information, but now you have to work much harder to notice the association between the letters and the numbers they’re next to.
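
To check that the two displays really do hold the same data, here is a small sketch that regroups the shuffled labels by letter. The list is copied from the second array; the name `group_by_letter` is purely illustrative.

```python
from collections import defaultdict

# Labels copied from the intertwined array, row by row.
SHUFFLED = [
    "B11", "C27", "C343", "A1", "A9",
    "C125", "B33", "A49", "B44", "A25",
    "A16", "B99", "B22", "C8", "C1",
]


def group_by_letter(labels):
    """Bucket each label by its leading letter and sort the numbers in each bucket."""
    groups = defaultdict(list)
    for label in labels:
        groups[label[0]].append(int(label[1:]))
    return {letter: sorted(nums) for letter, nums in sorted(groups.items())}


for letter, nums in group_by_letter(SHUFFLED).items():
    print(letter, nums)
```

Once the labels are regrouped, the pattern in each bucket is again visible at a glance, which is the whole point of presenting related problems side by side.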

This is why, if you are an olympiad student, I strongly encourage you to keep a journal or blog of the problems you’ve done. Solving olympiad problems takes lots of time and so it’s worth it to spend at least a few minutes jotting down the main ideas. And once you have enough of these, you can start to see new connections between problems you haven’t seen before, rather than being confined to thinking about individual problems in isolation. (Additionally, it means you will never have to redo problems to which you forgot the solution; learn from my mistake here.)

## 3. Ten buckets of geometry

I want to elaborate more on geometry in general. These days, if I see a solution to a Euclidean geometry problem, then I mentally store the problem and solution into one (or more) buckets. I can even tell you what my buckets are:

1. Direct angle chasing
2. Power of a point / radical axis
3. Homothety, similar triangles, ratios
4. Recognizing some standard configuration (see Yufei for a list)
5. Doing some length calculations
6. Complex numbers
7. Barycentric coordinates
8. Inversion
9. Harmonic bundles or pole/polar and homography
10. Spiral similarity, Miquel points

which my dedicated fans probably recognize as the ten chapters of my textbook. (Problems may also fall in more than one bucket if for example they are difficult and require multiple key ideas, or if there are multiple solutions.)

Now whenever I see a new geometry problem, the diagram will often “feel” similar to problems in a certain bucket. Exactly what I mean by “feel” is hard to formalize — it’s a certain gut feeling that you pick up by doing enough examples. There are some things you can say, such as “problems which feature a central circle and feet of altitudes tend to fall in bucket 6”, or “problems which only involve incidence always fall in bucket 9”. But it seems hard to come up with an exhaustive list of hard rules that will do better than human intuition.

## 4. How do problems feel?

But as I said in my post on reading solutions, there are deeper lessons to teach than just technical details.

For examples of themes on opposite ends of the spectrum, let’s move on to combinatorics. Geometry is quite structured and so the themes in the main ideas tend to translate to specific theorems used in the solution. Combinatorics is much less structured and many of the themes I use in combinatorics cannot really be formalized. (Consequently, since everyone else seems to mostly teach technical themes, several of the combinatorics themes I teach are idiosyncratic, and to my knowledge are not taught by anyone else.)

For example, one of the unusual themes I teach is called Global. It’s about the idea that to solve a problem, you can just kind of “add up everything at once”, for example using linearity of expectation, or by double-counting, or whatever. In particular these kinds of approaches ignore the “local” details of the problem. It’s hard to make this precise, so I’ll just give two recent examples.

Example 3 (ELMO 2013, Ray Li)

Let ${a_1,a_2,\dots,a_9}$ be nine real numbers, not necessarily distinct, with average ${m}$. Let ${A}$ denote the number of triples ${1 \le i < j < k \le 9}$ for which ${a_i + a_j + a_k \ge 3m}$. What is the minimum possible value of ${A}$?

Example 4 (IMO 2016)

Find all integers ${n}$ for which each cell of an ${n \times n}$ table can be filled with one of the letters ${I}$, ${M}$ and ${O}$ in such a way that:

• In each row and column, one third of the entries are ${I}$, one third are ${M}$ and one third are ${O}$; and
• in any diagonal, if the number of entries on the diagonal is a multiple of three, then one third of the entries are ${I}$, one third are ${M}$ and one third are ${O}$.

If you look at the solutions to these problems, they have the same “feeling” of adding everything up, even though the specific techniques are somewhat different (double-counting for the former, diagonals modulo ${3}$ for the latter). Nonetheless, my experience with problems similar to the former was immensely helpful for the latter, and it’s why I was able to solve the IMO problem.
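
One way to make the double count in Example 3 concrete, as a sketch under assumptions: suppose the argument splits the nine indices into three disjoint triples. The three sums add up to ${9m}$, so at least one triple in every such split has sum at least ${3m}$. The code below counts the splits, counts how many splits contain a fixed triple, and evaluates ${A}$ for one candidate configuration (eight copies of ${-1}$ and a single ${8}$, chosen here purely for illustration). The helper names are hypothetical.

```python
from itertools import combinations


def count_good_triples(values):
    """Number of triples i < j < k with a_i + a_j + a_k >= 3 * (average)."""
    n, total = len(values), sum(values)
    # Compare n * (triple sum) with 3 * total to stay in exact integer arithmetic.
    return sum(1 for t in combinations(values, 3) if n * sum(t) >= 3 * total)


def partitions_into_triples(items):
    """Yield every way to split the distinct items into unordered triples."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # The triple containing the first item is determined by its two partners.
    for pair in combinations(rest, 2):
        remaining = [x for x in rest if x not in pair]
        for tail in partitions_into_triples(remaining):
            yield [(first, *pair)] + tail


splits = list(partitions_into_triples(range(9)))
per_triple = sum(1 for s in splits if (0, 1, 2) in s)

print(len(splits))                            # number of splits into three triples
print(per_triple)                             # splits containing one fixed triple
print(len(splits) // per_triple)              # lower bound on A from the double count
print(count_good_triples([8] + [-1] * 8))     # A for the candidate configuration
```

If the last two printed numbers agree, the lower bound from the global count is attained by this configuration.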

## 5. Gaps

This perspective also explains why I’m relatively bad at functional equations. There are some things I can say that may be useful (see my handouts), but much of the time these are just technical tricks. (When sorting functional equations in my head, I have a bucket called “standard fare” meaning that you “just do work”; as far as I can tell this bucket is pretty useless.) I always feel stupid teaching functional equations, because I never have many good insights to share.

Part of the reason is that functional equations often don’t have a main idea at all. Consequently it’s hard for me to do useful taxonomy on them.

Then sometimes you run into something like the windmill problem, the solution of which is fairly “novel”, not being similar to problems that come up in training. I have yet to figure out a good way to train students to be able to solve windmill-like problems.

## 6. Surprise

I’ll close by mentioning one common way I come up with a theme.

Sometimes I will run across an olympiad problem ${P}$ which I solve quickly, and think should be very easy, and yet once I start grading ${P}$ I find that the scores are much lower than I expected. Since the way I solve problems is by drawing experience from similar previous problems, this must mean that I’ve subconsciously found a general framework to solve problems like ${P}$, which is not obvious to my students yet. So if I can put my finger on what that framework is, then I have something new to say.

The most recent example I can think of when this happened was TSTST 2016/4 which was given last June (and was also a very elegant problem, at least in my opinion).

Example 5 (TSTST 2016, Linus Hamilton)

Let ${n > 1}$ be a positive integer. Prove that we must apply the Euler ${\varphi}$ function at least ${\log_3 n}$ times before reaching ${1}$.

I solved this problem very quickly when we were drafting the TSTST exam, figuring out the solution while walking to dinner. So I was quite surprised when I looked at the scores for the problem and found out that empirically it was not that easy.

After I thought about this, I have a new tentative idea. You see, when doing this problem I really was thinking about “what does this ${\varphi}$ operation do?”. You can think of ${n}$ as an infinite tuple

$\displaystyle \left(\nu_2(n), \nu_3(n), \nu_5(n), \nu_7(n), \dots \right)$

of prime exponents. Then ${\varphi}$ can be thought of as an operation which takes each nonzero component, decreases it by one, and then adds some particular vector back. For example, if ${\nu_7(n) > 0}$ then ${\nu_7}$ is decreased by one and each of ${\nu_2(n)}$ and ${\nu_3(n)}$ is increased by one (since ${7 - 1 = 2 \cdot 3}$). In any case, if you look at this behavior for long enough you will see that the ${\nu_2}$ coordinate is a natural way to “track time” in successive ${\varphi}$ operations; once you figure this out, getting the bound of ${\log_3 n}$ is quite natural. (Details left as exercise to reader.)
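
For readers who want to watch this behavior, here is a minimal sketch. It assumes nothing beyond the standard product formula for ${\varphi}$; the helper names are invented for illustration. It prints the exponent vector at each step and numerically checks the claimed bound for small ${n}$, which is evidence rather than a proof.

```python
def factorize(n):
    """Prime exponents of n as a dict {p: nu_p(n)}, by trial division."""
    exps = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps


def phi(n):
    """Euler's totient via phi(n) = prod (p - 1) * p^(e - 1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p - 1) * p ** (e - 1)
    return result


def trajectory(n):
    """The sequence n, phi(n), phi(phi(n)), ..., 1."""
    seq = [n]
    while seq[-1] > 1:
        seq.append(phi(seq[-1]))
    return seq


def steps_to_one(n):
    """Number of applications of phi needed to reach 1."""
    return len(trajectory(n)) - 1


# Watch the exponent vector evolve, e.g. 7 -> 6 -> 2 -> 1.
for value in trajectory(7):
    print(value, factorize(value))

# Numerical check of the bound: k steps should force n <= 3^k.
assert all(3 ** steps_to_one(n) >= n for n in range(2, 5000))
```

Printing `factorize` along a few longer trajectories (say starting from ${n = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11}$) is a quick way to see how the ${\nu_2}$ coordinate moves.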

Now when I read through the solutions, I found that many of them had not really tried to think of the problem in such a “structured” way, and had tried to directly solve it by for example trying to prove ${\varphi(n) \ge n/3}$ (which is false) or something similar to this. I realized that had the students just ignored the task “prove ${n \le 3^k}$” and spent some time getting a better understanding of the ${\varphi}$ structure, they would have had a much better chance at solving the problem. Why had I known that structural thinking would be helpful? I couldn’t quite explain it, but it had something to do with the fact that the “main object” of the question was “set in stone”; there were no “degrees of freedom” in it, and it was concrete enough that I felt like I could understand it. Once I understood how multiple ${\varphi}$ operations behaved, the bit about ${\log_3 n}$ almost served as an “answer extraction” mechanism.
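
For a concrete instance of why the inequality ${\varphi(n) \ge n/3}$ fails (a quick check, with ${n = 30}$ chosen as a small example):

$\displaystyle \varphi(30) = 30 \cdot \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{4}{5} = 8 < 10 = \frac{30}{3}.$

Each additional distinct prime factor shrinks the ratio ${\varphi(n)/n}$ further, so no single-step bound of this shape can work.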

These thoughts led to the recent development of a class which I named Rigid, which is all about problems where the point is not to immediately try to prove what the question asks for, but to first step back and understand completely how a particular rigid structure (like the ${\varphi}$ in this problem) behaves, and to then solve the problem using this understanding.