Some Thoughts on Olympiad Material Design

(This is a bit of a follow-up to the solution reading post last month. Spoiler warnings: USAMO 2014/6, USAMO 2012/2, TSTST 2016/4, and hints for ELMO 2013/1, IMO 2016/2.)

I want to say a little about the process which I use to design my olympiad handouts and classes these days (and thus by extension the way I personally think about problems). The short summary is that my teaching style is centered around showing connections and recurring themes between problems.

Now let me explain this in more detail.

1. Main ideas

Solutions to olympiad problems can look quite different from one another at a surface level, but typically they center around one or two main ideas, as I describe in my post on reading solutions. Because details are easy to work out once you have the main idea, as far as learning is concerned you can more or less throw away the details and pay most of your attention to main ideas.

Thus whenever I solve an olympiad problem, I make a deliberate effort to summarize the solution in a few sentences, such that I basically know how to do it from there. I also make a deliberate effort, whenever I write up a solution in my notes, to structure it so that my future self can see all the key ideas at a glance and thus be able to understand the general path of the solution immediately.

The example I’ve previously mentioned is USAMO 2014/6.

Example 1 (USAMO 2014, Gabriel Dospinescu)

Prove that there is a constant {c>0} with the following property: If {a, b, n} are positive integers such that {\gcd(a+i, b+j)>1} for all {i, j \in \{0, 1, \dots, n\}}, then

\displaystyle  \min\{a, b\}> (cn)^n.

If you look at any complete solution to the problem, you will see a lot of technical estimates involving {\zeta(2)} and the like. But the main idea is very simple: “consider an {N \times N} table of primes and note the small primes cannot adequately cover the board, since {\sum p^{-2} < \frac{1}{2}}”. Once you have this main idea the technical estimates are just the grunt work that you force yourself to do if you’re a contestant (and don’t do if you’re retired like me).

Thus the study of olympiad problems is reduced to the study of main ideas behind these problems.

2. Taxonomy

So how do we come up with the main ideas? Of course I won’t be able to answer this question completely, because therein lies most of the difficulty of olympiads.

But I do have some progress in this way. It comes down to seeing how main ideas are similar to each other. I spend a lot of time trying to classify the main ideas into categories or themes, based on how similar they feel to one another. If I see one theme pop up over and over, then I can make it into a class.

I think olympiad taxonomy is severely underrated, and generally not done correctly. The status quo is that people do bucket sorts based on the particular technical details which are present in the problem. This is correlated with the main ideas, but the two do not always coincide.

An example where technical sort works okay is Euclidean geometry. Here is a simple example: harmonic bundles in projective geometry. As I explain in my book, there are a few “basic” configurations involved:

  • Midpoints and parallel lines
  • The Ceva / Menelaus configuration
  • Harmonic quadrilateral / symmedian configuration
  • Apollonian circle (right angle and bisectors)

(For a reference, see Lemmas 2, 4, 5 and Exercise 0 here.) Thus from experience, any time I see one of these pictures inside the current diagram, I think to myself that “this problem feels projective”; and if there is a way to do so I try to use harmonic bundles on it.

An example where technical sort fails is the “pigeonhole principle”. A typical problem in such a class looks something like USAMO 2012/2.

Example 2 (USAMO 2012, Gregory Galperin)

A circle is divided into congruent arcs by {432} points. The points are colored in four colors such that some {108} points are colored Red, some {108} points are colored Green, some {108} points are colored Blue, and the remaining {108} points are colored Yellow. Prove that one can choose three points of each color in such a way that the four triangles formed by the chosen points of the same color are congruent.

It’s true that the official solution uses the words “pigeonhole principle” but that is not really the heart of the matter; the key idea is that you consider all possible rotations and count the number of incidences. (In any case, such calculations are better done using expected value anyways.)

Now why is taxonomy a good thing for learning and teaching? The reason is that building connections and seeing similarities is most easily done by simultaneously presenting several related problems. I’ve actually mentioned this already in a different blog post, but let me give the demonstration again.

Suppose I wrote down the following:

\displaystyle  \begin{array}{lll} A1 & B11 & C8 \\ A9 & B44 & C27 \\ A49 & B33 & C343 \\ A16 & B99 & C1 \\ A25 & B22 & C125 \end{array}

You can tell what each of the {A}‘s, {B}‘s, {C}‘s have in common by looking for a few moments. But what happens if I intertwine them?

\displaystyle  \begin{array}{lllll} B11 & C27 & C343 & A1 & A9 \\ C125 & B33 & A49 & B44 & A25 \\ A16 & B99 & B22 & C8 & C1 \end{array}

This is the same information, but now you have to work much harder to notice the association between the letters and the numbers they’re next to.

This is why, if you are an olympiad student, I strongly encourage you to keep a journal or blog of the problems you’ve done. Solving olympiad problems takes lots of time and so it’s worth it to spend at least a few minutes jotting down the main ideas. And once you have enough of these, you can start to see new connections between problems you haven’t seen before, rather than being confined to thinking about individual problems in isolation. (Additionally, it means you will never have redo problems to which you forgot the solution — learn from my mistake here.)

3. Ten buckets of geometry

I want to elaborate more on geometry in general. These days, if I see a solution to a Euclidean geometry problem, then I mentally store the problem and solution into one (or more) buckets. I can even tell you what my buckets are:

  1. Direct angle chasing
  2. Power of a point / radical axis
  3. Homothety, similar triangles, ratios
  4. Recognizing some standard configuration (see Yufei for a list)
  5. Doing some length calculations
  6. Complex numbers
  7. Barycentric coordinates
  8. Inversion
  9. Harmonic bundles or pole/polar and homography
  10. Spiral similarity, Miquel points

which my dedicated fans probably recognize as the ten chapters of my textbook. (Problems may also fall in more than one bucket if for example they are difficult and require multiple key ideas, or if there are multiple solutions.)

Now whenever I see a new geometry problem, the diagram will often “feel” similar to problems in a certain bucket. Exactly what I mean by “feel” is hard to formalize — it’s a certain gut feeling that you pick up by doing enough examples. There are some things you can say, such as “problems which feature a central circle and feet of altitudes tend to fall in bucket 6”, or “problems which only involve incidence always fall in bucket 9”. But it seems hard to come up with an exhaustive list of hard rules that will do better than human intuition.

4. How do problems feel?

But as I said in my post on reading solutions, there are deeper lessons to teach than just technical details.

For examples of themes on opposite ends of the spectrum, let’s move on to combinatorics. Geometry is quite structured and so the themes in the main ideas tend to translate to specific theorems used in the solution. Combinatorics is much less structured and many of the themes I use in combinatorics cannot really be formalized. (Consequently, since everyone else seems to mostly teach technical themes, several of the combinatorics themes I teach are idiosyncratic, and to my knowledge are not taught by anyone else.)

For example, one of the unusual themes I teach is called Global. It’s about the idea that to solve a problem, you can just kind of “add up everything at once”, for example using linearity of expectation, or by double-counting, or whatever. In particular these kinds of approach ignore the “local” details of the problem. It’s hard to make this precise, so I’ll just give two recent examples.

Example 3 (ELMO 2013, Ray Li)

Let {a_1,a_2,\dots,a_9} be nine real numbers, not necessarily distinct, with average {m}. Let {A} denote the number of triples {1 \le i < j < k \le 9} for which {a_i + a_j + a_k \ge 3m}. What is the minimum possible value of {A}?

Example 4 (IMO 2016)

Find all integers {n} for which each cell of {n \times n} table can be filled with one of the letters {I}, {M} and {O} in such a way that:

  • In each row and column, one third of the entries are {I}, one third are {M} and one third are {O}; and
  • in any diagonal, if the number of entries on the diagonal is a multiple of three, then one third of the entries are {I}, one third are {M} and one third are {O}.

If you look at the solutions to these problems, they have the same “feeling” of adding everything up, even though the specific techniques are somewhat different (double-counting for the former, diagonals modulo {3} for the latter). Nonetheless, my experience with problems similar to the former was immensely helpful for the latter, and it’s why I was able to solve the IMO problem.

5. Gaps

This perspective also explains why I’m relatively bad at functional equations. There are some things I can say that may be useful (see my handouts), but much of the time these are just technical tricks. (When sorting functional equations in my head, I have a bucket called “standard fare” meaning that you “just do work”; as far I can tell this bucket is pretty useless.) I always feel stupid teaching functional equations, because I never have many good insights to say.

Part of the reason is that functional equations often don’t have a main idea at all. Consequently it’s hard for me to do useful taxonomy on them.

Then sometimes you run into something like the windmill problem, the solution of which is fairly “novel”, not being similar to problems that come up in training. I have yet to figure out a good way to train students to be able to solve windmill-like problems.

6. Surprise

I’ll close by mentioning one common way I come up with a theme.

Sometimes I will run across an olympiad problem {P} which I solve quickly, and think should be very easy, and yet once I start grading {P} I find that the scores are much lower than I expected. Since the way I solve problems is by drawing experience from similar previous problems, this must mean that I’ve subconsciously found a general framework to solve problems like {P}, which is not obvious to my students yet. So if I can put my finger on what that framework is, then I have something new to say.

The most recent example I can think of when this happened was TSTST 2016/4 which was given last June (and was also a very elegant problem, at least in my opinion).

Example 5 (TSTST 2016, Linus Hamilton)

Let {n > 1} be a positive integers. Prove that we must apply the Euler {\varphi} function at least {\log_3 n} times before reaching {1}.

I solved this problem very quickly when we were drafting the TSTST exam, figuring out the solution while walking to dinner. So I was quite surprised when I looked at the scores for the problem and found out that empirically it was not that easy.

After I thought about this, I have a new tentative idea. You see, when doing this problem I really was thinking about “what does this {\varphi} operation do?”. You can think of {n} as an infinite tuple

\displaystyle  \left(\nu_2(n), \nu_3(n), \nu_5(n), \nu_7(n), \dots \right)

of prime exponents. Then the {\varphi} can be thought of as an operation which takes each nonzero component, decreases it by one, and then adds some particular vector back. For example, if {\nu_7(n) > 0} then {\nu_7} is decreased by one and each of {\nu_2(n)} and {\nu_3(n)} are increased by one. In any case, if you look at this behavior for long enough you will see that the {\nu_2} coordinate is a natural way to “track time” in successive {\varphi} operations; once you figure this out, getting the bound of {\log_3 n} is quite natural. (Details left as exercise to reader.)

Now when I read through the solutions, I found that many of them had not really tried to think of the problem in such a “structured” way, and had tried to directly solve it by for example trying to prove {\varphi(n) \ge n/3} (which is false) or something similar to this. I realized that had the students just ignored the task “prove {n \le 3^k}” and spent some time getting a better understanding of the {\varphi} structure, they would have had a much better chance at solving the problem. Why had I known that structural thinking would be helpful? I couldn’t quite explain it, but it had something to do with the fact that the “main object” of the question was “set in stone”; there was no “degrees of freedom” in it, and it was concrete enough that I felt like I could understand it. Once I understood how multiple {\varphi} operations behaved, the bit about {\log_3 n} almost served as an “answer extraction” mechanism.

These thoughts led to the recent development of a class which I named Rigid, which is all about problems where the point is not to immediately try to prove what the question asks for, but to first step back and understand completely how a particular rigid structure (like the {\varphi} in this problem) behaves, and to then solve the problem using this understanding.


Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

1. Synopsis

Recently I was working on an elegant problem which was the original problem 6 for the 2015 International Math Olympiad, which reads as follows:


[IMO Shortlist 2015 Problem C6] Let {S} be a nonempty set of positive integers. We say that a positive integer {n} is clean if it has a unique representation as a sum of an odd number of distinct elements from {S}. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers have exactly one representation as a sum of an even subset of {S}. Then, the problem reduces to the following:


Show that if {s_1 < s_2 < \dots} is an increasing sequence of positive integers and {P(x)} is a nonzero polynomial then we cannot have

\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)

as formal series.

To see this, note that all sufficiently large {x^N} have coefficient {1 + (-1) = 0}. Now, the intuitive idea is obvious: the root {1} appears with finite multiplicity in {P} so we can put {P(x) = (1-x)^k Q(x)} where {Q(1) \neq 0}, and then we get that {1-x} on the RHS divides {P} too many times, right?

Well, there are some obvious issues with this “proof”: for example, consider the equality

\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.

The right-hand side is “divisible” by {1-x}, but the left-hand side is not (as a polynomial).

But we still want to use the idea of plugging {x \rightarrow 1^-}, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there’s actually two distinct notions of “generating function”, namely a “formal series” and “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting {x}, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function {p(n)}.

2. Formal Series {\neq} Functional Series

I’m assuming you’ve all heard the definition of {\sum_k c_kx^k}. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over {\mathbb C}, but one can of course use {\mathbb R} instead.

The formal series is easier to describe:

Definition 1

A formal series {F} is an infinite sequence {(a_n)_n = (a_0, a_1, a_2, \dots)} of complex numbers. We often denote it by {\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}. The set of formal series is denoted {\mathbb C[ [x] ]}.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in {x}”.

On the other hand, a functional series is more involved, because it has to support substitution of values of {x} and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series {G} (centered at zero) is a function {G : U \rightarrow \mathbb C}, where {U} is an open disk centered at {0} or {U = \mathbb C}. We require that there exists an infinite sequence {(c_0, c_1, c_2, \dots)} of complex numbers satisfying

\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).

(The limit is take in the usual metric of {\mathbb C}.) In that case, the {c_i} are unique and called the coefficients of {G}.

This is often written as {G(x) = \sum_n c_n x^n}, with the open set {U} suppressed.

Remark 3

Some remarks on the definition of functional series:

  • This is enough to imply that {G} is holomorphic (and thus analytic) on {U}.
  • For experts: note that I’m including the domain {U} as part of the data required to specify {G}, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by {U}.
  • For experts: Perhaps non-standard, {U \neq \{0\}}. Otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series {F} is a sequence, while a functional series {G} is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series {G} we can extract its coefficients and make them into a formal series {F}. So, for lack of better notation:

Definition 4

If {F = (a_n)_n} is a formal series, and {G : U \rightarrow \mathbb C} is a functional series whose coefficients equal {F}, then we write {F \simeq G}.

3. Finite operations

Now that we have formal and functional series, we can define sums. Since these are different types of objects, we will have to run definitions in parallel and then ideally check that they respect {\simeq}.

For formal series:

Definition 5

Let {F_1 = (a_n)_n} and {F_2 = (b_n)_n} be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes {\mathbb C[ [x] ]} into a ring, with identity {(0,0,0,\dots)} and {(1,0,0,\dots)}.

We also define the derivative {F = (a_n)_n} by {F' = ((n+1)a_{n+1})_n}.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use {\sum_n a_nx^n} more. But officially, all definitions for formal series are in terms of the coefficients alone; these presence of {x} serves as motivation only.

Exercise 6

Show that if {F = \sum_n a_nx^n} is a formal series, then it has a multiplicative inverse if and only if {a_0 \neq 0}.

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let {G_1 : U \rightarrow \mathbb C} and {G_2 : U \rightarrow \mathbb C} be functional series with the same domain {U}. Then {G_1 \pm G_2} and {G_1 \cdot G_2} are defined pointwise.

If {G : U \rightarrow \mathbb C} is a functional series (hence holomorphic), then {G'} is defined poinwise.

If {G} is nonvanishing on {U}, then {1/G : U \rightarrow \mathbb C} is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose {F}, {F_1}, {F_2} are formal series, and {G}, {G_1}, {G_2} are functional series {U \rightarrow \mathbb C}. Assume {F \simeq G}, {F_1 \simeq G_1}, {F_2 \simeq G_2}.

  • {F_1 \pm F_2 \simeq G_1 \pm G_2}, {F_1 \cdot F_2 = G_1 \cdot G_2}.
  • {F' \simeq G'}.
  • If {1/G} is defined, then {1/F} is defined and {1/F \simeq 1/G}.

So far so good: as long as we’re doing finite operations. But once we step beyond that, things begin to go haywire.

4. Limits

We need to start considering limits of {(F_k)_k} and {(G_k)_k}, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let {F_1 = \sum_n a_n x^n} and {F_2 = \sum_n b_n x^n} be formal series, and define the difference by

\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}

This function makes {\mathbb C[[x]]} into a metric space, so we can discuss limits in this space. Actually, it is a normed vector space obtained by {\left\lVert F \right\rVert = d(F,0)} above.

Thus, {\lim_{k \rightarrow \infty} F_k = F} if each coefficient of {x^n} eventually stabilizes as {k \rightarrow \infty}. For example, as formal series we have that {(1,-1,0,0,\dots)}, {(1,0,-1,0,\dots)}, {(1,0,0,-1,\dots)} converges to {1 = (1,0,0,0\dots)}, which we write as

\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.

As for functional series, since they are functions on the same open set {U}, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, {F_k \simeq G_k} for every {k}.

  • Here is an example showing that if {\lim_k F_k = F}, the functions {G_k} may not converge even pointwise. Indeed, just take {F_k = 1 - x^k} as before, and let {U = \{ z : |z| < 2 \}}.
  • Here is an example showing that even if {G_k \rightarrow G} uniformly, {\lim_k F_k} may not exist. Take {G_k = 1 - 1/k} as constant functions. Then {G_k \rightarrow 1}, but {\lim_k F_k} doesn’t exist because the constant term never stabilizes (in the combinatorial sense).
  • The following example from this math.SE answer by Robert Israel shows that it’s possible that {F = \lim_k F_k} exists, and {G_k \rightarrow G} pointwise, and still {F \not\simeq G}. Let {U} be the open unit disk, and set

    \displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

    for {k \ge 1}. By Runge theorem there’s a polynomial {p_k(z)} such that

    \displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.


    \displaystyle G_k(z) = z^{k+1} p_k(z)

    is the desired counterexample (with {F_k} being the sequence of coefficients from {G}). Indeed by construction {\lim_k F_k = 0}, since {\left\lVert F_k \right\rVert \le 2^{-k}} for each {k}. Alas, {|g_k(z) - z| \le 2/k} for {z \in A_k \cup B_k}, so {G_k \rightarrow z} converges pointwise to the identity function.

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence and both limits exist is sufficient)

Suppose that {G_k \rightarrow G} converges uniformly. Then if {F_k \simeq G_k} for every {k}, and {\lim_k F_k = F}, then {F \simeq G}.

Proof: Here is a proof, copied from this math.SE answer by Joey Zhou. WLOG {G = 0}, and let {g_n(z) = \sum{a^{(n)}_kz^k}}. It suffices to show that {a_k = 0} for all {k}. Choose any {0<r<1}. By Cauchy’s integral formula, we have

\displaystyle \begin{aligned} \left|a_k - a^{(n)}_k\right| &= \left|\frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{g(z)-g_n(z)}{z^{n+1}}\text{ d}z}\right| \\ & \le\frac{1}{2\pi}(2\pi r)\frac{1}{r^{n+1}}\max\limits_{|z|=r}{|g(z)-g_n(z)|} \xrightarrow{n\rightarrow\infty} 0 \end{aligned}

since {g_n} converges uniformly to {g} on {U}. Hence, {a_k = \lim\limits_{n\rightarrow\infty}{a^{(n)}_k}}. Since {a^{(n)}_k = 0} for {n\ge k}, the result follows. \Box

The take-away from this section is that limits are relatively poorly behaved.

5. Infinite sums and products

Naturally, infinite sums and products are defined by taking the limit of partial sums and limits. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On {e^{1+x}})

The expression

\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}

does not make sense as a formal series: we observe that for every {N} the constant term of the partial sum changes.

But this does converge (uniformly, even) to a functional series on {U = \mathbb C}, namely to {e^{1+x}}.

Exercise 13

Let {(F_k)_{k \ge 1}} be formal series.

  • Show that an infinite sum {\sum_{k=1}^\infty F_k(x)} converges as formal series exactly when {\lim_k \left\lVert F_k \right\rVert = 0}.
  • Assume for convenience {F_k(0) = 1} for each {k}. Show that an infinite product {\prod_{k=0}^{\infty} (1+F_k)} converges as formal series exactly when {\lim_k \left\lVert F_k-1 \right\rVert = 0}.

Now the upshot is that one example of a convergent formal sum is the expression {\lim_{N} \sum_{n=0}^N a_nx^n} itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into functional one.

Theorem 14 (Constructing {G} from {F})

Let {F = \sum a_nx^n} be a formal series and let

\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|c_n|}}.

If {r > 0} then there exists a functional series {G} on {U = \{ |z| < r \}} such that {F \simeq G}.

Proof: Let {F_k} and {G_k} be the corresponding partial sums of {c_0x^0} to {c_kx^k}. Then by Cauchy-Hadamard theorem, we have {G_k \rightarrow G} uniformly on (compact subsets of) {U}. Also, {\lim_k F_k = F} by construction. \Box

This works less well with products: for example we have

\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})

as formal series, but we can’t “plug in {x=1}”, for example,

6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})

cannot hold as formal series. We know that tacitly, this just means

\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)

as formal series.

Here is a solution obtained only by only considering coefficients, presented by Qiaochu Yuan from this MathOverflow question.

Both sides have constant coefficient {1}, so we may invert them; thus it suffices to show we cannot have

\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}

as formal power series.

The coefficients on the LHS have asymptotic growth a polynomial times an exponential.

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where {s_j = j}, which gives the partition function {p(n)} mentioned before). So the two rates of growth can’t match.

New algebra handouts on my website

For olympiad students: I have now published some new algebra handouts. They are:

  • Introduction to Functional Equations, which cover the basic techniques and theory for FE’s typically appearing on olympiads like USA(J)MO.
  • Monsters, an advanced handout which covers functional equations that have pathological solutions. It covers in detail the solutions to Cauchy functional equation.
  • Summation, which is a compilation of various types of olympiad-style sums like generating functions and multiplicative number theory.

I have also uploaded:

  • English, notes on proof-writing that I used at the 2016 MOP (Mathematical Olympiad Summer Program).

You can download all these (and other handouts) from my MIT website. Enjoy!

Against the “Research vs. Olympiads” Mantra

There’s a Mantra that you often hear in math contest discussions: “math olympiads are very different from math research”. (For known instances, see O’Neil, Tao, and more. More neutral stances: Monks, Xu.)

It’s true. And I wish people would stop saying it.

Every time I’ve heard the Mantra, it set off a little red siren in my head: something felt wrong. And I could never figure out quite why until last July. There was some (silly) forum discussion about how Allen Liu had done extraordinarily on math contests over the past year. Then someone says:

A: Darn, what math problem can he not do?!

B: I’ll go out on a limb and say that the answer to this is “most of the problems worth asking.” We’ll see where this stands in two years, at which point the answer will almost certainly change, but research \neq Olympiads.

Then it hit me.

Ping-pong vs. Tennis

Let’s try the following thought experiment. Consider a world-class ping-pong player, call her Sarah. She has a fan-base talking about her pr0 ping-pong skills. Then someone comes along as says:

Well, table tennis isn’t the same as tennis.

To which I and everyone else reasonable would say, “uh, so what?”. It’s true, but totally irrelevant; ping-pong and tennis are just not related. Maybe Sarah will be better than average at tennis, but there’s no reason to expect her to be world-class in that too.

And yet we say exactly the same thing for olympiads versus research. Someone wins the IMO, out pops the Mantra. Even if the Mantra is true when taken literally, it’s implicitly sending the message there’s something wrong with being good at contests and not good at research.

So now I ask: just what is wrong with that? To answer this question, I first need to answer: “what is math?”.

There’s been a trick played with this debate, and you can’t see it unless you taboo the word “math”. The word “math” can refer to a bunch of things, like:

  • Training for contest problems like USAMO/IMO, or
  • Learning undergraduate/graduate materials like algebra and analysis, or
  • Working on open problems and conjectures (“research”).

So here’s the trick. The research community managed to claim the name “math”, leaving only “math contests” for the olympiad community. Now the sentence

“Math contests should be relevant to math”

seems totally innocuous. But taboo the world “math”, and you get

“Olympiads should be relevant to research”

and then you notice something’s wrong. In other words, since “math” is a substring of “math contests”, it suddenly seems like the olympiads are subordinate to research. All because of an accident in naming.

Since when? Everyone agrees that olympiads and research are different things, but it does not then follow that “olympiads are useless”. Even if ping-pong is called “table tennis”, that doesn’t mean the top ping-pong players are somehow inferior to top tennis players. (And the scary thing is that in a world without the name “ping-pong”, I can imagine some people actually thinking so.)

I think for many students, olympiads do a lot of good, independent of any value to future math research. Math olympiads give high school students something interesting to work on, and even the training process for a contest such that the IMO carries valuable life lessons: it teaches you how to work hard even in the face of possible failure, and what it’s like to be competitive at an international level (i.e. what it’s like to become really good at something after years of hard work). The peer group that math contests give is also wonderful, and quite similar to the kind of people you’d meet at a top-tier university (and in some cases, they’re more or less the same people). And the problem solving ability you gain from math contests is indisputably helpful elsewhere in life. Consequently, I’m well on record as saying the biggest benefits of math contests have nothing to do with math.

There are also more mundane (but valid) reasons (they help get students out of the classroom, and other standard blurbs about STEM and so on). And as a matter of taste I also think contest problems are interesting and beautiful in their own right. You could even try to make more direct comparisons (for example, I’d guess the average arXiv paper in algebraic geometry gets less attention than the average IMO geometry problem), but that’s a point for another blog post entirely.

The Right and Virtuous Path

Which now leads me to what I think is a culture issue.

MOP alumni prior to maybe 2010 or so were classified into two groups. They would either go on to math research, which was somehow seen as the “right and virtuous path“, or they would defect to software/finance/applied math/etc. Somehow there is always this implicit, unspoken message that the smart MOPpers do math research and the dumb MOPpers drop out.

I’ll tell you how I realized why I didn’t like the Mantra: it’s because the only time I hear the Mantra is when someone is belittling olympiad medalists.

The Mantra says that the USA winning the IMO is no big deal. The Mantra says Allen Liu isn’t part of the “smart club” until he succeeds in research too. The Mantra says that the countless time and energy put into running each year’s MOP are a waste of time. The Mantra says that the students who eventually drop out of math research are “not actually good at math” and “just good at taking tests”. The Mantra even tells outsiders that they, too, can be great researchers, because olympiads are useless anyways.

The Mantra is math research’s recruiting slogan.

And I think this is harmful. The purpose of olympiads was never to produce more math researchers. If it’s really the case that olympiads and research are totally different, then we should expect relatively few olympiad students to go into research; yet in practice, a lot of them do. I think one could make a case that a lot of the past olympiad students are going into math research without realizing that they’re getting into something totally unrelated, just because the sign at the door said “math”. One could also make a case that it’s very harmful for those that don’t do research, or try research and then decide they don’t like it: suddenly these students don’t think they’re “good at math” any more, they’re not smart enough be a mathematician, etc.

But we need this kind of problem-solving skill and talent too much for it to all be spent on computing R(6,6). Richard Rusczyk’s take from Math Prize for Girls 2014 is:

When people ask me, am I disappointed when my students don’t go off and be mathematicians, my answer is I’d be very disappointed if they all did. We need people who can think about these complex problems and solve really hard problems they haven’t seen before everywhere. It’s not just in math, it’s not just in the sciences, it’s not just in medicine — I mean, what we’d give to get some of them in Congress!

Academia is a fine career, but there’s tons of other options out there: the research community may denounce those who switch out as failures, but I’m sure society will take them with open arms.

To close, I really like this (sarcastic) comment from Steven Karp (near bottom):

Contest math is inaccessible to over 90% of people as it is, and then we’re supposed to tell those that get it that even that isn’t real math? While we’re at it, let’s tell Vi Hart to stop making videos because they don’t accurately represent math research.

Addendums (response to comments)

Thanks first of all for the many long and thoughtful comments from everyone (both here, on Facebook, in private, and so on). It’s given me a lot to think about.

Here’s my responses to some of the points that were raised, which is necessarily incomplete because of the volume of discussion.

  1. To start off, it was suggested I should explicitly clarify: I do not mean to imply that people who didn’t do well on contests cannot do well in math research. So let me say that now.

  2. My favorite comment that I got was that in fact this whole post pattern matches with bravery debates.

    On one hand you have lots of olympiad students who actually FEEL BAD about winning medals because they “weren’t doing real math”. But on the other hand there are students whose parents tell them to not pursue math as a major or career because of low contest scores. These students (and their parents) would benefit a lot from the Mantra; so I concede that there are indeed good use cases of the Mantra (such as those that Anonymous Chicken, betaveros describe below) and in particular the Mantra is not intrinsically bad.

    Which of these use is the “common use” probably depends on which tribes you are part of (guess which one I see more?). It’s interesting in that in this case, the two sides actually agree on the basic fact (that contests and research are not so correlated).

  3. Some people point out that research is a career while contests aren’t. I am not convinced by this; I don’t think “is a career” is a good metric for measuring value to society, and can think of several examples of actual jobs that I think really should not exist (not saying any names). In addition, I think that if the general public understood what mathematicians actually do for a career, they just might be a little less willing to pay us.

    I think there’s an interesting discussion about whether contests / research are “valuable” or not, but I don’t think the answer is one-sided; this would warrant a whole different debate (and would derail the entire post if I tried to address it).

  4. Some people point out that training for olympiads yields diminishing returns (e.g. learning Muirhead and Schur is probably not useful for anything else). I guess this is true, but isn’t it true of almost anything? Maybe the point is supposed to be “olympiads aren’t everything”, which is agreeable (see below).

  5. The other favorite comment I got was from Another Chicken, who points out below that the olympiad tribe itself is elitist: they tend to wall themselves off from outsiders (I certainly do this), and undervalue anything that isn’t hard technical problems.

    I concede these are real problems with the olympiad community. Again, this could be a whole different blog post.

    But I think this comment missed the point of this post. It is probably fine (albeit patronizing) to encourage olympiad students to expand; but I have a big problem with framing it as “spend time on not-contests because research“. That’s the real issue with the Mantra: it is often used as a recruitment slogan, telling students that research is the next true test after the IMO has been conquered.

    Changing the Golden Metric from olympiads to research seems to just make the world more egotistic than it already is.

Against Hook-Length on USAMO 2016/2

A recent USAMO problem asked the contestant to prove that

\displaystyle  (k^2)! \cdot \prod_{j=0}^{k-1} \frac{j!}{(j+k)!}

is an integer for every {k \in \mathbb N}. Unfortunately, it appears that this is a special case of the so-called hook-length formula, applied to a {k \times k} Young tableau, and several students appealed to this fact without proof to produce one-line solutions lacking any substance. This has led to a major controversy about how such solutions should be graded, in particular whether they should receive the {7^-} treatment for “essentially correct solutions”, or the {0^+} treatment for “essentially not solved”.

In this post I want to argue that I think that these solutions deserve a score of {1}.

1. Disclaimers

However, before I do so, I would like to make some disclaimers:

  • This issue is apparently extremely polarized: everyone seems to strongly believe one side or the other.
  • This was an extremely poor choice of a USAMO problem, and so there is no “good” way to grade the HL solutions, only a least bad way. The correct solution to the dilemma is to not have used the problem at all. Yet here’s the bloodied patient, and here we are in the emergency room.
  • While I am a grader for the USAMO, I am one of many graders, and what I say in this post does not necessarily reflect the viewpoints of other USAMO graders or what score the problem will actually receive. In other words, this is my own view and not official in any way.

One last remark is that I do not consider the hook-length formula to be a “well-known” result like so many contestants seem to want to pretend it is. However, this results in the danger of what constitutes “well-known” or not. So in what follows I’ll pretend that the HL formula is about as well-known as, say, the Pascal or Zsigmondy theorem, even though I personally don’t think that this is the case.

One final disclaimer: I am unlikely to respond to further comments about an issue this polarized, since I have already spent many hours doing so, and I’ve done enough of my duty. So if I don’t respond to a comment of yours, please don’t take it personally.

2. Rule for citations

Here is the policy I use for citations when grading:

  • You can cite any named result as long as it does not trivialize the problem.
  • If the result trivializes the problem, then you are required to prove the result (or otherwise demonstrate you understand the proof) in order to use it.

This is what I’ve heard every time I have asked or answered this question; I have never heard anything to the contrary.

Some people apparently want to nit-pick about how “trivialize” is not objective. I think this is silly. If you really want a definition of “trivialize”, you can take “equivalent to the problem or a generalization of the problem” as a rule of thumb.

Clearly it follows from my rule above that the hook-length formula deserves {0^+} grading, so the remainder of the post is dedicated to justifying why I think this is the correct rule.

3. Subjective grading

I would rather have an accurate subjective criteria than a poor objective one.

In an ideal world, grading would be completely objective: a solution which solves the problem earns {7^-} points and a solution which does not solve the problem earns {0^+} points. But in practice this is of course not possible, unless we expect our contestants to write their solutions in a formal language like Coq. Since this is totally infeasible, we instead use informal proofs: students write their solutions in English and human graders assign a score based on whether the solution could in principle be compiled into a formal language proof.

What this means is that in grading, there are subjective decisions made all of the time. A good example of this is omitting steps. Suppose a student writes “case {B} is similar [to case {A}]”. Then the grader has to decide whether or not the student actually knows that the other case is similar or not, or is just bluffing. One one extreme, if {A} and {B} really are identical, then the grader would probably accept the claim. On the other extreme, if {A} and {B} have some substantial differences, then the grader would almost certainly reject the claim. This implies that there is some intermediate “gray area” which cannot be objectively defined.

Fortunately, most of these decisions are clear, thus USAMO grading appears externally to be mostly objective. In other words, accurate grading and objective grading are correlated (yay math!), but they are not exactly the same. All scores are ultimately discretion by human graders, and not some set of rigid guidelines.

Citations are a special case of this. By citing a theorem, a student is taking advantage of convention that a well-known proof to both the student and grader can be omitted from the write-up.

4. Citing the problem

In light of this I don’t think you should get points for citing the problem. I think we all agree that if a student writes “this is a special case of IMO Shortlist 1999 G8”, they shouldn’t get very many points.

The issue with citing HL in lieu of solving the problem is that the hook-length formula is very hard to prove, and it is not reasonable to do so in an olympiad setting. Consequently, it is near certain that these students have essentially zero understanding of the solution to the problem; they have not solved it, they have only named it.

Similarly, I think you should not earn points for trivializing a problem using Dirichlet, Prime Number Theorem, Zsigmondy, etc. This is also historically consistent with the way that grading has been done in the past (hi Palmer!).

5. Citing intermediate steps

Now consider the usage of difficult theorems such as Dirichlet, Prime Number Theorem, etc. on a solution in which they are merely an intermediate step rather than the entire problem. The consensus here is that citing these results is okay (though it is not unanimous; some super harsh graders also want to take off points here, but they are very few in my experience).

I think this is acceptable, because in this case the contestant has done 90% of the solution, and cannot do the remaining 10% but recognize it as well-known. In my eyes this is considered as essentially solving the problem, because the last missing bit is a standard fact. But somehow if you cannot do 100% of the problem, I don’t think that counts as solving the problem.

What I mean is there is a subjective dependence both on how much of the solution the student actually understands, and how accessible the result is. Unfortunately, the HL solutions earn the worst possible rank in both of the categories: the student understands 0% of the solution and moreover the result is completely inaccessible.

6. Common complaints

Here are the various complaints that people have made to me.

  • “HL is well-known.”
    Well, I don’t think it is, but in any case that’s not the point; you cannot cite a result which is (a) more general than the problem, and (b) to which you don’t understand the proof.

  • “Your criteria is subjective!”
    So what? I would rather have an accurate subjective criteria than a poor objective one.

  • “It’s the problem writer’s fault, so students should get {7}.”
    This is not an ethics issue. It is not appropriate to award points of sympathy on a serious contest such as the USAMO; score inflation is not going to help anyone.

  • “It’s elitist for the graders to decide what counts as trivialized.”
    That’s the grader’s job. Again I would rather have an accurate subjective criteria than a poor objective one. In practice I think the graders are very reasonable about decisions.

  • “I don’t think anyone disputes that the HL solution is a correct one, so certainly the math dictates a {7^-}.”
    I dispute it: I don’t think citing HL is a solution at all.

  • “Why do we let students use Pascal / Cauchy / etc?”
    Because these results are much more reasonable to prove, and the “one-line” solutions using Pascal and Cauchy are not completely trivial; the length of a solution is not the same thing as its difficulty. Of course this is ultimately subjective, but I would rather have an accurate subjective criteria than a poor objective one.

  • “HL solutions have substance, because we made an observation that the quantity is actually the number of ways to do something.”
    That’s why I wish to award {1} instead of {0}.

  • “Your rule isn’t written anywhere.”
    Unfortunately, none of the rules governing USAMO are written anywhere. I agree this is bad and the USAMO should publish some more clear description of the grading mechanics.

  • “The proof of the HLF isn’t even that complicated.”
    Are you joking me?

In summary, I don’t think it is appropriate to give full marks to a student who cannot solve a problem just because they can name it.

Stop Paying Me Per Hour

Occasionally I am approached by parents who ask me if I am available to teach their child in olympiad math. This is flattering enough that I’ve even said yes a few times, but I’m always confused why the question is “can you tutor my child?” instead of “do you think tutoring would help, and if so, can you tutor my child?”.

Here are my thoughts on the latter question.

Charging by Salt

I’m going to start by clearing up the big misconception which inspired the title of this post.

The way tutoring works is very roughly like the following: I meet with the student once every week, with custom-made materials. Then I give them some practice problems to work on (“homework”), which I also grade. I throw in some mock olympiads. I strongly encourage my students to email me with questions as they come up. Rinse and repeat.

The actual logistics vary; for example, for small in-person groups I prefer to do every other week for 3 hours. But the thing that never changes is how the parents pay me. It’s always the same: I get N \gg 0 dollars per hour for the actual in-person meeting, and 0 dollars per hour for preparing materials, grading homework, responding to questions, and writing the mock olympiads.

Now I’m not complaining because N is embarrassingly large. But one day I realized that this pricing system is giving parents the wrong impression. They now think the bulk of the work is from me taking the time to meet with their child, and that the homework is to help reinforce what I talk about in class. After all, this is what high school does, right?

I’m here to tell you that this is completely wrong.

It’s the other way around: the class is meant to supplement the homework. Think salt: for most dishes you can’t get away with having zero salt, but you still don’t price a dish based on how much salt is in it. Similarly, you can’t excise the in-person meeting altogether, but the dirty secret is that the classtime isn’t the core component.

I mean, here’s the thing.

  • When you take the IMO, you are alone with a sheet of paper that says “Problem 1”, “Problem 2”, “Problem 3”.
  • When you do my homework, you are alone with a sheet of paper that says “Problem 1”, “Problem 2”, “Problem 3”.
  • When you’re in my class, you get to see my beautiful smiling face plus a sheet of paper that says “Theorem 1”, “Example 2”, “Example 3”.

Which of these is not like the other?

Active Ingredients

So we’ve established that the main active ingredient is actually you working on problems alone in your room. If so, why do you need a teacher at all?

The answer depends on what the word “need” means. No USA IMO contestant in my recent memory has had a coach, so you don’t need a coach. But there are some good reasons why one might be helpful.

Some obvious reasons are social:

  • Forces you to work regularly; though most top students don’t really have a problem with self-motivation
  • You have a person to talk to. This can be nice if you are relatively isolated from the rest of the math community (e.g. due to geography).
  • You have someone who will answer your questions. (I can’t tell you how jealous I am right now.)
  • Feedback on solutions to problems. This includes student’s written solutions (stylistic remarks, or things like “this lemma you proved in your solution is actually just a special case of X” and so on) as well as explaining solutions to problems the student fails to solve.

In short, it’s much more engaging to study math with a real person.

Those reasons don’t depend so much on the instructor’s actual ability. Here are some reasons which do:

  • Guidance. An instructor can tell you what things to learn or work on based on their own experience in the past, and can often point you to things that you didn’t know existed.
  • It’s a big plus if the instructor has a good taste in problems. Some problems are bad and don’t teach you anything; some (old) problems don’t resemble the flavor of problems that actually appear on olympiads. On the flip side, some problems are very instructive or very pretty, and it’s great if your teacher knows what these are.
  • Ideally, also a good taste in topics. For example, I strongly object to classes titled “collinearity and concurrence” because this may as well be called “geometry”, and I think that such global classes are too broad to do anything useful. Conversely, examples of topics I think should be classes but aren’t: “looking at equality cases”, “explicit constructions”, “Hall’s marriage theorem”, “greedy algorithms”. I make this point a lot more explicitly in Section 2 of this blog post of mine.

In short, you’re also paying for the material and expertise. Past IMO medalists know how the contest scene works. Parents and (beginning) students less so.

Lastly, the reason which I personally think is most important:

  • Conveys strong intuition/heuristics, both globally and for specific problems. It’s hard to give concrete examples of this, but a few global ones I know were particularly helpful for me: “look at maximal things” (Po-Shen Loh on greedy algorithms), “DURR WE WANT STUFF TO CANCEL” (David Yang on FE’s), “use obvious inequalities” (Gabriel Dospinescu on analytic NT), which are take-aways that have gotten me a lot of points. This is also my biggest criteria for evaluating my own written exposition.

You guys know this feeling, I’m sure: when your English teacher assigned you an passage to read, the fastest way to understand it is to not read the passage but to ask the person sitting next to you what it’s saying. I think this is in part because most people are awful at writing and don’t even know how to write for other human beings.

The situation in olympiads is the same. I estimate listening to me explain a solution is maybe 4 to 10 times faster than reading the official solution. Turns out that writing up official solutions for contests is a huge chore, so most people just throw a sequence of steps at the reader without even bothering to identify the main ideas. (As a contest organizer, I’m certainly guilty of this laziness too!)

Aside: I think this is a large part of why my olympiad handouts and other writings have been so well-received. Disclaimer: this was supposed to be a list of what makes a good instructor, but due to narcissism it ended up being a list of things I focus on when teaching.

Caveat Emptor

And now I explain why the top IMO candidates still got by without teachers.

It turns out that the amount of math preparation time that students put in doesn’t seem to be a normal distribution. It’s a log normal distribution. And the reason is this: it’s hard to do a really good job on anything you don’t think about in the shower.

Officially, when I was a contestant I spent maybe 20 hours a week doing math contest preparation. But the actual amount of time is higher. The reason is that I would think about math contests more like 24/7. During English class, I would often be daydreaming about the inequality I worked on last night. On the car ride home, I would idly think about what I was going to teach my middle school students the next week. To say nothing of showers: during my showers I would draw geometry diagrams on the wall with water on my finger.

So spiritually, I maybe spent 10 times as much time on math olympiads compared to an average USA(J)MO qualifier.

And that factor of 10 is enormous. Even if I as a coach can cause you to learn two or three or four times more efficiently, you will still lose to that factor of 10. I’d guess my actual multiplier is somewhere between 2 and 3, so there you go. (Edit: this used to say 3 to 4, I think that’s too high now.)

The best I can do is hope that, in addition to making my student’s training more efficient, I also cause my students to like math more.

Some Advice for Olympiad Geometry

I know some friends who are fantastic at synthetic geometry. I can give them any problem and they’ll come up with an incredibly impressive synthetic solution. I also have some friends who are very bad at synthetic geometry, but have such good fortitude at computations that they can get away with using Cartesian coordinates for everything.

I don’t consider myself either of these types; I don’t have much ingenuity when it comes to my solutions, and I’m actually quite clumsy when it comes to long calculations. But nonetheless I have a high success rate with olympiad geometry problems. Not only that, but my solutions are often very algorithmic, in the sense that any well-trained student should be able to come up with this solution.

In this article I try to describe how I come up which such solutions.

1. The Three Reductions

Very roughly, there are three different ways I try to make progress on a geometry problem.

  • (I) The standard synthetic techniques; angle chasing, cyclic quadrilaterals, homothety, radical axis / power of a point, etc. My own personal arsenal contains some weapons not known to many contestants as well, most notably inversion, harmonic bundles and quadrilaterals, and spiral similarity / Miquel points.For this part, it’s highly advantageous to be well-versed with “standard” configurations and tricks. To give an extreme example: to solve Iran TST 2009, Problem 9 one essentially needs only recognize two configurations: a lemma about the midpoint of an altitude (2002 G7) and another lemma about the line {EF} (USAJMO 2014/6). Not knowing either of these makes it more difficult to solve the problem synthetically in the time limit. As a reference, Yufei Zhao’s lemmas handout contains a fairly comprehensive list of these configurations.

    Easier problems don’t require as much in this way of configuration recognition.

  • (II) Standard computational techniques (aka bashing). Personally, I prefer complex numbers and barycentric coordinates but I know other students who will use Cartesian coordinates and trigonometry to great success. The advantage of such methods is that they are straightforward and reliable, albeit tedious and time-consuming. It is mostly a matter of experience to understand whether a calculation can be carried out within the time limit — I can basically tell just by looking at a setup whether it can be solved in this time.
  • (III) Most surprisingly: simply finding crucial claims. Especially for harder problems like IMO 3/6 much of the time the key to solving a problem is making some key observation. Said another way: a difficult IMO 3/6 problem which asks you to prove {A \implies B} might have a solution which goes like,

    \displaystyle A \implies X \implies Y \implies B.

    Each of the individual implications might be no harder than an IMO 1/4 but the difficulty rests in finding what to prove. The most reliable way to do such things is to draw large, in-scale diagrams. If you are good at recognizing cyclic quadrilaterals, collinear points, etc. then the correct claims will naturally suggest themselves; conversely, good diagrams will prevent you from wasting time trying to prove things that aren’t true (effectively letting you test your claims “experimentally” before trying to prove them).

Type (III) deserves some comment here. There is more to making progress on a problem than simply trying things you think will solve the problem: there is some “scouting” involved that you will need to do for any difficult problems. As a terrible analogy, in StarCraft you have to scout an experienced opponent to understand what they’re doing before you try to attack them. The situation with IMO 3/6 is no different: you have to have some understanding of the problem before you stand a chance of being able to solve it.

Easy problems can often succumb to just one class of attacks, but the interesting and difficult problems can require two or all three classes in order to solve. How much you use each type of strategy is in my opinion a matter of personal taste — some people don’t use (II) at all and rely on (I) to prove everything, and even vice versa! I like to think I balance (I) and (II) evenly. But (III) is indispensable, and in any case I think part of the reason I have been so successful with geometry problems is precisely that I can draw on all three strategies in tandem, rather than being limited to one or two.

In fact, a good rule of thumb that I use for judging the difficulty of a problem is how many of the above methods I had to use: the {n}th problem on an IMO paper should require me to resort to about {n} of these strategies.

2. Concrete Examples

I’ll now give some concrete examples of the things I said above. Warning: spoilers follow, and hyperlinks lead to my solutions on Art of Problem Solving. You are encouraged to try the problems yourself before reading the comments.

Example [EGMO 2012/1] Let {ABC} be a triangle with circumcenter {O}. The points {D}, {E}, {F} lie in the interiors of the sides {BC}, {CA}, {AB} respectively, such that {\overline{DE}} is perpendicular to {\overline{CO}} and {\overline{DF}} is perpendicular to {\overline{BO}}. Let {K} be the circumcenter of triangle {AFE}. Prove that the lines {\overline{DK}} and {\overline{BC}} are perpendicular.

This is a pretty typical entry-level geometry problem. Do some angle chasing (I) to find one cyclic quad (III), and then follow through to solve the problem (I). If you are good enough, you don’t even need to find the cyclic quad in advance; just play around with the angles until you notice it.

Example [IMO 2014, Problem 4] Let {P} and {Q} be on segment {BC} of an acute triangle {ABC} such that {\angle PAB=\angle BCA} and {\angle CAQ=\angle ABC}. Let {M} and {N} be the points on {AP} and {AQ}, respectively, such that {P} is the midpoint of {AM} and {Q} is the midpoint of {AN}. Prove that the intersection of {BM} and {CN} is on the circumference of triangle {ABC}.

You can solve this problem by barycentric coordinates (II) instantly (textbook example). Also similar triangles (I) solves the problem pretty quickly as well. Again, this problem is “easy” in the sense that one can directly approach it with either (I) or (II), not needing (III) at all.

Example [USAMO 2015/2] Quadrilateral {APBQ} is inscribed in circle {\omega} with {\angle P = \angle Q = 90^{\circ}} and {AP = AQ < BP}. Let {X} be a variable point on segment {\overline{PQ}}. Line {AX} meets {\omega} again at {S} (other than {A}). Point {T} lies on arc {AQB} of {\omega} such that {\overline{XT}} is perpendicular to {\overline{AX}}. Let {M} denote the midpoint of chord {\overline{ST}}. As {X} varies on segment {\overline{PQ}}, show that {M} moves along a circle.

This was not supposed to be a very difficult problem, but it seems to have nearly swept the JMO group. Essentially, the key to this problem is to notice that the center of the desired circle is in fact the midpoint of {AO} (with {O} the center of the circle). This is a huge example of (III) — after this observation, one can solve the problem very quickly using complex numbers (II). It is much harder (though not impossible) to solve the problem without knowing the desired center.

Example [USAMO 2014/5] Let {ABC} be a triangle with orthocenter {H} and let {P} be the second intersection of the circumcircle of triangle {AHC} with the internal bisector of the angle {\angle BAC}. Let {X} be the circumcenter of triangle {APB} and {Y} the orthocenter of triangle {APC}. Prove that the length of segment {XY} is equal to the circumradius of triangle {ABC}.

Personally I think the most straightforward solution is to use (I) to eliminate the orthocenter condition, and then finish with complex numbers (II). Normally, you won’t see a medium-level problem that dies immediately to (II), and the only reason a problem like this could end up as a problem 5 is that there is a tiny bit of (I) that needs to happen before the complex numbers becomes feasible.

Example [IMO 2014/3] Convex quadrilateral {ABCD} has {\angle ABC = \angle CDA = 90^{\circ}}. Point {H} is the foot of the perpendicular from {A} to {BD}. Points {S} and {T} lie on sides {AB} and {AD}, respectively, such that {H} lies inside triangle {SCT} and

\displaystyle \angle CHS - \angle CSB = 90^{\circ}, \quad \angle THC - \angle DTC = 90^{\circ}.

Prove that line {BD} is tangent to the circumcircle of triangle {TSH}.

Like most IMO 3/6’s I had to resort to using all three methods in order to solve this problem. The first important step was finding out what to do with the angle condition. It turns out that in fact, it’s equivalent to the circumcenter of triangle {TCH} lying on side {AD} of the triangle (III); proving this is then a matter of angle chasing (I). Afterwards, one has to recognize a tricky usage of the angle bisector theorem (I) to reduce it to something that can be computed with trigonometry (II). This leads to a direct solution that, while not elegant, also requires much less ingenuity then most of the solutions found by friends I know.

I really want to stress that being proficient in all three strategies is key to getting “straightforward” solutions like this to IMO 3/6 caliber problems. If you miss any of these components, you are not going to solve the problem.

Example [IMO 2011/6] Let {ABC} be an acute triangle with circumcircle {\Gamma}. Let {\ell} be a tangent line to {\Gamma}, and let {\ell_a}, {\ell_b}, {\ell_c} be the lines obtained by reflecting {\ell} in the lines {BC}, {CA}, and {AB}, respectively. Show that the circumcircle of the triangle determined by the lines {\ell_a}, {\ell_b}, and {\ell_c} is tangent to the circle {\Gamma}.

The ultimate example of these three principles. Using a trick that showed up on APMO 2014/5 and RMM 2013/3, one constructs the tangency point {T} and connects the points {A_1}, {B_1}, {C_1}, as I explain in this post, yielding points {A_2}, {B_2}, {C_2}. After that, a very careful examination of the diagram (possibly several diagrams) leads to a conjecture that {A_1A=AP}, et cetera. This is the key observation (III), and leads to highly direct solution via (II). But the point of this problem is that you need to have the guts to construct those auxiliary points and then boldly claim they are the desired “squared” points.

3. Comparison with Other Subjects

The approaches I’ve described highlight some of the features of olympiad geometry which distinguish it from other subjects.

  • Unlike other olympiad subjects, you can actually obtain a big advantage by just knowing lots of theory. Experienced contestants simply “recognize” a large body of common configurations that those without access to training materials have never seen before. Similarly, there are a lot of fancy techniques that can make a big difference. This is much less true of other subjects (for example combinatorics is the opposite extreme).
  • There’s less variance in the subject: lots of Euclidean geometry problems feel the same, and all of them use the same body of techniques. It reminds me of chess: it’s very “narrow” in the sense that at the end of the day, there are only so many possible moves. (Olympiad inequalities also has this kind of behavior.) Again combinatorics is the opposite of this.
  • You have a reliable backup in case you can’t find the official solution: bash. Moreover, in general there are often many different ways to solve a problem; not true of other subjects.
  • If you want to make some “critical claim” you can quickly test it empirically (by drawing a good diagram).


Writing Olympiad Geometry Problems

You can use a wide range of wild, cultivated or supermarket greens in this recipe. Consider nettles, beet tops, turnip tops, spinach, or watercress in place of chard. The combination is also up to you so choose the ones you like most.

— Y. Ottolenghi. Plenty More

In this post I’ll describe how I come up with geometry proposals for olympiad-style contests. In particular, I’ll go into detail about how I created the following two problems, which were the first olympiad problems which I got onto a contest. Note that I don’t claim this is the only way to write such problems, it just happens to be the approach I use, and has consistently gotten me reasonably good results.

[USA December TST for 56th IMO] Let {ABC} be a triangle with incenter {I} whose incircle is tangent to {\overline{BC}}, {\overline{CA}}, {\overline{AB}} at {D}, {E}, {F}, respectively. Denote by {M} the midpoint of {\overline{BC}} and let {P} be a point in the interior of {\triangle ABC} so that {MD = MP} and {\angle PAB = \angle PAC}. Let {Q} be a point on the incircle such that {\angle AQD = 90^{\circ}}. Prove that either {\angle PQE = 90^{\circ}} or {\angle PQF = 90^{\circ}}.


[Taiwan TST Quiz for 56th IMO] In scalene triangle {ABC} with incenter {I}, the incircle is tangent to sides {CA} and {AB} at points {E} and {F}. The tangents to the circumcircle of {\triangle AEF} at {E} and {F} meet at {S}. Lines {EF} and {BC} intersect at {T}. Prove that the circle with diameter {ST} is orthogonal to the nine-point circle of {\triangle BIC}.


1. General Procedure

Here are the main ingredients you’ll need.

  • The ability to consistently solve medium to hard olympiad geometry problems. The intuition you have from being a contestant proves valuable when you go about looking for things.
  • In particular, a good eye: in an accurate diagram, you should be able to notice if three points look collinear or if four points are concyclic, and so on. Fortunately, this is something you’ll hopefully have just from having done enough olympiad problems.
  • Geogebra, or some other software that will let you quickly draw and edit diagrams.

With that in mind, here’s the gist of what you do.

  1. Start with a configuration of your choice; something that has a bit of nontrivial structure in it, and add something more to it. For example, you might draw a triangle with its incircle and then add in the excircle tangency point, and the circle centered at {BC} passing through both points (taking advantage of the fact that the two tangency points are equidistant from {B} and {C}).
  2. Start playing around, adding in points and so on to see if anything interesting happens. You might be guided by some actual geometry constructions: for example, if you know that the starting configuration has a harmonic bundle in it, you might project this bundle to obtain the new points to play with.
  3. Keep going with this until you find something unexpected: three points are collinear, four points are cyclic, or so on. Perturb the diagram to make sure your conjecture looks like it’s true in all cases.
  4. Figure out why this coincidence happened. This will probably add more points to you figure, since you often need to construct more auxiliary points to prove the conjecture that you have found.
  5. Repeat the previous two steps to your satisfaction.
  6. Once you are happy with what you have, you have a nontrivial statement and probably several things that are equivalent to it. Pick the one that is most elegant (or hardest), and erase auxiliary points you added that are not needed for the problem statement.
  7. Look for other ways to reduce the number of points even further, by finding other equivalent formulations that have fewer points.

Or shorter yet: build up, then tear down.

None of this makes sense written this abstractly, so now let me walk you through the two problems I wrote.

2. The December TST Problem

In this narrative, the point names might be a little strange at first, because (to make the story follow-able) I used the point names that ended up in the final problem, rather than ones I initially gave. Please bear with me!

I began by drawing a triangle {ABC} (always a good start\dots) and its incircle, tangent to side {BC} at {D}. Then, I added in the excircle touch point {T}, and drew in the circle with diameter {DT}, which was centered at the midpoint {M}. This was a coy way of using the fact that {MD = MT}; I wanted to see whether it would give me anything interesting.

So, I now had the following picture.


Now I had two circles intersecting at a single point {D}, so I added in {Q}, the second intersection. But really, this point {Q} can be thought of another way. If we let {DS} be the diameter of the incircle, then as {DT} is the other diameter, {Q} is actually just the foot of the altitude from {D} to line {ST}.

But recall that {A}, {S}, {T} are collinear! (Again, this is why it’s helpful to be familiar with “standard” contest configurations; you see these kind of things immediately.) So {Q} in fact lies on line {AT}.


This was pretty cool, though not yet interesting enough to be a contest problem. So I looked for most things that might be true.

I don’t remember what I tried next; it didn’t do anything interesting. But I do remember the thing I tried after that: I drew in the angle bisector, line {AI}. And then, I noticed a big coincidence: the first intersection of {AI} with the circle with diameter {DT} seemed to lie on line {DE}! I was initially confused by this; it didn’t seem like it could possibly be true due to symmetry reasons. But in my diagram, it was indeed correct. A moment later, I realized the reason why this was plausible: in fact, the second intersection of line {AI} with the circle was on line {DF}.


Now, I could not see quickly at all why this was true. So I started trying to prove it, but initially failed: however, I managed to show (via angle chasing) that

\displaystyle D, P, E \text{ collinear} \iff \angle PQE = 90^\circ.

So, at least I had an interesting equivalent statement.


After another half hour of trying to prove my conjecture, I finally realized what was happening. The point {P} was the one attached to a particular lemma: the {A}-bisector, {B}-midline, and {C} touch-chord are concurrent, and from this {MD = MP} just follows by some similar triangles. So, drawing in the point {N} (the midpoint of {AB}), I had the full configuration which gave the answer to my conjecture.


Finally, I had to clean up the mess that I had made. How could I do this? Well, the points {N}, {S} could be eliminated easily enough. And we could re-define {Q} to be a point on the incircle such that {\angle AQD = 90^\circ}. This actually eliminated the green circle and point {T} altogether, provided we defined {P} by just saying that it was on the angle bisector, and that {MD = MP}. (So while the circle was still implicit in the condition {MD = MP}, it was no longer explicitly part of the problem.)

Finally, we could even remove the line through {D}, {P} and {E}; we ask the contestant to prove {\angle PQE = 90^\circ}.


And that was it!

3. The Taiwan TST Problem

In fact, the starting point of this problem was the same lemma which provided the key to the previous solution: the circle with diameter {BC} intersects the {B} and {C} bisectors on the {A} touch chord. Thus, we had the following diagram.


The main idea I had was to look at the points {D}, {X}, {Y} in conjunction with each other. Specifically, this was the orthic triangle of {\triangle BIC}, a situation which I had remembered from working on Iran TST 2009, Problem 9. So, I decided to see what would happen if I drew in the nine-point circle of {\triangle BIC}. Naturally, this induces the midpoint {M} of {BC}.


At this point, notice (or recall!) that line {AM} is concurrent with lines {DI} and {EF}.


So the nine-point circle of the problem is very tied down to the triangle {BIC}. Now, since I was in the mood for something projective, I constructed the point {T}, the intersection of lines {EF} and {BC}. In fact, what I was trying to do was take perspectivity through {I}. From this we actually deduce that {(T,K;X,Y)} is a harmonic bundle.


Now, what could I do with this picture? I played around looking for some coincidences, but none immediately presented themselves. But I was enticed by the point {T}, which was somehow related to the cyclic complete quadrilateral {XYMD}. So, I went ahead and constructed the pole of {T} to the nine-point circle, letting it hit line {BC} at {L}. This was aimed at “completing” the picture of a cyclic quadrilateral and the pole of an intersection of two sides. In particular, {(T,L;D,M)} was harmonic too.


I spent a long time thinking about how I could make this into a problem. I unfortunately don’t remember exactly what things I tried, other than the fact that I was taking a lot of perspectivity. In particular, the “busiest” point in the picture is {K}, so it makes sense to try and take perspectives through it. Especially enticing was the harmonic bundle

\displaystyle \left( \overline{KT}, \overline{KL}; \overline{KD}, \overline{KM} \right) = -1.

How could I use this to get a nice result?

Finally about half an hour I got the right idea. We could take this bundle and intersect it with the ray {AI}! Now, letting {N} be the midpoint {EF}, we find that three of the points in the harmonic bundle we obtain are {A}, {I}, and {N}; let {S} be the fourth point, which is the intersection of line {KL} with {AI}. Then by hypothesis, we ought to have {(A,I;N,S) = -1}. But from this we know exactly what the point {S}. Just look at the circumcircle of triangle {AEF}: as this has diameter {AI}, we see that {S} is the intersection of the tangents at {E} and {F}.


Consequently, we know that the point {S}, defined very naturally in terms of the original picture, lies on the polar of {T} to the nine-point circle. By simply asking the contestant to prove this, we thus eliminate all the points {K}, {M}, {D}, {N}, {I}, {X}, and {Y} completely from the picture, leaving only the nine-point circle. Finally, instead of directly asking the contestant to show that {T} lies on the polar of {S}, one can rephrase the problem as saying “the circle with diameter {ST} is orthogonal to the nine-point circle of {\triangle BIC}”, concealing all the work that went into the creation of the problem.



Putnam 2015 Aftermath

(EDIT: These solutions earned me a slot in N1, top 16.)

I solved eight problems on the Putnam last Saturday. Here were the solutions I found during the exam, plus my repaired solution to B3 (the solution to B3 I submitted originally had a mistake).

Some comments about the test. I thought that the A test was mostly very easy: problems A1, A3, A4 were all routine, and problem A5 is a little long-winded but nothing magical. Problem A2 was tricky, and took me well over half the A session. I don’t know anything about A6, but it seems to be very hard.

The B session, on the other hand, was completely bizarre. In my opinion, the difficulty of the problems I did attempt was

\displaystyle  B4 \ll B1 \ll B5 < B3 < B2.

1. Problems

1.1. Day A


  1. Points {A} and {B} are on the same branch of the hyperbola {xy=1}. Point {P} is selected between them maximizing the area of {\triangle ABP}. Show that the area of region bounded by the hyperbola and {AP} equals the area of region bounded by the hyperbola and {BP}.
  2. Define {a_0 = 1}, {a_1 = 2} and {a_n = 4a_{n-1} - a_{n-2}}. Find an odd prime dividing {a_{2015}}.
  3. Compute

    \displaystyle  \log_2 \prod_{a=1}^{2015} \prod_{b=1}^{2015} \left( 1 + e^{\frac{2\pi i a b}{2015}} \right).

  4. Let {f(x) = \sum_{n \in S_x} 2^{-n}} where {S_x = \{ n \in \mathbb Z^+ \mid \left\lfloor nx \right\rfloor \text { even} \}}. Compute {\inf_{x \in [0,1)} f(x)}.
  5. Let {q \ge 1} be an odd integer and define

    \displaystyle  N_q = \# \left\{ 0 < a < q/4 \mid \gcd(a,q) = 1 \right\}.

    Prove {N_q} is odd if and only if {q = p^k} for some prime {p \equiv 5,7 \pmod8}.

  6. Let {A}, {B}, {M}, {X} be {n \times n} real matrices. Prove that if {AM = MB} and {A}, {B} have the same characteristic polynomial then {\det(A-MX) = \det(B-XM)}.

1.2. Night B

  1. Let {f : \mathbb R \rightarrow \mathbb R} be thrice differentiable. Prove that if {f} has at least five distinct roots then {f + 6f' + 12f'' + 8f'''} has at least two distinct roots.
  2. Starting with the list of positive integers {1, 2, 3, \dots}, we repeatedly remove the first three elements as well as their sum, giving a sequence of sums {1+2+3=6}, {4+5+7=16}, {8+9+10=27}, {11+12+13=36}, {14+15+17=46}, and so on. Decide whether some sum in the sequence ends with {2015} in base {10}.
  3. Define {S} to be the set of real matrices {\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right)} such that {a}, {b}, {c}, {d} form an arithmetic progression in that order. Find all {M \in S} such that for some integer {k > 1}, {M^k \in S}.
  4. Let { T = \left\{(a,b,c) \in \mathbb Z_+^3 \mid a,b,c \text{ are sides of a triangle} \right\} }. Prove that

    \displaystyle  \sum_{(a,b,c) \in T} \frac{2^a}{3^b5^c}

    is rational and determine its value.

  5. Let {P_n} be the number of permutations {\pi : \{1, \dots, n\} \rightarrow \{1, \dots, n\}} such that {|\pi(i)-\pi(j)| \le 2} where {|i-j| \le 1}. For any {n \ge 2} show that

    \displaystyle  P_{n+5} - P_{n+4} - P_{n+3} + P_n

    does not depend on {n} and determine its value.

  6. Let {a(k)} be the number of odd divisors of {k} in the interval {[1, \sqrt{2k})}. Compute

    \displaystyle  \sum_{k \ge 1} \frac{(-1)^{k-1} a(k)}{k}.

2. Solution to Problem A1

Standard A1 fare. WLOG {A} is to the left of {B} on the positive branch. First we claim the tangent to {P} must be parallel to line {AB}; otherwise let it intersect the parabola again at {P'}, and any point between {P} and {P'} will increase the area of {\triangle ABP}. In that case, we must have

\displaystyle  -\frac{1}{p^2} = \frac{\frac1a-\frac1b}{a-b} \implies p = \sqrt{ab}

Now the area bounded by {AP} is

\displaystyle  \frac{\frac1a+\frac1p}{2} \cdot (p-a) - \int_a^p \frac 1x \; dx = \frac{1}{2}\left( \frac pa - \frac ap \right) - \log \left( \frac pa \right).

The area of {BP} is computed similarly and gives

\displaystyle  \frac{1}{2}\left( \frac bp - \frac pb \right) - \log \left( \frac bp \right)

and so these are equal.

3. Solution to Problem A2

By induction we have {a_n = \frac12\left( (2+\sqrt3)^n+(2-\sqrt3)^n \right)}. Thus {a_{m} \mid a_{(2k+1)m}} for any {k}, {m}, because their quotient is both rational and an algebraic integer. Thus {a_5 = 362 = 2 \cdot \boxed{181}}.

During the test, I only found this solution by computing {a_1}, \dots, {a_{11}} explicitly and factoring the first {10} numbers, upon which I realized {a_3 \mid a_9}.

4. Solution to Problem A3

First, we use the fact that for any odd integer {m}, we have

\displaystyle  \prod_{1 \le b \le m} (1 + \zeta^b) = 2

where {\zeta} is an {m}th root of unity. (Just plug {-1} into {X^m-1}.) Thus

\displaystyle  \begin{aligned} \log_2 \prod_{a=1}^{2015} \prod_{b=1}^{2015} \left( 1 + e^{\frac{2\pi i a b}{2015}} \right) &= \log_2 \prod_{a=1}^{2015} 2^{\gcd(a,2015)} \\ &= \sum_{a=1}^{2015} \gcd(a,2015) \\ &= \sum_{d \mid 2015} \frac{2015}{d} \phi(d) \\ &= (5+\phi(5))(13+\phi(13))(31+\phi(31)) \\ &= 9 \cdot 25 \cdot 61 \\ &= \boxed{13725}. \end{aligned}

5. Solution to Problem A4

We first prove that {f(x) \ge \frac47} for every {x \in [0,1)}. It suffices to consider {x \ge \frac{1}{2}} since otherwise {f(x) \ge \frac12+\frac14}. We note that for any {k}, if {k \notin S_x} and {k+1 \notin S_x} then we require that {\left\lfloor kx \right\rfloor = \left\lfloor (k+1)x \right\rfloor = 2m+1} for some {m}; since {1 \le 2x < 2} this implies {\left\lfloor (k+2)x \right\rfloor = 2m+2}.

Thus among any three consecutive numbers, at least one is in {S_x}. Plainly {1 \in S_x} and hence {f(x) \ge 2^{-1} + 2^{-4} + 2^{-7} + \dots = \frac47}.

Finally note that

\displaystyle  f\left( \frac23-\varepsilon \right) \rightarrow \frac47 \quad\text{as}\quad x \rightarrow \frac23

so the answer is {L = \boxed{\frac47}}.

6. Solution to Problem A5

Let {q = p_1^{e_1} p_2^{e_2} \dots p_n^{e_n}}. By Principle of Inclusion-Exclusion (or Moebius inversion?) we have

\displaystyle  N_q = \sum_{S \subset \{1, \dots, n\}} (-1)^{|S|} \left\lfloor \frac{q}{4\prod_{s \in S} p_s} \right\rfloor .

The {(-1)^{|S|}} vanishes when taking modulo {2}. Now, {\left\lfloor \frac{\bullet}{4} \right\rfloor \pmod 2} depends only on the input modulo {8}, equalling {0} for {1,3 \pmod 8} and {1} for {5,7\pmod8}. As {p^2 \equiv 1 \pmod 8} the exponents in the floor also can be taken mod {2}. Therefore, we can simplify this to

\displaystyle  N_q \equiv \sum_{S \subset \{1, \dots, n\}} \left\lfloor \frac{q\prod_{s \in S} p_s}{8} \right\rfloor \pmod 2.

Let {A(S) = \left\lfloor \frac{q\prod_{s \in S} p_s}{8} \right\rfloor \pmod 2}. The solution now proceeds in three cases.

If any prime is {1} or {3} mod {8}, say {p_1}, then {A(T) = A(T \cup \{1\})} for {T \subset \{2, \dots, n\}} and thus {N_q} is even.

If all primes are {5} or {7} mod {8} and {n \ge 2}, then {A(T) = A(T \cup \{1,2\})} and {A(T \cup \{1\}) = A(T \cup \{2\})} for {T \subset \{3, \dots, n\}} and thus {N_q} is even.

But if {n = 1} and {p_1 \equiv 5,7 \pmod 8} then {A(\varnothing) + A(\{1\})} is always odd. This completes the proof.

7. Solution to Problem B1

Let {X(f) = f + 2f'}. The problem asks to show {f} having five distinct real roots implies {X(X(X(f)))} has at least two. But by Rolle’s Theorem on

\displaystyle  g(x) = e^{\frac{1}{2} x} f(x) \implies 2g'(x) = e^{\frac{1}{2} x}(f(x) + 2f'(x))

we see that {f} having {k} roots gives {X(f)} having {k-1} roots, so done. (I’m told the trick of considering this function {g} is not new, though I came up with it during the exam so I can’t verify this.)

Lots of people said something about trying to use the Intermediate Value Theorem to show that {X(f)} has a root between any two roots of {f}. This would require {f} to be continuously differentiable, so it breaks down at the last application to {X(X(X(f)))}. I deliberately appealed to Rolle’s Theorem because I didn’t trust myself to deal with these real-analytic issues during the exam.

8. Solution to Problem B3

The answer is

\displaystyle  \begin{pmatrix} t&t\\t&t \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -3t&-t\\t&3t \end{pmatrix}

for {t \in \mathbb R}. These work by taking {k=3}.

Now to see these are the only ones, consider an arithmetic matrix

\displaystyle  M = \begin{pmatrix} a & a+e \\ a+2e & a+3e \end{pmatrix}.

which is not zero everywhere. Its characteristic polynomial is {t^2 - (2a+3e)t - 2e^2}, with discriminant {(2a+3e)^2 + 8e^2}, so it has two distinct real roots; moreover, since {-2e^2 \le 0} either one of the roots is zero or they are of opposite signs.

Now we can diagonalize {M} by writing

\displaystyle  M = \begin{pmatrix} s & -q \\ -r & p \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} p & q \\ r & s \end{pmatrix} = \begin{pmatrix} ps\lambda_1 - qr\lambda_2 & qs(\lambda_1-\lambda_2) \\ pr(\lambda_2-\lambda_1) & ps\lambda_2 - qr\lambda_1 \end{pmatrix}

where {ps-qr=1}. By using the fact the diagonal entries have sum equalling the off-diagonal entries, we obtain that

\displaystyle  (ps-qr)(\lambda_1+\lambda_2) = (qs-pr)(\lambda_1-\lambda_2) \implies qs-pr = \frac{\lambda_1+\lambda_2}{\lambda_1-\lambda_2}.

Now if {M^k \in S} too then the same calculation gives

\displaystyle  qs-pr = \frac{\lambda_1^k+\lambda_2^k}{\lambda_1^k-\lambda_2^k}.

Let {x = \lambda_1/\lambda_2 < 0} (since {-2e^2 < 0}). We appropriately get

\displaystyle  \frac{x+1}{x-1} = \frac{x^k+1}{x^k-1} \implies \frac{2}{x-1} = \frac{2}{x^k-1} \implies x = x^k \implies x = -1 \text{ or } x = 0

and {k} odd. If {x=0} we get {e=0} and if {x=-1} we get {2a+3e=0}, which gives the curve of solutions that we claimed. (Unfortunately, during the test I neglected to address {x=-1}, which probably means I will get null marks.)

9. Solution to Problem B4

By the Ravi substitution,

\displaystyle  \begin{aligned} \sum_{(a,b,c) \in T} \frac{2^a}{3^b5^c} &= \sum_{x,y,z \ge 1 \text{ odd}} \frac{2^\frac{y+z}{2}}{3^{\frac{z+x}{2}} 5^{\frac{x+y}{2}}} + \sum_{x,y,z \ge 2 \text{ even}} \frac{2^\frac{y+z}{2}}{3^{\frac{z+x}{2}} 5^{\frac{x+y}{2}}} \\ &= \sum_{u,v,w \ge 1} \frac{2^{v+w}}{3^{w+u}5^{u+v}} \left( 1 + \frac{2^{-1}}{3^{-1}5^{-1}} \right) \\ &= \frac{17}{2} \sum_{u,v,w \ge 1} \left(\frac{1}{15}\right)^u \left(\frac25\right)^v \left(\frac23\right)^w \\ &= \frac{17}{2} \frac{\frac{1}{15}}{1-\frac{1}{15}} \frac{\frac25}{1-\frac25}\frac{\frac23}{1-\frac23} \\ &= \boxed{\frac{17}{21}}. \end{aligned}

No idea why this was B4; it was by far the easiest problem on the test for me. I knew how to do it in less than a second on seeing it and the rest was just arithmetic. (We use the Ravi substitution all the time on olympiad inequalities, so it’s practically a reflex for me.)

10. Solution to Problem B5

My solution was long and disgusting, so I won’t write out the full details. The idea is to let {A_n} be the number of permutations in {P_{n+1}} with the additional property that {\pi(1) = 1}. By casework on the index {k} with {\pi(k) = 1} (which must be surrounded by {2} or {3} if {k \neq 1, n}) one obtains

\displaystyle  P_n = 2(A_0 + A_1 + \dots + A_{n-3} + A_{n-1}) \qquad n \ge 2.

Also by more casework one obtains the recursion

\displaystyle  A_n = A_{n-1} + A_{n-3} + 1 \qquad n \ge 3.

From this one obtains

\displaystyle  P_n - P_{n-1} = 2(A_{n-1}-A_{n-2}+A_{n-3}) = 2(A_{n-3}+A_{n-4}+1).

So we know {P_{n+5}-P_{n+4}} and can compute {P_{n+3}-P_n = (P_{n+3}-P_{n+2}) + (P_{n+2}-P_{n+1}) + (P_{n+1}-P_n)} so a long algebraic computation gives that these differ by {\boxed 4}.

Also, I swear I’ve seen the exact condition {|\pi(i)-\pi(j)|\le 2} many years ago, back when I was doing high school olympiads. But it was so long ago I don’t remember the source.

Cauchy’s Functional Equation and Zorn’s Lemma

This is a draft of an appendix chapter for my Napkin project.

In the world of olympiad math, there’s a famous functional equation that goes as follows:

\displaystyle f : {\mathbb R} \rightarrow {\mathbb R} \qquad f(x+y) = f(x) + f(y).

Everyone knows what its solutions are! There’s an obvious family of solutions {f(x) = cx}. Then there’s also this family of… uh… noncontinuous solutions (mumble grumble) pathological (mumble mumble) Axiom of Choice (grumble).

Don’t worry, I know what I’m doing!

There’s also this thing called Zorn’s Lemma. It sounds terrifying, because it’s equivalent to the Axiom of Choice, which is also terrifying because why not.

In this post I will try to de-terrify these things, because they’re really not terrifying and I’m not sure why no one bothered to explain this properly yet. I have yet to see an olympiad handout that explains how you would construct a pathological solution, even though it’s really quite natural. So let me fix this problem now…

1.Let’s Construct a Monster

Let’s try to construct a “bad” function {f} and see what happens.

By scaling, let’s assume WLOG that {f(1) = 1}. Thus {f(n) = n} for every integer {n}, and you can easily show from here that

\displaystyle f\left( \frac mn \right) = \frac mn.

So {f} is determined for all rationals. And then you get stuck.

None of this is useful for determining, say, {f(\sqrt 2)}. You could add and subtract rational numbers all day and, say, {\sqrt 2} isn’t going to show up at all.

Well, we’re trying to set things on fire anyways, so let’s set

\displaystyle f(\sqrt 2) = 2015

because why not? By the same induction, we get {f(n\sqrt2) = 2015n}, and then that

\displaystyle f\left( a + b \sqrt 2 \right) = a + 2015b.

Here {a} and {b} are rationals. Well, so far so good — as written, this is a perfectly good solution, other than the fact that we’ve only defined {f} on a tiny portion of the real numbers.

Well, we can do this all day:

\displaystyle f\left( a + b \sqrt 2 + c \sqrt 3 + d \pi \right) = a + 2015b + 1337c - 999d.

Perfectly consistent.

You can kind of see how we should keep going now. Just keep throwing in new real numbers which are “independent” to the previous few, assigning them to whatever junk we want. It feels like it should be workable. . .

In a moment I’ll explain what “independent” means (though you might be able to guess already), but at the moment there’s a bigger issue: no matter how many numbers we throw, it seems like we’ll never finish. Let’s address the second issue first.

2. Review of Finite Induction

When you do induction, you get to count off {1}, {2}, {3}, … and so on. So for example, suppose we had a “problem” such as the following: Prove that the intersection of {n} open intervals is either {\varnothing} or an open interval. You can do this by induction easily: it’s true for {n = 2}, and for the larger cases it’s similarly easy.

But you can’t conclude from this that infinitely many open intervals intersect at some open interval. Indeed, this is false: consider the intervals

\displaystyle \left( -1, 1 \right), \quad \left( -\frac12, \frac12 \right), \quad \left( -\frac13, \frac13 \right), \quad \left( -\frac14, \frac14 \right), \quad \dots

This infinite set of intervals intersects at a single point {\{0\}}!

The moral of the story is that induction doesn’t let us reach infinity. Too bad, because we’d have loved to use induction to help us construct a monster. That’s what we’re doing, after all — adding things in one by one.

3. Transfinite Induction

Well, it turns out we can, but we need a new notion of number.

For this we need a notion of an ordinal number. I defined these in their full glory a previous blog post, but I don’t need the full details of that. Here’s what I want to say: after all the natural numbers

\displaystyle 0, \; 1, \; \dots,

I’ll put a new number called {\omega}, representing how large the natural numbers are. After that there’s a number called

\displaystyle \omega+1, \; \omega+2, \; \dots

and eventually a number called {2\omega}.

The list goes on:

\displaystyle \begin{aligned} 0, & 1, 2, 3, \dots, \omega \\ & \omega+1, \omega+2, \dots, \omega+\omega \\ & 2\omega+1, 2\omega+2, \dots, 3\omega \\ & \vdots \\ & \omega^2 + 1, \omega^2+2, \dots \\ & \vdots \\ & \omega^3, \dots, \omega^4, \dots, \omega^\omega \\ & \vdots, \\ & \omega^{\omega^{\omega^{\dots}}} \\ \end{aligned}

Pictorially, it kind of looks like this:


Anyways, in the same way that natural numbers dominate all finite sets, the ordinals dominate all the sets.

Theorem 1 For every set {S} there’s some ordinal {\alpha} which is bigger than it.

But it turns out (and you can intuitively see) that as large as the ordinals grow, there is no infinite descending chain. Meaning: if I start at an ordinal (like {2 \omega + 4}) and jump down, I can only take finitely many jumps before I hit {0}. (To see this, try writing down a chain starting at {2 \omega + 4} yourself.) Hence, induction and recursion still work verbatim:

Theorem 2 Given a statement {P(-)}, suppose that

  • {P(0)} is true, and
  • If {P(\alpha)} is true for all {\alpha < \beta}, then {P(\beta)} is true.

Then {P(\beta)} is true.

Similarly, you’re allowed to do recursion to define {x_\beta} if you know the value of {x_\alpha} for all {\alpha < \beta}.

The difference from normal induction or recursion is that we’ll often only do things like “define {x_{n+1} = \dots}”. But this is not enough to define {x_\alpha} for all {\alpha}. To see this, try using our normal induction and see how far we can climb up the ladder.

Answer: you can’t get {\omega}! It’s not of the form {n+1} for any of our natural numbers {n} — our finite induction only lets us get up to the ordinals less than {\omega}. Similarly, the simple {+1} doesn’t let us hit the ordinal {2\omega}, even if we already have {\omega+n} for all {n}. Such ordinals are called limit ordinals. The ordinal that are of the form {\alpha+1} are called successor ordinals.

So a transfinite induction or recursion is very often broken up into three cases. In the induction phrasing, it looks like

  • (Zero Case) First, resolve {P(0)}.
  • (Successor Case) Show that from {P(\alpha)} we can get {P(\alpha+1)}.
  • (Limit Case) Show that {P(\lambda)} holds given {P(\alpha)} for all {\alpha < \lambda}, where {\lambda} is a limit ordinal.

Similarly, transfinite recursion often is split into cases too.

  • (Zero Case) First, define {x_0}.
  • (Successor Case) Define {x_{\alpha+1}} from {x_\alpha}.
  • (Limit Case) Define {x_\lambda} from {x_\alpha} for all {\alpha < \lambda}, where {\lambda} is a limit ordinal.

In both situations, finite induction only does the first two cases, but if we’re able to do the third case we can climb far above the barrier {\omega}.

4. Wrapping Up Functional Equations

Let’s return to solving our problem.

Let {S_n} denote the set of “base” numbers we have at the {n} the step. In our example, we might have

\displaystyle S_1 = \left\{ 1 \right\}, \quad S_2 = \left\{ 1, \sqrt 2 \right\}, \quad S_3 = \left\{ 1, \sqrt 2, \sqrt 3 \right\}, \quad S_4 = \left\{ 1, \sqrt 2, \sqrt 3, \pi \right\}, \quad \dots

and we’d like to keep building up {S_i} until we can express all real numbers. For completeness, let me declare {S_0 = \varnothing}.

First, I need to be more precise about “independent”. Intuitively, this construction is working because

\displaystyle a + b \sqrt 2 + c \sqrt 3 + d \pi

is never going to equal zero for rational numbers {a}, {b}, {c}, {d} (other than all zeros). In general, a set {X} of numbers is “independent” if the combination

\displaystyle c_1x_1 + c_2x_2 + \dots + c_mx_m = 0

never occurs for rational numbers {{\mathbb Q}} unless {c_1 = c_2 = \dots = c_m = 0}. Here {x_i \in X} are distinct. Note that even if {X} is infinite, I can only take finite sums! (This notion has a name: we want {X} to be linearly independent over {{\mathbb Q}}.)

When do we stop? We’d like to stop when we have a set {S_{\text{something}}} that’s so big, every real number can be written in terms of the independent numbers. (This notion also has a name: it’s called a {{\mathbb Q}}-basis.) Let’s call such a set spanning; we stop once we hit a spanning set.

The idea that we can induct still seems okay: suppose {S_\alpha} isn’t spanning. Then there’s some number that is independent of {S_\alpha}, say {\sqrt{2015}\pi} or something. Then we just add it to get {S_{\alpha+1}}. And we keep going.

Unfortunately, as I said before it’s not enough to be able to go from {S_\alpha} to {S_{\alpha+1}} (successor case); we need to handle the limit case as well. But it turns out there’s a trick we can do. Suppose we’ve constructed all the sets {S_0}, {S_1}, {S_2}, …, one for each positive integer {n}, and none of them are spanning. The next thing I want to construct is {S_\omega}; somehow I have to “jump”. To do this, I now take the infinite union

\displaystyle S_\omega \overset{\text{def}}{=} S_0 \cup S_1 \cup S_2 \cup \dots.

The elements of this set are also independent (why?).

Ta-da! With the simple trick of “union all the existing sets”, we’ve just jumped the hurdle to the first limit ordinal {\omega}. Then we can construct {S_{\omega+1}}, {S_{\omega+2}}, …, and for the next limit we just do the same trick of “union-ing” all the previous sets.

So we can formalize the process as follows:

  1. Let {S_0 = \varnothing}.
  2. For a successor stage {S_{\alpha+1}}, add any element to {S_\alpha} to obtain {S_{\alpha+1}}.
  3. For a limit stage {S_{\lambda}}, take the union {\bigcup_{\gamma < \lambda} S_\gamma}.

How do we know that we’ll stop eventually? Well, the thing is that this process consumes a lot of real numbers. In particular, the ordinals get larger than the size of {{\mathbb R}}. Hence if we don’t stop we will quite literally reach a point where we have used up every single real number. Clearly that’s impossible, because by then the elements can’t possibly be independent! (EDIT Dec 20 2015: To be clear, the claim that “ordinals get larger than the size of the reals” requires the Axiom of Choice; one can’t do this construction using transfinite induction alone. Thanks reddit for calling me out on this.)

So by transfinite recursion (and Choice), we eventually hit some {S_\gamma} which is spanning: the elements are all independent, but every real number can be expressed using it. Done! This set has a name: a Hamel basis.

Thus this problem is dead, dead, dead. Any questions?


5. Zorn’s Lemma

Now I can tell you what Zorn’s Lemma is: it lets us do the same thing in any poset.

We can think of the above example as follows: consider all sets of independent elements. These form a partially ordered set by inclusion, and what we did was quite literally climb up a chain

\displaystyle S_0 \subset S_1 \subset S_2 \subset \dots.

It’s not quite climbing since we weren’t just going one step at a time: we had to do “jumps” to get up to {S_\omega} and resume climbing. But the main idea is to climb up a poset until we’re at the very top; in the previous case, when we reached the spanning set.

The same thing works verbatim with any partially ordered set {\mathcal P}. Let’s define some terminology. A local maximum (or maximal element) of the entire poset {\mathcal P} is an element which has no other elements strictly greater than it.

Now a chain of length {\gamma} is a set of elements {p_\alpha} for every {\alpha < \gamma} such that {p_0 < p_1 < p_2 < \dots}. (Observe that a chain has a last element if and only if {\gamma} is a successor ordinal, like {\omega+3}.) An upper bound to a chain is an element {\tilde p} which is greater than or equal to all elements of the chain. In particular, if {\gamma} is a successor ordinal, then just taking the last element of the chain works.

In this language, Zorn’s Lemma states that

Theorem 3 (Zorn’s Lemma) Let {\mathcal P} be a nonempty partially ordered set. If every chain has an upper bound, then {\mathcal P} has a local maximum.

Chains with length equal to a successor ordinal always have upper bounds, but this is not true in the limit case. So the hypothesis of Zorn’s Lemma is exactly what lets us “jump” up to define {p_\omega} and other limit ordinals. And the proof of Zorn’s Lemma is straightforward: keep climbing up the poset at successor stages, using Zorn’s condition to jump up at limit stages, and thus building a really long chain. But we have to eventually stop, or we literally run out of elements of {\mathcal P}. And the only possible stopping point is a local maximum.

If we want to phrase our previous solution in terms of Zorn’s Lemma, we’d say: Proof: Look at the poset whose elements are sets of independent real numbers. Every chain {S_0 \subset S_1 \subset \dots} has an upper bound {\bigcup S_\alpha} (which you have to check is actually an element of the poset). Thus by Zorn, there is a local maximum {S}. Then {S} must be spanning, because otherwise we could add an element to it. \Box

So really, Zorn’s Lemma is encoding all of the work of climbing that I argued earlier. It’s a neat little package that captures all the boilerplate, and tells you exactly what you need to check.

One last thing you might ask: where is the Axiom of Choice used? Well, the idea is that for any chain there could be lots of {\tilde p}‘s, and you need to pick one of them. Since you are making arbitrary choices infinitely many times, you need the Axiom of Choice. But really, it’s nothing special. [EDIT: AM points out that in order to talk about cardinalities in Theorem 1, one also needs the Axiom of Choice.]

6. Conclusion

In the words of Timothy Gowers,

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

Really, there’s nothing tricky at all here. People seem scared of Zorn’s Lemma, and claim it’s not intuitive or something. But really, all we’re doing is climbing up a poset. Nothing tricky at all.