Math contest platitudes, v3

I think it would be nice if every few years I updated my generic answer to “how do I get better at math contests?”. So here is the 2019 version. Unlike previous instances, I’m going to be a little less olympiad-focused than I usually am, since these days I get a lot of people asking for help on the AMC and AIME too.

(Historical notes: you can see the version from right after I graduated and the version from when I was still in high school. I admit both of them make me cringe slightly when I read them today. I still think everything written there is right, but the style and focus seem off to me now.)

0. Stop looking for the “right” training (or: be yourself)

These days many of the questions I get are clearly focused on trying to find a perfect plan: questions like “what did YOU do to get to X” or “how EXACTLY do I practice for Y”. (Often these words are in all-caps in the email, too!) When I see these I always feel very hesitant to answer. The reason is that I always feel like there’s some implicit hope that I can give you some recipe that, if you follow it, will guarantee reaching your goals.

I’m sorry, math contests don’t work that way (and can’t work that way). I actually think that if I gave you a list of which chapters of which books I read in 2009-2010 over which weeks, and which problems I did on each day, and you followed it to the letter, it would go horribly.

Why? It’s not just a talent thing, I think. Solving math problems is actually a deeply personal art: despite being what might appear to be a cold and logical discipline, learning math and getting better at it actually requires being human. Different people find different things natural or unnatural, easy or hard, et cetera. If you try to squeeze yourself into some mold or timeline then the results will probably be counterproductive.

On the flip side, this means that you can worry a lot less. I actually think that surprisingly often, you can get a first-order approximation of what’s the “best” thing to do by simply doing whatever feels the most engaging or rewarding (assuming you like math, of course). There are some places where this is not correct (e.g., you might hate geometry, but cannot just ignore it). But the first-order approximation is actually quite decent.

That’s why in the introduction to my geometry book, I explicitly have the line:

Readers are encouraged to not be bureaucratic in their learning and move around as they see fit, e.g., skipping complicated sections and returning to them later, or moving quickly through familiar material.

Put another way: as learning math is quite personal, the advice “be yourself” is well-taken.

1. Some brief recommendations (anyways)

With all that said, probably no serious harm will come from me listing a few references I think are reasonable, so that you have somewhere to start and can oscillate from there.

For learning theory and fundamentals:

For sources of additional practice problems (other than the particular test you’re preparing for):

  • The collegiate contests HMMT November, PUMaC, CMIMC will typically have decent short-answer problems.
  • HMMT February is by far the hardest short-answer contest I know of.
  • At the olympiad level, there are so many national olympiads and team selection tests that you will never finish. (My website has an archive of USA problems and solutions if you’re interested in those in particular.)
    The IMO Shortlist is also a good place to work, as it contains proposals of varying difficulty from many countries, and thus is the most culturally diverse. As for other nations, as a rule of thumb, any country that often finishes in the top 20 at the IMO (say) will probably have good questions on its national olympiad or TST.

For every subject that’s not olympiad geometry, there are actually surprisingly few named theorems.

2. Premature optimization is the root of all evil (so just get your hands dirty)

For some people, the easiest first step to getting better is to double the amount of time you spend practicing. (Unless that amount is zero, in which case, you should just start.)

There is a time and place for spending time thinking about how to practice. One example is if you’ve been working a while and feel like nothing has changed, or you’ve been working on some book and it just doesn’t feel fun, etc. Another common example is if you notice you keep missing all the functional equations on the USAMO: then, maybe it’s time to search up some handouts on functional equations. Put another way, if you feel stuck, then you can start thinking about whether you’re doing something wrong.

On the other extreme, if you’re wondering whether you are ready to read book X or do problems from Y contest, my advice is to just try it and see if you like it. There is no commitment: just read Chapter 1, see how you feel. If it works, keep doing it; if not, try something else.

(I can draw an analogy from my own life. Whenever I am learning a new board game or card game, like Catan or Splendor or whatever, I always overthink it. I spend all this time thinking and theorizing and trying to come up with this brilliant strategy — which never works, because it’s my first game, for crying out loud. It turns out that until you start grappling at close range and getting your hands dirty, your internal model of something you’ve never done is probably not that good.)

3. Doing problems just above your level (and a bit on reflecting on them)

There is one pitfall that I do see sometimes, common enough that I will point it out. If you mostly (only?) do old practice tests or past problems, then you’re liable to be spending too much time on easy problems. That was the topic of another old post of mine, but the short story is that if you find yourself constantly getting 130ish on AMC10 practice tests, then maybe you should spend most of your time working on problems 21-25 rather than grinding 1-20 over and over. (See 28:30-29:00 here to hear Zuming make fun of them.)

The common wisdom is that you should consistently do problems just above your level so that you gradually increase the difficulty of problems you are able to solve. The situation is a little more nuanced at the AMC/AIME level, since for those short-answer contests it’s also important to be able to do routine problems quickly and accurately. However, I think for most people, you really should be spending at least 70% of your time getting smarter, rather than just faster.

I think in this case, I want to give concrete descriptions. Here are some examples of what can happen after a problem.

  • You looked at the problem and immediately (already?) knew how to do it. Then you probably didn’t learn much from it. (But at least you’ll get faster, if not smarter.)
  • You looked at the problem and didn’t know right away how to start, but after a little while figured it out. That’s a little better.
  • You struggled with the problem and eventually figured out a solution, but maybe not the most elegant one. I think that’s a great situation to be in. You came up with some solution to the problem, so you understand it fairly well, but there’s still more for you to update your instincts on. What can you do in the future to get solutions more like the elegant one?
  • You struggled with the problem and eventually gave up, then when you read the solution you realize quickly what you were missing. I think that’s a great situation to be in, too. You now want to update your instincts by a little bit — how could you make sure you don’t miss something like that again in the future?
  • The official solution quoted some theorem you don’t know. If this was among a batch of problems where the other problems felt about the right level to you, then I think often this is a pretty good time to see if you can learn the statement (better, proof) of the theorem. You have just spent some time working on a situation in which the theorem was useful, so that data is fresh in your mind. And pleasantly often, you will find that ideas you came up with during your attempt on the problem correspond to ideas in the statement or proof of the theorem, which is great!
  • You didn’t solve the problem, and the solution makes sense, but you don’t see how you would have come up with it. It’s possible that this is the fault of the solution’s author (many people are actually quite bad at making solutions read naturally). If you have a teacher, this is the right time to ask them about it. But it’s also possible that the problem was too hard. In general, I think it’s better to miss problems “by a little”, whatever that means, so that you can update your intuition correctly.
  • You can’t even understand the solution. Okay, too hard.

You’ll notice how much emphasis I place on the post-problem reflection process. This is actually important — after all the time you spent working on the problem itself, you want to update your instincts as much as possible to get the payoff. In particular, I think it’s usually worth it to read the solutions to problems you worked on, whether or not you solve them. In general, after reading a solution, I think you should be able to state in a couple sentences all the main ideas of the solution, and basically know how to solve the problem from there.

For the olympiad level, I have a whole different post dedicated to reading solutions, and interested readers can read more there. (One point from that post I do want to emphasize since it wasn’t covered explicitly in any of the above examples: by USA(J)MO level it becomes important to begin building intuition that you can’t explicitly formalize. You may start having vague feelings and notions that you can’t quite put your finger on, but you can feel it. These non-formalizable feelings are valuable, take note of them.)

4. Leave your ego out (e.g. be willing to give up on problems)

This is easy advice to give, but it’s hard advice to follow. For concreteness, here are examples of things I think can be explained this way.

Sometimes people will ask me whether they need to solve every problem in each chapter of EGMO, or do every past practice test, or so on. The answer is: of course not, and why would you even think that? There’s nothing magical about doing 80% of the problems versus 100% of them. (If there was, then EGMO is secretly a terrible book, because I commented out some problems, and so OH NO YOU SKIPPED SOME AAAHHHHH.) And so it’s okay to start Chapter 5 even though you didn’t finish that last challenge problem at the end. Otherwise you let one problem prevent you from working on the next several.

Or, sometimes I learn about people who, if they do not solve an olympiad problem, will refuse to look at the solution; instead they will mark it in a spreadsheet to come back to later. In short, they never give up on a problem, which I think is a bad idea, since reflecting on missed problems is so important. (It is not as if you can realistically run out of olympiad problems to do.) And while this is still better than giving up too early, I mean, all things in moderation, right?

I think if you were somehow able to completely leave your ego out, and not worry at all about how good you are but rather just maximize learning, then mistakes like these two would be a lot rarer. Of course, this is impossible to do in practice (we’re all human), but it’s good to keep in mind at least that this is an ideal we can strive for.

5. Enjoy it

Which leads me to the one bit that everyone already knows, but that no platitude-filled post would be complete without: to do well at math contests (or anything hard) you probably have to enjoy the process of getting better. Not just the end result. You have to enjoy the work itself.

Which is not to say you have to do it all the time or for hours a day. Doing math is hard, so you get tired eventually, and beyond that forcing yourself to work is not productive. Thus when I see people talk about how they plan to do every shortlist problem, or they will work N hours per day over M time, I always feel a little uneasy, because it always seems too results-oriented.

In particular, I actually think it’s quite hard to spend more than two or three good hours per day on a regular basis. I certainly never did — back in high school (and even now), if I solved one problem that took me more than an hour, that was considered a good day. (But I should also note that the work ethic of my best students consistently amazes me; it far surpasses mine.) In that sense, the learning process can’t be forced or rushed.

There is one sense in which you can get more hours a day, that I am on record saying quite often: if you think about math in the shower, then you know you’re doing it right.

 

Some things Evan is working on for 2019

With Christmas Day, here are some announcements about my work that will possibly interest readers of this blog.

OTIS V Applications

Applications for OTIS V are open now, so if you are an olympiad contestant interested in working with me during the 2019-2020 school year, here is your chance. I’m hoping to find 20-40 students for the next school year. Note that the application has math problems in it, unlike previous years, so you have to start early.

OTIS Lecture Series

At the same time, I realize that I will never be able to take everyone for OTIS. So I am planning to post a substantial fraction of OTIS materials for public consumption, hopefully by late January, but no promises.

Napkin 2nd edition

The Napkin is getting a second edition which, if all goes well, should come out by the end of February (but that is a big “if”). Most chapters will be mostly unchanged modulo typos, but a few big changes:

  • I am hoping to add a new part on measure theory with an eye towards probability applications (e.g. law of large numbers, central limit theorem, stopped martingales).
  • There will be a bit of real analysis / calculus now. (Not much.)
  • Maybe two-ish bonus chapters on other topics being added.
  • The earliest chapters (on algebra and topology) are being re-organized significantly, though most of the content should remain the same.
  • The algebraic geometry chapters on schemes are getting a major facelift, because the old ones were terrible. They will still cover roughly the same content, but in a way that makes more sense, has more examples, and has more pictures.

This means that for the first time the numbering of the chapters is going to break with the new update. This also means there will be plenty of new typos and mistakes for readers to find. I’m looking forward to it!

SPARC 2019 applications

For high school students, SPARC applications will open soon. The deadline will probably be the end of February. This year SPARC will be held in the Bay Area from July 24 to August 2.

A few shockingly linear graphs

There’s a recent working paper by economists Ruchir Agarwal and Patrick Gaule which I think would be of much interest to this readership: a systematic study of IMO performance versus success as a mathematician later on.

Here is a link to the working paper.

Despite the click-baity title and dreamy introduction about the Millennium Prizes, the rest of the paper is fascinating, and the figures section is a gold mine. Here are two that stood out to me:

There’s also one really nice idea they had, which was to investigate the effect of getting one point less than a gold medal, versus getting exactly a gold medal. This is a pretty clever way to account for the effect of the prestige of the IMO, since “IMO gold” sounds so much better on a CV than “IMO silver” even though in any given year they may not differ so much. To my surprise, the authors found that “being awarded a better medal appears to have no additional impact on becoming a professional mathematician or future knowledge production”. I included the relevant graph below here.

The data used in the paper spans from IMO 1981 to IMO 2000. This is before the rise of Art of Problem Solving and the Internet (and the IMO was smaller back then, anyways), so I imagine these graphs might look different if we did them in 2040 using IMO 2000 – IMO 2020 data, although I’m not even sure whether I expect the effects to be larger or smaller.

(As usual: I do not mean to suggest that non-IMO participants cannot do well in math later. This is so that I do not get flooded with angry messages like last time.)

A trailer for p-adic analysis, second half: Mahler coefficients

In the previous post we defined {p}-adic numbers. This post will state (mostly without proof) some more surprising results about continuous functions {f \colon \mathbb Z_p \rightarrow \mathbb Q_p}. Then we give the famous proof of the Skolem-Mahler-Lech theorem using {p}-adic analysis.

1. Digression on {\mathbb C_p}

Before I go on, I want to mention that {\mathbb Q_p} is not algebraically closed. So we can take its algebraic closure {\overline{\mathbb Q_p}}, but this field is no longer complete (in the topological sense). However, we can then take the completion of this space to obtain {\mathbb C_p}. It turns out that this completion is still algebraically closed, so {\mathbb C_p} is both algebraically closed and complete. This space is called the {p}-adic complex numbers.

We won’t need {\mathbb C_p} at all in what follows, so you can forget everything you just read.

2. Mahler coefficients: a description of continuous functions on {\mathbb Z_p}

One of the big surprises of {p}-adic analysis is that we can concretely describe all continuous functions {\mathbb Z_p \rightarrow \mathbb Q_p}. They are given by a basis of functions

\displaystyle  \binom xn \overset{\mathrm{def}}{=} \frac{x(x-1) \dots (x-(n-1))}{n!}

in the following way.

Theorem 1 (Mahler; see Schikhof Theorem 51.1 and Exercise 51.B)

Let {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} be continuous, and define

\displaystyle  a_n = \sum_{k=0}^n \binom nk (-1)^{n-k} f(k).  \ \ \ \ \ (1)

Then {\lim_n a_n = 0} and

\displaystyle  f(x) = \sum_{n \ge 0} a_n \binom xn.

Conversely, if {a_n} is any sequence converging to zero, then {f(x) = \sum_{n \ge 0} a_n \binom xn} defines a continuous function satisfying (1).

The {a_n} are called the Mahler coefficients of {f}.
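Formula (1) is easy to try out directly on the values {f(0), f(1), \dots}. The sketch below is plain Python, and the helper names are ours. It computes the {a_n} for a function given on the nonnegative integers, then checks that {\sum_n a_n \binom mn} reproduces {f(m)}. On nonnegative integers only finitely many terms are nonzero, because {\binom mn = 0} for {n > m}. So this says nothing about convergence in {\mathbb Z_p}.

```python
from math import comb

def mahler_coefficients(values):
    """Formula (1): a_n = sum_{k=0}^{n} C(n, k) (-1)^(n-k) f(k),
    given the list values = [f(0), f(1), ..., f(N-1)]."""
    return [
        sum(comb(n, k) * (-1) ** (n - k) * values[k] for k in range(n + 1))
        for n in range(len(values))
    ]

def mahler_evaluate(coeffs, m):
    """sum_n a_n * C(m, n) at a nonnegative integer m.
    math.comb(m, n) is 0 for n > m, so only finitely many terms are nonzero."""
    return sum(a * comb(m, n) for n, a in enumerate(coeffs))

values = [n * n for n in range(8)]      # f(n) = n^2
coeffs = mahler_coefficients(values)
print(coeffs)                           # [0, 1, 2, 0, 0, 0, 0, 0]
assert all(mahler_evaluate(coeffs, m) == values[m] for m in range(len(values)))
```

For instance, {n^2 = \binom n1 + 2\binom n2}, which is why only {a_1} and {a_2} are nonzero.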

Exercise 2

Last post we proved that if {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} is continuous and {f(n) = (-1)^n} for every {n \in \mathbb Z_{\ge 0}} then {p = 2}. Re-prove this using Mahler’s theorem, and this time show conversely that a unique such {f} exists when {p=2}.

You’ll note that these are the same finite differences that one uses on polynomials in high school math contests, which is why they are also called “Mahler differences”.

\displaystyle  \begin{aligned} a_0 &= f(0) \\ a_1 &= f(1) - f(0) \\ a_2 &= f(2) - 2f(1) + f(0) \\ a_3 &= f(3) - 3f(2) + 3f(1) - f(0). \end{aligned}

Thus one can think of {a_n \rightarrow 0} as saying that the values of {f(0)}, {f(1)}, \dots behave like a polynomial modulo {p^e} for every {e \ge 0}. Amusingly, this fact was used on a USA TST in 2011:

Exercise 3 (USA TST 2011/3)

Let {p} be a prime. We say that a sequence of integers {\{z_n\}_{n=0}^\infty} is a {p}-pod if for each {e \geq 0}, there is an {N \geq 0} such that whenever {m \geq N}, {p^e} divides the sum

\displaystyle  \sum_{k=0}^m (-1)^k \binom mk z_k.

Prove that if both sequences {\{x_n\}_{n=0}^\infty} and {\{y_n\}_{n=0}^\infty} are {p}-pods, then the sequence {\{x_n y_n\}_{n=0}^\infty} is a {p}-pod.
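The sum in the definition of a {p}-pod is, up to the sign {(-1)^m}, exactly the Mahler coefficient {a_m} from (1) with {f(k) = z_k}. The sketch below checks this numerically. It also checks the definition on a few sequences of our own choosing: {(1+p)^n} and {(1-p)^n}, whose sums are {(-p)^m} and {p^m} by the binomial theorem, and their product. This is only a finite sanity check, not a proof of the exercise, and the helper names are hypothetical.

```python
from math import comb

def pod_sum(z, m):
    """sum_{k=0}^{m} (-1)^k C(m, k) z_k, the sum in the definition of a p-pod."""
    return sum((-1) ** k * comb(m, k) * z[k] for k in range(m + 1))

def mahler_coeff(z, m):
    """a_m = sum_{k=0}^{m} C(m, k) (-1)^(m-k) z_k, as in formula (1)."""
    return sum(comb(m, k) * (-1) ** (m - k) * z[k] for k in range(m + 1))

def first_good_index(z, p, e, m_max):
    """Smallest N such that p^e divides pod_sum(z, m) for every N <= m <= m_max.
    A finite check only: it cannot prove the p-pod property."""
    N = 0
    for m in range(m_max + 1):
        if pod_sum(z, m) % p**e != 0:
            N = m + 1
    return N

p, M = 3, 12
x = [(1 + p) ** n for n in range(M + 1)]   # pod_sum(x, m) = (-p)^m
y = [(1 - p) ** n for n in range(M + 1)]   # pod_sum(y, m) = p^m
xy = [s * t for s, t in zip(x, y)]         # (1 - p^2)^n, pod_sum = p^(2m)

for m in range(M + 1):
    assert pod_sum(x, m) == (-1) ** m * mahler_coeff(x, m)

for e in range(1, 5):
    print(e, first_good_index(x, p, e, M), first_good_index(xy, p, e, M))
```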

3. Analytic functions

We say that a function {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} is analytic if it has a power series expansion

\displaystyle  \sum_{n \ge 0} c_n x^n \quad c_n \in \mathbb Q_p \qquad\text{ converging for } x \in \mathbb Z_p.

As before there is a characterization in terms of the Mahler coefficients:

Theorem 4 (Schikhof Theorem 54.4)

The function {f(x) = \sum_{n \ge 0} a_n \binom xn} is analytic if and only if

\displaystyle  \lim_{n \rightarrow \infty} \frac{a_n}{n!} = 0.
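To see the condition of Theorem 4 in a concrete case, suppose {\nu_p(a_n) = n} exactly, e.g. {a_n = p^n} (our illustrative choice). Legendre's formula gives {\nu_p(n!) = \sum_{i \ge 1} \lfloor n/p^i \rfloor}, so the sketch below computes {\nu_p(a_n/n!) = n - \nu_p(n!)}. For odd {p} this grows without bound. For {p = 2} it equals the number of ones in the binary expansion of {n}, which is {1} whenever {n} is a power of {2}. So for this choice the condition fails when {p = 2}.

```python
def vp_factorial(n, p):
    """v_p(n!) by Legendre's formula: sum over i >= 1 of floor(n / p^i)."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total

def vp_ratio(n, p):
    """v_p(a_n / n!) in the illustrative case a_n = p^n."""
    return n - vp_factorial(n, p)

for p in (2, 3, 5):
    print(p, [vp_ratio(n, p) for n in range(1, 17)])
```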

Just as a nonzero holomorphic function has only finitely many zeros in any compact set, we have the following result on analytic functions on {\mathbb Z_p}.

Theorem 5 (Strassmann’s theorem)

Let {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} be analytic and not identically zero. Then {f} has finitely many zeros.

4. Skolem-Mahler-Lech

We close off with an application of the analyticity results above.

Theorem 6 (Skolem-Mahler-Lech)

Let {(x_i)_{i \ge 0}} be an integral linear recurrence. Then the zero set of {x_i} is eventually periodic.

Proof: According to the theory of linear recurrences, there exists a matrix {A} such that we can write {x_i} as a dot product

\displaystyle  x_i = \left< A^i u, v \right>.

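Let {p} be a prime not dividing {\det A}. Then {A} is invertible modulo {p}, so it has finite order in the finite group {\mathrm{GL}_d(\mathbb F_p)} (where {d} is the size of {A}). Thus there is an integer {T \ge 1} such that {A^T \equiv \mathbf{1} \pmod p}.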

Fix any {0 \le r < T}. We will prove that either all the terms

\displaystyle  f(n) = x_{nT+r} \qquad n = 0, 1, \dots

are zero, or at most finitely many of them are. This will conclude the proof.

Let {A^T = \mathbf{1} + pB} for some integer matrix {B}. We have

\displaystyle  \begin{aligned} f(n) &= \left< A^{nT+r} u, v \right> = \left< (\mathbf1 + pB)^n A^r u, v \right> \\ &= \sum_{k \ge 0} \binom nk \cdot p^k \left< B^k A^r u, v \right> \\ &= \sum_{k \ge 0} a_k \binom nk \qquad \text{ where } a_k = p^k \left< B^k A^r u, v \right> \in p^k \mathbb Z. \end{aligned}

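Thus we have written {f} in Mahler form. Initially, we define {f \colon \mathbb Z_{\ge 0} \rightarrow \mathbb Z}, but by Mahler’s theorem (since {\lim_n a_n = 0}) it follows that {f} extends to a continuous function {f \colon \mathbb Z_p \rightarrow \mathbb Q_p}. Moreover, {f} is even analytic, since {\lim_n \frac{a_n}{n!} = 0}. Indeed, {\nu_p(a_n) \ge n}, while Legendre’s formula gives {\nu_p(n!) \le \frac{n-1}{p-1}} for {n \ge 1}, so {\nu_p(a_n/n!) \ge \frac{(p-2)n+1}{p-1} \rightarrow \infty}. (This step needs {p} odd, which we may assume, since only finitely many primes divide {\det A}.)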

Thus by Strassmann’s theorem, {f} is either identically zero, or else it has finitely many zeros, as desired. \Box
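The algebraic part of the proof can be traced numerically on a small example. This is a sketch under our own choices, not part of the argument above:

- {A} is taken to be the companion matrix of the recurrence, one standard way to get the dot-product form.
- The example is the Fibonacci sequence with the odd prime {p = 3}.
- {T} is found by brute force.

The code builds {B} with {A^T = \mathbf 1 + pB} and checks {f(n) = x_{nT+r} = \sum_k \binom nk a_k} with {p^k \mid a_k} for the first few {n}. It says nothing about zeros; that is where Strassmann’s theorem comes in. All helper names are hypothetical.

```python
from math import comb

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(X, e):
    n = len(X)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = mat_mul(R, X)
    return R

def mat_vec(X, w):
    return [sum(X[i][k] * w[k] for k in range(len(w))) for i in range(len(X))]

def dot(w, z):
    return sum(s * t for s, t in zip(w, z))

def companion(rec):
    """A with A(x_{i+d-1}, ..., x_i) = (x_{i+d}, ..., x_{i+1}) for the
    recurrence x_{i+d} = c_1 x_{i+d-1} + ... + c_d x_i, rec = [c_1, ..., c_d]."""
    d = len(rec)
    A = [[0] * d for _ in range(d)]
    A[0] = list(rec)
    for i in range(1, d):
        A[i][i - 1] = 1
    return A

def order_mod_p(A, p, limit=10**5):
    """Smallest T >= 1 with A^T = identity (mod p), by brute force."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    P = [[c % p for c in row] for row in A]
    for T in range(1, limit + 1):
        if P == identity:
            return T
        P = [[c % p for c in row] for row in mat_mul(P, A)]
    raise ValueError("no T found below limit")

# Illustrative data (our choice): Fibonacci, x_0 = 0, x_1 = 1, odd prime p = 3.
rec, initial, p, r = [1, 1], [0, 1], 3, 5
d = len(rec)
A = companion(rec)
u = list(reversed(initial))        # (x_{d-1}, ..., x_0)
v = [0] * (d - 1) + [1]            # reads off the last coordinate

def term(i):
    """x_i = <A^i u, v>."""
    return dot(mat_vec(mat_pow(A, i), u), v)

T = order_mod_p(A, p)              # A^T = 1 (mod p)
AT = mat_pow(A, T)
diff = [[AT[i][j] - int(i == j) for j in range(d)] for i in range(d)]
assert all(entry % p == 0 for row in diff for entry in row)
B = [[entry // p for entry in row] for row in diff]   # A^T = 1 + pB

def mahler_coeff(k):
    """a_k = p^k <B^k A^r u, v>."""
    return p**k * dot(mat_vec(mat_pow(B, k), mat_vec(mat_pow(A, r), u)), v)

coeffs = [mahler_coeff(k) for k in range(10)]
for n in range(10):
    # f(n) = x_{nT+r} should equal sum_k C(n, k) a_k.
    assert term(n * T + r) == sum(comb(n, k) * coeffs[k] for k in range(n + 1))
print(T, [c // p**k for k, c in enumerate(coeffs)])  # a_k / p^k are integers
```

Here {T} comes out as the Pisano period of {3}, which is {8}.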

A trailer for p-adic analysis, first half: USA TST 2003

I think this post is more than two years late in coming, but anyhow…

This post introduces the {p}-adic integers {\mathbb Z_p}, and the {p}-adic numbers {\mathbb Q_p}. The one-sentence description is that these are “integers/rationals carrying full mod {p^e} information” (and only that information).

The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.

In this whole post, {p} is always a prime. Much of this is based on Chapter 3A from Straight from the Book.

1. Motivation

Before really telling you what {\mathbb Z_p} and {\mathbb Q_p} are, let me tell you what you might expect them to do.

In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:

  • Taking modulo a prime {p} or a prime power {p^e}, and
  • Looking at the exponent {\nu_p}.

Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take an equation modulo {p} to gain something else to work with. Unfortunately, taking modulo {p} loses some information: the reduction {\mathbb Z \twoheadrightarrow \mathbb Z/p} is far from injective.

If we want finer control, we could consider instead taking modulo {p^2}, rather than taking modulo {p}. This can also give some new information (cubes modulo {9}, anyone?), but it has the disadvantage that {\mathbb Z/p^2} isn’t a field, so we lose a lot of the nice algebraic properties that we got if we take modulo {p}.

One of the goals of {p}-adic numbers is that we can get around these two issues I described. The {p}-adic numbers we introduce are going to have the following properties:

  1. You can “take modulo {p^e} for all {e} at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with {p}-adic numbers, we won’t have to make that choice. An equation of {p}-adic numbers carries enough information to take it modulo {p^e} for every {e}.
  2. The numbers {\mathbb Q_p} form a field, the nicest possible algebraic structure: {1/p} makes sense. Contrast this with {\mathbb Z/p^2}, which is not even an integral domain.
  3. It doesn’t lose as much information as taking modulo {p} does: rather than the surjective {\mathbb Z \twoheadrightarrow \mathbb Z/p} we have an injective map {\mathbb Z \hookrightarrow \mathbb Z_p}.
  4. Despite this, you “ignore” some “irrelevant” data. Just like taking modulo {p}, you want to zoom-in on a particular type of algebraic information, and this means necessarily losing sight of other things. (To draw an analogy: the equation { a^2 + b^2 + c^2 + d^2 = -1} has no integer solutions, because, well, squares are nonnegative. But you will find that this equation has solutions modulo any prime {p}, because once you take modulo {p} you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in {p}-adics: the above equation has a solution in {\mathbb Z_p} for every prime {p}.)

So, you can think of {p}-adic numbers as the right tool to use if you only really care about modulo {p^e} information, but normal {\mathbb Z/p^e} isn’t quite powerful enough.

To be more concrete, I’ll give a poster example now:

Example 1 (USA TST 2002/2)

For a prime {p > 5}, show that the value of

\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}

does not depend on {x}.

Here is a problem where we clearly only care about {p^e}-type information. Yet it’s a nontrivial challenge to do the necessary manipulations mod {p^3} (try it!). The basic issue is that there is no good way to deal with the denominators modulo {p^3} (in part because {\mathbb Z/p^3} is not even an integral domain).

However, with {p}-adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using the identity

\displaystyle \left( 1 + \frac{px}{k} \right)^{-2} = \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^n.

Such an identity makes no sense over {\mathbb Q} or {\mathbb R} for convergence reasons, but it will work fine over {\mathbb Q_p}, which is all we need.
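Before any {p}-adic machinery, a brute-force check for small primes is reassuring. It is a sanity check, not a proof. The sketch below reduces {f_p(x)} modulo {p^3} using modular inverses, which exist because each {px+k} is prime to {p}. It then collects the values for {x = 0, \dots, 19}. It also shows that the hypothesis {p > 5} matters: for {p = 5} the value does depend on {x}, with {f_5(0) \equiv 70} and {f_5(1) \equiv 45 \pmod{125}}. The function name is ours.

```python
def f_mod(p, x):
    """f_p(x) = sum_{k=1}^{p-1} 1/(px+k)^2 reduced mod p^3.
    Each px + k is prime to p, hence invertible mod p^3;
    pow(a, -2, m) needs Python 3.8 or later."""
    m = p**3
    return sum(pow(p * x + k, -2, m) for k in range(1, p)) % m

for p in (5, 7, 11, 13):
    values = {f_mod(p, x) for x in range(20)}
    print(p, sorted(values))
```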

2. Algebraic perspective

We now construct {\mathbb Z_p} and {\mathbb Q_p}. I promised earlier that a {p}-adic integer will let you look at “all residues modulo {p^e}” at once. This definition will formalize this.

2.1. Definition of {\mathbb Z_p}

Definition 2 (Introducing {\mathbb Z_p})

A {p}-adic integer is a sequence

\displaystyle x = (x_1 \bmod p, \; x_2 \bmod{p^2}, \; x_3 \bmod{p^3}, \; \dots)

of residues {x_e} modulo {p^e} for each integer {e \ge 1}, satisfying the compatibility relations {x_i \equiv x_j \pmod{p^i}} for {i < j}.

The set {\mathbb Z_p} of {p}-adic integers forms a ring under component-wise addition and multiplication.

Example 3 (Some {3}-adic integers)

Let {p=3}. Every usual integer {n} generates a (compatible) sequence of residues modulo {p^e} for each {e}, so we can view each ordinary integer as a {p}-adic one:

\displaystyle 50 = \left( 2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 50 \bmod{81}, \; 50 \bmod{243}, \; \dots \right).

On the other hand, there are sequences of residues which do not correspond to any usual integer despite satisfying compatibility relations, such as

\displaystyle \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right)

which can be thought of as {x = 1 + p + p^2 + \dots}.

In this way we get an injective map

\displaystyle \mathbb Z \hookrightarrow \mathbb Z_p \qquad n \mapsto \left( n \bmod p, n \bmod{p^2}, n \bmod{p^3}, \dots \right)

which is not surjective. So there are more {p}-adic integers than usual integers.
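The definition can be played with on finite truncations. The sketch below (hypothetical helper names) computes the image of an integer as {(n \bmod p, \dots, n \bmod p^e)} and checks the compatibility relations {x_i \equiv x_j \pmod{p^i}}. A finite truncation can never distinguish a genuine {p}-adic integer from an ordinary one, so this only illustrates the compatibility condition.

```python
def residues(n, p, depth):
    """(n mod p, n mod p^2, ..., n mod p^depth): a truncation of n in Z_p."""
    return [n % p**e for e in range(1, depth + 1)]

def is_compatible(seq, p):
    """Check x_i = x_j (mod p^i) for i < j, where seq[i] is x_{i+1} mod p^{i+1}."""
    return all(seq[j] % p**(i + 1) == seq[i]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))

print(residues(50, 3, 5))                 # [2, 5, 23, 50, 50]
print(is_compatible([1, 4, 13, 40], 3))   # True
```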

(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as

\displaystyle \mathbb Z_p \overset{\mathrm{def}}{=} \varprojlim \mathbb Z/p^e \mathbb Z

where the inverse limit is taken across {e \ge 1}.)

Exercise 4

Check that {\mathbb Z_p} is an integral domain.

2.2. Base {p} expansion

Here is another way to think about {p}-adic integers using “base {p}”. As in the example earlier, every usual integer can be written in base {p}, for example

\displaystyle 50 = \overline{1212}_3 = 2 \cdot 3^0 + 1 \cdot 3^1 + 2 \cdot 3^2 + 1 \cdot 3^3.

More generally, given any {x = (x_1, \dots) \in \mathbb Z_p}, we can write down a “base {p}” expansion in the sense that there are exactly {p} choices of {x_k} given {x_{k-1}}. Continuing the example earlier, we would write

\displaystyle \begin{aligned} \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3 \end{aligned}

and in general we can write

\displaystyle x = \sum_{k \ge 0} a_k p^k = \overline{\dots a_2 a_1 a_0}_p

where {a_k \in \{0, \dots, p-1\}}, such that the equation holds modulo {p^e} for each {e}. Note the expansion is infinite to the left, which is different from what you’re used to.

(Amusingly, negative integers also have infinite base {p} expansions: {-4 = \overline{\dots222212}_3}, corresponding to {(2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 77 \bmod{81}, \; \dots)}.)
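The digits {a_0, a_1, \dots} can be read off mechanically. Reduce the number modulo {p^e}, then take ordinary base {p} digits. The sketch below does this for a rational {\text{num}/\text{den}} with {p \nmid \text{den}}. That restriction is our assumption, and it keeps the number in {\mathbb Z_p}, since the denominator is then invertible modulo {p^e}. The function name is hypothetical. It reproduces the expansions of {50}, of {-4}, and of the sequence {(1, 4, 13, 40, \dots)} seen earlier.

```python
def padic_digits(num, den, p, count):
    """Base-p digits a_0, ..., a_{count-1} (least significant first) of num/den,
    assuming p does not divide den. Reduce num/den mod p^count, then read off
    ordinary base-p digits. pow(den, -1, m) needs Python 3.8 or later."""
    if den % p == 0:
        raise ValueError("den must be prime to p")
    m = p**count
    x = num * pow(den, -1, m) % m
    digits = []
    for _ in range(count):
        x, d = divmod(x, p)
        digits.append(d)
    return digits

print(padic_digits(50, 1, 3, 6))   # [2, 1, 2, 1, 0, 0]
print(padic_digits(-4, 1, 3, 6))   # [2, 1, 2, 2, 2, 2], i.e. ...222212
print(padic_digits(-1, 2, 3, 6))   # [1, 1, 1, 1, 1, 1], i.e. ...1111
```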

Thus you may often hear the advertisement that a {p}-adic integer is a “possibly infinite base {p} expansion”. This is correct, but later on we’ll be thinking of {\mathbb Z_p} in a more and more “analytic” way, and so I prefer to think of this as a “Taylor series with base {p}”. Indeed, much of your intuition from generating functions {K[[X]]} (where {K} is a field) will carry over to {\mathbb Z_p}.

2.3. Constructing {\mathbb Q_p}

Here is one way in which your intuition from generating functions carries over:

Proposition 5 (Non-multiples of {p} are all invertible)

The number {x \in \mathbb Z_p} is invertible if and only if {x_1 \ne 0}. In symbols,

\displaystyle x \in \mathbb Z_p^\times \iff x \not\equiv 0 \pmod p.

Compare this with the corresponding statement for {K[ [ X ] ]}: a generating function {F \in K[ [ X ] ]} is invertible iff {F(0) \neq 0}.

Proof: If {x \equiv 0 \pmod p} then {x_1 = 0}, so clearly not invertible. Otherwise, {x_e \not\equiv 0 \pmod p} for all {e}, so we can take an inverse {y_e} modulo {p^e}, with {x_e y_e \equiv 1 \pmod{p^e}}. As the {y_e} are themselves compatible, the element {(y_1, y_2, \dots)} is an inverse. \Box
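The proof is constructive enough to run on a truncation. Invert each {x_e} modulo {p^e}, and the results are automatically compatible. The sketch below (hypothetical helper name) does this for {-2 \in \mathbb Z_3}. It recovers the sequence {(1, 4, 13, 40, \dots)}, in line with the example that follows.

```python
def inverse_residues(x_residues, p):
    """Given (x_1 mod p, x_2 mod p^2, ...) with x_1 != 0 mod p, return
    (y_1, y_2, ...) with x_e * y_e = 1 (mod p^e), as in the proof above.
    pow(x, -1, m) needs Python 3.8 or later."""
    if x_residues[0] % p == 0:
        raise ValueError("x = 0 (mod p) is not invertible in Z_p")
    return [pow(x_e, -1, p**e) for e, x_e in enumerate(x_residues, start=1)]

minus_two = [(-2) % 3**e for e in range(1, 5)]   # -2 in Z_3, truncated
print(inverse_residues(minus_two, 3))            # [1, 4, 13, 40]
```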

Example 6 (We have {-\frac{1}{2} = \overline{\dots1111}_3 \in \mathbb Z_3})

We claim the earlier example is actually

\displaystyle \begin{aligned} -\frac{1}{2} = \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3. \end{aligned}

Indeed, multiplying it by {-2} gives

\displaystyle \left( -2 \bmod 3, \; -8 \bmod 9, \; -26 \bmod{27}, \; -80 \bmod{81}, \; \dots \right) = 1.

(Compare this with the “geometric series” {1 + 3 + 3^2 + \dots = \frac{1}{1-3}}. We’ll actually be able to formalize this later, but not yet.)

Remark 7 ({\frac{1}{2}} is an integer for {p > 2})

The earlier proposition implies that {\frac{1}{2} \in \mathbb Z_3} (among other things); your intuition about what is an “integer” is different here! In olympiad terms, we already knew {\frac{1}{2} \pmod 3} made sense, which is why calling {\frac{1}{2}} an “integer” in the {3}-adics is correct, even though it doesn’t correspond to any element of {\mathbb Z}.

Fun (but trickier) exercise: rational numbers correspond exactly to eventually periodic base {p} expansions.

With this observation, here is now the definition of {\mathbb Q_p}.

Definition 8 (Introducing {\mathbb Q_p})

Since {\mathbb Z_p} is an integral domain, we let {\mathbb Q_p} denote its field of fractions. These are the {p}-adic numbers.

Continuing our generating functions analogy:

\displaystyle \mathbb Z_p \text{ is to } \mathbb Q_p \quad\text{as}\quad K[[X]] \text{ is to } K((X)).

This means {\mathbb Q_p} is “Laurent series with base {p}”, and in particular according to the earlier proposition we deduce:

Proposition 9 ({\mathbb Q_p} looks like formal Laurent series)

Every nonzero element of {\mathbb Q_p} is uniquely of the form

\displaystyle p^k u \qquad \text{ where } k \in \mathbb Z, \; u \in \mathbb Z_p^\times.

Thus, continuing our base {p} analogy, elements of {\mathbb Q_p} are in bijection with “Laurent series”

\displaystyle \sum_{k \ge -n} a_k p^k = \overline{\dots a_2 a_1 a_0 . a_{-1} a_{-2} \dots a_{-n}}_p

for {a_k \in \left\{ 0, \dots, p-1 \right\}}. So the base {p} representations of elements of {\mathbb Q_p} can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).

(Fair warning: the field {\mathbb Q_p} has characteristic zero, not {p}.)

Remark 10 (Warning on fraction field)

This result implies that you shouldn’t think about elements of {\mathbb Q_p} as {x/y} (for {x,y \in \mathbb Z_p}) in practice, even though this is the official definition (and what you’d expect from the name {\mathbb Q_p}). The only denominators you need are powers of {p}.

To keep pushing the formal Laurent series analogy, {K((X))} is usually not thought of as a quotient of generating functions but rather as “formal series with some negative exponents”. You should apply the same intuition to {\mathbb Q_p}.

(At this point I want to make a remark about the fact {1/p \in \mathbb Q_p}, connecting it to the wish-list of properties I had before. In elementary number theory you can take equations modulo {p}, but if you do, the quantity {n/p \bmod{p}} doesn’t make sense unless you know {n \bmod{p^2}}. You can’t fix this by just taking modulo {p^2} since then you need {n \bmod{p^3}} to get {n/p \bmod{p^2}}, ad infinitum. You can work around issues like this, but the nice feature of {\mathbb Z_p} and {\mathbb Q_p} is that you have modulo {p^e} information for “all {e} at once”: the information of {x \in \mathbb Q_p} packages all the modulo {p^e} information simultaneously. So you can divide by {p} with no repercussions.)

3. Analytic perspective

3.1. Definition

Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number line, but for {p}-adic purposes we only care about modulo {p^e} information. So, we’ll instead think of two elements of {\mathbb Z_p} or {\mathbb Q_p} as “close” if they differ by a multiple of {p^e} for some large {e}.

For this we’ll borrow the familiar {\nu_p} from elementary number theory.

Definition 11 ({p}-adic valuation and absolute value)

We define the {p}-adic valuation {\nu_p : \mathbb Q_p^\times \rightarrow \mathbb Z} in the following two equivalent ways:

  • For nonzero {x = (x_1, x_2, \dots) \in \mathbb Z_p} we let {\nu_p(x)} be the largest {e} such that {x_e \equiv 0 \pmod{p^e}} (or {e=0} if {x \in \mathbb Z_p^\times}). Then extend to all of {\mathbb Q_p^\times} by {\nu_p(x/y) = \nu_p(x) - \nu_p(y)}.
  • Each {x \in \mathbb Q_p^\times} can be written uniquely as {p^k u} for {u \in \mathbb Z_p^\times}, {k \in \mathbb Z}. We let {\nu_p(x) = k}.

By convention we set {\nu_p(0) = +\infty}. Finally, define the {p}-adic absolute value {\left\lvert \bullet \right\rvert_p} by

\displaystyle \left\lvert x \right\rvert_p = p^{-\nu_p(x)}.

In particular {\left\lvert 0 \right\rvert_p = 0}.

This fulfills the promise that {x} and {y} are close if they look the same modulo {p^e} for large {e}; in that case {\nu_p(x-y)} is large and accordingly {\left\lvert x-y \right\rvert_p} is small.
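Elements of {\mathbb Q_p} are infinite objects, but on the rationals {\mathbb Q \subset \mathbb Q_p} both {\nu_p} and {\left\lvert \bullet \right\rvert_p} are easy to compute: count the factors of {p} in the numerator and subtract those in the denominator. Here is a minimal Python sketch restricted to rationals; the names `nu_p` and `abs_p` are my own.

```python
from fractions import Fraction


def nu_p(x: Fraction, p: int) -> float:
    """p-adic valuation of a rational number (infinity for 0)."""
    x = Fraction(x)
    if x == 0:
        return float("inf")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:  # count factors of p in the numerator
        num //= p
        v += 1
    while den % p == 0:  # ... and subtract those in the denominator
        den //= p
        v -= 1
    return v


def abs_p(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value |x|_p = p^(-nu_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-nu_p(x, p))


print(nu_p(Fraction(18), 3), abs_p(Fraction(18), 3))      # 2 1/9
print(nu_p(Fraction(5, 9), 3), abs_p(Fraction(5, 9), 3))  # -2 9
# 1 and 1 + 3^10 are 3-adically close: their difference has |.|_3 = 3^-10
print(abs_p(Fraction(1) - (1 + 3**10), 3))                # 1/59049
```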

3.2. Ultrametric space

In this way, {\mathbb Q_p} and {\mathbb Z_p} become metric spaces, with metric given by {\left\lvert x-y \right\rvert_p}.

Exercise 12

Suppose {f \colon \mathbb Z_p \rightarrow \mathbb Q_p} is continuous and {f(n) = (-1)^n} for every {n \in \mathbb Z_{\ge 0}}. Prove that {p = 2}.

In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from {\mathbb R}.

Proposition 13 ({\left\lvert \bullet \right\rvert_p} is an ultrametric)

For any {x,y \in \mathbb Q_p}, we have the strong triangle inequality

\displaystyle \left\lvert x+y \right\rvert_p \le \max \left\{ \left\lvert x \right\rvert_p, \left\lvert y \right\rvert_p \right\}.

Equality holds if (but not only if) {\left\lvert x \right\rvert_p \neq \left\lvert y \right\rvert_p}.
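As a sanity check (not a proof), one can test the strong triangle inequality and the equality case on random rationals. The helper `check_ultrametric`, its random ranges, and the seed are arbitrary choices of mine for illustration.

```python
import random
from fractions import Fraction


def abs_p(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value of a rational number, with |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)


def check_ultrametric(p: int, trials: int = 5000, seed: int = 0) -> bool:
    """Test |x+y|_p <= max(|x|_p, |y|_p), with equality when |x|_p != |y|_p."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = Fraction(rng.randint(-500, 500), rng.randint(1, 500))
        y = Fraction(rng.randint(-500, 500), rng.randint(1, 500))
        ax, ay, axy = abs_p(x, p), abs_p(y, p), abs_p(x + y, p)
        if axy > max(ax, ay):
            return False
        if ax != ay and axy != max(ax, ay):
            return False
    return True


print(check_ultrametric(2), check_ultrametric(3), check_ultrametric(5))  # True True True
# With |x|_p = |y|_p the inequality can be strict ...
print(abs_p(Fraction(1) + Fraction(2), 3))  # 1/3, although |1|_3 = |2|_3 = 1
# ... or an equality, which is the "but not only if".
print(abs_p(Fraction(1) + Fraction(1), 3))  # 1
```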

However, {\mathbb Q_p} is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in {\mathbb R} or {\mathbb C}: basically, any notion such as “continuous function”, “convergent series”, et cetera has a {p}-adic analog. In particular, we can define what it means for an infinite sum to converge:

Definition 14 (Convergence notions)

Here are some examples of {p}-adic analogs of “real-world” notions.

  • A sequence {s_1, s_2, \dots} converges to a limit {L} if {\lim_{n \rightarrow \infty} \left\lvert s_n - L \right\rvert_p = 0}.
  • The infinite series {\sum_k x_k} converges if the sequence of partial sums {s_1 = x_1}, {s_2 = x_1 + x_2}, {\dots} converges to some limit.
  • … et cetera …

With this definition in place, the “base {p}” discussion we had earlier is now true in the analytic sense: if {x = \overline{\dots a_2 a_1 a_0}_p \in \mathbb Z_p} then

\displaystyle \sum_{k=0}^\infty a_k p^k \quad\text{converges to } x.

Indeed, {x} minus the {n}th partial sum {\sum_{k=0}^{n-1} a_k p^k} is divisible by {p^n}, hence the partial sums approach {x} as {n \rightarrow \infty}.
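To see this concretely for a rational {x \in \mathbb Z_p}, the following sketch (helper names are my own) computes the digits, forms the partial sums {s_n = \sum_{k<n} a_k p^k}, and reports {\nu_p(x - s_n)}, which should be at least {n}. For {x = -\frac12} in {\mathbb Z_3} we have {x - s_n = -\frac{3^n}{2}} exactly.

```python
from fractions import Fraction


def nu_p(x: Fraction, p: int) -> float:
    """p-adic valuation of a rational (infinity for 0)."""
    if x == 0:
        return float("inf")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def padic_digits(x: Fraction, p: int, n: int) -> list[int]:
    """First n base-p digits of a rational x whose denominator is prime to p."""
    digits = []
    for _ in range(n):
        d = x.numerator * pow(x.denominator, -1, p) % p
        digits.append(d)
        x = (x - d) / p
    return digits


def partial_sum_valuations(x: Fraction, p: int, n: int) -> list:
    """nu_p(x - s_k) for k = 1..n, where s_k = a_0 + a_1 p + ... + a_{k-1} p^(k-1)."""
    s, out = Fraction(0), []
    for k, a in enumerate(padic_digits(x, p, n)):
        s += a * p**k
        out.append(nu_p(x - s, p))
    return out


print(partial_sum_valuations(Fraction(-1, 2), 3, 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```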

While the definitions are all the same, some of the properties they satisfy change. For example, in {\mathbb Q_p} the criterion for a series to converge is much simpler:

Proposition 15 ({|x_k|_p \rightarrow 0} iff convergence of series)

A series {\sum_{k=1}^\infty x_k} in {\mathbb Q_p} converges to some limit if and only if {\lim_{k \rightarrow \infty} |x_k|_p = 0}.

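Contrast this with {\sum \frac1n = \infty} in {\mathbb R}. You can think of this as a consequence of the strong triangle inequality.

Proof: The “only if” direction works just as in {\mathbb R}: if the partial sums {s_n} converge, then {x_n = s_n - s_{n-1} \rightarrow 0}. For the converse, by multiplying by a large enough power of {p}, we may assume {x_k \in \mathbb Z_p}. (This isn’t actually necessary, but makes the notation nicer.)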

Observe that the partial sums {\sum_{k=1}^N x_k \pmod p} must eventually stabilize, since for large enough {n} we have {\left\lvert x_n \right\rvert_p < 1 \iff \nu_p(x_n) \ge 1}, i.e. {x_n \equiv 0 \pmod p}. So let {a_1} be the eventual residue of {\sum_{k=1}^N x_k \pmod p} for large {N}. In the same way let {a_2} be the eventual residue modulo {p^2}, and so on. Then one can check that the partial sums approach the limit {a = (a_1, a_2, \dots)}. \Box
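A fun illustration of this proof (my own choice of example, not from the text above) is {\sum_{k \ge 1} k!}, which diverges badly in {\mathbb R} but converges in every {\mathbb Q_p}, since {\nu_p(k!) \rightarrow \infty}. The sketch below computes the partial sums modulo {p^e}, watches them settle on a residue {a_e}, and checks that the {a_e} are compatible, exactly as in the construction of {a = (a_1, a_2, \dots)}. The function names and the default cutoff `terms=100` are assumptions for illustration.

```python
def factorial_sum_residues(p: int, e: int, terms: int) -> list[int]:
    """Residues mod p^e of the partial sums 1! + 2! + ... + N! for N = 1..terms."""
    mod = p**e
    s, fact, out = 0, 1, []
    for k in range(1, terms + 1):
        fact = fact * k % mod  # k! mod p^e
        s = (s + fact) % mod
        out.append(s)          # out[N - 1] = (1! + ... + N!) mod p^e
    return out


def eventual_residue(p: int, e: int, terms: int = 100) -> int:
    """The residue a_e the partial sums settle on; assumes nu_p(k!) >= e for k > terms."""
    return factorial_sum_residues(p, e, terms)[-1]


res = factorial_sum_residues(5, 3, 40)
print(len(set(res[13:])))  # 1: every partial sum with N >= 14 agrees mod 125, as nu_5(15!) = 3
print(eventual_residue(5, 1))  # 3, since 1! + 2! + 3! + 4! = 33
a1, a2, a3 = (eventual_residue(5, e) for e in (1, 2, 3))
print(a2 % 5 == a1, a3 % 25 == a2)  # True True: the residues are compatible
```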

Here are a couple of exercises to get you used to thinking of {\mathbb Z_p} and {\mathbb Q_p} as metric spaces.

Exercise 16 ({\mathbb Z_p} is compact)

Show that {\mathbb Q_p} is not compact, but {\mathbb Z_p} is. (For the latter, I recommend using sequential compactness.)

Exercise 17 (Totally disconnected)

Show that both {\mathbb Z_p} and {\mathbb Q_p} are totally disconnected: there are no connected sets other than the empty set and singleton sets.

3.3. More fun with geometric series

While we’re at it, let’s finally state the {p}-adic analog of the geometric series formula.

Proposition 18 (Geometric series)

Let {x \in \mathbb Z_p} with {\left\lvert x \right\rvert_p < 1}. Then

\displaystyle \frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots.

Proof: Note that the partial sums satisfy {1 + x + x^2 + \dots + x^n = \frac{1-x^{n+1}}{1-x}}, and {x^{n+1} \rightarrow 0} as {n \rightarrow \infty} since {\left\lvert x^{n+1} \right\rvert_p = \left\lvert x \right\rvert_p^{n+1}} and {\left\lvert x \right\rvert_p < 1}. \Box
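Here is a quick numerical check of the proposition on rationals, a sketch with a helper name of my own. The error {\frac{1}{1-x} - (1 + x + \dots + x^n)} equals {\frac{x^{n+1}}{1-x}}, so when {1-x} is a unit its valuation should be exactly {(n+1)\nu_p(x)}.

```python
from fractions import Fraction


def nu_p(x: Fraction, p: int) -> float:
    """p-adic valuation of a rational (infinity for 0)."""
    if x == 0:
        return float("inf")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def geometric_errors(x: Fraction, p: int, n_max: int) -> list:
    """nu_p(1/(1-x) - (1 + x + ... + x^n)) for n = 0..n_max."""
    target = 1 / (1 - x)
    partial, power, out = Fraction(0), Fraction(1), []
    for _ in range(n_max + 1):
        partial += power  # partial = 1 + x + ... + x^n
        power *= x
        out.append(nu_p(target - partial, p))
    return out


print(geometric_errors(Fraction(3), 3, 6))     # [1, 2, 3, 4, 5, 6, 7]
print(geometric_errors(Fraction(6, 5), 3, 4))  # [1, 2, 3, 4, 5]
```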

So, {1 + 3 + 3^2 + \dots = -\frac{1}{2}} is really a correct convergence in {\mathbb Z_3}. And so on.

If you buy the analogy that {\mathbb Z_p} is generating functions with base {p}, then all the olympiad generating functions you might be used to have {p}-adic analogs. For example, you can prove more generally that:

Theorem 19 (Generalized binomial theorem)

If {x \in \mathbb Z_p} and {\left\lvert x \right\rvert_p < 1}, then for any {r \in \mathbb Q} whose denominator is not divisible by {p} (that is, {r \in \mathbb Q \cap \mathbb Z_p}) we have the series convergence

\displaystyle \sum_{n \ge 0} \binom rn x^n = (1+x)^r.

(I haven’t defined {(1+x)^r}, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is nearly the same, apart from the extra subscript {p}. I won’t elaborate too much on this now, since {p}-adic exponentiation will be described in much more detail in the next post.
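To get a feel for the theorem, here is a sketch with an example I chose for illustration: {r = \frac12}, {p = 3}, {x = 3}, so the series should converge to a square root of {4}. Every partial sum is {\equiv 1 \pmod 3}, so the limit must be {-2} rather than {+2}. The code computes exact partial sums (the function name is my own) and checks this via valuations.

```python
from fractions import Fraction


def nu_p(x: Fraction, p: int) -> float:
    """p-adic valuation of a rational (infinity for 0)."""
    if x == 0:
        return float("inf")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def binomial_partial_sums(r: Fraction, x: Fraction, terms: int) -> list[Fraction]:
    """Partial sums S_0, S_1, ... of sum_{n >= 0} binom(r, n) x^n, as exact rationals."""
    sums, coeff, s = [], Fraction(1), Fraction(0)
    for n in range(terms):
        s += coeff * x**n
        sums.append(s)
        coeff = coeff * (r - n) / (n + 1)  # binom(r, n+1) = binom(r, n) (r - n) / (n + 1)
    return sums


sums = binomial_partial_sums(Fraction(1, 2), Fraction(3), 10)
print([nu_p(s + 2, 3) for s in sums])  # entry N is at least N + 1, so S_N -> -2
print([nu_p(s - 2, 3) for s in sums])  # all zeros: the limit is not +2
```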

3.4. Completeness

Note that the definition of {\left\lvert \bullet \right\rvert_p} could have been given for {\mathbb Q} as well; we didn’t need {\mathbb Q_p} to introduce it (after all, we have {\nu_p} in olympiads already). The big important theorem I must state now is:

Theorem 20 ({\mathbb Q_p} is complete)

The space {\mathbb Q_p} is the completion of {\mathbb Q} with respect to {\left\lvert \bullet \right\rvert_p}.

This is the definition of {\mathbb Q_p} you’ll see more frequently; one then defines {\mathbb Z_p} in terms of {\mathbb Q_p} (rather than vice-versa) according to

\displaystyle \mathbb Z_p = \left\{ x \in \mathbb Q_p : \left\lvert x \right\rvert_p \le 1 \right\}.

(Remark for experts: {\mathbb Q_p} is a field with {\nu_p} a non-Archimedean valuation; then {\mathbb Z_p} is its valuation ring.)

Let me justify why this definition is philosophically nice.

Suppose you are a numerical analyst and you want to estimate the value of the sum

\displaystyle S = \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{10000^2}

to within {0.001}. The sum {S} consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it really helps if you know about the real numbers: because then you can construct the infinite series {\sum_{n \ge 1} n^{-2} = \frac16 \pi^2}, and deduce that {S \approx \frac{\pi^2}{6}}, up to a small error coming from the omitted terms {\frac{1}{10001^2} + \frac{1}{10002^2} + \dots}, which can be bounded.

Of course, in order to have access to enough theory to prove that {\sum_{n \ge 1} n^{-2} = \pi^2/6}, you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space {\mathbb Q}, where e.g. the sequence {1}, {1.4}, {1.41}, {1.414}, {\dots} is considered “not convergent” because {\sqrt2 \notin \mathbb Q}. Instead, all analysis is done in the completion of {\mathbb Q}, namely {\mathbb R}.
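For the numerical analyst, the estimate can be checked directly in floating point (a sketch; `math.fsum` is used only to keep rounding error negligible). Comparing the omitted tail with integrals of {1/t^2} gives {\frac{1}{N+1} < \frac{\pi^2}{6} - S < \frac1N} for {N = 10000}, comfortably within {0.001}.

```python
import math

N = 10_000
S = math.fsum(1 / n**2 for n in range(1, N + 1))  # the finite sum S
tail = math.pi**2 / 6 - S  # the omitted terms 1/10001^2 + 1/10002^2 + ...

# Integral comparison: 1/(N+1) < tail < 1/N, so |S - pi^2/6| < 0.0001 < 0.001.
print(S, tail, 1 / (N + 1) < tail < 1 / N)
```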

Now suppose you are an olympiad contestant and want to estimate the sum

\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2}

modulo {p^3} (i.e. to within {p^{-3}} in {\left\lvert \bullet \right\rvert_p}). Even though {f_p(x)} is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod {p^3}). But the space {\mathbb Q} is not complete with respect to {\left\lvert \bullet \right\rvert_p} either, and thus it makes sense to work in the completion of {\mathbb Q} with respect to {\left\lvert \bullet \right\rvert_p}. This is exactly {\mathbb Q_p}.

4. Solving USA TST 2002/2

Let’s finally solve Example 1, which asks to compute

\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}.

Armed with the generalized binomial theorem, this becomes straightforward.

\displaystyle \begin{aligned} f_p(x) &= \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} = \sum_{k=1}^{p-1} \frac{1}{k^2} \left( 1 + \frac{px}{k} \right)^{-2} \\ &= \sum_{k=1}^{p-1} \frac{1}{k^2} \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^{n} \\ &= \sum_{n \ge 0} \binom{-2}{n} \sum_{k=1}^{p-1} \frac{1}{k^2} \left( \frac{x}{k} \right)^{n} p^n \\ &\equiv \sum_{k=1}^{p-1} \frac{1}{k^2} - 2x \left( \sum_{k=1}^{p-1} \frac{1}{k^3} \right) p + 3x^2 \left( \sum_{k=1}^{p-1} \frac{1}{k^4} \right) p^2 \pmod{p^3}. \end{aligned}

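Using the elementary facts that {p^2 \mid \sum_k k^{-3}} and {p \mid \sum_k k^{-4}} (both valid for primes {p \ge 7}), we conclude that {f_p(x) \equiv \sum_{k=1}^{p-1} \frac{1}{k^2} \pmod{p^3}} regardless of {x}, which solves the problem.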
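If you want reassurance before trusting the power series manipulation, here is a brute-force check in exact rational arithmetic (helper names are mine). It verifies the two elementary facts for {p = 7, 11, 13} and that {f_p(x) - \sum k^{-2}} has {\nu_p \ge 3} for small positive {x}. It also shows why small primes are excluded: for {p = 5} the first fact already fails, since {\sum_{k=1}^{4} k^{-3} = \frac{2035}{1728}} has {\nu_5 = 1}.

```python
from fractions import Fraction


def nu_p(x: Fraction, p: int) -> float:
    """p-adic valuation of a rational (infinity for 0)."""
    if x == 0:
        return float("inf")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def f(p: int, x: int) -> Fraction:
    """f_p(x) = sum_{k=1}^{p-1} 1/(px + k)^2, computed exactly."""
    return sum((Fraction(1, (p * x + k) ** 2) for k in range(1, p)), Fraction(0))


def inverse_power_sum(p: int, m: int) -> Fraction:
    """sum_{k=1}^{p-1} 1/k^m, computed exactly."""
    return sum((Fraction(1, k**m) for k in range(1, p)), Fraction(0))


for p in (7, 11, 13):
    facts = (nu_p(inverse_power_sum(p, 3), p) >= 2, nu_p(inverse_power_sum(p, 4), p) >= 1)
    # f_p(x) should agree with sum 1/k^2 modulo p^3, whatever x is
    agree = all(nu_p(f(p, x) - inverse_power_sum(p, 2), p) >= 3 for x in range(1, 6))
    print(p, facts, agree)  # (True, True) True for each p

print(nu_p(inverse_power_sum(5, 3), 5))  # 1, so p^2 does not divide the sum when p = 5
```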

 

New oly handout: Constructing Diagrams

I’ve added a new Euclidean geometry handout, Constructing Diagrams, to my webpage.

Some of the stuff covered in this handout:

  • Advice for constructing the triangle centers (hint: circumcenter goes first)
  • An example of how to rearrange the conditions of a problem and draw a diagram out-of-order
  • Some mechanical suggestions such as dealing with phantom points
  • Some examples of computer-generated figures

Enjoy.

Make training non zero-sum

Some thoughts about some modern trends in mathematical olympiads that may be concerning.

I. The story of the barycentric coordinates

I worry about my geometry book. To explain why, let me tell you a story.

When I was in high school about six years ago, barycentric coordinates were nearly unknown as an olympiad technique. I only heard about them through whispers in the wind from friends who had heard of the technique and thought it might be usable. But at the time, there was no place where everything was written down explicitly. I had a handful of formulas online, a few helpful friends I could reach out to, and a couple of example posts littered across some forums.

Seduced by the possibility of arcane power, I didn’t let this stop me. Spring break of 2012 settled in, and I spent that entire week developing the entire theory of barycentric coordinates from scratch. There were no proofs I could find online, so I had to personally reconstruct all of them. In addition, I set out to find as many example problems as I could, but since no one had written barycentric solutions yet, I had to not only identify which problems looked like they might be good examples but also solve them myself to see if my guesses were correct. I even managed to prove a “new” theorem about perpendicular displacement vectors (which I did not get to name after myself).

I continued working all the way up through the summer, adding several new problems that came my way from MOP 2012. Finally, I posted a rough article with all my notes, examples, and proofs, which you can still find online. I still remember this as a sort of magnum opus from the first half of high school; it was an immensely rewarding learning experience.

Today, all this and much more can be yours for just $60, with any major credit or debit card.


Alas, my geometry book is just one example of ways in which the math contest scene is looking more and more like an industry. Over the years, more and more programs dedicated to training for competitions are springing up, and these programs can be quite costly. I myself run a training program now, which is even more expensive (in my defense, it’s one-on-one teaching, rather than a residential camp or group lesson).

It’s possible to imagine a situation in which the contest problems become more and more routine. In that world, math contests become an arms race. It becomes mandatory to have training in increasingly obscure techniques: everything from Popoviciu to Vieta jumping to rectangular circumhyperbolas. Students from less well-off families, or even countries without access to competition resources, become unable to compete, and are pushed to the bottom of the IMO scoreboard.

(Fortunately for me, I found out at the 2017 IMO that my geometry book actually helped level the international playing field, contrary to my initial expectations. It’s unfortunate that it’s not free, but it turned out that many students in other countries had until then found it nearly impossible to find suitable geometry materials. So now many more people have access to a reasonable geometry reference, rather than just the top countries with well-established training.)

II. Another dark future

The first reaction you might have now is that training is bad. But I think that’s the wrong conclusion, since, well, I have an entire previous post dedicated to explaining what I perceive as the benefits of the math contest experience. So I think the conclusion is not that training is intrinsically bad, but rather that training must be meaningful. That is, the students have to gain something from the experience that’s not just a +7 bonus on their next olympiad contest.

I think the message “training is bad” might be even more dangerous.

Imagine that the fashion swings the other way. The IMO jury becomes alarmed at the trend of train-able problems, and in response, the problems become designed specifically to antagonize trained students. The entire Geometry section of the IMO shortlist ceases to exist, because some Asian kid wrote this book that gives you too much of an advantage if you’ve read it, and besides who does geometry after high school anyways? IMO 2014 used to be notable for having three combinatorics problems, but by 2040 the norm is to have four or five, because everyone knows combinatorics is harder to train for.

Gradually, the IMO is redesigned to become an IQ test.

The changes then begin to permeate down. The USAMO committee is overthrown, and USAMO 2050 features six linguistics questions “so that we can find out who can actually think”. Math contests as a whole become a system for identifying the best genetic talent, explicitly aimed at weeding out the students who have “just been trained”. It doesn’t matter how hard you’ve worked; we want “creativity”.

This might be great at identifying the best mathematicians each generation, but I think an IMO of this shape would be actively destructive towards the contestants and community as well. You thought math contests were bad because they’re discouraging to the kids who don’t win? What if they become redesigned to make sure that you can’t improve your score no matter how hard you work?

III. Now

What this means is that we have a balancing act to maintain. We do not want to eliminate the role of training entirely, because the whole point of math contests is to have a learning experience that lasts longer than the two-day contest every year. But at the same time, we need to ensure the training is interesting, that it is deep and teaches skills like the ones I described before.

Paying $60 to buy a 300-page PDF is not meaningful. But spending many hours to work through the problems in that PDF might be.

In many ways this is not a novel idea. If I am trying to teach a student, and I give them a problem which is too easy, they will not learn anything from it. Conversely, if I give them a problem which is too difficult, they will get discouraged and are unlikely to learn much from their trouble. The situation with olympiad training feels the same.

This applies to the way I think about my teaching as well. I am always upset when I hear (as I have) things like “X only did well on USAMO because of Evan Chen’s class”. If that is true, then all I am doing is taking money as input and changing the results of a zero-sum game as output, which is in my opinion rather pointless (and maybe unethical).

But I really think that’s not what’s happening. Maybe I’m a good teacher, but at the end of the day I am just a guide. If my students do well, or even if they don’t do well, it is because they spent many hours on the challenges that I designed, and have learned a lot from the whole experience. The credit for any success thus lies solely with the student’s effort. And that experience, I think, is certainly not zero-sum.

I switched to point-based problem sets

It’s not uncommon for technical books to include an admonition from the author that readers must do the exercises and problems. I always feel a little peculiar when I read such warnings. Will something bad happen to me if I don’t do the exercises and problems? Of course not. I’ll gain some time, but at the expense of depth of understanding. Sometimes that’s worth it. Sometimes it’s not.

— Michael Nielsen, Neural Networks and Deep Learning

1. Synopsis

I spent the first few days of my recent winter vacation transitioning all the problem sets for my students from a “traditional” format to a “point-based” format. Here’s a before and after.

Technical specification:

  • The traditional problem sets used to consist of a list of 6-9 olympiad problems of varying difficulty, for which you were expected to solve all problems over the course of two weeks.
  • The new point-based problem sets consist of 10-15 olympiad problems, each weighted either 2, 3, 5, or 9 points, and an explicit target goal for that problem set. There’s a spectrum of how many of the problems you need to solve depending on the topic and the version (I have multiple difficulty versions of many sets), but as a rough estimate the goal is maybe 60%-75% of the total possible points on the problem set. Usually, on each problem set there are 2-4 problems which I think are especially nice or important, and I signal this by coloring the problem weight in red.

In this post I want to talk a little bit about what motivated this change.

2. The old days

I guess for historical context I’ll start by talking about why I used to have a traditional format, although I’m mildly embarrassed by it now, in hindsight.

When I first started out with designing my materials, I was actually basically always short on problems. Once you really get into designing olympiad materials, good problems begin to feel like tangible goods. Most problems I put on a handout are ones I’ve done personally, because otherwise, how are you supposed to know what the problem is like? This means I have to actually solve the problem, type up solution notes, and then decide how hard it is and what that problem teaches. This might take anywhere from 30 minutes to the entire afternoon, per problem. Now imagine you need 150 such problems to run a year’s curriculum, and you can see why the first year was so stressful. (I was very fortunate to have paid much of this cost in high school; I still remember many of the problems I did back as a student.)

So it seemed like a waste if I spent a lot of time vetting a problem and then my students didn’t do it, and as a practical matter I didn’t have enough materials yet to have much leeway anyways. I told myself this would be fine: after all, if you couldn’t do a problem, all you had to do was tell me what you’ve tried, and then I’d walk you through the rest of it. So there’s no reason why you couldn’t finish the problem sets, right? (Ha. Ha. Ha.)

Now my problem bank has gotten much deeper, so I don’t have that excuse anymore. [1]

3. Agonizing over problem eight

But I’ll tell you now that even before I decided to switch to points, one of the biggest headaches was always whether to add in an eighth problem that was really nice but also difficult. (When I first started teaching, my problem sets were typically seven problems long.) If you looked at the TeX source for some of my old handouts, you’d see lots of problems commented out with a line saying “too long already”.

Teaching OTIS made me appreciate the amount of power I have on the other side of a mentor-student relationship. Basically, when I design a problem set, I am making decisions on behalf of the student: “these are the problems that I think you should work on”. Since my kids are all great students that respect me a lot, they will basically do whatever I tell them to.

That means I used to spend many hours agonizing over that eighth problem or whether to punt it. Yes, they’ll learn a lot if they solve (or don’t solve) it, but it will also take them another two or three hours on top of everything else they’re already doing (OTIS, school, trumpet, track, dance, social, blah blah blah). Is it worth those extra hours? Is it not? I’ve lost sleep over whether I made the right choice on the nights I ended up adding that last hard problem.

But in hindsight the right answer all along was to just let the students decide for themselves, because unlike your average high-school math teacher in a room of decked-out slackers, I have the best students in the world.

4. The morning I changed my mind

As I got a deeper database this year and commented more problems out, I started thinking about point-based problem sets. But I can tell you the exact moment when I decided to switch.

On the morning of Sunday November 5, I had a traditional problem set on my desk next to a point-based one. In both cases I had figured out how to do about half the problems required. I noticed that the way the half-full glass of water looked was quite different between them. In the first case, I was freaking out about the other half of the problems I hadn’t solved yet. In the second case, I was trying to decide which of the problems would be the most fun to do next.

Then I realized that OTIS was running on the traditional system, and what I had been doing to my students all semester! So instead of doing either problem set I began the first prototypes of the points system.

5. Count up

I’m worried I’ll get misinterpreted as arguing that students shouldn’t work hard. This is not really the point. If you read the specification at the beginning carefully, the number of problems the students are solving is actually roughly the same in both systems.

It might be more psychological than anything else: I want my kids to count how many problems they’ve solved, not how many problems they haven’t solved. Every problem you solve makes you better. Every problem you try and don’t solve makes you better, too. But a problem you didn’t have time to try doesn’t make you worse.

I’ll admit to being mildly pissed off at high school for having built this particular mindset into all my kids. The straight-A students sitting in calculus BC aren’t counting how many questions they’ve answered correctly when checking grades. They’re counting how many points they lost. The implicit message is that if you don’t do nearly all the questions, you’re a bad person because you didn’t try hard enough and you won’t learn anything this way and shame on you and…

That can’t possibly be correct. Imagine two calculus teachers A and B using the same textbook. Teacher A assigns 15 questions of homework a week, teacher B assigns 25 questions. All of teacher A’s students are failing by B’s standards. Fortunately, that’s not actually how the world works.

For this reason I’m glad that all the olympiad kids report their performance as “I solved problems 1,2,4,5” rather than “I missed problems 3,6”.

6. There are no stupid or lazy questions

The other wrong assumption I had about traditional problem sets was the bit about asking for help on problems you can’t solve. It turns out getting students to ask for help is a struggle. So another hope with the point-based system is that if a student tries a problem, can’t solve it, and is too shy to ask, then they can switch to a different problem and read the solution later on. No need to get me involved with every single missed problem any more.

But anyways I have a hypothesis why asking for help seems so hard (though there are probably other reasons too).

You’ve all heard the teachers who remind students to always ask questions during lectures [2], because it means someone else has the same question. In other words: don’t be afraid to ask questions just because you’re afraid you’ll look dumb, because “there are no stupid questions“.

But I’ve rarely heard anyone say the same thing about problem sets.

As I’m writing this, I realize that this is actually the reason I’ve never been willing to go to office hours to ask my math professors for help on homework problems I’m stuck on. It’s not because I’m worried my professors will think I’m dumb. It’s because I’m worried they’ll think I didn’t try hard enough before I gave up and came to them for help, or even worse, that I just care about my grade. You’ve all heard the freshman biology TAs complain about those kids that just come and ask them to check all their pset answers one by one, or that come to argue about points they got docked, or what-have-you. I didn’t want to be that guy.

Maybe this shaming is intentional if the class you’re teaching is full of slackers that don’t work unless you crack the whip. [3] But if you are teaching a math class that’s half MOPpers, I seriously don’t think we need guilt-trips for these kids whenever they can’t solve a USAMO3.

So for all my students, here’s my version of the message: there are no stupid questions, and there are no lazy questions.

Footnotes

  1. The other reason I used traditional problem sets at first was that I wanted to force the students to at least try the harder problems. This is actually my main remaining concern about switching to point-based problem sets: you could in principle always ignore the 9-point problems at the end. I tried to compensate for this by either marking some 9’s in red, or else making it difficult to reach the goal without solving at least one 9. I’m not sure this is enough.
  2. But if my question is “I zoned out for the last five minutes because I was responding to my friends on snapchat, what just happened?”, I don’t think most professors would take too kindly to it. So it’s not literally true that all questions are welcome in lectures.
  3. As an example, the 3.091 class policies document includes FAQs such as “that sounds like a lot of work, is there a shortcut?”, “but what do I need to learn to pass the tests?”, and “but I just want to pass the tests…”. Also an entire paragraph explaining why skipping the final exam makes you a terrible person, including reasons such as “how you do anything is how you do everything”, “students earning A’s are invited to apply as tutors/graders”, and “in college it’s up to you to take responsibility for your academic career”, and so on ad nauseam.

Revisiting arc midpoints in complex numbers

1. Synopsis

One of the major headaches of using complex numbers in olympiad geometry problems is dealing with square roots. In particular, it is nontrivial to express the incenter of a triangle inscribed in the unit circle in terms of its vertices.

The following lemma is the standard way to set up the arc midpoints of a triangle. It appears for example as part (a) of Lemma 6.23.

Theorem 1 (Arc midpoint setup for a triangle)

Let {ABC} be a triangle with circumcircle {\Gamma} and let {M_A}, {M_B}, {M_C} denote the arc midpoints of {\widehat{BC}} opposite {A}, {\widehat{CA}} opposite {B}, {\widehat{AB}} opposite {C}.

Suppose we view {\Gamma} as the unit circle in the complex plane. Then there exist complex numbers {x}, {y}, {z} such that {A = x^2}, {B = y^2}, {C = z^2}, and

\displaystyle M_A = -yz, \quad M_B = -zx, \quad M_C = -xy.

Theorem 1 is often used in combination with the following lemma, which lets one assign the incenter the coordinates {-(xy+yz+zx)} in the above notation.

Lemma 2 (The incenter is the orthocenter of opposite arc midpoints)

Let {ABC} be a triangle with circumcircle {\Gamma} and let {M_A}, {M_B}, {M_C} denote the arc midpoints of {\widehat{BC}} opposite {A}, {\widehat{CA}} opposite {B}, {\widehat{AB}} opposite {C}. Then the incenter of {\triangle ABC} coincides with the orthocenter of {\triangle M_A M_B M_C}.
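Lemma 2 is easy to test numerically, which is a nice way to build trust in it. The sketch below (helper names and the random sampling are my own) places {A}, {B}, {C} at random angles on the unit circle, computes each arc midpoint directly from the angles, and compares the orthocenter of {\triangle M_A M_B M_C}, which is {M_A + M_B + M_C} for a triangle inscribed in the unit circle, with the incenter {\frac{aA + bB + cC}{a+b+c}}.

```python
import cmath
import math
import random

TAU = 2 * math.pi


def arc_midpoint(b: float, c: float, a: float) -> complex:
    """Midpoint of arc BC not containing A, for points e^{ib}, e^{ic}, e^{ia} on the unit circle."""
    d = (c - b) % TAU  # length of the counterclockwise arc from B to C
    if (a - b) % TAU < d:
        # A lies on the counterclockwise arc from B to C, so use the other arc (C to B)
        return cmath.exp(1j * (c + ((b - c) % TAU) / 2))
    return cmath.exp(1j * (b + d / 2))


def incenter(A: complex, B: complex, C: complex) -> complex:
    """Incenter as the side-length-weighted average of the vertices."""
    a, b, c = abs(B - C), abs(C - A), abs(A - B)
    return (a * A + b * B + c * C) / (a + b + c)


def lemma2_max_error(trials: int = 1000, seed: int = 1) -> float:
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        al, be, ga = (rng.uniform(0, TAU) for _ in range(3))
        A, B, C = (cmath.exp(1j * t) for t in (al, be, ga))
        MA = arc_midpoint(be, ga, al)
        MB = arc_midpoint(ga, al, be)
        MC = arc_midpoint(al, be, ga)
        # for a triangle inscribed in the unit circle, the orthocenter is the sum of the vertices
        H = MA + MB + MC
        worst = max(worst, abs(H - incenter(A, B, C)))
    return worst


print(lemma2_max_error())  # tiny: floating-point noise only
```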

Unfortunately, the proof of Theorem 1 in my textbook is wrong, and I cannot find a proof online (though I hear that Lemmas in Olympiad Geometry has a proof). So in this post I will give a correct proof of Theorem 1, which will hopefully also explain the mysterious introduction of the minus signs in the theorem statement. In addition I will give a version of the theorem valid for quadrilaterals.

2. A Word of Warning

I should at once warn the reader that Theorem 1 is an existence result, and thus must be applied carefully.

To see why this matters, consider the following problem, which appeared as problem 1 of the 2016 JMO.

Example 3 (JMO 2016, by Zuming Feng)

The isosceles triangle {\triangle ABC}, with {AB=AC}, is inscribed in the circle {\omega}. Let {P} be a variable point on the arc {BC} that does not contain {A}, and let {I_B} and {I_C} denote the incenters of triangles {\triangle ABP} and {\triangle ACP}, respectively. Prove that as {P} varies, the circumcircle of triangle {\triangle PI_{B}I_{C}} passes through a fixed point.

By experimenting with the diagram, it is not hard to guess that the correct fixed point is the midpoint of arc {\widehat{BC}}, as seen in the figure below. One might be tempted to write {A = x^2}, {B = y^2}, {C = z^2}, {P = t^2} and assert the two incenters are {-(xy+yt+xt)} and {-(xz+zt+xt)}, and that the fixed point is {-yz}.

This is a mistake! If one applies Theorem 1 twice, then the choices of “square roots” of the common vertices {A} and {P} may not be compatible. In fact, they cannot be compatible, because the arc midpoint of {\widehat{AP}} opposite {B} is different from the arc midpoint of {\widehat{AP}} opposite {C}.

In fact, I claim this is not a minor issue that one can work around. This is because the claim that the circumcircle of {\triangle P I_B I_C} passes through the midpoint of arc {\widehat{BC}} is false if {P} lies on the arc on the same side as {A}! In that case it actually passes through {A} instead. Thus the truth of the problem really depends on the fact that the quadrilateral {ABPC} is convex, and any attempt with complex numbers must take this into account to have a chance of working.

3. Proof of the theorem for triangles

Fix {ABC} now, so we require {A = x^2}, {B = y^2}, {C = z^2}. There are {2^3 = 8} choices of square roots {x}, {y}, {z} we can take (differing by a sign); we wish to show one of them works.

We pick an arbitrary choice for {x} first. Then, of the two choices of {y}, we pick the one such that {-xy = M_C}. Similarly, for the two choices of {z}, we pick the one such that {-xz = M_B}. Our goal is to show that under these conditions, we also have {M_A = -yz}.

The main trick is to now consider the midpoint of arc {\widehat{BAC}}, which we denote by {L}. It is easy to see that:

Lemma 4 (The isosceles trapezoid trick)

We have {\overline{AL} \parallel \overline{M_B M_C}} (both are perpendicular to the {\angle A} bisector). Thus {A L M_B M_C} is an isosceles trapezoid, and so { A \cdot L = M_B \cdot M_C }.
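The last step of Lemma 4 uses a standard fact about chords of the unit circle, which is worth spelling out once. For {|p| = |q| = 1} we have {\overline{p} = 1/p}, and hence

\displaystyle \frac{p-q}{\overline{p-q}} = \frac{p-q}{\frac{1}{p}-\frac{1}{q}} = -pq.

Since {w/\overline{w}} determines the direction of a nonzero vector {w} up to sign, two chords {\overline{PQ}} and {\overline{RS}} of the unit circle are parallel exactly when {pq = rs}. Applied to {\overline{AL} \parallel \overline{M_B M_C}}, this gives {A \cdot L = M_B \cdot M_C}.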

Thus, we have

\displaystyle L = \frac{M_B M_C}{A} = \frac{(-xz)(-xy)}{x^2} = +yz.

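Since {M_A} and {L} are the midpoints of the two arcs cut out by {B} and {C}, they are diametrically opposite; thus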

\displaystyle M_A = -L = -yz

as desired.
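The proof is constructive, so it can be traced numerically. The following Python sketch mirrors it for one arbitrary triangle: fix {x}, flip the signs of {y} and {z} until {-xy = M_C} and {-xz = M_B}, and then check that {M_A = -yz} comes out for free. The helpers (`side`, `arc_midpoint`, `close`, `xyz_setup`) are hypothetical names chosen for this illustration, not anything from a library.

```python
import cmath

# Illustrative sketch: helper names and sample angles are hypothetical.


def side(z, p, q):
    """Signed quantity whose sign tells which side of line pq the point z is on."""
    return ((z - p) * (q - p).conjugate()).imag


def arc_midpoint(p, q, avoid):
    """Midpoint of the arc pq of the unit circle that does not contain `avoid`."""
    m = cmath.sqrt(p * q)
    return m if side(m, p, q) * side(avoid, p, q) < 0 else -m


def close(u, v, tol=1e-9):
    return abs(u - v) < tol


def xyz_setup(A, B, C):
    """Choose square roots x, y, z of A, B, C as in the proof of Theorem 1."""
    M_B = arc_midpoint(C, A, B)
    M_C = arc_midpoint(A, B, C)
    x = cmath.sqrt(A)  # arbitrary choice of x
    # (xy)^2 = AB = M_C^2, so -xy = +-M_C; flip y if the sign is wrong.
    y = cmath.sqrt(B)
    if not close(-x * y, M_C):
        y = -y
    z = cmath.sqrt(C)
    if not close(-x * z, M_B):
        z = -z
    return x, y, z


A, B, C = (cmath.exp(1j * theta) for theta in (0.3, 2.1, 4.0))
x, y, z = xyz_setup(A, B, C)
M_A = arc_midpoint(B, C, A)
M_B = arc_midpoint(C, A, B)
M_C = arc_midpoint(A, B, C)
L = -M_A  # the other midpoint of arc BC, i.e. the midpoint of arc BAC

print(close(A * L, M_B * M_C))  # Lemma 4: expect True
print(close(M_A, -y * z))       # Theorem 1: expect True
```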

From this we can see why the minus signs are necessary.

Exercise 5

Show that Theorem 1 becomes false if we try to use {+yz}, {+zx}, {+xy} instead of {-yz}, {-zx}, {-xy}.

4. A version for quadrilaterals

We now return to the setting of a convex quadrilateral {ABPC} that we encountered in Example 3. Suppose we preserve the variables {x}, {y}, {z} that we were given from Theorem 1, but now add a fourth complex number {t} with {P = t^2}. How are the new arc midpoints determined? The following theorem answers this question.

Theorem 6 ({xytz} setup)

Let {ABPC} be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers {x}, {y}, {z}, {t} such that {A = x^2}, {B = y^2}, {C = z^2}, {P = t^2} and:

  • The opposite arc midpoints {M_A}, {M_B}, {M_C} of triangle {ABC} are given by {-yz}, {-zx}, {-xy}, as before.
  • The midpoint of arc {\widehat{BP}} not including {A} or {C} is given by {+yt}.
  • The midpoint of arc {\widehat{CP}} not including {A} or {B} is given by {-zt}.
  • The midpoint of arc {\widehat{ABP}} is {-xt} and the midpoint of arc {\widehat{ACP}} is {+xt}.

This setup is summarized in the following figure.

Note that unlike Theorem 1, the four arcs cut out by the sides of {ABPC} do not all have the same sign (I chose {\widehat{BP}} to have coordinate {+yt}). This asymmetry is inevitable (see if you can understand why from the proof below).

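Proof: We select {x}, {y}, {z} with Theorem 1. Now, pick a choice of {t} such that {+yt} is the arc midpoint of {\widehat{BP}} not containing {A} or {C}. Since {P} lies on arc {\widehat{BC}} not containing {A}, the rotation taking the midpoint of {\widehat{BP}} to the midpoint of {\widehat{CP}} is through half of that arc, and so is the rotation taking {M_A} to {C}. Hence the arc midpoint of {\widehat{CP}} not containing {A} or {B} is given by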

\displaystyle \frac{z^2}{-yz} \cdot (+yt) = -zt.

On the other hand, the midpoint of {\widehat{ABP}} is found by applying Lemma 4 again, this time to triangle {ABP} at the vertex {B}: it equals {\frac{(-xy)(+yt)}{y^2} = -xt}. The midpoint of {\widehat{ACP}} is computed similarly. \Box
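The same kind of numerical check works for Theorem 6. This sketch uses hypothetical helper names and arbitrary angles chosen so that {A}, {B}, {P}, {C} appear in that order around the circle (so {ABPC} is convex). It picks {x}, {y}, {z}, {t} as in the proofs and then tests every bullet of the theorem.

```python
import cmath

# Illustrative sketch: helper names and sample angles are hypothetical.


def side(z, p, q):
    """Signed quantity whose sign tells which side of line pq the point z is on."""
    return ((z - p) * (q - p).conjugate()).imag


def arc_midpoint(p, q, avoid):
    """Midpoint of the arc pq of the unit circle that does not contain `avoid`."""
    m = cmath.sqrt(p * q)
    return m if side(m, p, q) * side(avoid, p, q) < 0 else -m


def close(u, v, tol=1e-9):
    return abs(u - v) < tol


def flip_to_match(root, partner, target):
    """Return +root or -root, whichever makes partner * root equal to target."""
    return root if close(partner * root, target) else -root


def xytz_setup(A, B, P, C):
    """Square roots x, y, t, z of A, B, P, C chosen as in the proof of Theorem 6."""
    x = cmath.sqrt(A)  # arbitrary choice of x
    y = flip_to_match(cmath.sqrt(B), x, -arc_midpoint(A, B, C))  # -xy = M_C
    z = flip_to_match(cmath.sqrt(C), x, -arc_midpoint(C, A, B))  # -xz = M_B
    t = flip_to_match(cmath.sqrt(P), y, arc_midpoint(B, P, A))   # +yt on arc BP
    return x, y, t, z


# Angles increase in the order A, B, P, C, so ABPC is convex.
A, B, P, C = (cmath.exp(1j * theta) for theta in (0.3, 2.1, 3.0, 4.0))
x, y, t, z = xytz_setup(A, B, P, C)

print(close(arc_midpoint(B, C, A), -y * z))  # M_A = -yz (Theorem 1)
print(close(arc_midpoint(C, P, A), -z * t))  # arc CP away from A and B
print(close(arc_midpoint(A, P, C), -x * t))  # arc ABP
print(close(arc_midpoint(A, P, B), +x * t))  # arc ACP
```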

In other problems, the four vertices of the quadrilateral may play more symmetric roles and in that case it may be desirable to pick a setup in which the four vertices are labeled {ABCD} in order. By relabeling the letters in Theorem 6 one can prove the following alternate formulation.

Corollary 7

Let {ABCD} be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers {a}, {b}, {c}, {d} such that {A = a^2}, {B = b^2}, {C = c^2}, {D = d^2} and:

  • The midpoints of {\widehat{AB}}, {\widehat{BC}}, {\widehat{CD}}, {\widehat{DA}} cut out by the sides of {ABCD} are {-ab}, {-bc}, {-cd}, {+da}.
  • The midpoints of {\widehat{ABC}} and {\widehat{BCD}} are {+ac} and {+bd}.
  • The midpoints of {\widehat{CDA}} and {\widehat{DAB}} are {-ac} and {-bd}.
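Corollary 7 can be checked the same way. In this hypothetical sketch, {a} is an arbitrary square root of {A}, and {b}, {c}, {d} are chosen in turn so that {-ab}, {-bc}, {-cd} are the midpoints of the first three side arcs. The remaining five bullets should then hold automatically. Helper names and angles are assumptions made for illustration only.

```python
import cmath

# Illustrative sketch: helper names and sample angles are hypothetical.


def side(z, p, q):
    """Signed quantity whose sign tells which side of line pq the point z is on."""
    return ((z - p) * (q - p).conjugate()).imag


def arc_midpoint(p, q, avoid):
    """Midpoint of the arc pq of the unit circle that does not contain `avoid`."""
    m = cmath.sqrt(p * q)
    return m if side(m, p, q) * side(avoid, p, q) < 0 else -m


def close(u, v, tol=1e-9):
    return abs(u - v) < tol


def flip_to_match(root, partner, target):
    """Return +root or -root, whichever makes partner * root equal to target."""
    return root if close(partner * root, target) else -root


# Convex cyclic quadrilateral ABCD with vertices in counterclockwise order.
A, B, C, D = (cmath.exp(1j * theta) for theta in (0.3, 2.1, 3.0, 4.0))
a = cmath.sqrt(A)  # arbitrary choice
b = flip_to_match(cmath.sqrt(B), a, -arc_midpoint(A, B, C))  # arc AB is -ab
c = flip_to_match(cmath.sqrt(C), b, -arc_midpoint(B, C, D))  # arc BC is -bc
d = flip_to_match(cmath.sqrt(D), c, -arc_midpoint(C, D, A))  # arc CD is -cd

checks = {
    "DA = +da": close(arc_midpoint(D, A, B), d * a),
    "ABC = +ac": close(arc_midpoint(A, C, D), a * c),
    "BCD = +bd": close(arc_midpoint(B, D, A), b * d),
    "CDA = -ac": close(arc_midpoint(C, A, B), -a * c),
    "DAB = -bd": close(arc_midpoint(D, B, C), -b * d),
}
print(checks)  # every value should be True
```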

To test the newfound theorem, here is a cute easy application.

Example 8 (Japanese theorem for cyclic quadrilaterals)

In a cyclic quadrilateral {ABCD}, the incenters of {\triangle ABC}, {\triangle BCD}, {\triangle CDA}, {\triangle DAB} are the vertices of a rectangle.
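Before proving Example 8, it can be reassuring to see it numerically. This sketch deliberately does not use the {abcd} setup: it computes each incenter directly from the side-length-weighted formula and checks that the four incenters form a parallelogram with a right angle. It is a sanity check under arbitrary sample angles, not a proof, and the function name is hypothetical.

```python
import cmath

# Illustrative sketch: function name and sample angles are hypothetical.


def incenter(P, Q, R):
    """Incenter as the average of the vertices weighted by opposite side lengths."""
    p, q, r = abs(Q - R), abs(R - P), abs(P - Q)
    return (p * P + q * Q + r * R) / (p + q + r)


# A cyclic quadrilateral ABCD (vertices on the unit circle, in order).
A, B, C, D = (cmath.exp(1j * theta) for theta in (0.3, 2.1, 3.0, 4.0))
I1 = incenter(A, B, C)
I2 = incenter(B, C, D)
I3 = incenter(C, D, A)
I4 = incenter(D, A, B)

# Parallelogram: the diagonals I1I3 and I2I4 bisect each other.
is_parallelogram = abs((I1 + I3) - (I2 + I4)) < 1e-9
# Right angle at I2: the dot product of I1 - I2 and I3 - I2 vanishes.
is_right_angle = abs(((I1 - I2) * (I3 - I2).conjugate()).real) < 1e-9
print(is_parallelogram and is_right_angle)  # expect True
```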

An apology for HMMT 2016

Median Putnam contestants, willing to devote one of the last Saturdays before final exams to a math test, are likely to receive an advanced degree in the sciences. It is counterproductive on many levels to leave them feeling like total idiots.

— Bruce Reznick, “Some Thoughts on Writing for the Putnam”

Last February I made a big public apology for having caused one of the biggest scoring errors in HMMT history, which led to a lot of changes to the list of top individual students. Pleasantly, I got some nice emails from coaches who reminded me that most students and teams do not place highly in the tournament, and at the end of the day the most important thing is that the contestants enjoyed the tournament.

So now I’ve decided I should apologize for 2016, too.

The story this time is that I inadvertently sent over 100 students home having solved two or fewer of the 30 individual problems. That year I was the problem czar for HMMT February 2016, and, like many HMMT problem czars before me, I had vastly underestimated the difficulty of my own problems.

I think stories like this are a lot worse than people realize. Contests are supposed to be a learning experience for the students, and if a teenager flies to Massachusetts and spends an entire Saturday feeling hopeless, then the flight back to California is going to feel very long. Now imagine having 100 students go through this every single February.

So today I’d like to say a bit about things I’ve picked up since then that have helped me avoid making similar mistakes. I actually think people generally realize that HMMT is too hard, but are wrong about how this should be fixed. In particular, I think the common approach (and the one I took) of “make problem 1 so easy that almost nobody gets a zero” is wrong, and I’ll explain here what I think should be done instead.

1. Gettable, not gimme

I think just “easy” is the wrong way to think about the beginning problems. At ARML, the problem authors use a finer distinction which I really like:

  • A problem is gettable if nearly every contestant feels like they could have gotten the problem on a good day. (In particular, problems that require knowledge that not all contestants have are not gettable, even if they are easy with it.)
  • A problem is a gimme if nearly every contestant actually solves the problem on the contest.

The consensus is always that the early problems should be gettable but not gimmes. You could start every contest by asking the contestant to compute the expected value of 7, but the contestants are going to notice, and it isn’t going to help anyone.

(I guess I should make the point that in order for a problem to be a “gimme”, it would have to be so easy as to be almost insulting, because high accuracy on a given problem is really only possible if the level of the problem is significantly below the level of the student. So a gimme would have to be a problem that is way easier than the level of the weakest contestant; you can see why these would be bad.)

In contrast, with a gettable problem, even though some of the contestants will miss it, they’ll often miss it for a reason like 2+3=6. This is a bit unfortunate, but it is still a lot better if the contestant goes home thinking “I made a small arithmetic error, so I have to be more careful” than “there’s no way I could have gotten this, it was hopeless”.

But that brings me to the next point:

2. At the IMO, 33% of the problems are gettable

At the IMO, there are two easy problems (one each day), but there are only six problems. So a full one-third of the problems are gettable: we hope that most students attending the IMO can solve either IMO1 or IMO4, even though many will not solve both.

If you are writing HMMT or some similar contest, I think this means you should think about the opening in terms of the fraction 1/3, rather than problem 1. For example, at HMMT, I think the czars should instead strive to make the first three or four out of ten problems on each individual test gettable: they should be problems every contestant could solve, even though some contestants will still miss them anyway. Under the pressure of contest, students are going to make all sorts of mistakes, and so it’s important that there are multiple gettable problems. This way, every student has two or three or four real chances to solve a problem: they’ll still miss a few, but at least they feel like they could do something.
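A back-of-the-envelope calculation shows why several gettable problems help. Purely as an illustrative assumption, suppose a given student solves each gettable problem independently with probability 0.8. That number is made up, and real misses are certainly not independent, so treat this as a toy model.

```python
def chance_of_at_least_one(p, k):
    """Probability of solving at least one of k gettable problems.

    Toy model (hypothetical): each problem is solved independently with probability p.
    """
    return 1 - (1 - p) ** k


for k in range(1, 5):
    print(k, round(chance_of_at_least_one(0.8, k), 4))
```

Under these assumptions, the chance of solving nothing drops from 20% with one gettable problem to 4%, 0.8%, and 0.16% with two, three, and four.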

(Every year at HMMT, when we look back at the tests in hindsight, the first reflex many czars have is to look at how many people got 0’s on each test, and hope that it’s not too many. The fact that this figure is even worth looking at is in my opinion a sign that we are doing things wrong: is 1/10 any better than 0/10, if the kid solved question 1 quickly and then spent the rest of the hour staring at the other nine?)

3. Watch the clock

The other thing I want to say is to spend some time thinking about the entire test as a whole, rather than about each problem individually.

To drive the point home: I’m willing to bet that an HMMT individual test with 4 easy, 6 medium, and 0 hard problems could actually work, even at the top end of the scores. Each medium problem in isolation won’t distinguish the strongest students. But put six of them all together, and you get two effects:

  • Students will make mistakes on some of the problems, and by the central limit theorem you’ll get a curve anyway.
  • Time pressure becomes significantly more important, and the strongest students will come out ahead by simply being faster.
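The first effect can be made concrete with a toy model. Assume, hypothetically, that a student solves each of six medium problems independently with a fixed probability {p} reflecting their accuracy. Then the score is binomially distributed, and even without hard problems the distributions for different {p} separate.

```python
from math import comb


def score_distribution(p, n=6):
    """P(score = k) for k = 0..n, assuming (hypothetically) n problems
    each solved independently with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


for p in (0.5, 0.7, 0.9):
    dist = score_distribution(p)
    mean = sum(k * prob for k, prob in enumerate(dist))
    print(f"p={p}: mean={mean:.1f}, P(6/6)={dist[-1]:.3f}")
```

In this model the mean scores are 3.0, 4.2, and 5.4, and the chance of a perfect 6/6 is about 1.6%, 11.8%, and 53.1%, so accuracy alone already spreads out the top of the scoreboard. The model ignores time pressure, which is the second effect.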

Of course, I’ll never be able to persuade the problem czars (myself included) to not include at least one or two of those super-nice hard problems. But the point is that they’re not actually needed in situations like HMMT, when there are so many problems that it’s hard to not get a curve of scores.

One suggestion many people won’t take: if you really want to include some difficult problems that will take a while, decrease the length of the test. If you had 3 easy, 3 medium, and 1 hard problem, I bet that could work too. One hour is really not very much time.

Actually, this has been experimentally verified. On my HMMT 2016 Geometry test, nobody solved any of problems 8-10, so the test was essentially seven problems long. The gradient of scores at the top and center still ended up being okay. The only issue was that a third of the students solved zero problems, because the easy problems were either error-prone, or else were hit-or-miss (either solved quickly or not at all). Thus that’s another thing to watch out for.