This post introduces the $p$-adic integers $\mathbb{Z}_p$, and the $p$-adic numbers $\mathbb{Q}_p$. The one-sentence description is that these are “integers/rationals carrying full mod $p^e$ information” (and only that information).

The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.

In this whole post, $p$ is always a prime. Much of this is based off of Chapter 6B from *Straight from the Book*.

Before really telling you what $\mathbb{Z}_p$ and $\mathbb{Q}_p$ are, let me tell you what you might expect them to do.

In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:

- Taking modulo a prime $p$ or prime power $p^e$, and
- Looking at the exponent $\nu_p$.

Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take an equation modulo $p$ to gain something else to work with. Unfortunately, taking modulo $p$ loses some information: the reduction $\mathbb{Z} \twoheadrightarrow \mathbb{Z}/p$ is far from injective.

If we want finer control, we could consider instead taking modulo $p^2$, rather than taking modulo $p$. This can also give some new information (cubes modulo $9$, anyone?), but it has the disadvantage that $\mathbb{Z}/p^2$ isn’t a field, so we lose a lot of the nice algebraic properties that we got if we take modulo $p$.

One of the goals of $p$-adic numbers is that we can get around these two issues I described. The $p$-adic numbers we introduce are going to have the following properties:

- You can “take modulo $p^e$ for all $e$ at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with $p$-adic numbers, we won’t have to make that choice. An equation of $p$-adic numbers carries enough information to take modulo $p^e$.
- The numbers $\mathbb{Q}_p$ form a field, the nicest possible algebraic structure: $1/p$ makes sense. Contrast this with $\mathbb{Z}/p^2$, which is not even an integral domain.
- It doesn’t lose as much information as taking modulo $p$ does: rather than the surjective $\mathbb{Z} \twoheadrightarrow \mathbb{Z}/p$ we have an *injective* map $\mathbb{Z} \hookrightarrow \mathbb{Z}_p$.
- Despite this, you “ignore” some “irrelevant” data. Just like taking modulo $p$, you want to zoom-in on a particular type of algebraic information, and this means necessarily losing sight of other things.

Let me draw an analogy. We know that the equation

has no integer solutions, because, well, squares are nonnegative. But you will find that this equation has solutions modulo any prime $p$, because once you take modulo $p$ you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in $p$-adics: the above equation has a solution in $\mathbb{Z}_p$ for every prime $p$.

So, you can think of $p$-adic numbers as the right tool to use if you only really care about modulo $p^e$ information, but normal $\mathbb{Z}/p^e$ isn’t quite powerful enough.

To be more concrete, I’ll give a poster example now:

Here is a problem where we *clearly* only care about -type information. Yet it’s a nontrivial challenge to do the necessary manipulations mod (try it!). The basic issue is that there is no good way to deal with the denominators modulo (in part is not even an integral domain).

However, with -adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using the identity

Such an identity makes no sense over $\mathbb{Q}$ or $\mathbb{R}$ for convergence reasons, but it will work fine over $\mathbb{Q}_p$, which is all we need.

We now construct $\mathbb{Z}_p$ and $\mathbb{Q}_p$. I promised earlier that a $p$-adic integer will let you look at “all residues modulo $p^e$” at once. This definition will formalize this.

**Definition 2** **(Introducing $\mathbb{Z}_p$)**

A **$p$-adic integer** is a sequence

$$x = (x_1 \bmod p,\ x_2 \bmod p^2,\ x_3 \bmod p^3,\ \dots)$$

of residues $x_e$ modulo $p^e$ for each integer $e$, satisfying the compatibility relations $x_i \equiv x_j \pmod{p^i}$ for $i < j$.

The set $\mathbb{Z}_p$ of $p$-adic integers forms a ring under component-wise addition and multiplication.
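To make the definition concrete, here is a minimal Python sketch that stores only the first few residues of a $p$-adic integer as a list. The function names (`as_p_adic`, `is_compatible`, `add`, `mul`) and the truncation parameter `depth` are my own illustrative choices; a genuine $p$-adic integer is the whole infinite sequence, which a finite list can only approximate.

```python
def as_p_adic(n, p, depth):
    """First `depth` residues (n mod p, n mod p^2, ...) of an ordinary integer n."""
    return [n % p**e for e in range(1, depth + 1)]


def is_compatible(seq, p):
    """Check that each entry reduces to the previous one.

    seq[i] is a residue modulo p^(i+1), so we need
    seq[i+1] = seq[i] (mod p^(i+1)); consecutive checks imply all pairs i < j.
    """
    return all(
        seq[i + 1] % p ** (i + 1) == seq[i] % p ** (i + 1)
        for i in range(len(seq) - 1)
    )


def add(x, y, p):
    """Component-wise sum of two truncated p-adic integers."""
    return [(a + b) % p**e for e, (a, b) in enumerate(zip(x, y), start=1)]


def mul(x, y, p):
    """Component-wise product of two truncated p-adic integers."""
    return [(a * b) % p**e for e, (a, b) in enumerate(zip(x, y), start=1)]
```

Component-wise operations automatically preserve compatibility, which is the reason the set is closed under addition and multiplication.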

**Example 3** **(Some -adic integers)**

Let . Every usual integer generates a (compatible) sequence of residues modulo for each , so we can view each ordinary integer as -adic one:

On the other hand, there are sequences of residues which do not correspond to any usual integer despite satisfying compatibility relations, such as

which can be thought of as .

In this way we get an injective map

which is not surjective. So there are more -adic integers than usual integers.

(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as

$$\mathbb{Z}_p = \varprojlim_{e} \mathbb{Z}/p^e$$

where the inverse limit is taken across $e \ge 1$.)

**Exercise 4**

Check that $\mathbb{Z}_p$ is an integral domain.

Here is another way to think about -adic integers using “base ”. As in the example earlier, every usual integer can be written in base , for example

More generally, given any $x = (x_1, x_2, \dots) \in \mathbb{Z}_p$, we can write down a “base $p$” expansion in the sense that there are exactly $p$ choices of $x_k$ given $x_{k-1}$. Continuing the example earlier, we would write

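and in general we can write

$$x = \sum_{k \ge 0} a_k p^k = \overline{\dots a_2 a_1 a_0}_p$$

where $a_k \in \{0, \dots, p-1\}$, such that the equation holds modulo $p^e$ for each $e$. Note the expansion is infinite to the *left*, which is different from what you’re used to.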

(Amusingly, negative integers also have infinite base expansions: , corresponding to .)

Thus you may often hear the advertisement that a $p$-adic integer is a “possibly infinite base $p$ expansion”. This is correct, but later on we’ll be thinking of $\mathbb{Z}_p$ in a more and more “analytic” way, and so I prefer to think of this as a **“Taylor series with base $p$”**. Indeed, much of your intuition from generating functions $K[[X]]$ (where $K$ is a field) will carry over to $\mathbb{Z}_p$.
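If you want to play with base $p$ expansions, the following Python sketch computes the lowest few digits of an ordinary integer viewed as a $p$-adic integer. The name `base_p_digits` and the cutoff `count` are hypothetical, chosen for illustration; the point is that negative integers produce digits that never terminate.

```python
def base_p_digits(n, p, count):
    """Lowest `count` base-p digits a_0, a_1, ... of the integer n.

    Works for negative n too: Python's % always returns a digit in
    range(p), and floor division carries the "borrow" to the left.
    """
    digits = []
    for _ in range(count):
        digits.append(n % p)
        n //= p
    return digits


def truncated_value(digits, p):
    """The partial sum a_0 + a_1 p + a_2 p^2 + ... of the given digits."""
    return sum(a * p**k for k, a in enumerate(digits))
```

For example, `base_p_digits(-1, 7, 5)` consists entirely of the digit `6`, and `truncated_value` of those digits agrees with $-1$ modulo $7^5$.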

Here is one way in which your intuition from generating functions carries over:

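**Proposition 5** **(Non-multiples of $p$ are all invertible)**

The number $x \in \mathbb{Z}_p$ is invertible if and only if $x_1 \neq 0$. In symbols,

$$x \in \mathbb{Z}_p^\times \iff x \not\equiv 0 \pmod p.$$

Contrast this with the corresponding statement for $K[[X]]$: a generating function $F \in K[[X]]$ is invertible iff $F(0) \neq 0$.

*Proof:* If $x \equiv 0 \pmod p$ then $x_1 = 0$, so clearly not invertible. Otherwise, $x_e \not\equiv 0 \pmod p$ for all $e$, so we can take an inverse $y_e$ modulo $p^e$, with $x_e y_e \equiv 1 \pmod{p^e}$. As the $y_e$ are themselves compatible, the element $(y_1, y_2, \dots)$ is an inverse.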
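The proof is constructive, and the construction can be sketched in a few lines of Python. This is only an illustration for an ordinary integer `x` not divisible by `p`, truncated at a hypothetical `depth`; it relies on the built-in `pow(x, -1, m)` for modular inverses, which needs Python 3.8 or later.

```python
def inverse_residues(x, p, depth):
    """Residues (y_1, ..., y_depth) with x * y_e = 1 (mod p^e).

    Assumes x is an ordinary integer with p not dividing x.
    """
    if x % p == 0:
        raise ValueError("multiples of p are not invertible in Z_p")
    return [pow(x, -1, p**e) for e in range(1, depth + 1)]


def residues_are_compatible(seq, p):
    """Check seq[i+1] = seq[i] (mod p^(i+1)), where seq[i] lives mod p^(i+1)."""
    return all(
        seq[i + 1] % p ** (i + 1) == seq[i] for i in range(len(seq) - 1)
    )
```

The compatibility check passes because the inverse modulo $p^{e+1}$ is in particular an inverse modulo $p^e$, and inverses modulo $p^e$ are unique.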

**Example 6** **(We have )**

We claim the earlier example is actually

Indeed, multiplying it by gives

(Compare this with the “geometric series” . We’ll actually be able to formalize this later, but not yet.)

**Remark 7** **( is an integer for )**

The earlier proposition implies that (among other things); your intuition about what is an “integer” is different here! In olympiad terms, we already knew made sense, which is why calling an “integer” in the -adics is correct, even though it doesn’t correspond to any element of .

Fun (but trickier) exercise: rational numbers correspond exactly to eventually periodic base $p$ expansions.

With this observation, here is now the definition of $\mathbb{Q}_p$.

**Definition 8** **(Introducing $\mathbb{Q}_p$)**

Since $\mathbb{Z}_p$ is an integral domain, we let $\mathbb{Q}_p$ denote its field of fractions. These are the **$p$-adic numbers**.

Continuing our generating functions analogy:

$$\mathbb{Z}_p \text{ is to } \mathbb{Q}_p \text{ as } K[[X]] \text{ is to } K((X)).$$

This means $\mathbb{Q}_p$ is “**Laurent series with base $p$**”, and in particular according to the earlier proposition we deduce:

**Proposition 9** **($\mathbb{Q}_p$ looks like formal Laurent series)**

Every nonzero element of $\mathbb{Q}_p$ is uniquely of the form

$$p^k u \qquad \text{where } k \in \mathbb{Z},\ u \in \mathbb{Z}_p^\times.$$

Thus, continuing our base $p$ analogy, elements of $\mathbb{Q}_p$ are in bijection with “Laurent series”

$$\sum_{k \ge -n} a_k p^k$$

for $a_k \in \{0, \dots, p-1\}$. So the base $p$ representations of elements of $\mathbb{Q}_p$ can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).

(Fair warning: the field $\mathbb{Q}_p$ has characteristic *zero*, not $p$.)

**Remark 10** **(Warning on fraction field)**

This result implies that you shouldn’t think about elements of $\mathbb{Q}_p$ as $x/y$ (for $x, y \in \mathbb{Z}_p$) in practice, even though this is the official definition (and what you’d expect from the name $\mathbb{Q}_p$). The only denominators you need are powers of $p$.

To keep pushing the formal Laurent series analogy, $K((X))$ is usually not thought of as a quotient of generating functions but rather as “formal series with some negative exponents”. You should apply the same intuition on $\mathbb{Q}_p$.

(At this point I want to make a remark about the fact $1/p \in \mathbb{Q}_p$, connecting it to the wish-list of properties I had before. In elementary number theory you can take equations modulo $p$, but if you do the quantity $n/p \bmod p$ doesn’t make sense unless you know $n \bmod p^2$. You can’t fix this by just taking modulo $p^2$ since then you need $n \bmod p^3$ to get $n/p \bmod p^2$, ad infinitum. You can work around issues like this, but the nice feature of $\mathbb{Z}_p$ and $\mathbb{Q}_p$ is that you have modulo $p^e$ information for “all $e$ at once”: the information of $x \in \mathbb{Q}_p$ packages all the modulo $p^e$ information simultaneously. So you can divide by $p$ with no repercussions.)

Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number line, but for $p$-adic purposes we only care about modulo $p^e$ information. So, we’ll instead think of two elements of $\mathbb{Z}_p$ or $\mathbb{Q}_p$ as “close” if they differ by a large multiple of $p^e$.

For this we’ll borrow the familiar $\nu_p$ from elementary number theory.

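**Definition 11** **($p$-adic valuation and absolute value)**

We define the **$p$-adic valuation** $\nu_p \colon \mathbb{Q}_p^\times \to \mathbb{Z}$ in the following two equivalent ways:

- For $x = (x_1, x_2, \dots) \in \mathbb{Z}_p$ we let $\nu_p(x)$ be the largest $e$ such that $x_e \equiv 0 \pmod{p^e}$ (or $e = 0$ if $x \in \mathbb{Z}_p^\times$). Then extend to all of $\mathbb{Q}_p^\times$ by $\nu_p(xy) = \nu_p(x) + \nu_p(y)$.
- Each $x \in \mathbb{Q}_p^\times$ can be written uniquely as $p^k u$ for $u \in \mathbb{Z}_p^\times$, $k \in \mathbb{Z}$. We let $\nu_p(x) = k$.

By convention we set $\nu_p(0) = +\infty$. Finally, define the **$p$-adic absolute value** $|\bullet|_p$ by

$$|x|_p = p^{-\nu_p(x)}.$$

In particular $|0|_p = 0$.

This fulfills the promise that $x$ and $y$ are close if they look the same modulo $p^e$ for large $e$; in that case $\nu_p(x-y)$ is large and accordingly $|x-y|_p$ is small.

In this way, $\mathbb{Q}_p$ and $\mathbb{Z}_p$ become metric spaces with metric given by $|x-y|_p$.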
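For ordinary rational numbers, the valuation and absolute value are easy to compute by hand, and the following Python sketch does exactly that. It is restricted to rationals (represented with `fractions.Fraction`), not general $p$-adic numbers, and the names `nu_p` and `abs_p` are just my labels for the two functions in the definition.

```python
from fractions import Fraction
import math


def nu_p(x, p):
    """p-adic valuation of a rational number x, with nu_p(0) = +infinity."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    num, den, k = x.numerator, x.denominator, 0
    while num % p == 0:  # factors of p in the numerator raise the valuation
        num //= p
        k += 1
    while den % p == 0:  # factors of p in the denominator lower it
        den //= p
        k -= 1
    return k


def abs_p(x, p):
    """p-adic absolute value p^(-nu_p(x)) of a rational x, as an exact Fraction."""
    v = nu_p(x, p)
    return Fraction(0) if v == math.inf else Fraction(p) ** (-v)
```

With these, two rationals are $p$-adically close exactly when `abs_p(x - y, p)` is small, i.e. when `x - y` is divisible by a high power of `p`.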

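In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from $\mathbb{R}$.

**Proposition 13** **($|\bullet|_p$ is an ultrametric)**

For any $x, y \in \mathbb{Q}_p$, we have the **strong triangle inequality**

$$|x+y|_p \le \max \left\{ |x|_p, |y|_p \right\}.$$

Equality holds if (but not only if) $|x|_p \neq |y|_p$.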
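A brute-force sanity check of the strong triangle inequality on a handful of rationals can be sketched in Python as follows. This is of course not a proof, and it only covers rational inputs; the helper names `abs_p` and `check_ultrametric` are illustrative.

```python
from fractions import Fraction
import itertools


def abs_p(x, p):
    """p-adic absolute value of a rational number, as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, size = x.numerator, x.denominator, Fraction(1)
    while num % p == 0:  # each factor of p upstairs shrinks |x|_p by p
        num //= p
        size /= p
    while den % p == 0:  # each factor of p downstairs grows |x|_p by p
        den //= p
        size *= p
    return size


def check_ultrametric(values, p):
    """True if |x + y|_p <= max(|x|_p, |y|_p) for every pair in `values`."""
    return all(
        abs_p(x + y, p) <= max(abs_p(x, p), abs_p(y, p))
        for x, y in itertools.product(values, repeat=2)
    )
```

The "but not only if" is visible with $p = 3$, $x = 3$, $y = 6$: both have absolute value $\frac13$, yet their sum $9$ has absolute value $\frac19$, so the inequality is strict there.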

However, $\mathbb{Q}_p$ is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in $\mathbb{R}$ or $\mathbb{C}$: basically, any notion such as “continuous function”, “convergent series”, et cetera has a $p$-adic analog. In particular, we can define what it means for an infinite sum to converge:

**Definition 14** **(Convergence notions)**

Here are some examples of $p$-adic analogs of “real-world” notions.

- A sequence $s_1$, $s_2$, $\dots$ converges to a limit $L$ if $\lim_{n \to \infty} |s_n - L|_p = 0$.
- The infinite series $\sum_k x_k$ converges if the sequence of partial sums $s_1 = x_1$, $s_2 = x_1 + x_2$, $\dots$, converges to some limit.
- $\dots$ et cetera $\dots$

With this definition in place, the “base $p$” discussion we had earlier is now true in the analytic sense: if $x \in \mathbb{Z}_p$ has base $p$ digits $a_0, a_1, a_2, \dots$ then

$$\sum_{k=0}^{\infty} a_k p^k = x.$$

Indeed, the $n$th partial sum differs from $x$ by a multiple of $p^n$, hence the partial sums approach $x$ as $n \to \infty$.

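While the definitions are all the same, there are some changes in properties that should be true. For example, in $\mathbb{Q}_p$ convergence of partial sums is simpler:

**Proposition 15** **($|x_k|_p \to 0$ iff convergence of series)**

A series $\sum_k x_k$ in $\mathbb{Q}_p$ converges to some limit if and only if $\lim_{k \to \infty} |x_k|_p = 0$.

Contrast this with $\sum_{n \ge 1} \frac{1}{n} = \infty$ in $\mathbb{R}$. You can think of this as a consequence of strong triangle inequality.

*Proof:* By multiplying by a large enough power of $p$, we may assume $x_k \in \mathbb{Z}_p$. (This isn’t actually necessary, but makes the notation nicer.)

Observe that the partial sums $s_n \bmod p$ must eventually stabilize, since for large enough $n$ we have $|x_n|_p < 1$, i.e. $\nu_p(x_n) \ge 1$. So let $a_1$ be the eventual residue modulo $p$ of $s_n$ for large $n$. In the same way let $a_2$ be the eventual residue modulo $p^2$, and so on. Then one can check we approach the limit $a = (a_1, a_2, \dots)$.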

Here’s a couple exercises to get you used to thinking of $\mathbb{Z}_p$ and $\mathbb{Q}_p$ as metric spaces.

**Exercise 16** **($\mathbb{Z}_p$ is compact)**

Show that $\mathbb{Q}_p$ is not compact, but $\mathbb{Z}_p$ is. (For the latter, I recommend using sequential compactness.)

**Exercise 17** **(Totally disconnected)**

Show that both $\mathbb{Z}_p$ and $\mathbb{Q}_p$ are *totally disconnected*: there are no connected sets other than the empty set and singleton sets.

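While we’re at it, let’s finally state the $p$-adic analog of the geometric series formula.

**Proposition 18** **(Geometric series)**

Let $x \in \mathbb{Z}_p$ with $|x|_p < 1$. Then

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots.$$

*Proof:* Note that the partial sums satisfy $1 + x + x^2 + \dots + x^n = \frac{1 - x^{n+1}}{1-x}$, and $x^{n+1} \to 0$ as $n \to \infty$ since $|x|_p < 1$.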
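Here is a small Python sketch of what this convergence means in concrete terms, assuming $x$ is an ordinary integer divisible by $p$: the $n$th partial sum should agree with the true inverse of $1 - x$ modulo $p^{n+1}$. The function names are illustrative, and the modular inverse again uses `pow(..., -1, m)` from Python 3.8 or later.

```python
def geometric_partial_sum(x, n):
    """The partial sum 1 + x + x^2 + ... + x^n."""
    return sum(x**k for k in range(n + 1))


def agrees_with_inverse(x, p, n):
    """Check that 1 + x + ... + x^n = 1/(1 - x) modulo p^(n+1).

    Assumes p divides x, so that 1 - x is invertible modulo every power of p
    and x^(n+1) vanishes modulo p^(n+1).
    """
    modulus = p ** (n + 1)
    return geometric_partial_sum(x, n) % modulus == pow(1 - x, -1, modulus)
```

Each extra term of the series buys one more correct base $p$ digit of $\frac{1}{1-x}$, which is exactly the statement that the error $\frac{x^{n+1}}{1-x}$ has small $p$-adic absolute value.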

So, is really a correct convergence in . And so on.

If you buy the analogy that $\mathbb{Z}_p$ is generating functions with base $p$, then all the olympiad generating functions you might be used to have $p$-adic analogs. For example, you can prove more generally that:

**Theorem 19** **(Generalized binomial theorem)**

If $x \in \mathbb{Q}_p$ and $|x|_p < 1$, then for any $r \in \mathbb{Q}$ we have the series convergence

$$\sum_{n \ge 0} \binom{r}{n} x^n = (1+x)^r.$$

(I haven’t defined $(1+x)^r$, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is the same except for the extra subscript of $p$. I won’t elaborate too much on this now, since $p$-adic exponentiation will be described in much more detail in the next post.

Note that the definition of $|\bullet|_p$ could have been given for $\mathbb{Q}$ as well; we didn’t need $\mathbb{Q}_p$ to introduce it (after all, we have $\nu_p$ in olympiads already). The big important theorem I must state now is:

**Theorem 20** **($\mathbb{Q}_p$ is complete)**

The space $\mathbb{Q}_p$ is the completion of $\mathbb{Q}$ with respect to $|\bullet|_p$.

This is the definition of $\mathbb{Q}_p$ you’ll see more frequently; one then defines $\mathbb{Z}_p$ in terms of $\mathbb{Q}_p$ (rather than vice-versa) according to

$$\mathbb{Z}_p = \left\{ x \in \mathbb{Q}_p : |x|_p \le 1 \right\}.$$

(Remark for experts: $\mathbb{Q}_p$ is a field with a non-Archimedean valuation; then $\mathbb{Z}_p$ is its valuation ring.)
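As a hedged numerical illustration of the binomial series (my own example, not taken from the problem in this post), take $p = 3$, $x = 3$ and $r = \frac12$. The partial sums $S_N$ of $\sum_n \binom{1/2}{n} 3^n$ are ordinary rationals, and the Python sketch below checks that $S_N^2$ approaches $1 + x = 4$ in the $3$-adic sense, i.e. that $\nu_3(S_N^2 - 4) \ge N + 1$. The helper names are hypothetical.

```python
from fractions import Fraction
import math


def binom(r, n):
    """Generalized binomial coefficient r(r-1)...(r-n+1)/n! for rational r."""
    result = Fraction(1)
    for k in range(n):
        result *= r - k
        result /= k + 1
    return result


def nu_p(x, p):
    """p-adic valuation of a rational number, with nu_p(0) = +infinity."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    num, den, k = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        k += 1
    while den % p == 0:
        den //= p
        k -= 1
    return k


def sqrt_partial_sum(x, terms):
    """Partial sum of the binomial series for (1 + x)^(1/2), up to x^terms."""
    half = Fraction(1, 2)
    return sum(binom(half, n) * Fraction(x) ** n for n in range(terms + 1))
```

The same partial sums diverge wildly as real numbers, since $|3| > 1$ in $\mathbb{R}$; the convergence here is purely a $3$-adic phenomenon.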

Let me justify why this definition is philosophically nice.

Suppose you are a numerical analyst and you want to estimate the value of the sum

to within . The sum consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it *really helps* if you know about the real numbers: because then you can construct the infinite series , and deduce that , up to some small error term from the terms past , which can be bounded.

Of course, in order to have access to enough theory to prove that , you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space , where e.g. the sequence , , , , \dots is considered “not convergent” because . Instead, all analysis is done in the completion of , namely .

Now suppose you are an olympiad contestant and want to estimate the sum

to within mod (i.e. to within in ). Even though is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod ). But the space is not complete with respect to either, and thus it makes sense to work in the completion of with respect to . This is exactly .

Let’s finally solve Example 1, which asks to compute

Armed with the generalized binomial theorem, this becomes straightforward.

Using the elementary facts that and , this solves the problem.


Some of the stuff covered in this handout:

- Advice for constructing the triangle centers (hint: circumcenter goes first)
- An example of how to rearrange the conditions of a problem and draw a diagram out-of-order
- Some mechanical suggestions such as dealing with phantom points
- Some examples of computer-generated figures

Enjoy.

I worry about my geometry book. To explain why, let me tell you a story.

When I was in high school about six years ago, barycentric coordinates were nearly unknown as an olympiad technique. I only heard about it from whispers in the wind from friends who had heard of the technique and thought it might be usable. But at the time, there was nowhere that everything was written down explicitly. I had a handful of formulas online, a few helpful friends I could reach out to, and a couple example posts littered across some forums.

Seduced by the possibility of arcane power, I didn’t let this stop me. Over the spring of 2012, spring break settled in, and I spent that entire week developing the entire theory of barycentric coordinates from scratch. There were no proofs I could find online, so I had to personally reconstruct all of them. In addition, I set out to find as many example problems as I could, but since no one had written barycentric solutions yet, I had to not only identify which problems looked like they might be good examples but also solve them myself to see if my guesses were correct. I even managed to prove a “new” theorem about perpendicular displacement vectors (which I did not get to name after myself).

I continued working all the way up through the summer, adding several new problems that came my way from MOP 2012. Finally, I posted a rough article with all my notes, examples, and proofs, which you can still find online. I still remember this as a sort of *magnum opus* from the first half of high school; it was an immensely rewarding learning experience.

Today, all this and much more can be yours for just $60, with any major credit or debit card.

Alas, my geometry book is just one example of ways in which the math contest scene is looking more and more like an industry. Over the years, more and more programs dedicated to training for competitions are springing up, and these programs can be quite costly. I myself run a training program now, which is even more expensive (in my defense, it’s one-on-one teaching, rather than a residential camp or group lesson).

It’s possible to imagine a situation in which the contest problems become more and more routine. In that world, math contests become an arms race. It becomes mandatory to have training in increasingly obscure techniques: everything from Popoviciu to Vieta jumping to rectangular circumhyperbolas. Students from less well-off families, or even countries without access to competition resources, become unable to compete, and are pushed to the bottom of the IMO scoreboard.

(Fortunately for me, I found out at the 2017 IMO that my geometry book actually *helped* level the international playing field, contrary to my initial expectations. It’s unfortunate that it’s not free, but it turned out that many students in other countries had until then found it nearly impossible to find suitable geometry materials. So now many more people have access to a reasonable geometry reference, rather than just the top countries with well-established training.)

The first approximation you might have now is that training is bad. But I think that’s the wrong conclusion, since, well, I have an entire previous post dedicated to explaining what I perceive as the benefits of the math contest experience. So I think **the conclusion is not that training is intrinsically bad, but rather that training must be meaningful**. That is, the students have to gain something from the experience that’s not just a +7 bonus on their next olympiad contest.

I think the message “training is bad” might be even more dangerous.

Imagine that the fashion swings the other way. The IMO jury becomes alarmed at the trend of train-able problems, and in response, the problems become designed specifically to antagonize trained students. The entire Geometry section of the IMO shortlist ceases to exist, because some Asian kid wrote this book that gives you too much of an advantage if you’ve read it, and besides who does geometry after high school anyways? The IMO 2014 used to be notable for having three combinatorics problems, but by 2040 the norm is to have four or five, because everyone knows combinatorics is harder to train for.

Gradually, the IMO is redesigned to become an IQ test.

The changes then begin to permeate down. The USAMO committee is overthrown, and USAMO 2050 features six linguistics questions “so that we can find out who can actually think”. Math contests as a whole become a system for identifying the best genetic talent, explicitly aimed at weeding out the students who have “just been trained”. It doesn’t matter how hard you’ve worked; we want “creativity”.

This might be great at identifying the best mathematicians each generation, but I think an IMO of this shape would be actively destructive towards the contestants and community as well. You thought math contests were bad because they’re discouraging to the kids who don’t win? What if they become redesigned to make sure that *you can’t improve your score no matter how hard you work*?

What this means is that we have a balancing act to maintain. We do not want to eliminate the role of training entirely, because the whole point of math contests is to have a learning experience that lasts longer than the two-day contest every year. But at the same time, we need to ensure the training is interesting, that it is deep and teaches skills like the ones I described before.

Paying $60 to buy a 300-page PDF is not meaningful. But spending many hours to work through the problems in that PDF might be.

In many ways this is not a novel idea. If I am trying to teach a student, and I give them a problem which is too easy, they will not learn anything from it. Conversely, if I give them a problem which is too difficult, they will get discouraged and are unlikely to learn much from their trouble. The situation with olympiad training feels the same.

This applies to the way I think about my teaching as well. I am always upset when I hear (as I have) things like “X only did well on USAMO because of Evan Chen’s class”. If that is true, then all I am doing is taking money as input and changing the results of a zero-sum game as output, which is in my opinion rather pointless (and maybe unethical).

But I really think that’s not what’s happening. Maybe I’m a good teacher, but at the end of the day I am just a guide. If my students do well, or even if they don’t do well, it is because they spent many hours on the challenges that I designed, and have learned a lot from the whole experience. The credit for any success thus lies solely with the student’s effort. And that experience, I think, is certainly not zero-sum.

It’s not uncommon for technical books to include an admonition from the author that readers must do the exercises and problems. I always feel a little peculiar when I read such warnings. Will something bad happen to me if I don’t do the exercises and problems? Of course not. I’ll gain some time, but at the expense of depth of understanding. Sometimes that’s worth it. Sometimes it’s not.

— Michael Nielsen, Neural Networks and Deep Learning

I spent the first few days of my recent winter vacation transitioning all the problem sets for my students from a “traditional” format to a “point-based” format. Here’s a before and after.

Technical specification:

- The traditional problem sets used to consist of a list of 6-9 olympiad problems of varying difficulty, for which you were expected to solve all problems over the course of two weeks.
- The new point-based problem sets consist of 10-15 olympiad problems, each weighted either 2, 3, 5, or 9 points, and an explicit target goal for that problem set. There’s a spectrum of how many of the problems you need to solve depending on the topic and the version (I have multiple difficulty versions of many sets), but as a rough estimate the goal is maybe 60%-75% of the total possible points on the problem set. Usually, on each problem set there are 2-4 problems which I think are especially nice or important, and I signal this by coloring the problem weight in red.

In this post I want to talk a little bit about what motivated this change.

I guess for historical context I’ll start by talking about why I *used* to have a traditional format, although I’m mildly embarrassed about it now, in hindsight.

When I first started out with designing my materials, I was actually basically always *short* on problems. Once you really get into designing olympiad materials, good problems begin to feel like tangible goods. Most problems I put on a handout are ones I’ve done personally, because otherwise, how are you supposed to know what the problem is like? This means I have to actually solve the problem, type up solution notes, and then decide how hard it is and what that problem teaches. This might take anywhere from 30 minutes to the entire afternoon, *per problem*. Now imagine you need 150 such problems to run a year’s curriculum, and you can see why the first year was so stressful. (I was very fortunate to have paid much of this cost in high school; I still remember many of the problems I did back as a student.)

So it seemed like a waste if I spent a lot of time vetting a problem and then my students didn’t do it, and as a practical matter I didn’t have enough materials yet to have much leeway anyways. I told myself this would be fine: after all, if you couldn’t do a problem, all you had to do was tell me what you’ve tried, and then I’d walk you through the rest of it. So there’s no reason why you couldn’t finish the problem sets, right? (Ha. Ha. Ha.)

Now my problem bank has gotten much deeper, so I don’t have that excuse anymore. [1]

But I’ll tell you now that even before I decided to switch to points, one of the biggest headaches was always whether to add in an eighth problem that was really nice but also difficult. (When I first started teaching, my problem sets were typically seven problems long.) If you looked at the TeX source for some of my old handouts, you’d see lots of problems commented out with a line saying “too long already”.

Teaching OTIS made me appreciate the amount of power I have on the other side of a mentor-student relationship. Basically, when I design a problem set, I am making decisions on behalf of the student: “these are the problems that I think you should work on”. Since my kids are all great students that respect me a lot, they will basically do whatever I tell them to.

That means I used to spend many hours agonizing over that eighth problem or whether to punt it. Yes, they’ll learn a lot if they solve (or don’t solve) it, but it will also take them another two or three hours on top of everything else they’re already doing (OTIS, school, trumpet, track, dance, social, blah blah blah). Is it worth those extra hours? Is it not? I’ve lost sleep over whether I made the right choice on the nights I ended up adding that last hard problem.

But in hindsight the right answer all along was to just let the students decide for themselves, because unlike your average high-school math teacher in a room of decked-out slackers, I have the best students in the world.

As I got a deeper database this year and commented more problems out, I started thinking about point-based problem sets. But I can tell you the exact moment when I decided to switch.

On the morning of Sunday November 5, I had a traditional problem set on my desk next to a point-based one. In both cases I had figured out how to do about half the problems required. I noticed that the way the half-full glass of water looked was quite different between them. In the first case, I was freaking out about the other half of the problems I hadn’t solved yet. In the second case, I was trying to decide which of the problems would be the most fun to do next.

Then I realized that OTIS was running on the traditional system, and what I had been doing to my students all semester! So instead of doing either problem set I began the first prototypes of the points system.

I’m worried I’ll get misinterpreted as arguing that students shouldn’t work hard. This is not really the point. If you read the specification at the beginning carefully, the number of problems the students are solving is actually roughly the same in both systems.

It might be more psychological than anything else: **I want my kids to count how many problems they’ve solved, not how many problems they haven’t solved**. Every problem you solve makes you better. Every problem you try and don’t solve makes you better, too. But a problem you didn’t have time to try doesn’t make you worse.

I’ll admit to being *mildly pissed off* at high school for having built this particular mindset into all my kids. The straight-A students sitting in calculus BC aren’t counting how many questions they’ve answered correctly when checking grades. They’re counting how many points they lost. The implicit message is that if you don’t do nearly all the questions, you’re a *bad person* because you didn’t try hard enough and you *won’t learn anything this way* and *shame on you* and…

That can’t possibly be correct. Imagine two calculus teachers A and B using the same textbook. Teacher A assigns 15 questions of homework a week, teacher B assigns 25 questions. All of teacher A’s students are *failing* by B’s standards. Fortunately, that’s not actually how the world works.

For this reason I’m glad that all the olympiad kids report their performance as “I solved problems 1,2,4,5” rather than “I missed problems 3,6”.

The other wrong assumption I had about traditional problem sets was the bit about asking for help on problems you can’t solve. It turns out getting students to ask for help is a struggle. So one other hope with the point-based system is that if a student tries a problem, can’t solve it, and is too shy to ask, then they can switch to a different problem and read the solution later on. No need to get me involved with every single missed problem any more.

But anyways I have a hypothesis why asking for help seems so hard (though there are probably other reasons too).

You’ve all heard the teachers who remind students to always ask questions during lectures [2], because it means someone else has the same question. In other words: don’t be afraid to ask questions just because you’re afraid you’ll look dumb, because “**there are no stupid questions**”.

But I’ve **rarely heard anyone say the same thing about problem sets**.

As I’m writing this, I realize that this is actually the reason I’ve never been willing to go to office hours to ask my math professors for help on homework problems I’m stuck on. It’s *not* because I’m worried my professors will think I’m dumb. It’s because **I’m worried they’ll think I didn’t try hard enough** before I gave up and came to them for help, or even worse, that I just care about my grade. You’ve all heard the freshman biology TA’s complain about those kids that just come and ask them to check all their pset answers one by one, or that come to argue about points they got docked, or what-have-you. I didn’t want to be that guy.

Maybe this shaming is intentional if the class you’re teaching is full of slackers that don’t work unless you crack the whip. [3] But if you are teaching a math class that’s half MOPpers, I *seriously* don’t think we need guilt-trips for these kids whenever they can’t solve a USAMO3.

So for all my students, here’s my version of the message: **there are no stupid questions, and there are no lazy questions**.

- [1] The other reason I used traditional problem sets at first was that I wanted to force the students to at least try the harder problems. This is actually my main remaining concern about switching to point-based problem sets: you could in principle always ignore the 9-point problems at the end. I tried to compensate for this by either marking some 9’s in red, or else making it difficult to reach the goal without solving at least one 9. I’m not sure this is enough.
- [2] But if my question is “I zoned out for the last five minutes because I was responding to my friends on snapchat, what just happened?”, I don’t think most professors would take too kindly. So it’s not true that literally all questions are welcome in lectures.
- [3] As an example, the 3.091 class policies document includes FAQ such as “that sounds like a lot of work, is there a shortcut?”, “but what do I need to learn to pass the tests?”, and “but I just want to pass the tests…”. Also an entire paragraph explaining why skipping the final exam makes you a terrible person, including reasons such as “how you do anything is how you do everything”, “students earning A’s are invited to apply as tutors/graders”, and “in college it’s up to you to take responsibility for your academic career”, and so on ad nauseam.

One of the major headaches of using complex numbers in olympiad geometry problems is dealing with square roots. In particular, it is nontrivial to express the incenter of a triangle inscribed in the unit circle in terms of its vertices.

The following lemma is the standard way to set up the arc midpoints of a triangle. It appears for example as part (a) of Lemma 6.23.

**Theorem 1** **(Arc midpoint setup for a triangle)**

Let be a triangle with circumcircle and let , , denote the arc midpoints of opposite , opposite , opposite .

Suppose we view as the unit circle in the complex plane. Then *there exist* complex numbers , , such that , , , and

Theorem 1 is often used in combination with the following lemma, which lets one assign the incenter the coordinates in the above notation.

**Lemma 2** **(The incenter is the orthocenter of opposite arc midpoints)**

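Let $ABC$ be a triangle with circumcircle $\Gamma$ and let $M_A$, $M_B$, $M_C$ denote the arc midpoints of $\widehat{BC}$ opposite $A$, $\widehat{CA}$ opposite $B$, $\widehat{AB}$ opposite $C$. Then the incenter of $ABC$ coincides with the orthocenter of $M_A M_B M_C$.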
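For readers who like to see numbers, here is a minimal numerical sanity check of Theorem 1 and Lemma 2. It assumes the usual form of the statements: the vertices are $x^2$, $y^2$, $z^2$, the opposite arc midpoints are $-yz$, $-zx$, $-xy$, and the incenter is $-(xy+yz+zx)$. The function names are my own for illustration, and the brute-force search over the eight sign choices is only a sketch, not the proof given later.

```python
import cmath
from itertools import product

def arc_midpoint_opposite(p, q, r):
    """Midpoint of the arc pq of the unit circle that does not contain r."""
    m = cmath.sqrt(p * q)  # m*m == p*q, so m and -m are the two arc midpoints
    # Signed side of the line pq on which a point w lies (cross product).
    side = lambda w: ((q - p).conjugate() * (w - p)).imag
    return m if side(m) * side(r) < 0 else -m

def incenter(a, b, c):
    """Incenter as the side-length-weighted average of the vertices."""
    la, lb, lc = abs(b - c), abs(c - a), abs(a - b)
    return (la * a + lb * b + lc * c) / (la + lb + lc)

def find_roots(a, b, c, tol=1e-9):
    """Search the 8 sign choices for square roots x, y, z of a, b, c
    such that -yz, -zx, -xy are the arc midpoints opposite a, b, c."""
    ma = arc_midpoint_opposite(b, c, a)
    mb = arc_midpoint_opposite(c, a, b)
    mc = arc_midpoint_opposite(a, b, c)
    x0, y0, z0 = cmath.sqrt(a), cmath.sqrt(b), cmath.sqrt(c)
    for sx, sy, sz in product((1, -1), repeat=3):
        x, y, z = sx * x0, sy * y0, sz * z0
        if (abs(-y * z - ma) < tol and abs(-z * x - mb) < tol
                and abs(-x * y - mc) < tol):
            return x, y, z
    return None

if __name__ == "__main__":
    # An arbitrary triangle on the unit circle (angles in radians).
    a, b, c = cmath.rect(1, 0.3), cmath.rect(1, 2.0), cmath.rect(1, 4.5)
    x, y, z = find_roots(a, b, c)
    print("incenter from side lengths:", incenter(a, b, c))
    print("-(xy + yz + zx):           ", -(x * y + y * z + z * x))
```

The search also makes the existence nature of the theorem concrete: most of the eight sign choices fail, so one cannot pick the square roots carelessly.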

Unfortunately, the proof of Theorem 1 in my textbook is wrong, and I cannot find a proof online (though I hear that *Lemmas in Olympiad Geometry* has a proof). So in this post I will give a correct proof of Theorem 1, which will hopefully also explain the mysterious introduction of the minus signs in the theorem statement. In addition I will give a version of the theorem valid for quadrilaterals.

I should at once warn the reader that Theorem 1 is an *existence result*, and thus must be applied carefully.

To see why this matters, consider the following problem, which appeared as problem 1 of the 2016 JMO.

**Example 3** **(JMO 2016, by Zuming Feng)**

The isosceles triangle $\triangle ABC$, with $AB = AC$, is inscribed in the circle $\omega$. Let $P$ be a variable point on the arc $\widehat{BC}$ that does not contain $A$, and let $I_B$ and $I_C$ denote the incenters of triangles $\triangle ABP$ and $\triangle ACP$, respectively. Prove that as $P$ varies, the circumcircle of triangle $\triangle P I_B I_C$ passes through a fixed point.

By experimenting with the diagram, it is not hard to guess that the correct fixed point is the midpoint of arc , as seen in the figure below. One might be tempted to write , , , and assert the two incenters are and , and that the fixed point is .

This is a mistake! If one applies Theorem 1 twice, then the choices of “square roots” of the common vertices and may not be compatible. In fact, they *cannot* be compatible, because the arc midpoint of opposite is different from the arc midpoint of opposite .

In fact, I claim this is not a minor issue that one can work around. This is because the claim that the circumcircle of passes through the midpoint of arc is false if lies on the arc on the same side as ! In that case it actually passes through instead. Thus the truth of the problem really depends on the fact that the quadrilateral is *convex*, and any attempt with complex numbers must take this into account to have a chance of working.

Fix now, so we require , , . There are choices of square roots , , we can take (differing by a sign); we wish to show one of them works.

We pick an arbitrary choice for first. Then, of the two choices of , we pick the one such that . Similarly, for the two choices of , we pick the one such that . Our goal is to show that under these conditions, we have again.

The main trick is to now consider the arc midpoint , which we denote by . It is easy to see that:

**Lemma 4** **(The isosceles trapezoid trick)**

We have (both are perpendicular to the bisector). Thus is an isosceles trapezoid, and so .

Thus, we have

Thus

as desired.

From this we can see why the minus signs are necessary.

We now return to the setting of a convex quadrilateral that we encountered in Example 3. Suppose we preserve the variables , , that we were given from Theorem 1, but now add a fourth complex number with . How are the new arc midpoints determined? The following theorem answers this question.

**Theorem 6** **( setup)**

Let be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers , , , such that , , , and:

- The opposite arc midpoints , , of triangle are given by , , , as before.
- The midpoint of arc not including or is given by .
- The midpoint of arc not including or is given by .
- The midpoint of arc is and the midpoint of arc is .

This setup is summarized in the following figure.

Note that unlike Theorem 1, the four arcs cut out by the sides of do not all have the same sign (I chose to have coordinates ). This asymmetry is inevitable (see if you can understand why from the proof below).

*Proof:* We select , , with Theorem 1. Now, pick a choice of such that is the arc midpoint of not containing and . Then the arc midpoint of not containing or is given by

On the other hand, the calculation of for the midpoint of follows by applying Lemma 4 again (applied to triangle ). The midpoint of is computed similarly.

In other problems, the four vertices of the quadrilateral may play more symmetric roles and in that case it may be desirable to pick a setup in which the four vertices are labeled in order. By relabeling the letters in Theorem 6 one can prove the following alternate formulation.

**Corollary 7**

Let be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers , , , such that , , , and:

- The midpoints of , , , cut out by the sides of are , , , .
- The midpoints of and are and .
- The midpoints of and are and .

To test the newfound theorem, here is a cute easy application.

**Example 8** **(Japanese theorem for cyclic quadrilaterals)**

In a cyclic quadrilateral , the incenters of , , , are the vertices of a rectangle.

Median Putnam contestants, willing to devote one of the last Saturdays before final exams to a math test, are likely to receive an advanced degree in the sciences. It is counterproductive on many levels to leave them feeling like total idiots.

— Bruce Reznick, “Some Thoughts on Writing for the Putnam”

Last February I made a big public apology for having caused one of the biggest scoring errors in HMMT history, causing a lot of changes to the list of top individual students. Pleasantly, I got some nice emails from coaches who reminded me that most students and teams do not place highly in the tournament, and at the end of the day the most important thing is that the contestants enjoyed the tournament.

So now I decided I have to apologize for 2016, too.

The story this time is that I inadvertently sent over 100 students home having solved two or fewer problems total, out of 30 individual problems. That year, I was the problem czar for HMMT February 2016, and like many HMMT problem czars before me, had vastly underestimated the difficulty of my own problems.

I think stories like this are a lot worse than people realize; contests are supposed to be a learning experience for the students, and if a teenager shows up to Massachusetts and spends an entire Saturday feeling hopeless for the entire contest, then the flight back to California is going to feel very long. Now imagine having 100 students go through this every single February.

So today I’d like to say a bit about things I’ve picked up since then that have helped me avoid making similar mistakes. I actually think people generally realize that HMMT is too hard, but are wrong about how this should be fixed. In particular, **I think the common approach (and the one I took) of “make problem 1 so easy that almost nobody gets a zero” is wrong**, and I’ll explain here what I think should be done instead.

I think just “easy” is the wrong way to think about the beginning problems. At ARML, the problem authors use a finer distinction which I really like:

- A problem is **gettable** if nearly every contestant feels like they *could have* gotten the problem on a good day. (In particular, problems that require knowledge that not all contestants have are not gettable, even if they are easy with it.)
- A problem is a **gimme** if nearly every contestant actually solves the problem on the contest.

The consensus is always that **the early problems should be gettable but not gimme’s**. You could start every contest by asking the contestant to compute the expected value of 7, but the contestants are going to notice, and it isn’t going to help anyone.

(I guess I should make the point that in order for a problem to be a “gimme”, it would have to be so easy to be almost insulting, because high accuracy on a given problem is really only possible if the level of the problem is significantly below the level of the student. So a gimme would have to be a problem that is way easier than the level of the weakest contestant — you can see why these would be bad.)

In contrast, with a gettable problem, even though some of the contestants will miss it, they’ll often miss it for a reason like 2+3=6. This is a bit unfortunate, but it is still a lot better if the contestant goes home thinking “I made a small arithmetic error, so I have to be more careful” than “there’s no way I could have gotten this, it was hopeless”.

But that brings me to the next point:

At the IMO, there are two easy problems (one each day), but there are only six problems. So a full one-third of the problems are gettable: we hope that most students attending the IMO can solve either IMO1 or IMO4, even though many will not solve both.

If you are writing HMMT or some similar contest, I think this means **you should think about the opening in terms of the fraction 1/3**, rather than problem 1. For example, at HMMT, I think the czars should strive instead to make the first three or four out of ten problems on each individual test *gettable*: they should be problems every contestant *could* solve, even though some of them will still miss it anyways. Under the pressure of contest, students are going to make all sorts of mistakes, and so it’s important that there are multiple gettable problems. This way, **every student has two or three or four real chances to solve a problem**: they’ll still miss a few, but at least they feel like they could do something.

(Every year at HMMT, when we look back at the tests in hindsight, the first reflex many czars have is to look at how many people got 0’s on each test, and hope that it’s not too many. The fact that this figure is even worth looking at is in my opinion a sign that we are doing things wrong: is 1/10 any better than 0/10, if the kid solved question 1 quickly and then spent the rest of the hour staring at the other nine?)

The other thing I want to say is to spend some time thinking about the entire test as a whole, rather than about each problem individually.

To drive the point home: I’m willing to bet that **an HMMT individual test with 4 easy, 6 medium, and 0 hard problems could actually work**, even at the top end of the scores. Each medium problem in isolation won’t distinguish the strongest students. But put six of them all together, and you get two effects:

- Students will make mistakes on some of the problems, and by the central limit theorem you’ll get a curve anyways.
- Time pressure becomes significantly more important, and the strongest students will come out ahead by simply being faster.

Of course, I’ll never be able to persuade the problem czars (myself included) to not include at least one or two of those super-nice hard problems. But the point is that they’re not actually needed in situations like HMMT, when there are so many problems that it’s hard to *not* get a curve of scores.

One suggestion many people won’t take: if you really want to include some difficult problems that will take a while, decrease the length of the test. If you had 3 easy, 3 medium, and 1 hard problem, I bet that could work too. One hour is really not very much time.

Actually, this has been experimentally verified. On my HMMT 2016 Geometry test, nobody solved any of problems 8-10, so the test was essentially seven problems long. The gradient of scores at the top and center still ended up being okay. The only issue was that a third of the students solved zero problems, because the easy problems were either error-prone, or else were hit-or-miss (either solved quickly or not at all). Thus that’s another thing to watch out for.

In high school I used to think that math contests were primarily meant to encourage contestants to study some math that is (much) more interesting than what’s typically shown in high school. While I still think this is one goal, and maybe it still is the primary goal in some people’s minds, I no longer believe this is the primary benefit.

My current belief is that there are two major benefits from math competitions:

- To build a social network for gifted high school students with similar interests.
- To provide a challenging experience that lets gifted students grow and develop intellectually.

I should at once disclaim that I do not claim these are the *only* purposes of mathematical olympiads. Indeed, mathematics is a beautiful subject and introducing competitors to this field of study is of course a great thing (in particular it was life-changing for me). But as I have said before, many alumni of math olympiads do not eventually become mathematicians, and so in my mind I would like to make the case that these alumni have gained a lot from the experience anyways.

Now that we have email, Facebook, Art of Problem Solving, and whatnot, the math contest community is much larger and stronger than it’s ever been in the past. For the first time, it’s really possible to stay connected with other competitors throughout the entire year, rather than just seeing each other a handful of times during contest season. There are literally group chats of contestants all over the country where people talk about math problems or the solar eclipse or share funny pictures or inside jokes or everything else. In many ways, being part of the high school math contest community is a lot like having access to the peer group at a top-tier university, except four years earlier.

There’s some concern that a competitive culture is unhealthy for the contestants. I want to make a brief defense here.

I really do think that the contest community is good at being collaborative rather than competitive. You can imagine a world where the competitors think about contests in terms of trying to get a better score than the other person. [1] That would not be a good world. But I think by and large the community is good at thinking about it as just trying to maximize their own score. The score of the person next to you isn’t supposed to matter (and thinking about it doesn’t help, anyways).

Put more bluntly, on contest day, you have one job: *get full marks*. [2]

Because we have a culture of this shape, we now get a group of talented students all working towards the same thing, rather than against one another. That’s what makes it possible to have a self-supportive community, and what makes it possible for the contestants to really become friends with each other.

I think the strongest contestants don’t even care about the results of contests other than the few really important ones (like USAMO/IMO). It is a long-running joke that the Harvard-MIT Math Tournament is secretly just a MOP reunion, and I personally see to it that this happens every year. [3]

I’ve also heard similar sentiments about ARML:

I enjoy ARML primarily based on the social part of the contest, and many people agree with me; the highlight of ARML for some people is the long bus ride to the contest. Indeed, I think of ARML primarily as a social event, with some mathematics to make it look like the participants are actually doing something important.

(Don’t tell the parents.)

My view is that if you spend a lot of time thinking about or working on anything deep, then you will learn and grow from the experience, almost regardless of what that thing is at an object level. Take chess as an example: even though chess definitely has even fewer “real-life applications” than math, if you take anyone with a 2000+ rating I don’t think many of them would think that the time they invested into the game was wasted. [4]

Olympiad mathematics seems to be no exception to this. In fact the sheer depth and difficulty of the subject probably makes it a particularly good example. [5]

I’m now going to fill this section with a bunch of examples although I don’t claim the list is exhaustive. First, here are the ones that everyone talks about and more or less agrees on:

- Learning **how to think**, because, well, that’s how you solve a contest problem.
- Learning to **work hard** and **not give up**, because the contest is difficult and you will not win by accident; you need to actually go through a lot of training.
- Dual to above, **learning to give up** on a problem, because sometimes the problem really is too hard for you and you won’t solve it even if you spend another ten or twenty or fifty hours, and you have to learn to cut your losses. There is a balancing act here that I think really is best taught by experience, rather than the standard high-school moral cheerleading where you are supposed to “never give up” or something.
- But also learning to **be humble** or to **ask for help**, which is a really hard thing for a lot of young contestants to do.
- Learning to **be patient**, not only with solving problems but with the entire journey. You usually do not improve dramatically overnight.

Here are some others I also believe, but don’t hear as often.

- Learning to **be independent**, because odds are your high-school math teacher won’t be able to help you with USAMO problems. Training for the highest level of contests is these days almost always done more or less independently. I think having the self-motivation to do the training yourself, as well as the capacity to essentially have to design your own training (making judgments on what to work on, et cetera) is itself a valuable cross-domain skill. (I’m a little sad sometimes that by teaching I deprive my students of the opportunity to practice this. It is a cost.)
- Being able to **work neatly**, not because your parents told you to but because if you are sloppy then it will cost you points when you make small (or large) errors on IMO #1. Olympiad problems are difficult enough as is, and you do not want to let them become any harder because of your own sloppiness. (And there are definitely examples of olympiad problems which are impossible to solve if you are not organized.)
- Being able to **organize and write your thoughts well**, because some olympiad problems are complex and require putting more than one lemma or idea together to solve. For this to work, you need to have the skill of putting together a lot of moving parts into a single coherent argument. Bonus points here if your audience is someone you care about (as opposed to a grader), because then you have to also worry about making the presentation as clean and natural as possible.

  These days, whenever I solve a problem I always take the time to write it up cleanly, because in the process of doing so I nearly always find ways that the solution can be made shorter or more elegant, or at least philosophically more natural. (I also often find my solution is wrong.) So it seems that the write-up process here is not merely about presenting the same math in different ways: the underlying math really does change. [6]

- **Thinking about how to learn.** For example, the Art of Problem Solving forums are often filled with questions of the form “what should I do?”. Many older users find these questions obnoxious, but I find them desirable. I think being able to spend time pondering what makes people improve or learn well is a good trait to develop, rather than mindlessly doing one book after another.

  Of course, many of the questions I referred to are poor, with no real specific direction: often the questions are essentially “what book should I read?”, or “give me an exhaustive list of everything I should know”. But I think this is inevitable because these are people’s first attempts at understanding contest training. Just like the first difficult math contest you take often goes quite badly, the first time you try to think about learning, you will probably ask questions you will be embarrassed about in five years. My hope is that as these younger users get older and wiser, the questions and thoughts become mature as well. To this end I do not mind seeing people wobble on their first steps.

- Being **honest with your own understanding**, particularly of fundamentals. When watching experienced contestants, you often see people solving problems using advanced techniques like Brianchon’s theorem or the n-1 equal value principle or whatever. It’s tempting to think that if you learn the names and statements of all these advanced techniques then you’ll be able to apply them too. But the reality is that these techniques are advanced for a reason: they are hard to use without mastery of fundamentals.

  This is something I definitely struggled with as a contestant: being forced to patiently learn all the fundamentals and not worry about the fancy stuff. To give an example, the 2011 JMO featured an inequality which was routine for experienced or well-trained contestants, but “almost impossible for people who either have not seen inequalities at all or just like to compile famous names in their proofs”. I was in the latter category, and tried to make up a solution using multivariable Jensen, whatever that meant. Only when I was older did I really understand what I was missing.

- Dual to the above, once you begin to master something completely you start to **learn what different depths of understanding feel like**, and gain an appreciation for just how much effort goes into developing a mastery of something.
- Being able to **think about things which are not well-defined**. This one often comes as a surprise to people, since math is a field which is known for its precision. But I still maintain that this is a skill contests train for.

  A very simple example is a question like, “when should I use the probabilistic method?”. Yes, we know it’s good for existence questions, but can we say anything more about when we expect it to work? Well, one heuristic (not the only one) is “if a monkey could find it”: the idea that a randomly selected object “should” work. But obviously something like this can’t be subject to a (useful) formal definition that works 100% of the time, and there are plenty of contexts in which even informally this heuristic gives the wrong answer. So that’s an example of a vague and nebulous concept that’s nonetheless necessary in order to understand the probabilistic method well.

  There are much more general examples one can give. What does it mean for a problem to “feel projective”? I can’t tell you a hard set of rules; you’ll have to do a bunch of examples and gain the intuition yourself. Why do I say this problem is “rigid”? Same answer. How do you tell which parts of this problem are natural, and which are artificial? How do you react if you have the feeling the problem gives you nothing to work with? How can you tell if you are making progress on a problem? Trying to figure out partial answers to these questions, even if they can’t be put in words, will go a long way in improving the mythical intuition that everyone knows is so important.

  It might not be unreasonable to say that by this point we are studying **philosophy**, and that’s exactly what I intend. When I teach now I often make a point of referring to the “morally correct” way of thinking about things, or making a point of explaining why X *should* be true, rather than just providing a proof. I find this type of philosophy interesting in its own right, but that is not the main reason I incorporate it into my teaching. I teach the philosophy now *because it is necessary*, because you will solve fewer problems without that understanding.

But I think the most surprising benefit of math contests is that most participants won’t win. In high school everyone tells you that if you work hard you will succeed. The USAMO is a fantastic counterexample to this. Every year, there are exactly 12 winners on the USAMO. I can promise you there are far more than 12 people who work very hard every year with the hope of doing well on the USAMO. Some people think this is discouraging, but I find it desirable.

Let me tell you a story.

Back in September of 2015, I sneaked in to the parents’ talk at Math Prize for Girls, because Zuming Feng was speaking and I wanted to hear what he had to say. (The whole talk is available on YouTube now.) The talk had a lot of different parts that I liked, but one of them struck me in particular, when he recounted something he said to one of his top students:

I really want you to work hard, but I really think if you don’t do well, if you fail, it’s better to you.

I had a hard time relating to this when I first heard it, but it makes sense if you think about it. What I’ve tried to argue is that the benefit of math contests is not that the contestant can now solve N problems on USAMO in late April, but what you gain from the entire year of practice. And so if you *hold the other 363 days fixed*, and then vary only the final outcome of the USAMO, which of success and failure is going to help a contestant develop more as a person?

For that reason I really like to think that the final lesson from high school olympiads is how to appreciate the entire journey, even in spite of the eventual outcome.

- I actually think this is one of the good arguments in favor of the new JMO/USAMO system introduced in 2010. Before this, it was not uncommon for participants in 9th and 10th grade to really only aim for solving one or two entry-level USAMO problems to qualify for MOP. To this end I think the mentality of “the cutoff will probably only be X, so give up on solving problem six” is sub-optimal.
- That’s a Zuming quote.
- Which is why I think the HMIC is actually sort of pointless from a contestant’s perspective, but it’s good logistics training for the tournament directors.
- I could be wrong about people thinking chess is a good experience, given that I don’t actually have any serious chess experience beyond knowing how the pieces move. A cursory scan of the Internet suggests otherwise (I was surprised to find that Ben Franklin has an opinion on this) but it’s possible there *are* people who think chess is a waste of time, and are merely not as vocal as the people who think math contests are a waste of time.
- Relative to what many high school students work on, not compared to research or something.
- Privately, I think that working in math olympiads taught me way more about writing well than English class ever did; English class always felt to me like the skill of trying to sound like I was saying something substantial, even when I wasn’t.

This work was part of the Duluth REU 2017, and I thank Joe Gallian for suggesting the problem.

Let me begin by formulating the problem as it was given to me. First, here is the definition and notation for a “block-ascending” permutation.

**Definition 1**

For nonnegative integers , …, an *-ascending permutation* is a permutation on whose descent set is contained in . In other words the permutation ascends in blocks of length , , …, , and thus has the form

for which for all .

It turns out that block-ascending permutations which also avoid an increasing subsequence of certain length have nice enumerative properties. To this end, we define the following notation.

**Definition 2**

Let denote the set of -ascending permutations which avoid the pattern .

(The reason for using will be explained later.) In particular, if .
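To make the two definitions concrete, here is a small Python sketch that enumerates block-ascending permutations and keeps those avoiding a long increasing subsequence. The function names and the parameter `max_len` (the longest increasing subsequence still allowed) are my own illustrative choices, so the indexing may be off by one from the notation in the post; treat it as a sketch under that assumption.

```python
from bisect import bisect_left
from itertools import combinations

def lis_length(seq):
    """Length of the longest increasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest possible last value of an increasing run of length i+1
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def block_ascending(blocks):
    """Yield every permutation of 1..sum(blocks), as a tuple, that ascends
    within consecutive blocks of the given lengths."""
    def rec(remaining, idx):
        if idx == len(blocks):
            yield ()
            return
        # Choosing a subset for the block fixes its order (ascending).
        for chosen in combinations(sorted(remaining), blocks[idx]):
            for tail in rec(remaining - set(chosen), idx + 1):
                yield chosen + tail
    yield from rec(frozenset(range(1, sum(blocks) + 1)), 0)

def avoiders(max_len, blocks):
    """Block-ascending permutations with no increasing subsequence
    longer than max_len (hypothetical helper, not from the paper)."""
    return [w for w in block_ascending(blocks) if lis_length(w) <= max_len]

if __name__ == "__main__":
    for blocks in [(1, 1, 1), (2, 2), (3,)]:
        print(blocks, avoiders(2, blocks))
```

With `max_len = 2` and blocks `(1, 1, 1)` this lists the 123-avoiding permutations of length 3, and a block longer than `max_len` gives an empty list, since the block itself is already too long an increasing run.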

**Example 3**

Here is a picture of a permutation in (but not in , since one can see an increasing length subsequence shaded). We would denote it .

Now on to the results. A 2011 paper by Joel Brewster Lewis (JBL) proved (among other things) the following result:

**Theorem 4** **(Lewis 2011)**

The sets and are in bijection with Young tableaux of shape .

**Remark 5**

When , this implies , which is the set of -avoiding permutations of length , is in bijection with the Catalan numbers; so is which is the set of -avoiding *zig-zag* permutations.

Just before the Duluth REU in 2017, Mei and Wang proved that in fact, in Lewis’ result one may freely mix and ‘s. To simplify notation,

**Definition 6**

Let . Then denotes where

**Theorem 7** **(Mei, Wang 2017)**

The sets are also in bijection with Young tableaux of shape .

The proof uses the RSK correspondence, but the authors posed at the end of the paper the following open problem:

**Problem**

Find a direct bijection between the sets above, not involving the RSK correspondence.

This was the first problem that I was asked to work on. (I remember I received the problem on Sunday morning; this actually matters a bit for the narrative later.)

At this point I should pause to mention that this notation is my own invention, and did not exist when I originally started working on the problem. Indeed, all the results are restricted to the case where for each , and so it was unnecessary to think about other possibilities for : Mei and Wang’s paper uses the notation . So while I’ll continue to use the notation in the blog post for readability, it will make some of the steps more obvious than they actually were.

Mei and Wang’s paper originally suggested that rather than finding a bijection for any and , it would suffice to biject

and then compose two such bijections. I didn’t see why this should be much easier, but it didn’t seem to hurt either.

As an example, they show how to do this bijection with and . Indeed, suppose . Then is an increasing sequence of length right at the start of . So had better be the largest element in the permutation: otherwise later in the biggest element would complete an ascending permutation of length already! So removing gives a bijection between .

But if you look carefully, this proof does essentially nothing with the later blocks. The exact same proof gives:

**Proposition 8**

Suppose . Then there is a bijection

by deleting the st element of the permutation (which must be the largest one).
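The deletion argument is easy to test by brute force on small cases. The sketch below assumes the first block has length exactly `max_len`, the longest increasing subsequence allowed (my own parameter name, which may be indexed differently from the post). It checks that the last entry of the first block is always the largest element, and that deleting it gives a bijection onto the set with the first block shortened by one.

```python
from itertools import permutations

def lis_at_most(w, bound):
    """True if every increasing subsequence of w has length <= bound."""
    best = []  # best[i] = longest increasing subsequence ending at w[i]
    for i, v in enumerate(w):
        best.append(1 + max((best[j] for j in range(i) if w[j] < v), default=0))
    return max(best, default=0) <= bound

def ascends_in_blocks(w, blocks):
    """True if w is increasing inside each consecutive block."""
    start = 0
    for size in blocks:
        block = w[start:start + size]
        if any(block[i] > block[i + 1] for i in range(len(block) - 1)):
            return False
        start += size
    return True

def members(max_len, blocks):
    """Brute-force set of block-ascending permutations whose longest
    increasing subsequence has length at most max_len."""
    n = sum(blocks)
    return {w for w in permutations(range(1, n + 1))
            if ascends_in_blocks(w, blocks) and lis_at_most(w, max_len)}

def check_deletion(max_len, blocks):
    """Check the deletion map when the first block has length max_len."""
    assert blocks[0] == max_len
    source = members(max_len, blocks)
    target = members(max_len, (blocks[0] - 1,) + tuple(blocks[1:]))
    images = set()
    for w in source:
        if w[max_len - 1] != len(w):  # end of first block must be the maximum
            return False
        images.add(w[:max_len - 1] + w[max_len:])
    # Onto the target, and injective because the image count matches.
    return images == target and len(images) == len(source)

if __name__ == "__main__":
    print(check_deletion(2, (2, 1, 2)), check_deletion(3, (3, 2)))
```

Since the deleted entry is the largest element, the remaining entries are already a permutation of a smaller initial segment, so no relabeling is needed.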

Once I found this proposition I rejected the initial suggestion of specializing . The “easy case” I had found told me that I could take a set and delete the single element from it. So empirically, my intuition from this toy example told me that it would be easier to find bijections where and were only “a little different”, and hope that the resulting bijection only changed things a little bit (in the same way that in the toy example, all the bijection did was delete one element). So I shifted to trying to find small changes of this form.

I had a lucky break of wishful thinking here. In the notation with , I had found that one could replace with either or freely. (But this proof relied heavily on the block really being on the far left.) So what other changes might I be able to make?

There were two immediate possibilities that came to my mind.

- **Deletion**: We already showed could be changed from to for any . If we can do a similar deletion with for any , not just , then we would be done.
- **Swapping**: If we can show that two adjacent ‘s could be swapped, that would be sufficient as well. (It’s also possible to swap non-adjacent ‘s, but that would cause more disruption for no extra benefit.)

Now, I had two paths that both seemed plausible to chase after. How was I supposed to know which one to pick? (Of course, it’s possible neither works, but you have to start somewhere.)

Well, maybe the correct thing to do would have been to just try both. But it was Sunday afternoon by the time I got to this point. Granted, it was summer already, but I knew that come Monday I would have doctor appointments and other trivial errands to distract me, so I decided I should pick one of them and throw the rest of the day into it. But that meant I had to pick one.

(I confess that I actually already had a prior guess: the deletion approach seemed less likely to work than the swapping approach. In the deletion approach, if is somewhere in the middle of the permutation, it seemed like deleting an element could cause a lot of disruption. But the swapping approach preserved the total number of elements involved, and so seemed more likely that I could preserve structure. But really I was just grasping at straws.)

Yeah, I cheated. Sorry.

Those of you that know anything about my style of math know that I am an algebraist by nature — sort of. It’s more accurate to say that I depend on having concrete examples to function. True, I can’t do complexity theory for my life, but I also haven’t been able to get the hang of algebraic geometry, despite having tried to learn it three or four times by now. But enumerative combinatorics? OH LOOK EXAMPLES.

Here’s the plan: let . Then using a C++ computer program:

- Enumerate all the permutations in .
- Enumerate all the permutations in .
- Enumerate all the permutations in .

If the deletion approach is right, then I would hope and look pretty similar. On the flip side, if the swapping approach is right, then and should look close to each other instead.

It’s moments like this where my style of math really shines. I don’t have to make decisions like the above off gut-feeling: do the “data science” instead.

Except this isn’t actually what I did, since there was one problem. Computing the longest increasing subsequence of a length permutation takes time, and there are or so permutations. But when , we have , which is a pretty big number. Unfortunately, my computer is not really that fast, and I didn’t really have the patience to implement the “correct” algorithms to bring the runtime down.
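For reference, a standard way to get the length of a longest increasing subsequence quickly is patience sorting with binary search, which runs in O(n log n) time for a sequence of length n. The following is a minimal Python sketch, not the C++ program mentioned above; the function name `lis_length` is made up for illustration.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    # tails[k] is the smallest possible last element of an
    # increasing subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[k] = x      # x gives a smaller tail for length k + 1
    return len(tails)

print(lis_length([2, 4, 6, 1, 3, 5, 7]))  # 4
```

Even with a fast subroutine like this, the enumeration is still dominated by the sheer number of permutations, which is the real bottleneck described above.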

The solution? Use instead.

In a deep irony that I didn’t realize at the time, it was this moment when I introduced the notation, and for the first time allowed the to not be in . My reasoning was that since I was only doing this for heuristic reasons, I could instead work with and probably not change much about the structure of the problem, while replacing , which would run times faster. This was okay since all I wanted to do was see how much changing the “middle” would disrupt the structure.

And so the new plan was:

- Enumerate all the permutations in .
- Enumerate all the permutations in .
- Enumerate all the permutations in .

I admit I never actually ran the enumeration with , because the route with and turned out to be even more promising than I expected. When I compared the empirical data for the sets and , I found that the number of permutations with any particular triple was equal. In other words, the **outer blocks were preserved**: the bijection

does not tamper with the outside blocks of length and .

This meant I was ready to make the following conjecture. Suppose , . There is a bijection

which only involves rearranging the elements of the th and st blocks.

At this point I was in quite a good position. I had pinned down the problem to finding a particular bijection that I was confident had to exist, since it was showing up in the empirical data.

Let’s call this mythical bijection . How could I figure out what it was?

Let me quickly introduce a definition.

**Definition 9**

We say two words $a_1 a_2 \dots a_m$ and $b_1 b_2 \dots b_m$ are *order-isomorphic* if $a_i < a_j$ if and only if $b_i < b_j$. Then order-isomorphism gives equivalence classes, and there is a canonical representative where the letters are $\{1, 2, \dots, m\}$; this is called a *reduced* word.

**Example 10**

The words , and are order-isomorphic; the last is reduced.
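To make the definition concrete, here is a small Python sketch that computes the reduced word of a word with distinct letters and tests order-isomorphism by comparing reductions. The function names are illustrative only, and the sample words are made up rather than taken from the example above.

```python
def reduce_word(word):
    """Replace each letter by its rank, giving a word on 1, 2, ..., m.

    Assumes the letters of `word` are distinct.
    """
    rank = {x: i + 1 for i, x in enumerate(sorted(word))}
    return [rank[x] for x in word]

def order_isomorphic(u, v):
    """Two words are order-isomorphic exactly when their reductions agree."""
    return len(u) == len(v) and reduce_word(u) == reduce_word(v)

print(reduce_word([3, 9, 5]))                      # [1, 3, 2]
print(order_isomorphic([3, 9, 5], [10, 40, 20]))   # True
```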

Now I guessed one more property of : this should preserve order-isomorphism.

What do I mean by this? Suppose in one context changed to ; then we would expect that in another situation we should have changing to . Indeed, we expect (empirically) to not touch surrounding outside blocks, and so it would be very strange if behaved differently due to far-away numbers it wasn’t even touching.

So actually I’ll just write

for this example, reducing the words in question.

With this hunch it’s possible to cheat with C++ again. Here’s how.

Let’s for concreteness suppose and the particular sets

Well, it turns out if you look at the data:

- The only element of which starts with and ends with is .
- The only element of which starts with and ends with is .

So that means that is changed to . Thus the empirical data shows that

In general, it might not be that clear cut. For example, if we look at the permutations starting with and , there is more than one.

- and are both in .
- and are both in .

Thus

but we can’t tell which one goes to which (although you might be able to guess).

Fortunately, there is *lots of data*. This example narrowed down to two values, but if you look at other places you might have different data on . Since we think is behaving the same “globally”, we can piece together different pieces of data to get narrower sets. Even better, is a bijection, so once we match either of or , we’ve matched the other.

You know what this sounds like? Perfect matchings.

So here’s the experimental procedure.

- Enumerate all permutations in and .
- Take each possible tuple , and look at the permutations that start and end with those particular four elements. Record the reductions of and for all these permutations. We call these *input words* and *output words*, respectively. Each output word is a “candidate” of for an input word.
- For each input word that appeared, take the intersection of the sets of output words that appeared alongside it. This gives a bipartite graph , with input words being matched to their candidates.
- Find perfect matchings of the graph.

And with any luck that would tell us what is.
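The last two steps of this procedure can be sketched in a few lines of Python. This is a toy illustration under assumptions, not the original C++ program: the observations below are invented placeholder words (`"A"`, `"x"`, and so on), each observation stands for the input and output words seen for one choice of outer elements, and the matching search is plain backtracking, which is only sensible because the graph is small.

```python
def candidate_graph(observations):
    """Intersect, for each input word, the output sets it was seen with.

    observations: list of (input_words, output_words) pairs of sets,
    one pair per choice of the surrounding outer elements.
    """
    candidates = {}
    for inputs, outputs in observations:
        for w in inputs:
            candidates[w] = candidates.get(w, set(outputs)) & set(outputs)
    return candidates

def perfect_matchings(candidates):
    """Yield every assignment of distinct candidates to all input words."""
    words = sorted(candidates)

    def extend(i, used, current):
        if i == len(words):
            yield dict(current)
            return
        for out in sorted(candidates[words[i]] - used):
            current[words[i]] = out
            yield from extend(i + 1, used | {out}, current)
            del current[words[i]]

    yield from extend(0, frozenset(), {})

# Hypothetical data: three contexts, three input words, three output words.
observations = [
    ({"A"}, {"x"}),
    ({"B", "C"}, {"y", "z"}),
    ({"A", "C"}, {"x", "z"}),
]
graph = candidate_graph(observations)
matchings = list(perfect_matchings(graph))
print(matchings)
```

Here the second observation alone cannot tell `"B"` and `"C"` apart, but the third one pins `"C"` down, so only one matching survives. When the number of output words equals the number of input words, every assignment found this way is a perfect matching.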

Luckily, the bipartite graph is quite sparse, and there was only one perfect matching.

    246|1357 => 2467|135
    247|1356 => 2457|136
    256|1347 => 2567|134
    257|1346 => 2357|146
    267|1345 => 2367|145
    346|1257 => 3467|125
    347|1256 => 3457|126
    356|1247 => 3567|124
    357|1246 => 1357|246
    367|1245 => 1367|245
    456|1237 => 4567|123
    457|1236 => 1457|236
    467|1235 => 1467|235
    567|1234 => 1567|234

If you look at the data, well, there are some clear patterns. Exactly one number is “moving” over from the right half, each time. Also, if is on the right half, then it always moves over.

Anyways, if you stare at this for an hour, you can actually figure out the exact rule:

**Claim 11**

Given an input , move if is the largest index for which , or if no such index exists.
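One way to make a rule of this shape concrete, and to test it against the fourteen pairs listed above, is the following Python sketch. The notation is an assumption on my part: I write the input as a left block $a_1 < a_2 < a_3$ and a right block $b_1 < b_2 < b_3 < b_4$, and read the rule as "move $b_{i+1}$, where $i$ is the largest index with $a_i < b_{i+1}$, or move $b_1$ if no such index exists". The function and variable names are illustrative.

```python
TABLE = """
246|1357 => 2467|135
247|1356 => 2457|136
256|1347 => 2567|134
257|1346 => 2357|146
267|1345 => 2367|145
346|1257 => 3467|125
347|1256 => 3457|126
356|1247 => 3567|124
357|1246 => 1357|246
367|1245 => 1367|245
456|1237 => 4567|123
457|1236 => 1457|236
467|1235 => 1467|235
567|1234 => 1567|234
"""

def parse(side):
    """Turn '246|1357' into ([2, 4, 6], [1, 3, 5, 7])."""
    left, right = side.split("|")
    return [int(c) for c in left], [int(c) for c in right]

def move_one(left, right):
    """Assumed reading of the rule: move b_{i+1} for the largest i with
    a_i < b_{i+1}, falling back to b_1 when there is no such i."""
    k = 0  # 0-based position in `right` of the letter to move (default b_1)
    for i in range(len(left), 0, -1):
        # left[i - 1] is a_i and right[i] is b_{i+1} in 1-based notation
        if i < len(right) and left[i - 1] < right[i]:
            k = i
            break
    new_left = sorted(left + [right[k]])
    new_right = right[:k] + right[k + 1:]
    return new_left, new_right

pairs = [line.split(" => ") for line in TABLE.strip().splitlines()]
mismatches = [(src, dst) for src, dst in pairs
              if move_one(*parse(src)) != parse(dst)]
print(len(pairs), "pairs checked,", len(mismatches), "mismatches")
```

Under this reading the rule reproduces every row of the table. That is only a consistency check against the data, of course, not a proof that the map is a bijection with the required properties.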

And indeed, once I have this bijection, it takes maybe only another hour of thinking to verify that this bijection works as advertised, thus solving the original problem.

Rather than writing up what I had found, I celebrated that Sunday evening by playing Wesnoth for 2.5 hours.

On Monday morning I was mindlessly feeding inputs to the program I had worked on earlier and finally noticed that in fact and also had the same cardinality. Huh.

It seemed too good to be true, but I played around some more, and sure enough, the cardinality of seemed to only depend on the order of the ‘s. And so at last I stumbled upon the final form of the conjecture, realizing that all along the assumption that I had been working with was a red herring, and that the bijection was really true in much vaster generality. There is a bijection

which only involves rearranging the elements of the th and st blocks.

It also meant I had more work to do, and so I was now glad that I hadn’t written up my work from yesterday night.

I re-ran the experiment I had done before, now with . (This was interesting, because the elements in question could now have either longest increasing subsequence of length , or instead of length .)

The data I obtained was:

    246|13578 => 24678|135
    247|13568 => 24578|136
    248|13567 => 24568|137
    256|13478 => 25678|134
    257|13468 => 23578|146
    258|13467 => 23568|147
    267|13458 => 23678|145
    268|13457 => 23468|157
    278|13456 => 23478|156
    346|12578 => 34678|125
    347|12568 => 34578|126
    348|12567 => 34568|127
    356|12478 => 35678|124
    357|12468 => 13578|246
    358|12467 => 13568|247
    367|12458 => 13678|245
    368|12457 => 13468|257
    378|12456 => 13478|256
    456|12378 => 45678|123
    457|12368 => 14578|236
    458|12367 => 14568|237
    467|12358 => 14678|235
    468|12357 => 12468|357
    478|12356 => 12478|356
    567|12348 => 15678|234
    568|12347 => 12568|347
    578|12346 => 12578|346
    678|12345 => 12678|345

Okay, so it looks like:

- exactly two numbers are moving each time, and
- the length of the longest run is preserved.
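Both observations are easy to confirm mechanically from the table. The Python sketch below parses the 28 rows and checks, for each one, how many numbers cross from the right block to the left block, and compares the longest increasing subsequence of the concatenated word before and after. Reading "longest run" as the longest increasing subsequence of the two blocks written side by side is my assumption, and the helper names are illustrative.

```python
from bisect import bisect_left

TABLE = """
246|13578 => 24678|135
247|13568 => 24578|136
248|13567 => 24568|137
256|13478 => 25678|134
257|13468 => 23578|146
258|13467 => 23568|147
267|13458 => 23678|145
268|13457 => 23468|157
278|13456 => 23478|156
346|12578 => 34678|125
347|12568 => 34578|126
348|12567 => 34568|127
356|12478 => 35678|124
357|12468 => 13578|246
358|12467 => 13568|247
367|12458 => 13678|245
368|12457 => 13468|257
378|12456 => 13478|256
456|12378 => 45678|123
457|12368 => 14578|236
458|12367 => 14568|237
467|12358 => 14678|235
468|12357 => 12468|357
478|12356 => 12478|356
567|12348 => 15678|234
568|12347 => 12568|347
578|12346 => 12578|346
678|12345 => 12678|345
"""

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def parse(side):
    left, right = side.split("|")
    return [int(c) for c in left], [int(c) for c in right]

rows = []
for line in TABLE.strip().splitlines():
    src, dst = line.split(" => ")
    (a, b), (c, d) = parse(src), parse(dst)
    moved = set(c) - set(a)  # letters that joined the left block
    rows.append((len(moved), moved <= set(b),
                 lis_length(a + b), lis_length(c + d)))

print(all(row == (2, True, 5, 5) for row in rows))
```

Every row moves exactly two letters from right to left, and the longest increasing subsequence has length 5 on both sides.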

Eventually, I was able to work out the details, but they’re more involved than I want to reproduce here. But the idea is that you can move elements “one at a time”: something like

while preserving the length of increasing subsequences at each step.

So, together with the easy observation from the beginning, this not only resolves the original problem, but also gives an elegant generalization. I had now proved:

**Theorem 12**

For any , …, , the cardinality

does not depend on the order of the ‘s.

Whenever I look back on this, I can’t help thinking just how incredibly lucky I got on this project.

There’s this perpetual debate about whether mathematics is discovered or invented. I think it’s results like this which make the case for “discovered”. I did not really construct the bijection myself: it was “already there” and I found it by examining the data. In another world where did not exist, all the creativity in the world wouldn’t have changed anything.

So anyways, that’s the behind-the-scenes tour of my favorite combinatorics paper.

Suppose you are a math PhD student at MIT. Officially, this “costs” $50K a year in tuition. Fortunately this number is meaningless, because math PhD students serve time as teaching assistants in exchange for having the nominal sticker price waived. MIT then provides a stipend of about $25K a year for these PhD students’ living expenses. This stipend is taxable, but it’s small and you’d pay only $1K-$2K in federal taxes (about 6%).

The new GOP tax proposal strikes 26 U.S. Code 117(d) which would cause the $50K tuition waiver to *also* become taxable income: the PhD student would pay taxes on an “income” of $75K, at tax brackets of 12% and 25%. If I haven’t messed up the calculation, for our single PhD student this means **paying $10K in federal taxes out of the same $25K stipend (about 40%)**.

I think a 40% tax rate for a PhD student is a *bit* unreasonable; the remaining $15K a year is not too far from the poverty line.

(The relevant sentence is page 96, line 20 of the GOP tax bill.)

**Theorem 1** **(Cayley’s Formula)**

The number of trees on $n$ labelled vertices is $n^{n-2}$.

*Proof:* We are going to construct a bijection between

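- Functions $\{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$ (of which there are $n^n$) and
- Trees on $\{1, 2, \dots, n\}$ with two distinguished nodes $A$ and $B$ (possibly $A = B$).

This will imply the answer: each tree admits $n^2$ choices of the ordered pair $(A, B)$, so the number of trees is $n^n / n^2 = n^{n-2}$.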

Let’s look at the first piece of data. We can visualize it as points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a **directed pseudoforest**. Here is an example when .

You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.

Thus a directed pseudoforest can also be specified by

- a choice of some vertices to be in cycles (blue vertices),
- a permutation on the blue vertices (magenta arrows), and
- attachments of trees to the blue vertices (grey vertices and arrows).

Now suppose we take the same information, but replace the *permutation* on the blue vertices with a *total ordering* instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case ):

This is exactly the data of a tree on the vertices with two distinguished vertices, the first and last in the chain of green (which could possibly coincide).
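For small cases the construction can be checked by brute force. The Python sketch below is one way to realize it, under two assumptions of mine: vertices are labelled $0, \dots, n-1$, and the permutation on the cyclic vertices is converted to a total ordering by reading it in one-line notation (listing the images of the cyclic vertices taken in increasing order). The function name `joyal` is illustrative.

```python
from itertools import product

def joyal(f):
    """Map a function f (a tuple with f[v] in range(n)) to a triple
    (tree edges, first distinguished vertex, last distinguished vertex)."""
    n = len(f)
    # Applying f repeatedly to the whole vertex set shrinks it to
    # exactly the vertices lying on cycles (the "blue" vertices).
    cyclic = set(range(n))
    for _ in range(n):
        cyclic = {f[v] for v in cyclic}
    # Assumed convention: the total ordering is the one-line notation of
    # f restricted to the cyclic vertices.
    order = [f[c] for c in sorted(cyclic)]
    # Chain the cyclic vertices in that order ...
    edges = {frozenset(pair) for pair in zip(order, order[1:])}
    # ... and keep every arrow leaving a non-cyclic vertex.
    edges |= {frozenset((v, f[v])) for v in range(n) if v not in cyclic}
    return frozenset(edges), order[0], order[-1]

n = 4
images = {joyal(f) for f in product(range(n), repeat=n)}
trees = {edges for edges, _, _ in images}
print(len(images), len(trees))  # 256 16
```

For $n = 4$ the $4^4 = 256$ functions give 256 distinct triples spread over $16 = 4^{4-2}$ edge sets, each with $n - 1 = 3$ edges, as the formula predicts.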
